Welcome to the next installment of our talks with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus!
In this installment, our featured Percona Live talk is Vitess: The Complete Story, with Sugu Sougoumarane, Infrastructure & Storage Engineer at YouTube. I had a chance to speak with Sugu and learn a bit more about YouTube and Vitess:
Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.
Sugu: My involvement with databases goes back to Informix in the 90s. This was during the 4GL and client-server days. I was part of the development team for a product called NewEra.
I later joined PayPal, where we used Oracle and eventually scaled it to the biggest machine money could buy. I have to say that I’m still a fan of the mighty hash join. During my time there, I wrote the system that balanced the books, which helped me gain some unique perspectives on consistency. Word on the street is that the tool is still in use.
These experiences at PayPal influenced the founders of YouTube to try a different approach: scaling with commodity hardware. When I joined YouTube, the only MySQL database we had was just beginning to run out of steam, and we boldly executed the first resharding in our lives. It took an entire night of master downtime, but we survived. These experiences eventually led to the birth of Vitess.
Percona: Your talk is going to be on “Vitess: The Complete Story.” How has Vitess moved from a YouTube fix to a viable enterprise data solution?
Sugu: This was around 2010. YouTube was growing, not only organically, but also internally. There were more engineers writing code that could potentially impair the database, and our tolerance for downtime was also decreasing. It was obvious that this combination was not sustainable. My colleague (Mike Solomon) and I agreed that we had to come up with something that would leap ahead of the curve instead of just fighting fires. When we finally built the initial feature list, it was obvious that we were addressing problems that are common to all growing organizations.
This led us to make the decision to develop this project as open source, which had a serendipitous payback: every feature that YouTube needed had to be implemented in a generic fashion. App-specific shortcuts were generally not allowed. We still develop every feature in open source first, and then import it to make it work for YouTube.
Aside from our architectural and design philosophy, our collaboration with Kubernetes over the last two years means anyone can now run Vitess the way YouTube does: in a dynamically-scaled container cluster. We’ve had engineers dedicated to deployment and manageability on a public cloud, making the platform ready for general consumption.
Percona: Why move to a cloud-based storage solution anyway? What are the advantages and disadvantages?
Sugu: In general, a big advantage of cloud solutions is easy horizontal scalability – tuning capacity by simply dumping more commodity servers in the mix. For storage engines, the problem is that application complexity and operational overhead tend to scale up along with the number of database instances. A cloud-native storage solution like Vitess hides the complexity of horizontal scalability from both app developers and database operators. Thousands of servers can look like one to both dev and ops. With Kubernetes, Vitess even becomes agnostic to the underlying choice of cloud platform, providing cloud flexibility with no vendor lock-in.
Percona: What are the roadblocks to cloud data becoming the default? What are the issues with cloud data storage that keep you up at night?
Sugu: Cloud technologies are beginning to coalesce around ideas like immutable infrastructure and ephemeral, dynamically-scheduled workloads. Instead of changing a server, you dynamically request a new one, and the old one disappears. These ideas work great for stateless app servers but represent unique challenges for storage engines. It turns out that many of these challenges are ones we faced at YouTube as we moved Vitess from private data centers into Google’s global container cluster. So we know cloud-native data storage works at scale, but now we have to prove that it works just as well on public cloud.
Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?
Sugu: I feel like I still don’t know MySQL well enough. I’m hoping to learn more about its internals and new features. I’m also looking forward to learning more about the data challenges that companies are facing today, and hearing about the creative ways people are solving them.
The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.
The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.
This week, we kick off our new weekly blog updates — bringing you the best of Vitess questions and topics on our Slack discussions. The goal is to show the most interesting topics and requests so those of you just getting started can see highlights of what has been covered.
Since this is our first ever digest, we’re going to go back in time and publish a little more than what happened last week.
Large result sets
Alejandro [Jul 2nd at 9:54 AM]
Good morning, we are trying to move away from interacting with Vitess through gRPC and instead using the MySQL binary protocol and I was just wondering if anyone here could let me know if Vitess supports unbuffered/streaming queries (returning more than 10K rows) over the MySQL binary protocol, or if that is just a gRPC feature? Thanks in advance.
sougou [29 days ago]
@Alejandro set workload='olap' should do it (for mysql protocol)
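As a rough illustration of the suggestion above, here is a minimal Python sketch. It assumes a DB-API 2.0 style connection opened against vtgate's MySQL protocol port. The helper name `stream_rows` and the batch size are hypothetical; only the `set workload='olap'` statement comes from the thread. The client driver must also avoid buffering the whole result on its side (with PyMySQL, for instance, that means an unbuffered cursor such as `pymysql.cursors.SSCursor`).

```python
from typing import Any, Iterator, Tuple


def stream_rows(conn: Any, query: str, batch_size: int = 1000) -> Iterator[Tuple]:
    """Yield rows from `query` in batches instead of loading them all at once.

    `conn` is assumed to be a DB-API 2.0 connection to vtgate's MySQL
    protocol port (hypothetical setup, not shown here).
    """
    cur = conn.cursor()
    try:
        # From the thread: switch the session to the OLAP workload so that
        # large results are streamed over the MySQL protocol.
        cur.execute("set workload='olap'")
        cur.execute(query)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            yield from rows
    finally:
        cur.close()
```

Because the function is a generator, the caller can iterate over a large result without holding every row in memory, provided the driver's cursor is itself unbuffered.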
Naming shards
Deepak [9:36 AM]
Hello everyone, I have a small doubt can someone please help me with i…
With Vitess introducing sharding and allowing you to create cross-shard indexes, distributed transactions become unavoidable for certain workloads. Currently, Vitess only supports best-effort distributed transactions. So, it’s possible that a distributed commit only completes partially, leaving data in an inconsistent state.
At this point, 2 Phase Commit (2PC) is the only known protocol that allows you to give atomic guarantees for distributed transactions. For this protocol to work, a database must be able to support the ‘Prepare’ contract. However, not all databases provide such support. Also, some of the engines that do support it either do it incorrectly or inefficiently. Specifically, the pre-5.7 MySQL XA protocol works incorrectly for replication, and is therefore not usable.
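To make the 'Prepare' contract concrete, here is a toy Python sketch of the two phases, using in-memory stand-ins for shards. The `Shard` class and `two_phase_commit` function are hypothetical names for illustration and are not Vitess code. The sketch also ignores durability and coordinator failure, which a real 2PC implementation has to handle.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Shard:
    """Toy in-memory participant standing in for one database shard."""
    name: str
    fail_on_prepare: bool = False
    committed: Dict[str, int] = field(default_factory=dict)
    prepared: Dict[str, int] = field(default_factory=dict)

    def prepare(self, changes: Dict[str, int]) -> bool:
        # The 'Prepare' contract: once this returns True, the shard promises
        # that a later commit of these changes cannot fail.
        if self.fail_on_prepare:
            return False
        self.prepared = dict(changes)
        return True

    def commit(self) -> None:
        self.committed.update(self.prepared)
        self.prepared = {}

    def rollback(self) -> None:
        self.prepared = {}


def two_phase_commit(work: List[Tuple[Shard, Dict[str, int]]]) -> bool:
    """Apply every (shard, changes) pair, or none of them."""
    prepared: List[Shard] = []
    # Phase 1: ask every participant to prepare.
    for shard, changes in work:
        if not shard.prepare(changes):
            # A single refusal aborts the whole transaction.
            for p in prepared:
                p.rollback()
            return False
        prepared.append(shard)
    # Phase 2: every participant has promised, so commit everywhere.
    for shard in prepared:
        shard.commit()
    return True
```

With a best-effort commit, the first shard would already have committed by the time the second one failed. Here the failure surfaces during the prepare phase, before any shard has made its changes visible.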
The question was asked: Is it possible to build a Prepare protocol on top of a database that does not support it? The answer is: Yes, for an engine like MySQL. The explanation follows.
2PC in very few words
If …
Vitess supports a variety of predefined sharding algorithms that can suit different needs. This is achieved by associating a Vindex with your main sharding column. A Vindex essentially provides a mapping function that converts your column value to a keyspace_id. This keyspace_id is then used to decide the target shard. A full description of VSchema and Vindexes can be found here.
However, such predefined vindexes will work only if you intend to shard your system using Vitess. What if you're already sharded? Would it be possible to make Vitess accommodate your sharding scheme? This blog intends to cover such a use case.
Vitess is indeed capable of accommodating any sharding scheme because of its pluggable Vindex API. In fact, all the predefined vindexes of Vitess are plug-ins themselves. In order for Vitess to accommodate your sharding scheme, all you have to do is define a Vindex that performs such a mapping.
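The two-step mapping described above (column value to keyspace_id, then keyspace_id to shard) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the legacy scheme of `user_id % 4`, the four-shard layout, and the function names. The actual Vindex plug-in API is written in Go and is not shown here.

```python
from typing import List, Tuple

# Hypothetical layout: four shards, each owning a half-open range
# [start, end) of the first keyspace_id byte.
SHARD_RANGES: List[Tuple[str, int, int]] = [
    ("-40", 0x00, 0x40),
    ("40-80", 0x40, 0x80),
    ("80-c0", 0x80, 0xC0),
    ("c0-", 0xC0, 0x100),
]


def legacy_vindex(user_id: int) -> bytes:
    """Map a column value to a keyspace_id, the role a Vindex plays.

    Assumes a pre-existing scheme of `user_id % 4` and a non-negative
    user_id below 2**64. The legacy shard number becomes the top two bits
    of the first byte, so each legacy shard lands in exactly one range above.
    """
    legacy_shard = user_id % 4
    return bytes([legacy_shard << 6]) + user_id.to_bytes(8, "big")


def target_shard(keyspace_id: bytes) -> str:
    """Pick the shard whose range contains the keyspace_id's first byte."""
    first = keyspace_id[0]
    for name, start, end in SHARD_RANGES:
        if start <= first < end:
            return name
    raise ValueError("keyspace_id outside all shard ranges")
```

The point of the sketch is that the mapping function is the only piece that encodes the existing sharding scheme. The range lookup stays the same whichever function produces the keyspace_id.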
Use Case
The following example is inspired by my conversations with Simon …