Posts

Distributed Transactions in Vitess

With Vitess introducing sharding and allowing you to create cross-shard indexes, distributed transactions become unavoidable for certain workloads. Currently, Vitess only supports best-effort distributed transactions. So, it’s possible that a distributed commit only completes partially, leaving data in an inconsistent state. At this point, 2 Phase Commit (2PC) is the only known protocol that allows you to give atomic guarantees for distributed transactions. For this protocol to work, a database must be able to support the ‘Prepare’ contract. However, not all databases provide such support. Also, some of the engines that do support it either do it incorrectly or inefficiently. Specifically, the pre-5.7 MySQL XA protocol works incorrectly for replication, and is therefore not usable. The question was asked: Is it possible to build a Prepare protocol on top of a database that does not support it? The answer is: Yes, for an engine like MySQL. The explanation follows. 2PC in very few

Vitess V2: Now with more V3

Starting with Vitess v2.0.0-beta.2, the VTGate V3 API can route complex single-shard queries (containing joins, subqueries, aggregation, sorting, and any combination thereof) as well as perform cross-shard joins. That means you no longer need to tell VTGate the keyspace ID that a query targets, as you did with the VTGate V2 API. The fact that keyspace IDs are now hidden from the application has enabled drop-in Vitess libraries for standard database interfaces like JDBC (written by Flipkart), PDO (written by Pixel Federation), PEP 249, and database/sql. We've also made it possible to do resharding without having to add a keyspace ID column to your tables, which means no more schema changes and column back-fills when migrating existing databases to Vitess. To show off these new features, we recently gave a talk at Percona Live 2016 (no video unfortunately, but the slides are posted) in which we did a live demo of resharding an app that's completely unaware of shar

Percona Live featured talk with Sugu Sougoumarane – Vitess: The Complete Story

Cross-posted from Percona Blog. Welcome to the next installment of our talks with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus! In this installment, our Percona Live featured talk with Sugu Sougoumarane, Infrastructure & Storage Engineer at YouTube, is about Vitess: The Complete Story. I had a chance to speak with Sugu and learn a bit more about YouTube and Vitess:

Vitess 2.0 is now beta!

That means we've accomplished all our planned overhauls of client APIs and backward-incompatible protocol changes. See the release notes for what's new. We're now working closely with several users who are evaluating Vitess and providing feedback on the use cases that are important for their particular applications and production environments. If you're at the same stage, we welcome you to join the discussion by posting on the mailing list. We're also trying out Slack for more conversational topics. We don't have an automatic invite system in place, so please email vitess@googlegroups.com to request an invite if you're interested in joining the channel. Lastly, we're starting our own blog (seeded with our previous guest posts on other blogs). This will be a place for our engineers to go more in-depth into various parts of Vitess. Thanks, and happy scaling everyone! - Anthony Yeh, Software Engineer @ YouTube

Cloud Native MySQL Sharding with Vitess and Kubernetes

Cross-posted on Google Cloud Platform Blog. Cloud native technologies like Kubernetes help you compose scalable services out of a sea of small logical units. In our last post, we introduced Vitess (an open-source project that powers YouTube's main database) as a way of turning MySQL into a scalable Kubernetes application. Our goal was to make scaling your persistent datastore in Kubernetes as simple as scaling stateless app servers - just run a single command to launch more pods. We've made a lot of progress since then (pushing over 2,500 new commits) and we're nearing the first stable version of the new, cloud native Vitess. Vitess 2.0: In preparation for the stable release, we've begun to publish alpha builds of Vitess v2.0.0. Some highlights of what's new since our earlier post include: Using the final Kubernetes 1.0 API. Official Vitess client libraries in Java, Python, PHP, and Go. Java and Go clients use the new HTTP/2-based gRPC

Scaling MySQL in the cloud with Vitess and Kubernetes

Cross-posted on Google Cloud Platform Blog. Your new website is growing exponentially. After a few rounds of high fives, you start scaling to meet this unexpected demand. While you can always add more front-end servers, eventually your database becomes a bottleneck, which leads you to . . . add more replicas for better read throughput and data durability; introduce sharding to scale your write throughput and let your data set grow beyond a single machine; create separate replica pools for batch jobs and backups, to isolate them from live traffic; and clone the whole deployment into multiple datacenters worldwide for disaster recovery and lower latency. At YouTube, we went on that journey as we scaled our MySQL deployment, which today handles the metadata for billions of daily video views and 300 hours of new video uploads per minute. To do this, we developed the Vitess platform, which addresses scaling challenges while hiding the associated complexity from the application l