Custom Sharding With Vitess

Vitess supports a variety of predefined sharding algorithms that can suit different needs. This is achieved by associating a Vindex with your main sharding column. A Vindex essentially provides a mapping function that converts your column value to a keyspace_id. This keyspace_id is then used to decide the target shard. A full description of VSchema and Vindexes can be found here. However, such predefined vindexes will work only if you intend to shard your system using Vitess. What if you're already sharded? Would it be possible to make Vitess accommodate your sharding scheme? This blog intends to cover such a use case.
Vitess is indeed capable of accommodating any sharding scheme because of its pluggable Vindex API. In fact, all the predefined vindexes of Vitess are plug-ins themselves. In order for Vitess to accommodate your sharding scheme, all you have to do is define a Vindex that performs such a mapping.
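As a rough illustration of the idea, the sketch below maps a column value to a keyspace_id and then picks a shard by key range. It is not the Vitess Vindex API: every name here (`legacyKeyspaceID`, `findShard`, `exampleShards`, the `shard` type) is hypothetical, and so is the legacy scheme itself, which is assumed to place users on one of four databases by `userID % 4`.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// shard is a hypothetical description of a shard that owns the keyspace IDs
// in the half-open range [start, end). An empty end means "no upper bound".
type shard struct {
	name  string
	start []byte
	end   []byte
}

// legacyKeyspaceID is a hypothetical mapping function for an application that
// already places users on one of four databases using userID % 4.
// The bucket number becomes the top two bits of the first byte, so each
// legacy bucket falls into exactly one of four key ranges.
func legacyKeyspaceID(userID uint64) []byte {
	ksid := make([]byte, 9)
	ksid[0] = byte(userID%4) << 6 // 0x00, 0x40, 0x80 or 0xc0
	binary.BigEndian.PutUint64(ksid[1:], userID)
	return ksid
}

// exampleShards returns four hypothetical shards, one per legacy bucket.
func exampleShards() []shard {
	return []shard{
		{name: "-40", start: nil, end: []byte{0x40}},
		{name: "40-80", start: []byte{0x40}, end: []byte{0x80}},
		{name: "80-c0", start: []byte{0x80}, end: []byte{0xc0}},
		{name: "c0-", start: []byte{0xc0}, end: nil},
	}
}

// findShard returns the name of the shard whose range contains ksid.
func findShard(shards []shard, ksid []byte) (string, bool) {
	for _, s := range shards {
		afterStart := bytes.Compare(ksid, s.start) >= 0
		beforeEnd := len(s.end) == 0 || bytes.Compare(ksid, s.end) < 0
		if afterStart && beforeEnd {
			return s.name, true
		}
	}
	return "", false
}
```

With this layout, calling `findShard(exampleShards(), legacyKeyspaceID(id))` sends every user to the key range that matches the database the legacy scheme already uses, so no rows need to move.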

Use Case
The following example is inspired by my conversations with Simon …

Vitess releases version 2.1

The Vitess project is proud to announce the release of version 2.1. This version comes packed with new features that improve usability, availability and resilience of the overall system.
The release coincides with the Percona Live 2017 Conference, where project co-founder Sugu Sougoumarane will give the talk "Vitess beyond YouTube". He is joined by Robert Navarro from Stitch Labs, who will describe how Stitch Labs uses Vitess in production.

Version 2.0 introduced a sharding-agnostic API to the world. This allowed clients to connect to Vitess and send queries as if it were a single database engine. However, questions remained about the atomicity of transactions that spanned multiple shards or keyspaces. With 2.1, you can now request that such transactions go through the Two-Phase Commit (2PC) protocol, which gives you all-or-none cross-database commits.

Another noteworthy new feature is the support for the native MySQL protocol. This allows users to trivially repoint their appli…

Distributed Transactions in Vitess

With Vitess introducing sharding and allowing you to create cross-shard indexes, distributed transactions become unavoidable for certain workloads. Currently, Vitess only supports best-effort distributed transactions. So, it’s possible that a distributed commit only completes partially, leaving data in an inconsistent state.

At this point, 2 Phase Commit (2PC) is the only known protocol that allows you to give atomic guarantees for distributed transactions. For this protocol to work, a database must be able to support the ‘Prepare’ contract. However, not all databases provide such support. Also, some of the engines that do support it either do it incorrectly or inefficiently. Specifically, the pre-5.7 MySQL XA protocol works incorrectly for replication, and is therefore not usable.
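To make the 'Prepare' contract concrete, here is a minimal sketch of a generic 2PC coordinator. It is not the Vitess implementation: the `participant` interface, `twoPhaseCommit`, and the in-memory `memShard` are hypothetical names used only for illustration. The sketch also skips what a real coordinator needs, such as durably recording the commit decision and retrying commits after a failure.

```go
package main

import (
	"errors"
	"fmt"
)

// participant is a hypothetical interface for one database taking part in a
// distributed transaction. A successful Prepare is a promise that a later
// Commit will not be refused.
type participant interface {
	Prepare() error
	Commit() error
	Rollback() error
}

// twoPhaseCommit runs a simplified two-phase commit over all participants.
func twoPhaseCommit(parts []participant) error {
	// Phase 1: ask every participant to prepare.
	for _, p := range parts {
		if err := p.Prepare(); err != nil {
			// Any refusal aborts the whole transaction everywhere.
			for _, q := range parts {
				_ = q.Rollback()
			}
			return fmt.Errorf("prepare failed, transaction rolled back: %w", err)
		}
	}
	// Phase 2: everyone promised, so commit everywhere.
	// A real coordinator would persist this decision and retry on failure;
	// this sketch simply reports the first error.
	for _, p := range parts {
		if err := p.Commit(); err != nil {
			return fmt.Errorf("commit failed after prepare: %w", err)
		}
	}
	return nil
}

// memShard is a hypothetical in-memory participant for experimenting.
type memShard struct {
	failPrepare bool
	state       string // "", "prepared", "committed" or "rolled back"
}

func (m *memShard) Prepare() error {
	if m.failPrepare {
		return errors.New("prepare refused")
	}
	m.state = "prepared"
	return nil
}

func (m *memShard) Commit() error {
	m.state = "committed"
	return nil
}

func (m *memShard) Rollback() error {
	m.state = "rolled back"
	return nil
}
```

If every `memShard` prepares successfully, all of them end up committed. If any one refuses, all of them are rolled back, which is the all-or-none behavior that best-effort commits cannot guarantee.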

The question was asked: Is it possible to build a Prepare protocol on top of a database that does not support it? The answer is: Yes, for an engine like MySQL. The explanation follows.
2PC in very few words
If …

Vitess V2: Now with more V3

Starting with Vitess v2.0.0-beta.2, the VTGate V3 API can route complex single-shard queries (containing joins, subqueries, aggregation, sorting, and any combination thereof) as well as perform cross-shard joins. That means you no longer need to tell VTGate the keyspace ID that a query targets, as you did with the VTGate V2 API.

The fact that keyspace IDs are now hidden from the application has enabled drop-in Vitess libraries for standard database interfaces like JDBC (written by Flipkart), PDO (written by Pixel Federation), PEP 249, and database/sql. We've also made it possible to do resharding without having to add a keyspace ID column to your tables, which means no more schema changes and column back-fills when migrating existing databases to Vitess.

To show off these new features, we recently gave a talk at Percona Live 2016 (no video unfortunately, but the slides are posted) in which we did a live demo of resharding an app that's completely unaware of sharding. The Shard…

Percona Live featured talk with Sugu Sougoumarane – Vitess: The Complete Story

Cross-posted from Percona Blog.

Welcome to the next installment of our talks with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus! In this installment, our Percona Live featured talk with Sugu Sougoumarane, Infrastructure & Storage Engineer at YouTube is about Vitess: The Complete Story. I had a chance to speak with Sugu and learn a bit more about YouTube and Vitess:

Vitess 2.0 is now beta!

That means we've accomplished all our planned overhauls of client APIs and backward-incompatible protocol changes. See the release notes for what's new.

We're now working closely with several users who are evaluating Vitess and providing feedback on the use cases that are important for their particular applications and production environments. If you're at the same stage, we welcome you to join the discussion by posting on the mailing list.

We're also trying out Slack for more conversational topics. We don't have an automatic invite system in place, so please email to request an invite if you're interested in joining the channel.

Lastly, we're starting our own blog (seeded with our previous guest posts on other blogs). This will be a place for our engineers to go more in-depth into various parts of Vitess.

Thanks, and happy scaling everyone!

- Anthony Yeh, Software Engineer @ YouTube

Cloud Native MySQL Sharding with Vitess and Kubernetes

Cross-posted on Google Cloud Platform Blog.

Cloud native technologies like Kubernetes help you compose scalable services out of a sea of small logical units. In our last post, we introduced Vitess (an open-source project that powers YouTube's main database) as a way of turning MySQL into a scalable Kubernetes application. Our goal was to make scaling your persistent datastore in Kubernetes as simple as scaling stateless app servers - just run a single command to launch more pods. We've made a lot of progress since then (pushing over 2,500 new commits) and we're nearing the first stable version of the new, cloud native Vitess.
Vitess 2.0
In preparation for the stable release, we've begun to publish alpha builds of Vitess v2.0.0. Some highlights of what's new since our earlier post include:
- Using the final Kubernetes 1.0 API.
- Official Vitess client libraries in Java, Python, PHP, and Go.
- Java and Go clients use the new HTTP/2-based gRPC framework.
- Can now run on top of MySQ…