Vitess
Vitess is a battle-hardened open source technology invented at YouTube for deploying, scaling and managing large clusters of database instances. It now powers some of the largest sites on the Internet.
Why choose Vitess to scale your database?
Scalability with sharding
- Leverage vertical and horizontal sharding
- Distribute data across 10s, 100s, or 1000s of servers
- Presented as a single, logical database to the application
High-availability
- Configure and manage replicated MySQL servers
- VTOrchestrator monitors health of all nodes in cluster
- Automatic failover for near-zero downtime
Connection pooling
- Connections are pooled via VTGate and VTTablet layers
- Break through connection limitations of vanilla MySQL
- Support massive application connection demands
Online schema migrations
- Integrates with pt-online-schema-change and gh-ost
- Start, monitor, and cancel migrations with confidence
- Command-line interface for migration interactions
MySQL compatibility
- Compatible with most MySQL features and syntax
- Migrate existing MySQL databases into Vitess
- Great for both small and large MySQL-backed applications
Kubernetes-native
- Developed at YouTube to run on Borg
- Ideally suited for running databases in Kubernetes
- PlanetScale maintains an Operator for Vitess
No database storage system other than Vitess truly fit all of Slack's needs
— Michael Demmer, Principal Engineer @Slack
How Slack leverages Vitess to keep up with its ever growing storage needs
Vitess has paved the way for us to unify all of our data storage infrastructure and our microservices infrastructure onto Kubernetes, and it's giving us a blueprint for what the rest of our data stores might look like on Kubernetes. That's been a great win for us as an infrastructure team.
— Alex Charis, Senior Software Engineer @HubSpot
How HubSpot manages their sharded cluster with Vitess
Vitess resources
Vitess video course
Watch PlanetScale's Learn Vitess course to learn more about its architecture, how to shard, and how to use it with Kubernetes, all in less than 2 hours of watch time.
Official documentation
Check out the official Vitess documentation to learn more about Vitess' architectural details, starter tutorials, design docs, and explanation of concepts.
Slack community
Hop on to the Vitess Slack workspace to ask questions, get updates, and contribute your own database expertise to the community.
Check out the source
Visit the Vitess GitHub page to see the source, browse open issues, and see what features are under active development.
Latest release
Take a look at the Vitess v21 release blog to see what new features have been introduced.
Follow on X
Follow Vitess on X to stay up to date with the latest announcements from the Vitess community.
How does Vitess work?
Vitess is one of the best ways to scale a MySQL database cluster. Vitess utilizes vanilla MySQL, but adds proxy, query routing, monitoring, and control plane components to make scaling MySQL feasible. However, Vitess itself is a complex piece of software, with many separate components that work together to keep things operating smoothly. Let's take a look at what each component contributes to the bigger picture.
When an application needs to connect to a Vitess cluster, it does not make connections directly to MySQL. Instead, connections are made to a VTGate. The VTGate layer acts as the entry point to the cluster, proxies connections, and handles routing of incoming queries to the appropriate MySQL instances. A Vitess cluster will typically have at least three VTGates, each in a different availability zone. Large clusters may have hundreds or thousands of gates. The VTGates will then forward queries on to the appropriate keyspace or shard.
A Vitess cluster can be unsharded or sharded. In an unsharded cluster, all tables for a given logical database (keyspace) live on the same server. Each keyspace typically has at least two replica servers, used for high-availability and for handling some of the read traffic. A sharded cluster is one where the tables of the database (keyspace) are spread across many servers. Each shard will have a primary and replicas.
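To make the idea of spreading a keyspace across shards concrete, here is a minimal sketch of range-based shard selection. It is an illustration under stated assumptions, not how Vitess itself is implemented: the `SHARDS` layout, the `keyspace_id` helper, and the use of SHA-256 are all hypothetical stand-ins, and Vitess's own hashing functions differ.

```python
import hashlib

# Hypothetical four-shard layout. Each shard owns a half-open range of the
# first byte of the hashed key. Names and ranges are illustrative only.
SHARDS = [
    ("-40", 0x00, 0x40),
    ("40-80", 0x40, 0x80),
    ("80-c0", 0x80, 0xC0),
    ("c0-", 0xC0, 0x100),
]


def keyspace_id(sharding_key: int) -> bytes:
    """Hash a sharding key to 8 bytes (SHA-256 is a stand-in, not Vitess's hash)."""
    return hashlib.sha256(str(sharding_key).encode()).digest()[:8]


def shard_for(sharding_key: int) -> str:
    """Return the name of the shard whose range covers the hashed key."""
    first_byte = keyspace_id(sharding_key)[0]
    for name, low, high in SHARDS:
        if low <= first_byte < high:
            return name
    raise ValueError("no shard covers this keyspace ID")


# The same key always lands on the same shard, so a router can send a
# single-row query to exactly one primary or replica.
for user_id in (1, 2, 3):
    print(user_id, "->", shard_for(user_id))
```

Because the ranges cover the whole byte space without overlapping, every key maps to exactly one shard, which is what lets the application treat the sharded keyspace as one logical database.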
Another critical component of Vitess is the VTTablet. When a VTGate needs to forward a query to MySQL, it sends it to a VTTablet. VTTablet mediates all communication between VTGates and MySQL. This is needed so that Vitess can pool connections to the MySQL instances, allowing connections to have less of a memory impact. It also monitors the health and resource usage of the underlying MySQL instances.
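The pooling idea can be sketched in a few lines. This is a simplified illustration, not VTTablet's code: `FakeBackendConnection` and `ConnectionPool` are hypothetical names, and the "connection" is an in-memory stand-in rather than a real MySQL session.

```python
import queue
from contextlib import contextmanager


class FakeBackendConnection:
    """Stand-in for an expensive MySQL connection (hypothetical)."""

    def __init__(self, conn_id: int):
        self.conn_id = conn_id

    def execute(self, sql: str) -> str:
        return f"conn {self.conn_id} ran: {sql}"


class ConnectionPool:
    """Hands out a fixed number of backend connections to many callers."""

    def __init__(self, size: int):
        self._idle = queue.Queue()
        for conn_id in range(size):
            self._idle.put(FakeBackendConnection(conn_id))

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection so the next caller can reuse it.
            self._idle.put(conn)


# Five client requests are served by only two backend connections.
pool = ConnectionPool(size=2)
results = []
for client in range(5):
    with pool.connection() as conn:
        results.append(conn.execute(f"SELECT {client}"))
```

The backend only ever sees two connections, no matter how many clients send queries, which is the property that keeps per-connection memory use on MySQL bounded.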
Replication from the primary to the replicas is handled by MySQL's built-in replication engine. If Vitess detects that the primary server has gone down or is having connectivity issues, it can automatically and quickly fail over, reassigning the primary role to one of the replicas. This, combined with Vitess' ability to buffer queries at the VTGate layer, allows for failover with minimal impact on connected applications.
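The following toy model shows why buffering softens a failover. It is a sketch under assumptions, not VTGate's actual logic: `BufferingGate`, its methods, and the tablet names are hypothetical, and a real implementation would also need limits on buffer size and wait time.

```python
from collections import deque


class BufferingGate:
    """Toy proxy that holds queries while no primary is available."""

    def __init__(self, primary: str):
        self.primary = primary   # None while a failover is in progress
        self.buffer = deque()    # queries waiting for a new primary
        self.log = []            # (primary, sql) pairs actually sent

    def _send(self, sql: str):
        entry = (self.primary, sql)
        self.log.append(entry)
        return entry

    def execute(self, sql: str):
        if self.primary is None:
            # Hold the query instead of returning an error to the client.
            self.buffer.append(sql)
            return None
        return self._send(sql)

    def begin_failover(self):
        self.primary = None

    def promote(self, new_primary: str):
        # Point at the new primary, then replay held queries in order.
        self.primary = new_primary
        while self.buffer:
            self._send(self.buffer.popleft())


gate = BufferingGate("tablet-100")
gate.execute("INSERT 1")
gate.begin_failover()
gate.execute("INSERT 2")   # buffered, not failed
gate.execute("INSERT 3")   # buffered, not failed
gate.promote("tablet-101")
```

From the application's point of view, the two queries issued during the failover are delayed rather than rejected, and they reach the newly promoted primary in their original order.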
Vitess also has a sophisticated control plane used for making changes to the cluster and monitoring its health.
Every Vitess cluster must have a running topology server used to store metadata about the cluster's configuration. Vitess recommends using fellow CNCF project etcd for this component.
vtorc is used to automatically detect faults and repair components that are not running correctly. This component is critical for a highly available cluster. vtctld is a server that is responsible for handling cluster changes and workflows. This, paired with the client vtctl, gives you powerful command-line control over your cluster. VTAdmin is a web-based application that can be used to monitor your cluster.