This is the first of several lessons on Vitess. Vitess allows you to bring your MySQL database's capabilities to the next level, providing built-in functionality for vertical and horizontal sharding, automatic failover, replication management, and more. Over the past decade, Vitess has been used to successfully scale up database infrastructure at hundreds of large organizations, including GitHub, Slack, Pinterest, Etsy, Block, and many more. In this lesson, we'll go through the typical process of scaling a MySQL database, including how Vitess plays a role in this process. Throughout the course, you'll see how to get up and running with Vitess, how to use it to do vertical and horizontal sharding, and finally how all PlanetScale databases are powered by Vitess under the hood. Time to dive in!
Typically, the database backing a software project is chosen early on in the development process. Once this decision is made, it can be difficult to change at a later time due to compatibility and institutional knowledge, so making the right choice early on is key. MySQL is one of the world's most popular SQL databases, and has a long history of being reliable, performant, and, when used with Vitess, extremely scalable.
For an early-stage software project, a single MySQL server may be sufficient to handle all of the traffic and storage. Ideally, you'd have backups taken at regular intervals and stored securely.
As demand on a database grows, one option is to scale vertically. This means increasing the capabilities of the MySQL server: adding CPU capacity, adding RAM, and increasing disk size. This works well in many cases, and is also relatively easy to do given modern cloud computing infrastructure provided by organizations like Amazon, Google, DigitalOcean, and others. Unless they have specific reasons to manage their own hardware, many organizations, both large and small, choose to use the infrastructure offered by these providers. This greatly simplifies their infrastructure and allows them to focus more on developing great products.
Eventually, even this technique of scaling will reach limitations or cause bottlenecks. A good next step to take is introducing read-only replicas set up in a primary-replica configuration. This not only increases the capacity of your database (particularly read capacity) but can also improve availability when used correctly. Database clusters with a primary and multiple replicas can be configured to automatically replace the primary with a replica if the primary goes down or has network latency issues. Getting this properly configured can be a painstaking process. However, Vitess provides features to configure a MySQL cluster to handle these types of situations automatically. Therefore, Vitess is a great choice even if your scale does not require data sharding.
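To make the primary-replica idea concrete, here is a minimal sketch in Python of the routing and failover decisions involved. It is a toy model, not Vitess's implementation: the `ReplicatedCluster` class, its method names, and the server names are all hypothetical, and "promotion" here only swaps a name rather than reconfiguring real replication.

```python
class ReplicatedCluster:
    """Toy model of one primary plus read-only replicas (hypothetical names)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # round-robin counter for spreading reads

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        # Assumption: anything starting with SELECT is a read; all else is a write.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self.replicas:
            server = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            return server
        # Writes (and reads when no replica exists) go to the primary.
        return self.primary

    def fail_over(self):
        """Promote the first replica when the primary is considered down."""
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        self.primary = self.replicas.pop(0)
        return self.primary
```

Even in this simplified form, the application has to classify statements, track which servers exist, and decide when to promote a replica. A real setup also has to deal with replication lag and with detecting that the primary is actually down, which is where much of the painstaking configuration comes from.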
Another option that can be introduced is a query cache. Adding one or several dedicated caching layers to this architecture can provide nice speedups for some of your queries, though this comes at the cost of additional complexity, and having to concern yourself with TTL configuration, synchronization, and the like.
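As a rough illustration of what a caching layer involves, the sketch below shows a tiny in-memory cache with a time-to-live (TTL). The `TTLCache` class and its `loader` callback are hypothetical names chosen for this example; a production setup would more likely use a dedicated cache service, but the same concerns apply.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader(key) on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the database entirely
        value = loader(key)  # e.g. a function that runs the real query
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, for example right after the underlying row is updated."""
        self._entries.pop(key, None)
```

The `ttl_seconds` value and the `invalidate` calls are exactly the TTL configuration and synchronization work mentioned above: pick a TTL that is too long and readers see stale data, forget an invalidation and the cache disagrees with the database.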
For some organizations, all of the aforementioned scaling techniques can be applied, and even still, more scaling is required. For example, when the size of a database starts to exceed a few terabytes, it can be difficult for a single primary server to handle all write load with desired levels of efficiency.
In these scenarios, the next step to scaling is to shard your database. Relational databases can be sharded in two ways:
- Vertical sharding: Splitting up your database by table, spreading the tables out across multiple separate MySQL instances.
- Horizontal sharding: Splitting up your data on a per-row basis, and using a function to determine which MySQL instance each row should be stored on.
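The two approaches can be sketched in a few lines of Python. This is only an illustration of the idea, under stated assumptions: the table names, instance names, and the hash-then-modulo function are all hypothetical, and this is not a description of how Vitess itself assigns rows to shards.

```python
import hashlib

# Vertical sharding: each table lives on exactly one instance (hypothetical names).
TABLE_TO_INSTANCE = {
    "users": "mysql-a",
    "orders": "mysql-b",
}

# Horizontal sharding: rows of one table are spread over several instances.
HORIZONTAL_SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def instance_for_table(table):
    """Vertical sharding: look up which instance holds the whole table."""
    return TABLE_TO_INSTANCE[table]


def instance_for_row(sharding_key, shards=HORIZONTAL_SHARDS):
    """Horizontal sharding: hash a row's sharding key to pick an instance."""
    # A stable hash is used so the same key always maps to the same shard,
    # even across processes (Python's built-in hash() is randomized for strings).
    digest = hashlib.sha256(str(sharding_key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]
```

For example, `instance_for_table("orders")` returns the single instance that holds every order, while `instance_for_row(user_id)` picks one of the four shards based on the row's key. One design caveat of the modulo approach: changing the number of shards moves most keys to a different instance, so resharding requires migrating data.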
This is where Vitess truly shines, as it has built-in support for configuring both vertical and horizontal sharding.
Both of these techniques can be accomplished by introducing custom sharding logic into your application code, but this adds excessive complexity to an already complex piece of software. A better solution is to use Vitess. Vitess acts as a middleman between your application code and your instances of MySQL. It provides mechanisms for you to manage and shard the data in your database, all while maintaining high availability and excellent performance.
You can learn more about Vitess on their website, and by continuing with the follow-up lessons in this series.