Sharding for unlimited scale
Sharding is a proven database architecture, allowing you to spread out your data across many servers. This facilitates increased failure isolation, faster backups, cluster flexibility, and unlimited scale.
Hundreds of large organizations around the world use sharding to manage data sets ranging from terabytes to petabytes. Shopify, Figma, Uber, Slack, and Cash App are just a few of the organizations that rely on a sharded architecture for their data stores.
Scaling issues? Get in touch for a demo of our sharding solution.
What are the benefits of sharding?
- Handle tables with trillions of rows
- A single table can span multiple servers
- Reshard as data continues to grow
Read about dealing with large tables
- Highly-parallelized backups
- Back up massive databases in hours, not days
- Use backups to quickly replace disrupted shard servers
Read about faster backups with sharding
- Distribute data across 10s, 100s, or 1000s of servers
- Handle millions of queries per second
- Presented to the application as a single, logical database
Read how PlanetScale handles 1M QPS with sharding
- Avoid paying premiums for expensive IOPS classes
- IOPS and throughput demand spread across many servers
- Cost-effective I/O scaling
Read about increasing IOPS and throughput with sharding
Who is sharding for?
- You have reached or are nearing the largest available instance size. For example, if you are on an m7a.24xlarge or m7a.32xlarge, it is likely time to shift to sharding.
- Your monthly cloud bill includes significant costs for extra IOPS and throughput. With sharding, your data and queries get spread out across many servers, reducing I/O bottlenecks.
- You are facing operational challenges such as long backups, frequent server outages, or lock contention. Sharding with PlanetScale provides solutions to many of these all-too-common scaling difficulties.
- You are planning for future growth and want to start with a platform that will provide scaling capabilities with minimal friction.
What is sharding?
PlanetScale builds its sharding solution on top of MySQL and Vitess. MySQL is the world's most popular open-source relational database. Vitess is an open-source software layer that sits between a fleet of MySQL instances and your application servers. It manages query routing, connection pooling, automatic failover, backups, and sharding.
Sharding comes in two types: vertical and horizontal.
With vertical sharding, each individual table resides on a single server, but collectively the tables are spread out across many servers.
Perhaps you have a database with several hundred tables. Most of them are small, but two are large - 1TB each. Instead of housing everything on one server, we can use vertical sharding. The two large tables will get placed on their own dedicated servers, and the rest can remain on a shared server. This will still appear as a single, unified database from the application's perspective, since it connects to the database through a proxy layer.
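The vertical sharding example above can be pictured as a small lookup from table name to server. This is only an illustrative sketch: the table names, server names, and the `server_for_table` function are hypothetical, and it does not reflect how Vitess stores its routing metadata.

```python
# Hypothetical layout: two large tables on dedicated servers,
# every other table on one shared server.
DEDICATED_SERVERS = {
    "orders": "server-orders",
    "events": "server-events",
}
SHARED_SERVER = "server-shared"


def server_for_table(table: str) -> str:
    """Return the server that holds the whole table (vertical sharding)."""
    return DEDICATED_SERVERS.get(table, SHARED_SERVER)
```

The application never consults such a map itself. The proxy layer performs the lookup, which is why the database still looks like a single unit to the application.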
Horizontal sharding is another way to spread tables out across servers. With horizontal sharding, we take the rows of an individual (large) table and spread them out across many servers. The proxy layer maintains metadata to keep track of which rows of a table live on which server, thus allowing it to effectively fulfill queries on this data.
Perhaps we have a database with many small tables, and one huge table that has 1 trillion rows and is 2TB. The large table could get spread across four shards, each managing 500GB of the data. The proxy layer will manage routing queries to the appropriate shard.
The sharding strategy is the technique used to distribute the data. A common strategy is to hash an ID column that serves as the shard key. With this technique, we choose one of the ID columns from the table. Each time a new row arrives, Vitess generates a hash of this ID. Each shard server is responsible for storing the rows for a range of hashes, so each new row is sent to the server whose range contains its hash.
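The routing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Vitess's implementation: the shard names are hypothetical, the hash space is assumed to be 64 bits split into four equal ranges, and MD5 is used only as a stand-in hash function.

```python
import hashlib
from bisect import bisect_right

# Hypothetical shard names; each owns an equal slice of a 64-bit hash space.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]
HASH_SPACE = 2 ** 64
# Lower bound of each shard's hash range, in ascending order.
RANGE_STARTS = [i * HASH_SPACE // len(SHARDS) for i in range(len(SHARDS))]


def hash_id(row_id: int) -> int:
    """Map a non-negative integer ID to a 64-bit hash (MD5 is a stand-in)."""
    digest = hashlib.md5(row_id.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(row_id: int) -> str:
    """Return the shard whose hash range contains this row's hash."""
    index = bisect_right(RANGE_STARTS, hash_id(row_id)) - 1
    return SHARDS[index]
```

Because the hash depends only on the ID, the same row always routes to the same shard, and sequential IDs tend to scatter across shards rather than piling onto one.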
A good choice of shard key can lead to excellent performance, but a poor choice can be detrimental to your database. PlanetScale enterprise support provides guidance on how to shard your database effectively.
Vitess or PlanetScale?
Vitess and MySQL are both widely used open-source projects. In light of this, why would one choose to use PlanetScale over self-managed Vitess?
Vitess is a complex piece of software with many components. PlanetScale makes spinning up, managing, and modifying clusters a breeze.
Enterprise support engineers guide you through the process of configuring and sharding your database. We carry the pager for your database.
Branching, deploy requests, insights, safe migrations, and more. Built on top of Vitess and MySQL for higher development velocity.
Easily deploy to AWS and GCP regions. Advanced edge routing and global replicas to better serve a distributed audience.
The history of Vitess and PlanetScale
Vitess originated from an engineering team at YouTube, who needed to scale their massive fleet of MySQL databases to support millions of simultaneous users. YouTube was an early adopter of the sharded database architecture.
Several years later, Vitess was donated to the CNCF, joining other battle-hardened tools such as Kubernetes, Prometheus, and Argo.
PlanetScale was later founded by the creators of Vitess. Today, we employ the majority of the maintainers of the Vitess project. We actively develop new features to enhance reliability and meet the needs of our large customers. Let us help you scale your database with our team of Vitess experts.
Get started with sharding on PlanetScale
PlanetScale gives you a shard-native platform to provide the ultimate solution for scalability and availability. Get in touch now for a quote.