Sharding

Sharding is a proven database architecture that spreads your data across many servers. This improves failure isolation, speeds up backups, adds cluster flexibility, and lets you scale by adding servers instead of being capped by a single machine.

Sharding is used by hundreds of large organizations around the world to manage data sets at terabyte to petabyte scale. Shopify, Figma, Uber, Slack, and Cash App are just a few of the organizations that rely on a sharded architecture for their data stores.

What are the benefits of sharding?

Massive tables

  • Handle tables with trillions of rows
  • A single table can span multiple servers
  • Reshard as data continues to grow

Read about dealing with large tables

Fast backups

  • Highly-parallelized backups
  • Back up massive databases in hours, not days
  • Use backups to quickly replace disrupted shard servers

Read about faster backups with sharding

Unlimited scale

  • Avoid paying premiums for expensive IOPS classes
  • IOPS and throughput demand spread across many servers
  • Cost-effective I/O scaling

Read about increasing IOPS and throughput with sharding

Online schema migrations

  • Integrates with pt-online-schema-change and gh-ost
  • Start, monitor, and cancel migrations with confidence
  • Command-line interface for migration interactions

Learn about online schema migrations

Who is sharding for?

Sharding is for organizations with large databases who need scale and flexibility. If you are currently using a single-primary database such as Amazon RDS, you should consider sharding in the following scenarios:

Several customers have successfully moved their databases to a sharded architecture on PlanetScale.

What is sharding?

PlanetScale builds its sharding solution on top of MySQL and Vitess. MySQL is the world's most popular open-source relational database. Vitess is an open-source software layer that sits between a fleet of MySQL instances and your application servers. It manages query routing, connection pooling, automatic failover, backups, and sharding.

Sharding comes in two types: vertical and horizontal.

With vertical sharding, each individual table resides on a single server, but collectively the tables are spread out across many servers.

Suppose you have a database with several hundred tables. Most of them are small, but two are large, at 1TB each. Instead of housing everything on one server, you can use vertical sharding: the two large tables are placed on their own dedicated servers, and the rest remain on a shared server. The application still sees a single, unified database, since it connects to the database through a proxy layer.

Vertical sharding
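To make the vertical sharding example concrete, here is a minimal Python sketch of the lookup a proxy layer might perform. The table names (`orders`, `events`) and server names are hypothetical and chosen only for illustration. This is not how Vitess represents its routing metadata.

```python
# Hypothetical table-to-server map for a vertically sharded database.
# Every name below is invented for illustration.
LARGE_TABLE_SERVERS = {
    "orders": "server-orders",  # first 1TB table, on its own server
    "events": "server-events",  # second 1TB table, on its own server
}
SHARED_SERVER = "server-shared"  # all of the small tables live here


def server_for_table(table: str) -> str:
    """Return the single server that holds the whole table."""
    return LARGE_TABLE_SERVERS.get(table, SHARED_SERVER)


if __name__ == "__main__":
    for table in ("orders", "events", "users", "settings"):
        print(f"{table} -> {server_for_table(table)}")
```

The application only names the table. The choice of server is hidden behind the lookup, which is why the database still looks unified from the outside.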

Horizontal sharding is another way to spread tables out across servers. With horizontal sharding, we take the rows of an individual (large) table and spread them out across many servers. The proxy layer maintains metadata to keep track of which rows of a table live on which server, thus allowing it to effectively fulfill queries on this data.

Perhaps we have a database with many small tables, and one huge table that has 1 trillion rows and is 2TB. The large table could be spread across four shards, each managing 500GB of the data. The proxy layer routes each query to the appropriate shard.

Horizontal sharding
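The sizing in the horizontal sharding example is simple division, sketched below in Python. It assumes rows are spread evenly across shards and that 1TB = 1000GB. The function name is hypothetical.

```python
def per_shard_size_gb(table_size_gb: float, shard_count: int) -> float:
    """Approximate data held by each shard, assuming an even spread of rows."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return table_size_gb / shard_count


if __name__ == "__main__":
    table_size_gb = 2000  # the 2TB table from the example, with 1TB = 1000GB
    for shard_count in (1, 2, 4, 8):
        size = per_shard_size_gb(table_size_gb, shard_count)
        print(f"{shard_count} shard(s): about {size:.0f}GB each")
```

Doubling the shard count halves the data each server holds, which is the effect resharding relies on as a table keeps growing.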

The sharding strategy is the technique used to distribute the data. A common sharding strategy is to hash an ID column and use that column as the shard key. With this technique, we choose one of the ID columns from the table. Each time we receive a new row, Vitess generates a hash of this ID. Each shard server is responsible for storing the rows for a range of hashes, and each new row is sent to the server whose range contains its hash.
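The hash-range idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Vitess's implementation. The hash here is the first 8 bytes of SHA-256, used as a stand-in for whatever hash function Vitess applies. The four shard names and the equal-width ranges are also hypothetical.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2**64  # hashes are treated as unsigned 64-bit integers
SHARD_NAMES = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical

# Lower bound of the hash range owned by each shard (equal-width ranges).
RANGE_STARTS = [i * HASH_SPACE // len(SHARD_NAMES) for i in range(len(SHARD_NAMES))]


def hash_id(row_id: int) -> int:
    """Stand-in hash: first 8 bytes of SHA-256 of the ID. Not Vitess's hash."""
    digest = hashlib.sha256(str(row_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_id(row_id: int) -> str:
    """Return the shard whose hash range contains the hash of row_id."""
    index = bisect_right(RANGE_STARTS, hash_id(row_id)) - 1
    return SHARD_NAMES[index]


if __name__ == "__main__":
    for row_id in (1, 2, 3, 1001):
        print(f"id={row_id} hash={hash_id(row_id):#018x} -> {shard_for_id(row_id)}")
```

Because the hash of a given ID never changes, the same lookup that places a new row also finds it again at query time.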

A good choice of shard key can lead to excellent performance, but a poor choice can be detrimental to your database. PlanetScale enterprise support provides guidance on how to shard your database effectively.
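One way a shard key can go wrong is low cardinality: if the key takes only a few distinct values, the rows can land on only a few shards. The Python sketch below compares a unique `id` column with a three-value `region` column on a made-up table of 10,000 rows. The table, the column names, and the SHA-256-based hash are all assumptions for illustration, not Vitess behavior.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 4


def shard_index(key: str) -> int:
    """Map a shard key value to a shard by splitting a 64-bit hash space evenly."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") * SHARD_COUNT // 2**64


def rows_per_shard(keys) -> list:
    """Count how many rows land on each shard for the given key values."""
    counts = Counter(shard_index(key) for key in keys)
    return [counts.get(i, 0) for i in range(SHARD_COUNT)]


# Hypothetical table: a unique id plus a low-cardinality region column.
REGIONS = ["us", "eu", "apac"]
rows = [(row_id, REGIONS[row_id % len(REGIONS)]) for row_id in range(10_000)]

by_id = rows_per_shard(str(row_id) for row_id, _ in rows)
by_region = rows_per_shard(region for _, region in rows)

if __name__ == "__main__":
    print("rows per shard, keyed by id:    ", by_id)
    print("rows per shard, keyed by region:", by_region)
```

With only three distinct regions and four shards, at least one shard must stay empty while the others carry the whole table. Hashing the unique ID spreads rows roughly evenly.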

Sharding strategy

The history of Vitess and PlanetScale

Vitess originated from an engineering team at YouTube, who needed to scale their massive fleet of MySQL databases to support millions of simultaneous users. YouTube was an early adopter of the sharded database architecture.

Several years later, Vitess was donated to the Cloud Native Computing Foundation (CNCF), joining other battle-hardened tools like Kubernetes, Prometheus, and Argo.

PlanetScale was later founded by the creators of Vitess. Today, we employ the majority of the maintainers of the Vitess project. We actively develop new features to enhance reliability and meet the needs of our large customers. Let us help you scale your database with our team of Vitess experts.