Documentation

Sharding schemes

This document explains the basic concept of sharding schemes as used in the PlanetScale CNDb. PlanetScale is built on Vitess, where sharding schemes are called VSchemas; these two terms are equivalent. To learn more about VSchemas, see the open source Vitess documentation for VSchemas.

What is a sharding scheme?

In order to scale your database, PlanetScale can distribute your database tables into shards. If you want a sharded database, you need to configure a sharding scheme. However, if you want replicas of your database tables, you do not need a sharding scheme. Your database can also use both sharding and replication. An unsharded database does not require a sharding scheme. Your application does not need to be aware of the sharding scheme.

PlanetScale uses keyspaces to divide data into shards: each shard is assigned a range within the keyspace. Vitess uses Vindexes to map column values onto keyspaces. The sharding scheme relates tables, shards, keyspaces, and Vindexes. Your PlanetScale database uses all of this information to treat the different shards as one database.

What is the format for a sharding scheme?

Sharding schemes use JSON format. Each sharding scheme contains at least one key-value pair indicating whether or not the database is sharded; it also contains at least one JSON object, called tables, which itself contains one JSON object for each sharded table. Each of these table objects can contain an object called column_vindexes, which contains the name of any column(s) in the table that map to a Vindex, along with the name of the Vindex. Finally, it contains a JSON object called vindexes, which contains one JSON object for each Vindex on the database; each of these Vindex objects contains a key-value pair called type, which specifies the Vindex type.

Example sharding scheme

Below is an example sharding scheme. This sharding scheme indicates that the database is sharded, that it has one sharded table called user, whose user_id column maps to the Vindex hash, and that this Vindex is of type hash.

    {
        "sharded": true,
        "vindexes": {
          "hash": {
            "type": "hash"
          }
        },
        "tables": {
          "user": {
            "column_vindexes": [
              {
                "column": "user_id",
                "name": "hash"
              }
            ]
          }
        }
      }

What does the sharding scheme do?

The sharding scheme contains, among other things, the Vindex for a sharded table. The Vindex tells Vitess where to find the shard that contains a particular row for a sharded table. Every sharding scheme must have at least one Vindex, called the Primary Vindex, defined. The Primary Vindex is unique: given an input value, it produces a single keyspace ID, or value in the keyspace used to shard the table. The Primary Vindex is typically a functional Vindex: Vitess computes the keyspace ID as needed from a column in the sharded table.

Example

In the example sharding scheme above, the user table object contains an object called column_vindexes, which itself contains two key-value pairs: column and name:

    "column_vindexes": [
        {
        "column": "user_id",
        "name": "hash"
        }
    ]

The column_vindexes object specifies that the user_id column maps to a Vindex called hash. This means that, at query execution time, your database will use the WHERE clause of the SQL query to identify the range of values of user_id it needs to return; then, it will use a hash function to compute the keyspace IDs for the desired rows.