PlanetScale's database branching uses a declarative schema approach, but we take it even further and treat all the changes in your branch as a single deployment. As much as possible, PlanetScale deploys your entire set of changes near-atomically, which means the production database schema remains stable throughout the deployment process and changes almost all at once when all changes are ready. In this post, we will discuss the benefits of atomic multi-change deployments and work through the technical challenges of making them possible.
Atomicity ensures an all-or-nothing change. A data transaction is an obvious example: you can change data in two tables, say insert to one and update the other, and enforce that either both changes happen or none do. You run both in a transaction and finalize the change with a
COMMIT. The database keeps a transaction counter along with change journaling. If there's a crash halfway through the transaction, its recovery process can reliably identify the transaction's incomplete and undo the partial changes. But what's trivial for data changes is not so trivial for schema changes.
PlanetScale uses MySQL under the hood through Vitess. With MySQL, it is not possible to transactionally and atomically make changes to multiple table schema definitions. If you want to
CREATE one table,
ALTER another, and
DROP a third, you must run these changes in some order. For this reason, we use the term "near-atomically." We also use this with PlanetScale's gated deployments, and we often use the term "gated" to indicate that all changes complete together.
We like to think of schema changes as deployments, similar to code deployments. Existing relational database systems have trained us to think that schema changes are necessarily dangerous, disruptive, irrevertible, and sequential.
Consider that a change to our code requires changing three different large tables:
- Adding and modifying a column on one.
- Adding a column and an index on another.
- Adding a new check constraint on the third.
The database system considers these three unrelated changes, but we know they are semantically related.
Schema changes on large tables may take a long time, sometimes hours or more, to complete. Let's assume each of our changes runs for 8 hours. Historically, we are accustomed to accepting that we must run one
ALTER TABLE after the other. Once we start our deployment, we expect
24 hours until it's complete. But during that time, the database is in a semantically inconsistent state. The deployment is partially done and partially queued up.
In an ideal world, we can wait out these
24 hours and call it a day (no pun intended). But in reality, we might find that our design was flawed, or perhaps there's an incident that takes priority, and we want to cancel the deployment. Has it been
10 hours? One of the changes will have been applied, the others are still pending. With traditional databases, you can't just cancel that completed schema change.
We're also used to the notion that a schema change is dangerous. Have we dropped the wrong column? Did you make a bad assumption about the data type? Or did you miss a constraint? The wrong schema change is notoriously known to have been the source of many production system outages. And at
8 hours each, it looks like our three-table deployment will be risky around all time zones.
Gated deployments offer a change of concept, where all your changes are staged for however long it takes for all of them to be ready. In our example above, we can assume at around
24 hours for all changes to become ready. At that point, we complete the deployment, applying all the changes in production all at once. And because it is impossible to ensure atomicity, the changes are applied a few seconds apart.
With this approach, there is only one "major event" to this deployment. Since gated deployments allow you to pick your preferred time to complete the changes, you can control the time of the "event." And, if there is a change of heart during the staging period or an incident that takes over priorities, the deployment may be canceled without impacting production. The friction point, where a schema may only be partially deployed, is reduced from days or hours to seconds.
Some schema change operations are immediate. For example, when creating a new table or modifying a view definition. Those changes have no data directly associated and are very fast to perform. Some
ALTER TABLE changes are also eligible for fast execution. And yet, many are not. If we wanted to deploy a
ALTER VIEW, and
ALTER TABLE, all changes as part of one branch and one deployment, we need to somehow be able to time them such that they all complete together. If we have multiple
ALTER TABLE changes on large tables, we need to find a way to not only time those to complete together but also somehow be able to run them all concurrently without putting too much load on the production database.
And on top of it, some of the changes in the migration might actually have dependencies. The user is given a free hand to change their branch, and when the time comes to deploy the branch's changes to production, we may find that one change assumes another already applies.
So, the task is to be able to run some changes concurrently, time long-running changes with immediate changes, and resolve any conflicts that imply ordering changes - all while running concurrently.
As described in How Online Schema Change tools work, PlanetScale uses an elaborate copy-and-swap algorithm to emulate an
ALTER TABLE operation. We create a new table in the likeness of the original table, we modify that table, we bring the new table in sync with the original table by both copying over the existing rows as well as applying the ongoing changes, and we finalize by swapping the two from under the hands of the application.
This emulation mechanism is what allows us some concurrency and control over the cut-over timing. As we complete copying over a table's existing data set, we can continue to watch the ongoing changes to the table, technically indefinitely, or until we decide that it's time to cut over. We impose a brief lock to finalize the last few changes to make the swap. This allows us to run multiple concurrent operations on different tables and keep pushing the final cut-over until we know all operations are ready to complete.
And we don't have to overwhelm the production database during that time. We may alternate between the copying phases - the heavy-lifting part of the emulation - of the different tables and only parallelize the tailing of the ongoing changes.
When we stage a deployment request, we begin by running - but not completing - all long-running changes. When we find, possibly hours later, that all long-running changes are ready to complete, we then introduce the immediate changes - like
ALTER VIEW, and similar statements. We can then apply the final cut-over for all long-running changes and the immediate changes, near-atomically, a few seconds apart.
Alas, what happens when one change depends on another?
Consider these simplified two changes:
ALTER TABLE t ADD COLUMN info VARCHAR(128) NOT NULL DEFAULT '' AFTER id;ALTER VIEW v AS SELECT id, info FROM t;
In production, the column
info does not exist. The view
v in production does not read from this column. To deploy these two changes, we absolutely have to first apply the change to
t, and only then the change to
v. This seems pretty straightforward: complete the migration on
t, and immediately apply the change to
However, how do we go about the reverse?
ALTER TABLE t DROP COLUMN info;ALTER VIEW v AS SELECT id FROM t;
If we first make the change to
v becomes invalid. At first sight, it may appear that we should first apply the change to
v, followed by the change to
t. However, the table
t may be large enough that it takes hours to migrate. If we want to apply the changes together, then the way to go is:
- Begin the change on
- Wait until the change is ready to complete.
- Issue the immediate change on
- Follow by completing (cutting-over) the change on
The scenarios may be more complex when multiple, nested views are involved, which are based on yet multiple tables being changed in the deployment request.
PlanetScale continues to utilize Vitess's schemadiff library, which can determine, where possible, a valid sequence (read: ordering) of changes given two schemas. When
schemadiff reads a schema, it maps and validates any dependency between entities. For example, it can validate that table and columns referenced by some view exist or that there are no cyclic view definitions (
v1 reads from
v2, which reads from
schemadiff compares two schemas and generates the diff statements, it also analyzes the dependencies between those statements. If any two diff statements affect entities with a dependency relationship in the schema(s), then
schemadiff knows it needs to resolve the ordering of those two diffs. If yet another diff affects entities used by either of these two, then
schemadiff needs to resolve the ordering of all three. All the diffs are thus divided into equivalence classes: distinct sets where nothing is shared between any two sets and where the total union of all sets is the total set of diffs.
If you take a sample diff from one equivalence class and then some sample diff from a different equivalence class, you know there's absolutely no dependency between the two. They can be executed in any order. However, any two diffs within the same equivalence class can have a dependency and should be treated as if they do, although in some cases, the two could be executed in any different order. To that effect, for each equivalence class,
schemadiff finds a permutation of the diffs such that if executed in order, the validity of the entire schema is preserved. It's worth reiterating that changes to the underlying database can only be applied sequentially. Thus, we must validate that the schema remains valid throughout the in-order execution.
schemadiff achieves this by running in-memory schema migration and validation at every step.
a. Given a set of diffs,
b. Group them into equivalence classes, where changes to elements that have dependencies are grouped together.
c. Ordering of equivalence classes is arbitrary.
d. Within an equivalence class there must be a valid ordering.
PlanetScale then takes that valid ordering of diffs as the blueprint for a deployment where it runs the migrations concurrently via
Vitess, staging the changes until it determines that all deployments are ready to complete. At this time, it seals the change near-atomically.
When all migrations are complete, PlanetScale then stages tentative reverts for all migrations. The user has a 30-minute window to undo those schema changes without losing data accumulated. If the user does choose to revert (say, some parts of the app appear to require still the old schema or if performance tanks due to wrong indexing), then those reverts are likewise applied near-atomically. Notably, the reverts are finalized in reverse ordering to the original deployment. There is no need for computation here: we rely on the fact that the original deployment was found to have a step-by-step valid ordering. Undoing those changes in reverse order mathematically maintains that validity.
Resources are not infinite, and only so many changes can run concurrently. Altering a hundred tables in one deployment request is not feasible and possibly not the best utilization of database branching. It is possible to go too far with a branch so that the changes are logically impossible to deploy (or rather, so complex that it is not possible to determine a reliably safe path). Like code, schema changes should be made and deployed with measures in place.
Treating a deployment request, a group of schema changes, as a unit that should be deployed all or none is a difficult task that requires complex validation, scheduling, and execution. But the effort pays off: we know that the deployment is cancellable up to the very last moment and without making any impact on production. We only have one potential point in time that requires our attention. We actually control that point in time. We don't need to tell the database how to go about the changes; we only need to tell it what we would like to have.