<?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Blog — PlanetScale</title>
      <subtitle>Posts about the PlanetScale platform, MySQL, PostgreSQL, databases, and more.</subtitle>
      <link href="https://planetscale.com" />
      <link rel="alternate" type="text/html" hreflang="en" href="https://planetscale.com/blog" />
      <link rel="self" type="application/atom+xml" href="https://planetscale.com/blog/feed.atom" />
      <id>https://planetscale.com/blog/feed.atom</id>
      <updated>2026-05-14T00:00:00.000Z</updated>

  
      <entry>
        <title>Egress problems and where to find them</title>
        <link href="https://planetscale.com/blog/database-egress" />
        <id>https://planetscale.com/blog/database-egress</id>
        <published>2026-05-14T00:00:00.000Z</published>
        <updated>2026-05-14T00:00:00.000Z</updated>
        
        <author>
          <name>Simeon Griggs</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Name something in recent history that got better and cheaper (other than the TVs at the entrance of Costco). I'll wait.
Better performance and lower costs rarely come together, but optimizing your queries to reduce egress gives you both.
So once you hit scale, or ideally before scale bites you, making your query responses smaller and less frequent can pull off a rare double: a faster application and a cheaper one.
Definitions
Egress: Data transferred out from your database over the public internet. Most cloud providers bill for this, so it's something we want to minimize.
Ingress: Data transferred into your database over the public internet. Most cloud providers either do not bill for this, or do so only in specific scenarios.
PlanetScale includes 100GB of egress on High Availability (HA) plans. Non-HA $5/month Postgres includes 10GB of egress. Usage is metered beyond those allowances, so it's worth knowing about and minimizing where possible.
This post focuses largely on Postgres, but the general principles apply to all databases across all the major cloud providers.
Common culprits
If your egress numbers are approaching the included quota, or exceeding it by more than you’d like, the cause is usually one of two things, or both: you're fetching too much, or fetching too often.
Consider the case of a content-heavy application. The database is full of documents made of rich text and block content. That content is stored in a JSONB column using the Portable Text specification.
CREATE TABLE posts (
    id          integer                  NOT NULL DEFAULT nextval('posts_id_seq'::regclass),
    title       text                     NOT NULL,
    slug        text                     NOT NULL,
    content     jsonb                    NOT NULL DEFAULT '[]'::jsonb,
    created_at  timestamp with time zone NOT NULL DEFAULT now(),
    updated_at  timestamp with time zone NOT NULL DEFAULT now(),
    CONSTRAINT  posts_pkey PRIMARY KEY (id),
    CONSTRAINT  posts_slug_unique UNIQUE (slug)
);

Too much out
Fetching too much is easily done. A SELECT * query returns every value from every column of every matching row, and returns more data as columns are added. Likewise, an "unbounded query" (one without a LIMIT) returns more data as more matching rows exist, growing linearly with the table.
-- ❌ returns unlimited columns and rows
SELECT * FROM posts;

-- ✅ returns limited columns and rows
SELECT id, title FROM posts LIMIT 10;

Selecting specific columns has the added benefit of making your code more declarative about the data your application requires. While PlanetScale measures the data transfer size of your queries, it can't make assumptions about how much of that query response was used. The more specific your queries are, the simpler the debugging process becomes.
For a JSONB column, you may also consider using Postgres' built-in syntax to extract specific values from the data if not all values are required.
For example, perhaps you want to build a table of contents from level 2 and 3 headings from our Portable Text column. An unspecific query would just return the entire content column.
SELECT content FROM posts WHERE id = 1

Instead, we can expand the array with jsonb_array_elements(), filter it down to just the headings we're looking for, and re-aggregate the matches with jsonb_agg().
SELECT jsonb_agg(block) AS headings
 FROM
   posts,
   jsonb_array_elements(content) AS block
 WHERE
   id = 1
   AND block->>'_type' = 'block'
   AND block->>'style' IN ('h2', 'h3');

Including JSON filtering will introduce some CPU overhead, so it's a tradeoff. Monitor resource usage and see if the reduced egress is worth it.
Fetch only the rows, columns, and data from those columns that your application requires.
Pagination also bounds how much data leaves your database per request. Without it, a growing dataset means ever-larger responses. Two common approaches:
Offset/limit skips a number of rows and returns a fixed page size. Simple to implement, but the database still scans all skipped rows, so deeper pages cost more.
SELECT id, title FROM posts ORDER BY id LIMIT 10 OFFSET 0;  -- page 1
SELECT id, title FROM posts ORDER BY id LIMIT 10 OFFSET 10; -- page 2

Cursor pagination uses the last value from the previous page as the starting point. It performs consistently regardless of depth.
SELECT id, title FROM posts ORDER BY id LIMIT 10;                -- page 1
SELECT id, title FROM posts WHERE id > 10 ORDER BY id LIMIT 10; -- page 2

For more detail on each approach to pagination, see Offset limit pagination and Cursor pagination in the MySQL for Developers course.
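As a rough sketch of how an application might build the cursor query above, here is a small TypeScript helper. The name buildPostsPageQuery and the { text, values } return shape are assumptions for illustration and are not tied to any particular Postgres client.

```typescript
// Hypothetical helper: builds a parameterized cursor (keyset) query for the posts table.
interface PageQuery {
  text: string;
  values: number[];
}

function buildPostsPageQuery(lastSeenId: number | null, pageSize: number): PageQuery {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be a positive integer");
  }
  if (lastSeenId === null) {
    // First page: there is no cursor yet, so only the LIMIT is parameterized.
    return {
      text: "SELECT id, title FROM posts ORDER BY id LIMIT $1",
      values: [pageSize],
    };
  }
  // Later pages: start strictly after the last id the client has already seen.
  return {
    text: "SELECT id, title FROM posts WHERE id > $1 ORDER BY id LIMIT $2",
    values: [lastSeenId, pageSize],
  };
}

const firstPage = buildPostsPageQuery(null, 10);
const secondPage = buildPostsPageQuery(10, 10);
console.log(firstPage.text, firstPage.values);
console.log(secondPage.text, secondPage.values);
```

Passing the last seen id as a parameter, rather than tracking a page number, is what keeps every page equally cheap: the database can jump to the cursor through the primary key index instead of scanning skipped rows.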
Too much in
While most cloud providers do not typically charge for ingress, there are instances where your ingress operations quietly result in egress.
ORMs can do this by default when returning data from an insert operation. Here is an example insert operation using Drizzle.
const post = await db
  .insert(posts)
  .values({ title: 'Hello world', slug: 'hello-world' })
  // ❌ returns everything with no parameters
  .returning()

The function call above would result in an SQL query like this.
INSERT INTO
	posts (title, slug)
VALUES
	('Hello world', 'hello-world')
RETURNING
    *;

In this particular instance, we're only writing the title and slug, so the response is relatively small in terms of bytes transferred. It's worth noting, however, that more columns were returned than were written.
+----+-------------+-------------+---------+-------------------------------+-------------------------------+
| id | title       | slug        | content | created_at                    | updated_at                    |
|----+-------------+-------------+---------+-------------------------------+-------------------------------|
| 5  | Hello world | hello-world | []      | 2026-05-11 16:04:03.049885+01 | 2026-05-11 16:04:03.049885+01 |
+----+-------------+-------------+---------+-------------------------------+-------------------------------+

The content column is small for now, but if we were writing an UPDATE to an existing and very large document, it would be returned with every operation.
Now imagine our content editor upserts changes to an edited document every second. This could be a massive payload of our Portable Text JSON, with every insert operation returning the full body of the inserted item, essentially doubling the operation's egress.
const post = await db
  .insert(posts)
  .values({ title: 'Hello world', slug: 'hello-world' })
  // ✅ returns only the id column
  .returning({ id: posts.id })

Return only what you need, if anything.
Too often
If every user of your application requesting the same data results in a fresh request to your database, you're wasting your egress quota.
In the simplified diagram below, the goal is to keep each user request from triggering fresh egress to generate a response.

Caching and Content Delivery Networks (CDNs) exist largely to improve performance. One way they achieve this is by reducing data transfer: they load a local copy of the data your application needs instead of fetching it fresh from the database.
An application-level cache (like Redis) between your database and application, or a network-level cache (like a CDN) between your application and a user, can help reduce the frequency of requests to your database.
Preventing unnecessary work in your database is increasingly important as your dataset grows and the frequency of requests increases. A single JSONB column of Portable Text, for example, could get into megabytes in size, and you won't want it requested from the database with each page load, should your article hit the front page of Hacker News.
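To make the idea concrete, here is a minimal sketch of an application-level cache with a time-to-live (TTL). It is an in-memory stand-in for something like Redis, not a production cache. The TtlCache class, the loadPost function, and the injectable clock are all hypothetical names, and the loader is synchronous only to keep the example short.

```typescript
// Minimal in-memory TTL cache. The loader stands in for a real database query.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string, loader: () => V): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // served from memory: no database egress
    }
    const value = loader(); // cache miss: one trip to the database
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Usage with a fake clock so the expiry is easy to follow.
let databaseCalls = 0;
let fakeClockMs = 0;
const cache = new TtlCache<string>(60000, () => fakeClockMs);
const loadPost = () => {
  databaseCalls += 1;
  return "large Portable Text document";
};

cache.get("post:1", loadPost);
cache.get("post:1", loadPost); // within the TTL: served from the cache
console.log(databaseCalls); // 1

fakeClockMs = 60001; // the entry has now expired
cache.get("post:1", loadPost);
console.log(databaseCalls); // 2
```

The TTL is the tradeoff to tune: a longer TTL saves more egress but serves staler content.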
Too internet
Egress is typically charged when data travels over the public Internet. PlanetScale supports AWS PrivateLink and GCP Private Service Connect to improve security and reduce egress costs (another win-win combo).
If your application is hosted within the same infrastructure as your database (and it should be), you may be able to use either of these private connections to skip this public internet hop.
PlanetScale charges much lower rates for data transferred over these private connections; however, both ingress and egress are billed. See the documentation for more pricing details and to see if this is an option for you.
Read more: Private connections in the PlanetScale docs
Identifying egress usage
PlanetScale Postgres offers us ways to measure the bytes returned by individual queries, but not to observe egress bytes usage patterns over time. Let's look first at what it takes to measure a query.
With EXPLAIN
If we prepend EXPLAIN to the same unbounded, unspecific query as before, we're shown the query plan for the response.
> EXPLAIN SELECT * FROM posts;

+----------------------------------------------------------+
| QUERY PLAN                                               |
|----------------------------------------------------------|
| Seq Scan on posts  (cost=0.00..15.60 rows=560 width=116) |
+----------------------------------------------------------+

The query plan shows us rows=560, an estimate of the number of rows returned, and width=116, an estimate of the average size of each row in bytes. These estimates are based on averages and won't reflect the size of any particular row, especially for variable-length columns like JSONB.
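Multiplying the two estimates gives a very rough sense of scale. The sketch below parses the plan line from the example and does that arithmetic. The function names are made up for illustration, and the result inherits all the imprecision of the planner's averages.

```typescript
// Pulls the planner's row and width estimates out of a single EXPLAIN line.
function parsePlanEstimate(planLine: string): { rows: number; width: number } | null {
  const match = /rows=(\d+) width=(\d+)/.exec(planLine);
  if (match === null) {
    return null;
  }
  return { rows: Number(match[1]), width: Number(match[2]) };
}

// Estimated rows multiplied by estimated average row width, in bytes.
function estimateResultBytes(planLine: string): number | null {
  const estimate = parsePlanEstimate(planLine);
  return estimate === null ? null : estimate.rows * estimate.width;
}

const plan = "Seq Scan on posts  (cost=0.00..15.60 rows=560 width=116)";
console.log(estimateResultBytes(plan)); // 64960
```

That is roughly 65 kB for the whole table, which is less than two copies of the 37 kB content value measured further down. It shows how far off the estimate can be when a JSONB column is involved.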
The only way to accurately measure the transfer size of a query is to run it. Let's measure the difference between querying the full content column of a post compared to just extracting the headings.
We could use pg_column_size() to measure the size of the content column, but it would return the TOAST-compressed size, not the size of the data being sent over the wire. octet_length() will return a closer approximation of the relative size.
Postgres uses TOAST (The Oversized-Attribute Storage Technique) to compress and store large values, such as our JSONB column, so its on-disk size is dramatically smaller than its measured egress size. TOAST-compressed data is decompressed and serialized before being sent over the wire.
-- Full content column
SELECT pg_size_pretty(octet_length(content::text)::bigint)
FROM posts WHERE id = 1;

+----------------+
| pg_size_pretty |
|----------------|
| 37 kB          |
+----------------+

-- Just the headings
SELECT pg_size_pretty(octet_length(jsonb_agg(block)::text)::bigint)
FROM posts,
  jsonb_array_elements(content) AS block
WHERE id = 1
  AND block->>'_type' = 'block'
  AND block->>'style' IN ('h2', 'h3');

+----------------+
| pg_size_pretty |
|----------------|
| 5127 bytes     |
+----------------+

Less data is smaller, big surprise!
This is useful information for this specific query, but measuring queries individually is tedious. Ideally, we want to monitor the size of every query generated by our application and see usage patterns over their lifetime. Fortunately, Insights does this for us.
With Insights
PlanetScale Insights monitors the queries performed in your database. These statistics can be viewed in the dashboard and are made available to agents via the PlanetScale MCP server.
Often, developers use Insights to measure query latency to improve performance, but it also provides many other statistics, such as bytes returned.

Open Insights and from the query list select the "Data" tab. These tabs contain preset columns relevant to debugging specific scenarios. Here we've sorted by "Bytes returned per query" and can see the largest transfer size of all queries in the currently selected time period.
Consider a query that returns 37 KB per call. Run 100 times, it transfers less than 4 MB and is probably not worth optimizing. Run 100,000 times, it transfers nearly 4 GB. Sort by the queries with the highest total bytes returned to find improvements which may have the most impact.
Look for frequently run, large-byte-transferred queries to identify opportunities for improvement.
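The arithmetic behind that advice is simply bytes per call multiplied by call count. Here is a small sketch that ranks query patterns by total bytes. The QueryStat shape and the sample numbers are invented for illustration and are not an Insights export format.

```typescript
// Hypothetical per-pattern statistics, loosely modeled on the columns described above.
interface QueryStat {
  pattern: string;
  bytesPerCall: number;
  calls: number;
}

interface RankedQueryStat extends QueryStat {
  totalBytes: number;
}

// Sorts patterns by total bytes transferred, largest first.
function rankByTotalBytes(stats: QueryStat[]): RankedQueryStat[] {
  return stats
    .map((s) => ({
      pattern: s.pattern,
      bytesPerCall: s.bytesPerCall,
      calls: s.calls,
      totalBytes: s.bytesPerCall * s.calls,
    }))
    .sort((a, b) => b.totalBytes - a.totalBytes);
}

// Made-up numbers, treating 37 KB as 37,000 bytes for round arithmetic.
const ranked = rankByTotalBytes([
  { pattern: "SELECT content FROM posts WHERE id = $1", bytesPerCall: 37000, calls: 100 },
  { pattern: "SELECT id, title FROM posts LIMIT $1", bytesPerCall: 2000, calls: 5000000 },
]);
for (const row of ranked) {
  console.log(row.pattern, row.totalBytes);
}
```

In this made-up sample the small query tops the list at 10 GB in total, while the 37 KB query adds up to only 3.7 MB. Per-call size alone doesn't tell you where the egress is going.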
Egress and ingress metrics
For PlanetScale Postgres databases, the overall volume of egress and ingress can also be measured in the Metrics tab. At the bottom of this tab are graphs for ingress and egress.
From here, you can look for spikes that correlate with queries run at particular times to find any outliers.

Read more: Metrics in the PlanetScale docs
Tagging classes of queries
Additionally, on PlanetScale Postgres, if you know your application contains several related queries you'd like to monitor collectively for egress or performance, query tagging is a way to link them.
Query tags are added using the SQL Commenter format. By adding tags, for example, we can tag every query that requests or updates a row with portable-text so that we can measure all these queries together.
SELECT
	id, content
FROM
	posts
LIMIT
	10
/* returns=portable-text */;

From the "Tags" page in the dashboard, we can now view queries with just this tag and measure their transfer sizes more cleanly.

Read more: Tags in the PlanetScale docs
Conclusion
Don't wait until things start getting expensive before thinking about egress. Optimizing early can result in more declarative queries, cleaner code, faster responses, and lower resource demands on your database.
Connect your agent to the PlanetScale MCP server and prompt your agent to find opportunities to improve your application's database egress usage.
From the point of view of an application developer that understands efficient database usage patterns, interrogate our code base for examples where we are querying for columns of data that the application is not using, returning data from updates or inserts that we do not need, or improvements to reduce the frequency or quantity of queries performed for the same data. Read https://planetscale.com/blog/database-egress for more details.]]></content>
        <summary><![CDATA[Reducing the size and frequency of requests to your database has the double benefit of making your applications faster and cheaper.]]></summary>
      </entry>
    
      <entry>
        <title>Problem solving with PlanetScale Insights</title>
        <link href="https://planetscale.com/blog/problem-solving-with-insights" />
        <id>https://planetscale.com/blog/problem-solving-with-insights</id>
        <published>2026-05-07T00:00:00.000Z</published>
        <updated>2026-05-07T00:00:00.000Z</updated>
        
        <author>
          <name>Simeon Griggs</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[There are so many ways your database can disappoint you. It'll make your application perform in ways you don't expect and upset your users.
In a sufficiently complex application, finding and eliminating performance problems can be difficult. Fortunately, PlanetScale gives you the tools to isolate the problem. PlanetScale Insights, available in the dashboard and through the MCP server, provides accurate, up-to-date information on how the queries in your codebase perform in production.
But with so many different metrics available, how do you differentiate good numbers from bad, signal from noise, or know what the most likely fix is once you've pinned down the problem?
For this post, I'll walk through exploring Query Insights for a demo e-commerce app connected to a PlanetScale Postgres PS-10 database with a few million rows of data. I set up a flow of constant, regular traffic along with a few "unexpected" spikes.
PlanetScale Insights also works for PlanetScale Vitess/MySQL databases and has many of the same features. This post focuses only on PlanetScale Postgres.
Latency timeline graph

The default view of the PlanetScale Insights dashboard shows query performance, counts, and row reads and writes for the past 24 hours. You can navigate through up to seven days' worth of traffic data.
Query latency is the best starting point for isolating query performance issues. You can toggle trend lines in the graph on and off; the query list below aligns with the same timeline.
On this page, latency percentiles are computed from all query pattern executions performed within the observable time window. They show how fast most runs are versus the slow tail, which is how you differentiate the median run (p50) from the worst few percent (p99 and above).
p50: Half of this query's executions complete faster than this value, half slower. This is the median latency for that pattern.
p95: 95% of executions complete faster than this; only 1 in 20 are slower. This filter identifies patterns that occasionally misbehave, but tuning them often will not move overall database latency (for example, workload p50) very much.
p99: 99% of executions complete faster than this; only 1 in 100 are slower. This is where gains can be made for that pattern's worst runs.
p99.9: Only 1 in 1,000 executions are slower. These are usually extreme outliers for that pattern: lock contention, cold caches, missing indexes, table scans, and similar.
Max: The single slowest execution of this pattern in the time window. Useful for spotting worst-case scenarios, but a single anomaly can skew this number and may be related to an almost random event that never reoccurs. Always compare it against the percentiles above.
For a deeper dive on understanding latency percentiles, watch Ben's video.
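If percentiles are new to you, it can help to compute a few by hand. The sketch below uses the nearest-rank method, which is one common definition. It is an assumption for illustration and not necessarily how Insights calculates its numbers. The sample latencies are invented.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(latenciesMs: number[], p: number): number {
  if (latenciesMs.length === 0) {
    throw new Error("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be greater than 0 and at most 100");
  }
  const sorted = latenciesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// 100 invented executions: 98 fast runs, plus two slow outliers.
const samples: number[] = [];
for (let i = 0; i < 98; i++) {
  samples.push(2);
}
samples.push(500, 2000);

console.log(percentile(samples, 50)); // 2
console.log(percentile(samples, 95)); // 2
console.log(percentile(samples, 99)); // 500
console.log(percentile(samples, 100)); // 2000 (the max)
```

The p50 and p95 look perfectly healthy here. Only the p99 and the max reveal that two executions went badly, which is why the tail percentiles are worth watching separately.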

Given this screenshot of Insights from my example application, the tabs at the top show that at 12:05 GMT+1 the p50 is 1.4ms and the p99 is 2s.
Point-in-time performance numbers can be useful, but execution trends over time matter much more to find real, unexpected outliers.
From the graph we can see the p99 is consistently far higher than the p50 and p95, with one huge spike where it got as high as 12s.
Generally, you may think that if "only" 1 in 100 queries is slow, the latency has a limited blast radius. But if a page load in your application triggers tens or hundreds of queries to your database, the impact could be widespread and affect more users than you think.
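A quick back-of-the-envelope calculation shows how fast this compounds. The sketch assumes each query independently has a 1 in 100 chance of landing in the slow tail, which is a simplification, since real slow queries tend to cluster. The function name is made up for illustration.

```typescript
// Probability that at least one of n independent queries lands in the slow tail.
function chanceOfSlowPageLoad(queriesPerPage: number, slowFraction: number = 0.01): number {
  return 1 - Math.pow(1 - slowFraction, queriesPerPage);
}

for (const n of [1, 10, 100]) {
  const percent = (chanceOfSlowPageLoad(n) * 100).toFixed(1);
  console.log(n + " queries per page: " + percent + "% of page loads hit a slow query");
}
```

Under that assumption, about 9.6% of page loads are affected at 10 queries per page, and about 63% at 100 queries per page.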
We need to find and resolve these slower p99 queries. Let's find the guilty parties.
Query list
Below the latency graph, filtered to the same timeline, is a list of queries. From here, you can investigate the performance of each individual query that was run on your database at the same time. There are many columns of data you can read to investigate query performance. Which data is useful to you will depend on what you're debugging.
If you're not sure which numbers to look for, the tabs on the top right have preconfigured columns.
For example, if your database consistently shows high CPU usage, click the "Resources" tab to view CPU usage metrics. You can click any column to sort by that metric.
Since we're looking to fix query latency, we'll click the Performance preset and sort queries by p99 latency (ms).

Note that the Performance preset also includes the "Rows read/returned" column; this is often the simplest identifier of slow queries. It contrasts rows the engine had to read with rows actually returned—when reads are high but returns are low, the database is doing a lot of work per useful row, often because of missing or unsuitable indexes. Most often, these queries can be fixed with an index.
Solving the response time issues for some of these queries will be simpler than for others. A number of these queries have a little (i) information icon beside them showing that the queries are being performed without an index and may benefit from one.
(It's also worth noting that some of these queries are slow because they're deliberately bad queries. I needed an exceptionally unoptimized application for this blog post. So, for example, we're not going to "fix" a query for a random product ID.)
Searching for queries
The search box above the query list lets you perform targeted searches for specific queries. You may write part of an SQL query in this box, but there are additional search syntaxes to query by feature, latency, tag, or more. Examples include:
indexed:false — find all queries not using an index
index:table_name.index_name — find all queries using a specific index
p50:>250 — filter by latency threshold
query_count:>1000 — filter by execution count
tag:key:value — filter by tag
Clicking the SYNTAX button on the right side of the search box reveals the full set of filters you can use to narrow down your query filtering.
Other graphs
Along with Query latency there are graphs to show other activity trends in your database.

The Queries tab shows total queries per second over time. If latency rises at the same time as query volume, you may be looking at a traffic spike rather than a single query pattern getting worse.
The Rows read tab shows how many rows the database reads per second. High rows read, especially compared to rows returned in the query list, can indicate that the database is reading unnecessary rows and may benefit from a better index.
The Rows written tab shows rows written per second over the selected time period. It gives you a separate view of write volume alongside query latency, query count, and rows read.
Query details
Let's click in to look at an individual query and what Insights can tell us.
If your application writes raw, sensible SQL, your query might look as simple as this:

If you're using an ORM, your query could be incomprehensible at first glance. Fortunately, the "Summarize query" button runs the query pattern through an LLM to describe its purpose in plain English.
You may also notice the query has been anonymized. Because parameters in a query may contain sensitive information, they're replaced with placeholders when logged in Insights. In this instance, the search term %turbo% is rendered as the parameter $1, so the original value is not visible in Insights.
The page of a query pattern also contains a table of notable queries: individual executions that took longer than 1 second, read more than 10,000 rows, or produced an error. This can help determine whether your query is only sometimes slow, and perhaps reveal a common time when it runs slower than normal.
Taking action on a query
On this page, you can see the same performance graphs as the query list page, but isolated to just this one query. In this screenshot, we see a recommendation from Insights to add an index if the performance is poor. A lot of the time, this is a great idea.
Unfortunately, because this query is using a wildcard search, a BTREE index won't help.
If the query were simpler, like below, an index on the name column would greatly improve performance.
select count(*) as count from products where name = 'turbo';

Instead, we may be better off with a GIN trigram index, as they are better designed for wildcard searches. Fortunately, pg_trgm is a supported extension in PlanetScale Postgres, so I was able to experiment with it. It improved query performance, but only slightly.
Often, an index can fix a slow query. Other times, slow queries reveal bad schema or application design. Both are important to resolve; the latter is just a little more complicated, as you may need to rip and replace the query in order to improve application performance.
The most common mistake of a smart engineer is to optimize a thing that should not exist

If the selected query is using indexes, statistics are shown below the latency graphs, along with tags attached to that query (more in the section below).
Insights MCP
Fortunately, there's never been a better time to fix complex problems.
The PlanetScale MCP server has access to the same data you're able to browse in the dashboard. This means you can task an agent with finding and suggesting fixes for slow queries within your codebase. With your application as its context and real-world production data available via tool calls to Insights, you no longer have excuses for slow database queries.
At PlanetScale, we have workflows configured to do this daily. See the video below for more details.
In the case of our slow query that can't be fixed with an index, this is a great job for an agent. It can not only read Insights data but also perform queries on its own. While experimenting with indexes on this database, I observed the agent reading the output of EXPLAIN ANALYZE to ensure the index was being used and to report the impact on results.
Consider a prompt something like:
I need you to help resolve a slow database query in this application. Make suggestions on whether we can resolve this by adding an index. If so, let's test the results before and after. Additionally, we may need to rethink the query and consider whether there are more efficient ways to obtain the same data to improve application performance.
To help keep your agent focused, include the details of the PlanetScale database in your application's AGENTS.md, for example:
## PlanetScale

- Organization: ready-set-go
- Database: tutorial-insights
- Branch: main

MCP permissions are set when you authenticate the server. It is not advised to give an agent write access to your production database.
Grouping queries with tags
So far, we've looked at identifying queries by grouping together the slow ones. There are other reasons to group queries together, though, which can help with debugging as well as improve performance.
On the Tags page, we can see queries grouped by metadata related to them. There are built-in key-value pairs, such as the application name and remote address of the connection that ran the query.
Custom metadata can be included with queries as SQLCommenter comments. Not all ORMs support comments; check your documentation if you are not writing raw SQL.
select count(*) as count from products where name ilike '%turbo%'
/* application='store', action='search' */;

These comments are then logged as key-value pairs as queries are performed, allowing you to investigate the performance of a specific subset of queries based on their application, intention, and more.
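If you are writing raw SQL from application code, a tiny helper can keep the tags consistent. This is a minimal sketch rather than an official library: the withSqlComment name is made up, and the sorting, URL-encoding, and quote-escaping follow my reading of the SQLCommenter format, so check the output against what your database logs.

```typescript
// Appends key='value' tags to a statement as a trailing SQL comment.
// Pass the statement without its trailing semicolon.
function withSqlComment(sql: string, tags: { [key: string]: string }): string {
  const keys = Object.keys(tags).sort();
  if (keys.length === 0) {
    return sql;
  }
  // encodeURIComponent leaves single quotes alone, so escape them separately.
  const encode = (s: string) => encodeURIComponent(s).replace(/'/g, "\\'");
  const pairs = keys.map((key) => encode(key) + "='" + encode(tags[key]) + "'");
  return sql + " /* " + pairs.join(",") + " */";
}

const tagged = withSqlComment(
  "select count(*) as count from products where name ilike $1",
  { application: "store", action: "search" }
);
console.log(tagged);
// select count(*) as count from products where name ilike $1 /* action='search',application='store' */
```

Encoding the values also means a tag can never contain the characters that would close the comment early.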
So if you're not debugging "why is this query slow," but instead "why is this section of the application slow," you might benefit from grouping that section's queries with the same tag.
For more on tags, see Enhanced tagging in Postgres Query Insights.
Tags are also the backbone of Traffic Control, the killer app of PlanetScale Postgres.
Traffic Control
Some slow queries are unavoidable. We've already determined that our application has a slow query that can't be easily fixed with an index. A second option is to remove it entirely in favor of something else. An alluring third option is to put controls on how many resources the query can actually use.
Traffic Control allows you to do just that. Where timeouts in Postgres can be used as a blunt instrument to stop queries running over a certain time, Traffic Control gives you fine-grained control over how many resources a query can consume, as well as controls over concurrency and more. Perhaps our slow search query actually only runs from an admin panel.
So it's less of a concern that a single query is slow, but more of a concern if multiple administrators run it concurrently and bring down the database's performance.
The same tags we applied to observe a category of query behavior can have "resource budgets" applied to them to limit the amount of resources they are permitted to consume.
Insights now identifies slow queries, recommends improvements, and controls whether they can run at all.
See more in the Traffic Control documentation.
Continual improvement
So far, we've covered manual performance investigation. You and your agent are digging through Insights for improvements. As your application runs, Insights also gathers its own data on anomalous behavior and preemptively suggests upgrades.

The Anomalies page highlights when database performance is well outside the expected range. This can reveal unexpected query patterns, traffic spikes, or other problems with your database.
If you have an anomaly in your Insights dashboard, you can click in to see more details about the time of the anomaly and which queries contributed to it. Match this timeframe against any other application logging platforms you have to identify the root cause. It could be an unexpected one-time outlier, or it could be the result of recently updated application code and likely to repeat.
Learn more in the Anomalies documentation.
Insights also monitors traffic to regularly produce schema recommendations. These may include the index suggestion we saw earlier, or other helpful tips to potentially improve the health of your database.
Recommendations typically include SQL statements you can run in your database to take action.
Both the Anomalies and Recommendations data found within Insights are available from the PlanetScale MCP server if you would like an agent to help you decide whether to take action on the database.

Error tracking
Slow queries aren't the only problem Insights can surface. The Errors page captures every database error from the past 24 hours and plots them on a timeline, letting you spot patterns you'd otherwise miss in application logs.

In my demo store, I simulated a retry storm during checkout — a flaky network that caused the same order to be submitted multiple times with the same idempotency key. The errors tab immediately surfaced the duplicate key value violates unique constraint message on the orders_idempotency_key_key index. Clicking into it revealed each occurrence: the exact query, when it ran, how long it took, and the tags I'd attached to identify the checkout action. From there, I could see the errors clustered in tight bursts, a telltale sign of retries hitting the same unique constraint rather than a systemic problem.
This is the kind of issue that often goes unnoticed. The application catches the exception, retries successfully, and the user never sees a failure — but the database is doing unnecessary work. The Errors page makes these invisible problems visible.
Conclusion
PlanetScale Insights is the best way to see how your database actually performs in production, providing you and your agents with the metrics that matter to improve your database schema, queries, or completely change access patterns.
In a future article, we'll look at how to inspect common database problems by viewing specific metrics in Insights. If there's an issue with your queries you can't yet get to the bottom of, let us know!]]></content>
        <summary><![CDATA[The best way for you and your agents to see how your database actually performs in production.]]></summary>
      </entry>
    
      <entry>
        <title>On benchmarking</title>
        <link href="https://planetscale.com/blog/on-benchmarking" />
        <id>https://planetscale.com/blog/on-benchmarking</id>
        <published>2026-05-05T00:00:00.000Z</published>
        <updated>2026-05-05T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Benchmarking is hard.There are many ways to do it wrong and few to do it right.
But zooming out from any single system or harness, there are broad principles that should be applied to all benchmarking.Using these correctly makes it difficult to produce biased results.
Am I the world's best benchmarker?Certainly not.I invented the language balls, after all.But correctness and precision are important parts of PlanetScale's culture.We've spent considerable time learning the art of benchmarking, and are here to share best-practices.
Here, we're focusing primarily on benchmarking databases, but these principles apply to many domains.
Client-server architecture
Databases typically operate in a client-server model. The database server is started, accepts connections from clients, executes queries, and returns results.
To benchmark, we need a client that establishes the connections, generates queries, and takes measurements. Since both sides consume resources and we want to give the database its full share of the host server, it's common to set up a distinct server for benchmark execution.
As usual, there's a catch. This introduces latency between the two machines.

How much this skews the results of the benchmark depends quite a bit on how "far apart" the benchmark server and database server are (network latency) and how long the queries / transactions take on the database (execution latency).

Let's consider a scenario where each query takes ~10ms to execute on the database. If the network round-trip time is 2.5 milliseconds, each query occupies the connection for 12.5 milliseconds, so we can execute approximately 80 queries per second over a single connection. On the other hand, what if the round-trip is 15 milliseconds? Each query now takes 25 milliseconds end to end. We've cut our single-threaded QPS capability roughly in half, resulting in 40 QPS.

Same database. Same benchmark client. The only difference is the speed at which bytes can go over the wire between the two.
This latency variation will always have an impact on latency measurements.
It can also impact throughput. We often don't run benchmarks on a single connection. We'll do 10, 50, or 100 simultaneous connections to best utilize the parallelism of the machine and database. But if we have a fixed connection count, and are not making it dynamic to account for round-trip latency, we can end up allowing the elevated latency to hurt throughput.
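The arithmetic behind these numbers can be sketched in a few lines. This is a simplified model that assumes each connection has exactly one query in flight and that the database itself is not saturated. The function names and the 4,000 QPS target are illustrative assumptions, not part of any benchmarking tool.

```python
import math

def single_connection_qps(exec_ms: float, rtt_ms: float) -> float:
    """Queries per second over one connection with one query in flight."""
    return 1000.0 / (exec_ms + rtt_ms)

def connections_needed(target_qps: float, exec_ms: float, rtt_ms: float) -> int:
    """Connections required to sustain target_qps under this simplified model."""
    return math.ceil(target_qps / single_connection_qps(exec_ms, rtt_ms))

# The scenario from the text: 10 ms execution, 2.5 ms vs 15 ms round trip.
print(single_connection_qps(10, 2.5))      # 80.0
print(single_connection_qps(10, 15))       # 40.0

# Hypothetical target: holding 4,000 QPS needs twice the connections
# once the round trip grows from 2.5 ms to 15 ms.
print(connections_needed(4000, 10, 2.5))   # 50
print(connections_needed(4000, 10, 15))    # 100
```

A benchmark that pins the connection count at 50 in both cases would report roughly half the throughput on the slower network, even though the database did nothing differently.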
Finally, you should double-check that the client server is not a bottleneck. While benchmarking, ensure that CPU and network utilization are well under their capacity. We want to be straining the database server, not the client.
Choosing resources
It's easy to make one database look better than another with an imbalance of resources. Postgres running on a 16-core server will almost always perform better than on an 8-core server.
An important prerequisite to proper benchmarking is setting up the compute, storage, and networking resources to allow for a fair fight.
This isn't as easy as it sounds, especially when we're talking about running things in the hyperscaler clouds like AWS and GCP. For example, the Geekbench results for an AWS r7g.2xlarge are ~15% lower than the results for an r8g.2xlarge. Both have 8 vCPUs and 64 GB RAM. But move one generation newer, and there's a ~15% CPU improvement.

You might then be tempted to just use the same instance for everything, but this breaks down too. The availability of instance types varies over time, region, and database provider. In some cases, it's not possible to match.
In an ideal world, we'd run everything on the exact same instance. In reality, we sometimes have to settle for matching CPUs and RAM as best we can, and living with the differences. However, you must give this your best effort. Purposefully choosing to benchmark your product on a 2025-gen CPU and then comparing to a competitor's product on a 2022 CPU, when the alternative was readily available, is intentionally misleading.
Workload
Even once we know that our infrastructure is set up sanely, there's a lot to consider for the workload we run.
The easiest way to think about this is in terms of traffic ratios.
How many queries are hitting RAM vs disk?
What % of the data is hot (frequently queried) vs cold (rarely queried)?
What's the ratio of reads to writes?
All of these impact performance, especially when combined with the variations of underlying hardware.
Queries executed on a relational database often require some amount of I/O work. Written data must always be persisted to disk. Reads can be served from the in-memory cache, or from disk on cache misses.

Some databases operate on local SSDs, while others use network-attached storage like AWS EBS or Google Persistent Disk. Some even take a hybrid approach. Either way, the percent of read traffic hitting RAM vs disk impacts performance due to I/O wait times.
Consider a benchmark like sysbench OLTP read-only. This is a simple, read-only benchmark that runs a handful of select query patterns repeatedly. As benchmarks often do, the data size is configurable in the preparation phase. If we run this benchmark on a server with 64 GB of RAM and a 32 GB data size, the entire data set will fit in RAM after warming. The same benchmark run with a 320 GB data size will generate significant I/O and inevitably run slower.
This is related to, but not the same as, data distribution.
Even for a fixed data size, access patterns can vary widely. The simplest examples are uniform and Zipfian.

A uniform access pattern gives every row the same chance of being queried on each request. If we have 100 rows, each has a 1% chance of being read for each operation.
A Zipfian access pattern is skewed: the k-th most popular key is accessed roughly proportional to 1/k. A small number of hot rows receive a large share of requests, while most rows are accessed rarely.
These are only simple models. Real workloads often have messier shapes: recently inserted rows might be hotter than old rows, one tenant might dominate traffic, or a small working set might receive most reads for a period of time.
Which pattern the benchmark operates with significantly impacts performance, because it in turn impacts how frequently we need to access disk vs RAM and the amount of cache churn.
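To get a feel for how different these two models are, here is a small sketch that computes what share of requests lands on the hottest rows. It assumes the plain 1/k weighting described above (real Zipfian generators often use a tunable exponent), and the table size and hot-set size are made-up values for illustration.

```python
def harmonic(n: int) -> float:
    """Return the n-th harmonic number: 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def zipfian_hot_share(total_rows: int, hot_rows: int) -> float:
    """Share of requests that hit the hot_rows most popular rows when the
    k-th most popular row is accessed proportionally to 1/k."""
    return harmonic(hot_rows) / harmonic(total_rows)

def uniform_hot_share(total_rows: int, hot_rows: int) -> float:
    """Share of requests that hit any fixed set of hot_rows rows."""
    return hot_rows / total_rows

# Hypothetical table: 1,000,000 rows, looking at the hottest 1%.
total, hot = 1_000_000, 10_000
print(f"uniform: {uniform_hot_share(total, hot):.1%}")
print(f"zipfian: {zipfian_hot_share(total, hot):.1%}")
```

Under the uniform model the hottest 1% of rows receive 1% of requests. Under the 1/k model they receive roughly two thirds, which means a cache sized for a small fraction of the data can absorb most of the reads.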
Closed and open loop
There are two types of benchmark workload shapes: open and closed loops.
In a closed-loop benchmark, the client sends a request and then waits for the response before sending the next.
while True:
    # send a request and block until the response arrives
    response = send_bench_request()
    # handle the response, then loop around to send the next request
    process(response)

We may do this in parallel across many connections, but each individual connection sends a controlled sequence of queries. A closed loop can also hide a failure mode called coordinated omission: when the database stalls, the client stops issuing new requests too, so the benchmark only records the stalled request and omits the work that would have queued behind it. This is especially misleading for tail latency, where the missing queued requests are exactly the ones that would have made p95/p99 look worse (more on latency and percentiles soon).
Open loop, on the other hand, has a fixed pace of sending requests, regardless of how quickly the database responds.
import time
while True:
    # fire and forget: do not wait for the response
    send_bench_request()
    # fixed pace: one request every 100 ms
    time.sleep(0.1)

This can be fixed throughout the entire benchmark duration, or vary in a controlled way:

Open-loop benchmarks tend to be more realistic. In production systems, database load is applied at the rate that the clients demand, regardless of how well the database is keeping up.
Closed-loop benchmarks are more commonly seen in academic and performance comparisons, as they offer a more controlled environment for comparing things like QPS across a fixed amount of concurrency.
Both are beneficial, but they are useful for different things. It's important to decide up front what the purpose of a benchmark is, then choose the type accordingly.
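Coordinated omission is easier to see with numbers. The following is a toy, fully deterministic simulation rather than a real benchmark harness: a single-connection server that takes 1 ms per query and stalls completely for one second. The stall window, the 3-second run, and the 500 requests/second open-loop rate are all assumptions chosen for illustration. Times are in microseconds.

```python
import math

SERVICE_US = 1_000           # 1 ms per query when healthy
STALL_START_US = 1_000_000   # the database stalls at t = 1 s ...
STALL_END_US = 2_000_000     # ... and recovers at t = 2 s
RUN_US = 3_000_000           # total simulated run time

def finish_time(start_us: int) -> int:
    """When a query that begins at start_us completes on the simulated server."""
    if STALL_START_US <= start_us < STALL_END_US:
        start_us = STALL_END_US  # work cannot begin until the stall ends
    return start_us + SERVICE_US

def percentile(samples, p: int):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

def closed_loop():
    """Send the next request only after the previous response arrives."""
    latencies, now = [], 0
    while now < RUN_US:
        done = finish_time(now)
        latencies.append(done - now)
        now = done
    return latencies

def open_loop(interval_us: int = 2_000):
    """Issue requests on a fixed schedule and measure from the intended send time."""
    latencies, server_free_at = [], 0
    for intended in range(0, RUN_US, interval_us):
        start = max(intended, server_free_at)  # queue behind earlier work
        server_free_at = finish_time(start)
        latencies.append(server_free_at - intended)
    return latencies

closed, opened = closed_loop(), open_loop()
print("closed loop p99 (ms):", percentile(closed, 99) / 1000)  # 1.0
print("open loop p99 (ms):  ", percentile(opened, 99) / 1000)  # 986.0
```

The closed loop records the one-second stall as a single slow sample out of 2,000, so its p99 stays at 1 ms. The open loop keeps generating load during the stall, and every request that queued behind it shows up in the tail.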
What to measure?
Broadly, there are two things we like to measure when benchmarking: throughput and latency. Any good database benchmark will report on both of these things.
Throughput
Throughput is the amount of work completed in a slice of time. In databases, the most common measures are Queries Per Second (QPS) or Transactions Per Second (TPS). For many popular benchmarks like TPC-C and TPC-H, TPS < QPS because there are typically multiple queries within single transactions. Either works fine as a measure.
To measure throughput, choose a workload, a period of time to run it for (say, 5 minutes / 300 seconds), and then execute with TPS / QPS sampling. As a benchmark runs, samples are taken of how many queries or transactions complete each second. We then display this as a graph, showing every collected data point:

A more compact way of displaying this is via a bar chart with error bars.

This communicates similar information in a more compact way, but it's ideal to show a full line graph, as that also better visualizes inconsistencies or spikiness of performance throughout a benchmark run. More on this later.
Error bars are only one way to summarize variance. Coefficient of variation, interquartile range, and histograms are different lenses on the same samples, each helping show whether a benchmark was stable, noisy, or hiding outliers. It's helpful to include these or provide the data so readers can compute them themselves.
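As a rough sketch of how per-second sampling and these summaries fit together, the snippet below buckets completion timestamps into one-second windows and then computes the mean, standard deviation, coefficient of variation, and interquartile range. The function names and the sample values are made up for illustration and do not come from a real run.

```python
import statistics
from collections import Counter

def per_second_throughput(completion_times, duration_s: int):
    """Count completed queries in each one-second window of the run."""
    counts = Counter(int(t) for t in completion_times if 0 <= t < duration_s)
    return [counts.get(second, 0) for second in range(duration_s)]

def summarize(samples):
    """Mean, sample standard deviation, coefficient of variation, and IQR."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean, "iqr": q3 - q1}

# Completion timestamps (seconds since start) bucketed into QPS samples.
print(per_second_throughput([0.1, 0.5, 1.2, 2.9, 2.95], duration_s=3))  # [2, 1, 2]

# Made-up QPS samples with one dip, e.g. a checkpoint or a noisy neighbor.
qps = [1000, 1010, 990, 1005, 995, 400, 1000, 1002]
for name, value in summarize(qps).items():
    print(f"{name}: {value:.3f}")
```

A single dip like the one above barely moves the interquartile range but inflates the standard deviation and coefficient of variation, which is one reason to report more than one summary alongside the raw time series.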
Throughput only tells half the story.
Latency
Latency is the amount of time it takes to complete an operation, query, or transaction. We can look at individual latencies ("How long did this particular SELECT * FROM... take?"), but more often we assess latencies in aggregate.
The standard language for communicating about latencies in distributed systems is with percentiles over some span of time (1 second, 1 minute, etc.). For example:
p50 - The median latency. During this time period, half of the requests executed faster than this, the other half slower.
p90 - The 90th percentile. During this time period, 9 out of 10 requests executed faster, 1 out of 10 slower.
p99 - The 99th percentile. During this time period, 99 out of 100 requests executed faster, 1 out of 100 slower.
We can measure any latency percentile we want, but these are the most common, along with p95 and p99.9. When benchmarking, we typically measure one or more of these in a series of small windows over the entire benchmark period. Say, sample p50, p90, and p99 once per second over a 5-minute (300-second) execution. Then, we plot the results.
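A minimal sketch of that windowed sampling is below. It assumes latency events arrive as (completion time in seconds, latency in milliseconds) pairs and uses the nearest-rank definition of a percentile. Other tools interpolate between samples, so their numbers can differ slightly. The function names are illustrative.

```python
import math
from collections import defaultdict

def percentile(samples, p: int):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

def windowed_percentiles(events, window_s: float = 1.0, percentiles=(50, 90, 99)):
    """Group (completion_time_s, latency_ms) events into fixed windows and
    compute the requested percentiles for each window."""
    windows = defaultdict(list)
    for completed_at, latency_ms in events:
        windows[int(completed_at // window_s)].append(latency_ms)
    return {
        window: {f"p{p}": percentile(latencies, p) for p in percentiles}
        for window, latencies in sorted(windows.items())
    }

# Synthetic events: 100 queries in the first second with latencies 1..100 ms,
# then 100 queries in the next second that are uniformly 5 ms.
events = [(i / 100, i + 1) for i in range(100)]
events += [(1 + i / 100, 5) for i in range(100)]
print(windowed_percentiles(events))
```

Each window produces one point per percentile, which is exactly what gets plotted as a line over the length of the run.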

In some cases, the line graphs are overkill. As with throughput, the visual can be compressed using a bar chart showing the median (or mean), with error bars.

We now have a way of communicating both how much work we accomplished and how quickly each unit of work was completed.
Warmup
We've now settled the prep work and know what we should be measuring. Now let's get tactical. How do we ensure that we are fair when running the benchmark? There's a lot to consider for the executions themselves.
A big one is cache warmup. If we've recently booted up our database, the various caches are not full of pages (shared buffers in Postgres, the InnoDB buffer pool in MySQL). These require time and query load to warm, during which time latency and throughput will slowly be brought up to full potential.

We typically run databases without measurement for a few minutes to ensure all caches are warmed before starting benchmark measurement. This ensures non-full caches and other startup costs don't impact the numbers.
Configuration
Even when warm, there are a number of configuration options that impact performance over long stretches of time. Though there are many, a good example of this is checkpoint_timeout in Postgres.
This and max_wal_size determine how frequently we need to flush table / index changes to disk (I/O checkpointing). If we set these to low / aggressive values, we may trigger a checkpoint once every minute, causing regular performance dips. If we set them lax enough to only trigger once every ten minutes, we may not even notice it in the results of a 5-minute benchmark execution.

We can end up with graphs like this in these cases. But run for another 10 minutes, and we'd likely see a large performance dip on the green line.
Background jobs, I/O checkpointing, autovacuum, and other work can impact the throughput, skewing the benchmark results.
It's important to consider the impact database configurations have on performance. An identical benchmark on the same hardware can perform very differently with different tunings. DBMSs give us these tunings so we can trade off things like performance, durability, data size, and resource consumption on a case-by-case basis. It's generally best to either (a) ensure all configuration options are aligned or (b) for pre-tuned situations (like most database-as-a-service providers) leave things at the pre-tuned defaults.
(In)consistency
Another important consideration, especially in the cloud, is (in)consistency. Even with the same benchmark instance and same client machine, latency and throughput can vary from run to run. This can be due to contention on the network or noisy neighbors that are co-occupying the same hardware you are running on.

It's advisable to do multiple runs to measure consistency.
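One crude way to use those repeated runs is to check whether the gap between two systems is larger than the run-to-run noise of either one. The sketch below does this by comparing min-to-max ranges. It is not a substitute for a proper statistical test, and the QPS numbers are invented for illustration.

```python
import statistics

def summarize_runs(qps_per_run):
    """Mean QPS across runs plus the min-to-max spread as a fraction of the mean."""
    mean = statistics.mean(qps_per_run)
    spread = (max(qps_per_run) - min(qps_per_run)) / mean
    return mean, spread

def ranges_overlap(a_runs, b_runs) -> bool:
    """True when the run-to-run ranges of two systems overlap."""
    return min(a_runs) <= max(b_runs) and min(b_runs) <= max(a_runs)

# Made-up average QPS from five runs of each system.
system_a = [10_200, 9_800, 10_050, 9_900, 10_100]
system_b = [10_400, 10_150, 10_600, 10_000, 10_300]

mean_a, spread_a = summarize_runs(system_a)
mean_b, spread_b = summarize_runs(system_b)
print(f"A: mean {mean_a:.0f} QPS, spread {spread_a:.1%}")
print(f"B: mean {mean_b:.0f} QPS, spread {spread_b:.1%}")
print(f"B vs A: {mean_b / mean_a - 1:+.1%}, ranges overlap: {ranges_overlap(system_a, system_b)}")
```

In this example B looks a few percent faster on average, but the ranges overlap, so a single run of each could easily have shown the opposite result.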
Apples to apples to oranges
The best benchmarks are the ones that compare apples-to-apples. In other words, ones that create data-driven comparisons between products that have the same or very similar characteristics and feature sets.
Examples of this are:
Comparing 4 different Postgres configurations to determine workload suitability
Comparing 3 different cloud MySQL platforms to determine which is most performant
Comparing MySQL and Postgres on an identical workload (different databases, but same stated purpose)
People sometimes draw comparisons between vastly different database engines, resulting in wild claims. Things like:
Analytics queries run 100x faster on Apache Pinot than Postgres
Achieve 100x higher QPS on a purpose-built realtime database compared to a Postgres relational database
SQLite latency is 80% lower than MySQL
These are comparing databases that were distinctly optimized for different purposes. It's easy to make one look better than the other, especially when cherry-picking the workload.
Don't do this. Ensure comparisons are between comparable technologies and workloads that fit the DBMS's stated purpose. The one exception may be as an internal test to determine which technology, amongst ones with vastly different goals, is best-suited for a system.
Document everything
Good benchmarks should be reproducible. Document the client and target setups as exhaustively as possible: hardware (or cloud instance type), OS, software versions, build flags, configurations, benchmark tool, exact command line, etc. After looking at the results of a benchmark, an engineer should be able to reproduce the results.
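One lightweight habit that helps here is writing a small manifest next to every set of results. The sketch below captures client-side details using only the Python standard library. The manifest layout, the describe_run name, and the example command and settings are assumptions for illustration, and the database server's own details would need to be collected separately.

```python
import json
import os
import platform
from datetime import datetime, timezone

def describe_run(command, config):
    """Collect client-side details that a reader would need to repeat a run."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "command": command,   # the exact command line that was executed
        "config": config,     # anything tuned away from the defaults
        "client": {
            "os": platform.platform(),
            "machine": platform.machine(),
            "cpu_count": os.cpu_count(),
            "python": platform.python_version(),
        },
    }

# Illustrative values only; record whatever was actually used.
manifest = describe_run(
    command=["sysbench", "oltp_read_only", "--threads=64", "--time=300", "run"],
    config={"instance_type": "r8g.2xlarge", "checkpoint_timeout": "5min"},
)
print(json.dumps(manifest, indent=2))
```

Publishing this file alongside the charts gives readers a starting point for reproducing the run.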
Benchmark crimes
As you can see, there's a lot to good benchmarking. Missing any one of these steps leads to bias. Some of the most common mistakes:
Reporting only averages, without percentiles, variance, or the full time-series
Leaving out hardware, instance type, etc.
Measuring before the system reaches steady state
Reporting a percentage difference without the surrounding variance
Forgetting to check whether the benchmark client is the bottleneck
That last one is easy to miss!
If the client machine has maxed out on CPU or network connections, the graph may look like the database has plateaued. But all you've really measured is the limit of the load generator.
Go forth and benchmark
You now have an elementary understanding of database benchmarking.
When presenting results, don't stop at the numbers. If two runs differ meaningfully, offer a hypothesis for why: hardware, configuration, workload shape, cache behavior, network latency, or something else. The reader should not have to invent the causal story themselves.
Apply all these to your next round of benchmarks, and you're less likely to veer off-course.]]></content>
        <summary><![CDATA[Benchmarking is hard. Done wrong it is very misleading, and unfortunately it is frequently done wrong. Let's explore how not to make silly mistakes.]]></summary>
      </entry>
    
      <entry>
        <title>Transparency in benchmarking</title>
        <link href="https://planetscale.com/blog/transparency-in-benchmarking" />
        <id>https://planetscale.com/blog/transparency-in-benchmarking</id>
        <published>2026-05-05T00:00:00.000Z</published>
        <updated>2026-05-05T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Database benchmarks are imperfect. They are also useful.
No benchmark can tell you exactly how a database will perform for your application. Workload shape, data size, region placement, storage, configuration, and cost all matter. But fair benchmarks help customers understand tradeoffs, compare options, and ask better questions before choosing infrastructure.
The DeWitt clause
Many cloud vendors include language in their terms that restricts comparative benchmarking. These restrictions are called "DeWitt clauses", named after database researcher David DeWitt. That is a strange legacy for someone whose work helped move the database industry forward by measuring real systems and publishing results.
Previously, PlanetScale also included a DeWitt clause in our Acceptable Use Policy (AUP). Today, we are removing this in favor of a more open "Benchmarking" section in our AUP.
The new section reads:
You may perform benchmark tests (“Benchmark”) of the Services, provided that the Benchmark is conducted in good faith and uses a fair and transparent methodology. Please refer to PlanetScale's published benchmarking best practices. Except with respect to Beta Features, you may disclose the results of the Benchmark. If you perform or disclose, or direct or permit any third party to perform or disclose, any Benchmark of the Services, you (i) will include in such disclosure, and will disclose to PlanetScale, all information necessary to replicate such Benchmark, and (ii) agree that PlanetScale may perform and disclose the results of Benchmarks of your products or services, irrespective of any restrictions on Benchmarks in the terms governing your products and services.
Any Benchmark must be conducted in accordance with the Agreement, including this Acceptable Use Policy. The Benchmark must not interfere with the Services or misrepresent the configuration, methodology, results, or cost of the Services or any compared service.
Benchmarks have gained a bad reputation because they are frequently conducted poorly. Sometimes this is done with malicious intent, often referred to as "benchmarketing." At other times it is done out of ignorance. Many engineers are not trained in all aspects of fair benchmarking.
A new standard
Anyone benchmarking PlanetScale should follow the best practices outlined in our benchmarking guide. These practices come from our deep experience benchmarking databases in the cloud where topology, server location, region, workload, and instance type differences materially impact the result.
We encourage other vendors, analysts, and practitioners to use the same standard. Benchmarks should be deep, thorough, technically sound, and transparent enough for others to understand and reproduce.
Our ask
We invite other vendors to adopt this same language and standard in their own AUPs. Allow public benchmarking, remove DeWitt clauses, and hold benchmarks to clear expectations for fairness and transparency.
Customers should be able to compare the systems they rely on.]]></content>
        <summary><![CDATA[Transparent database benchmarks help customers make better decisions and push vendors to build better products.]]></summary>
      </entry>
    
      <entry>
        <title>RLS sounds great until it isn&apos;t</title>
        <link href="https://planetscale.com/blog/rls-sounds-great-until-it-isnt" />
        <id>https://planetscale.com/blog/rls-sounds-great-until-it-isnt</id>
        <published>2026-04-30T00:00:00.000Z</published>
        <updated>2026-04-30T00:00:00.000Z</updated>
        
        <author>
          <name>Josh Brown</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[When you leave your house, go to sleep, or go do work in the yard, you lock your door. Maybe you have a gate or fence you lock too. Without these, anyone can waltz into your house and snoop around.
Row Level Security (RLS) can be attractive to developers for numerous reasons, but the foot-guns and gotchas in RLS often outweigh the benefits. You probably want to keep your doors locked.
Friends and family: Managing access
RLS for Postgres lets administrators define security policies in their database, instead of the application layer. Let's imagine your house is your database, and the rows, tables, and data are like the things inside.
When your friends or family come over, you give them keys to every drawer they are allowed to have access to. Maybe everyone gets access to the silverware, but only the family can access your laundry room.
This is similar to how policies work in RLS. The rules for who gets which keys are your policies. If a user passes a policy rule (has the key) then they are allowed to access the data. At a very small scale, this can seem like a great idea. Anyone can access your database however they want and your policies ensure they aren't seeing things they shouldn't.
Testing and scaling these policies as your database grows becomes near impossible. For every new feature in your application, you must ensure your RLS policies are protecting the correct rows. Remembering to add these policies can be cumbersome, especially when they need to be manually synced to your codebase.
RLS fundamentally exists to protect your data. If you mess up even a single policy however, your data becomes exposed. Managing access in the same location your code lives is much easier than remembering to write a new policy every time a new table, column, or feature is added to your product.
The party: Managing connections
Postgres uses a process-per-connection architecture. Each new user connecting to your database directly with their role is like a new person coming into your house. At first it's fine, but once you have 100 people it gets crowded pretty quick.
PgBouncer is a connection pooler that reuses a small number of direct connections to your database while letting many clients connect to it. When using PgBouncer with RLS, you lose the upstream identity of the client.
The traditional way of solving this is using local variables instead of roles to define RLS policies. You define a policy that reads from a session-local variable instead of checking the Postgres role:
CREATE POLICY user_isolation ON orders
  FOR ALL USING (user_id = current_setting('app.tenant_id')::bigint);

Then wrap every transaction in your application to set that variable:
BEGIN;
SET LOCAL app.tenant_id = '1234';
SELECT * FROM orders;
COMMIT;

This requires a lot of extra application code to manage all the different local variables attached to each and every transaction (1). If SET LOCAL is omitted, current_setting() returns an empty string or throws an error depending on how your policy is written.
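To give a sense of what that extra application code looks like, here is a minimal sketch of a transaction wrapper. It assumes a DB-API style connection with autocommit off and the %s placeholder style used by drivers such as psycopg. It uses Postgres' set_config() function with is_local set to true, which behaves like SET LOCAL but accepts a bound parameter. The tenant_transaction helper and the RecordingConnection stand-in are hypothetical names, and the stand-in exists only so the sketch runs without a database.

```python
from contextlib import contextmanager

@contextmanager
def tenant_transaction(conn, tenant_id):
    """Run a block inside one transaction with app.tenant_id set for RLS."""
    cur = conn.cursor()
    try:
        # Equivalent to SET LOCAL app.tenant_id = '...', but parameterized.
        # With autocommit off, the driver opens the transaction implicitly.
        cur.execute(
            "SELECT set_config('app.tenant_id', %s, true)", (str(tenant_id),)
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise

class RecordingConnection:
    """Stand-in for a real driver connection that only records what it is asked to do."""
    def __init__(self):
        self.log = []
    def cursor(self):
        return self
    def execute(self, sql, params=()):
        self.log.append((sql, tuple(params)))
    def commit(self):
        self.log.append(("COMMIT", ()))
    def rollback(self):
        self.log.append(("ROLLBACK", ()))

conn = RecordingConnection()
with tenant_transaction(conn, 1234) as cur:
    cur.execute("SELECT * FROM orders")
for statement, params in conn.log:
    print(statement, params)
```

Every query path in the application has to go through a wrapper like this. Any code that grabs a connection directly skips the variable, which is the failure mode described above.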
Annoying neighbor: Attack Surface
You go out to get your mail and you find your neighbor standing over your mailbox trying to open it over and over. You try to tell them that one is yours and to let you in, but they are having none of it. Now you have to sit and wait until they get bored and figure out they don't have the right key.
RLS acts like an extra WHERE clause appended to your queries. Unless the user lacks read permission on a table, their queries will still run even if no data is returned. On complex joins or queries lacking indexes, this can hurt database performance.
If a malicious user starts retrying a query over and over, RLS will make sure they don't see any data, but cannot stop them from running the query itself. Relying on RLS to completely protect your tables burns valuable CPU cycles and can potentially starve your other, honest users.
Any user of your application, particularly in situations where you do not have sufficient rate limiting in place, can DDoS your database simply by hitting an API endpoint. This is preventable by checking authentication to see if a user is allowed to run a query, without relying on RLS to manage your security for you.
A large keyring: Performance Implications
Every time your friend goes to get a Diet Coke, they need to find the fridge key on their very large key chain. This wastes valuable time sifting through all the different keys and trying each one, so instead they mark the key so it's easier to find next time they go to the fridge.
RLS policies are generally executed per row (2), meaning any function or complex logic will run for each row scanned. This can be solved by wrapping functions into subqueries. Setting up a simple benchmark, we can see the difference between RLS, RLS cached, and with RLS disabled. If you want to try it yourself, you can use this benchmark repository.

For this benchmark, we tested 5 different setups: two different functions, each called from two different policies, and one without RLS at all.
RLS with a VOLATILE function
RLS with a STABLE function
RLS with a VOLATILE function + cache
RLS with a STABLE function + cache
No RLS
A volatile function is defined with the keyword VOLATILE that tells Postgres the function may modify data or return different values upon successive calls. This is the default mode for a new function in Postgres.
CREATE OR REPLACE FUNCTION get_current_role()
RETURNS TEXT
LANGUAGE SQL
VOLATILE
SECURITY DEFINER
AS $$
    ...
$$;

The other option is to use STABLE in our function definition. Stable functions cannot modify data, and are expected to return the same value for successive calls with the same arguments within a single statement. When using RLS however, Postgres does not cache the value when evaluating the policy on each row during queries. In order to successfully cache the result across each policy evaluation, we need to trick Postgres.
When we wrap the function call in a SELECT, Postgres creates an InitPlan query node type. By default, anything after the USING keyword is executed as a SubPlan type, where Postgres expects that the outcome can change row to row. This is desired as that is what we are checking; for every row, should the user be allowed to fetch it.
An InitPlan is only run once per execution of the outer plan, and cached for reuse in later rows of the evaluation. Using EXPLAIN, we can see how the different policy definitions change the estimated cost.
-- RLS without subquery: no InitPlan, high cost
CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::bigint AND get_current_role() = 'admin');
EXPLAIN:
    Aggregate  (cost=34828.68..34828.69 rows=1 width=40)
      ->  Index Scan using orders_tenant_id_idx on orders  (cost=0.43..34826.20 rows=495 width=6)
            Index Cond: (tenant_id = (current_setting('app.tenant_id'::text))::bigint)
            Filter: (get_current_role() = 'admin'::text)

-- RLS with subquery: InitPlan caches result, lower cost
CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::bigint AND (SELECT get_current_role()) = 'admin');
EXPLAIN:
    Aggregate  (cost=10095.69..10095.70 rows=1 width=40)
      InitPlan 1
        ->  Result  (cost=0.00..0.26 rows=1 width=32)
      ->  Index Scan using orders_tenant_id_idx on orders  (cost=0.43..10092.95 rows=495 width=6)
            Index Cond: (tenant_id = (current_setting('app.tenant_id'::text))::bigint)
            Filter: ((InitPlan 1).col1 = 'admin'::text)

The cost= in the explain rows is Postgres' guess at how expensive a query will be to run, in arbitrary units. The first number is the estimated startup cost, or how expensive it is to do the sorting and filtering of the query before returning rows to the user. The second number is the estimated total cost, including fetching all the rows. The rows= and width= values are the estimated number of rows the query will return and the estimated average width of those rows in bytes, respectively.
When Postgres doesn't think it can cache the inner query, the estimated cost is over 3x higher than when it can. In reality, the actual latency difference is much larger than 3x as seen in the chart above.
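If you want to compare plans like these without reading the numbers by eye, the total cost is easy to pull out of the EXPLAIN text. The short sketch below parses the two Aggregate lines shown above. The regular expression and the total_cost helper are illustrative and only handle the cost=startup..total form.

```python
import re

# Matches the "cost=<startup>..<total>" portion of an EXPLAIN plan line.
COST_RE = re.compile(r"cost=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")

def total_cost(plan_line: str) -> float:
    """Return the estimated total cost from a single EXPLAIN plan line."""
    match = COST_RE.search(plan_line)
    if match is None:
        raise ValueError(f"no cost found in: {plan_line!r}")
    return float(match.group(2))

# The top-level plan nodes from the two EXPLAIN outputs above.
without_initplan = "Aggregate  (cost=34828.68..34828.69 rows=1 width=40)"
with_initplan = "Aggregate  (cost=10095.69..10095.70 rows=1 width=40)"

ratio = total_cost(without_initplan) / total_cost(with_initplan)
print(f"estimated cost ratio: {ratio:.2f}x")  # 3.45x
```

Keep in mind these are planner estimates. Measuring real latency requires running the query, for example with EXPLAIN ANALYZE.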
When Postgres doesn't cache expensive functions in your policy definitions, RLS becomes expensive overhead. RLS can be just as fast as if you weren't using it at all in some scenarios. The issue is that RLS becomes yet another layer of code that needs to be continuously optimized, where small mistakes can cause large performance hits.
It's your house: Permission ownership
It's your house, you obviously have the keys to everything, but what if you weren't supposed to?
Every Postgres table has an owner. Normally you'd control table and row access on a per-Postgres-role basis, however when you connect to Postgres as the owning role of a table, none of its RLS policies apply. You must explicitly opt in:
ALTER TABLE users FORCE ROW LEVEL SECURITY;

Even this may not be sufficient if you are connected with the Postgres superuser role. Any role with the SUPERUSER or BYPASSRLS attribute will always bypass RLS. This is easy to miss and easy to test incorrectly. Your policy tests might pass under a non-owner role while production traffic runs as the owner.
Making a ham sandwich: Stricter patterns
Let's say your friend Andy wanted to make a ham sandwich. He had access to the fridge and utensils, but not your grocery list. When he made his sandwich, he used up all the mustard, and now you need to go get more. When using RLS, Andy's query can't touch our grocery list. We have to update that separately.
Without RLS this is easy. When using RLS, doing this type of query can add a lot of complexity. Getting the utensils, making the sandwich, and updating the grocery list might not share the same permissions. While rows in one table may be accessible to a user, updating rows in another may not be. Since we own the grocery list, we don't want anyone touching it except in well defined scenarios.
One way to solve this is by using multiple roles and multiple transactions, but this becomes overly cumbersome on our application layer. A better solution would be to add a SECURITY DEFINER function in our database that gives roles access to modify or view data in a well defined way:
CREATE FUNCTION use_ingredients(ingredients text[])
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER AS $$
BEGIN
  -- Runs as the function owner, bypassing Andy's RLS policies
  UPDATE grocery_list SET quantity = quantity - 1
  WHERE item = ANY(ingredients);
END;
$$;

SECURITY DEFINER causes the function to run as its owner's role, bypassing RLS entirely for that operation. Now you're back to managing security on both RLS and your application layer, ensuring only specific parameters are allowed to pass to this function.
Keeping database functions in version control also becomes difficult. Some migration tools include SQL functions and policies, but these are another part of your schema migrations that can cause headaches down the road.
Your application layer also needs to stay in sync with every function it calls in your database. Changing function definitions, names, or return values may require a new database migration, or delicate surgery to ensure a stable update.
End of the day
Once we have managed locking everything under a different key inside your house,who has what keys, who is allowed in, and who is delegating access for who, wefind our application code has almost as much logic as if it didn't have RLS atall.
RLS policies themselves are stored in pg_policies inside your database, not inyour source code. Most standard migration tools don't track policy changesalongside schema changes. Policy migrations become a separate, manual process,and they drift. A schema change that adds a column or renames a table cansilently break a policy that no one realizes is outdated until something breaksin our application, impacting users.
Each query to the database will already need some sort of modifier in your application code to add local variables for user identification when using PgBouncer. Misconfigured local variables could be just as damaging as if RLS wasn't there to begin with.
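One way to set such a variable per transaction is set_config with its third argument set to true, which behaves like SET LOCAL but is an ordinary function call and so can take a bind parameter. A sketch, assuming a hypothetical setting name app.current_user_id that your policies would read with current_setting:

```sql
BEGIN;

-- Third argument true = local to this transaction, like SET LOCAL.
-- In application code the value '42' would be passed as a bind parameter.
SELECT set_config('app.current_user_id', '42', true);

-- Any query in this transaction sees the variable.
SELECT current_setting('app.current_user_id');

COMMIT;
-- After COMMIT the value no longer applies, so the next client that
-- borrows this pooled connection does not inherit it.
```

Every code path that opens a transaction has to do this correctly, which is the extra application-side logic described above.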
We still need to check early on if a user has permission to run a query, or else we risk allowing users to degrade our database performance with spam. If we are already checking permissions at the application layer, the benefits of RLS become harder to observe.
Optimizing queries also becomes much harder. Queries are artificially restricted to what they are allowed to see, and need bespoke functions and permissions to get access. This makes our source code and database logic even harder to manage, between policies, functions, and the mappings between them.
How to do it right
At PlanetScale, we typically recommend against relying on Postgres RLS. There may be occasional useful scenarios, but when implementing RLS correctly at scale, the benefits quickly turn into cons with a higher overhead not only to performance, but also developer experience and complexity.
Application-layer authorization like middleware, ORM-level scoping, or a dedicated permissions table keeps your logic visible, testable, and co-located with the code that uses it.
Your database is more like a warehouse. Don't treat it like your house.
Footnotes
Note that PgBouncer pool_mode must be either session or transaction. statement mode won't work with SET LOCAL at all.
The Postgres query planner can sometimes determine that a policy is safe to cache across evaluations on its own. Doing this properly can be a tricky process. Even in our benchmark example, functions that are marked as stable still need to be wrapped in a subquery in order for Postgres to properly cache the result. Each policy is different, and determining the proper optimizations for each one is another layer of complexity in your codebase.]]></content>
        <summary><![CDATA[PostgreSQL's Row Level Security sounds like a clean way to enforce access control at the database layer, but the foot-guns, pooling incompatibilities, and performance traps often make it more trouble than it's worth.]]></summary>
      </entry>
    
      <entry>
        <title>Approaches to tenancy in Postgres</title>
        <link href="https://planetscale.com/blog/approaches-to-tenancy-in-postgres" />
        <id>https://planetscale.com/blog/approaches-to-tenancy-in-postgres</id>
        <published>2026-04-21T00:00:00.000Z</published>
        <updated>2026-04-21T00:00:00.000Z</updated>
        
        <author>
          <name>Simeon Griggs</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We've updated our use of the term "row-level isolation" to "shared-schema" in this article to avoid confusion with Postgres RLS. We do not recommend relying on Postgres RLS.
Multi-tenancy is a term used across various kinds of technical infrastructure, including application hosting, compute, databases, and more.
For example, you may purchase cloud services from a provider, but your account is one of many that draws from a common pool of resources. Your account is one "tenant" in a multi-tenant infrastructure.
In this article, we're focusing on using a single Postgres database cluster to serve an application with many tenants—you are our customer, and your customers are tenants in that cluster.
Given the many approaches to multi-tenancy within a Postgres database, it is worth clarifying the recommended best practices and the data models you should avoid. These recommendations are informed by years of seeing multi-tenant applications, both good and bad, succeed and fail at scale.
Definitions
The term "database" is overloaded and can refer to different things:
A Database Cluster refers to the entire database server instance – the running Postgres process, its storage and any replicas.
A Logical Database is an isolated namespace within a database cluster that contains its own schemas, tables, and data.
When you generate credentials to connect to a database, you're connecting to the database cluster. The queries you perform will target a single logical database within it. On PlanetScale Postgres, the default logical database name is postgres.
In short: one database cluster can contain many logical databases.
When modeling data in a relational database:
A Tenant refers to a single entity that accesses their own subset of data in your application.
Single-tenancy refers to giving each tenant their own isolated schema, logical database, or database cluster.
Multi-tenancy refers to using a consistent schema (set of tables and relationships) for all of the users of your application within a single database cluster.
Three approaches to tenant isolation
There are three common approaches to separating tenant data within a single database cluster:
Shared-schema where each user/tenant uses a shared set of tables and is isolated by a column value such as user_id, tenant_id, etc.
Schema-per-tenant where each tenant has its own schema and tables
Database-per-tenant where each tenant has its own logical database, schema, and tables
Of the three approaches, shared-schema is the most common and is our recommended approach.
Shared-schema is also the only true method of "multi-tenancy" in a relational database. Schema-per-tenant and database-per-tenant within the same database cluster do not share tables, but they do share resources.
Finally, you may already be running a database using schema-per-tenant. You may be able to migrate to a recommended approach to improve the performance of your application and workloads. See Migrating to shared-schema multi-tenancy.
Good examples for multi-tenancy
Good examples of multi-tenancy include SaaS applications that need to isolate data for each customer but have so many customers that it would be impractical to assign each customer to an individual database cluster. Or multi-national applications that need to isolate data for each country, market, or region.
These are good use cases for multi-tenancy because only the data is different between tenants. The schema, tables, relationships, application code and access patterns are uniform across all tenants.
With any multi-tenancy approach, your goal should be for data belonging to each tenant to be consumed by the same applications, with care taken to ensure that one tenant cannot query another tenant's data and that one tenant's behavior in your application cannot jeopardize the experience of another.
These recommendations assume all tenants share the same schema. If tenants genuinely need different schema structures, schema-per-tenant or database-per-tenant is the better fit.
Shared-schema
Recommended. This is the most common, general-purpose method for combining tenants in a single database.
All data is stored in a single database cluster
All tenants share the same schema and tables
Each tenant's data is isolated with a column such as tenant_id
With shared-schema, each tenant shares the same schema and tables, but has its own data.
This is the simplest model conceptually and the most scalable approach to multi-tenancy.
CREATE TABLE orders (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id BIGINT NOT NULL,
    customer_name TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    total NUMERIC
);

-- tenant_id should lead most indexes
CREATE INDEX idx_orders_tenant_created ON orders(tenant_id, created_at DESC);

-- Insert data for different tenants into the same table
INSERT INTO orders (tenant_id, customer_name, total) VALUES (1, 'Alice', 49.99);
INSERT INTO orders (tenant_id, customer_name, total) VALUES (2, 'Hans', 59.99);

-- Every query must filter by tenant
SELECT * FROM orders WHERE tenant_id = 1;

Depending on the size of your tables, shared-schema can easily scale to many thousands of tenants. Migrations and schema changes need only be applied to a single table to update all tenants. Querying across tenants is simple and efficient.
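As an illustration of how simple cross-tenant queries are in this model, a per-tenant report over the orders table defined above is a single aggregate. The column aliases are illustrative only.

```sql
-- One row per tenant: how many orders and how much revenue
SELECT tenant_id,
       count(*)   AS order_count,
       sum(total) AS revenue
FROM orders
GROUP BY tenant_id
ORDER BY revenue DESC;
```

This is an internal reporting query. Anything served to a tenant should still carry the tenant_id filter.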
Modeling tenants
In most multi-tenant applications, tenants have metadata beyond just an ID, such as a name or a region. A dedicated tenants table gives you a place to store this and lets the tenant_id column across your schema remain a compact, performant BIGINT foreign key.
CREATE TABLE tenants (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    code VARCHAR(2) UNIQUE NOT NULL, -- 'uk', 'de'
    name TEXT NOT NULL               -- 'United Kingdom', 'Germany'
);

Using a BIGINT for tenant_id is preferred over text-based identifiers. A BIGINT is faster to compare than a string and is a stable identifier that won't need to change if a tenant rebrands or a region code is restructured.
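To make that relationship explicit, the orders table from earlier could reference tenants directly. This is a sketch: the constraint name is hypothetical, and adding the constraint will fail if orders already contains tenant_id values with no matching row in tenants.

```sql
-- Tie orders.tenant_id to the tenants table
ALTER TABLE orders
    ADD CONSTRAINT orders_tenant_id_fkey
    FOREIGN KEY (tenant_id) REFERENCES tenants (id);

-- Look up a tenant by its human-readable code,
-- while the join and filter still run on the BIGINT key
SELECT o.*
FROM orders AS o
JOIN tenants AS t ON t.id = o.tenant_id
WHERE t.code = 'uk';
```

In practice an application would usually resolve the code to an ID once per request and then filter on tenant_id directly.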
The column name tenant_id is a common one, but not a required naming convention. For example, a social media application may use the column user_id for the same purpose.
Enforcing tenant filtering
The inherent risk of shared-schema is that every query must include WHERE tenant_id = ?. Rather than relying on each query to add this manually, use ORM global scopes, middleware, or a shared data access layer to inject the tenant filter automatically.
Postgres also offers Row-Level Security (RLS) as an optional, additional layer of defense. RLS automatically appends a filter to every query on a table based on a session variable. In the example below, RLS ensures that queries are scoped to the current tenant without relying on the application to include the filter.
-- Create a non-superuser role for the application
CREATE ROLE app_user LOGIN PASSWORD 'secret';
GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO app_user;

-- Enable RLS and define the policy
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::BIGINT);

-- At runtime, your app sets the tenant context per request
BEGIN;
-- SET LOCAL scopes the setting to this transaction,
-- which is important when using connection pooling.
SET LOCAL app.current_tenant = '1';

-- Only returns orders for tenant_id = 1
SELECT * FROM orders;
COMMIT;

We generally don't recommend relying on RLS. It shifts security logic into the database, where policy misconfiguration, silent failures, and connection pooling interactions are difficult to debug. Keep tenant isolation enforced in your application code.
Partitioning
With all data stored in a single table, shared-schema can be further optimized by partitioning the table as your database scales and your tenant count grows. The tenant_id column, which already isolates each tenant's rows, is an ideal partition key.
Partitioning is a Postgres feature that splits a single logical table into multiple sub-tables based on a column value. Your application queries don't need to target a specific partition, as Postgres will automatically route the query to the correct one.
In practice, you would only partition tables that grow large enough to benefit from it. A messages table with billions of rows is a strong candidate for partitioning by tenant, but not a small reference table like office_locations with only thousands of rows.
Note that Postgres requires the partition key to be part of the primary key on partitioned tables.
-- Create a partitioned table
CREATE TABLE orders (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    tenant_id BIGINT NOT NULL,
    customer_name TEXT,
    total NUMERIC,
    PRIMARY KEY (tenant_id, id)
) PARTITION BY LIST (tenant_id);

-- Create a partition for each tenant
-- All rows with tenant_id=1 (UK) go into 'orders_tenant_1'
CREATE TABLE orders_tenant_1 PARTITION OF orders FOR VALUES IN (1);
-- All rows with tenant_id=2 (DE) go into 'orders_tenant_2'
CREATE TABLE orders_tenant_2 PARTITION OF orders FOR VALUES IN (2);

-- Your application doesn't know or care about partitions
INSERT INTO orders (tenant_id, customer_name, total) VALUES (1, 'Alice', 49.99);
-- Postgres automatically routes this to orders_tenant_1

SELECT * FROM orders WHERE tenant_id = 1;
-- Postgres only scans orders_tenant_1 (partition pruning)

Partitioning can greatly improve performance and scalability by reducing the amount of data that needs to be scanned and the size of indexes. Internal processes such as vacuuming and index maintenance are also performed on a per-partition basis.
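One way to confirm that pruning is happening is to inspect the query plan. With the partitioned orders table above, the plan for a single-tenant query should mention only orders_tenant_1. The exact plan text varies by Postgres version, so no output is shown here.

```sql
-- Pruning is controlled by this setting, which is on by default
SHOW enable_partition_pruning;

-- The plan should scan orders_tenant_1 and no other partition
EXPLAIN SELECT * FROM orders WHERE tenant_id = 1;
```

If the plan lists every partition, the query is probably not filtering on tenant_id in a way the planner can use.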
Partitioning does add operational overhead, as you will need to create a new partition for each tenant.
Shared-schema with partitioning offers some of the benefits of database-per-tenant multi-tenancy with lower operational overhead.
Tenant data lifecycle
With partitioning, onboarding each new tenant requires creating a new partition.
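A sketch of that onboarding step for a hypothetical tenant with ID 3, using the partitioned orders table from the previous section. The default partition is an optional addition that this article does not otherwise use.

```sql
-- Onboarding tenant 3: create its partition before the first insert
CREATE TABLE orders_tenant_3 PARTITION OF orders FOR VALUES IN (3);

-- Optional safety net: rows for tenants without their own partition
-- land here instead of the INSERT failing with a
-- "no partition of relation ... found for row" error
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
```

A default partition has a trade-off. If rows for a tenant have already accumulated in it, creating that tenant's dedicated partition will fail until those rows are moved out.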
Partitioning simplifies offboarding tenants: you can drop the partition, and all data for that tenant is deleted.
-- Wrap in a transaction in case the DROP fails
BEGIN;
ALTER TABLE orders DETACH PARTITION orders_tenant_1;
DROP TABLE orders_tenant_1;
COMMIT;

Without partitioning, a new tenant's data can be inserted into a table with no schema changes or migrations.
However, removing tenants requires doing table-level delete operations, which can generate a significant number of dead tuples and increase vacuum pressure.
DELETE FROM orders WHERE tenant_id = 1;
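A common mitigation, which is general practice rather than something specific to this article, is to delete in batches so that vacuum can keep up between rounds. A sketch against the unpartitioned orders table from earlier, with an arbitrary batch size of 10000:

```sql
-- Delete one batch of tenant 1's rows.
-- Re-run from the application until it reports 0 rows deleted.
DELETE FROM orders
WHERE id IN (
    SELECT id
    FROM orders
    WHERE tenant_id = 1
    LIMIT 10000
);
```

Each batch is its own short transaction, so dead tuples from earlier batches can be cleaned up while later batches are still running.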

Schema-per-tenant
Generally not recommended. Schema-per-tenant has a few benefits but does not work well at scale.
All data is stored in a single database cluster
Each tenant has its own schema and tables
Each tenant's schema and data are isolated by the schema name as a prefix to the table name
The appeal of this approach is greater isolation, since your queries do not need to filter on a specified tenant_id column. Instead, your application can reuse the same queries but with a different search_path to target the correct tenant's data.
-- Create the schemas
CREATE SCHEMA uk;
CREATE SCHEMA de;

-- Each gets identical tables
CREATE TABLE uk.orders (id BIGINT PRIMARY KEY, customer_name TEXT, total NUMERIC);
CREATE TABLE de.orders (id BIGINT PRIMARY KEY, customer_name TEXT, total NUMERIC);

-- At runtime, your app sets the search path per request
BEGIN;
SET LOCAL search_path TO uk;
SELECT * FROM orders;  -- returns uk.orders data
COMMIT;

BEGIN;
SET LOCAL search_path TO de;
SELECT * FROM orders;  -- now returns de.orders data
COMMIT;

There are performance benefits to using a schema-per-tenant. With each table containing fewer rows, indexes are smaller and more likely to fit in the buffer cache. One tenant's update/delete churn will not increase another tenant's bloat or vacuum workload.
However, the operational overhead of maintaining a schema-per-tenant outweighs the performance benefits. Schema migrations become more complex because each one must be applied to every tenant's schema. Should you need to query across tenants, complex cross-schema queries will be required.
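Both costs can be sketched with the uk and de schemas from the example above. The created_at column is a hypothetical migration, and a real system would read the list of schemas from wherever tenants are registered rather than hard-coding it.

```sql
-- Apply the same change to every tenant schema, one ALTER per schema
DO $$
DECLARE
    tenant_schema text;
BEGIN
    FOREACH tenant_schema IN ARRAY ARRAY['uk', 'de']
    LOOP
        EXECUTE format(
            'ALTER TABLE %I.orders ADD COLUMN IF NOT EXISTS created_at TIMESTAMPTZ',
            tenant_schema
        );
    END LOOP;
END;
$$;

-- Querying across tenants means naming every schema explicitly
SELECT 'uk' AS tenant, id, customer_name, total FROM uk.orders
UNION ALL
SELECT 'de' AS tenant, id, customer_name, total FROM de.orders;
```

With two schemas this is manageable. With hundreds, both the migration loop and the UNION ALL grow with the tenant count.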
While this approach works, it likely won't scale beyond a few hundred tenants. Every table, index, constraint, and sequence across all schemas lives in shared system catalogs. With hundreds of schemas, each containing even a modest number of tables and their indexes, these catalogs grow into millions of rows. This slows the query planner as it consults the catalog on every query. Migrations slow down as the catalog size increases.
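To get a rough sense of how much each schema contributes to the shared catalogs, you can count the entries in pg_class per schema. This is a sketch: pg_class holds tables, indexes, sequences, views, and TOAST tables, so the count is higher than the number of tables alone.

```sql
-- Relations (tables, indexes, sequences, ...) per user schema
SELECT n.nspname AS schema_name,
       count(*)  AS relations
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND n.nspname NOT LIKE 'pg_toast%'
GROUP BY n.nspname
ORDER BY relations DESC;
```

Other catalogs, such as pg_attribute with one row per column, grow alongside this one.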
Safety concerns of SET search_path
Nothing at the database level prevents a query from accessing the wrong schema. Schema-per-tenant feels like greater separation of data, but on its own it does not meaningfully improve data isolation from a security perspective. For better security, you may also need to create a separate database user per tenant and set up precise schema-level permissions.
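A minimal sketch of such a per-tenant setup for the uk schema. The role name uk_app is hypothetical, and authentication (a password or another method) is left out.

```sql
-- A dedicated login role for the UK tenant
CREATE ROLE uk_app LOGIN;

-- Allow it to see and use objects in the uk schema only
GRANT USAGE ON SCHEMA uk TO uk_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA uk TO uk_app;

-- Default its search_path so unqualified table names resolve to uk
ALTER ROLE uk_app SET search_path = uk;
```

Because uk_app is granted nothing on the de schema, a mistaken search_path produces a permission error instead of another tenant's rows. The cost is one role, and one set of grants to maintain, per tenant.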
Tenant data lifecycle
Onboarding new tenants requires creating a new schema for the tenant and performing a migration.
Removing tenants from a schema-per-tenant configuration may be one of the few operational advantages of this approach to multi-tenancy, as it is a single, simple operation.
DROP SCHEMA uk CASCADE;

Database-per-tenant
Generally not recommended. Database-per-tenant has a few benefits but is at odds with the connection model of Postgres.
All data is stored in a single database cluster
Each tenant has its own logical database, schema, and tables
Each tenant's data is isolated by the logical database name
Within a PlanetScale Postgres database, you have the option to run CREATE DATABASE to create many logical databases within a single database cluster.
The appeal of using logical databases per tenant is increased isolation: you do not need to filter by a column or modify search_path; instead, you can modify the connection string to connect to the correct database. This makes working with the data and schema of an individual tenant much simpler.
-- Create separate databases
CREATE DATABASE uk_store;
CREATE DATABASE de_store;

-- Connect to the UK database and create tables there
\c uk_store
CREATE TABLE orders (id BIGINT PRIMARY KEY, customer_name TEXT, total NUMERIC);

-- Connect to the German database and do the same
\c de_store
CREATE TABLE orders (id BIGINT PRIMARY KEY, customer_name TEXT, total NUMERIC);

There are notable performance benefits to using a database-per-tenant. With each table in each database containing fewer rows, indexes are smaller and more likely to fit in the buffer cache. One tenant's update/delete churn will not increase another tenant's bloat or vacuum workload.
Database-per-tenant is better for performance than a schema-per-tenant, as each database contains its own catalog of tables, indexes, constraints, and sequences.
However, these performance benefits are still outweighed by the drawbacks of increased operational complexity. Critically, connection pooling becomes a problem immediately, as PgBouncer maintains a separate pool for each database and user pair, and the combined pools can quickly exceed your max_connections limit. Connection limits are the primary issue with database-per-tenant multi-tenancy.
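As a rough illustration with made-up numbers, 500 tenant databases with a pool of 10 server connections each could ask for up to 5,000 connections in total. To compare that kind of estimate with reality, you can check the limit and the current spread of connections across logical databases:

```sql
-- The hard ceiling on server connections
SHOW max_connections;

-- How many connections each logical database currently holds
SELECT datname,
       count(*) AS connections
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname
ORDER BY connections DESC;
```

With a shared-schema design there is a single logical database, so every tenant draws from one pool.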
Additionally, each CREATE DATABASE copies Postgres's template database, consuming roughly 8 MB. Unlike schema-per-tenant, where all schemas share a single set of system catalogs, every logical database carries its own, multiplying storage and catalog maintenance overhead with each new tenant.
While all the isolation and performance benefits of a database-per-tenant are compelling, it conflicts with Postgres's connection model.
Additionally, if you need to query across tenants, there is no native way to do so in Postgres, because a query can only see the logical database it is connected to. You would need to use an external data warehouse or a custom application layer to join the data together.
While this approach works, it likely won't scale beyond a few hundred tenants.
Security considerations
Of all the multi-tenancy approaches, the database-per-tenant approach is the most isolated from a security perspective. Each tenant has its own logical database, schema, and tables. Each tenant's data can be accessed only by a user with privileges to that database and schema.
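That isolation is not automatic, because by default any role may connect to a newly created database. A sketch of locking down the uk_store database from the example above, using a hypothetical role named uk_store_app:

```sql
-- A dedicated login role for this tenant (authentication setup omitted)
CREATE ROLE uk_store_app LOGIN;

-- By default PUBLIC can connect to a new database, so remove that
REVOKE CONNECT ON DATABASE uk_store FROM PUBLIC;

-- Only the tenant's own role may connect
GRANT CONNECT ON DATABASE uk_store TO uk_store_app;
```

Superusers and the database owner can still connect regardless of these grants.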
Even so, the limitations on connectivity and the operational complexity of this model make it difficult to recommend.
Tenant data lifecycle
Every new tenant requires a new logical database to be created and a migration to set up its tables.
Removing tenants from a database-per-tenant configuration may be one of the few operational advantages of this approach to multi-tenancy, as it is a single, simple operation with no side effects.
DROP DATABASE uk_store;

Protecting tenants from each other
In all three approaches to multi-tenancy, tenants must be protected from one another, both in terms of data access and resource contention.
Our recommended approach, shared-schema, is the most exposed because tables and indexes are shared. Care must be taken here to keep things safely isolated. Schema- and database-per-tenant approaches are more isolated at the relation level, but all three compete for CPU, memory, disk I/O, and connections.
One tenant running an expensive query degrades performance for all other tenants, commonly referred to as a "noisy neighbor" problem. Within your database, you can add some protection by setting statement_timeout and idle_in_transaction_session_timeout appropriately. Your application should also enforce per-tenant rate limits, so that one tenant cannot disrupt another tenant's experience.
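A sketch of those two settings applied to the app_user role created earlier in this article. The values are placeholders to show the syntax, not recommendations, and the right limits depend on your workload.

```sql
-- Cancel any statement from app_user that runs longer than 5 seconds
ALTER ROLE app_user SET statement_timeout = '5s';

-- Terminate sessions that sit idle inside an open transaction for 30 seconds
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '30s';

-- A tighter limit can also be applied to a single transaction
BEGIN;
SET LOCAL statement_timeout = '500ms';
SELECT * FROM orders WHERE tenant_id = 1;
COMMIT;
```

Role-level settings apply to sessions started after the change, so existing connections keep their old values until they reconnect.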
PlanetScale Query Insights can help you identify and troubleshoot performance issues within your database, which you can debug manually or with an Agent using the PlanetScale MCP server.
Migrating to shared-schema multi-tenancy
Should your application already be configured for schema, database, or some other kind of multi-tenancy, you may be able to migrate to shared-schema multi-tenancy by adding a tenant_id column to your tables and updating your application to filter by this column.
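For a schema-per-tenant source, the data copy might look like the sketch below. It assumes the shared orders table and the tenants table from earlier in this article both live in the public schema, and that tenants already has a row with code 'uk'. Those assumptions are for illustration only.

```sql
-- Copy one tenant's rows into the shared table, tagging them with its tenant ID.
-- The shared table generates new id values, so the original IDs are not preserved.
INSERT INTO public.orders (tenant_id, customer_name, total)
SELECT t.id, o.customer_name, o.total
FROM uk.orders AS o
CROSS JOIN public.tenants AS t
WHERE t.code = 'uk';
```

If other tables reference the old order IDs, a real migration would also need to carry an old-to-new ID mapping, which this sketch leaves out.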
If you are not yet on PlanetScale, we have successfully migrated large, multi-tenant workloads that were experiencing operational, performance, or scaling issues. We offer hands-on assistance on a case-by-case basis.
Reach out to discuss your current situation.
Other examples for multi-tenancy
For needs that are less than mission critical, such as internal applications and side projects, you may diverge from the recommendations in this post. For example, you might like to run distinct applications from a single database cluster, as it seems cheaper or operationally advantageous.
If your multiple "tenants" are actually different applications with unique data structures running from a single database, we simply ask you to exercise caution.
If you can't behave, be careful.]]></content>
        <summary><![CDATA[There are many ways to slice a Postgres database for multi-tenant applications. Let's look at the three most common approaches and the trade-offs.]]></summary>
      </entry>
    
      <entry>
        <title>Keeping a Postgres queue healthy</title>
        <link href="https://planetscale.com/blog/keeping-a-postgres-queue-healthy" />
        <id>https://planetscale.com/blog/keeping-a-postgres-queue-healthy</id>
        <published>2026-04-10T00:00:00.000Z</published>
        <updated>2026-04-10T00:00:00.000Z</updated>
        
        <author>
          <name>Simeon Griggs</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[A healthy digestive system is one that efficiently eliminates waste. Fiber is a key part of a healthy diet, not because it is nutritious, but because it keeps everything you consume moving.
Databases are not so different. If you want a healthy queue table, you'll need to monitor the systems that are designed to perform cleanup well before they're backed up.
Postgres was a popular choice for queue-based workloads long before it was a good fit for the job. Over many years and multiple major versions, it has become an increasingly strong choice for this type of workload.
But what makes job queues uniquely problematic? And in spite of all these advancements, what traps remain?
It's worth knowing, since they could bring down not only your job queue but also your mixed-workloads database and your entire application.
Sharing the load
The "just use Postgres" meme lends credence to the notion that every workload belongs in a Postgres database. It's not the worst idea. You really can throw just about anything at a Postgres database and make it stick. The rich extensions ecosystem fills any functionality gaps in "vanilla" Postgres.
As a result, you may have multiple distinct workload types running in the same database at the same time. Your OLTP, OLAP, Time Series, Event Sourcing, Full Text, Geospatial, and/or Queue workloads may all be running at the same time in the same database cluster with different needs, challenges, and priorities—while competing for the same resources.
There are dedicated services for each of these workload types that you can use in isolation. If you're reading this blog post, however, you're likely looking to optimize how they can all work in harmony.
At PlanetScale, we're always in favor of choosing the right tool for each job—Postgres or otherwise. But if you're curious about maintaining healthy queues alongside mixed workloads in Postgres, keep reading.
The queue workload
What makes a queue table unique is that most rows are transient: inserted, read once, and deleted. The table's size stays roughly constant while its cumulative throughput is enormous.
Your application may use a job queue to track asynchronous actions like sending an email, creating an invoice, or generating a report. The major benefit of doing this in Postgres is that you can keep the job state and any other logic running in your database in sync with the transaction.
If the job fails, the entire transaction fails and rolls back. If the transaction fails, the job may retry or get deleted. Using an external vendor requires careful coordination to keep in sync with your application's transactional state.
Our example for today
Consider this simple queue table one might use to create individual jobs that need doing. The payload column contains all the information your application needs to complete the operation.
CREATE TABLE jobs (
  id BIGSERIAL PRIMARY KEY,
  run_at TIMESTAMPTZ DEFAULT now(),
  status TEXT DEFAULT 'pending',
  payload JSONB
);

CREATE INDEX idx_jobs_fetch ON jobs (run_at) WHERE status = 'pending';

As your application regularly performs queries to check for jobs to be done, it searches for the oldest job that is still in a pending state, performs whatever work is necessary, and then deletes that job.
The worker opens a transaction and claims the next pending job:
BEGIN;

SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY run_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

In practice, keeping this transaction as short as possible is critical — the longer it stays open, the longer it holds back vacuum. The examples in this post assume sub-millisecond worker operations.
The worker performs whatever work the job requires. If the work fails, the transaction rolls back — the row was never modified, the lock is released, and the job becomes visible to other workers again.
If the work succeeds, the worker deletes the job and commits:
DELETE FROM jobs WHERE id = $1;
COMMIT;

For concurrency and faster job processing, you may want multiple workers executing individual jobs simultaneously. With the example query above, each worker is protected against performing duplicate work by the line FOR UPDATE SKIP LOCKED as the same query will "lock" the row it is working with until the transaction is committed.
As we can see, the nature of a job queue workload is quite simple. A row is fetched and then deleted. Beneath the surface, however, there's more to it than that. There's cleanup to be done.
The common issue that degrades job queues and the databases they operate in arises when the database cannot clean up after these transactions faster than new work accumulates.
Performance alone is not the problem
Postgres is documented by others to handle this workload at massive scale. So Postgres' capability to support job queues is not in question.
Keeping your job queue in harmony with the other competing workloads of your database is typically the challenge.
The health of your queue table depends not only on its own configuration, but also on the behavior of every other transaction running on the same Postgres instance. While replicas and replication slots can also work against queue tables, this post focuses on competing query traffic on the primary.
Cleaning up dead tuples is the problem
When rows mutate, Postgres can maintain multiple versions of the same row, so that different transactions can see row values as of the time they were queried. This is Postgres' implementation of "Multi-Version Concurrency Control" (MVCC) and a core principle of its design.
This means in our job queue a row in a Postgres database targeted by a DELETE operation is not immediately removed. Instead, it is marked for deletion, made invisible to new transactions, and remains in the database until cleaned up. These not-yet-deleted, invisible rows are referred to as "dead tuples."
Dead tuples are cleaned up by a "vacuum" operation, which can be run manually and, in a healthy Postgres database, also runs regularly through autovacuum. While dead tuples are not returned in a SELECT query, they still incur a cost.
For a sequential scan, the executor reads dead tuples from heap pages and checks their visibility before discarding them.
For an index scan — the kind our job queue relies on with ORDER BY run_at LIMIT 1 — the cost is more insidious: the B-tree index itself accumulates references to dead tuples, forcing the scan to traverse entries that point to rows no longer visible.
Each dead index entry means additional I/O to check a heap page only to discard it. This overhead is invisible to the application but can grow substantially with the number of dead tuples.
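One way to make this overhead visible is to ask Postgres how many buffers the fetch query touched. The sketch below runs the job-fetch query from earlier without FOR UPDATE SKIP LOCKED, so that inspecting it does not take a row lock. No output is shown because it depends on the state of your table.

```sql
-- ANALYZE actually executes the query.
-- BUFFERS reports how many pages were read to produce the result.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY run_at
LIMIT 1;
```

For a LIMIT 1 lookup through a small index, the buffer counts should be low. Numbers that climb over time while the count of pending jobs stays flat suggest the scan is stepping over dead entries.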
As for how frequently cleanup is attempted, autovacuum_naptime controls how long the launcher sleeps between checks of each database for tables that need vacuuming (1 minute by default). Whether a table is vacuumed on a given check depends on the dead tuple thresholds autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor.
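Autovacuum considers a table due for vacuuming once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × (estimated row count). The sketch below computes that trigger point for the jobs table from the server-wide settings and compares it with the current dead tuple count. It ignores any per-table overrides, and reltuples is only an estimate that can be -1 on a table that has never been analyzed.

```sql
SELECT s.relname,
       s.n_live_tup,
       s.n_dead_tup,
       s.last_autovacuum,
       -- threshold + scale_factor * estimated rows
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
         AS approx_vacuum_trigger
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
WHERE s.relname = 'jobs';
```

When n_dead_tup is well above approx_vacuum_trigger and last_autovacuum is recent, vacuum is running but cannot remove the tuples.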
Dead tuples under the hood
Let's envision the scenario where we have a jobs table in which tasks of different types are regularly created and processed. Another application accesses the same database to perform large analytical queries and generate reports. These are lower-priority and slower to complete.
Say you perform a query on the jobs queue table:
SELECT * FROM jobs WHERE status = 'pending';

The response you expect shows three pending jobs:
-- What you see
 id |         run_at          | status  |            payload
----+-------------------------+---------+-------------------------------
 42 | 2026-04-07 09:01:12 UTC | pending | {"type": "email", "to": "..."}
 43 | 2026-04-07 09:01:14 UTC | pending | {"type": "invoice", "id": 781}
 44 | 2026-04-07 09:01:15 UTC | pending | {"type": "report", "id": 332}
(3 rows)

Within each row is metadata that the query executor reads to determine whether it should be included in the response or is invisible to the current transaction. While you can't query for dead tuples, you can include this metadata in the response of any live tuples.
-- This transaction is given an ID (XID) by the database
SELECT ctid, xmin, xmax, id, status FROM jobs WHERE status = 'pending';

 ctid  |  xmin  | xmax | id | status
-------+--------+------+----+---------
 (0,7) | 439821 |    0 | 42 | pending
 (0,8) | 439825 |    0 | 43 | pending
 (0,9) | 439830 |    0 | 44 | pending
(3 rows)

ctid — the physical location of the tuple on disk, expressed as (page, offset) within the table's heap.
xmin — the transaction ID (XID) that inserted this row; readers use it to decide whether the row existed when their transaction began.
xmax — the XID that deleted or locked this row; a value of 0 means no transaction has marked it for deletion yet.
There may also be dead tuples, previously deleted rows that haven't been physically removed yet, which the Postgres executor still has to scan on its way to returning a response. While you only saw three live rows, the executor scanned through many more:
-- Conceptual view: what the executor scans (not real query output)
 ctid  |  xmin  |  xmax  | id | status  |
-------+--------+--------+----+---------+
 (0,1) | 439790 | 439792 | 36 | pending | -- dead: xmax set, deleted by transaction 439792
 (0,2) | 439795 | 439797 | 37 | pending | -- dead
 (0,3) | 439800 | 439803 | 38 | pending | -- dead
 (0,4) | 439804 | 439806 | 39 | pending | -- dead
 (0,5) | 439808 | 439812 | 40 | pending | -- dead
 (0,6) | 439814 | 439818 | 41 | pending | -- dead
 (0,7) | 439821 |      0 | 42 | pending | -- live: xmax is 0, not deleted
 (0,8) | 439825 |      0 | 43 | pending | -- live
 (0,9) | 439830 |      0 | 44 | pending | -- live
(6 dead tuples + 3 live rows scanned, 3 rows returned)
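Ordinary queries cannot return dead tuples, but on a test instance where the pageinspect extension is available you can look at a raw heap page and see them directly. This is a sketch for exploration only: it needs elevated privileges, the extension may not be offered on managed services, and it raises an error if the jobs table has no pages yet.

```sql
-- Requires superuser (or equivalent) privileges
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Line pointers and tuple headers on page 0 of the jobs heap,
-- including tuples that normal queries can no longer see
SELECT lp, lp_flags, t_xmin, t_xmax, t_ctid
FROM heap_page_items(get_raw_page('jobs', 0));
```

Rows with a non-zero t_xmax that do not appear in a normal SELECT correspond to the dead entries in the conceptual view above.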

That story is not limited to the heap. Any index on the table keeps leaf entries in sorted order and each entry references a ctid on the heap. A scan in index order follows those pointers and checks the heap. There is wasted work in that scan when any leaf entry still exists when its heap tuple is dead. Conceptually (a worst case when cleanup has not yet removed those pointers):
-- Conceptual view: index leaf entries visited in run_at order (not real query output)
  run_at (pending)    | tid   | after heap lookup
----------------------+-------+--------------------
 2026-04-07 08:59:01  | (0,1) | dead — discarded
 2026-04-07 08:59:03  | (0,2) | dead
 2026-04-07 08:59:05  | (0,3) | dead
 2026-04-07 08:59:07  | (0,4) | dead
 2026-04-07 08:59:09  | (0,5) | dead
 2026-04-07 08:59:11  | (0,6) | dead
 2026-04-07 09:01:12  | (0,7) | live
 2026-04-07 09:01:14  | (0,8) | live
 2026-04-07 09:01:15  | (0,9) | live
(6 dead index targets + 3 live rows reachable, same shape as the heap walk above)

At our imaginary scale, three jobs and six dead tuples are no problem.
However, a database is destined to fail if it cannot reclaim dead tuples faster than its workload creates them. A well-tuned and provisioned Postgres cluster can handle job queue throughput of tens of thousands of jobs per second. So what causes table bloat?
Typically, this happens when high write churn — the rapid cycle of inserting, updating, and deleting rows — outpaces autovacuum. But autovacuum falling behind isn't just a question of throughput. Even when autovacuum runs frequently enough, it cannot remove dead tuples that might still be visible to an active transaction.
When autovacuum sucks
There are a few common situations that make autovacuum ineffective at cleaning up dead tuples.
Certain table locks can prevent cleanup, and improper autovacuum configuration can contribute to a suboptimal cleanup rate for dead tuples.
In a healthy database, autovacuum will run regularly and clean up dead tuples as soon as they become reclaimable.
Most commonly, however, cleanup is blocked when active transactions prevent dead tuples from becoming reclaimable. Postgres will not vacuum away any dead tuple that might still be visible to an active transaction. The oldest such transaction sets the cutoff—referred to as the "MVCC horizon." Until that transaction completes, every dead tuple newer than its snapshot is retained.
A single transaction that takes 2 minutes to complete pins the horizon for the full 2 minutes.
Another type of workload that produces the same failure mode is multiple overlapping queries, none individually long-running, that keep the horizon pinned continuously.
For example, imagine three analytics queries, each running for 40 seconds, staggered 20 seconds apart. No individual query would trigger a timeout for running too long. But because one is always active, the horizon never advances, and the effect on vacuum is the same as one transaction that never ends.
This is unlikely if the only workload your database has is your job queue. But you're following the "just use Postgres" philosophy. So, you have many overlapping workloads, each with its own priorities, that rely on staying out of each other's way. Your problem is not that Postgres is a bad fit for a job queue or that it can't complete jobs fast enough—it's that these fast jobs and the dead tuples they rapidly accumulate aren't being cleaned up fast enough because of other concurrently running, overlapping slower queries.
Tools at our disposal
Over several years and major versions of Postgres, new tools have been added to simplify the maintenance of queue performance.
As mentioned, you may attempt to tune Postgres' autovacuum settings, such as autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit, to improve the frequency and effectiveness of the operation. But in our imagined scenario, it's not the job queue's throughput we wish to fix; it's how other workloads negatively affect it.
To stop queries and transactions from running too long, there are several timeout configuration options:
statement_timeout, introduced in Postgres 7.3, kills any individual SQL statement that exceeds the specified duration.
idle_in_transaction_session_timeout, from Postgres 9.6, terminates sessions that have been idle inside an open transaction for longer than the specified duration.
transaction_timeout, from Postgres 17.0, kills any active or inactive transaction that exceeds the specified duration.
However, none of these solves our specific problem. They're blunt instruments that only target the execution time of a single query and cannot limit concurrency or execution cost. We need to prevent any workload that keeps the MVCC horizon continuously pinned.
What's needed is a tool that can distinguish among different "classes" of traffic, leave high-priority workloads unaffected, and throttle the rate at which lower-priority workloads consume resources.
Enter Database Traffic Control™
Traffic Control is part of the Insights extension, developed by PlanetScale and exclusively available for PlanetScale Postgres. Perfect for when you need fine-grained control over how individual queries perform and how many resources they can consume.
Queries targeted by a Resource Budget in Traffic Control are assigned a limited set of resources; once that limit is exceeded, they can be blocked.
The solution to our imagined scenario is to limit how often overlapping slower queries can run and how many can run at once. Timeouts are a blunt instrument that can't give us that kind of granular control. With those queries capped, we can be assured autovacuum will be more likely to clean up dead tuples at an acceptable rate.
Since the solution to our problem involves terminating certain queries, it is critical that our application includes retry logic. It wouldn't be fair to say the database runs better if it does less work. We're trying to smooth out the rate at which work is performed while still doing the same amount.
In our application, blocked queries aren't rejected forever; they are retried at a more appropriate time.
Building a demo
The inspiration for this post came from an internal discussion on the wisdom of putting job queues in your Postgres database, in which the following blog post was shared.
In 2015, Brandur Leach published Postgres Job Queues & Failure By MVCC, documenting a catastrophic failure mode in Postgres-backed job queues. That blog post also includes a test bench to demonstrate how an unclosed transaction can pin the MVCC horizon and prevent cleanup.
Fortunately, the original test bench is still available at brandur/que-degradation-test, and so to test everything we've learned so far, we can use it as inspiration to validate our solution.
Recreating the problem
A lot has changed since 2015. My intention was to recreate the same application workloads with Postgres 18 and see if I could reproduce the same problem.
The original test bench requires Ruby and the Que gem (v0.x), and it was tested on Postgres 9.4. Running it as-is would test a decade-old library on modern Postgres, not the pattern on modern Postgres. To isolate the SQL-level behavior in a codebase I could understand, I rewrote the test in TypeScript, running on Bun.
In short, we kept the same recursive CTE pattern as Que, with the same schema, producer rate, work duration, worker count, and long-runner pattern, all running on a PlanetScale PS-5 cluster (which starts at $5/month).
The outcome was visible, but manageable, degradation. While the original test put the database into a death spiral within 15 minutes, my PS-5 kept the worker queue near zero for the same duration. However, there was still notable, linear growth in dead tuples, suggesting that on a longer timescale, the same problem would have been encountered. So while the original problem is mitigated in newer versions of Postgres (thanks in part to B-tree index cleanup—bottom-up deletion for version churn, scan-driven removal of dead index tuples, and related behavior), it has not been eliminated.
Attempts to fix it
Next, I wondered whether techniques that have become available since then could solve the original problem. Two specific improvements are available to us in 2026 that weren't there in 2015.
FOR UPDATE SKIP LOCKED replaces the recursive CTE entirely with a single SELECT that skips rows locked by other workers.
Batch processing (10 jobs per transaction): one lock acquisition covers 10 jobs instead of 1, amortizing the index scan cost.
We kept everything else identical: 8 workers, 50 jobs/sec producer, 10ms work, long-runner starts after 45s. The results looked like this:
 Metric                  | original (recursive CTE)    | enhanced (SKIP LOCKED + batch)
-------------------------+-----------------------------+--------------------------------
 Baseline lock time      | 2-3ms                       | 1.3-3.0ms
 End lock time (typical) | 10-34ms                     | 9-29ms
 Worst spike             | 84.5ms (at 33k dead tuples) | 180ms (at 24k dead tuples)
 Queue depth             | 0-100 (oscillating)         | 0 (mostly)
 Dead tuples at end      | 42,400                      | 42,450
 Throughput              | ~89/s                       | ~50/s
The degradation curves are almost identical. These updates did not affect MVCC degradation, as both approaches scan the same B-tree index and encounter the same dead tuples.
The one large difference is throughput, but this reflects the test's design, not the lock strategy. At 50 jobs/sec production, the CTE workers each grab jobs independently and outpace the producer, while the batched workers drain the queue and spend time in backoff sleep. Neither version was under real pressure.
In summary, a Postgres-backed queue designed a decade ago that could kill a database in 15 minutes can now survive longer, but the original problem remains. Modern Postgres has lifted the floor but not removed the ceiling. If instead of running 50 jobs/sec, we ran 500 jobs/sec, the same problem occurs faster, performance degrades, and your application suffers.
Fixing with Traffic Control
Resource Budgets in Traffic Control give us a few levers to govern how many resources a targeted query has access to:
Server share and burst limit: A percentage of server resources and how quickly they can be consumed.
Per-query limit: The time a query can run, measured in seconds of full server usage.
Maximum concurrent workers: A percentage of available worker processes.
Resource Budgets are configured to use one or more of these limits to prevent specific workloads from consuming resources, which would otherwise negatively affect other workloads.
Queries are targeted most commonly by metadata included in an SQLCommenter tag appended to the query. For our example, the analytics queries had action=analytics set.
Since idle_in_transaction_session_timeout can catch and kill the "long-runner" idle transaction from the original benchmark, I switched the degradation trigger to the more realistic production scenario: multiple overlapping analytics queries that hold transactions open through active work — the kind you can't just kill with a session timeout.
To demonstrate Traffic Control's effectiveness at curbing this degradation, I throttled the Maximum concurrent workers of all action=analytics queries to 1 worker (25% of max_worker_processes) with the intention of only ever allowing a single analytics query to run at a time.
To stress the system enough to produce a death spiral within our 15-minute test window, I increased production to 800 jobs/sec.
I ran the "enhanced" workload twice on the same EC2 instance against the same PlanetScale database:
800 jobs/sec
3 concurrent analytics workers running 120-second queries, staggered so they overlap continuously
15-minute duration
The results demonstrated the ability to solve the core cleanup problem.
 Metric               | Traffic Control disabled        | Traffic Control enabled
----------------------+---------------------------------+----------------------------------
 Queue backlog        | 155,000 jobs                    | 0 jobs
 Lock time            | 300ms+                          | 2ms
 Dead tuples at end   | 383,000                         | 0–23,000 (cycling)
 Analytics queries    | 3 concurrent, overlapping       | 1 at a time, 2 retrying
 VACUUM effectiveness | Blocked (horizon always pinned) | Normal (windows between queries)
 Outcome              | Death spiral                    | Completely stable
Traffic Control was able to target specific workloads and limit their concurrency — something not possible with autovacuum configuration tuning or timeouts. The analytics reports are still executed as capacity allows, with 15 completed over the 15-minute window. It takes longer to complete more analytics queries, but the queue remains healthy throughout.
Summary
The MVCC dead tuple problem in Postgres-backed queues is not a relic of 2015. Modern Postgres has raised the threshold — B-tree improvements and SKIP LOCKED buy significant headroom — but the underlying mechanism is unchanged. Dead tuples accumulate when VACUUM cannot clean them, and VACUUM cannot clean them when long-running or overlapping transactions pin the MVCC horizon.
In a "just use Postgres" world where queues, analytics, and application logic share a single database, this is not a theoretical risk. It is the normal operating condition. The dangerous version isn't a dramatic crash — it's a quietly degraded equilibrium where lock times creep up, jobs slow down, and no alert fires.
Postgres provides timeout-based tools, but they can't distinguish between workload classes or limit concurrency. If you run queues alongside other workloads, the most impactful thing you can do is ensure VACUUM can keep up. Traffic Control makes that simple.]]></content>
        <summary><![CDATA[Dead tuples from high-churn job queues can silently degrade your Postgres database when vacuum falls behind—especially alongside competing workloads. Traffic Control keeps cleanup on track.]]></summary>
      </entry>
    
      <entry>
        <title>Patterns for Postgres Traffic Control</title>
        <link href="https://planetscale.com/blog/patterns-for-postgres-traffic-control" />
        <id>https://planetscale.com/blog/patterns-for-postgres-traffic-control</id>
        <published>2026-04-02T00:00:00.000Z</published>
        <updated>2026-04-02T00:00:00.000Z</updated>
        
        <author>
          <name>Josh Brown</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Last week we introduced Database Traffic Control™. Traffic Control lets you attach resource budgets to slices of your Postgres traffic, like keeping your checkout flow running while a runaway analytics query gets shed instead. We have already discussed some scenarios where you should use Traffic Control, along with how to define resource limits, so now let's dig into what Traffic Control looks like in your codebase.

This post walks through some practical patterns in Go. Each pattern targets a different failure mode, architecture, or foot gun. Most of them layer on top of one another too, so you can adopt them individually or combine them for extra peace of mind. Keep in mind the general concepts here are applicable to whatever language your application is written in.
Setup
Most of the patterns here rely on custom tags attached to your queries. Traffic Control reads these using the SQLCommenter format: a SQL comment appended to each query with URL-encoded key=value pairs.
SELECT * FROM orders
  WHERE user_id = $1
 /*route='checkout',feature='new_order_flow'*/;

These tags are then available for new Traffic Control rules.

Here's a minimal Go helper that appends tags in this format:
import (
    "fmt"
    "net/url"
    "sort"
    "strings"
)

// appendTags appends SQLCommenter-format tags to a SQL query.
func appendTags(query string, tags map[string]string) string {
    if len(tags) == 0 {
        return query
    }
    parts := make([]string, 0, len(tags))
    for k, v := range tags {
        parts = append(parts, fmt.Sprintf("%s='%s'", k, url.QueryEscape(v)))
    }
    sort.Strings(parts) // deterministic order
    return query + " /*" + strings.Join(parts, ",") + "*/"
}

You'll also want a way to thread tags through your call stack without touching every function signature. A context key works well for this:
import "context"

type contextKey string

const sqlTagsKey contextKey = "sql_tags"

func tagsFromContext(ctx context.Context) map[string]string {
    if tags, ok := ctx.Value(sqlTagsKey).(map[string]string); ok {
        // return a copy so callers can't mutate shared state
        out := make(map[string]string, len(tags))
        for k, v := range tags {
            out[k] = v
        }
        return out
    }
    return make(map[string]string)
}

func contextWithTags(ctx context.Context, tags map[string]string) context.Context {
    return context.WithValue(ctx, sqlTagsKey, tags)
}

With these two helpers in place, the patterns below mostly just set keys and values in context. Tagging happens automatically when the query executes.
Per-service isolation via Roles
In a microservice architecture, a single misbehaving service should not be able to degrade every other service sharing the same database. The simplest way to isolate a service is to create a Traffic Control rule based on a unique connection string for the given service, or via application name.
A budget on username='pscale_api_123abc' will isolate all traffic from that role. This also helps in incident response: you can immediately cap a service's resource share without redeploying anything.
Note that the username is the internal Postgres username of the role, not the dashboard role name. You can also target custom roles created by CREATE ROLE if your microservices have strict security over table permissions.
You can also use the application_name by appending it to your connection strings such as postgresql://other@localhost/otherdb?application_name=myapp.
Route-level tagging in an HTTP service
When you're running a monolith or a large API service, the problem isn't usually the whole service; it's specific routes. The /api/export endpoint that generates CSV reports should not be able to kill the /api/checkout flow.
An HTTP middleware can inject the route into context at runtime before any handler runs:
// Any route using SQLTagMiddleware will have the pattern injected into its context
// dynamically at runtime
func SQLTagMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        tags := tagsFromContext(r.Context())
        route := strings.ReplaceAll(strings.ReplaceAll(r.Pattern, "{", ":"), "}", "") // Rewrites "{id}" as ":id" so the tag contains no braces
        tags["route"] = route
        tags["app"] = "web"
        ctx := contextWithTags(r.Context(), tags)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}

Wrap your database calls to pick up the tags automatically:
// QueryContext for SELECT statements
func (db *DB) QueryContext(ctx context.Context, query string, args ...any) (*sql.Rows, error) {
    return db.sql.QueryContext(ctx, appendTags(query, tagsFromContext(ctx)), args...)
}

// ExecContext for INSERT/UPDATE statements
func (db *DB) ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error) {
    return db.sql.ExecContext(ctx, appendTags(query, tagsFromContext(ctx)), args...)
}

Now every query carries the route it came from. You can create a Traffic Control budget targeting route='/api-export' and give it a conservative CPU limit.
This also makes it easy to set up broad budgets during incidents. If you suddenly see a spike and don't know which route is responsible, the violation graph in Traffic Control will show you exactly which route tag is hitting limits.
Feature flags and new deployments
Shipping a new feature to production always carries risk. Maybe the new query pattern is fine under your test load but becomes expensive at scale. Traffic Control gives you a way to cap the blast radius before it becomes an incident.
The simplest version sets a tag from an environment variable at startup:
var deploymentTag = os.Getenv("DEPLOYMENT_TAG") // e.g. "new_checkout_v2" or git sha "96e350426"

func tagWithDeployment(ctx context.Context) context.Context {
    if deploymentTag == "" {
        return ctx
    }
    tags := tagsFromContext(ctx)
    tags["feature"] = deploymentTag
    return contextWithTags(ctx, tags)
}

Set DEPLOYMENT_TAG=new_checkout_v2 when rolling out new pods and leave it unset on the old pods. Traffic Control can then have a budget on feature='new_checkout_v2' in Warn mode from day one, so you see exactly how the new code behaves before it causes problems. When you're confident, either remove the budget or switch it to Enforce as a safety net.
For feature flags controlled at runtime, the same approach works but driven by your flag evaluation:
func (h *OrderHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    if flags.Enabled(ctx, "new_order_flow") {
        tags := tagsFromContext(ctx)
        tags["feature"] = "new_order_flow"
        ctx = contextWithTags(ctx, tags)
    }
    h.processOrder(ctx, w, r)
}

Tier-based limits in multi-tenant apps
In a SaaS application, free-tier users should not be able to degrade the experience for enterprise customers. Traffic Control lets you enforce this at the database level rather than just at the application layer.
Inject the user's subscription tier into the SQL tags early in your request handling, ideally right after you've resolved the authenticated user:
type Tier string

const (
    // Values are lowercase so the emitted tag matches budgets such as tier='free'.
    TierFree       Tier = "free"
    TierPro        Tier = "pro"
    TierEnterprise Tier = "enterprise"
)

func WithUserTier(ctx context.Context, tier Tier) context.Context {
    tags := tagsFromContext(ctx)
    tags["tier"] = string(tier)
    return contextWithTags(ctx, tags)
}

In your authentication middleware:
func AuthMiddleware(users *UserService, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        user, err := users.Authenticate(r)
        if err != nil {
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        ctx := WithUserTier(r.Context(), user.Tier)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}

With this in place, create two Traffic Control budgets:
tier='free' — conservative limits on server share and max concurrent queries
tier='pro' — moderate limits
Leave enterprise traffic unbudgeted or give it a high budget as a ceiling. When a free-tier user runs an expensive dashboard or triggers a slow query, the budget sheds that traffic before it touches enterprise workloads.
You can combine this with the route tag from Pattern 2. A budget matching tier='free' AND route='api-export' can be stricter than a budget on tier='free' alone. Enterprise export requests get more headroom than free-tier export requests.
Background jobs and scripts
Background jobs are a common cause of database incidents. A migration script, a nightly sync, or a one-off data backfill can all accidentally saturate your database if they run faster than expected. Traffic Control is a clean way to give these jobs a resource ceiling without having to tune query-level timeouts throughout your codebase.
For long-running background workers, use a dedicated connection pool with a distinct application_name:
func newJobDB(dsn string) (*sql.DB, error) {
    jobDSN, err := url.Parse(dsn) // your connection string
    if err != nil {
        return nil, err
    }
    q := jobDSN.Query()
    // This sets the application name in code instead of in the connection string env variable.
    q.Set("application_name", "background-jobs")
    jobDSN.RawQuery = q.Encode()

    db, err := sql.Open("pgx", jobDSN.String()) // connects to Postgres
    if err != nil {
        return nil, err
    }
    db.SetMaxOpenConns(4) // Jobs don't need high concurrency
    return db, nil
}

newJobDB takes the DSN of your database and sets application_name to background-jobs before connecting. Once connected we set the max connections to 4 to make sure our background job isn't taking up more workers than it should, and finally we return it so that the calling function can now query the database.
Setting application_name on the connection string level in code ensures that it is always set for this service, no matter the query or connection string given. You can pair this with SQL comments as described above for even more fine-grained control and insights into your queries.
For one-off scripts and migrations we can do something similar. Here we encode the script's identity directly in the connection string so it shows up clearly in Traffic Control and Insights:
// Returns a database instance with the `application_name` set to `script-[scriptName]`
// for use in scripts
func scriptDB(dsn, scriptName string) (*sql.DB, error) {
    u, err := url.Parse(dsn)
    if err != nil {
        return nil, err // avoid a nil-pointer dereference on a malformed DSN
    }
    q := u.Query()
    q.Set("application_name", "script-"+scriptName) // e.g. "script-backfill-order-totals"
    u.RawQuery = q.Encode()
    return sql.Open("pgx", u.String())
}

Create a Traffic Control budget for application_name='background-jobs' in Warn mode before you run this job next. Observe how much of the database's resources your background work typically consumes. Then switch to Enforce to cap it at a level where it can't crowd out interactive traffic even if a job goes sideways.
Handling blocked queries
When Traffic Control is in Enforce mode and a query exceeds its budget, Postgres returns SQLSTATE 53000 with an error message prefixed with [PGINSIGHTS] Traffic Control:. Your application needs to handle this without crashing.
With pgx/v5:
import (
    "errors"
    "github.com/jackc/pgx/v5/pgconn"
)

const sqlstateTrafficControl = "53000"

func isTrafficControlError(err error) bool {
    var pgErr *pgconn.PgError
    return errors.As(err, &pgErr) && pgErr.Code == sqlstateTrafficControl
}

The right response depends on the query's role in your application:
func (s *OrderService) GetUserOrders(ctx context.Context, userID int64) ([]Order, error) {
    rows, err := s.db.QueryContext(ctx, `SELECT id, total FROM orders WHERE user_id = $1`, userID)
    if err != nil {
        if isTrafficControlError(err) {
            // Return a degraded response rather than a 500
            return nil, ErrServiceUnavailable
        }
        return nil, err
    }
    defer rows.Close()
    // ...
}

For non-critical workloads like analytics or reporting, returning a 503 Service Unavailable or a cached result is most likely the right behavior. That's exactly the controlled failure mode Traffic Control is designed to create. For more critical paths, you may want a short retry with backoff:
func queryWithBackoff(ctx context.Context, db *sql.DB, query string, args ...any) (*sql.Rows, error) {
    const maxRetries = 3
    backoff := 100 * time.Millisecond

    for attempt := range maxRetries {
        rows, err := db.QueryContext(ctx, query, args...)
        if err == nil {
            return rows, nil
        }
        if !isTrafficControlError(err) || attempt == maxRetries-1 {
            return nil, err
        }
        select {
        case <-time.After(backoff):
            backoff *= 2
        case <-ctx.Done():
            return nil, ctx.Err()
        }
    }
    return nil, errors.New("overloaded")
}

Observing warn-mode notices
Before switching a budget to Enforce, you'll run it in Warn mode. In Warn mode, queries succeed but the driver receives a Postgres notice containing [PGINSIGHTS] Traffic Control:. With pgx/v5 you can log these notices to build an accurate picture of what would be blocked:
import (
    "log"
    "strings"

    "github.com/jackc/pgx/v5"
    "github.com/jackc/pgx/v5/pgconn"
)

config, err := pgx.ParseConfig(dsn)
if err != nil {
    log.Fatal(err)
}

config.OnNotice = func(c *pgconn.PgConn, notice *pgconn.Notice) {
    if strings.Contains(notice.Message, "[PGINSIGHTS] Traffic Control:") {
        log.Printf("traffic control warning: %s", notice.Message)
        // Increment a metric, write to a structured log, etc.
    }
}

Collect these logs for a few hours of representative traffic before switching to Enforce. The pattern of which rules fire and how often tells you whether your limits need adjustment.
Putting it together
These patterns compose. A real application might layer several of them:
func (s *Server) setupMiddleware() http.Handler {
    mux := http.NewServeMux()
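    // Register routes, wrapping each handler in SQLTagMiddleware (Pattern 2: route tags).
    // r.Pattern is only populated once the mux has matched the request, so the
    // middleware has to run inside the mux rather than around it, e.g.:
    //   mux.Handle("/api/export", SQLTagMiddleware(exportHandler)) // exportHandler is hypothetical

    var handler http.Handler = mux
    handler = AuthMiddleware(s.users, handler) // Pattern 4: tier tags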
    return handler
}

// At startup, the job worker uses Pattern 5: Background jobs
jobDB, _ := newJobDB(dsn)

// New features use Pattern 3:
// DEPLOYMENT_TAG=new_checkout_v2 set in the deployment manifest

Traffic Control sees all of this as a combination of tags. A budget on tier='free' covers all free-tier traffic regardless of route. A budget on route='api-export' AND tier='free' covers a specific combination. Multiple matching budgets all apply simultaneously and queries must satisfy every budget they match. You can build layered policies without complicated rule logic.
Start in Warn mode, observe which budgets would fire during normal load, tighten the limits until only pathological cases trigger violations, then switch to Enforce. The getting started guide walks through this rollout process in detail.
The difference between a database outage and a degraded experience often comes down to whether you've decided in advance which traffic to shed. Traffic Control makes that decision explicit and configurable instead of leaving it to whichever query happens to win a resource race.]]></content>
        <summary><![CDATA[Practical patterns for leveraging Database Traffic Control]]></summary>
      </entry>
    
      <entry>
        <title>Graceful degradation in Postgres</title>
        <link href="https://planetscale.com/blog/graceful-degradation-in-postgres" />
        <id>https://planetscale.com/blog/graceful-degradation-in-postgres</id>
        <published>2026-03-31T00:00:00.000Z</published>
        <updated>2026-03-31T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Not all traffic is created equal. When a database is overwhelmed, you want the important queries to keep executing, even if that means shedding lower-priority work. This is a much better outcome than the alternative: a total database outage.
PlanetScale's Traffic Control makes this feasible at the database level by introducing resource budgets. These let you apply strict limits on slices of traffic, protecting resources for high-priority queries even when there's a surge in requests.

I'll run through exactly how this works in the scenario of running a social media platform. The same principles apply to any application with a wide variety of traffic types.
The scenario
Operating a social media platform involves quite a few different types of queries, each corresponding to different app features:
Authentication
Fetching post content
Fetching user profile information
Submitting new posts
Fetching + updating like, impression, and bookmark counts
Commenting
Loading trending topics
Direct messaging
Under normal load, a well-designed system allows every query to complete quickly. There is no need to prioritize one type of query or feature over another. But when a viral event, bad deployment, or DDoS attack introduces a load spike, these queries start competing for the same finite pool of database resources. Every query has an equal shot at consuming CPU and I/O, which means a flood of impression-count queries can starve the ones that users care most about, like authenticating and loading their timeline.
From the user's perspective, such database issues can make the application completely unusable. They can't read posts, they can't navigate, and they leave. Had you instead just stopped serving lower-priority components of the app for a few minutes, like impression counts and notifications, users would barely notice and stay on your platform.
Categorizing your traffic
To solve this with Traffic Control, the first step is to categorize and prioritize all Postgres traffic.
Critical: The app is broken without these. Authentication, post creation, post fetching, author profiles. If these fail, users have nothing to do on your application and will leave immediately.
Important: Noticeable if missing, but the app is still usable. Comments, post search, direct messaging (oh hello 𝕏.com).
Best-effort: Nice to have. Like, impression, and bookmark counts, trending topics, notifications, analytics dashboard. Users can still use the platform fine even if these features are degraded.
Your tiers will look different depending on your application. The point is to identify what you're willing to shed under pressure so that the things that matter most keep working.
Tagging queries with sqlcommenter
Now that we know the priority of our application's product features, we need a way to identify these in our database queries. Fortunately, a standard exists for sending metadata along with queries: sqlcommenter. This standard allows clients to append a comment with key-value pairs to SQL statements.
For example, a query tagged with a priority tier looks like:
SELECT body, author_id, created_at FROM posts WHERE id = $1
/* category='viewPost', priority='critical' */

And a lower-priority query:
SELECT COUNT(*) FROM impressions WHERE post_id = $1
/* category='postStats', priority='bestEffort' */

You are in full control of the keys and values. Here, we include a category and priority for every SQL query. PlanetScale Postgres keeps track of the tags that you put on your queries and allows you to both filter query history and budget your traffic based on them.
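One way to apply these tags consistently is a small helper in the application's data layer. The sketch below is an illustration under assumptions, not an official client: PRIORITY_BY_CATEGORY and tag_query are hypothetical names, the categories other than viewPost and postStats are made up, and the comment layout simply mirrors the examples above. Values are URL-encoded, as the sqlcommenter specification describes, so a tag value cannot close the comment early.

```python
from urllib.parse import quote

# Hypothetical mapping from app feature (category) to priority tier.
PRIORITY_BY_CATEGORY = {
    "viewPost": "critical",
    "login": "critical",
    "addComment": "important",
    "postStats": "bestEffort",
}


def tag_query(sql: str, category: str) -> str:
    """Append category and priority tags as a trailing SQL comment."""
    # Assumption: unknown categories fall back to the lowest tier.
    tags = {
        "category": category,
        "priority": PRIORITY_BY_CATEGORY.get(category, "bestEffort"),
    }
    # Sort keys for stable output and URL-encode each value.
    pairs = [f"{key}='{quote(value, safe='')}'" for key, value in sorted(tags.items())]
    return f"{sql.rstrip()}\n/* {', '.join(pairs)} */"


print(tag_query("SELECT body, author_id, created_at FROM posts WHERE id = $1", "viewPost"))
```

Keeping the category-to-priority mapping in one place means a feature can be moved between tiers without touching every query that belongs to it.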
Setting up resource budgets
With your queries tagged, you create budgets in Traffic Control. We have a lot of flexibility for budget setup. We could create per-category budgets, in which case we'd end up with a dozen or so budgets, being able to tune each individually.

Another option is to go more coarse, creating one budget for each of our three priority levels: critical, important, and bestEffort.

Running with the latter option, we create the following three budgets:
critical-budget: Apply this to all queries with the sqlcommenter key/value pair of priority='critical'. These queries are the most important, and therefore should be given the most relaxed budget. We will apply no Server share or Burst limit to these, but will apply a per-query maximum of 2 seconds to protect the database from rogue slow queries.

important-budget: Apply this to all queries with the sqlcommenter key/value pair of priority='important'. Set Server share to 25% with a moderate max concurrent workers limit. This leaves plenty of room for comments, search, and direct messages under normal conditions, but some will be blocked when traffic is unexpectedly high. Start in warn mode to observe the traffic pattern, then switch to enforce once you're confident in the limits.
best-effort-budget: Apply this to all queries with the sqlcommenter key/value pair of priority='bestEffort'. Set Server share to 20% with a low max concurrent workers limit. Under normal load, this budget provides more than enough resource share for these lightweight queries. When load spikes, some will be blocked. During such a spike, we can also make the call to dynamically decrease this percentage, right from the PlanetScale app. We could even completely shut off this traffic in favor of giving other queries more resources in an emergency.
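To keep the three budgets straight while tuning, it can help to write them down as plain data next to the application code. The sketch below is only a note-taking model, not a PlanetScale API: the Budget class, its field names, and budget_for are hypothetical, and the max concurrent workers settings are left out because the values above are qualitative. Every budget starts in warn mode here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    name: str
    priority: str                        # value of the sqlcommenter priority tag
    server_share_pct: Optional[int]      # None means no Server share limit
    max_query_seconds: Optional[float]   # None means no per-query maximum
    mode: str                            # "warn" or "enforce"


BUDGETS = [
    Budget("critical-budget", "critical", None, 2.0, "warn"),
    Budget("important-budget", "important", 25, None, "warn"),
    Budget("best-effort-budget", "bestEffort", 20, None, "warn"),
]


def budget_for(tags: dict) -> Optional[Budget]:
    """Return the budget whose priority matches the query's tags, if any."""
    for budget in BUDGETS:
        if tags.get("priority") == budget.priority:
            return budget
    return None


print(budget_for({"category": "postStats", "priority": "bestEffort"}).name)
```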
Tuning over time
There's no need to get the tunings above perfect from day one. You can start every budget in warn mode. This will not kill any queries that exceed the budget. Rather, it will warn, and you can click into the budget to see how many queries are exceeding it over time.
Below is an example of a budget that is likely too restrictive. It has flagged thousands of requests as exceeding the budget over a 3 hour window.

Warnings are also returned directly in the query response from Postgres in the form [PGINSIGHTS] Traffic Control:. This means you can observe the impact of your budgets from within your application without any user-facing effects. It's a great way to measure real traffic against your proposed limits.
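If you want to count these warnings from the application side, a simple filter on the prefix is enough. The sketch below assumes your Postgres driver hands server messages to the application as plain strings; how they are exposed varies by driver, and the sample message text after the prefix is a placeholder invented for illustration.

```python
PREFIX = "[PGINSIGHTS] Traffic Control:"


def traffic_control_warnings(messages):
    """Keep only the server messages that carry the Traffic Control prefix."""
    return [message for message in messages if PREFIX in message]


# Placeholder messages; only the prefix comes from the post above.
sample = [
    "WARNING:  [PGINSIGHTS] Traffic Control: <details from the server>",
    "NOTICE:  some unrelated server message",
]
print(len(traffic_control_warnings(sample)))
```

Emitting that count as an application metric, broken down by the query's category tag, gives a second view of which features would be affected once a budget is switched to enforce.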
Before setting a budget to enforce, it's recommended to spend a few days tuning the limits up and down until you're in a comfortable spot. We want each budget to be able to fulfill all queries when under normal load, with headroom to take on load variation.

After they are in a good spot, you can either flip them all to enforce mode full-time, or only flip the modes ad-hoc when you encounter unexpected database load.
A crisis event
Revisiting the scenario from earlier, but now with all of our budgets in place, what does an unexpected load spike look like?
First, the viral event: A crazy news story or celebrity drama causes a sudden 10x increase in authentications, posts, likes, and as a side effect, notifications, impressions, and page loads.
Having well-established budgets helps keep the lower-priority traffic (like notifications and impression tracking) from starving the more important work. In this extreme scenario, we can click into the best-effort-budget and completely disable this traffic.
Changes to budgets happen live, so we would immediately see the impact of this. Users would temporarily stop receiving notifications and seeing impression counts in favor of still allowing them to authenticate and view posts.
The important-budget traffic is allowed up to 25% of the server share. This preserves a large portion of the server resources for serving the highest priority queries.
What could have been a huge lost opportunity (your app becomes unusable) is now only a temporary degradation of non-critical functionality.
We've kept our users happy and avoided an application outage. All with a few clicks and PlanetScale's Database Traffic Control.]]></content>
        <summary><![CDATA[Not all traffic is created equal. When a database is overwhelmed, you want the important queries to keep executing, even if that means shedding lower-priority work. This is a much better outcome than the alternative: a total database outage.]]></summary>
      </entry>
    
      <entry>
        <title>High memory usage in Postgres is good, actually</title>
        <link href="https://planetscale.com/blog/high-memory-usage-in-postgres-is-good-actually" />
        <id>https://planetscale.com/blog/high-memory-usage-in-postgres-is-good-actually</id>
        <published>2026-03-30T00:00:00.000Z</published>
        <updated>2026-03-30T00:00:00.000Z</updated>
        
        <author>
          <name>Simeon Griggs</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Houseplants often die from over-watering, not neglect. It is easy to project human needs onto them: "If I am thirsty, they must be thirsty too." But many indoor plants actually benefit from drying out between waterings.
Similarly, your empathy can lead to misinterpreting signals from your database. You don't like feeling overwhelmed, so you don't want your database overwhelmed either.
But not all usage is created equal, and memory in computers can be uniquely complex to understand.

A look at your PlanetScale dashboard might show memory usage sitting at 80%. That looks bad, but it could actually be representative of a healthy system.
To be clear, consistently high CPU usage is a problem. For as long as CPU stays high, queries wait longer, the slowest queries get slower, and you have less headroom for spikes.
Memory is different. The percentage shown in the cluster diagram on your PlanetScale dashboard is measuring the entire node your database runs on, not just Postgres. When most RAM is in use, it usually means the system is keeping data close to the CPU so it does not have to read from disk as often. Unlike sustained high CPU, high memory usage by itself does not mean performance is degraded or that you are at immediate risk of "running out" of memory.
Why Postgres wants your memory
Reading from disk is slower than reading from RAM, even with PlanetScale Metal's locally attached NVMe drives. Postgres is designed to take advantage of that gap by caching as much data in memory as it can.
There are two layers of caching at work, and both consume RAM.
shared_buffers is Postgres' own buffer pool. When a query needs data, Postgres first checks this pool for the relevant pages, the fixed-size (8 KB by default) chunks of table and index data it works with, before reading from disk. The more of your working data that fits here, the fewer disk reads Postgres needs to perform.
This parameter can be configured in the cluster configuration page of the PlanetScale dashboard. The default value should be sufficient for most workloads, and modifying it should not be your first step in troubleshooting memory usage.
The OS page cache is the second caching layer. Even when Postgres does go to disk, the operating system keeps a copy of the data it reads in RAM so the next access is faster. This is not a Postgres feature — it is standard Linux behavior. Postgres was designed with this in mind, and its own documentation notes that the operating system's cache is expected to handle data beyond what fits in shared_buffers.
Between these two layers, a healthy Postgres server will use most of the available RAM. That is the goal, not a side effect. For context, reading a page from RAM is roughly 1,000 times faster than reading it from even a fast NVMe drive. A database that keeps frequently accessed data in memory avoids that penalty on every query.
When caching is working well, the vast majority of page reads are served from memory without touching disk. If that ratio drops — because the working dataset has outgrown available memory, for example — queries slow down as Postgres waits on disk more often.
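A rough way to watch that ratio yourself is the standard pg_stat_database view, which counts blocks found in shared_buffers (blks_hit) and blocks requested from the operating system (blks_read). The sketch below is a minimal example; the helper name is hypothetical. Keep in mind that blks_read includes reads the OS page cache served without touching disk, so this number understates how much is really coming from memory.

```python
from typing import Optional

# Query to run against Postgres; both columns exist in pg_stat_database.
HIT_RATIO_SQL = """
SELECT sum(blks_hit) AS hits, sum(blks_read) AS reads
FROM pg_stat_database
"""


def cache_hit_ratio(hits: int, reads: int) -> Optional[float]:
    """Fraction of block requests served from shared_buffers."""
    total = hits + reads
    if total == 0:
        return None  # no activity recorded yet
    return hits / total


# Example with made-up counters: 990 hits and 10 reads.
print(cache_hit_ratio(990, 10))
```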
See our documentation on "Normal operating ranges" to sense-check what values you should be seeing in Cluster Metrics for CPU, memory, and more.
Memory usage compared to CPU usage
At a glance, CPU and memory usage numbers look comparable because they share a 0–100% scale, but they describe very different behavior.
CPU is work. Sustained high CPU means the database is spending time on work it cannot skip. When CPU is saturated, queries arrive faster than they can be processed. They queue, latency climbs, and connection timeouts can cascade into application-level failures. There is no "good" kind of sustained high CPU usage.
Memory is workspace. Postgres and the OS use spare RAM to avoid expensive disk reads. Higher use improves performance ... most of the time.
"Most of the time" because memory usage gets a little complicated.
Two kinds of memory usage
The single “memory usage” percentage number combines two different behaviors.
To explore that number in more detail, within the Cluster Metrics page of the PlanetScale dashboard, memory is shown as a stacked chart over time with four different categories: active cache, inactive cache, RSS, and memory mapped. These four categories can be grouped into two separate but equally important use-cases: cache and process memory.

1. Cache (active, inactive, and memory mapped)
Much of what looks like “used” memory on a healthy database host is cache: file data the operating system keeps in RAM after reads so the next access is cheap. You may see this referred to as "page cache" in other dashboards.
Active cache is data the OS recently touched and wants to keep around. Inactive cache hasn't been accessed lately. Memory-mapped pages are cached pages that are backed by real files on disk.
All three of these cache types are reclaimable by the operating system and can be dropped when something else needs RAM.
If total memory is high because cache is high, good! Frequently accessed data stays near the CPU for faster access.
2. Process memory (RSS)
Separately, Postgres holds memory for processes that are actually using it. You will see this referred to as RSS (Resident Set Size) in the PlanetScale dashboard.
This memory is not reclaimable by the operating system and is what increases out of memory (OOM) risk. High memory usage through high RSS leads to restarts and degraded behavior.
If total memory is high because RSS is high, that is referred to as memory pressure and is a problem.
What is Resident Set Size?
Roughly, RSS is the amount of private memory allocated to a process: its stack, heap, catalog/relcache caches, and query execution memory such as sorts and hash tables.
Given Postgres' process-per-connection architecture, each process requires some baseline amount of memory. Not every process will consume the same amount of memory.
Further, some memory use is shared across processes. So calculating RSS use is not as simple as adding up the memory usage of every process.
RSS increases for a number of reasons:
Postgres may grant multiple work_mem allocations within a single query; see below for more details.
Catalog bloat can spike RSS usage, common in multi-tenant schemas using a table-per-tenant pattern.
The operating system's memory allocator may not return memory efficiently.
Misbehaving or misconfigured extensions can increase RSS usage.
Cached plans and prepared statements accumulate per-session memory that is not released until the session ends or the statement is explicitly deallocated.

The work_mem parameter's default value is set relative to the amount of memory in your database cluster. It can be modified in the cluster configuration page of the PlanetScale dashboard.
Tuning work_mem might seem like an obvious lever — decrease it to reduce RSS, or increase it to prevent operations from spilling to disk. But the allocation is per-sort/hash-node, per-query, per-backend.
A single complex query can allocate work_mem multiple times, and that multiplies across every active connection. Setting it too low forces more disk I/O; setting it too high globally can cause total memory usage to spike unpredictably under load. Neither direction is a safe default change without first understanding your workload's concurrency and query complexity.
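To make that multiplication concrete, here is a back-of-the-envelope calculation. It is a deliberately simple upper-bound sketch with made-up inputs: the function name is hypothetical, and it ignores details such as hash nodes being allowed more than work_mem through hash_mem_multiplier, or parallel workers each getting their own allocation.

```python
def worst_case_work_mem_mb(work_mem_mb: int, nodes_per_query: int, active_connections: int) -> int:
    """Upper bound if every connection runs a query using every allocation at once."""
    return work_mem_mb * nodes_per_query * active_connections


# Made-up workload: 64 MB work_mem, 4 sort/hash nodes per query, 100 busy connections.
baseline = worst_case_work_mem_mb(64, 4, 100)
doubled = worst_case_work_mem_mb(128, 4, 100)
print(baseline, doubled)
```

With these inputs the bound is 25,600 MB, and doubling work_mem doubles it to 51,200 MB. Real workloads rarely hit the bound, but it shows why a small global change can have a large effect under concurrency.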
Efficient connection pooling can be the best way to reduce RSS usage. Fewer active connections result in fewer copies of all that per-process overhead.
PgBouncer on PlanetScale runs in transaction mode, where connections are returned to the pool after each transaction completes. See our blog post on Scaling Postgres connections with PgBouncer for more details.
Investigating memory usage while debugging performance
If you're experiencing degraded performance, the challenge is figuring out what drove the RSS growth.

Query Insights helps you investigate query performance through CPU time, I/O, and latency, but it does not show per-query memory. You may see OOM markers and slow-query signals, but not query-specific RSS usage.
RSS is a per-process metric, not a per-query metric. That means you cannot read “RSS per query” directly from EXPLAIN or Query Insights. Instead, you may need to gather multiple signals and triangulate:
Use Cluster Metrics to identify when RSS rises.
In Query Insights for that same window, look for expensive patterns (high runtime, CPU, I/O, rows/blocks read) and OOM-adjacent activity.
Re-run suspect queries with EXPLAIN (ANALYZE, BUFFERS, MEMORY) to inspect operator-level memory usage.
Check connection counts in the same window, because many concurrent connection processes can increase RSS even when a single query is moderate.
The out of memory documentation has more details on the likely causes of, and how to prevent, OOM events.
In summary
A lot of cached data in memory is a good thing. Ideally, your "hot dataset" fits in the page cache of your database cluster to maintain fast performance. Too little cached data can lead to increased CPU usage and degraded performance.
High memory usage is not automatically bad. If your high memory usage is due to cache, you typically have a healthy, performant database.
Memory pressure is bad. Rising RSS toward limits, OOM kills, unexplained restarts, and tail latency spiking together with heavy disk I/O when the working set is tight on RAM are the signals to act on.
Sustained high CPU is a problem. It means you are out of headroom. Tune the workload (see Query Insights) or upgrade.
If the dashboard shows a high “% memory used,” do not panic. Investigate the types of memory being used and check for OOM events before taking action.]]></content>
        <summary><![CDATA[A high memory percentage in PlanetScale Postgres is not necessarily a problem. Let's compare how memory and CPU usage are different, how not all memory usage is created equal, and which signals actually require attention.]]></summary>
      </entry>
    
      <entry>
        <title>Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI</title>
        <link href="https://planetscale.com/blog/planetscale-stripe-projects-partnership" />
        <id>https://planetscale.com/blog/planetscale-stripe-projects-partnership</id>
        <published>2026-03-26T00:00:00.000Z</published>
        <updated>2026-03-26T00:00:00.000Z</updated>
        
        <author>
          <name>Elom Gomez</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We're excited to announce that PlanetScale is participating as a co-design and launch partner for Stripe Projects, a new developer preview from Stripe that centralizes dev tool provisioning and billing in one place.
What is Stripe Projects?
Stripe Projects is a new way for developers and coding agents to discover, provision, and pay for developer tools all from the Stripe CLI. Instead of jumping between dashboards, entering payment info, and copying credentials across services, everything lives in one centralized workflow.
This fragmented developer workflow has always existed, but AI agents have made the gap much more obvious. The ecosystem has been missing a standard way for provisioning and credential handoff to work reliably across providers. And we're excited to partner with Stripe to close this gap.
With PlanetScale as a launch partner, you can now spin up and pay for fully managed MySQL or Postgres databases directly from your terminal in seconds.
Try it out today
Stripe Projects is currently in developer preview. You can request early access here. Once you're in, follow these instructions to spin up a PlanetScale Postgres or MySQL database:
Install the Stripe CLI
Install the Projects plugin: stripe plugin install projects
Initialize Stripe Projects in your app: stripe projects init
Add a PlanetScale database: stripe projects add planetscale/postgresql or stripe projects add planetscale/mysql
Go through the prompts to create your database: database name, cluster size, region, and number of replicas
Within seconds, your PlanetScale Postgres or MySQL database is provisioned without you ever leaving the terminal
Sync your database credentials to your .env file: stripe projects env --sync
Resources and feedback
You can start using PlanetScale with Stripe Projects in the Stripe Marketplace. Or, head to the Stripe Projects documentation to learn more.
We'd love to hear how you're using PlanetScale with Stripe Projects. Join our Discord to let us know or reach out to us on X!]]></content>
        <summary><![CDATA[PlanetScale is a co-design and launch partner for the Stripe Projects developer preview, allowing you or your coding agents to provision and manage databases and other dev tools directly from the Stripe CLI.]]></summary>
      </entry>
    
      <entry>
        <title>Enhanced tagging in Postgres Query Insights</title>
        <link href="https://planetscale.com/blog/enhanced-tagging-in-postgres-query-insights" />
        <id>https://planetscale.com/blog/enhanced-tagging-in-postgres-query-insights</id>
        <published>2026-03-24T00:00:00.000Z</published>
        <updated>2026-03-24T00:00:00.000Z</updated>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[As part of yesterday's Traffic Control launch, we made enhancements to the Insights query tagging feature for Postgres databases. Insights has supported query tags for some time, but they were previously only attached as metadata on individual notable query logs. With yesterday's release, tags are now present in aggregated query data, which enables powerful new capabilities. It's now possible to view the complete distribution of tags assigned to a query pattern, search queries by tag, and see a per-tag breakdown of database-level statistics. This blog post gives an overview of the feature, and digs into the details of how we implemented it.
Adding Tags
Query tags are string key-value pairs that are included in query SQL using specially formatted SQL comments. For example, the following query has the controller and action tags attached.
select * from users 
  where id = 1 
  /* controller='users',
     action='show' */;

Typically tags are specified at the application level and applied automatically to all queries issued by the database framework you're using. Common examples are controller, action, job, or source_location.
In addition to tags set by the database client, Insights automatically adds the following tags to all queries:
application_name - set by the Postgres driver
username - the Postgres user executing the query
remote_address - the remote IP address
Feature Overview
This feature introduces three new surfaces where tag information can be seen.
Query Pattern Tags
To see the set of tags associated with a given query pattern, click on a query pattern from the main Insights dashboard. This page lists the tags that have been submitted with a given query pattern over a particular time range, as well as the percentage of queries that included each tag value.

Database Tags
To see aggregate statistics for your entire database broken down by tag, go to the Tags section in the Insights sidebar and select the tag or set of tags that you want to view.

Query Filter
To see a list of query patterns that have a given tag value, go to the Insights dashboard and search for a particular tag with tag:MY_TAG:MY_VALUE. The returned query patterns and statistics are filtered to only queries with the specified tag pair.

Implementation
To understand how tagging works in Insights, it helps to understand the underlying data sources that power Insights. Query performance data is observed by the Insights Postgres extension, emitted to Kafka and written to ClickHouse. The extension publishes to two separate Kafka topics:
Individual queries - any query reading more than 10,000 rows, taking longer than 1 second, or resulting in an error. One message is sent per qualifying query. This powers the Notable queries feature.
Aggregate summaries - statistics like total query count, rows read, and cumulative query time. One message is sent for every query pattern every 15 seconds. This powers the majority of Insights including the query table, anomalies, and all query-related graphs.
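The rule for the first topic can be written as a simple predicate. This is a sketch of the criteria as described above, not the extension's actual code; the function and constant names are hypothetical.

```python
ROWS_READ_THRESHOLD = 10_000
DURATION_THRESHOLD_SECONDS = 1.0


def is_notable(rows_read: int, duration_seconds: float, errored: bool) -> bool:
    """True if a query qualifies for the individual (notable) query stream."""
    return (
        errored
        or rows_read > ROWS_READ_THRESHOLD
        or duration_seconds > DURATION_THRESHOLD_SECONDS
    )


print(is_notable(rows_read=25_000, duration_seconds=0.2, errored=False))
```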
Prior to this release, tag data was only attached to the individual query data stream. This adds important information to notable queries, but because the data wasn't present in the aggregate summaries, it wasn't possible to filter or group aggregate data by tag. Insights couldn't answer important questions like:
What queries has this user executed?
What percentage of my total query run time is coming from this controller?
Which background jobs are executing this query?
Our goal with this release was to associate all query data with the relevant tags to make it possible to answer this class of questions.
Sending Tags
To explore the various approaches for implementing tags, let's use the following query executions as an example:
select * from users where id = 1 /*controller='users'*/;
select * from users where id = 2 /*controller='sessions'*/;
select * from users where id = 3 /*controller='sessions'*/;

Since each of these queries has the same fingerprint (query with all literal values removed), without tags we would only need to send a single summary message. To include tags, we have several options. The first would be to continue sending only a single query summary event with a count of how many times each tag was observed. This would produce a summary message like the following (other stats fields are omitted):
{
	sql: "select * from users where id = ?",
	query_count: 3,
	total_time: "100ms",
	tags: {"controller=users": 1, "controller=sessions": 2}
}

This message tells us the given query was executed three times - twice from the sessions controller and once from the users controller - and had a cumulative execution time of 100ms.
At first glance, including tags in this manner is an attractive option. It's simple to implement - we just accumulate tags along with the other aggregate stats - and it doesn't increase the number of events that need to be emitted and stored. It has a serious shortcoming, however: it's not possible to attribute aggregated stats to any individual tag. For example, it's not possible to know the total time of queries emitted from just the users controller, because we can't tell what portion of the 100ms was associated with controller=users. The summary data for one tag is permanently combined with the data from all tags.
To overcome this limitation, we can instead emit a separate aggregate summary message for each set of unique tags. In our example this would mean we emit two separate messages to the insights pipeline:
{
	sql: "select * from users where id = ?",
	query_count: 1,
	total_time: "20ms",
	tags: {"controller": "users"}
}
{
	sql: "select * from users where id = ?",
	query_count: 2,
	total_time: "80ms",
	tags: {"controller": "sessions"}
}

This approach makes it possible to fully disambiguate aggregated statistics based on the attached tags. We can tell that the users controller was responsible for exactly 20ms of total execution time and the sessions controller was responsible for exactly 80ms.
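The grouping behind this approach can be sketched in a few lines of Python. This is an illustration only, not the extension's implementation: the summarize function is hypothetical, and the per-execution timings (20ms, 30ms, 50ms) are invented so that they add up to the totals in the example messages.

```python
from collections import defaultdict


def summarize(executions):
    """Group executions by (fingerprint, tag set), summing count and time."""
    totals = defaultdict(lambda: {"query_count": 0, "total_time_ms": 0})
    for fingerprint, tags, time_ms in executions:
        # A frozenset of pairs makes the tag set usable as a dictionary key.
        key = (fingerprint, frozenset(tags.items()))
        totals[key]["query_count"] += 1
        totals[key]["total_time_ms"] += time_ms
    return [
        {"sql": fingerprint, "tags": dict(tag_set), **stats}
        for (fingerprint, tag_set), stats in totals.items()
    ]


FINGERPRINT = "select * from users where id = ?"
executions = [
    (FINGERPRINT, {"controller": "users"}, 20),
    (FINGERPRINT, {"controller": "sessions"}, 30),
    (FINGERPRINT, {"controller": "sessions"}, 50),
]
for message in summarize(executions):
    print(message)
```

Because the tag set is part of the grouping key, every statistic stays attributable to exactly one combination of tags, which is the property the single-message approach lacked.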
This comes at a cost though: we have to emit a separate message for each unique tag combination. This can be problematic for high-cardinality tags (tags with a large number of distinct values). Consider a customer that has set a request_id tag on all of the queries issued from their web tier. Where we previously would be able to collapse 500 user-lookup queries into a single summary message, we now have to send 500 messages because they each have a unique request_id. In the worst case, this means that the summary data stream must send one summary message per query execution, and we've lost all of the scalability advantages of aggregating query statistics. For large clusters executing millions of queries per second, this would be prohibitively expensive to process and store, and would consume considerable resources on the database host where telemetry data is emitted.
To prevent this from overwhelming the pipeline, we implemented several strategies to dynamically reduce the cardinality of tags and therefore decrease the number of messages that must be handled by the Insights pipeline.
Cardinality Reduction
The core idea is simple: when a tag (or set of tags) would result in sending too much telemetry data, we collapse that tag by replacing specific values (like request_id="a" and request_id="b") with a value that indicates it has been removed: request_id=*. This lets us more aggressively merge aggregates and reduce the total number of messages sent, while ensuring that we're capturing 100% of the summary data.
We employed two separate approaches for tag collapsing.
Per-tag Limits
This mechanism tracks the number of unique values seen for each tag key, scoped per query pattern. If that count exceeds a predefined limit (currently 20), we proactively collapse that key for all queries for the next hour. This catches inherently high-cardinality tags like request_id or user_id.
An important part of this approach is that cardinality is monitored per query pattern and not globally. Consider the source_location tag that contains the file and line number showing where the query was initiated in the client app. Overall this tag is high-cardinality, because each query pattern likely has its own unique value for source_location, but it is highly correlated with the query pattern so it doesn't actually result in additional messages being sent to the pipeline - we are already sending a separate query summary message for each query pattern. Monitoring cardinality per-pattern allows high-cardinality tags that are highly correlated with query pattern to pass through without being collapsed.
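A minimal sketch of this mechanism, with several assumptions: the class and method names are hypothetical, collapsing is scoped to the query pattern where the limit was exceeded, the collapsed value is written as "*", and the one-hour expiry is left out to keep the example short.

```python
from collections import defaultdict

PER_TAG_LIMIT = 20  # limit mentioned above


class PerTagLimiter:
    def __init__(self, limit=PER_TAG_LIMIT):
        self.limit = limit
        self.seen = defaultdict(set)  # (pattern, tag key) -> distinct values
        self.collapsed = set()        # (pattern, tag key) pairs being collapsed

    def apply(self, pattern, tags):
        """Return the tags to report, with over-limit keys collapsed to '*'."""
        reported = {}
        for key, value in tags.items():
            slot = (pattern, key)
            if slot not in self.collapsed:
                self.seen[slot].add(value)
                if len(self.seen[slot]) > self.limit:
                    self.collapsed.add(slot)
                    del self.seen[slot]  # no need to keep tracking values
            reported[key] = "*" if slot in self.collapsed else value
        return reported


limiter = PerTagLimiter()
pattern = "select * from users where id = ?"
results = [
    limiter.apply(pattern, {"controller": "users", "request_id": f"req-{i}"})
    for i in range(21)
]
print(results[0], results[-1])
```

The controller tag stays intact because it only ever has one value for this pattern, while request_id is collapsed as soon as its 21st distinct value shows up.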
Per-interval Limits
Within each 15-second interval, we track all aggregates keyed by their unique set of tag key-value pairs. Because we must emit a message for each unique combination of tags, even individually low-cardinality tags could produce an unacceptably large number of combinations of tags. For example, if a query pattern has 6 tag keys that each have 10 distinct values, there could be 10^6 individual tag combinations. To prevent an explosion in the number of messages that must be tracked, we perform dynamic cardinality reduction on a per-interval basis for any individual query pattern that has more than a fixed number of tag combinations.
To reduce the combined cardinality of a given set of aggregates, we find the highest cardinality tag and collapse it (replace all values with a single value). We successively perform this operation until the number of aggregates is beneath the fixed threshold (currently set to 50 in production).
To illustrate this operation, consider five executions of the same query pattern.
select * from users where id = ? /*controller='users',    host='app-1'*/
select * from users where id = ? /*controller='users',    host='app-2'*/
select * from users where id = ? /*controller='sessions', host='app-3'*/
select * from users where id = ? /*controller='sessions', host='app-4'*/
select * from users where id = ? /*controller='sessions', host='app-1'*/

Without any limits, this produces five separate aggregate messages. To reduce the aggregate message count, we identify that the host tag has the highest cardinality (4 unique values) and replace all of its values with a placeholder and merge the remaining results. This yields only two combinations that must be emitted to the pipeline, one for each of the two unique controller tag values.
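The loop described above can be sketched as follows. This is an illustration under assumptions rather than the production code: collapse_until is a hypothetical name, "*" stands in for the placeholder value, only query counts are merged (real aggregates carry more statistics), and the threshold is set to 2 instead of 50 so the five-execution example triggers a collapse.

```python
from collections import Counter


def collapse_until(aggregates, max_aggregates):
    """aggregates maps a frozenset of (key, value) tag pairs to a query count."""
    aggregates = dict(aggregates)
    while len(aggregates) > max_aggregates:
        # Collect the distinct, not yet collapsed values for each tag key.
        values = {}
        for tag_set in aggregates:
            for key, value in tag_set:
                if value != "*":
                    values.setdefault(key, set()).add(value)
        if not values:
            break  # every tag is already collapsed
        widest = max(values, key=lambda k: len(values[k]))
        # Replace the widest tag's values and merge aggregates that now match.
        merged = Counter()
        for tag_set, count in aggregates.items():
            new_set = frozenset((k, "*" if k == widest else v) for k, v in tag_set)
            merged[new_set] += count
        aggregates = dict(merged)
    return aggregates


executions = [
    {"controller": "users", "host": "app-1"},
    {"controller": "users", "host": "app-2"},
    {"controller": "sessions", "host": "app-3"},
    {"controller": "sessions", "host": "app-4"},
    {"controller": "sessions", "host": "app-1"},
]
aggregates = Counter(frozenset(tags.items()) for tags in executions)
result = collapse_until(aggregates, max_aggregates=2)
for tag_set, count in result.items():
    print(dict(tag_set), count)
```

Here host is collapsed first because it has four distinct values against two for controller, leaving one aggregate per controller with counts of 2 and 3.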
Tracking Tag Collapsing
When a tag must be collapsed due to either of the cardinality limitation mechanisms, we record the fact that the key has been collapsed in the emitted aggregate message. This allows us to detect when collapsing has occurred and display a message noting the percentage of tag values where the value is unknown.
Conclusion
Query tagging is a powerful feature. Being able to slice your Insights data by arbitrary tags gives you a much clearer picture of your database performance. We're excited for you to try it.]]></content>
        <summary><![CDATA[Introducing query tagging improvements in Postgres Query Insights]]></summary>
      </entry>
    
      <entry>
        <title>Behind the scenes: How Database Traffic Control works</title>
        <link href="https://planetscale.com/blog/behind-the-scenes-how-traffic-control-works" />
        <id>https://planetscale.com/blog/behind-the-scenes-how-traffic-control-works</id>
        <published>2026-03-23T16:00:00.000Z</published>
        <updated>2026-03-23T16:00:00.000Z</updated>
        
        <author>
          <name>Patrick Reynolds</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Today, we released Database Traffic Control™, a feature for mitigating and preventing database overload due to unexpectedly expensive SQL queries.  For an overview, read the blog post introducing the feature, and to get started using it, read the reference documentation.  This post is a deep dive into how the feature works.
Background
If you already know how Postgres and Postgres extensions work internally, you can skip this section.
A single Postgres server is made up of many running processes.  Each client connection to Postgres gets its own dedicated worker process, and all SQL queries from that client connection run, one at a time, in that worker process.  When a client sends a SQL query, the worker process parses it, plans it, executes it, and sends any results back to the client.  Planning is a key step, in which Postgres takes a parsed query and turns it into a step-by-step execution plan that specifies the indexes to use, the order to load rows from multiple tables, and the operators that will be used to filter, aggregate, and join those rows.  Most queries can be run using several different plans, so it's the planner's job to estimate the cost of the possible plans and pick the cheapest one.
Every part of how Postgres handles queries can be modified by extensions.  Extensions can add new functions, new data types, new storage systems, and new authentication methods, among other things.  (They can also add new failure modes, but that's a topic for another day.)  Extensions can also passively observe and report on traffic, like PlanetScale's own pginsights extension that powers Query Insights.
Much of what Postgres extensions can do, they do using hooks.  A hook is a function that runs before, after, or instead of existing Postgres functionality.  Want to observe or replace the planner?  There's a hook for that.  Want to examine queries as they execute?  There are three hooks for that.  As of this writing, there are 55 hooks available to anyone writing Postgres extensions.
PlanetScale's pginsights extension installs hooks for the ExecutorRun and ProcessUtility functions, among others, to run timers and measure resource consumption while SQL statements execute.  Since each hook wraps the original Postgres functionality, that means pginsights sees each query just before it executes and again just after it completes.  Any time that has elapsed and any resources the worker process has consumed are directly attributable to that query.  The extension does some aggregation, sends aggregate data periodically to a data pipeline, and returns control to Postgres to accept the next query.
Insights, hooks, and blocking queries
When we first started planning for Traffic Control, we knew we would use a Postgres extension with a hook on ExecutorRun to decide whether or not each statement would be allowed to run.  Initially, we wrote a new extension for this.  We soon realized that there are two ways to choose which queries to block: based on static analysis of the individual query, or based on cumulative measurements of resource usage over time.  We split the extension along those lines.  Blocking based on static analysis got merged into the project that became pg_strict.  Blocking based on cumulative resource usage became Traffic Control.
It turns out Traffic Control needed the same hook points and much of the same information that pginsights already had.  So rather than duplicate all that code and impose the extra runtime overhead of another extension, we taught pginsights how to block queries.

If there are any Traffic Control rules configured, then at the beginning of each query execution, the extension does four things:
It identifies all of the rules that match the tags and other metadata of the query.  Each rule identifies a budget; multiple rules can map to the same budget.
It checks to see if any of the applicable budgets has reached its concurrency limit.
It checks if the query's estimated cost is higher than any applicable budget's per-query limit.
It checks to see if every applicable budget has enough available capacity for the query to begin execution.  In the documentation, these parameters are described as the burst limit and the server share.  As we'll see below, those parameters combine over time to describe the behavior of a leaky-bucket rate limiter.
If any budget fails any of these checks, then the query is warned or blocked, based on how the budget is configured.
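The checks above can be sketched in a few lines of Python. This is only an illustration of the decision logic as described here, not the extension's actual code: the Budget class, its field names, and the admit function are all hypothetical, and the capacity check is simplified to a single debt-versus-size comparison.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    # Every field name here is hypothetical, chosen for illustration.
    max_concurrency: int      # concurrency limit
    per_query_limit: float    # highest estimated cost allowed for one query
    capacity: float           # burst limit
    debt: float = 0.0         # cost already charged and not yet drained
    running: int = 0          # queries currently executing under this budget
    enforce: bool = True      # True blocks, False only warns

def admit(budgets, estimated_cost):
    """Return 'allow', 'warn', or 'block' for a query matching these budgets."""
    verdict = "allow"
    for b in budgets:
        failed = (
            b.running >= b.max_concurrency              # concurrency check
            or estimated_cost > b.per_query_limit       # per-query cost check
            or b.debt + estimated_cost > b.capacity     # available-capacity check
        )
        if failed:
            if b.enforce:
                return "block"   # any enforcing budget that fails blocks the query
            verdict = "warn"     # a warn-mode budget only flags it
    return verdict
```

In this sketch a query is allowed only if every applicable budget passes every check, which mirrors the "any budget fails any of these checks" wording above.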
Blocking a query just before it begins execution means the server spends no resources on the query, beyond the cost of the planner and the decision to block it.  That's an improvement over schedulers like Linux cgroups, which let every task begin and simply starve them of resources if higher priority tasks exist in the system.  It's also an improvement over the Postgres statement_timeout setting, which allows any overly expensive query to consume resources until it times out.  Traffic Control blocks expensive, low priority queries before they begin.
Cost prediction
I glossed over something important in the last section: cost.  The concurrency check is easy, because it just counts worker processes already assigned to the queries associated with a Traffic Control budget.  But the other two checks — per-query cost and cumulative cost — require us to know what resources the query will consume before it even begins execution.  How do we do that?  We trust, but also don't trust, the planner.
A SQL query planner takes a parsed SQL statement and selects what it hopes is the most efficient series of steps to execute that query.  To evaluate all the possible plans, the planner has to estimate the cost of each one.  When you run EXPLAIN on a SQL statement, Postgres's planner shows the cost of each step in the chosen plan, as well as the overall total cost.  The cost is measured in dimensionless units and is based on configurable weights assigned to each step the plan will take.  There are a lot of variables that go into the plan cost, most of which you can ignore for the purposes of understanding Traffic Control.  Just remember these two things: plan costs are roughly linear (a plan with double the cost should take something like double the time and resources to execute), and the relationship between plan costs and real-world resources is heavily dependent on what query you're running, what server you run it on, and what else is happening on that server at the moment.
Traffic Control compensates for those dependencies.  We assume that there is an unknown constant k that we can multiply the plan cost by, to get the actual wall-clock time it will take to execute that query.  But that constant is different for each query pattern and for each host.  The constant may also change over time as the workload mix on the server changes and as tables grow and change.  So it's not exactly a constant!
Traffic Control implements a hash table on each host, mapping query patterns to two averages: CPU time and planner cost estimates.  Both are exponential moving averages, heavily weighting recent queries.  Every time a query completes, we update both of those averages.  The magical not-quite-constant k is the ratio of the two.
Each time a query comes in, Traffic Control multiplies the planner's estimated cost by k to guess how much CPU and/or wall-clock time the query will take.  Based on that estimate, Traffic Control decides if the query can be allowed to begin.  If the query is allowed to run, then at the end of query execution, Traffic Control updates the two averages for that query pattern so the k value will be more recent and more precise for the next query that arrives.
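Here is a rough Python sketch of that bookkeeping. The CostModel class, its method names, and the smoothing weight alpha are assumptions made for illustration. The real weighting is not described in this post, and returning None for an unseen pattern is my own choice rather than documented behavior.

```python
class CostModel:
    """Per-pattern exponential moving averages of CPU time and planner cost."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha      # weight of the newest sample (assumed value)
        self.averages = {}      # pattern -> (avg_cpu_seconds, avg_plan_cost)

    def record(self, pattern, plan_cost, cpu_seconds):
        """Update both averages after a query completes."""
        if pattern not in self.averages:
            self.averages[pattern] = (cpu_seconds, plan_cost)
            return
        avg_cpu, avg_cost = self.averages[pattern]
        a = self.alpha
        self.averages[pattern] = (
            a * cpu_seconds + (1 - a) * avg_cpu,
            a * plan_cost + (1 - a) * avg_cost,
        )

    def k(self, pattern):
        """The not-quite-constant k: average CPU time over average plan cost."""
        if pattern not in self.averages:
            return None
        avg_cpu, avg_cost = self.averages[pattern]
        return avg_cpu / avg_cost if avg_cost > 0 else None

    def predict(self, pattern, plan_cost):
        """Estimated CPU seconds for a new query, or None with no history."""
        k = self.k(pattern)
        return None if k is None else plan_cost * k
```

Dividing the two averages, rather than averaging per-query ratios, keeps a single unusually cheap or expensive execution from swinging k too far.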
Leaky buckets
Two of the checks that Traffic Control performs for each query are easy: if the query's estimated cost is too high, block it.  If too many queries in the same budget are already running, block it.  But the final check — is there enough capacity in the budget to proceed — is harder.  It's important, though!  Many executions of a moderately expensive query can be even more damaging than a single very expensive query, and managing a budget over time is the best way to block queries that are only expensive in aggregate.  Traffic Control considers the cumulative cost of queries in each configured budget.

Each budget is modeled as a reverse leaky bucket.  Here's how that works.  Each query that executes accumulates debt in the bucket.  Any query that would cause the bucket to overflow with debt is blocked.  Debt drains out over time, until the bucket is empty.  The bucket has two important parameters: its size and its drain rate.  The size dictates the burst limit, or what total resources queries under a given budget can use in a short amount of time.  The drain rate dictates the server share, or what fraction of overall resources queries under a given budget can use in the long term.
Traditionally, leaky buckets work the other way: they start out full, they fill (but never overflow) with credits at a configured rate, traffic consumes credits, and if a bucket is ever empty, traffic gets blocked.  We inverted the model for a simple reason: an empty bucket doesn't need to be stored.  Over time, we may need to store many buckets for changing rules and changing query metadata.  We can drop buckets with a zero debt level, meaning that we only need to store recently active buckets, instead of every possible bucket.  We store as many buckets as will fit in a configurable amount of shared memory, and we evict them implicitly when their debt falls to zero.
There is no periodic task that drains debt from all buckets.  Instead, each bucket is updated only when read.  There is also no periodic task to evict buckets with a debt level of zero.  Instead, adding a new bucket to the table evicts any that have already emptied, or whichever bucket is expected to become empty soonest.
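A minimal Python sketch of a debt bucket with this drain-on-read behavior follows. The DebtBucket class and its method names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and eviction from shared memory is left out entirely.

```python
class DebtBucket:
    """Reverse leaky bucket: queries add debt, and time drains it away."""

    def __init__(self, size, drain_rate, now=0.0):
        self.size = size              # burst limit: most debt the bucket can hold
        self.drain_rate = drain_rate  # debt drained per second (server share)
        self.debt = 0.0
        self.updated_at = now

    def _drain(self, now):
        # No background task: apply the elapsed drain whenever the bucket is read.
        elapsed = now - self.updated_at
        self.debt = max(0.0, self.debt - elapsed * self.drain_rate)
        self.updated_at = now

    def try_charge(self, cost, now):
        """Add cost as debt unless that would overflow the bucket."""
        self._drain(now)
        if self.debt + cost > self.size:
            return False              # the query would be warned or blocked
        self.debt += cost
        return True

    def is_empty(self, now):
        """An empty bucket carries no state and could be dropped."""
        self._drain(now)
        return self.debt == 0.0
```

With size=10 and drain_rate=2, a burst of 10 units is accepted at once, after which the budget sustains roughly 2 units of cost per second.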
Rule sets
One important goal for Traffic Control is that it can efficiently decide when not to block a query.  After all, Traffic Control has to make that decision before each query is even allowed to begin execution.  So the budget here is measured in microseconds.  But we also want developers and database administrators to be able to configure as many rules as it takes to manage traffic to their application.  So it's crucial that we can evaluate many rules quickly.  Enter rule sets: a data structure that allows evaluating n rules in O(1) time.

Each rule has the form <key, value>, and it matches any query that has that same value for that same key.  It's complicated a bit by the fact that value can be an IP address with a CIDR mask.
A rule set maps each <key, value> pair to a rule.  Now, when a query comes in with metadata like username=postgres, app=commerce, controller=api, the rule set can quickly identify the rule for each of those pairs.  Hence, for this query, there are just three lookups in the rule set, regardless of how many rules are configured.
Note that a rule set only identifies rules to consider.  Each rule's budget is only checked if all its conditions match the query.  A rule set is all about checking as few rules as possible.  So, the sequence is: the rule set identifies a list of rules, that list is narrowed down to just the rules that actually match, and then the budgets for all the matching rules get checked to see if the query can proceed.
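That sequence can be sketched with an ordinary hash map. The RuleSet class below is a hypothetical Python illustration, not the data structure the extension actually uses. It indexes every <key, value> pair, gathers candidates with one lookup per metadata pair, then keeps only the rules whose conditions all match.

```python
from collections import defaultdict

class RuleSet:
    def __init__(self, rules):
        # rules: {rule_name: {key: value, ...}}; all pairs in a rule are ANDed.
        self.rules = rules
        self.index = defaultdict(list)   # (key, value) -> [rule_name, ...]
        for name, conditions in rules.items():
            for pair in conditions.items():
                self.index[pair].append(name)

    def candidates(self, metadata):
        """One hash lookup per metadata pair, however many rules exist."""
        found = []
        for pair in metadata.items():
            for name in self.index.get(pair, []):
                if name not in found:
                    found.append(name)
        return found

    def matching(self, metadata):
        """Narrow the candidates to rules whose conditions all match."""
        return [
            name for name in self.candidates(metadata)
            if all(metadata.get(k) == v for k, v in self.rules[name].items())
        ]
```

For a query tagged username=postgres, app=commerce, controller=api, a conjunction rule on app=commerce AND username=admin shows up as a candidate but is dropped in the narrowing step.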
There are three exceptions to the O(1) target for identifying rules:
Rules for the remote_address key check for a match for each mask length.  So if you have rules for ten different mask lengths, the rule set has to do as many as ten lookups to find the rule with the longest matching prefix.
Any conjunction rule — that is, a rule with multiple <key, value> pairs ANDed together — may be identified as a candidate for queries that match any one of the <key, value> pairs in the rule.  So if you have conjunction rules with overlapping <key, value> pairs, the rule set may identify several or all of them as candidates for each query.
It is possible to add multiple rules for the exact same <key, value> pair.  If you do that, any query with that exact <key, value> pair will get checked against all of those rules.
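The first exception, one lookup per mask length, looks roughly like this in Python. AddressRules is a hypothetical name, and the sketch assumes IPv4 addresses only. It uses the standard ipaddress module to mask the client address at each configured prefix length, longest first.

```python
import ipaddress

class AddressRules:
    def __init__(self):
        self.by_network = {}      # ip_network -> rule name
        self.mask_lengths = []    # distinct prefix lengths, longest first

    def add(self, cidr, rule_name):
        network = ipaddress.ip_network(cidr)
        self.by_network[network] = rule_name
        if network.prefixlen not in self.mask_lengths:
            self.mask_lengths.append(network.prefixlen)
            self.mask_lengths.sort(reverse=True)

    def lookup(self, address):
        """Return the rule with the longest matching prefix, or None."""
        for prefixlen in self.mask_lengths:
            # strict=False masks off the host bits, leaving the enclosing network.
            candidate = ipaddress.ip_network(f"{address}/{prefixlen}", strict=False)
            rule = self.by_network.get(candidate)
            if rule is not None:
                return rule
        return None
```

The number of hash lookups is bounded by the number of distinct mask lengths in use, not by the number of address rules.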
Applying new rules
Traffic Control is meant to be used both proactively and during incident response.  For incident response, it's important that rules take effect quickly.  And they do!  Rules created or modified in the UI generally take effect at all database replicas in just 1-2 seconds.  How?
Rules and budgets are stored as objects in the PlanetScale app.  Any change to Traffic Control rules made in the UI or the API gets stored as rows in the planetscale database.  Then it's serialized as JSON in the traffic_control.rules and traffic_control.budgets parameters for Postgres.  Some Postgres parameters require restarting the server, but those two don't.  So they cut the line and get sent immediately to postgresql.conf files on each database replica.  Postgres reads the new config, and each worker process parses it into a rule set as soon as it completes whatever query it's executing.  The rule set is in place before the next query begins.
One big advantage of using Postgres configuration files, rather than sending configuration over SQL connections, is robustness on a busy server.  You may want new Traffic Control rules most urgently when Postgres is using 100% of its available CPU, 100% of its worker processes, or both.  Changing config files is possible even when opening a new SQL connection and issuing statements wouldn't be.
Wrap up
Traffic Control uses the hooks and the performance measurements that Query Insights already implemented, then bolts on a system for sorting query traffic into budgets and warning or blocking queries that exceed those budgets.  Each query can be warned or blocked if it's individually too expensive, if too many other queries are already running under the same budget, or if recent and concurrent queries under the same budget have consumed too many resources in the aggregate.  Traffic Control implements a dynamic model per query pattern that leverages the existing Postgres planner to estimate the real-world cost of a query before it begins to execute.  Leaky buckets impose limits on both traffic bursts and the long-term average fraction of server resources assigned to any individual budget.
Taken as a whole, these elements implement Traffic Control, which gives developers and database administrators powerful new tools to identify, prioritize, and limit SQL traffic.]]></content>
        <summary><![CDATA[Learn how Traffic Control enforces real-time limits on Postgres queries.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing Database Traffic Control</title>
        <link href="https://planetscale.com/blog/introducing-database-traffic-control" />
        <id>https://planetscale.com/blog/introducing-database-traffic-control</id>
        <published>2026-03-23T00:00:00.000Z</published>
        <updated>2026-03-23T00:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Postgres has a fundamental gap when it comes to managing query traffic. When an unexpected spike of bad queries or a runaway workload hits your database, Postgres has no good way to fight back. It accepts every query thrown at it until performance degrades or, in the worst case, the server goes down.
Today we're introducing Database Traffic Control™, a Postgres traffic management system built into PlanetScale that lets you enforce flexible budgets on your database traffic. With Traffic Control, you decide in real time how much of your database's resources any given workload is allowed to consume, and Postgres enforces those limits.
How it works
Traffic Control allows you to create budgets that target subsets of your query traffic. You specify which queries fall into the budget using rules that match on different dimensions, including:
Query pattern: a specific query fingerprint identified in Insights
Application name: the app that sent the query
Postgres user: the database user executing queries
Custom tags: any metadata you attach to your queries via SQL comments (feature name, priority level, region, customer tier, etc.)
After deciding what queries you want in your budget, you define the resource limits the budget is allowed to utilize. You can place caps on things like CPU %, CPU burst limits, backend process concurrency, and per-query timing.
Budgets can run in warn mode to observe what would be throttled, or enforce mode to actively block queries that exceed limits. You can also switch between modes at any time.
PlanetScale Insights tracks statistics for every query running against your database. When something goes wrong, you can find the problematic query in Insights, see detailed usage data, and set up a budget to restrict it in just a few clicks.
Use cases
Traffic Control is powerful and flexible. It's helpful in a variety of scenarios, both for preventing database-related incidents and for reducing their impact.
Incident response
A rogue query is spiking CPU and degrading performance for your entire application. Find it in Insights, budget its resource usage, and stabilize your database while your team investigates and ships a fix.
Priority-based traffic shaping
Tag your queries by priority (high, medium, low) and create budgets for each tier. Core features like authentication and critical user flows can have higher limits, while lower-priority background jobs are kept from starving them out.
Isolating human from AI agent traffic
As AI-powered features drive more and more queries to your database, Traffic Control lets you set guardrails so automated traffic can't overwhelm the queries powering your human user experience.
Prioritizing paid tiers in multi-tenant apps
If you run a multi-tenant application, you can use tags to identify traffic by customer or tier, then budget accordingly. Enterprise customers stay protected during load spikes caused by trial users.
Getting started
Traffic Control is available today for all PlanetScale Postgres databases. To start using it:
Navigate to your database in the PlanetScale dashboard
Open Insights and go to the Traffic control tab
Create your first budget by selecting the tags and limits you want to enforce (Note: a database restart may be required)
Start in warn mode to observe the impact before switching to enforce
To get the most out of Traffic Control, add sqlcommenter tags to your application's queries. This gives you rich dimensions for categorizing traffic.
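As a rough illustration, a sqlcommenter-style comment is a trailing SQL comment holding sorted, URL-encoded key='value' pairs. The helper below is a hand-rolled Python sketch with a hypothetical name, add_tags. In practice you would more likely use a sqlcommenter integration for your framework or ORM, and the tag names shown are examples only.

```python
from urllib.parse import quote

def add_tags(sql, tags):
    """Append tags to a SQL statement as a sqlcommenter-style comment."""
    if not tags:
        return sql
    pairs = ",".join(
        # Keys and values are URL-encoded; values are wrapped in single quotes.
        f"{quote(str(key), safe='')}='{quote(str(value), safe='')}'"
        for key, value in sorted(tags.items())
    )
    return f"{sql} /*{pairs}*/"

tagged = add_tags(
    "SELECT id FROM orders WHERE customer_id = $1",
    {"priority": "low", "feature": "order-history"},
)
```

Here tagged ends with /*feature='order-history',priority='low'*/, which a budget rule could then match on.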
Traffic Control is also available via the PlanetScale API and CLI, so you can automate budget creation as part of your deployment pipelines.
See it in action
We built a demo tool to let you try it out and see it in action. Just clone onramp, and run onramp create to auto-generate a schema, Traffic Control budgets, and rules.
You can also learn more in the Traffic Control documentation, read the behind-the-scenes deep dive, or follow along with our detailed feature walkthrough.
Traffic Control gives your Postgres database something it has never had: the ability to defend itself. Set up your first budget today and stop worrying about the next query that tries to take your database down.]]></content>
        <summary><![CDATA[Enforce real-time limits on your Postgres query traffic to protect your database from runaway queries and unexpected load spikes.]]></summary>
      </entry>
    
      <entry>
        <title>Scaling Postgres connections with PgBouncer</title>
        <link href="https://planetscale.com/blog/scaling-postgres-connections-with-pgbouncer" />
        <id>https://planetscale.com/blog/scaling-postgres-connections-with-pgbouncer</id>
        <published>2026-03-13T00:00:00.000Z</published>
        <updated>2026-03-13T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[The Postgres process-per-connection architecture has an elegant simplicity, but hinders performance when tons of clients need to connect simultaneously.
The near-universal choice for solving this problem is PgBouncer. Though there are upcoming systems like Neki which will solve this problem in a more robust way, PgBouncer has proven itself an excellent connection pooler for Postgres.
PlanetScale gives you local PgBouncers by default, and makes it incredibly easy to add dedicated ones when needed. The challenge comes in determining the optimal configuration for your app, which is highly use-case dependent.
My aim with this article is to make every engineer well-equipped to tune PgBouncer with confidence.
Why PgBouncer?
PgBouncer is a lightweight connection pooler that sits between your application and Postgres.

PgBouncer is totally transparent, speaking the PostgreSQL wire protocol. From an app's perspective, it's just talking to a Postgres server, with PgBouncer acting as a lightweight middleman. It can multiplex thousands of client connections onto tens of Postgres connections.
But why not just make 1000s of connections directly to Postgres? Unfortunately, the Postgres process-per-connection architecture doesn't scale well. Every connection forks a dedicated OS process consuming 5+ MB of RAM and adding context-switching overhead. PgBouncer solves this by maintaining a pool of reusable server connections, reducing resource consumption and letting PostgreSQL handle far more concurrent clients than its native max_connections would otherwise allow.
It's best practice to keep the count of direct connections to Postgres small. Tens of connections for smaller instances. Hundreds for larger servers.
This is too restrictive for the way modern apps are built. We frequently want thousands of simultaneous connections to the database. PgBouncer gives Postgres that capability while keeping the total number of forked processes low.
At PlanetScale, we recommend using PgBouncer for all application traffic, only resorting to direct connections for administrative tasks and a few other narrow cases.
How to use PgBouncer
PgBouncer maintains a pool of pre-established Postgres server connections. When an app / client needs a database connection, it connects to PgBouncer, and then PgBouncer uses one of the pre-existing pooled connections to pass along the message. When the client is done, the connection returns to the pool for reuse. A single pooled Postgres connection can serve hundreds or thousands of client PgBouncer connections over its lifetime.
When all pool connections are in use, PgBouncer queues the client until one becomes available rather than rejecting it. If the wait exceeds query_wait_timeout (default: 120 seconds), the client is disconnected with an error.
Whereas the Postgres default port is 5432, PgBouncer defaults to 6432. Typically, switching from a direct connection to a PgBouncer connection is as simple as switching the port in your client connection string.
This is true on PlanetScale, with a twist: We give you three options for using PgBouncer:
Local PgBouncer
Every Postgres database includes a local PgBouncer running on the same server as the primary. Connect using the same credentials as usual, just swap the port to 6432.

Dedicated primary PgBouncer
A dedicated primary PgBouncer runs on separate nodes from Postgres, making for better HA characteristics. It connects to the local PgBouncer first, which then connects to Postgres. Client connections persist through resizes, upgrades, and most failovers. Connect by appending |your-pgbouncer-name to your username on port 6432.
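To make the two connection styles concrete, here is a small Python sketch that builds a connection URI for either the local PgBouncer or a dedicated one. The pgbouncer_dsn helper, the host name, and the credentials are made up for illustration. The percent-encoding of the | character assumes a client that decodes URIs the way libpq does, so check your driver's documentation before relying on it.

```python
from urllib.parse import quote

def pgbouncer_dsn(user, password, host, database, pgbouncer_name=None, port=6432):
    """Build a Postgres URI that goes through PgBouncer on port 6432."""
    if pgbouncer_name:
        # Dedicated PgBouncer: append |name to the username.
        user = f"{user}|{pgbouncer_name}"
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )

# Hypothetical values, for illustration only.
local = pgbouncer_dsn("app", "s3cret", "db.example.com", "main")
dedicated = pgbouncer_dsn("app", "s3cret", "db.example.com", "main",
                          pgbouncer_name="your-pgbouncer-name")
```

The only differences from a direct connection string are the port and, for a dedicated PgBouncer, the suffix on the username.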

Dedicated Replica PgBouncer
Dedicated replica PgBouncers are similar to dedicated primary ones, but connect to the replicas instead (and don't route through the local bouncer).

We recommend this if your applications make heavy use of replicas for read queries.
The three pooling modes
PgBouncer operates in one of three modes.
Session pooling assigns a server connection for the lifetime of the client connection, releasing it only when the client disconnects. This means there's a 1:1 mapping between client and server connections. It's not incredibly useful, as it does little to reduce Postgres connection count. At times, it's helpful for limiting thundering herds of connections.
Statement pooling assigns a server connection for a single SQL statement and releases it immediately after. This means multi-statement transactions are disallowed entirely. Most apps need multi-statement transactions, so this mode is not useful in 99% of cases!
Transaction pooling is the only sensible option. It assigns a server connection for the duration of a transaction, returning it to the pool the moment a COMMIT or ROLLBACK completes. This is great for most use cases, though there are a few unsupported features in this mode.
PlanetScale only supports transaction pooling, given the clear weaknesses of the other two. When you absolutely need one of those few unsupported features, keep them to a small number of direct-to-Postgres connections.
Knob all the things
PgBouncer's configuration centers on a hierarchy of connection limits. These control how many client connections are accepted, how many server connections are maintained per pool, and how those relate to PostgreSQL's own max_connections.
The connection chain works like this:

max_client_conn is the maximum number of application connections PgBouncer will accept. Because connections are lightweight in PgBouncer, this is frequently set in the 1000s.
default_pool_size controls the number of server connections per (user, database) pair that PgBouncer will make to Postgres. How to configure this depends quite a bit on your schema and access patterns. In an environment where you have a single server with many logical databases and many Postgres users, this will likely need to be set low, between 1-20. When you have a single logical database and a small number of Postgres roles, this can be set much higher.
The total potential PgBouncer ↔ Postgres connections equals num_pools × default_pool_size. With 4 users and 2 databases we get 4 × 2 = 8 pools. At a pool size of 20, PgBouncer could open up to 160 connections to PostgreSQL.
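The same arithmetic in Python, as a quick sanity check you could adapt to your own setup. The function name and the user and database names are placeholders, and the sketch assumes every user can connect to every database.

```python
from itertools import product

def max_server_connections(users, databases, default_pool_size):
    """Count (user, database) pools and the most connections they can open."""
    pools = list(product(users, databases))   # one pool per (user, database) pair
    return len(pools), len(pools) * default_pool_size

num_pools, total = max_server_connections(
    users=["app", "analytics", "export", "admin"],   # placeholder names
    databases=["main", "reporting"],
    default_pool_size=20,
)
# num_pools is 8 and total is 160, matching the example above
```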
max_db_connections and max_user_connections are hard caps that span across all PgBouncer pools for a given database or user, respectively. They act as safety valves to prevent pool arithmetic from exceeding PostgreSQL limits. These default to 0 (no limit) but can be set in some scenarios for safety.
All the above are PgBouncer settings. The key setting on the Postgres side is max_connections. The total server connections must stay below this number. We should always keep a few available direct connections reserved for admin tasks and other emergency scenarios. We NEVER want PgBouncer to use all of the connections!
All of this can be summarized in a nice formula:

In Postgres, we can explicitly set superuser_reserved_connections, which is handy for ensuring some connections are reserved for the superuser.
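One way to check these constraints is a short Python sketch like the one below. It is my own reading of the limits described above, not an official formula. The function names are hypothetical, a value of 0 means "no limit" as in PgBouncer, and the result is an upper bound that assumes every user can reach every database.

```python
def worst_case_server_connections(num_users, num_databases, default_pool_size,
                                  max_user_connections=0, max_db_connections=0):
    """Upper bound on connections PgBouncer can hold open to Postgres."""
    per_user = default_pool_size * num_databases
    if max_user_connections:                 # 0 means no per-user cap
        per_user = min(per_user, max_user_connections)
    per_db = default_pool_size * num_users
    if max_db_connections:                   # 0 means no per-database cap
        per_db = min(per_db, max_db_connections)
    # Both totals are valid bounds, so the smaller one applies.
    return min(per_user * num_users, per_db * num_databases)

def direct_headroom(max_connections, worst_case, superuser_reserved=0):
    """Connections left for direct use when PgBouncer is fully saturated."""
    return max_connections - superuser_reserved - worst_case
```

A positive headroom means admin and emergency connections stay possible even at peak. Pass your superuser_reserved_connections value to see what remains for ordinary roles.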
Tuning examples
Thinking through some practical scenarios makes this easier to reason about.
Small server
First, let's think through having a PlanetScale PS-80 (1 vCPU, 8GB RAM per node), a single multi-tenant database, and 3 distinct Postgres users we use for clients connecting through PgBouncer: one for the app servers (app), one for an analytics service (analytics), and one for a data exporter (export).
We want to keep direct Postgres connections low, so we set the Postgres max_connections=50.
Though it's a small database, we sometimes have 100s of app servers making simultaneous connections during peak load. We set the PgBouncer max_client_conn=500.
The majority of these connections come from a single Postgres user + database pair (the app-server user connecting to the main logical database). Because of this, we set default_pool_size=30 but then also set max_user_connections=30 and max_db_connections=40. This prevents connections from the app user from utilizing all of the backend connections, ensuring some are always available for the other two. This also means PgBouncer can never hold more than 40 connections to Postgres in total, ensuring 10 are always available for other services or administrative tasks.

Large server
Now for the same scenario, but with much higher traffic, requiring an M-2650 (32 vCPU, 256GB RAM per node). We'll again have the same 3 distinct Postgres users.
Just because we now have 32x the CPU power, we don't want to increase direct Postgres connections by 32x. It's still wise to keep this on the lower side, so we'll settle on max_connections=500.
We now sometimes have 1000s of app servers making simultaneous connections during peak load. We set the PgBouncer max_client_conn=10000.
As before, most connections come from the app user, so we set default_pool_size=200 but then also set max_user_connections=200 and max_db_connections=450 for similar reasons as the previous example. No one user can use more than 200 connections.
This also means PgBouncer can never hold more than 450 connections to Postgres, ensuring 50 remain available for other purposes, or if we add services requiring features of direct connections like session variables.

Single-tenant configuration
Though single-tenant architectures are generally discouraged, some organizations prefer this or have inherited such a structure. In this case, we'll assume there is a unique logical database co-located on the same Postgres server for every customer.
Say in this case we have a PlanetScale M-1280 (16 vCPUs, 128GB RAM per node), 200 distinct logical databases (for 200 tenants) and a unique Postgres role for each, for the sake of isolating permissions. There is a 1:1 mapping between each logical database and the Postgres user querying it.
This is a much different connection pattern than the previous example. We have 200 roles connecting to 200 logical databases all on the same host, and want to ensure we can scale to thousands of combined connections without hitting limits.
We'll center this around max_connections=400.
If any one tenant peaks at 20 connections, then we'll set PgBouncer's max_client_conn=5000 (includes a bit of buffer).
Recall that default_pool_size controls connections per (user, database) pool. Since each of the 200 users connects to exactly one database, there are 200 active pools. Even a modest default_pool_size results in a large number of server connections: for example, a default_pool_size of 10 would yield a theoretical max of 200 × 10 = 2,000 server connections, far exceeding max_connections=400.
We'll set default_pool_size=2 (at most 2 PgBouncer <-> Postgres connections per pool). Since we have a clean user-to-logical-database mapping, we also set max_db_connections=2 and max_user_connections=2 to enforce this per-pool cap. The maximum total PgBouncer server connections is 200 × 2 = 400, matching max_connections=400.
A single tenant can have 10s or even 100s of connections to PgBouncer, but all these will get multiplexed through at most 2 direct Postgres connections.
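Working backwards from max_connections is a handy way to pick default_pool_size when there are many pools. The Python helpers below are a hypothetical sketch of that calculation, using the numbers from this example. The reserved_direct parameter is my own addition, showing how holding back direct connections changes the answer.

```python
def largest_pool_size(max_connections, num_pools, reserved_direct=0):
    """Largest default_pool_size that keeps all pools within max_connections."""
    return (max_connections - reserved_direct) // num_pools

def client_limit_ok(max_client_conn, num_tenants, peak_per_tenant):
    """True if max_client_conn covers every tenant peaking at the same time."""
    return num_tenants * peak_per_tenant <= max_client_conn

pool_size = largest_pool_size(max_connections=400, num_pools=200)   # 2
with_reserve = largest_pool_size(400, 200, reserved_direct=20)      # 1
fits = client_limit_ok(max_client_conn=5000, num_tenants=200, peak_per_tenant=20)
```

With 200 pools, keeping even a handful of direct connections in reserve drops the per-pool size from 2 to 1, so in a layout like this it may be simpler to raise max_connections a little instead.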

App-side PgBouncers
In some deployments, it also makes sense to layer PgBouncer. You can run one PgBouncer on the app or client side to funnel many worker or process connections into a smaller egress set, then run another PgBouncer near Postgres as the final funnel into a tightly controlled number of direct database connections.

This is especially useful when you need connection pooling both close to compute and close to the database.
Multiple PgBouncers
In large-scale deployments, setting up multiple PgBouncers is useful for traffic isolation. When your web app, background workers, and other consumers all share one pool, a spike from one class of traffic can saturate the PgBouncer and delay everything else.

Giving each major consumer its own PgBouncer creates independent funnels with their own limits, pool sizing, and failure domains. That makes it easier to protect latency-sensitive app traffic from bursty worker traffic and tune each workload separately.
For an additional layer of protection, Database Traffic Control™ lets you enforce resource budgets on query traffic by pattern, application name, Postgres user, or custom tags — without needing separate infrastructure. The two approaches complement each other well: PgBouncer manages connections, Traffic Control manages resource consumption.
The key concepts
PgBouncer solves a fundamental architectural constraint in PostgreSQL: the process-per-connection model that makes every connection expensive. When working with PgBouncer, there are a few fundamental things to keep in mind:
Transaction pooling is the mode that matters. Every transaction, be it a single query or many, gets a dedicated connection from PgBouncer <-> Postgres while executing. After this, the connection can be re-used for another transaction, maybe from the same client, maybe from another.
Use PgBouncer as much as possible. If you absolutely need features that are incompatible with transaction pooling, like LISTEN, session-level SET/RESET, or SQL PREPARE/DEALLOCATE, use a direct connection. In all other cases, the small latency penalty of PgBouncer is well worth the scalability and connection safety.
The key configs to pay attention to are: max_connections (Postgres), plus max_client_conn, default_pool_size, max_db_connections, and max_user_connections (PgBouncer).
Ensure things are configured to allow for direct connections, even when all PgBouncer connections are in use.]]></content>
        <summary><![CDATA[PgBouncer is the perfect pairing for Postgres's biggest weakness: connection management. Tuning it just right is important to make this work well, and here we cover everything you need to know]]></summary>
      </entry>
    
      <entry>
        <title>Drizzle joins PlanetScale</title>
        <link href="https://planetscale.com/blog/drizzle-joins-planetscale" />
        <id>https://planetscale.com/blog/drizzle-joins-planetscale</id>
        <published>2026-03-03T00:00:00.000Z</published>
        <updated>2026-03-03T00:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[I am excited to announce that the Drizzle team is joining PlanetScale to continue their mission of building the best database tools for JavaScript and TypeScript. Drizzle’s ORM has risen to become the default thanks to the team’s obsession with performance and developer experience, two things PlanetScale also obsesses over. The alignment is clear.
The values of open source are immutable and special. Drizzle will remain an independent open source project with its own roadmap and goals. The Drizzle team joining PlanetScale ensures they can focus on the core priorities of the project, and PlanetScale will support this important work.
On behalf of the PlanetScale team, I want to thank the Drizzle team for their impact on our community. We are honored to call you colleagues.]]></content>
        <summary><![CDATA[Drizzle joins PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>Video Conferencing with Postgres</title>
        <link href="https://planetscale.com/blog/video-conferencing-with-postgres" />
        <id>https://planetscale.com/blog/video-conferencing-with-postgres</id>
        <published>2026-02-27T00:00:00.000Z</published>
        <updated>2026-02-27T00:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Yesterday on X, SpacetimeDB tweeted that they had done "the world's first video call over a database" and, in their own way, invited anyone else to give it a try.
Credit to them - it's a cool idea! In short, they built a frontend that captures audio and video from the browser's media APIs, encodes them into compact frames (PCM16LE audio, JPEG video), sends them to a database that acts as a real-time message broker, and streams them back out to the other participant's browser for playback.
Fortunately, the implementation is open sourced (https://github.com/Lethalchip/SpaceChatDB), so I figured I'd see what it looked like for PostgreSQL, the world's most popular open source database, to host the world's second video call over a database.
How it works
In my implementation, I started with the same SvelteKit frontend, added a small Node.js WebSocket server (pg-relay) in the middle, and $5 PlanetScale PostgreSQL as the database.
When you're on a video call:
Your browser captures a camera frame, encodes it as a JPEG, and sends it as a binary WebSocket message to pg-relay.
pg-relay validates that you're in an active call, then runs:
INSERT INTO video_frames (session_id, from_id, to_id, seq, width, height, jpeg)
VALUES ($1, $2, $3, $4, $5, $6, $7)


PostgreSQL writes this to the WAL (write-ahead log).
pg-relay is also running a logical replication consumer on the same database. It sees the new row appear in the replication stream, checks the to_id column, and forwards the raw JPEG bytes over WebSocket to the recipient.
The recipient's browser creates a blob URL from the JPEG and renders it in the browser.
Audio works the same way — PCM samples go into an audio_frames table and come out the other side via replication.
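The forwarding step can be sketched roughly as follows. This is not the relay's actual code: the FrameInsert and FrameSink shapes and the forwardFrame name are assumptions for illustration, and decoding the replication stream into objects is left out entirely.

```typescript
// Hypothetical shape of a decoded INSERT from the replication stream.
// The field names mirror two of the video_frames columns.
interface FrameInsert {
  to_id: string;
  jpeg: Uint8Array;
}

// Anything that can receive bytes, such as a WebSocket wrapper.
interface FrameSink {
  send(data: Uint8Array): void;
}

// Forwards the JPEG to the recipient if they are connected.
// Returns false when nobody is listening for that to_id.
function forwardFrame(
  change: FrameInsert,
  connections: Map<string, FrameSink>,
): boolean {
  const sink = connections.get(change.to_id);
  if (sink === undefined) return false;
  sink.send(change.jpeg);
  return true;
}

// Usage with an in-memory sink standing in for a real socket.
const received: Uint8Array[] = [];
const connections = new Map<string, FrameSink>();
connections.set("bob", {
  send: (data: Uint8Array) => {
    received.push(data);
  },
});
forwardFrame({ to_id: "bob", jpeg: new Uint8Array([0xff, 0xd8]) }, connections);
console.log(received.length); // 1
```

Keying the lookup on to_id means the relay never has to query the table to find out who a frame is for.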
Logical replication?
PostgreSQL's logical replication gives us a reliable and ordered change stream. You get INSERT, UPDATE, and DELETE events for every table in the publication, delivered in commit order. This means we don't have to poll Postgres with SELECT statements from the table fast enough to render 15fps video.
This means the same mechanism that pushes video frames to call participants also pushes chat messages, user presence changes, and call state transitions. When someone sends a chat message, it gets INSERTed, appears in the replication stream, and gets forwarded to every connected client. When a user disconnects, their row gets DELETEd, and everyone sees them go offline.
For video, the table looks like this:
CREATE TABLE video_frames (
    id          BIGSERIAL PRIMARY KEY,
    session_id  UUID      NOT NULL,
    from_id     TEXT      NOT NULL,
    to_id       TEXT      NOT NULL,
    seq         INT       NOT NULL,
    width       SMALLINT  NOT NULL,
    height      SMALLINT  NOT NULL,
    jpeg        BYTEA     NOT NULL,
    inserted_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

There's nothing special about this table. It's just rows with a JPEG in a BYTEA column. This incurs a modest amount of egress, but nothing a database like this can't handle.

The media to PostgreSQL pipeline
On the capture side, the browser grabs camera frames, draws them to an offscreen canvas, and calls canvas.toBlob() to get a JPEG. Audio comes from an AudioWorkletNode that collects PCM samples, resamples them to 16kHz mono, and encodes them as 16-bit little-endian integers.
Both get packed into binary WebSocket frames with a small JSON header (session ID, sequence number, recipient) and shipped to the relay.
On the playback side, incoming JPEGs get turned into blob URLs and set as the src of an <img> tag. Audio samples get decoded back to floats and scheduled on a Web Audio AudioBufferSourceNode with a small jitter buffer.
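The audio encoding step is small enough to sketch. The version below assumes float samples in the range -1 to 1 and uses a common scaling convention (32768 for negatives, 32767 for positives). The function names are illustrative and the project's own encoder may differ in detail.

```typescript
// Encodes float samples in [-1, 1] as 16-bit little-endian PCM.
function encodePcm16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the int16 range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const int16 = Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
    view.setInt16(i * 2, int16, true); // true = little-endian
  }
  return out;
}

// Decodes 16-bit little-endian PCM back to floats for playback.
function decodePcm16le(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    const int16 = view.getInt16(i * 2, true);
    out[i] = int16 < 0 ? int16 / 0x8000 : int16 / 0x7fff;
  }
  return out;
}

// Full-scale samples survive the round trip exactly; others are quantized.
const roundTrip = decodePcm16le(encodePcm16le(new Float32Array([0, 1, -1, 0.5])));
```

Each sample takes two bytes, so 16kHz mono audio works out to 32,000 bytes per second before any framing overhead.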
The whole thing runs at 640x360 @ 15fps with JPEG quality 0.65. Each frame is roughly 25-40KB, which means it's pushing about 375–600 KB/s of video through Postgres per direction.
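The arithmetic behind those figures, as a quick check. The row count assumes a two-person call in which both participants send video, which is an interpretation rather than something stated outright.

```typescript
const fps = 15;
const frameKb = { min: 25, max: 40 };

// Video bandwidth per direction, in KB/s.
const videoKbPerSec = { min: fps * frameKb.min, max: fps * frameKb.max };
console.log(videoKbPerSec); // { min: 375, max: 600 }

// Video rows written per hour, assuming one stream in each direction.
const rowsPerHour = fps * 3600 * 2;
console.log(rowsPerHour); // 108000
```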
Accumulating rows
I didn't want to keep every video frame forever. At 15fps, you'd accumulate about 108,000 rows/hour per active call. So there's a cleanup job that runs every 2 seconds and prunes frames older than 5 seconds:
DELETE FROM audio_frames WHERE inserted_at < NOW() - INTERVAL '5 seconds';
DELETE FROM video_frames WHERE inserted_at < NOW() - INTERVAL '5 seconds';

This means that for every call, we'd expect to have about 5-7 seconds of frames in the table at any given time, or about 150 rows total. Sure enough:
SELECT
  from_id,
  COUNT(*) AS frames_5s,
  ROUND(COUNT(*) / 5.0, 1) AS approx_fps
FROM video_frames
WHERE inserted_at >= NOW() - INTERVAL '5 seconds'
GROUP BY from_id
ORDER BY frames_5s DESC;
                             from_id                              | frames_5s | approx_fps
------------------------------------------------------------------+-----------+------------
 06cad97a128947a58e8ff754ec1171d4200c4d774b5c43c9b7637f11bba61036 |        76 |       15.2
 4f984feaee1042939383b0fffb3f1fc172d28aa92b654551a02c618788021995 |        76 |       15.2
(2 rows)

Look at that - our own $5 PostgreSQL is streaming bidirectional 15fps video!
What's really cool about this is that if we wanted, we could keep the frames around. Each JPEG is being durably persisted to PostgreSQL, crash-safe, replicatable and ready for querying later on. My $5 PostgreSQL has the throughput to store hours of video that I can combine later.
I can even pull one of the frames right out of the database and render it in my terminal:

Could we have done this another way?
LISTEN/NOTIFY was my first idea. Postgres has a built-in pub/sub mechanism — NOTIFY on a channel, LISTEN on the other end, messages arrive in real time. We could skip the media tables entirely and just blast JPEG bytes through a notification channel.
The problem is an enforced 8KB payload limit. A video frame at 640x360 is 25-40KB. We'd have to chunk every frame into 4-5 separate notifications, reassemble them on the other end, handle ordering, handle dropped chunks — at which point you've built a worse TCP on top of a notification system. Audio frames would mostly fit under 8KB, so we could do a hybrid approach, but splitting the media pipeline across two different transport mechanisms is the kind of complexity I wasn't interested in.
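To make that overhead concrete, here is a rough sketch of just the splitting half of the approach that was rejected. It assumes a flat 8000-byte budget per notification and ignores the chunk header and the text-safe encoding a real NOTIFY payload would need, both of which would raise the chunk count. The Chunk type and chunkFrame name are hypothetical.

```typescript
// Assumed payload budget per notification, in bytes.
const MAX_CHUNK_BYTES = 8000;

interface Chunk {
  index: number; // position of this chunk within the frame
  total: number; // how many chunks make up the frame
  data: Uint8Array;
}

// Splits one frame into numbered chunks so the receiver can reassemble it.
function chunkFrame(frame: Uint8Array, maxBytes: number = MAX_CHUNK_BYTES): Chunk[] {
  const total = Math.ceil(frame.length / maxBytes);
  const chunks: Chunk[] = [];
  for (let index = 0; index < total; index++) {
    const start = index * maxBytes;
    chunks.push({ index, total, data: frame.subarray(start, start + maxBytes) });
  }
  return chunks;
}

console.log(chunkFrame(new Uint8Array(25_000)).length); // 4
console.log(chunkFrame(new Uint8Array(40_000)).length); // 5
```

Reassembly, ordering, and handling of missing chunks would all still have to be written on the receiving side.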
Unlogged tables go the other direction. Instead of changing how we get data out of Postgres, they change how data goes in. Unlogged tables skip the WAL entirely — no write-ahead logging, no fsync, no crash recovery. Inserts are faster because Postgres isn't making durability guarantees about video frames.
I didn't like that because logical replication reads from the WAL. If the table doesn't write to the WAL, it doesn't appear in the replication stream. To make this work, we'd have to fall back to polling — SELECT * FROM video_frames WHERE seq > $1 in a loop. This might have worked fine, maybe better - but something about rendering video from a polling loop of SELECT * didn't feel good.
How'd it go?
You be the judge. It exceeded my expectations.

Our $5 PlanetScale PostgreSQL was able to keep up with the insert rate of live video and audio, and browsers are optimized enough that they can take raw JPEG frames and turn them into video pretty convincingly.
The only adjustments I made after I got it working the first time were adding some boundaries to keep audio in sync. Video frames render instantly (we just swap the image), but audio needs to be buffered and scheduled ahead of time to avoid gaps. Getting them to stay in sync required clamping the audio scheduling buffer so it can't drift too far ahead of real time:
  const now = audioCtx.currentTime;
  const clamped = nextPlayTime > now + 0.15 ? now + 0.02 : nextPlayTime;
  const startAt = Math.max(clamped, now + 0.02);
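
To see what the clamp does in each case, the same logic can be wrapped in a pure function and fed a few values. The function name and the named constants are illustrative. The thresholds are the ones from the snippet.

```typescript
// Same logic as the snippet, parameterised so it can run without an
// AudioContext. All times are in seconds.
function scheduleStart(nextPlayTime: number, now: number): number {
  const maxLead = 0.15; // furthest ahead of real time we allow
  const minLead = 0.02; // smallest safety margin before playback
  const clamped = nextPlayTime > now + maxLead ? now + minLead : nextPlayTime;
  return Math.max(clamped, now + minLead);
}

const withinWindow = scheduleStart(10.05, 10); // kept as scheduled
const tooFarAhead = scheduleStart(10.5, 10); // reset to just after now
const alreadyLate = scheduleStart(9.9, 10); // pushed forward to just after now
```

In the second and third cases the result is now + 0.02, so the buffer either snaps back toward real time or skips past a gap instead of drifting.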

Should you do this?
No! Use WebRTC!
But if you want to understand how logical replication works and want to see how far you can push Postgres as a general-purpose real-time backend, this is a fun way to find out. The entire relay server is about 400 lines of TypeScript.
My fork is at github.com/nickvanw/PgVideoChat. If only Alexander Graham Bell could see us now.]]></content>
        <summary><![CDATA[Stream real time video and audio through PostgreSQL on PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>Faster PlanetScale Postgres connections with Cloudflare Hyperdrive</title>
        <link href="https://planetscale.com/blog/cloudflare-hyperdrive-real-time" />
        <id>https://planetscale.com/blog/cloudflare-hyperdrive-real-time</id>
        <published>2026-02-19T00:00:00.000Z</published>
        <updated>2026-02-19T00:00:00.000Z</updated>
        
        <author>
          <name>Simeon Griggs</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Cloudflare recently launched Hyperdrive, which provides efficient pooling and fast queries for any MySQL or Postgres database—making optimizing how your application connects to your database incredibly easy. Once you're operating inside the Cloudflare network you get access to the full suite of features, including WebSockets—which we can use to build a real-time experience.
In this post, I'm going to outline the decisions made along the way while building a real-time application backed by PlanetScale Postgres Metal.
You may note an absence of “how to” code snippets for application code or SQL queries in this post. I’m anticipating that you’re building by describing what you want to an LLM, and so this blog post is describing what I did (and didn’t) do.
The infrastructure stack
PlanetScale Postgres Metal is the benchmarked fastest cloud Postgres, so it makes sense to pair it with the fastest cloud services. PlanetScale Metal databases are powered by blazing-fast, locally-attached NVMe SSD drives instead of network-attached storage.
Cloudflare Hyperdrive enables fast access and automated connection pooling to Postgres and MySQL databases, making it an excellent, almost zero-config way to build high-performance applications. When hosted on Cloudflare Workers you unlock access to network benefits of being inside of the Cloudflare global network, including WebSockets, which can power real-time applications in combination with Durable Objects.
That paragraph contains a lot of names of things—it's helpful to unpack each of these and understand how they intersect.
The Cloudflare global network is infrastructure that runs all Cloudflare services with physical locations within ~50ms of 95% of the world's internet-connected population, along with faster-than-average connections to almost every service and cloud provider.
Cloudflare Hyperdrive automates away the headaches of connection latency and pooling to turn your single-origin database into a globally distributed and cached source of truth. It is made of two separate yet equally important parts.
The edge component runs globally from within every Cloudflare data center and prepares the 7 round-trip steps of creating a connection to your database within the Cloudflare global network.
The connection pool is enabled physically close to your database and maintains warm connections to Postgres which are automatically given to incoming requests.
Cloudflare Workers is a platform for globally distributing and running your application. While it may be beneficial to run your application close to your users, performance could be improved by moving the application closer to the database through "smart placement," which we'll cover later in this post.
Durable Objects are stateful, single-origin request handlers for coordinating tasks. Each one lives in one data center and holds in-memory state—like a list of open WebSocket connections.
WebSockets are supported by Cloudflare and in this post we'll use them to coordinate real-time updates from the PlanetScale Postgres database via a Durable Object.
Hyperdrive vs direct connections
If you're interested in a demo comparing Hyperdrive vs direct connections to the database, Jilles Soeters at Cloudflare has a great demo you can check out: Globally Fast Applications with Cloudflare Hyperdrive and PlanetScale.
The frontend stack
Out of interest, I'm using React Router 7 as my front-end framework along with Tailwind CSS. I won't go into detail here because I'm not sure that these choices matter as much as they used to. I think React is an excellent choice because of the support it has for fast user interfaces.
Cloudflare has a React Router 7 starter template preconfigured for deployment to Workers.
I've historically enjoyed working with React Router 7, but while building this demo, rarely did I look at the internals of this application to see how it is written at the framework level.
Piecing the puzzle together
These are all excellent choices for building fast applications. But how and where you put these pieces together still greatly impacts the performance.
For example, if we're going to use a Durable Object to broadcast updates over WebSockets, it may be tempting to make the Durable Object the write path to the database. However, this negatively impacts performance. Durable Objects are single-threaded and hosted in a single location, making them a bad candidate for the write path.
Instead the Workers will send transactions to the database via the Hyperdrive connection.
There's also a question of when to leverage the global network or not. Globally distributing the application via Workers places it around the world, but these locations could be far from the database and the Hyperdrive connection pool—which both exist in a single origin, putting any latency between the Worker and your database. Through "smart placement" the Worker can be moved to a single location, moving latency between the user and the Worker while greatly reducing the latency to the database.
In this application I’ve chosen not to use smart placement and keep the Worker closer to every user. Monitor your own application’s performance and needs closely to see if keeping the connection between Worker and database is more useful than between Worker and user.
Finally, you need to determine whether WebSockets alone are considered a bulletproof source of real-time information in your application. In this demo, we'll leave out reconnection logic, which could help with any failed WebSocket updates. And we could have added periodic polling to run in addition to WebSockets to manually catch up any missed updates.
Essentially, we've got to draw the line somewhere. This app is going to demonstrate extreme performance, but as always your mileage may vary and how suitable this is for you will depend on the workload of your application.
Problem solving
It is one thing to build an app which returns fast queries through Hyperdrive's connection pooling and caching. I want this demo to include more.
In a nutshell, our demo app recreates the appearance of much of the functionality of a "prediction market," where users can buy options of either a "Yes" or "No" position on each market. The price of each option fluctuates based on the current volume of purchases on either position.
This functionality creates several layers of complexity which will need to leverage all of the previously mentioned Cloudflare functions.
Fast reads: Prediction markets are based on the outcome of real, live events. Therefore, the latency to retrieve the most up-to-date information is critical.
Real-time updates: Users should not have to refresh the page in order to see up-to-date information and should be able to observe changes in real-time such as trends and purchase activity across the markets.
Fast writes: Since the price of options can be volatile, it is important that transactions are sent with the currently correct price as well as updated in the database fast enough to broadcast updates to all other users.
Step 1: Connecting Hyperdrive
You can connect Hyperdrive to a database through the Cloudflare dashboard or CLI as well as during the creation of a PlanetScale Postgres database.
In the PlanetScale dashboard I'll select the smallest Metal database, which starts at $50/month, though the $5/month non-Metal database could be an option for this demo too.
Once the cluster is ready I can choose to create a new role and follow the instructions to create a link between PlanetScale and Cloudflare.
The result is a Hyperdrive configuration, which I reference by ID in my application's wrangler.jsonc file:
{
  // ...
  "hyperdrive": [
    {
      "binding": "HYPERDRIVE_PLANETSCALE",
      "id": "e4fe..."
    }
  ]
}

Note that during local development reads and writes to the remote database will be slower than they are in production. You can target a database on your machine during development for faster reads and writes, but you will not experience pooling and caching optimizations.
After some LLM planning I have the scaffolding of my app to do the very basics: read and write to a database. Deployment is as simple as running a command from the terminal:
npx wrangler deploy

More terminology: What's "Wrangler?" It's the name for Cloudflare's CLI and local development environment. It is a dependency which came preinstalled with the framework template.
The alternative to Hyperdrive
We could have built an application which makes direct connections to the database primary for writes, and connections to the replicas for queries. While this can work and be optimized to be more efficient, we would have lost Hyperdrive's benefits around connection pooling and reuse, and query response caching.
Instead, with Hyperdrive we have a highly optimized database connection, just not one we had to build ourselves.
Multi-environment setup
When building there's a temptation to build everything as fast as possible using the defaults. But planning now for multiple environments will help you avoid surprises later.
Create a database branch in PlanetScale, either via the dashboard or the CLI, and configure the connection string in your wrangler.jsonc file. Add an env block to your configuration file:
{
  // ... other settings
  "env": {
    "development": {
      "hyperdrive": [
        {
          "binding": "HYPERDRIVE_PLANETSCALE",
          // Development branch database
          "id": "84c4..."
        }
      ]
    }
  }
}

Your local development server should target this environment by setting the CLOUDFLARE_ENV environment variable:
{
  "scripts": {
    "dev": "CLOUDFLARE_ENV=development react-router dev"
    //... other scripts
  }
}

Now the deployed Worker targets your production database on the main branch, while you target the development branch locally.
Step 2: Real-time with WebSockets
Our application simply wouldn't work if the latest data wasn't always rendered. The logic of our application should ensure at the database level that option purchases won't proceed if the purchase price as seen by the user is not a valid price at the time the transaction is written to the database.
Since the prices of options are volatile at peak times and so can change quickly, the latest data must be pushed to the browser.
This is where we need to configure a Durable Object and Cloudflare WebSockets. The Durable Object will set up the WebSocket connection to receive and send events.
In this demo, the Worker sends the transaction to the database via the Hyperdrive connection. When that transaction completes it pings the Durable Object via the WebSocket connection to fan out that update to all other connected browsers.
Reliability boundaries
One useful principle in real-time apps is to decide what is authoritative and what is just fast. In this architecture, Postgres is the source of truth and WebSockets are the low-latency notification layer.
That distinction matters. WebSocket messages can be delayed or dropped due to normal network behavior, but database writes are still durable. So the contract for clients should be: "updates are immediate most of the time, and eventually correct all of the time."
I’m using WebSockets as a fast notification channel, but not the authority. The PlanetScale Postgres database is the source of truth for all queries and transactions.
What we intentionally did not build (yet)
To keep this guide focused on principles and direction, I left out several production hardening layers which could’ve also been implemented.
Replay on reconnect: A reconnecting client could ask for events since a known cursor. We skipped this for simplicity.
Queue-backed fanout: A queue can improve durability and retries between write completion and broadcast. We skipped this to avoid extra moving parts. Cloudflare has a Queueing service, too!
Polling reconciliation: Periodic polling can catch any missed updates from WebSockets. We noted it earlier, but didn't implement it in the demo.
Each of these can make the system more resilient. They also increase operational complexity, and for this demo I chose to show the fastest path to a working real-time architecture first.
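As an illustration of the first item, replay on reconnect could look something like the sketch below. Nothing here is from the demo: the MarketEvent shape, the ReplayBuffer class, and the assumption that cursors increase by one per event are all hypothetical.

```typescript
// Hypothetical event shape: each broadcast carries an increasing cursor.
interface MarketEvent {
  cursor: number;
  payload: string;
}

// Keeps the most recent events so reconnecting clients can catch up.
class ReplayBuffer {
  private events: MarketEvent[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(event: MarketEvent): void {
    this.events.push(event);
    // Drop the oldest event once the buffer is over capacity.
    if (this.events.length > this.capacity) this.events.shift();
  }

  // Returns events newer than the client's last seen cursor, or null when
  // the gap reaches back past the buffer and the client should refetch
  // from Postgres, the source of truth.
  since(lastSeen: number): MarketEvent[] | null {
    const oldest = this.events[0];
    if (oldest !== undefined && lastSeen < oldest.cursor - 1) return null;
    return this.events.filter((e) => e.cursor > lastSeen);
  }
}

const buffer = new ReplayBuffer(2);
buffer.push({ cursor: 1, payload: "a" });
buffer.push({ cursor: 2, payload: "b" });
buffer.push({ cursor: 3, payload: "c" });
console.log(buffer.since(2)); // [ { cursor: 3, payload: 'c' } ]
console.log(buffer.since(0)); // null
```

Returning null instead of a partial list keeps the contract honest: the client either gets every missed event or is told to reload from the database.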
Step 3: Increasing load
We now have a working application with support for multiple environments, transactions and real-time updates. But we're not building a toy, we want to prove this application can handle some scale. Many users, a lot of transactions, and a lot of markets.
To simulate load, I've used a seed script that creates fake users and markets, then fans out a large volume of initial transactions across those markets.
Then in the application I have a Start Trading button that enables a client-side simulator that continuously posts trades at a fixed cadence to random open markets. Each request carries expected price/version data and slippage tolerance, so the backend can reject stale quotes while successful writes immediately broadcast over WebSockets to every connected browser.
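The stale-quote check can be sketched along these lines. The demo's exact rule is not shown in this post, so the field names, the use of integer cents, and the relative-drift comparison are all assumptions. In practice the check would need to run in the same database transaction as the write.

```typescript
interface TradeRequest {
  expectedPrice: number; // price the user saw, in cents (assumed > 0)
  expectedVersion: number; // market version the user saw
  slippageTolerance: number; // fraction, e.g. 0.05 for 5%
}

interface MarketState {
  price: number; // current price, in cents
  version: number; // incremented on every accepted trade
}

// Accepts the trade if the market has not moved since the quote, or has
// moved by no more than the slippage the client agreed to.
function isQuoteAcceptable(req: TradeRequest, market: MarketState): boolean {
  if (req.expectedVersion === market.version) return true;
  const drift = Math.abs(market.price - req.expectedPrice) / req.expectedPrice;
  return drift <= req.slippageTolerance;
}

const quote: TradeRequest = { expectedPrice: 50, expectedVersion: 7, slippageTolerance: 0.05 };
console.log(isQuoteAcceptable(quote, { price: 50, version: 7 })); // true
console.log(isQuoteAcceptable(quote, { price: 52, version: 8 })); // true
console.log(isQuoteAcceptable(quote, { price: 55, version: 9 })); // false
```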
This isn't a replacement for faking traffic from around the world, but it does give us a good sense of what it looks like to create many transactions in one tab with another tab open, which is receiving the same live updates. Because of the way we have configured the real-time updates, the user that sends the transaction will get their feedback faster than users who are only listening. It's a small trade-off I'm considering acceptable in this demo. See the earlier notes about making real-time more robust through queues and polling.
Monitoring PlanetScale performance
Cloudflare Hyperdrive made handling connections and requests fast and easy, but there's still plenty we can do on the PlanetScale side to improve performance.
Query Insights can help you understand how your database is performing under load. As soon as we've simulated traffic, there may be some insights for our queries to be improved. You can access these insights from the PlanetScale dashboard or the CLI.
Or better yet, with the PlanetScale MCP server, you can just ask your LLM. It has the capability of reading insights data and would then be able to make improvements or act on recommendations such as creating indexes.
Implementing database best practices
It's extremely likely that your application has a lot of AI-authored code. But this doesn't mean it has to be bad. PlanetScale offers a package of agent skills that teach your LLM PlanetScale's recommended practices for data structures and query writing. Make it a habit to routinely audit your application against these skills, ideally before your application is a runaway success.
Increasing capacity
Should your application be so successful as to overwhelm your current cluster size, you can resize your cluster at any time through the PlanetScale dashboard or CLI. Increase CPU, memory, storage or add more replicas as needed.
You can get started with Cloudflare Workers, WebSockets, Durable Objects and Hyperdrive for free. See their documentation for resource limits and upgrade options.
Sharding Durable Objects
A lot of what I built for this demonstration is about global distribution. However, I'm still relying on a single Durable Object to send WebSocket updates to all users. Cloudflare's own documentation gives guidance that you can scale Durable Objects horizontally—sharded with a key—should you come close to exhausting their allocated resources.
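One way to pick a shard is to hash a stable key, such as a market ID, into a fixed number of buckets. The sketch below uses an FNV-1a style hash over the string's UTF-16 code units. The shard name format and both function names are made up for illustration. A name like this could then be used to look up a Durable Object, for example with idFromName.

```typescript
// FNV-1a style hash: small, deterministic, and returns an unsigned 32-bit value.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned result
}

// Maps a key (for example a market ID) to one of shardCount shard names.
function shardName(key: string, shardCount: number): string {
  return `broadcast-shard-${fnv1a(key) % shardCount}`;
}

// The same key always lands on the same shard.
console.log(shardName("market-42", 8) === shardName("market-42", 8)); // true
```

Sharding by market keeps every subscriber to a given market on one object, at the cost of a client needing one connection per shard it cares about.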
Conclusion
This demo shows how to build a real-time application with Cloudflare and PlanetScale. It covers the basics of connecting to the database, setting up real-time updates, and simulating load. It also covers some of the best practices for building a real-time application, such as using a Durable Object to coordinate real-time updates and using a WebSocket to broadcast updates to all connected browsers.
As always, you may need to adjust the approaches in this demo for your own application, and introduce some additional features to make your application even more robust. But as you can see, the time to a quite complicated proof-of-concept which is relatively sound has never been lower and all built on a stack that's ready to scale.]]></content>
        <summary><![CDATA[Build a real-time application with PlanetScale and the Cloudflare global network. Infrastructure choices you won't need to migrate away from once you hit scale.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing the PlanetScale MCP server</title>
        <link href="https://planetscale.com/blog/introducing-planetscale-mcp-server" />
        <id>https://planetscale.com/blog/introducing-planetscale-mcp-server</id>
        <published>2026-01-29T00:00:00.000Z</published>
        <updated>2026-01-29T00:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today we're releasing the new PlanetScale MCP server, bringing your database directly into your AI tools. With the Model Context Protocol (MCP), Claude, Cursor, Open Code, and other AI tools can now connect to the PlanetScale API to help you improve and better understand your database.
What is the PlanetScale MCP server?
The PlanetScale MCP server is a hosted MCP server that exposes your PlanetScale organizations, databases, branches, schema, and Insights data to MCP-compatible tools. It's authenticated via OAuth for configurable access to permissions and scopes, and accessible from any client that supports MCP servers.
Configurable access
You have full control over which databases the MCP server is able to connect to and access with configurable read-only or full-access permissions which can be set for both production and development database branches.
We advise caution when giving LLMs write access to any production database. Always carefully review queries before execution.
Available tools
The MCP server exposes the following tools:
get_insights — Access query performance data and patterns from PlanetScale Insights
list_organizations / get_organization — List and inspect your PlanetScale organizations
list_databases / get_database — List and inspect databases within an organization
list_branches / get_branch / get_branch_schema — List and inspect database branches and their schemas
list_regions_for_organization — List the regions available for an organization
list_cluster_size_skus — List available cluster sizes for an organization (filter by engine, include rates, or specify a region)
search_documentation — Search across PlanetScale for documentation, code examples, API references, and guides
Available if granted permission:
execute_read_query — Run read queries (automatically routed to replicas when available)
execute_write_query — Run write queries with built-in safety checks
list_invoices / get_invoice_line_items — View billing and invoice details
We will be actively adding to this list of tools. Please let us know how you're using the MCP and what additional data would be useful to you.
Safe and intelligent query execution
Our MCP server includes built-in safeguards and optimizations for running queries:
Automatic replica routing — Read-only queries are automatically run against a replica if your database has replicas configured.
Ephemeral credentials — Each query uses short-lived credentials that are created on demand and deleted immediately after execution.
Built-in query tracking — All queries include source=planetscale-mcp SQL comments, making them easy to identify by tags in PlanetScale Insights.
Destructive query protection — UPDATE or DELETE statements without a WHERE clause are blocked, and TRUNCATE is not allowed.
Human confirmation for DDL — Any schema-changing operations (CREATE, DROP, ALTER, etc.) prompt the LLM to request human confirmation before proceeding.
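To illustrate the destructive query rule, here is a deliberately naive check. It is not how the MCP server implements it. A real guard would parse the SQL, since string literals, comments, and CTEs can all fool pattern matching. The function name is hypothetical.

```typescript
// Naive illustration only: flags TRUNCATE, and UPDATE or DELETE statements
// that contain no WHERE keyword anywhere.
function isBlockedStatement(sql: string): boolean {
  const normalized = sql.trim().replace(/\s+/g, " ").toUpperCase();
  if (/^TRUNCATE\b/.test(normalized)) return true;
  const isUpdateOrDelete = /^(UPDATE|DELETE)\b/.test(normalized);
  return isUpdateOrDelete && !/\bWHERE\b/.test(normalized);
}

console.log(isBlockedStatement("DELETE FROM users")); // true
console.log(isBlockedStatement("DELETE FROM users WHERE id = 1")); // false
console.log(isBlockedStatement("truncate table users")); // true
console.log(isBlockedStatement("SELECT * FROM users")); // false
```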
PlanetScale MCP use cases
The MCP server gives your AI tools direct access to your database metadata, query patterns, and performance metrics. Here's how you can use it:
Optimize your database schema and queries
Use our MCP server to analyze your database structure, identify bottlenecks, and suggest schema improvements. Ask questions like "Why is this query slow?" or "How should I index this table?" and get answers based on your actual database schema and query patterns.
Use natural language to learn about your data
Pull metrics like daily signups, active users, or conversion rates without writing SQL. Read-only queries are automatically routed to a replica. Get instant insights into how your product is performing and identify trends early.
Debug with full context
Our MCP server has direct access to your schema, indexes, and query patterns. Combining this production context with your codebase gives the LLM the data it needs to find problems and suggest fixes.
Get started
The PlanetScale MCP server is available now. Check out the setup guide to connect your AI tools to your database in minutes.]]></content>
        <summary><![CDATA[Connect Claude, Cursor, and other AI tools directly to your PlanetScale database to optimize schemas, debug queries, and monitor app performance.]]></summary>
      </entry>
    
      <entry>
        <title>Database Transactions</title>
        <link href="https://planetscale.com/blog/database-transactions" />
        <id>https://planetscale.com/blog/database-transactions</id>
        <published>2026-01-14T00:00:00.000Z</published>
        <updated>2026-01-14T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Transactions are fundamental to how SQL databases work. Trillions of transactions execute every single day, across the thousands of applications that rely on SQL databases.
What is a database transaction?
A transaction is a sequence of actions that we want to perform on a database as a single, atomic operation. An individual transaction can include a combination of reading, creating, updating, and removing data.
In MySQL and Postgres, we begin a new transaction with begin; and end it with commit;. Between these two commands, any number of SQL queries that search and manipulate data can be executed.
The example above shows a transaction begin, three query executions, then the commit. The act of committing is what atomically applies all of the changes made by those SQL statements.
There are some situations where transactions do not commit. This is sometimes due to unexpected events in the physical world, like a hard drive failure or power outage. Databases like MySQL and Postgres are designed to correctly handle many of these unexpected scenarios, using disaster recovery techniques. Postgres, for example, handles this via its write-ahead log (WAL) mechanism.
There are also times when we want to intentionally undo a partially-executed transaction. This happens when, midway through a transaction, we encounter missing or unexpected data or get a cancellation request from a client. For this, databases support the rollback; command.
In the example above, the transaction made several modifications to the database, but those changes were isolated from all other ongoing queries and transactions. Before the transaction committed, we decided to roll back, undoing all changes and leaving the database unaltered by this transaction.
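Here is a small runnable sketch of rollback versus commit. As an assumption made to keep it self-contained, it uses Python's built-in sqlite3 module as a stand-in for MySQL or Postgres; the begin, commit, and rollback statements behave the same way for this simple case, and the users table is made up for illustration.

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK statements below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ben')")

# First attempt: change the name, then change our mind.
conn.execute("BEGIN")
conn.execute("UPDATE users SET name = 'joe' WHERE id = 1")
conn.execute("ROLLBACK")
name_after_rollback = conn.execute(
    "SELECT name FROM users WHERE id = 1"
).fetchone()[0]

# Second attempt: the same change, this time committed.
conn.execute("BEGIN")
conn.execute("UPDATE users SET name = 'joe' WHERE id = 1")
conn.execute("COMMIT")
name_after_commit = conn.execute(
    "SELECT name FROM users WHERE id = 1"
).fetchone()[0]

print(name_after_rollback, name_after_commit)
```

The rolled-back update leaves the original name in place; only the committed one sticks.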
A key reason transactions are useful is to allow execution of many queries simultaneously without them interfering with each other. Below you can see a scenario with two distinct sessions connected to the same database. Session A starts a transaction, selects data, updates it, selects again, and then commits. Session B selects that same data twice during a transaction and again after both of the transactions have completed.
Session B does not see the name update from ben to joe until after Session A commits the transaction.
Consider the same sequence of events, except instead of committing the transaction in Session A, we roll back.
The second session never sees the effect of any changes made by the first, due to the rollback. This is a nice segue into another important concept in transactions: consistent reads.
Consistent Reads
During a transaction's execution, we would like it to have a consistent view of the database. This means that even if another transaction simultaneously adds, removes, or updates information, our transaction should get its own isolated view of the data, unaffected by these external changes, until the transaction commits.
MySQL and Postgres both support this capability when operating in REPEATABLE READ mode (plus all stricter modes, too). However, they each take different approaches to achieving this same goal.
Postgres handles this with multi-versioning of rows. Every time a row is inserted or updated, Postgres creates a new row version along with metadata to keep track of which transactions can access the new version. MySQL handles this with an undo log. Changes to rows immediately overwrite old versions, but a record of modifications is maintained in a log file, in case the old versions need to be reconstructed.
Let's take a close look at each.
Multi-row versioning in Postgres
Below, you'll see a simple user table on the left and a sequence of statements in Session A on the right. Watch what happens as the statements get executed.
Let's break down what happened:
begin starts a new transaction
An update is made to the user with ID 4, changing the name from "liz" to "aly". This causes a new version of the row to be created, while the other is maintained.
The old version of the row had its xmax set to 10 (xmax = max transaction ID)
The new version of the row also had its xmin set to 10 (xmin = min transaction ID)
The transaction commits, making the update visible to the broader database
But now we have two versions of the row with ID = 4. Ummm... that's odd! The key here is xmin and xmax.
xmin stores the ID of the transaction that created a row version, and xmax is the ID of the transaction that caused a replacement row to be created. Postgres uses these to determine which row version each transaction sees.
Let's look at Session A again, but this time with an additional Session B running simultaneously.
Before the commit, Session B could not see Session A's modification. It sees the name as "liz" while Session A sees "aly" within the transaction. At this stage, this has nothing to do with xmin and xmax; it is simply because other transactions cannot see uncommitted data. After Session A commits, Session B can now see the new name of "aly" because the data is committed and the transaction ID is greater than 10.
If the transaction instead gets a rollback, those row changes do not get applied, leaving the database in a state as if the transaction never began in the first place.
This is a simple scenario. Only one of the transactions modifies data. Session B only does select statements! When both simultaneously modify data, each one will be able to "see" the modifications it made, but these changes won't bleed out into other transactions until commit. Here's an example where each transaction selects data, updates data, selects again, commits, and finally both do a final select.
The concurrent transactions cannot see each other's changes until the data is committed. The same mechanisms are used to control data visibility when there are hundreds of simultaneous transactions on busy Postgres databases.
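The visibility rules above can be sketched as a toy model in Python. This is a deliberate simplification and not how Postgres is implemented: real visibility checks use snapshots and more metadata. The RowVersion class, the is_visible function, and the set of committed transaction IDs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class RowVersion:
    id: int
    name: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that replaced it, if any


def is_visible(v: RowVersion, reader_txid: int, committed: Set[int]) -> bool:
    """A version is visible if its creator is the reader or has committed,
    and it has not been replaced by the reader or a committed transaction."""
    created = v.xmin == reader_txid or v.xmin in committed
    if not created:
        return False
    if v.xmax is None:
        return True
    replaced = v.xmax == reader_txid or v.xmax in committed
    return not replaced


def visible_name(versions: List[RowVersion], reader_txid: int,
                 committed: Set[int]) -> Optional[str]:
    for v in versions:
        if is_visible(v, reader_txid, committed):
            return v.name
    return None


# Transaction 10 renamed user 4 from "liz" to "aly"; transaction 3 created the row.
versions = [
    RowVersion(4, "liz", xmin=3, xmax=10),
    RowVersion(4, "aly", xmin=10),
]
```

Before transaction 10 commits, it sees its own "aly" while every other reader still gets "liz". Once 10 is added to the committed set, all readers see "aly".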
Before we move on to MySQL, one more important note. What happens to all those duplicated rows? Over time, we can end up with thousands of duplicate rows that are no longer needed. There are several things Postgres does to mitigate this issue, but I'll focus on the VACUUM FULL command. When run, this purges versions of rows that are so old that we know no transactions will need them going forward. It compacts the table in the process.
Notice that when the vacuum full command executes, all unused rows are eliminated, and the gaps in the table are compressed, reclaiming the unused space.
Undo log in MySQL
MySQL achieves the consistent read behavior using a different approach. Instead of keeping many copies of each row, MySQL immediately overwrites old row data with new row data when modified. This means the rows require less maintenance over time (in other words, we don't need to do vacuuming like Postgres).
However, MySQL still needs the ability to show different versions of a row to different transactions. For this, MySQL uses an undo log: a log of recently made row modifications, allowing a transaction to reconstruct past versions on-the-fly.
Notice how each MySQL row has two metadata columns (in blue). These keep track of the ID of the transaction that updated the row most recently (xid), and a reference to the most recent modification in the undo log (ptr).
When there are simultaneous transactions, transaction A may clobber the version of a row that transaction B needs to see. Transaction B can see the previous version(s) of the row by checking the undo log, which stores old values so long as any running transaction may need to see them.
There can even be several undo log records in the log for the same row simultaneously. In such a case, MySQL will choose the correct version based on transaction identifiers.
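Here is a toy Python model of that idea: the row is overwritten in place, and a reader walks the undo chain until it reaches a version it is allowed to see. It is a simplified sketch, not InnoDB's actual data structures. The xid and ptr fields mirror the metadata columns described above, while the UndoRecord class, the update and read helpers, and the set of visible transaction IDs are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class UndoRecord:
    name: str            # value the row had before it was overwritten
    xid: int             # transaction that had written that older value
    prev: Optional[int]  # index of the next-older undo record, if any


@dataclass
class Row:
    name: str
    xid: int             # transaction that most recently updated the row
    ptr: Optional[int]   # index of the newest undo record for this row


undo_log: List[UndoRecord] = []


def update(row: Row, new_name: str, txid: int) -> None:
    """Overwrite the row in place, saving the old version to the undo log."""
    undo_log.append(UndoRecord(row.name, row.xid, row.ptr))
    row.name, row.xid, row.ptr = new_name, txid, len(undo_log) - 1


def read(row: Row, reader_txid: int, visible: Set[int]) -> str:
    """Walk back through the undo chain to a version the reader may see."""
    name, xid, ptr = row.name, row.xid, row.ptr
    while xid != reader_txid and xid not in visible:
        if ptr is None:
            raise LookupError("no visible version of this row")
        record = undo_log[ptr]
        name, xid, ptr = record.name, record.xid, record.prev
    return name


row = Row("liz", xid=3, ptr=None)
update(row, "aly", txid=10)
update(row, "amy", txid=11)
```

After the two updates, the row itself holds "amy", but a reader that can only see transaction 3 still reconstructs "liz" by following two undo records.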
Isolation Levels
The idea of repeatable reads is important for databases, but this is just one of several isolation levels that databases like MySQL and Postgres support. This setting determines how "protected" each transaction is from seeing data that other simultaneous transactions are modifying. Adjusting this setting gives the user control of the tradeoff between isolation and performance.
Both MySQL and Postgres have four levels of isolation. From strongest to weakest, these are: Serializable, Repeatable Read, Read Committed, Read Uncommitted.
Stronger levels of isolation provide more protections from data inconsistency issues across transactions, but come at the cost of worse performance in some scenarios.
Serializable is the strongest. In this mode, all transactions behave as if they were run in a well-defined sequential order, even if in reality many ran simultaneously. This is accomplished via complex locking and waiting.
The other three gradually loosen the strictness, and can be described by the undesirable phenomena they allow or prohibit.
Phantom reads
A phantom read is one where a transaction runs the same SELECT multiple times, but sees different results the second time around. This is typically due to data that was inserted and committed by a different transaction. The timeline below visualizes such a scenario. The horizontal axis represents time passing on a database with two clients.
After serializable, the next strictest isolation level is called repeatable read. Under the SQL standard, the repeatable read level allows phantom reads, though in Postgres they still aren't possible.
Non-repeatable reads
These happen when a transaction reads a row, and then later re-reads the same row, finding changes by another already-committed transaction. This is dangerous because we may have already made assumptions about the state of our database, but that data has changed under our feet.
The read committed isolation level, the next after repeatable read, allows these and phantom reads to occur. The tradeoff is slightly better database transaction performance.
Dirty reads
The last and arguably worst is dirty reads. A dirty read is one where a transaction is able to see data written by another transaction running simultaneously that is not yet committed. This is really bad! In most cases, we never want to see data that is uncommitted from other transactions.
The loosest isolation level, read uncommitted, allows for dirty reads and the other two described above. It is the most dangerous and also the most performant mode.
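The relationship between the four levels and the three phenomena can be summarized in a small lookup table. The Python sketch below encodes what the SQL standard permits, as described above. Individual engines can be stricter (Postgres, as noted, prevents phantom reads under repeatable read), and the helper name weakest_level_preventing is made up for illustration.

```python
from enum import Enum
from typing import Set


class Phenomenon(Enum):
    DIRTY_READ = "dirty read"
    NON_REPEATABLE_READ = "non-repeatable read"
    PHANTOM_READ = "phantom read"


# Ordered from strongest to weakest isolation.
LEVELS = ["serializable", "repeatable read", "read committed", "read uncommitted"]

# Phenomena each level allows under the SQL standard.
ALLOWED = {
    "serializable": set(),
    "repeatable read": {Phenomenon.PHANTOM_READ},
    "read committed": {Phenomenon.NON_REPEATABLE_READ, Phenomenon.PHANTOM_READ},
    "read uncommitted": set(Phenomenon),
}


def weakest_level_preventing(unwanted: Set[Phenomenon]) -> str:
    """Pick the loosest level that allows none of the unwanted phenomena."""
    for level in reversed(LEVELS):
        if ALLOWED[level].isdisjoint(unwanted):
            return level
    return "serializable"
```

For example, if the only thing an application cannot tolerate is dirty reads, read committed is the loosest level that qualifies.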
Concurrent writes
The keen-eyed observer will notice that I have ignored a particular scenario, quite on purpose, up to this moment. What if two transactions need to modify the same row at the same time?
Precisely how this is handled depends on both (A) the database system and (B) the isolation level. To keep the discussion simple, we'll focus on how this works for the strictest (SERIALIZABLE) level in Postgres and MySQL. Yet again, the world's two most popular relational databases take very different approaches here.
MySQL: Row-level locking
Simply put, MySQL handles conflicting writes with locks.
A lock is a software mechanism for giving ownership of a piece of data to one transaction (or a set of transactions). Transactions obtain a lock on a row when they need to "own" it without interruption. When the transaction is finished using the rows, it releases the lock to allow other transactions access.
Though there are many types of locks in practice, the two main ones you need to know about here are shared locks and exclusive locks.
A shared (S) lock can be obtained by multiple transactions on the same row simultaneously. Typically, transactions will obtain shared locks on a row when reading it, because multiple transactions can do so simultaneously safely.
An exclusive (X) lock can only be owned by one transaction for any given row at any given time. When a transaction requests an X lock, no other transactions can have any type of lock on the row. These are used when a transaction needs to write to a row, because we don't want two transactions simultaneously messing with column values!
In SERIALIZABLE mode, all transactions must always obtain X locks when updating a row. Most of the time, this works fine other than the performance overhead of locking. In scenarios where two transactions are both trying to update the same row simultaneously, this can lead to deadlock!
MySQL can detect deadlock and will kill one of the involved transactions to allow the other to make progress.
Postgres: Serializable Snapshot Isolation
Postgres handles write conflicts in SERIALIZABLE mode with less locking, and avoids the deadlock issue completely.
As transactions read and write rows, Postgres creates predicate locks, which are "locks" on sets of rows specified by a predicate. For example, if a transaction updates all rows with IDs 10–20, it will take a lock on the predicate WHERE id BETWEEN 10 AND 20. These locks are not used to block access to rows, but rather to track which rows are being used by which transactions, and then detect data conflicts on-the-fly.
Combined with multi-row versioning, this lets Postgres use optimistic conflict resolution. It never blocks transactions while waiting to acquire a lock, but it will kill a transaction if it detects that it's violating the SERIALIZABLE guarantees.
Let's look at a similar timeline from the MySQL example, but this time watching Postgres' optimistic technique.
Visually the difference is subtle, but the two are implemented in quite different ways. Both Postgres and MySQL resort to killing one transaction in order to maintain SERIALIZABLE guarantees. Applications must account for this outcome and have retry logic for important transactions.
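A minimal sketch of such retry logic in Python follows. The SerializationFailure class is a hypothetical stand-in for whatever error your driver raises when the database kills a transaction (Postgres reports SQLSTATE 40001 for serialization failures, and MySQL reports error 1213 for deadlocks). The function name, attempt count, and backoff values are assumptions, not recommendations.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver error raised when the database
    aborts a transaction to preserve SERIALIZABLE guarantees."""


def run_with_retry(txn, max_attempts=5, base_delay=0.05):
    """Call txn() until it succeeds, retrying only on SerializationFailure.

    txn must run the whole transaction (begin through commit) so that each
    retry starts from a clean slate.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter so competing transactions
            # do not all retry at the same instant.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())
```

The important design choice is that the retry wraps the entire transaction, not a single statement: once the database has killed a transaction, its earlier reads can no longer be trusted.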
Conclusion
Transactions are just one tiny corner of all the amazing engineering that goes into databases, and we only scratched the surface! But a fundamental understanding of what they are, how they work, and the guarantees of the four isolation levels is helpful for working with databases more effectively.
What esoteric corner of database management systems would you like to see us cover next? Join our Discord community and let us know.
Happy databasing.]]></content>
        <summary><![CDATA[What are database transactions and how do SQL databases isolate one transaction from another?]]></summary>
      </entry>
    
      <entry>
        <title>Automating our changelog with Cursor commands</title>
        <link href="https://planetscale.com/blog/automating-with-cursor-commands" />
        <id>https://planetscale.com/blog/automating-with-cursor-commands</id>
        <published>2026-01-07T00:00:00.000Z</published>
        <updated>2026-01-07T00:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Ever since Cursor commands were released, we've been using them to find ways to shortcut common tasks at PlanetScale.
A Cursor command allows you to add slash (/actions) commands directly into Cursor. Each command guides the LLM on completing a common and repeatable workflow for you.
Updating our changelog
Each time we ship a feature or improvement at PlanetScale, we release an update on our changelog. Each changelog entry is a markdown file in a single directory checked into a Git repository.
Whenever someone wants to write a new entry, they add a markdown file, open a GitHub pull request and merge it. Shortly after, it gets published on our website and sent out on the changelog RSS feed.
Example changelog:
---
title: 'Webhook API endpoints'
createdAt: '2025-04-25'
---

We've just added new...

Each changelog follows a similar format and almost always ships after we have already written the docs for the feature.
This makes it a perfect use case for Cursor to generate. The format is specific and the context about the feature is already in our documentation repo. Cursor has everything it needs to publish for us.
Iterating to perfection
We found the easiest way to make a rule is to have Cursor create the initial version itself based on a task we just had it complete.
For example, if I recently used Cursor to update our API docs, I'd finish up the conversation by having it make it repeatable with a command in the future.
Please create a new Cursor command for the process we just went through, so that it is easy to replicate in the future for others. Call it /updateapi.
This gives you a good starting point. From there, each time you run it, you can update the command if the results aren't satisfactory. We've found it only takes a couple of tweaks to get the workflow to a place where we can rely on it.
Launching from Slack
Another benefit of using commands is that they can be kicked off from the Slack bot.

This resulted in Cursor automatically opening the pull request on my behalf. All I needed to do was review and merge it.
Our changelog command
Here's an example of what our changelog command looks like.
## Create a changelog

This command creates a new changelog entry following PlanetScale's established format and style guidelines.

### Changelog Format Requirements

**File Structure:**
- Filename: `kebab-case-title.md` (descriptive, lowercase with hyphens)
- Location: `content/changelog/`

**Frontmatter:**
---
title: 'Human-readable title'
category: 'Feature|Enhancement|Bug Fix' # Optional
createdAt: 'YYYY-MM-DD' # Current date, sometimes with time
---

**Content Guidelines:**
- **Concise**: 1-3 paragraphs maximum
- **Consistent**: Examine recent similar changelogs to understand format
- **Simple language**: Avoid jargon, be conversational
- **Human tone**: Informal, not corporate-sounding
- **Avoid "programmatically"**: Do not use this word in changelog entries
- **Clear scope**: Explicitly mention if feature is Vitess-only or Postgres-only
- **External links**: Link to relevant documentation when available
- **Screenshots**: Include if available, using `![Alt text](./filename.png)` format

**Common Patterns:**
- Start with what was added/changed
- Explain the benefit or use case
- Include links to documentation with `**[Read more](/docs/path)**`
- For API features, link to API docs
- For UI features, include screenshots
- Use bullet points for multiple related items
]]></content>
        <summary><![CDATA[How PlanetScale uses Cursor commands to automate our changelog entries]]></summary>
      </entry>
    
      <entry>
        <title>Postgres 18 is now available</title>
        <link href="https://planetscale.com/blog/postgres-18-is-now-available" />
        <id>https://planetscale.com/blog/postgres-18-is-now-available</id>
        <published>2025-12-17T00:00:00.000Z</published>
        <updated>2025-12-17T00:00:00.000Z</updated>
        
        <author>
          <name>Chris Sinjakli</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Postgres 18 is now available on PlanetScale. Starting today, when you create a new database, the default version will be 18.1. You can select a prior version using the dropdown on the database creation page.

This, combined with our recent launches of $5 single-node databases and $50 Metal databases, makes PlanetScale the perfect platform to power your Postgres-backed applications.
What's new in 18?
The Postgres 18 release notes have all the details on what's changed since version 17. The highlights include:
A new asynchronous I/O system that can improve query performance
Built-in support for UUIDv7 with the uuidv7() function
More queries can make use of multi-column indexes thanks to the new Skip Scan optimization
Read our benchmarks comparing Postgres 17 vs 18 to learn more about performance improvements and what to expect.
Upgrading from Postgres 17 to 18
We don't currently offer an automated, in-place upgrade from Postgres 17 to 18. To upgrade, create a new Postgres 18 database and perform an online migration from your existing PlanetScale Postgres 17 database using our import guides.]]></content>
        <summary><![CDATA[Postgres 18 is now available on PlanetScale]]></summary>
      </entry>
    
      <entry>
        <title>Using MotherDuck with PlanetScale</title>
        <link href="https://planetscale.com/blog/using-motherduck-with-planetscale" />
        <id>https://planetscale.com/blog/using-motherduck-with-planetscale</id>
        <published>2025-12-16T00:00:00.000Z</published>
        <updated>2025-12-16T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[DuckDB has gained significant traction for OLAP workloads. It's powerful, flexible, and has a feature-rich SQL dialect, making it perfect to use for analytics alongside OLTP-oriented relational databases.
Today, we're excited to announce support for the pg_duckdb extension for Postgres databases on PlanetScale alongside our partnership with MotherDuck.
DuckDB in Postgres
DuckDB can be run as a standalone OLAP database, but also alongside Postgres via the pg_duckdb extension. The extension integrates DuckDB's column-store analytics engine right inside of Postgres, allowing you to seamlessly combine OLTP and OLAP queries over Postgres connections.
When enabled, tables can be created either using the standard Postgres table format or as temporary tables in the DuckDB vectorized column format. Queries can then be selectively executed using either the Postgres engine or DuckDB. pg_duckdb can also be used to work with and query external data sources in popular formats like Apache Parquet and Iceberg.
Having DuckDB as a built-in extension makes data movement between Postgres and DuckDB formats simpler, and unifies the experience of combining analytics results with the rest of your relational data.
MotherDuck
Though DuckDB is extremely powerful, many prefer to separate analytical compute from OLTP compute. This is useful to ensure that heavy analytics queries don't negatively impact application performance, and vice-versa.
MotherDuck is a cloud data warehouse with deep integration and support for DuckDB, and is a perfect solution to this problem. The pg_duckdb extension supports offloading analytics queries to the MotherDuck cloud. Analytics queries can be issued from within your PlanetScale Postgres database while their execution is offloaded to MotherDuck, running against your data sets stored in the MotherDuck cloud. The results can then be returned to Postgres for further processing.
To use DuckDB and MotherDuck together with your PlanetScale database:
Enable pg_duckdb via the "Extensions" tab on the "Clusters" page of your database.

Connect to your Postgres database and run GRANT CREATE ON SCHEMA public to pscale_superuser; to allow the addition of the MotherDuck catalog in Postgres and CREATE EXTENSION pg_duckdb; to create the extension.
Add your MotherDuck token with CALL duckdb.enable_motherduck('YOUR_TOKEN');
Start running your analytics queries!
Check out our docs and the MotherDuck docs for more information on how to use pg_duckdb with MotherDuck.]]></content>
        <summary><![CDATA[Using MotherDuck with PlanetScale]]></summary>
      </entry>
    
      <entry>
        <title>$50 PlanetScale Metal is GA for Postgres</title>
        <link href="https://planetscale.com/blog/50-dollar-planetscale-metal-is-ga-for-postgres" />
        <id>https://planetscale.com/blog/50-dollar-planetscale-metal-is-ga-for-postgres</id>
        <published>2025-12-15T00:00:00.000Z</published>
        <updated>2025-12-15T00:00:00.000Z</updated>
        
        <author>
          <name>Richard Crowley</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today we’re making PlanetScale Metal for Postgres available in smaller sizes and at much lower price points, all the way down to the new M-10 for as little as $50 per month. We’ve lowered the floor from 16GiB of RAM with four sizes all the way to 1GiB and paired these with eight storage capacities ranging from 10GB to 1.2TB.
These new sizes are powered by the same blazingly fast, locally attached NVMe drives that customers like Cash App, Cursor, and Intercom use to decrease latency, increase reliability, and decrease costs, too.
Decoupling CPU, RAM, and Storage
This release is the first step towards decoupling CPU and RAM from storage capacity, while maintaining all the benefits of PlanetScale Metal. Each of these new CPU and RAM sizes can be paired with at least five storage capacities, all of which still use locally attached NVMe drives. Customers can spec their PlanetScale Metal database to perfectly match their workload while still enjoying the lowest possible latency, the fewest possible failure modes, and online resizing.

Decoupling CPU and RAM from storage capacity means you can get as much as 300GB of storage per GiB of RAM, almost four times the highest density AWS offers natively. Or you can max out CPU and RAM on minimal storage to serve small, high-traffic workloads. The choice is finally yours.
Since we launched PlanetScale Metal, customers have asked loudly for two things:
A lower starting price, which we’re reducing today from $589 per month to $50 per month.
Flexibility to buy more storage without also buying more CPU and RAM.
Today we’re proud to deliver on both requests. PlanetScale Metal is now available for Postgres databases in AWS regions on both Intel and ARM CPUs with more I/O capacity than you can possibly use. Support for GCP is in the works and Vitess will follow soon.
Create a new database or resize one you already have today.]]></content>
        <summary><![CDATA[We've lowered the entry price for using PlanetScale Metal to $50 and added more flexibility in storage-to-compute ratios.]]></summary>
      </entry>
    
      <entry>
        <title>AI-Powered Postgres index suggestions</title>
        <link href="https://planetscale.com/blog/postgres-new-index-suggestions" />
        <id>https://planetscale.com/blog/postgres-new-index-suggestions</id>
        <published>2025-11-21T00:00:00.000Z</published>
        <updated>2025-11-21T00:00:00.000Z</updated>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Today we're releasing a new feature in PlanetScale Insights: AI-powered index suggestions for PostgreSQL databases. This feature monitors your database workload and periodically suggests new indexes to improve query execution speed and overall database performance. This blog post describes how we used LLMs (large language models) combined with robust validation to generate high quality index suggestions.
Overview
Finding the right indexes for your database is one of the most common and critical operations in maintaining database health. It's also something that needs to be done not just when tables are created, but consistently revisited as an application evolves and query patterns change.
Our experience using LLM chat tools made it clear that LLMs are capable of finding indexes to improve query performance. Given the right set of queries to improve, LLMs frequently find an optimal index. There were two issues we needed to solve, though.
First, LLMs are not always great at deciding when something needs to be modified. Given a problem, LLMs are great at finding an answer but can't always be trusted to decide if maybe nothing needs to change at all. We are well positioned to solve this issue by using the query performance data Insights collects to only ask for index recommendations after first verifying that there are queries present that are likely to need an index.
Second, sometimes LLMs produce inaccurate results. As anyone who's used LLM tools for software development can tell you, it's crucially important to validate LLM-generated solutions before shipping them to production. Before we suggest that a customer make changes to their production database, we need to make sure that our suggestions will actually have the desired effect. To accomplish this, we measure the estimated performance of the relevant queries with and without each suggested index, and only show suggestions that result in a substantial improvement.
Our goal with Postgres index suggestions is not just to save you the trouble of asking LLMs for suggestions on your own, but to use the unique data and capabilities available in Insights to produce the best overall results.
Asking the right question
Two things determine the right set of indexes for a given database:
The database schema
The workload (i.e. what queries are being run)
LLMs, like humans, produce better answers if they are asked better questions. Our first task, then, was to filter down the list of query patterns to those that were most likely to actually benefit from an index. When we manually examine a database to find queries that might benefit from an index, we typically look for query patterns with a high ratio of rows read to rows returned. If a query is reading a much larger set of rows than it is returning, it's an indication that an index (or a more selective index) could improve response times.
We also filter the set of query patterns to those that are using significant resources. In particular, we require that the query pattern is responsible for at least 0.1% of the aggregated runtime of all queries, and that it has been run a minimum number of times. Since indexes incur storage, memory and write overhead, we want to avoid suggesting indexes for ad-hoc or infrequently run queries.
Once we have a set of candidate queries, we filter the schema down to the tables referenced by the query. This keeps the prompt smaller and more focused.
The final component of our prompt is that we ask the LLM to include a reference to the queries that each new index is designed to improve. This data is used in the validation step.
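The candidate filtering described in this section can be sketched in a few lines of Python. This is an illustration, not the Insights implementation: only the 0.1% runtime share comes from the text above. The QueryPattern fields, the rows-read ratio threshold, and the minimum execution count are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryPattern:
    sql: str
    rows_read: int
    rows_returned: int
    total_runtime_ms: float
    executions: int


MIN_RUNTIME_SHARE = 0.001  # 0.1% of aggregated runtime, as stated above
MIN_READ_RATIO = 100.0     # assumption: rows read per row returned
MIN_EXECUTIONS = 50        # assumption: "a minimum number of times"


def index_candidates(patterns: List[QueryPattern]) -> List[QueryPattern]:
    """Keep patterns that read far more rows than they return and that
    account for a meaningful share of the overall workload."""
    total_runtime = sum(p.total_runtime_ms for p in patterns)
    if total_runtime <= 0:
        return []
    picked = []
    for p in patterns:
        read_ratio = p.rows_read / max(p.rows_returned, 1)
        runtime_share = p.total_runtime_ms / total_runtime
        if (read_ratio >= MIN_READ_RATIO
                and runtime_share >= MIN_RUNTIME_SHARE
                and p.executions >= MIN_EXECUTIONS):
            picked.append(p)
    return picked
```

A frequently run query that scans a million rows to return ten would pass this filter, while an equally wasteful ad-hoc query that ran twice would not.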
Validation
Now that we have a list of candidate indexes, we can perform the most crucial step: validation. Asking users to create indexes on their production database is an inherently high-stakes activity, and we want to be certain about the quality of the underlying suggestions.
To this end, we perform two validation steps:
Parse the generated CREATE INDEX statements to ensure that they are syntactically valid and of the correct form.
Evaluate each candidate query with and without the related index, and ignore any index suggestion that doesn't improve at least one candidate query.
To evaluate the effect of each index, we use the HypoPG extension. HypoPG lets us create hypothetical indexes that do not actually exist (and therefore have no overhead) but which the planner can use in the context of EXPLAIN commands. This allows us to find an estimated cost using the actual Postgres planner, and determine if the predicted cost improvement is substantial enough to justify recommending a new index.
If an index suggestion passes both phases of validation, Insights generates a new index recommendation. Insights shows index suggestions with a table of the queries they are designed to improve, including the estimated reduction in query cost.
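As a rough sketch of the second validation step, the Python below decides whether to keep a suggestion based on planner cost estimates gathered with and without the hypothetical index. In practice those numbers would come from EXPLAIN runs before and after calling HypoPG's hypopg_create_index. Here they are passed in directly, and the Suggestion class and the 30% threshold are assumptions for illustration, not values from Insights.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumption: what counts as a "substantial" improvement is not specified
# in the post; 30% is a placeholder.
MIN_IMPROVEMENT = 0.30


@dataclass
class Suggestion:
    create_index_sql: str
    # Estimated planner cost per candidate query:
    # (cost without the index, cost with the hypothetical index)
    costs: Dict[str, Tuple[float, float]]


def improvement(before: float, after: float) -> float:
    """Fractional reduction in estimated cost (0.0 when there is no gain)."""
    if before <= 0:
        return 0.0
    return max(before - after, 0.0) / before


def should_recommend(s: Suggestion) -> bool:
    """Keep the suggestion only if it substantially improves at least one
    of the candidate queries it was generated for."""
    return any(
        improvement(before, after) >= MIN_IMPROVEMENT
        for before, after in s.costs.values()
    )
```

Because the costs are planner estimates and not measured runtimes, a check like this filters out suggestions the planner would never use, but it cannot guarantee a real-world speedup.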
]]></content>
        <summary><![CDATA[Introducing AI-powered index suggestions for PostgreSQL]]></summary>
      </entry>
    
      <entry>
        <title>$5 PlanetScale is live</title>
        <link href="https://planetscale.com/blog/5-dollar-planetscale-is-here" />
        <id>https://planetscale.com/blog/5-dollar-planetscale-is-here</id>
        <published>2025-11-14T09:00:00.000Z</published>
        <updated>2025-11-14T09:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Two weeks ago we announced we'd soon be shipping $5 single node Postgres databases across our global fleet. Today we are announcing that this process is now complete.
You can now spin up single node PlanetScale Postgres databases starting at just $5/month. These non-HA databases are production-ready and a cost-effective option for startups, side projects, proofs of concept, or development. You get all of PlanetScale's developer-friendly features like Query Insights, schema recommendations, in-depth metrics, branching, and reliability, now starting at just $5 per month.

We are also lowering the price of development branches from $10 per month to $5 per month, making it more cost-effective to run additional staging and development environments.
Scaling single node
As your company or project grows, you can easily scale up on PlanetScale. Single node databases can vertically scale if you need to grow beyond the base $5 size. Just go to your Clusters page, select the cluster size you want, and click "Queue instance changes".

You can also switch to HA mode on that same page by selecting "Primary + multi-replica". This will add 2 replicas to your cluster for high availability. You can continue to upsize your cluster and add even more replicas as needed. And when the time comes that you need horizontal scale, we'll soon have Neki, our sharded Postgres solution.
This means you can start your business on PlanetScale and feel at ease knowing you'll never have to worry about a painful migration to a new database provider when you begin to hit scaling issues. PlanetScale runs some of the web's largest workloads, so you'll always be in great hands.
Get started
To get started, sign up for a PlanetScale account and select "Single node" during database creation. For single node pricing, see the pricing page.]]></content>
        <summary><![CDATA[You can now create single node Postgres databases on PlanetScale starting at just $5.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 23</title>
        <link href="https://planetscale.com/blog/announcing-vitess-23" />
        <id>https://planetscale.com/blog/announcing-vitess-23</id>
        <published>2025-11-04T00:00:00.000Z</published>
        <updated>2025-11-04T00:00:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[We’re excited to release Vitess 23.0.0, the latest major version of Vitess, bringing new defaults, better operational tooling, and refined metrics. This release builds on the strong foundation of version 22 and is designed to make deployment and observability smoother, while continuing to scale MySQL workloads horizontally with confidence.
Why this release matters
For production users of Vitess, this release is meaningful in several ways:
Upgrading defaults: Moving to MySQL 8.4 as default future-proofs deployments and signals forward compatibility.
Better metrics: The added observability enables deeper insights into transaction routing, shard behavior, and recovery actions — making debugging and alerting more precise.
Clean-ups & deprecations: Removing legacy metrics and APIs simplifies monitoring and avoids confusion.
Operational strength: Enhanced VTOrc and topology controls reduce risk in large-scale fleets and tighten security boundaries.
What’s new in Vitess 23
Here are some of the standout changes you should know about:
New default versions
The default MySQL version for the vitess/lite:latest image has been bumped from 8.0.40 to 8.4.6. → PR #18569
VTGate now advertises MySQL version 8.4.6 by default (instead of 8.0.40). If your backend uses a different version, set the mysql_server_version flag accordingly. → PR #18568
Important upgrade detail for operator users: When upgrading from MySQL 8.0 → 8.4 with the Vitess Operator, you must:
Add innodb_fast_shutdown=0 to your extra .cnf in the YAML.
Apply the file and wait until all pods are healthy.
Switch the image to vitess/lite:v23.0.0.
Remove innodb_fast_shutdown=0 and re-apply.
This is only required once when crossing 8.0 → 8.4. See the official release notes.
New and improved metrics
VTGate: new metric TransactionsProcessed (dimensions: Shard, Type) counting transactions processed at VTGate by shard and transaction type. → PR #18408
VTOrc: new metric SkippedRecoveries (dimensions: RecoveryName, Keyspace, Shard, Reason) tracking how many recoveries were skipped and why. → PR #18405
These improvements strengthen observability and help operators track system behavior with finer granularity.
Deprecations and removals
VTOrc metric rename: DiscoverInstanceTimings → DiscoveryInstanceTimings. → PR #18406
Removed deprecated VTGate metrics: QueriesProcessed, QueriesRouted, QueriesProcessedByTable, QueriesRoutedByTable. → PR #17727
Removed VTOrc API endpoint: /api/aggregated-discovery-metrics. → PR #18407
Topology & VTOrc enhancements
The --consul_auth_static_file flag now requires at least one credential in the provided JSON. → PR #18409
VTOrc now supports dynamic control of EmergencyReparentShard-based recoveries. → PR #18410
These changes improve operational safety and resilience for cluster management.
VTTablet and CLI / Docker updates
Managed MySQL configuration now defaults to caching-sha2-password. → PR #18403
MySQL timezone environment propagation improved. → PR #18404
gRPC tabletmanager client error behaviors clarified. → PR #18402
Docker image workflows and flags updated for consistency. → PR #18411
Upgrade notes
Review custom dashboards: if you relied on removed metrics, update them to new ones (TransactionsProcessed, etc.).
Operator users upgrading 8.0 → 8.4: follow the four-step sequence.
If you override mysql_server_version in VTGate, ensure it matches your backend MySQL version.
Test changes involving reparenting, recovery, or Consul integration in staging first.
What’s next
We continue to evolve Vitess toward:
Deeper MySQL 8.4 compatibility.
Expanded observability across VReplication, MoveTables, and Resharding.
Ongoing Operator improvements for reliability and clarity.
Thanks and acknowledgements
This release was made possible by dozens of contributors from the Vitess community and the PlanetScale team. Thank you for filing bugs, testing RCs, and helping keep Vitess robust and scalable.
Let’s keep scaling.
“Scale beyond single MySQL instances — without giving up SQL semantics.”
– The Vitess Team
To explore every detail, see the Full Release Notes for Vitess 23.0.0]]></content>
        <summary><![CDATA[Vitess 23 is now generally available]]></summary>
      </entry>
    
      <entry>
        <title>$50 PlanetScale Metal</title>
        <link href="https://planetscale.com/blog/50-dollar-planetscale-metal" />
        <id>https://planetscale.com/blog/50-dollar-planetscale-metal</id>
        <published>2025-11-03T08:00:00.000Z</published>
        <updated>2025-11-03T08:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[$50 Metal databases are now available.
We’re making Metal's performance more accessible with a new entry point: $50/month. Additionally, we are launching much more granular storage sizing for Metal so customers can allocate CPU and memory independently from storage size.
When PlanetScale Metal was announced we set a new bar for database performance inside AWS and GCP. Products like Cash App, Intercom, and Cursor have seen unprecedented performance improvements by running on Metal.
Metal performance
Our Metal benchmarks showed drastic drops in latency and increases in QPS compared to every provider we tested against. But the $600/month entry price often made it inaccessible to budget-conscious startups that would otherwise benefit from the performance improvements.


Pricing for smaller Metal sizes
Customers can access the new M- class clusters starting at $50/month. These clusters are available in both AWS and GCP and have a smaller footprint than the current M- class clusters.
Unlimited I/O on every M- class means you can expect exceptional performance while your product grows. We've made it so you can interleave different amounts of disk size and vCPUs/RAM to meet your needs:
Available node types:
M-10: 1/8 ARM vCPU, 1 GB RAM, 3 nodes
M-20: 1/4 ARM vCPU, 2 GB RAM, 3 nodes
M-40: 1/2 ARM vCPU, 4 GB RAM, 3 nodes
M-80: 1 ARM vCPU, 8 GB RAM, 3 nodes
M-160: 2 ARM vCPU, 16 GB RAM, 3 nodes
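Disk Size | M-10 | M-20 | M-40 | M-80 | M-160
10 GB | $50 | $80 | $150 | - | -
25 GB | $60 | $90 | $160 | - | -
50 GB | $80 | $110 | $180 | - | -
100 GB | $110 | $140 | $200 | $320 | $570
200 GB | $180 | $210 | $270 | $390 | $630
400 GB | - | - | $330 | $460 | $680
800 GB | - | - | $530 | $650 | $890
1200 GB | - | - | $610 | $740 | $980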
Smaller sizes are now generally available for Postgres, with smaller sizes for Vitess to follow. Our Vitess fleet is significantly larger than our Postgres fleet, so enabling smaller Metal sizes for Vitess will take more time.
We are excited to see what you build with this new level of performance. See our $50 PlanetScale Metal is GA for Postgres announcement for more details.]]></content>
        <summary><![CDATA[Introducing $50 PlanetScale Metal]]></summary>
      </entry>
    
      <entry>
        <title>Report on our investigation of the 2025-10-20 incident in AWS us-east-1</title>
        <link href="https://planetscale.com/blog/aws-us-east-1-incident-2025-10-20" />
        <id>https://planetscale.com/blog/aws-us-east-1-incident-2025-10-20</id>
        <published>2025-11-03T00:00:00.000Z</published>
        <updated>2025-11-03T00:00:00.000Z</updated>
        
        <author>
          <name>Richard Crowley</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[On 2025-10-20, there was an incident that affected PlanetScale, initially caused by DNS misconfiguration in one of PlanetScale’s service providers, followed by several hours of capacity constraints and network instability. The incident occurred in two distinct phases, with the first affecting the PlanetScale control plane and the second affecting some database branches hosted in AWS us-east-1.
Our design focus on isolation and static stability put us in a good position to weather this incident with minimal impact. During the first phase of the incident, our control plane was impacted, but customer database branches remained fully available. During the second phase, some customer database branches in AWS us-east-1 were impacted by network partitions.
Phase 1
PlanetScale engineers were alerted to problems with our control plane at 7:13 UTC. Our continuous testing in production had started to fail. The tests cover a wide range of PlanetScale’s functionality but at this point were unable to create new database branches. Further investigation showed this was a near-total control plane outage. The service responsible for creating, resizing, and configuring database branches, which is hosted in AWS us-east-1, was unavailable. It depends on our internal secret-distribution service, which depends on Amazon S3, which depends on AWS STS, which was impacted by the Amazon DynamoDB outage.
Throughout this period, no database branches lost capacity or connectivity.
The PlanetScale dashboard was intermittently available during this phase of the incident. It’s hosted by a provider that, like the PlanetScale control plane, is hosted in AWS us-east-1. Additionally, PlanetScale customers using SSO were unable to log in if they weren’t already logged in.
Finally, during this phase we were unable to post updates to https://planetscalestatus.com, though even if we had been able to, the site itself was unavailable for at least half an hour.
PlanetScale engineers investigated thoroughly but did not take any corrective actions during this phase of the incident. Service was restored at 9:30 UTC after upstream service providers recovered.
Phase 2
About half an hour later, at 10:05 UTC, PlanetScale engineers were alerted that one of our Kubernetes operators in one of our customers’ large single-tenant installations was exhausting all its available resources. This typically means latency has increased or control plane requests are failing and being retried. Responders quickly identified that we were unable to launch new EC2 instances in us-east-1.
Customers could attempt to create or resize database branches but, because we could not launch new EC2 instances, these requests could not be completed; they remained queued until the incident was resolved. Their existing MySQL or Postgres servers remained available while requests to launch new EC2 instances were queued.
Given that the US East Coast was about to start their Monday, the inability to launch new EC2 instances presented a risk to some of our largest customers who use diurnal autoscaling for the vtgate component of their Vitess clusters. Some were going to be coming into their peak weekly traffic with less than half the vtgate capacity they had the week prior.
PlanetScale engineers made several interventions to minimize the number of EC2 instances we needed to launch in AWS us-east-1:
Temporarily disallowed creating new databases in AWS us-east-1 and changed the default region for new databases to AWS us-east-2.
Delayed scheduling additional backups and canceled pending backups that were waiting to launch an EC2 instance. (PlanetScale’s standard backup procedure launches an additional replica which restores the previous backup and catches up on replication before taking a new backup to avoid reducing the capacity and fault-tolerance of the database during backups.)
Advised PlanetScale Managed customers using vtgate autoscaling to shed whatever load they could by e.g. delaying queue processing or pausing ETL processes.
We also took steps to avoid terminating any running EC2 instances:
Paused our continuous process of draining and terminating EC2 instances more than 30 days old.
Stopped terminating any EC2 instances that became vacant, instead holding them for reuse.
The most important intervention, though, was to temporarily change how we schedule vtgate processes for customers with autoscaling configured. We bin-packed vtgate processes more tightly than usual, running closer to CPU capacity than is typical, in order to provide ample capacity for the US work day.
Alongside issues launching EC2 instances, we observed partial network partitions in AWS us-east-1. Impact appears to have begun around 14:30 UTC; however, we can’t know whether fewer queries reached PlanetScale because of a network partition or because fewer queries were sent due to upstream customer impact. These partitions healed gradually between about 18:30 and 19:30 UTC. During this time, some database servers were reachable from the Internet but couldn’t communicate across availability zones for query routing, replication, or both. Some replicas could reach container registries when they started up but could not replicate from their primary MySQL or Postgres. Some servers had trouble resolving internal DNS names and others had trouble connecting to the internal services those DNS names resolved to.
The network partitions caused a significant percentage of some customers’ queries to fail. Not all database branches were affected as the impact depended heavily on which availability zones were in use and whether traffic was crossing between zones. Where possible, we manually sent reparent requests to move primary databases to availability zones known to be healthier or known to be colocated with the customer’s application.
Once the network partitions healed, we found a small number of processes (PlanetScale’s edge load balancer as well as vtgate) which were not able to recover on their own due to the way they experienced the network partition. We restarted these and restored service.
PlanetScale’s incident commander declared the incident was resolved at 20:32 UTC.
Reflecting on our resilience
PlanetScale weathered this incident well. Strong separation of control and data planes meant an outage in our control plane did not affect our customers’ databases. Redundancy and battle-tested automated failover allowed primary database servers to move to the majority side of network partitions. Careful zonal traffic routing avoided network partitions as well as possible.
But every incident offers opportunities for improvement. We are taking steps to better understand and become resilient to the failure modes of SaaS we depend on, including for CI/CD, SSO, Web application hosting and incident communication.
We are investigating more ambitious ways to reduce our runtime dependence on both internal and AWS services.
Network partitions are one of the hardest failure modes to reason about, test, and tolerate. Per AWS's Well-Architected Framework, the use of three availability zones allows us to tolerate the failure of one but only if network connectivity between the other two remains reliable. AWS us-east-1 happens to have six availability zones and we’re looking into how PlanetScale can better use them all to become more resilient to both zonal outages and network partitions between them.
If you want to read more about how we engineer for resilience, read PlanetScale’s Principles of Extreme Fault Tolerance.]]></content>
        <summary><![CDATA[On 2025-10-20, there was an incident that affected PlanetScale, initially caused by DNS misconfiguration in one of PlanetScale’s service providers, followed by several hours of capacity constraints and network instability.]]></summary>
      </entry>
    
      <entry>
        <title>$5 PlanetScale</title>
        <link href="https://planetscale.com/blog/5-dollar-planetscale" />
        <id>https://planetscale.com/blog/5-dollar-planetscale</id>
        <published>2025-10-30T09:00:00.000Z</published>
        <updated>2025-10-30T09:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[$5 PlanetScale Postgres databases are now generally available.
PlanetScale is synonymous with quality, performance, and reliability. Up until now, the entry level PlanetScale cluster configuration was 3 node, multi-AZ, and highly available. At $30 a month this is incredible value. However, not everyone wants or needs HA.
Every day we get requests for an entry level tier that is more accessible to builders on day 1. People want the quality of PlanetScale and our game changing features like Insights without the cost overhead of 3 nodes.
Over the next couple of months we will be rolling out a single node, non-HA mode for PlanetScale Postgres and introducing a new node type: the PS-5, priced at $5 a month. Single node is perfect for development, testing, and non-critical workloads. Customers will be able to vertically scale a single node to meet their needs without having to add replicas or sacrifice durability.
Our starter pricing is now:
Node Class | Mode | Price
PS-5 (arm and intel) | Single node | $5
PS-10 (arm) | Single node | $10
PS-10 (intel) | Single node | $13
PS-10 (arm) | HA (3 node) | $30
PS-10 (intel) | HA (3 node) | $39
If you're bullish on your company's future, you know you'll need to scale eventually, and the database is usually the first bottleneck. We talk to startups daily who experienced unexpectedly fast growth and had to scramble through emergency migrations to PlanetScale to handle the load, a stressful process when you're in the spotlight. With more approachable pricing from day 1, you can now start small and grow to hyper scale without ever changing your database platform or dealing with a complex migration.]]></content>
        <summary><![CDATA[Introducing the $5 PlanetScale plan.]]></summary>
      </entry>
    
      <entry>
        <title>Benchmarking Postgres 17 vs 18</title>
        <link href="https://planetscale.com/blog/benchmarking-postgres-17-vs-18" />
        <id>https://planetscale.com/blog/benchmarking-postgres-17-vs-18</id>
        <published>2025-10-14T00:00:00.000Z</published>
        <updated>2025-10-14T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Postgres 18 was released a few weeks ago, and there's plenty of hype around the improvements it's bringing. Most notably, Postgres 18 introduces the io_method configuration option, allowing users more control over how disk I/O is handled.
Setting this to sync results in the same behavior as 17 and earlier versions. With this, all I/O happens via synchronous requests.
18 introduces two alternatives: worker and io_uring. worker (the new default) causes Postgres to use dedicated background worker processes to handle all I/O operations. io_uring is the change many are excited about for performance reasons, as it uses the Linux io_uring interface to allow all disk reads to happen asynchronously. The hope is that this can lead to significantly better I/O performance.
We conducted detailed benchmarks to compare the performance on Postgres 17 and 18. Let's see if these improvements are all they're hyped up to be.
Benchmark configurations
sysbench was used for executing the benchmarks. The io_uring improvements only apply to reads, so the focus here will be on the oltp_read_only benchmark. This includes both point selects and queries that do range scans and aggregations controlled by the --range_size argument. Though it would be interesting to also benchmark write and read/write combo performance, sticking to read-only helps focus this discussion. We set the data size to be TABLES=100 and SCALE=13000000, which produces a ~300 GB database (100 tables with 13 million rows each).
The benchmarks were conducted on four different EC2 instance configurations:
Instance | vCPUs | RAM | Disk | Disk type | IOPS | Throughput
r7i.2xlarge | 8 | 64 GB | 700 GB | gp3 | 3,000 | 125 MB/s
r7i.2xlarge | 8 | 64 GB | 700 GB | gp3 | 10,000 | 500 MB/s
r7i.2xlarge | 8 | 64 GB | 700 GB | io2 | 16,000 | -
i7i.2xlarge | 8 | 64 GB | 1,875 GB | NVMe | 300,000 | -
All these instances run on the same (or extremely similar) Intel CPUs. We include the i7i instance to show what Postgres is capable of with fast, local NVMe drives. This is what we use for PlanetScale Metal and have seen amazing performance results for Postgres 17. Many other cloud providers only provide a form of network-attached storage, whereas we provide both options.
Each server is warmed with 10 minutes of query load prior to benchmarking.
On each one of these configurations, we ran the sysbench oltp_read_only benchmark with the following configurations for 5 minutes each:
Single connection and --range_size = 100
10 connections and --range_size = 100
50 connections and --range_size = 100
Single connection and --range_size = 10,000
10 connections and --range_size = 10,000
50 connections and --range_size = 10,000
This leads to a total of 24 unique 5-minute benchmark runs. These 24 configurations were run four times each! Once on Postgres 17 and once on each of Postgres 18 with io_method=worker, io_method=io_uring and io_method=sync. This makes a total of 96 benchmark combinations. Go ahead, tell me I'm crazy!
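To make the counting concrete, the benchmark matrix can be enumerated with a few lines of Python. The labels below are shorthand invented for this sketch; only the counts and settings come from the text.

```python
from itertools import product

# Shorthand labels for the four EC2 configurations (names are illustrative).
instances = ["r7i gp3-3k", "r7i gp3-10k", "r7i io2-16k", "i7i NVMe"]
connections = [1, 10, 50]
range_sizes = [100, 10_000]
servers = [
    "Postgres 17",
    "Postgres 18 io_method=worker",
    "Postgres 18 io_method=io_uring",
    "Postgres 18 io_method=sync",
]

# 4 instances x 3 connection counts x 2 range sizes
workloads = list(product(instances, connections, range_sizes))
# each workload is run once per server version / io_method
runs = list(product(workloads, servers))

print(len(workloads), len(runs))  # 24 96
```

At 5 minutes per run, that is 480 minutes of measured load, not counting the 10-minute warm-ups.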
This is an extremely I/O-intensive workload. The data size (300 GB) far exceeds RAM size (64 GB), so there will be significant disk accesses for the queries being executed here.
Single connection
Though a single connection is an unrealistic production workload, it offers a baseline for how the different I/O settings affect straight-line performance. Let's assess the QPS we can achieve here.
Below is the average QPS for all single-connection runs where the --range_size value is set to the default of 100. This means that the full read workload is composed of a combination of point-select queries, and queries that do scans / aggregations of 100-row sequences.

A few things are clear:
On network-attached storage (gp3, io2) Postgres 18 in sync and worker modes performs noticeably better than 17 and 18 with io_uring. I'll admit, this surprised me! My expectation was that io_uring would perform as well as if not better than all these options.
The latency of gp3 and even io2 is clearly a factor in this difference. On an instance with a low-latency local NVMe drive, all options are much more evenly matched.
The latency / IOPS of gp3 and even the very expensive io2 drive are a limiting factor. Local disks outperform in all configurations.
For straight-line performance with short-lived queries, Postgres 18 is much faster. Welcome improvements!
Here is the same test except with --range_size=10000. This means the workload has much larger scans / aggregates, which means more sequential I/O and lower QPS:

Local disks still clearly outperform, but the difference between the other three options is less stark. This is due to a combination of (a) more sequential I/O but more importantly (b) more CPU work (aggregating 10k rows is more CPU-intensive than aggregating 100 rows). Additionally, the delta between Postgres 17 and 18 is much smaller.
Below is an interactive visual comparing the Postgres 17 results for all instance types with the best performer on Postgres 18, workers. Click on the thumbnails to add or remove lines from the graph and compare various combinations. This is based on a 10-second sample rate.
High concurrency
In real-world scenarios, we have many simultaneous connections and many reads happening at once. Let's look at how each of these servers handles the same benchmark but with much higher load across 50 connections.
Of course, oltp_read_only does not capture a realistic OLTP workload, especially since it does not include writes, but we use it as a proxy for workloads with high read demands. Below, we show the average QPS for all of the 50-connection oltp_read_only runs with --range_size=100.

Now with a high level of parallelism and increased I/O demand, several additional things are clear:
IOPS and throughput are clear bottlenecks for each of the EBS-backed instances. The different versions / I/O settings don't make a huge difference in such cases.
As we increase the EBS capabilities, the QPS grows in lockstep, and the local-NVMe instance outperforms them all.
Postgres 18 with sync and worker have the best performance on all the EBS-backed instances by a small margin.
Again, the same benchmark but with --range_size=10000.

The gp3-10k and io2-16k instances get much closer to local-disk performance. However, this is because we have made the benchmark much more CPU-bound vs I/O-bound, so the low latency of local disks gives less of an advantage (though still the best!). But importantly, we finally have a scenario where io_uring wins! On the NVMe instance, it slightly outperforms the other options.
Below we again compare these results for Postgres 17 and Postgres 18, workers. Click on the thumbnails to add or remove lines from the graph and compare various combinations.
Moderate concurrency
These same benchmarks were also executed with 10 concurrent connections. The results are pretty similar so they will not all be shown, but I do want to point out this graph where --range_size=100:

Look carefully at the first bar group (for gp3-3k). The io_uring setting performed significantly worse than the rest. But if you look at that same part of the graph when there were 50 connections, io_uring performs only slightly worse than the rest. To me, this indicates that io_uring performs well when there's lots of I/O concurrency, but in low-concurrency scenarios it isn't as beneficial.
Cost
Cost should always be a consideration when comparing infrastructure setups. Here are the on-demand costs of each server configuration in AWS:
r7i with gp3 3k IOPS and 125 MB/s: $442.32/mo
r7i with gp3 10k IOPS and 500 MB/s: $492.32/mo
r7i with io2 16k IOPS: $1,513.82/mo
i7i with local NVMe (no EBS): $551.15/mo
And keep in mind, the first three only have 700 GB of storage, whereas the i7i has a 1.8 TB volume! The server with a local NVMe disk is the clear price-performance winner.
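One way to put numbers on the storage side of that comparison is to normalize each monthly price by disk size and by the IOPS figure listed in the instance table. The Python sketch below does only that arithmetic; it does not factor in the measured QPS, and the labels are shorthand invented for this example.

```python
# Monthly on-demand cost, disk size, and IOPS figures as listed in this post.
configs = {
    "r7i gp3-3k": {"monthly_usd": 442.32, "disk_gb": 700, "iops": 3_000},
    "r7i gp3-10k": {"monthly_usd": 492.32, "disk_gb": 700, "iops": 10_000},
    "r7i io2-16k": {"monthly_usd": 1513.82, "disk_gb": 700, "iops": 16_000},
    "i7i NVMe": {"monthly_usd": 551.15, "disk_gb": 1_875, "iops": 300_000},
}

# name -> (USD per GB of disk, USD per 1,000 IOPS), both per month
unit_costs = {
    name: (c["monthly_usd"] / c["disk_gb"], c["monthly_usd"] / (c["iops"] / 1000))
    for name, c in configs.items()
}

for name, (per_gb, per_kiops) in unit_costs.items():
    print(f"{name:12s} ${per_gb:.2f}/GB  ${per_kiops:.2f} per 1k IOPS")
```

On both measures the i7i configuration comes out lowest, since its price covers far more storage and far more IOPS than any of the EBS-backed setups.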
Why isn't io_uring the winner?
Given my excitement over the new io_uring capabilities of Postgres 18, I was expecting it to win in many more scenarios. So what's going on here?
For one, this is a very specific type of workload. It is read only, and does a combination of point-selects, range scans, and range aggregations. io_uring surely has other workloads where it would shine. It's also possible that with different postgresql.conf tunings, we'd see improvements from io_uring.
While writing this, I stumbled across Tomas Vondra's excellent blog discussing the new io_method options, how to tune them, and the pros and cons of each. He makes several good points regarding why workers would outperform io_uring, and I recommend you read it. In short:
Index scans don't (yet) use AIO.
Though the I/O happens in the background with io_uring, the checksums / memcpy can still be a bottleneck.
workers allows better parallelism for I/O from the perspective of a single process.
So there are legitimate cases where io_uring won't always perform better! I'd love to see further benchmarks from others, spanning other workload types and configurations. You can find many of the configs used for these tests in the appendix.
Conclusions
Though narrow, this was a fun experiment to compare performance of Postgres versions and I/O settings. My key takeaways are:
Postgres 18 brings nice I/O improvements and configuration flexibility. Great job to the maintainer team!
Local disks are the clear winner. When you have low-latency I/O and immense IOPS, the rest matters less. This is why PlanetScale Metal makes for best-in-class database performance.
Using io_method=worker was a good choice as the new default. It comes with a lot of the "asynchronous" benefits of io_uring without relying on that specific kernel interface, and can be tuned by setting io_workers=X.
There's no one-size-fits-all best I/O configuration.
Though they do benefit, the new workers I/O configuration doesn't help network-attached storage scenarios as much as one might hope.
What else do you want to see benchmarked? Reach out to let us know.
Appendix: configuration
Here is a selection of the critical custom-tuned Postgres configs used for this benchmark:
shared_buffers = 16GB          # 25% of RAM
effective_cache_size = 48GB    # 75% of RAM
work_mem = 64MB
maintenance_work_mem = 2GB

wal_level = replica
max_wal_size = 16GB
min_wal_size = 2GB
wal_buffers = 16MB
checkpoint_completion_target = 0.9

random_page_cost = 1.1
effective_io_concurrency = 200
default_statistics_target = 100

max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4

bgwriter_delay = 200ms
bgwriter_lru_maxpages = 100
bgwriter_lru_multiplier = 2.0

autovacuum = on
autovacuum_max_workers = 4
autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.025

logging_collector = on
...more log configs...

shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 2048
track_io_timing = on

jit = on

# io_workers left at default = 3
]]></content>
        <summary><![CDATA[Postgres 18 brings a significant improvement to read performance via async I/O and I/O worker processes. Here we compare its performance to Postgres 17.]]></summary>
      </entry>
    
      <entry>
        <title>Larger than RAM Vector Indexes for Relational Databases</title>
        <link href="https://planetscale.com/blog/larger-than-ram-vector-indexes-for-relational-databases" />
        <id>https://planetscale.com/blog/larger-than-ram-vector-indexes-for-relational-databases</id>
        <published>2025-10-01T00:00:00.000Z</published>
        <updated>2025-10-01T00:00:00.000Z</updated>
        
        <author>
          <name>Vicent Martí</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[With the advent of modern embedding models, capable of distilling the key traits of arbitrary data (including text, images and audio) into multi-dimensional vectors, the capability to index these vectors and perform similarity queries on them has become table stakes for all database products.
There has been a lot of research on this topic over the past decade, but most of it happens in a vacuum: new data structures and algorithms are designed, tested and benchmarked as standalone implementations, attempting to maximize recall and performance without taking into consideration the requirements for real-world usage of such research.
When we started the project to add vector indexes to MySQL at PlanetScale two years ago, we knew we had to implement them as a native MySQL index, because our sharded PlanetScale clusters are backed by native MySQL instances. However, we were surprised to find no existing papers discussing the trade-offs required to implement a vector index in the context of a relational database.
The majority of existing publications discuss data structures that must fit in RAM, a non-starter for any relational database! Many of them expect the indexes to be static, and indexes in a relational database very much are not. A few papers discuss how to continuously keep an index up to date with inserts and deletes, but they pay no attention to the transactional requirements of such operations, nor to how to store them in a reliable and crash-resilient way.
The lack of existing research means that we had to come up with novel solutions for all these issues in order to implement a vector index inside of MySQL that actually behaves like you'd expect an index to behave. We're stoked to be able to share our findings and our design with the community, because we believe that relational databases are the past, the present and the future of data storage, and they're long overdue for a well tailored solution for vector indexing and approximate nearest neighbor search.
Hierarchical Navigable Small Worlds
Let's start from the beginning: HNSW is an industry-standard graph-based data structure that enables very efficient approximate nearest neighbor search. When discussing HNSW, it’s crucial to focus on the key word “data structure”. Just like a B-tree is not a relational database, an HNSW graph is not a vector database. There’s a big gap of functionality between the data structure and a fully featured software product that can be used by end users. Nonetheless, many standalone vector databases and embedded vector indexes in relational databases are based on this data structure, and for good reasons. HNSW has been widely adopted because it has many advantages: it has good performance and good recall, and it is simple to implement and maintain.
Like all ANN algorithms, HNSW works for vectors of arbitrary dimensionality. Even though in real-life systems the dimensionality of these vectors is huge (OpenAI embedding vectors have either 1536 or 3072 dimensions!), in this blog post we’re going to use three-dimensional vector data sets, because they’re by far the easiest to visualize and understand. If you’ve somehow managed to escape this cursed prison of flesh and currently exist as an eternal being outside of time and space, please feel free to project the following examples into whichever dimensionality seems more intuitive according to your understanding of reality.
HNSW is a multi-layered graph data structure. Every vector in the indexed dataset maps to one node in the graph, and each node of the graph may exist in one or more layers. The bottom layer of the graph contains all the nodes, and every layer above it contains a strictly decreasing subset of them. Each node in the graph has a bounded number of edges, up to K, which link it to its K nearest vectors in the N-dimensional space.
Let’s see an example of the graph and its relations:
On the top, you can see the vectors laid out in the three-dimensional space. On the bottom, you can see the actual graph. Each node in the graph maps to a vector in the 3D space (hover any of them to see the relation!), but keep in mind that the position of the nodes in the graph is arbitrary. We’ve arranged them like that so their relationships are clear and their edges don’t overlap, but the only positioning that really matters is for the points in the 3D space; the distances between these 3D points are what guides our ANN search.
You can also see how from this very simple graph we have a solid starting point to perform ANN searches. If we have a vector V as the target for our query, we can find a node in the graph that is close to V in N-dimensional space. By traversing all the neighbors of that node, we find K vectors that are close to our target. We can then compute the distance between V and each of those vectors and sort these distances to obtain a good result for our ANN query.
The problem is now finding that node in the bottom layer that is close to our target vector. This is where the multi-layering of the data structure comes in. The bottom layer of the graph, as we’ve seen, contains all the nodes, but every subsequent layer on top of it contains a random subset of all the nodes below. How many layers are there in the graph then? It depends on the total node count, but the goal is constructing this multi-level graph so that the graph on the top level is quite sparse and fully connected.
Since the nodes at all layers of the graph follow the same connection rule as the bottom layer (each node has edges to up to K nodes that are actually nearby in the N-dimensional space), this produces an efficient approach to get progressively closer to our target node in the bottom layer.
We start on the top layer of the graph. Since the layer is fully connected, we can pick an arbitrary node as a starting point and then traverse all its neighbors looking for the one whose vector is closest to our target vector. When we find such a node, we descend one layer through the graph. We now perform the same operation on the same node, but we have a larger set of neighbors because each layer of the graph is denser than the previous one. With this approach, every time we go down a layer we’ve found a node that is closer to our target vector, until we finally reach the bottom layer.
Throughout this descent, we keep a priority queue and a set of all the vectors we’ve seen and their distances. The set ensures that we don’t visit the same node twice, and the priority queue incrementally stores the result of our ANN query. Once we’ve visited the K closest neighbors in the bottom layer, the top K nodes in our priority queue, sorted by distance, are the output of the algorithm.
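The descent described above can be sketched in a few lines of Python. This is a simplified illustration and not our actual implementation: the `layers` structure (a list of adjacency dictionaries, top layer first), the function names and the toy dataset are all assumptions made up for the example.

```python
import heapq
import math

def greedy_step(layer, vectors, entry, target):
    """Move to a closer neighbor until no neighbor in this layer improves."""
    current = entry
    improved = True
    while improved:
        improved = False
        for n in layer[current]:
            if math.dist(vectors[n], target) < math.dist(vectors[current], target):
                current, improved = n, True
    return current

def search(layers, vectors, entry, target, k):
    """Descend the upper layers greedily, then collect the k best nodes below."""
    current = entry
    for layer in layers[:-1]:
        current = greedy_step(layer, vectors, current, target)
    bottom = layers[-1]
    start_dist = math.dist(vectors[current], target)
    visited = {current}                    # never visit a node twice
    candidates = [(start_dist, current)]   # min-heap of nodes left to expand
    best = [(-start_dist, current)]        # max-heap (negated) of the k best
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) == k and d > -best[0][0]:
            break                          # nothing left can improve the result
        for n in bottom[node]:
            if n in visited:
                continue
            visited.add(n)
            dn = math.dist(vectors[n], target)
            if len(best) < k or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, n))
                heapq.heappush(best, (-dn, n))
                if len(best) > k:
                    heapq.heappop(best)    # drop the current worst
    return [n for _, n in sorted((-d, n) for d, n in best)]

# Hypothetical toy data: six points on a line, indexed with two layers.
vectors = {i: (float(i), 0.0, 0.0) for i in range(6)}
layers = [
    {0: [3], 3: [0]},                                              # sparse top layer
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]},  # bottom layer
]
nearest = search(layers, vectors, entry=0, target=(4.2, 0.0, 0.0), k=2)
```

Here `nearest` holds the IDs of the two nodes closest to the target, ordered by distance. Real implementations usually keep a candidate list wider than K to improve recall; the sketch uses exactly K to stay close to the description above.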
I hope this breakdown was useful! As you’ve seen, one of the key advantages of HNSW as a data structure is that it’s very easy to understand and very easy to implement. But this simplicity comes with its fair share of shortcomings.
HNSW in a relational context
In a vacuum, HNSW is often the king of benchmarks. If you build a properly optimized HNSW graph in memory, and all of it fits in RAM without paging, it’s very hard to beat both the recall and the performance of this data structure. But here we’re talking about building a vector index for a relational database, and we have to consider technical constraints that make HNSW much less appealing.
First and foremost: the datasets that are often stored in a relational database do not fit in RAM. One of the core design choices for our implementation of vector indexes in MySQL is that we need to support indexes that are significantly larger than RAM, because that’s what users usually expect from a relational database. It’s extremely frequent to have terabyte-sized MySQL instances running in a host with just hundreds of GB of RAM, and the same applies at smaller scales (hundreds of GBs of data on-disk in a MySQL instance with 32GB of RAM). InnoDB, the default storage engine for MySQL, manages swapping in and out data from disk into memory to ensure that the system performs well despite this mismatch between available memory and total size of the working set.
Secondly, HNSW is a mostly static data structure, very much unlike the behavior of your average database table. A relational database operates under a constant stream of inserts, updates and deletes, and has been designed to perform all these operations with good performance and following a set of data consistency constraints: atomicity, consistency, isolation, durability. None of this applies to HNSW out of the box unless you put a lot of thought into making it apply!
There are roughly two ways to make this happen:
You can try to store the graph inside the database. Although this solves the transactionality issues, this is also, broadly speaking, a bad idea. Relational databases are not graph databases, as anybody who’s ever tried to store a graph as SQL can attest. The data model by itself is straightforward, but actually running algorithms on the data is a world of pain, because every hop through the graph requires one look-up on the B-tree, and these look-ups often have a dependency chain: you cannot perform the next lookup in parallel because it depends on the previous one.
Alternatively, you can reserve enough memory upfront and keep the graph in-memory. This gives you the best possible performance by far, but it requires a lot of nuance when it comes to maintaining the index and its transactional guarantees.
We know that the performance and recall of HNSW are at this point expected for many AI-related workloads, particularly those with small datasets, so we’ve made sure that our implementation of Vector Indexes in MySQL for PlanetScale supports fully in-memory HNSW indexes with transactional guarantees, with high performance and 99.9% recall if you’re willing to accept the trade-offs: you need to allocate enough memory to vector indexes in your database to fit your whole vector dataset, and incrementally updating these indexes is very expensive.
But we also know that in many workloads, allocating so much memory up-front for vector indexes is not a reasonable trade-off, so we’ve made a significant effort to design a new kind of hybrid vector index that really behaves like you’d expect an index in a relational database to behave.
Our design philosophy
A vector index is very different from any other kind of index used in a relational database because it’s approximate. You’re not expecting the results of any query that uses the index to be 100% accurate, and this makes it very appealing to start cutting corners when it comes to the consistency and availability of the data. When a user inserts a thousand vectors into a table in a single database commit, this will make the commit an expensive operation, and it would be very easy to play it fast and loose when updating the vector index for that table accordingly. This is approximate nearest neighbor search after all, isn’t it? If you do a SELECT after that commit and a few of the vectors are missing, hey, that’s recall loss. Shit happens. The vectors will appear in other queries, eventually.
We don’t think this is a reasonable approach when implementing a vector index for a relational database. Beyond pragmatism, our guiding light behind this implementation is ensuring that vector indexes in a PlanetScale MySQL database behave like you’d expect any other index to behave. This means that if you’ve just committed a thousand vectors, the next SELECT will consider all of them for a similarity query, even if not all of them are returned because the recall is not perfect. This means that if you abort the transaction with the thousand vectors, they will not appear in the index at all. And this means that this all applies always, whether your database is in the process of failing over or recovering from a crash.
This engineering philosophy comes with a clear set of trade-offs: the disk-backed, transactional indexes of our implementation are necessarily slower than any other implementation that operates fully in-memory, including our very own HNSW indexes which you can always opt into. After all, our larger-than-RAM indexes store their actual vector data inside of InnoDB, which is a fully fledged ACID storage engine. This is not an engineering decision we took lightly; we arrived at this design after a lot of research, looking at existing databases and at the needs of our customers. But it is a decision we’re happy with. Let’s look into it in detail.
PlanetScale Hybrid Vector Search
Like most vector index designs used in production systems, ours is a hybrid index. This index is composed of two layers: an in-memory HNSW index that stores vectors as a graph, based on our transactional HNSW index implementation, and an on-disk index that stores vectors as posting lists.
The in-memory HNSW index is what we call a “head index”. This data structure contains a subset of all vectors of the full index. The default is 20%, although you can tune that to be smaller at index construction, trading off reduced memory usage for worse recall. The remaining 80% of the vectors are stored as posting lists (i.e. just raw binary blobs where the vectors appear one after the other) in MySQL’s default storage engine, InnoDB.
The head index contains a random sampling of all the vectors in the dataset, each vector being a "head" or "centroid". For each head in this index, a posting list is generated containing K vectors which are nearby in the N-dimensional space. The value of K depends on the dimensionality of the vectors; the goal is ensuring that all posting lists are of similar size in bytes and that they're never large enough to slow down InnoDB lookups for their blobs.
This design is similar to the one proposed in SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search, but with some key differences. The original paper from Microsoft Research proposes several approaches to construct and store the head index, based on alternative graph-based data structures such as BK Trees and K-D Trees. It is not clear to us why HNSW was not evaluated in the paper, but in our experiments, an optimized implementation of HNSW beats these other data structures in every possible metric (performance, recall, construction time), so our hybrid indexes use HNSW for the head index.
Similarly, the SPANN paper discusses a complex clustering algorithm based on BK Trees to select the subset of vectors that will belong to the head index, but after thorough testing with real-life datasets, we concluded that random sampling provides good results in practice: when the dataset is large enough, the law of large numbers ensures that our random sampling is representative, whilst for small datasets, a slightly unbalanced head index has little effect on the total recall of the index. Most importantly: by sampling randomly from the original dataset of vectors, which are stored directly in the user’s table in InnoDB, we can construct larger-than-RAM indexes requiring only up to 30% of the memory size used by the dataset.
Once constructed, the operational index only keeps 20% of the total dataset allocated in memory (i.e. the size of the head index), fulfilling our first technical goal. With this design, the vast majority of the vector data is stored as posting lists and managed by InnoDB. You do not need enough memory in your database instances to contain the full size of your indexes, as only 20% or less needs to be reserved for queries to succeed.
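To make the two layers concrete, here is a minimal Python sketch of the construction: sample 20% of the vectors as heads, then pack every remaining vector into the posting list of its nearest head. It makes several assumptions that are not part of our real implementation: the nearest head is found by brute force rather than through an HNSW graph, each vector lands in exactly one posting list, and the record layout (`RECORD`, a 64-bit ID followed by three floats) is invented for the example.

```python
import math
import random
import struct

RECORD = "<q3f"  # assumed layout: 64-bit vector ID followed by 3 float32 values

def build_hybrid_index(vectors, head_fraction=0.2, seed=0):
    """Return (heads, postings) where postings maps head ID -> serialized blob."""
    rng = random.Random(seed)
    ids = sorted(vectors)
    n_heads = max(1, round(len(ids) * head_fraction))
    heads = rng.sample(ids, n_heads)       # random sampling picks the heads
    head_set = set(heads)
    postings = {h: bytearray() for h in heads}
    for vid in ids:
        if vid in head_set:
            continue                       # heads live in the in-memory index
        nearest = min(heads, key=lambda h: math.dist(vectors[h], vectors[vid]))
        postings[nearest] += struct.pack(RECORD, vid, *vectors[vid])
    return heads, postings

def decode_posting(blob):
    """Turn a serialized posting list back into (vector ID, vector) pairs."""
    return [(rec[0], rec[1:]) for rec in struct.iter_unpack(RECORD, bytes(blob))]

# Hypothetical dataset of 50 random three-dimensional vectors.
rng = random.Random(1)
vectors = {i: (rng.random(), rng.random(), rng.random()) for i in range(50)}
heads, postings = build_hybrid_index(vectors)
```

With 50 vectors, 10 become heads and the other 40 are spread across 10 blobs, which is the part that would be handed to the storage engine instead of being kept in memory.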
This design acts as a solid starting point to provide ANN queries on a larger-than-RAM vector dataset. The subsequent challenge now is ensuring that the index can remain fresh, i.e. that it can be continuously updated in real time while serving queries with high recall.
Inserts
The high-level logic for inserting new vectors into the index is as follows: we perform an ANN search on the Head index for every vector we’re trying to append, to find the posting lists where it belongs. Note the plural here! A single vector can land in one or more posting lists, because vectors that are too close to the edge of a cluster will be replicated in neighboring clusters to increase recall. Then, we insert a copy of the vector on each of the posting lists.
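As a rough sketch of that logic in Python: the replication rule below (copy the vector into every head whose distance is within a factor of `1 + epsilon` of the nearest one, capped at `max_replicas`) is an assumption chosen for illustration, as are the function names and the brute-force search that stands in for the ANN query on the head index.

```python
import math

def target_heads(vector, heads, epsilon=0.1, max_replicas=3):
    """Pick the posting lists a vector belongs to (assumed replication rule)."""
    ranked = sorted((math.dist(vector, centroid), hid) for hid, centroid in heads.items())
    limit = ranked[0][0] * (1 + epsilon)   # "almost as close as the nearest head"
    return [hid for d, hid in ranked[:max_replicas] if d <= limit]

def insert(postings, heads, vec_id, vector):
    """Append a copy of the vector to each posting list where it belongs."""
    for hid in target_heads(vector, heads):
        postings.setdefault(hid, []).append((vec_id, vector))

# Hypothetical head index with two centroids.
heads = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0)}
postings = {}
insert(postings, heads, 100, (1.0, 0.0, 0.0))   # clearly inside cluster 1
insert(postings, heads, 101, (5.0, 0.0, 0.0))   # on the boundary: replicated
```

After these two inserts, vector 100 lives only in the posting list of head 1, while vector 101 has a copy in both posting lists.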
So far so easy, but the devil is in the details. What does it mean to insert the vector in a posting list? As we’ve seen, posting lists are serialized blobs stored as values in InnoDB, where their key is the vector ID for the centroid of that specific cluster of vectors. Hence, inserting another vector into the cluster means appending it at the end of the serialized blob. This is a very simple operation in theory, but very complex in the context of a relational database.
If we were to write this operation as a normal SQL query, it’d look something like this:
UPDATE vector_index_table
SET postings = CONCAT(postings, :new_vector_data)
WHERE head_id = :target_head_id;

This is because our posting lists are stored into something that looks a lot like a vector_index_table, just lower level: we use InnoDB’s B-tree API directly to read and write our postings. If you’re unsure what a B-tree is or how it relates to storing data in a relational database, my friend Ben has a very good post in this same blog called “B-trees and database indexes” which you should definitely check out. Seriously, I mean it, it’s very good and it has a lot of fun, interactive animations. So go forth and read the things and click the buttons, I’ll wait.
Back already? I hope you’re enjoying your newfound or refreshed knowledge about B-trees. Ben’s blog talks a lot about keys in the B-tree, so let me expand it by talking briefly about values. One major shortcoming of B-trees in the real world is that updating their values can be very expensive. Changing the value of an integer column, for instance, is perfectly fine because all integers take up the same space on the leaves of the B-tree. But changing the value of a BLOB can be a pain in the ass. Blobs can be stored inline in the leaves of the tree, or out-of-tree, where the actual content of the blob is stored in several dedicated InnoDB pages and a pointer to these contents is stored in the B-tree (these are called LOBS in InnoDB, short for Large blOBS). But regardless of their location, the situation is the same: adding data to the end of a blob is just not possible, because after each blob there’s always… huh, something else. Maybe another blob, maybe another row in the B-tree. But there just isn’t space!
So, breaking down the simple SQL query which we’ve just seen above, it turns out that it’s not so simple after all. That CONCAT operation is not “appending” data at the end of the existing blob, it’s creating a brand new blob by copying the existing one and leaving enough space at the end to place the new vector data. The old blob is then left to rot in the B-tree — hopefully it’ll get garbage collected for further inserts in the future. And to make things worse, this is a two-step operation: we must first read the existing blob, then write back a brand new one, so the only way this would be safe to do in practice is by acquiring a row-level lock for the whole operation. InnoDB in fact does this implicitly when running such an SQL query.
In summary: the performance implications of appending data at the end of our posting lists are atrocious. Clearly the original SPANN paper did not take into account the characteristics of a relational database when designing their Insertion algorithm, so we’ll have to come up with a plan B. Let’s start with the most obvious place: how do they actually perform Inserts in the SPANN paper? Do they do it efficiently? And if so, can we copy that design?
Efficient updates on top of LSM trees
The answer lies in the dastardly cousin of the B-tree: the Log-Structured Merge Tree (or LSM tree for short). Unfortunately Ben hasn’t yet gotten around to writing the blog post about LSM trees, so you don’t get to click any more buttons while learning about this topic. Sorry! I hope you’ve at least heard of them (you can peruse the Wikipedia entry on the topic if not), but I’ll give you a quick rundown:
LSM trees maintain key-value pairs (like B-trees), but they’re a hybrid data structure. The K/V data is stored in two places: an in-memory table, that is optimized to stay in memory, and a set of on-disk tables, that are optimized to stay on disk. The key idea behind the performance characteristics of an LSM tree is that data is always flushed in batches between each layer. For instance, once the memory table grows large enough, it is flushed into a single table on disk, which can be written sequentially (this is very fast!). Once there are too many tables on disk, they’re all aggregated into fewer larger tables, which are once again written sequentially to disk. This process, called “compaction”, runs continuously to ensure that there’s a minimal amount of tables on disk.

We can gather some interesting performance insights already with this basic knowledge about LSM trees: write performance must be very good because data is always written sequentially (unlike with a B-tree, which writes data all over the place). Lookup performance must kinda suck, because to look-up an individual key you’d often need to check several tables, both in memory and on-disk (in real-world implementations, this is fixed by using bloom filters to quickly figure out which tables can contain a given key). And range scans must really suck because, well, data is not globally sorted (in real-world implementations, this is fixed by not doing range scans and using a B-tree instead, heehee).
Now, how does all of this mix with the SPFresh paper? We’re getting there. It turns out that the efficient Insert in SPANN vector indexes was designed to be performed on top of LSM trees. That kind of makes sense – we’ve seen LSM trees are very good at writes, but there’s quite a bit more to it. LSM trees are actually uniquely good at the exact same problem we’re trying to solve when inserting data into our vector index: appending a value at the end of an existing value!
Here’s how this works: usually an update for a key/value in an LSM tree means just adding a new key/value pair to its memory table. Once this memory table is eventually flushed to disk and merged with other tables that were previously on-disk, the compaction algorithm will notice that the new key/value pair we’ve inserted is newer than the one that was sitting on disk, and will drop the old one during compaction, leaving only the new.
But there’s this super-cool magic trick you can do with a key/value pair in an LSM tree, which is tagging the pair with a special flag (most implementations call it a “merge” flag) when inserting it into the memtable. What happens then is that the compaction algorithm can see the “merge” flag and instead of considering the value for the key as newer, and replacing any existing on-disk values, it just… merges it! When flushing the new table to disk, it first flushes the old value, and then the new one. We get a lot from this simple trick. The append no longer needs to remove the old value, and it doesn’t need to insert a brand new one either; you just insert the new vector data into the memtable and eventually they’ll be merged on disk. Most critically, you don’t even need to read the old value at insert time so you can append the new data at the end; the concatenation will be performed atomically by the compaction algorithm. So there’s no locking involved during inserts either.
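Here is a toy model of that trick in Python. It is a deliberately naive sketch: `TinyLSM` and its methods are invented for this example, tables are plain in-memory lists rather than files, and there are no bloom filters or levels. The only point is to show that a "merge" entry can be written blindly and concatenated later.

```python
class TinyLSM:
    """Toy LSM tree: one memtable plus a list of flushed tables, oldest first."""

    def __init__(self):
        self.memtable = []   # (key, op, value) entries in arrival order
        self.tables = []     # each flushed table is a list sorted by key

    def put(self, key, value):
        self.memtable.append((key, "put", value))

    def merge(self, key, value):
        # Blind write: the existing value is never read and nothing is locked.
        self.memtable.append((key, "merge", value))

    def flush(self):
        # sorted() is stable, so entries for the same key keep their order.
        self.tables.append(sorted(self.memtable, key=lambda entry: entry[0]))
        self.memtable = []

    def compact(self):
        combined = {}
        for table in self.tables:                # oldest table first
            for key, op, value in table:
                if op == "put":
                    combined[key] = value        # newer value replaces older
                else:
                    combined[key] = combined.get(key, b"") + value  # append
        self.tables = [[(key, "put", combined[key]) for key in sorted(combined)]]

    def get(self, key):
        value = b""
        for table in self.tables + [self.memtable]:
            for k, op, v in table:
                if k == key:
                    value = v if op == "put" else value + v
        return value

lsm = TinyLSM()
lsm.put(7, b"AA")      # initial posting list for head 7
lsm.flush()
lsm.merge(7, b"BB")    # appended without reading b"AA"
lsm.merge(7, b"CC")
before = lsm.get(7)    # readers combine the pieces on the fly
lsm.flush()
lsm.compact()          # compaction materializes a single value
```

Both `before` and the compacted value are the concatenation of the three pieces, but only the compaction step ever had to touch the old data.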
Clearly, it seems like LSM trees are a match made in heaven for this specific vector index algorithm. They just provide an incredibly efficient way to continuously perform insertions that append data to existing values, which is exactly what we want to do here. So this brings our next question: can you use an LSM tree in a relational database?
The answer is a resounding YES. Meta (the company formerly known as Facebook) develops an open-source LSM-based storage engine for MySQL called MyRocks, based on RocksDB, the state-of-the-art when it comes to LSM implementations. However, it’s not as easy as just picking up MyRocks and building our vector indexes on top of it. A big constraint of storage engines in MySQL is that indexes and their tables must use the same storage engine. This is a problem in practice because MyRocks comes with a large set of trade-offs when it comes to performance — the ones we’ve just discussed. It is extremely write-optimized, and point look-ups work quite well, but range scans are very slow.
Unfortunately, the average CRUD app running on PlanetScale just performs better on InnoDB than on MyRocks, and forcing users to switch storage engines from InnoDB to MyRocks if they want to use vector indexes seemed like an unacceptable barrier to adoption.
Because of this, we’ve developed an alternative approach to emulate LSM compaction on top of InnoDB, enabling very fast inserts when building vector indexes without requiring switching storage engines.
Emulating LSM compaction in a B-tree
The approach we’ve employed in our InnoDB-based index is not complex, and yet provides very good results in practice. Instead of structuring our B-tree index for posting lists as a simple mapping between head vector IDs to posting lists, we’ve constructed a composite index. This is an essential feature of all B-trees for relational databases, including InnoDB: it means using more than one value as the key for the index, and since a B-tree is always sorted, it allows us to perform queries on a prefix of our keys, instead of having to match the keys exactly.
Our composite index uses as its key a tuple of (head_vector_id, sequence), where sequence is a table-local strictly increasing sequence number. With this, we can approach posting list updates in a very efficient way: we insert a new row into the table with the head_vector_id we’re appending to and the next sequence number for the table. This is incredibly friendly to the B-tree compared to the naive approach of looking up the head_vector_id, appending the data to the existing posting list, and then writing it back to the B-tree. It gets rid of the initial lookup altogether, and consequently of all the locking!
This optimization when updating posting lists comes with a price at query time, but it happens to be quite small in practice. When we’re looking up the posting list for a specific head_vector_id, it no longer becomes a point query on the B-tree; we now must perform a range scan for all rows that begin with head_vector_id. The actual posting list is the result of joining all the posting list values in each row together. Fortunately, B-trees are very good at range scans, because adjacent keys are stored next to each other in the pages.
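The append-then-range-scan pattern can be tried out with any B-tree-backed table. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: our implementation talks to InnoDB's B-tree API directly, and the table name, column names and helper functions here are assumptions made up for the example.

```python
import itertools
import sqlite3

def open_postings_table():
    """Create an in-memory table keyed by (head_vector_id, sequence)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE postings (
            head_vector_id INTEGER NOT NULL,
            sequence       INTEGER NOT NULL,
            chunk          BLOB    NOT NULL,
            PRIMARY KEY (head_vector_id, sequence)
        ) WITHOUT ROWID
    """)
    return conn

_sequence = itertools.count(1)   # table-local, strictly increasing

def append_chunk(conn, head_vector_id, chunk):
    """Append to a posting list by inserting a new row: no read, no update."""
    conn.execute(
        "INSERT INTO postings VALUES (?, ?, ?)",
        (head_vector_id, next(_sequence), chunk),
    )

def read_posting_list(conn, head_vector_id):
    """Range scan over the key prefix, then join the chunks in sequence order."""
    rows = conn.execute(
        "SELECT chunk FROM postings WHERE head_vector_id = ? ORDER BY sequence",
        (head_vector_id,),
    )
    return b"".join(row[0] for row in rows)

conn = open_postings_table()
append_chunk(conn, 42, b"v1")
append_chunk(conn, 7, b"x1")
append_chunk(conn, 42, b"v2")
posting_42 = read_posting_list(conn, 42)
```

Because the rows for head 42 share a key prefix, they sit next to each other in the B-tree, and reading the full posting list is a single short range scan.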
The other problem with this approach is that you can end up with a lot of rows in your posting tables, slowly degrading the performance of index queries. Do not fear! This is also solvable. In fact, continuously adding data to the vector index comes with many side effects that degrade its recall and performance, but they can all be solved with a similar and elegant approach. Let’s look at these issues one by one.
Splits
There’s one essential way in which this kind of vector index degrades when continuously inserting new data into it. We’ve seen how to efficiently append vector data to our posting lists, but although each individual insert happens very quickly, this is not an operation we can continuously perform. If we keep appending data to each posting list in the index, eventually the postings will become large enough to affect query performance!
One of the key invariants for our index design is that query performance must be bounded, regardless of the size of the index, which means we must prevent any single posting list from growing too large. To accomplish this, we’ve implemented the key idea from a follow-up paper from Microsoft Research: SPFresh. This paper expands on the original SPANN index by layering the LIRE protocol (Lightweight Incremental Re-balancing) on top of it. LIRE is composed of a set of small incremental operations which are continuously performed in the background in order to maintain the key properties of the index and ensure high recall and good query performance.
The most important of these operations is the Split. A split may be triggered during any insert operation to the index, whenever we detect that a posting list has grown too large after appending new vectors to it. Like all LIRE operations, the split is triggered to run in the background; it does not happen as part of the commit that appended the vectors. This means that the vectors become visible for queries as soon as the commit is finalized, but that the posting list that contains them and which is now too large, will soon be picked up by a background job and optimized.
The logic behind a split is quite self-explanatory: we gather all the vectors contained in the large posting list and split them. The original LIRE protocol in the paper always performs this split using K-means clustering to detect two new clusters. The centroids for these clusters are then inserted into the Head Index, and the original posting list is written back to InnoDB as two posting lists of very similar sizes, each keyed by one of the new Heads. For our implementation, we’ve improved this process further by allowing the K-means clustering to split the posting list into K different clusters, where K is calculated based on the expected size of each posting list. During normal operation, this K is often 2, but when the index is under high insertion load, a single posting list can grow very large before its corresponding background job gets to pick it up for splitting, so being able to immediately split it into multiple smaller postings is an important optimization.
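A minimal Python sketch of such a split follows. It is an illustration under assumptions, not our production code: the K-means here is a bare-bones version with a deterministic initialization, and `target_size` (the expected number of vectors per posting list) is a made-up parameter name.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain K-means; returns the centroids and the clusters they were computed from."""
    # Deterministic initialization: pick k points spread evenly across the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:   # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

def split_posting_list(points, target_size):
    """Split an oversized posting list into K clusters of roughly target_size."""
    k = max(2, math.ceil(len(points) / target_size))
    return kmeans(points, k)

# Hypothetical oversized posting list made of two obvious groups.
oversized = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (10.0, 10.0, 10.0), (11.0, 10.0, 10.0), (10.0, 11.0, 10.0),
]
new_heads, new_postings = split_posting_list(oversized, target_size=3)
```

With six vectors and a target size of three, K is 2 and the list is split into two postings of three vectors each. A list that had grown to, say, four times the target size would be split four ways in one go.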
By continuously performing splits whenever an individual posting list grows too large, we ensure that the performance of index queries remains stable. But these splits actually have a negative and counter-intuitive impact on the other key property of the index: they degrade its recall.
Reassignments
Consider this three-dimensional dataset. We can see three clusters, each keyed by a different Head from the HNSW index. The largest of these clusters is actually too large compared to the other two (it just contains more vectors), so it has been scheduled for splitting.
The result of the split is as we’d expect: two new heads have been created and inserted into the head index, and we now have two smaller clusters, each of them stored in a separate posting list. But pay close attention to the highlighted vectors in the newly formed clusters. Can you see what’s off about them?
Hovering over the highlighted vectors gives you a big clue, as you can see the distances to the head of their cluster and to all nearby heads of other clusters. It turns out that since the centroid of the original cluster has been split into two, some of these vectors are no longer assigned to their nearest centroid in three-dimensional space! This breaks the NPA (nearest partition assignment) invariant of the index.
This is an unfortunate side-effect of performing a split that only considers one posting list of the index. When clustering all the vectors in this posting list, the K-means result yields optimal results locally, but if we start considering other surrounding vectors, we see that their distances to the new centroids are a local minimum. Hence, performing splits over and over again eventually degrades the recall of the index, because it places many vectors in posting lists where they don’t belong.
To fix this incrementally, LIRE introduces another background operation called Reassign. The goal of a re-assign operation is finding individual vectors in posting lists that do not belong there, because there’s another nearby cluster whose centroid is actually closer to them, and moving them from their original posting list to their correct one.
There are two performance issues which must be solved for this Reassign operation. The first one is figuring out efficiently which vectors must actually be moved. The original SPFresh paper has a very elegant mathematical proof of how to do this, but we’ll summarize it here for your convenience: whenever a split is performed, we need to keep track of whether the distance between each vector and its new head is shorter than the distance between that vector and the old head. Because of the NPA property of the index where head distances are always minimized, if the split results in a shorter distance, we know that the assignment is optimal and the vector does not need to be reassigned. If it results in a longer distance, maybe (but not necessarily!) the vector is in a wrong posting list and needs to be reassigned. This is a very simple heuristic that greatly cuts down on the number of vectors we need to consider for reassignment operations, and often allows us to skip enqueuing a reassignment operation altogether.
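To make the heuristic concrete, here is a minimal Python sketch. It is an illustration rather than the actual implementation: the function name reassign_candidates, the use of Euclidean distance, and the sample coordinates are all assumptions made for the example.

```python
import math

def reassign_candidates(vectors, old_head, new_heads, assignment):
    """Return ids of vectors that ended up farther from their head after a split.

    vectors:    dict of vector_id -> coordinates
    old_head:   centroid of the posting list before the split
    new_heads:  centroids created by the split
    assignment: dict of vector_id -> index into new_heads
    """
    candidates = []
    for vector_id, coords in vectors.items():
        d_old = math.dist(coords, old_head)
        d_new = math.dist(coords, new_heads[assignment[vector_id]])
        # A shorter distance means the vector is fine where it is.
        # A longer one only means it *might* belong somewhere else.
        if d_new > d_old:
            candidates.append(vector_id)
    return candidates

# Hypothetical data: one old head split into two new ones.
old_head = (0.0, 0.0, 0.0)
new_heads = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
vectors = {1: (-3.0, 0.0, 0.0), 2: (0.5, 0.0, 0.0), 3: (3.0, 1.0, 0.0)}
assignment = {1: 0, 2: 1, 3: 1}
print(reassign_candidates(vectors, old_head, new_heads, assignment))  # [2]
```

Only vector 2 moved farther away from its head, so it is the only one that would need to be checked against neighboring heads.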
The second performance issue is trickier: when a reassignment must happen, we need an efficient way to move the vectors from their original posting list to their new one. Adding one or more vectors to an existing posting list is the easy part. We’ve seen during Insert that this operation has been extensively optimized for vectors that the user inserts, and those optimization efforts also apply here. The hard part is then removing the vectors from the posting list where they no longer belong.
There is no free lunch when it comes to removing data from a posting list; we've seen the optimization potential that LSM trees provide when it comes to appending data without having to update the tree in-place, but when it comes to removing it, you must pay the full price of loading the posting list, cleaning it up, and writing it back to disk, whether you're using an LSM or a B-tree. Fortunately, the SPFresh paper has an elegant solution to remove this performance bottleneck: versioning all the vectors in each posting list!
The approach is as follows: the data inside the posting lists is actually a tuple of (vector_id, version, vector_data), where version is a 1-byte version counter that keeps increasing until it wraps around. The fresh version for each vector in the index is kept in an in-memory table (a table which in practice is tiny because each entry only occupies 1 byte). When we perform a reassignment, we increase the version count of each moved vector by one, and the posting data appended on its new location has the new version count. Any subsequent queries that load the posting list for the old location will see vectors whose version numbers are stale, and these vectors won’t be considered when performing distance calculations for the query.
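The versioning scheme can be sketched in Python as follows. This is a simplified model under stated assumptions, not the real storage format: the class VersionTable, the helper names, and the use of plain lists for posting lists are hypothetical.

```python
class VersionTable:
    """In-memory table holding the current 1-byte version of every vector."""

    def __init__(self):
        self._versions = {}

    def current(self, vector_id):
        return self._versions.get(vector_id, 0)

    def bump(self, vector_id):
        # The counter must fit in one byte, so it wraps around after 255.
        new_version = (self.current(vector_id) + 1) % 256
        self._versions[vector_id] = new_version
        return new_version


def reassign(vector_id, vector_data, new_list, versions):
    """Move a vector by appending to its new posting list only."""
    new_list.append((vector_id, versions.bump(vector_id), vector_data))


def live_entries(posting_list, versions):
    """Skip stale entries while reading a posting list for a query."""
    return [e for e in posting_list if e[1] == versions.current(e[0])]
```

After calling reassign for a vector, its entry in the old posting list is untouched on disk, but live_entries filters it out because its stored version no longer matches the table.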
With our LSM-like optimization when appending data to posting lists, and the individual vector versioning when removing it, we have a very clean and efficient solution to continuously move vectors around in the index whilst minimizing on-disk write amplification. This combination really is the key to the whole implementation, and what allows the index to remain performant and keep high recall while new vectors are continuously inserted into it.
Merges
Sadly, the design choices we’ve made to keep reassignments efficient come with a drawback, and because of this we must derive one last (I promise!) background operation. As we’ve seen, all vectors stored in our posting lists carry a version number that allows us to mark them as stale. This lets us move them between posting lists simply by appending them to a nearby posting list with a higher version number, without having to rewrite the original posting list to remove them.
In practice, this means that any posting list in the index can contain a large amount of useless stale data. This is not only wasteful, but it also lowers the recall of the index because during a query, we can end up looking up a posting list which is large as stored in InnoDB but contains very few actually useful vectors! This is something we want to fix because again, one of the key properties we’re trying to maintain in the index is that all posting lists have similar sizes.
Hence, we introduce a Merge operation. Whenever we perform a read query, we must determine how many vectors of the loaded posting lists are stale (because they must be skipped from the query). This is the perfect place to detect posting lists that contain very few live vectors, and trigger background jobs to Merge them.
A merge operates using a single Head as a starting point. We incrementally look up the nearest heads to that one on the head index and join these neighboring posting lists whilst removing any stale vectors. Once a posting list of our desired size for the index is generated, we calculate a new centroid for it; if any of the old centroids from the original posting lists is within a margin of error to the new centroid, that’s ideal. We can remove all the heads from the head index except that one, and replace the posting list in InnoDB with the new merged posting list. If all the old centroids were too far away, we need to insert a new head into the head index, before deleting all the old ones.
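The steps of a merge can be sketched in Python. This is a rough model and not the real job: the function signature, the dictionaries standing in for the head index and for InnoDB, the is_live callback, and the use of Euclidean distance are assumptions for illustration.

```python
import math

def merge(start, heads, postings, is_live, target_size, margin):
    """Join posting lists near `start`, dropping stale vectors.

    heads:    dict head_id -> centroid
    postings: dict head_id -> list of (vector_id, coords)
    Returns (kept_head, centroid, merged, removed_heads). kept_head is None
    when no old centroid is close enough and a new head must be inserted.
    """
    # Visit heads from nearest to farthest; `start` itself comes first.
    order = sorted(heads, key=lambda h: math.dist(heads[h], heads[start]))
    merged, used = [], []
    for head_id in order:
        merged.extend(e for e in postings[head_id] if is_live(e[0]))
        used.append(head_id)
        if len(merged) >= target_size:
            break
    if not merged:
        return None, None, merged, used  # nothing live: only heads to drop
    centroid = tuple(sum(axis) / len(merged) for axis in zip(*(c for _, c in merged)))
    # Reuse the closest old head if it is within the margin of error.
    closest = min(used, key=lambda h: math.dist(heads[h], centroid))
    keep = closest if margin >= math.dist(heads[closest], centroid) else None
    return keep, centroid, merged, [h for h in used if h != keep]

# Hypothetical two-dimensional data; vector 2 is stale.
heads = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 10.0)}
postings = {
    "a": [(1, (0.0, 0.0)), (2, (0.2, 0.0))],
    "b": [(3, (1.0, 0.0)), (4, (0.8, 0.0))],
    "c": [(5, (10.0, 10.0))],
}
stale = {2}
keep, centroid, merged, removed = merge("a", heads, postings, lambda v: v not in stale, 3, 0.5)
print(keep, removed)  # b ['a']
```

In this example the merged list reaches the target size after joining "a" and "b", the old head "b" is close enough to the new centroid to be kept, and only "a" has to be removed from the head index.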
In the original paper, Merges are important to remove stale data from the index, but for our implementation, they have a second, critical effect: they act as the "delayed merge operator" that you'd see in an LSM tree. During normal queries, we not only detect how many vectors in the posting list are stale; we also consider how many rows of the underlying table had to be loaded to compose the resulting posting list. Whenever this number grows too large, we trigger a merge and we get to compact the underlying row storage and remove stale vectors for free. How convenient! A similar dedicated operation called Defragment is also run when the index is under heavy load: it compacts the underlying rows in InnoDB without removing stale vectors or merging nearby postings, and allows the index to remain performant throughout periods of continuous insertions.
Deletes and updates
With all the background LIRE jobs in place, these two user-facing operations which are often the bane of other vector index implementations become trivial. Deletes are performed by marking a delete flag in the versions table — the same table that is used to mark a vector stale because it’s been moved to another posting list. It’s really that simple! We do not need to rewrite a single posting list during deletion, because the stale vectors will eventually be picked up when merged, split or reassigned by the other LIRE jobs.
Updates of vector data are, in our experience, much rarer, so we’ve opted to implement them higher in the stack, at the InnoDB level, by transparently updating the vector and the vector ID in the user-facing table that contains the vectors. We then translate this into a delete of the old vector ID and an insert of the new one into the vector index. This results in excellent performance in practice because the delete happens on the versions table and the insert is very efficient and lock-free.
Transactionality and crash resilience
As we’ve seen, the design of this hybrid index requires periodic maintenance operations to ensure that the index can be continuously updated without losing recall or performance. This concept is not new to relational databases; perhaps the most notorious use of maintenance tasks is PostgreSQL’s VACUUM command, which ensures that the size of its tables does not grow unbounded as their rows are updated. It is, however, rather new in the MySQL world, and we have to pay a lot of attention to ensure that these operations can be performed simultaneously with user-triggered queries and inserts whilst ensuring that the data in the index remains transactionally consistent and that the index is always kept in a valid state even if MySQL crashes in the middle of one of the jobs.
The fact that all posting data is maintained in InnoDB solves a lot of the transactionality requirements, because modifications to posting lists follow the default ACID semantics of the storage engine. By creating a transaction at the start of each maintenance operation, we ensure that its changes are only reflected at the end of it, and only if the operation completes successfully and we choose to commit its transaction. However, there is a part of the index that does not use InnoDB as a source of truth: the HNSW Head Index is kept in memory for performance reasons, and some of the maintenance operations will perform changes to it (e.g. a split will usually remove one head from the index and insert two new ones). To keep the memory index in an always-consistent state, we use a Write Ahead Log tied to the InnoDB transaction for each job. Changes to the HNSW index are stored in a WAL that will eventually be committed together with the changes to the posting lists.
This allows us to keep our Head index with a very efficient in-memory representation that is always in sync with the posting data on disk. If MySQL crashes at any point, during the recovery process we load the last serialized form of the HNSW index (stored in an on-disk blob) and re-apply all the changes from the InnoDB WAL. To prevent the WAL from growing unbounded, we periodically perform Head Index Compaction, a background process that serializes a current in-sync version of the in-memory index to disk and cleans up the WAL once the serialization has been verified to be successful. Performing this kind of WAL compaction might seem like a particularly hairy concurrency problem, but it becomes trivial in practice because of the properties we’ve designed for the index: if you recall our description for all user-facing and background running operations, you’ll notice that selects, inserts, updates and deletes to the vector index do not modify the Head Index — all these operations only require a read-only consistent view of it. Only the background jobs actually perform modifications on the Head Index in order to increase recall and performance. Hence, WAL compaction can be performed by pausing all background jobs in the system while the compaction is running. This allows all user-facing operations to continue without contention while the head index is being serialized, and any background jobs triggered during the compaction are just paused and queued until they’re ready to run.
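The recovery and compaction flow can be modeled in a few lines of Python. This is a toy sketch, not the on-disk format: the record layout (op, head_id, centroid), the operation names "insert" and "remove", and the dictionaries standing in for the serialized blob are assumptions.

```python
def recover_head_index(snapshot, wal):
    """Rebuild the in-memory head index after a crash.

    snapshot: dict head_id -> centroid, the last serialized head index
    wal:      ordered list of (op, head_id, centroid) records committed
              after the snapshot was taken
    """
    heads = dict(snapshot)
    for op, head_id, centroid in wal:
        if op == "insert":
            heads[head_id] = centroid
        elif op == "remove":
            heads.pop(head_id, None)
        else:
            raise ValueError(f"unknown WAL operation: {op}")
    return heads


def compact(heads, wal):
    """Serialize the current index, then clear the WAL."""
    snapshot = dict(heads)  # stands in for writing the on-disk blob
    wal.clear()             # only safe once the snapshot has been verified
    return snapshot


# A split recorded in the WAL: one head removed, two inserted.
snapshot = {1: (0.0, 0.0)}
wal = [("remove", 1, None), ("insert", 2, (-1.0, 0.0)), ("insert", 3, (1.0, 0.0))]
heads = recover_head_index(snapshot, wal)
print(sorted(heads))  # [2, 3]
```

Replaying the log on top of the last snapshot yields the same head index that was in memory before the crash, and compacting simply moves that state into a fresh snapshot so the log can start empty again.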
Conclusion
There are many different approaches to vector indexing aimed at different scales and making different trade-offs. We believe that the design for our hybrid vector index in MySQL hits a sweet spot between scalability and performance, while providing the behavior and functionality that people have come to expect from an index in a relational database. There’s of course room for improvement; Facebook’s success with RocksDB as a MySQL storage engine makes it clear that the query and write performance of these vector indexes could be increased by using an LSM as a backend instead of InnoDB’s default B-tree. This would also allow us to simplify the Merge and Defragment background jobs, making the maintenance of the index even more efficient.
Regardless, we’re very happy about the current state of this implementation. Try it out today by creating a new PlanetScale MySQL database. Happy (approximate) searching!]]></content>
        <summary><![CDATA[A new hybrid design for scalable vector indexes and a reference implementation in MySQL]]></summary>
      </entry>
    
      <entry>
        <title>Partnering with Cloudflare to bring you the fastest globally distributed applications</title>
        <link href="https://planetscale.com/blog/partnering-with-cloudflare-fastest-applications" />
        <id>https://planetscale.com/blog/partnering-with-cloudflare-fastest-applications</id>
        <published>2025-09-24T09:00:00.000Z</published>
        <updated>2025-09-24T09:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We are excited to announce that we're partnering with Cloudflare to bring you the easiest way to ship full-stack applications backed by Postgres or MySQL. You can now seamlessly connect Cloudflare Workers to your PlanetScale databases using our new native integration.

Cloudflare's mission is to build a better internet through improved security, better performance, democratized internet infrastructure, and improved reliability. This deeply resonates with PlanetScale's mission: bring you the fastest, most scalable, and most reliable databases without compromising on developer experience.
This partnership delivers immediate benefits for your applications:
Faster setup: Connect your PlanetScale database to Cloudflare Workers with Hyperdrive in just a few clicks
Optimized performance: Leverage Hyperdrive's connection pooling and query caching with PlanetScale's global infrastructure
Reduced latency: Bring your database closer to your users with intelligent edge caching
Better reliability: Combine PlanetScale's database reliability with Cloudflare's global network
The PlanetScale and Cloudflare stack
If you want your application to be fast, you must include the database in that equation.
Combining Cloudflare Workers, Hyperdrive, and PlanetScale relational databases allows you to build modern globally distributed applications without worrying about database connection limits or latency. Your Workers run globally at the edge, Hyperdrive provides intelligent connection pooling and caching, and PlanetScale delivers ultra fast and reliable databases for both Postgres and MySQL/Vitess.
Together, this powerful combination enables you to build data-driven applications that perform like they're running locally for users everywhere. This integration is especially helpful if you already use Cloudflare Workers to access your database, are facing connection limits and connection pooling issues for serverless applications, or want to reduce database latency for global users.
How to use it
The integration is available for both Postgres and Vitess databases. Before getting started, make sure you have a Cloudflare account with Workers enabled and a PlanetScale account.
You can get started either from the Cloudflare dashboard or the PlanetScale dashboard. To connect to your Cloudflare application from the PlanetScale dashboard:
Head to the PlanetScale dashboard.
Create a new database or select the database you'd like to connect to.
Click "Connect".
Create a new User-defined role with the necessary permissions (this will depend on how you're using Cloudflare Workers).
Click "Create role".
Select "Cloudflare" from the connection options below.
Follow the instructions in the dashboard to finish connecting your Cloudflare account.
If you have any questions, reach out to our support team. We look forward to seeing what you build!]]></content>
        <summary><![CDATA[You can now easily set up PlanetScale databases with Cloudflare Workers using this native integration.]]></summary>
      </entry>
    
      <entry>
        <title>Processes and Threads</title>
        <link href="https://planetscale.com/blog/processes-and-threads" />
        <id>https://planetscale.com/blog/processes-and-threads</id>
        <published>2025-09-24T00:00:00.000Z</published>
        <updated>2025-09-24T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[What do Slack, Cursor, Ghostty, and Chrome have in common?
No, they aren't all written in TypeScript. These, along with every other piece of software running on your computer, execute within a Process.
The Process is a fundamental abstraction provided by modern operating systems. An operating system's job is to provide abstractions on top of the underlying CPU, RAM, and disk that allow software to execute code and share resources. There are many important abstractions such as virtual memory and file systems, but the most fundamental is the Process.
This interactive article allows you to build an understanding of what Processes are, how they allow your computer to multitask, and how they differ from Threads.
CPU and RAM
The two most important components of a computer are the CPU and RAM.
The CPU is considered by many to be the brain of the computer. A CPU chip looks like this:
Beneath the shiny metal enclosure is a piece of silicon with billions of transistors, each built at nanometer-scale. The CPU is where the primary sequence of code is executed on the computer. These instructions do very simple things like adding two numbers, jumping to other instructions, and moving data from one location to another. It doesn't sound like much, but when you can execute billions of these instructions each second you can accomplish a lot.
RAM acts as the short-term memory of a computer. Though they come in different shapes and sizes, a typical DDR RAM chip looks like this:
The big black rectangles are the Dynamic Random Access Memory (DRAM) chips, where the data is physically stored. DRAM chips come in capacities of 8, 16, 32 or more gigabytes per chip, and many computers support multiple chips in combination.
RAM is used to store the numbers, text, and other data that the CPU operates on, all in binary format. The CPU makes requests for data stored in RAM and does computations with this data. The CPU and RAM chips are both connected to a motherboard, which bridges communication between the two.
The CPU isn't much use without RAM, and the RAM is of no use without a CPU. They go hand-in-hand, working together to execute code and manipulate data.
Instruction sets
An instruction set is a set of instructions a CPU can execute. These are defined by detailed specifications created by chip manufacturers. The most common ones used in modern CPUs are x86-64 for ones developed by Intel and ARM64 for chips based on the ARM architecture.
Both are quite sophisticated, each including hundreds of unique instructions. To give you an idea of how complex these can get, the x86-64 reference manual is over 5,000 pages! Though complex, at their core they are still just doing basic math, conditional jumping, and memory management.
In this article we're going to use a simpler (made up!) instruction set. It captures much of the same behavior as "real" ones while being easier to follow in the limited space of a blog. Here is our simple instruction set:
We can implement real programs with this, and will visualize these executing on CPU, RAM, and display visuals interactively. We call a sequence of instructions that accomplish some useful task a Program. Here's a simple program that prints all even numbers between 0 and 10 inclusive. Hit the ▶ (play) button below to play and pause the execution. You can use the other two buttons to change the execution speed and reset the simulation.
Sequences of instructions can execute on the CPU, make modifications to the data on RAM, and print messages to the terminal. (We're fans of Ghostty here at PlanetScale, can you tell?)
Running multiple programs
Computers are not limited to running one program at a time. For decades, CPUs have been capable of running many simultaneously. How do they accomplish this?
One thing you might be thinking is: multi-core CPUs! Most CPUs today have multiple execution cores, each of which can run separate programs simultaneously. Nearly all CPUs in modern laptops, desktops, and phones have 2 or more cores. While it is true that this facilitates a computer's ability to multitask (running many programs at once), computers have been able to do this long before multi-core. In fact, we're going to assume you only have one core in your CPU for the rest of this article.
In the early days of computing, computers were fundamentally designed to run only one program at a time from start to finish. A single program was loaded and executed to completion, then the next program could begin.
This limitation was problematic:
Sometimes CPUs need to "wait around" for things like reading data from a hard drive. While it waits, the CPU sits idle. Can we use this idle time to run other programs?
Computers used to be prohibitively expensive and physically huge, taking up entire rooms. It would be costly to give each employee a dedicated machine, so can we share a CPU?
These and other factors drove an innovation: CPU multitasking via processes.
A process is an instance of a program being executed by a computer. The job of an operating system is to manage many processes at once and allow the CPU to switch between them.
Below is an example of a computer with two processes. One of them is responsible for executing a program that prints all positive even numbers up to 10, the other prints all positive odd numbers up to 10. Hit the ▶ (play) button to begin execution. This visual now has a fourth button, which you can use to swap from the current process to another one.
What's happening here?
The first process begins execution to print out even numbers.
After a few seconds it pauses, saves the state of the RAM, and changes processes.
The odd-number process begins executing. After a few seconds it also pauses, saves the RAM, and switches.
This cycle continues until both programs complete. The CPU divides its time up into short time-slices, giving each process a burst of time to make progress.
In the examples here we give each process a few seconds of time to execute, so that our feeble human brains can follow. On real CPUs, the time slices are measured in milliseconds (one millisecond = 1/1000 of a full second).
Modern CPUs are capable of executing well over 1 billion instructions per second. Even if our time slice is 1 millisecond, this means we can execute 1,000,000,000 ÷ 1,000 = 1 million+ instructions per time slice. By quickly switching between processes, our computers are able to give the illusion of multiple programs running at the same time. CPUs switch so fast humans can't even notice!
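If you prefer code to visuals, the same time-slicing idea can be sketched in Python. This is a toy model, not how an operating system scheduler is written: generators stand in for processes, each yield stands in for one executed instruction, and the names count_by_two and round_robin are made up for the example.

```python
from collections import deque

def count_by_two(name, start, stop):
    """A toy 'program': each yield stands for one executed instruction."""
    for n in range(start, stop + 1, 2):
        yield f"{name}: {n}"

def round_robin(processes, time_slice):
    """Run each process for up to `time_slice` steps, then switch."""
    ready = deque(processes)
    output = []
    while ready:
        process = ready.popleft()
        for _ in range(time_slice):
            try:
                output.append(next(process))
            except StopIteration:
                break              # process finished, do not requeue it
        else:
            ready.append(process)  # slice used up, back of the ready queue
    return output

evens = count_by_two("even", 2, 10)
odds = count_by_two("odd", 1, 9)
for line in round_robin([evens, odds], time_slice=2):
    print(line)
```

With a time slice of two steps, the output alternates between the two programs in bursts (even 2 and 4, odd 1 and 3, even 6 and 8, and so on) until both are done.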
Context Switching
When a computer swaps execution of one process for another, this is called a context switch. Each context switch requires quite a bit of "work" including switching to kernel mode, saving register state, and virtual memory management (beyond the scope of this article!).
A full context switch takes ~5 microseconds on modern CPUs (1 microsecond = 1 millionth of a second). Though this sounds fast (and it is!) it requires executing tens of thousands of instructions, and this happens hundreds of times per second. A CPU typically executes several billion instructions in a second, but managing and switching between processes can consume tens of millions of these. In other words, the convenience of multi-processing comes with a small performance penalty. We can see this visualized below:
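Here is that arithmetic worked through in Python. The 5 microsecond figure comes from the text above; the 4 billion instructions per second and 500 switches per second are assumed values picked from the "several billion" and "hundreds" ranges, and the function name is hypothetical.

```python
def context_switch_overhead(instructions_per_second, switch_microseconds, switches_per_second):
    """Estimate how many instructions per second are spent on context switches."""
    instructions_per_switch = instructions_per_second * switch_microseconds // 1_000_000
    lost_per_second = instructions_per_switch * switches_per_second
    return instructions_per_switch, lost_per_second, lost_per_second / instructions_per_second

per_switch, lost, fraction = context_switch_overhead(4_000_000_000, 5, 500)
print(per_switch, lost, f"{fraction:.2%}")  # 20000 10000000 0.25%
```

Under these assumptions each switch costs about 20,000 instructions, which adds up to 10 million instructions per second, or a quarter of one percent of what the CPU can do.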
The penalty is almost universally considered "worth it" as it's such a convenient abstraction.
Process States
During the lifetime of a process, it can transition between a number of process states. These states are assigned by the Operating System.
While a process is executing on the CPU, it is considered running. Processes can get kicked off the CPU for one of two reasons: its time slice is up, or it needs to wait for a disk or network request to continue. In the former case, it moves to the ready state. In the latter case, it moves to the waiting state.
When the process is complete, it moves to the killed state. Hover over or tap on the nodes and arrows in the state diagram below to see how processes flow from one state to the next.
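The same state machine can be written down as a small Python table. The three transitions out of running come from the description above; the ready to running and waiting to ready transitions, as well as the event names, are assumptions added to make the sketch complete.

```python
from enum import Enum

class State(Enum):
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    KILLED = "killed"

# (current state, event) -> next state. Event names are made up.
TRANSITIONS = {
    (State.READY, "scheduled"): State.RUNNING,
    (State.RUNNING, "time_slice_up"): State.READY,
    (State.RUNNING, "io_request"): State.WAITING,
    (State.WAITING, "io_done"): State.READY,
    (State.RUNNING, "exit"): State.KILLED,
}

def step(state, event):
    """Return the next state, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.value}") from None

state = State.READY
for event in ["scheduled", "io_request", "io_done", "scheduled", "exit"]:
    state = step(state, event)
print(state.value)  # killed
```

Note that a waiting process never goes straight back to running in this model: it first becomes ready and then has to be scheduled again.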
Creating new Processes
There are many pieces of software that are designed to use multiple processes running together, all coordinating to accomplish a single task.
The Postgres database is a perfect example. Postgres is implemented with a process-per-connection architecture. Each time a client makes a connection, a new Postgres process is created on the server's operating system. There is a single "main" process (PostMaster) that manages Postgres operations, and all new connections create a new Process that coordinates with PostMaster.
PlanetScale Postgres is now generally available and it's the fastest way to run Postgres in the cloud. Check out our benchmarks to see for yourself.
Programs create new processes via two main system calls: fork() and execve().
Calling fork() makes a process create an exact clone of itself as a new process. The cloned process gets immediately placed into the ready state, while the current one continues to execute (but this varies). We'll introduce a new instruction, FORK, to do this for our little CPU simulation:
Let's try this out in our simulator. Hit the play button to begin execution.
We call the process that initiated the fork the parent process and the new process the child. When a computer boots up, a single process is initiated and all others are descendants of this one.
However, we don't want all of our processes executing identical code. So, there must be more to this than just fork, right?
Right.
A process can call execve() to replace its currently executing program with a new one. The programs we may want to execute are stored on our computer's hard drive. execve() is given the name of the file to load the program from, and it handles exchanging the instructions and executing the new program. We use the name EXEC for this system call on our simplified CPU:
Let's look at this in action. Again, press the play button.
The program begins execution, forks a new process, then both the parent and child load up a new program that prints out even number strings. These two system calls are how we can spawn new processes that do the wide variety of tasks that run on your computer!
Check out man fork and man execve for more juicy details.
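You can try the fork-then-exec pattern yourself from Python, whose os module exposes thin wrappers over these calls on Unix-like systems. This is a minimal sketch: the helper name spawn is made up, and the code will not run on Windows, where os.fork is unavailable.

```python
import os
import sys

def spawn(argv):
    """Fork a child, replace its program with argv, and wait for it."""
    pid = os.fork()                # clone the current process
    if pid == 0:
        # Child: swap the cloned program for a new one, like EXEC.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)          # only reached if the exec failed
    # Parent: wait for the child and report its exit code.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    code = spawn([sys.executable, "-c", "print('hello from the child')"])
    print("child exited with", code)
```

The parent and the child run the same code right up to the fork; the return value of os.fork is what tells each of them which one it is.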
Postgres and MySQL
Postgres is one of the most popular pieces of database software in the world, perhaps second only to MySQL. Postgres is designed to handle many thousands of queries per second and tens or even one hundred+ simultaneous connections.
Postgres uses a process-per-connection model to handle this concurrency. Every time an application server connects to it, a new process is FORKed to handle the queries that it sends. Connections can last for as short as a second to as long as weeks. Either way, it's not uncommon for a single Postgres server to have many simultaneous connections.
Though processes are a convenient way of handling this, Postgres has received some criticism for this architecture. Processes are heavy: there is memory overhead and a time overhead for managing them.
Consider the following program that is running three processes. Each computes a running sum of the values in a sequence in memory - akin to a SQL query doing an aggregation.
How many instructions could have been executed in all the time the CPU spent switching from one process to the next?
Threads
MySQL is a great contrast, designed to run as a single process (mysqld). However, it is also capable of handling thousands of queries per second, hundreds of connections, and utilizing multi-core CPUs. It achieves this via threads.
A thread is an additional mechanism for achieving multitasking on a CPU, all within one process. Threads share all the process memory and code (other than their program stacks), but each can be executing at different program locations. They can be switched between, much like process context switches.
Switching between threads is around 5x faster, taking closer to 1 microsecond to complete. If we can architect our applications to handle concurrency with threads, we can achieve better overall performance.
Let's compare the same task being completed by a program that uses multi-processing vs multi-threading. Here we complete the same task as the last example, but with a single process that switches between threads. Notice in the visualization below we never do a full process context switch. Rather, we can switch threads (when the instructions slide to the side) and all of these sequences of execution share the same RAM.
These are rudimentary programs, but these same principles apply to the way that Postgres and MySQL work. Postgres does process-per-connection, MySQL does thread-per-connection. This gives MySQL some advantages in terms of performance in some scenarios.
POSIX threads
On modern Unix systems, new threads are typically created via the pthread_create POSIX library call. Some lower-level programs call this directly, while others build abstractions atop it.
On Linux, both fork() and pthread_create() are ultimately wrappers around another system call: clone().
There are a number of flags that you can pass to clone() to adjust its behavior for spawning either a process or a thread. For example, the CLONE_VM flag causes it to share the virtual memory between the caller and new thread and CLONE_FILES causes it to share file descriptors. We won't get into all these details here, but run man clone on a Linux machine for the details. We add a PTCREATE instruction for our CPU to execute:
Multithreaded programs are particularly useful when you have data in RAM that you want to compute multiple things with or subdivide into smaller chunks. Here's an example that calls PTCREATE at the beginning with two sequences of execution: one to find the min value and another to find the max value within the first 7 memory slots of the shared RAM.
This would be more efficient to compute in a single loop, but this shows how several threads can work together.
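The min/max example translates almost directly to Python's threading module. This is a small sketch with made-up values in the shared list, and the function name min_and_max is hypothetical.

```python
import threading

def min_and_max(ram):
    """Find the min and max of `ram` using two threads that share it."""
    results = {}

    def find(name, reducer):
        # Both threads read the same list; nothing is copied.
        results[name] = reducer(ram)

    threads = [
        threading.Thread(target=find, args=("min", min)),
        threading.Thread(target=find, args=("max", max)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()   # wait for both sequences of execution to finish
    return results

shared_ram = [7, 3, 9, 1, 8, 4, 6]   # seven memory slots with made-up values
print(min_and_max(shared_ram))
```

Both threads read shared_ram and write into the same results dictionary, which is only possible because threads share the memory of the process they live in.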
Connection Pooling
In the database world, thread-per-connection is generally preferable to process-per-connection. However, both MySQL and Postgres suffer from performance issues when the connection counts get too high. Even with threads, each connection requires dedicated memory resources to manage connection state.
MySQL, Postgres, and many other databases use a technique known as connection pooling to help.
Connection poolers sit between clients and the database. All connections from the client are made to the pooler, which is designed to be able to handle thousands at a time. It maintains its own pool of direct connections to the database, typically between 5 and 50. This is a small enough number that the database server is not negatively impacted by too many connections.
The pooler then intelligently distributes incoming queries/transactions across the fixed set of connections. It acts as a funnel: pushing the queries from thousands of connections into tens of connections.
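The funnel idea fits in a few lines of Python. This is a toy model and not how any particular pooler is built: the Pool class, the fake connect function, and the demo helper are all hypothetical, and a "connection" here is just a function that answers a query.

```python
import queue
import threading

class Pool:
    """Funnel many client requests through a fixed set of connections."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def run(self, query):
        conn = self._idle.get()    # blocks while every connection is busy
        try:
            return conn(query)
        finally:
            self._idle.put(conn)   # hand the connection back to the pool

def demo(client_count=100, pool_size=5):
    opened = []

    def connect():
        opened.append(object())    # count how many connections get opened
        return lambda query: f"result of {query}"

    pool = Pool(connect, pool_size)
    results = []
    clients = [
        threading.Thread(target=lambda q=f"SELECT {i}": results.append(pool.run(q)))
        for i in range(client_count)
    ]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    return len(opened), len(results)

print(demo())  # (5, 100)
```

One hundred clients get their answers, but the database side only ever sees five connections.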
Virtual memory
In the visuals above, we simplified the process of context switching. This is especially true of the data in RAM. The visuals made it appear that all of the RAM data was copied and restored when context switching. This would be incredibly slow, so what actually happens is that OSs use virtual memory.
This is a subject for another day, but in the meantime you can read more about it online or purchase the definitive OS book.
Conclusion
Processes and threads are two foundational abstractions. Every program that runs on your computer or phone runs in a Process, and many use multiple Processes, Threads, or a mix of both! Now you know the basics of how these work and what the tradeoffs are when designing software with them.]]></content>
        <summary><![CDATA[Processes and threads are fundamental abstractions for operating systems. Learn how they work and how they impact database performance in this interactive article.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale for Postgres is now GA</title>
        <link href="https://planetscale.com/blog/planetscale-for-postgres-is-generally-available" />
        <id>https://planetscale.com/blog/planetscale-for-postgres-is-generally-available</id>
        <published>2025-09-22T00:00:00.000Z</published>
        <updated>2025-09-22T00:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[PlanetScale for Postgres is now generally available and out of private preview. To create a Postgres database, sign up or log in to your PlanetScale account, create a new database, and select Postgres. If you are looking to migrate from another Postgres provider to PlanetScale, you can use our migration guides to get started. Finally, if you have a large or complex migration, we can help you via our sales team at postgres@planetscale.com.
What is PlanetScale for Postgres?
Our mission is simple: bring you the fastest and most reliable databases with the best developer experience. We have done this for 5 years now with our managed Vitess product, allowing companies like Cursor, Intercom, and Block to scale beyond previous limits.
We are so excited to bring this to Postgres. Our proprietary operator allows us to bring the maturity of PlanetScale and the performance of Metal to an even wider audience. We bring you the best of Postgres and the best of PlanetScale in one product.
Customers on PlanetScale for Postgres
Hundreds of companies already trust PlanetScale for Postgres to power their production workloads. We say this every time we launch something, but we prefer you hear about real-world usage straight from our customers. Read through some of their stories about their migration to PlanetScale for Postgres below.
Convex: Powered by PlanetScale
Supermemory just got faster on PlanetScale
Scaling Real‑Time Discovery: Inside Layers’ PlanetScale Migration
Why We Migrated from Neon to PlanetScale
Vitess for Postgres
Neki is our Postgres sharding solution, built by the team behind Vitess and combining the best of Vitess and Postgres. Neki is not a fork of Vitess. Vitess’ achievements are enabled by leveraging MySQL’s strengths and engineering around its weaknesses. To achieve Vitess’ power for Postgres we are architecting from first principles and building alongside design partners at scale. When we are ready we will release Neki as an open source project suitable for running the most demanding Postgres workloads. To sign up for the Neki waitlist visit neki.dev.]]></content>
        <summary><![CDATA[PlanetScale for Postgres is now generally available.]]></summary>
      </entry>
    
      <entry>
        <title>Postgres High Availability with CDC</title>
        <link href="https://planetscale.com/blog/postgres-ha-with-cdc" />
        <id>https://planetscale.com/blog/postgres-ha-with-cdc</id>
        <published>2025-09-12T00:00:00.000Z</published>
        <updated>2025-09-12T00:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Change Data Capture (CDC) from your database is a common practice for most businesses. Postgres’ replication design adds high-availability (HA) constraints and operational coupling in ways that are impractical.
The Postgres approach
Start with a standard HA Postgres cluster topology. One primary. Two standbys configured for semi-synchronous replication. A CDC client reading a logical replication slot via pgoutput. WAL level is logical on the primary, and the standbys are configured in synchronous_standby_names = 'ANY 1 (r1, r2)' so commits on the primary wait for at least one standby to flush. The CDC client doesn’t stream continuously; it polls every few hours.
How Postgres moves data across this cluster:
The primary emits WAL
Physical standbys stream and apply WAL
The CDC client reads a logical slot that decodes WAL into row changes
The critical detail is that the logical replication slot is a durable, primary-local object that carries two pieces of state: the oldest WAL the slot requires (restart_lsn) and the most recent position the subscriber has confirmed (confirmed_flush_lsn). The presence of that slot pins WAL on the primary until the CDC client advances. If the client lags, WAL accumulates. That’s expected. The brittle part shows up when you try to achieve HA.
Postgres 17 introduced logical replication failover, so slot state can be synchronized to promotion candidates, but slot eligibility on the replica has caveats: a standby only becomes eligible to carry the slot after the subscriber has actually advanced the slot at least once while that standby is receiving the slot metadata. This guard exists to prevent promoting a node that has never observed real slot progress and would present an inconsistent stream to the subscriber. In practice, if the CDC client hasn’t connected in hours, any freshly added or recently restarted standby won’t be eligible. A controlled primary promotion becomes impossible without breaking the CDC stream because no replica candidates have an eligible slot.
Failover readiness for a logical slot is determined by three conditions on the standby:
The slot is synchronized on the standby, synced = true.
The slot's position in the WAL is consistent with the position of the standby, not too far behind or too far ahead.
The slot is persistent and not invalidated, temporary = false AND invalidation_reason IS NULL
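The conditions above can be sketched as a small Python check. This is only an illustration: it assumes each slot is a dict shaped like one row of pg_replication_slots read on the standby, and it assumes the position-consistency condition shows up through the synced and temporary fields rather than being tested separately. The function name and sample rows are hypothetical.

```python
def is_failover_ready(slot):
    """slot: a dict shaped like one pg_replication_slots row on the standby (assumed shape)."""
    return (
        slot["synced"] is True
        and slot["temporary"] is False
        and slot["invalidation_reason"] is None
    )

# Hypothetical rows: one slot stuck in temporary status during a CDC quiet period,
# and one that has been persisted after the subscriber advanced.
quiet_period_slot = {"slot_name": "cdc_slot", "synced": True,
                     "temporary": True, "invalidation_reason": None}
healthy_slot = {"slot_name": "cdc_slot", "synced": True,
                "temporary": False, "invalidation_reason": None}

ready_during_quiet_period = is_failover_ready(quiet_period_slot)
ready_when_healthy = is_failover_ready(healthy_slot)
```

Running a check like this against every promotion candidate before a switchover shows whether any of them can carry the slot.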
Explicit failure scenarios:
During a CDC quiet period, logical slots on standbys may remain in temporary status due to position inconsistencies. If forced failover occurs, the temporary slots are not failover-ready. The CDC stream breaks, requiring connector reinitialization and snapshot reload.
Replacing replicas: You add new replicas (fresh pg_basebackup) and plan to retire the old ones. Each new standby begins synchronizing slot metadata from the primary but, by design, starts at a conservative point (older XID/LSN) and won’t consider the slot synchronized until it has seen the subscriber advance. If the CDC client polls every 6 hours, all new replicas remain ineligible for promotion until that polling event occurs. Any switchover in the interim either stalls or breaks CDC exactly like case 1.
Not just CDC. Any replication client backed by a slot can create a similar problem. A physical standby connected through a physical slot that stops pulling WAL will pin restart_lsn indefinitely. That doesn’t directly affect slot eligibility the way logical failover slots do, but it can fill the primary’s WAL volume and trip the cluster into write unavailability or emergency failover, or cause the slot to be dropped entirely if the maximum WAL size has been reached. The core fragility is the same: progress of the slowest slot determines how far the system can move without manual intervention.
This happens due to the way Postgres records replication progress. The WAL is a physical redo log for crash recovery and physical standby replication. The fact that a downstream consumer needs certain WAL retained is tracked in a primary-local catalog state inside pg_replication_slots. That state advancement only occurs when the consumer connects and acknowledges data. Historically, this state never rode along in WAL, so standbys had no authoritative copy. Postgres 17’s failover slots serialize slot metadata into WAL so candidates can mirror it, but they still refuse to declare a node eligible until a real subscriber has advanced the slot at least once while that node is following along. This preserves exactly-once CDC semantics at the expense of HA flexibility.
The MySQL approach
MySQL’s approach doesn’t create this coupling. MySQL’s binary log is an action log. Every transaction carries a GTID. Replicas with log_replica_updates=ON re-emit transactions they apply into their own binlogs, preserving GTID continuity. A CDC connector records the last committed GTID set. On reconnect it tells any suitable server, “resume from this GTID.” If the binlog containing that GTID still exists, streaming continues with no slot object and no eligibility gate.
Failover looks like:
Promote a replica
Point the connector at any replica and it resumes from its GTID position
The success of this operation is only determined by whether binlog retention covers the downtime, not by whether the connector recently polled. A lagging consumer can’t stall switchover; at worst, if binlogs are purged past the last GTID the connector processed, the connector must resnapshot but HA completes immediately. You can even recover binlogs from other sources and apply those.
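To make that retention rule concrete, here is a deliberately simplified Python sketch. It models GTIDs as plain transaction sequence numbers from a single source, which is an assumption for illustration (real GTID sets are richer), and the function name is hypothetical.

```python
def plan_after_failover(last_processed_txn, oldest_retained_txn):
    """Decide what the CDC connector does when pointed at a new server.

    last_processed_txn: sequence number of the last transaction the connector handled.
    oldest_retained_txn: oldest transaction still present in the server's binlogs.
    """
    # The connector needs every transaction after last_processed_txn.
    if oldest_retained_txn <= last_processed_txn + 1:
        return "resume"
    # Binlogs were purged past the connector's position.
    return "resnapshot"

covered = plan_after_failover(last_processed_txn=1000, oldest_retained_txn=900)
purged = plan_after_failover(last_processed_txn=1000, oldest_retained_txn=1500)
```

Nothing in this decision depends on when the connector last polled, only on whether retention covers its position.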
Which is better for HA?
Put the two designs side by side in the same topology:
Postgres: primary P, synchronous standbys R1 and R2, CDC slot S on P. Commits require ANY 1 flush by R1 or R2. CDC polls every 6 hours. New R3 is added during maintenance. Until the CDC client advances S, R3 is not eligible to carry S after promotion; neither is R2 if it joined recently or restarted without seeing slot progress. Switchover options are to wait for CDC to advance or promote anyway and accept slot drop. This ties a write-availability action to the behavior of an external downstream system. Tight coupling.
MySQL: primary M, two replicas MR1 and MR2 with GTID and row-based binlog; log_replica_updates=ON on replicas so they have full binlog history. CDC connector persists a GTID position. Maintenance adds MR3; it catches up and emits the same GTIDs. Switchover can proceed immediately. Direct the CDC connector to any replica and it resumes from its GTID position. There is no eligibility concept because replication progress is embedded in the binlog; nothing special needs to be mirrored across nodes. A much more flexible design.
This is the brittle edge in Postgres high availability with logical consumers: slot progress is a single-node concern that must be coordinated across the cluster at failover time, and eligibility depends on subscriber behavior outside your control. Even with failover slots, eligibility deliberately waits for the subscriber to move the slot to prevent broken streams. If the subscriber is slow by design (batch CDC) or temporarily offline, you inherit either long switchover delays or intentional CDC breakage. If a physical slot backs a dormant standby, you inherit WAL growth risk on the primary and potential write outages.]]></content>
        <summary><![CDATA[Why a lagging client can stall or break failover, and how MySQL’s GTID model avoids it.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Neki</title>
        <link href="https://planetscale.com/blog/announcing-neki" />
        <id>https://planetscale.com/blog/announcing-neki</id>
        <published>2025-08-11T00:00:00.000Z</published>
        <updated>2025-08-11T00:00:00.000Z</updated>
        
        <author>
          <name>Andres Taylor</name>
        </author>
        
        <author>
          <name>Dirkjan Bussink</name>
        </author>
        
        <author>
          <name>Harshit Gangal</name>
        </author>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        <author>
          <name>Noble Mittal</name>
        </author>
        
        <author>
          <name>Rohit Nayak</name>
        </author>
        
        <author>
          <name>Roman Sodermans</name>
        </author>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today, we are announcing Neki — sharded Postgres by the team behind Vitess. Vitess is one of PlanetScale’s greatest strengths and contemporary Vitess is the product of our experience running at extreme scale. We have made explicit sharding accessible to hundreds of thousands of people and it is time to bring this power to Postgres.
Neki is not a fork of Vitess. Vitess’ achievements are enabled by leveraging MySQL’s strengths and engineering around its weaknesses. To achieve Vitess’ power for Postgres we are architecting from first principles and building alongside design partners at scale. When we are ready we will release Neki as an open source project suitable for running the most demanding Postgres workloads.
To stay up to date with the latest developments on Neki you can sign up at neki.dev.]]></content>
        <summary><![CDATA[Sharded Postgres by the team behind Vitess]]></summary>
      </entry>
    
      <entry>
        <title>Caching</title>
        <link href="https://planetscale.com/blog/caching" />
        <id>https://planetscale.com/blog/caching</id>
        <published>2025-07-08T00:00:00.000Z</published>
        <updated>2025-07-08T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Every time you use a computer, caches work to ensure your experience is fast. Everything a computer does, from executing an instruction on the CPU, to requesting your X.com feed, to loading this very webpage, relies heavily on caching.
You are about to enjoy a guided, interactive tour of caching: the most elegant, powerful, and pervasive innovation in computing.
Foundation
Everything we do with our computers requires data. When you log into your email, all your messages and contacts are stored as data on a server. When you browse the photo library on your iPhone, the photos are stored as data on your phone's storage and/or iCloud. When you loaded this article, your browser had to fetch the data from a web server.
When designing systems to store and retrieve data, we are faced with trade-offs between capacity, speed, cost, and durability.
Hard-Disk Drives are cheaper per-gigabyte than Solid-State Drives, but the latter are lower latency. Random-Access Memory is faster than both of the above, but is more expensive and volatile.
For a given budget, you can either get a large amount of slower data storage, or a small amount of faster storage. Engineers get around this by combining the two: pair a large amount of cheaper slow storage with a small amount of expensive fast storage. Data that is accessed more frequently can go in the fast storage, and the other data that we use less often goes in the slow storage.
This is the core principle of caching.
We can see this visualized below. Each colored square represents a unique datum we need to store and retrieve. You can click on the "data requester" to send additional requests for data or click on any of the colored squares in the "slow + large" section to initiate a request for that color.
Every time we request a piece of data, the request is first sent to the cache. If it's there, we consider it a cache hit and quickly send the data back.
If we don't find it, we call it a cache miss. At this point we pass the request on to the slow storage, find the data, send it back to the cache, and then send it to the requester.
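The hit and miss flow can be sketched in a few lines of Python. This is a minimal illustration: the `ReadThroughCache` name is hypothetical, a dict stands in for the slow storage, and the cache here never evicts anything.

```python
class ReadThroughCache:
    """Checks the fast cache first and falls back to slow storage on a miss."""

    def __init__(self, slow_storage):
        self.slow_storage = slow_storage  # a dict standing in for the slow + large store
        self.cache = {}                   # the fast + small store (unbounded in this sketch)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1                # cache hit: answer straight from the cache
            return self.cache[key]
        self.misses += 1                  # cache miss: go to slow storage
        value = self.slow_storage[key]
        self.cache[key] = value           # keep a copy for next time
        return value

store = ReadThroughCache({"red": 1, "green": 2, "blue": 3})
values = [store.get(color) for color in ["red", "red", "blue", "red"]]
```

The first request for each color is a miss, and every repeat is a hit.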
Hit rate
The "hit rate" is the percentage of requests that result in cache hits. In other words:
hit_rate = (cache_hits / total_requests) x 100
We want to keep the hit rate as high as possible. Doing so means we are minimizing the number of requests to the slower storage.
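As a quick worked example of the formula, here is a small Python helper. The function name and the sample counts are made up for illustration.

```python
def hit_rate(cache_hits, total_requests):
    """Percentage of requests served from the cache."""
    if total_requests == 0:
        return 0.0  # no requests yet, so report 0 rather than dividing by zero
    return cache_hits / total_requests * 100

small_cache = hit_rate(cache_hits=25, total_requests=100)
large_cache = hit_rate(cache_hits=90, total_requests=100)
```

With these sample counts the first cache sends 75 of every 100 requests on to slow storage, while the second sends only 10.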
In the setup below we have a small cache, many items to store, and our requests for the items are coming in with no predictable pattern. This leads to a low hit rate.
You can see the hit rate for the above cache and how it changes over time in the bar chart below.
What if instead we have a cache that is nearly the size of our data, with the same random requests pattern?
In this case, we are able to achieve a much higher hit rate.
Increasing the size of our cache increases cost and complexity in our data storage system. It's all about trade-offs.
Your computer
Perhaps the most widespread form of caching in computers is Random-Access Memory (RAM).
The CPU of a computer is what does all of the "work." Some call the CPU the brain of a computer. A brain needs memory to work off of, otherwise it isn't much use.
Any information computers want to store permanently gets stored on the hard drive. Hard drives are great because they can remember things even when the computer gets shut off or runs out of battery. The problem is, hard drives are slow.
Click on "CPU" in the visual below to send a request for data to storage. You can also click on any of the colored squares on the "Hard Drive" section to request that specific element.
To get around this, we use RAM as an intermediate storage location between the CPU and the hard drive. RAM is more expensive and has less storage capacity than the hard drive. But the key is, data can be fetched from and written to RAM an order-of-magnitude faster than the time it takes to interact with the hard drive. Because of this, computers will store a subset of the data from the hard drive, the stuff that needs to be accessed more frequently, in RAM. Many requests for data will go to RAM and can be executed fast. Occasionally, when the CPU needs something not already in RAM, it makes a request to the hard drive.
There are actually more layers of caching on a computer than just RAM and disk. Modern CPUs have one or more cache layers for RAM. Though RAM is fast, a cache built directly into the CPU is even faster, so frequently used values and variables can be stored there while a program is running to improve performance.
Most modern CPUs have multiple of these cache layers, referred to as L1, L2, and L3 cache. L1 is faster than L2 which is faster than L3, but L1 has less capacity than L2, which has less capacity than L3.
This is often the tradeoff with caching: faster data lookup means more cost or more size limitations due to how physically close the data needs to be to the requester. It's all tradeoffs. Getting great performance out of a computer is a careful balancing act of tuning cache layers to be optimal for the workload.
Temporal Locality
Caching is not only limited to the context of our local computers. It's used all the time in the cloud to power your favorite applications. Let's take the everything app: X.com.
The number of tweets in the entire history of X.com is easily in the trillions. However, 𝕏 timelines are almost exclusively filled with posts from the past ~48 hours. After a few days, posts are rarely viewed unless someone re-posts or scrolls far back into their history. This produces a recency bias.
Let's consider this banger from Karpathy.
This received over 43,000 likes and 7 million impressions since it was posted over two years ago. Though we don't have access to the exact trends, a likely trendline for views on this tweet would be:
Tweets, pictures, and videos that were recently posted are requested much more than older ones. Because of this, we want to focus our caching resources on recent posts so they can be accessed faster. Older posts can load more slowly.
This is standard practice for many websites, not just X.com. These websites store much of their content (email, images, documents, videos, etc.) in "slow" storage (like Amazon S3 or similar), but cache recent content in faster, in-memory stores (like CloudFront, Redis or Memcached).
Consider the caching visual below. Redder squares represent recent data ("warm" data) whereas bluer squares represent older data that is accessed less frequently ("cold" data). As we click on the app server to send requests, we can see that red data is more frequently accessed. This results in the cache primarily holding onto red data.
After filling up the cache with recent values, click on a few colder values on the database side. These all need to be loaded from the database, rather than being loaded directly from cache.
This is why viewing older posts on someone's timeline often takes longer to load than recent ones! It does not make much sense to cache values that are rarely needed. Go to someone's 𝕏 page and start scrolling way back in time. After you get past a few days or weeks of history, this will get noticeably slower.
Spatial Locality
Another important consideration is Spatial Locality.
In some data storage systems, when one chunk of data is read, it's probable that the data that comes immediately "before" or "after" it will also be read. Consider a photo album app. When a user clicks on one photo from their cloud photo storage, it's likely that the next photo they will view is the photo taken immediately after it chronologically.
In these situations, the data storage and caching systems leverage this user behavior. When one photo is loaded, we can predict which ones we think they will want to see next, and prefetch those into the cache as well. In this case, we would also load the next and previous few photos.
This is called Spatial Locality, and it's a powerful optimization technique.
In the example below, you can see how spatial locality works. Each database cell is numbered in sequence. When you click on a cell, it will load the cell and its two neighbors into cache.
This prefetching of related data improves performance when there are predictable data access patterns, which is true of many applications beyond photo albums.
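Here is a minimal Python sketch of neighbor prefetching, assuming the data lives in a numbered sequence like the cells described above. The `read_with_prefetch` function and the photo names are hypothetical.

```python
def read_with_prefetch(index, database, cache, neighbors=1):
    """Return database[index], also loading nearby cells into the cache."""
    start = max(0, index - neighbors)                  # don't run off the front
    stop = min(len(database), index + neighbors + 1)   # or off the end
    for i in range(start, stop):
        cache[i] = database[i]                         # the requested cell plus its neighbors
    return cache[index]

database = [f"photo-{n}" for n in range(10)]
cache = {}
photo = read_with_prefetch(4, database, cache)
```

After requesting cell 4, cells 3 and 5 are already in the cache, so stepping forward or backward by one is a hit.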
Geospatial
Physical distance on a planet-scale is another interesting consideration for caching. Like it or not, we live on a big spinning rock 25,000 miles in circumference, and we are limited by "physics" for how fast data can move from point A to B.
In some cases, we place our data in a single server and deal with the consequences. For example, we may place all of our data in the eastern United States. Computers requesting data on the east coast will have lower latency than those on the west coast. East-coasters will experience 10-20ms of latency, while west-coasters will experience 50-100ms. Those requesting data on the other side of the world will experience 250+ milliseconds of latency.
Engineers frequently use Content Delivery Networks (CDNs) to help. With CDNs we still have a single source-of-truth for the data, but we add data caches in multiple locations around the world. Data requests are sent to the geographically-nearest cache. If the cache does not have the needed data, only then does it request it from the core source of truth.
We have a simple visualization of this below. Use the arrows to move the requester around the map, and see how it affects where data requests are sent to.
We also might choose to have the full data stored in all of the CDN / cache nodes. In this case, if the data never changes, we can avoid cache misses entirely! The only time a cache node would need to fetch from the source is when the original data is modified.
Replacement policies
When a cache becomes full and a new item needs to be stored, a replacement policy determines which item to evict to make room for a new one.
There are several algorithms we can use to decide which values to evict and when. Making the right choice here has a significant impact on performance.
FIFO
First In, First Out, or FIFO, is the simplest cache replacement policy. It works like a queue. New items are added to the beginning (on the left). When the cache queue is full, the least-recently added item gets evicted (right).
In this visualization, data requests enter from the top. When an item is requested and it's a cache hit, it immediately slides up indicating a response. When there's a cache miss, the new data is added from the left.
While simple to implement, FIFO isn't optimal for most caching scenarios because it doesn't consider usage patterns. We have smarter techniques to deal with this.
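A minimal FIFO cache might look like the Python sketch below. The `FIFOCache` class and its `load` callback, which stands in for a trip to slow storage, are hypothetical names used for illustration.

```python
from collections import deque

class FIFOCache:
    """Evicts whichever item was added first, regardless of how often it is used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.order = deque()   # keys in insertion order, oldest on the left
        self.data = {}

    def get(self, key, load):
        if key in self.data:
            return self.data[key]          # hit: insertion order is left unchanged
        if len(self.data) >= self.capacity:
            oldest = self.order.popleft()  # miss on a full cache: evict the oldest key
            del self.data[oldest]
        self.data[key] = load(key)         # load stands in for the slow storage lookup
        self.order.append(key)
        return self.data[key]

cache = FIFOCache(capacity=2)
for key in ["a", "b", "a", "c"]:
    cache.get(key, load=str.upper)
```

Note that "a" is evicted when "c" arrives even though "a" was just requested, which is exactly the usage-blindness described above.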
LRU
Least Recently Used (LRU) is a popular choice, and the industry standard for many caching systems. Unlike FIFO, LRU always evicts the item that has least-recently been requested, a sensible choice to maximize cache hits. This aligns well with temporal locality in real-world data access patterns.
When a cache hit occurs, that item's position is updated to the beginning of the queue. This policy works well in many real-world scenarios where items accessed recently are likely to be accessed again soon.
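One common way to sketch LRU in Python is with `collections.OrderedDict`, as below. The `LRUCache` class and the `load` callback (a stand-in for slow storage) are hypothetical, and this sketch treats the end of the ordered dict as the "most recently used" position.

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the item that was requested least recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # least recently used first, most recent last

    def get(self, key, load):
        if key in self.data:
            self.data.move_to_end(key)      # hit: mark as most recently used
            return self.data[key]
        if len(self.data) >= self.capacity:
            self.data.popitem(last=False)   # miss on a full cache: drop the least recent
        self.data[key] = load(key)          # load stands in for the slow storage lookup
        return self.data[key]

cache = LRUCache(capacity=2)
for key in ["a", "b", "a", "c"]:
    cache.get(key, load=str.upper)
```

Because "a" was requested again before "c" arrived, "b" is the one evicted and "a" survives.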
Time-Aware LRU
We can augment the LRU algorithm in a number of ways to be time-sensitive. How precisely we do this depends on our use case, but it often involves using LRU plus giving each element in the cache a timer. When time is up, we evict the element!
We might imagine this being useful in cases like:
For a social network, automatically evicting posts from the cache after 48 hours.
For a weather app, automatically evicting the previous day's weather info from the cache when the clock turns to a new day.
In an email app, removing the email from the cache after a week, as it's unlikely to be read again after that.
We can leverage our users' viewing patterns to put automatic expirations on the elements. We visualize this below. This cache uses LRU, but each element also has an eviction timer, after which it will be removed regardless of usage pattern.
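One possible way to sketch this in Python is to store an expiry time next to each value. The `TimeAwareLRU` class, the `load` callback, and the injectable `clock` are hypothetical, and this version only notices an expired element when it is next requested rather than running a background timer.

```python
import time
from collections import OrderedDict

class TimeAwareLRU:
    """LRU cache where every element also expires after ttl_seconds."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the example can fake the passage of time
        self.data = OrderedDict()   # key -> (value, expires_at), least recently used first

    def get(self, key, load):
        now = self.clock()
        entry = self.data.get(key)
        if entry is not None and entry[1] > now:
            self.data.move_to_end(key)      # fresh hit: mark as most recently used
            return entry[0]
        self.data.pop(key, None)            # expired (or absent): make sure it is gone
        if len(self.data) >= self.capacity:
            self.data.popitem(last=False)   # full: drop the least recently used element
        value = load(key)                   # load stands in for the slow storage lookup
        self.data[key] = (value, now + self.ttl)
        return value

fake_now = [0.0]
cache = TimeAwareLRU(capacity=2, ttl_seconds=48 * 3600, clock=lambda: fake_now[0])
first = cache.get("post-1", load=str.upper)    # miss: loaded and cached
fake_now[0] = 49 * 3600                        # 49 hours later
second = cache.get("post-1", load=str.upper)   # timer ran out: loaded again
```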
Others
There are other cool/niche algorithms used in some scenarios. One cool one is Least-Frequently Recently Used. This involves managing two queues, one for high-priority items and one for low priority items. The high-priority queue uses an LRU algorithm, and when an element needs to be evicted it gets moved to the low priority queue, which then uses a more aggressive algorithm for replacement.
You can also set up caches to do bespoke things with timings, more complex than the simple time-aware LRU.
Postgres and MySQL
Caches are frequently put in front of a database as a mechanism to cache the results of slow queries. However, DBMSs like Postgres and MySQL also use caching within their implementations.
Postgres implements a two-layer caching strategy. First, it uses shared_buffers, an internal cache for data pages that store table information. This keeps frequently read row data in memory while less-frequently accessed data stays on disk. Second, Postgres relies heavily on the operating system's filesystem page cache, which caches disk pages at the kernel level. This creates a double-buffering system where data can exist in both Postgres's shared_buffers and the OS page cache. Many deployments set shared_buffers to around 25% of available RAM and let the filesystem cache handle much of the remaining caching work. The shared_buffers value can be configured in the postgres config file.
MySQL does a similar thing with the buffer pool. Like Postgres, this is an internal cache to keep recently used data in RAM.
Arguably, these are more complex than a "regular" cache as they also have to be able to operate with full ACID semantics and database transactions. Both databases have to take careful measures to ensure these pages contain accurate information and metadata as the data evolves.
Conclusion
I'm dead serious when I say this article barely scratches the surface of caching. We completely avoided the subject of handling writes and updates in caching systems. We discussed very little of specific technologies used for caching like Redis, Memcached, and others. We didn't address consistency issues, sharded caches, and a lot of other fun details.
My hope is that this gives you a good overview and appreciation for caching, and how pervasive it is in all layers of computing. Nearly every bit of digital tech you use relies on a cache, and now you know just a little more about it.
If you enjoyed this article, you might enjoy these case studies on how large tech companies implement caching:
Twitter's petabyte-scale caching system
How Uber handles 40 million cache requests per-second
TikTok's database handles 100 million QPS (and uses a cache)]]></content>
        <summary><![CDATA[Every time you use a computer, the cache is working to ensure your experience is fast.]]></summary>
      </entry>
    
      <entry>
        <title>The principles of extreme fault tolerance</title>
        <link href="https://planetscale.com/blog/the-principles-of-extreme-fault-tolerance" />
        <id>https://planetscale.com/blog/the-principles-of-extreme-fault-tolerance</id>
        <published>2025-07-03T09:00:00.000Z</published>
        <updated>2025-07-03T09:00:00.000Z</updated>
        
        <author>
          <name>Max Englander</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[PlanetScale is fast and reliable. Our speed is the best in the cloud due to our shared nothing architecture that enables us to utilize local storage instead of network-attached storage. Our fault tolerance is built on top of principles, processes, and architectures that are easy to understand, but require painstaking work to do well.
We have talked about our speed a lot. Let's talk about why we are reliable.
Principles
Our principles are neither new nor radical. You may find them obvious. Even so, they are foundational for our fault tolerance. Every capability we add, and every optimization we make, is either bound by or born from these principles.
Isolation
Systems are made from parts that are as physically and logically independent as possible.
Failures in one part do not cascade into failures in an independent part.
Parts in the critical path have as few dependencies as possible.
Redundancy
Each part is copied multiple times, so if one part fails, its copies continue doing its work.
Copies of each part are themselves isolated from each other.
Static stability
When something fails, continue operating with the last known good state.
Overprovision so a failing part's work can be absorbed by its copies.
Architecture
Our architecture emerges from the principles above.
Control plane
Provides database management functionality. Database creation, billing, etc.
Composed of parts which are redundant, spread across multiple cloud availability zones.
Less critical than the data plane, and so has more dependencies.
E.g. uses a PlanetScale database to store customer and database metadata.
Data plane
Stores database data and serves customer application queries.
Composed of a query routing layer and database clusters.
Each of these parts is both regionally and zonally redundant and isolated.
The most critical plane, with fewer dependencies than the control plane.
Does not depend on the control plane.
Database clusters
Composed of a primary instance and a minimum of two replicas.
Each instance is composed of a VM and storage residing in the data plane.
Instances evenly distributed across three availability zones.
Automatic failovers from primaries to healthy replicas in response to failures.
Customers may optionally run copies in read-only regions.
Enterprise customers may optionally promote read-only regions to primary.
Extremely critical. Extremely few dependencies.
Processes
Within this architecture, we apply processes that reinforce our systems' overall fault tolerance.
Always be Failing Over
Very mature ability to fail over from a failing database primary to a healthy replica.
Exercise this ability every week on every customer database as we ship changes.
In the event of failing hardware or a network failure - fairly common in a big system running on the cloud - we automatically and aggressively fail over.
Query buffering minimizes or eliminates disruption during failovers.
Synchronous replication
MySQL semi-sync replication, Postgres synchronous commits.
Commits stored durably on at least one replica before primary sends acknowledgment to the client.
Enables us to treat replicas as potential primaries, and fail over to them immediately as needed.
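The commit rule above can be sketched in a few lines of Python. This is a toy model under stated assumptions: replicas are plain dicts with a `healthy` flag and an in-memory `log`, and the function name is hypothetical. Real semi-sync and synchronous commit involve network round trips, timeouts, and durable flushes.

```python
def commit(txn, replicas, required_acks=1):
    """Acknowledge the client only after enough replicas have stored txn."""
    acks = 0
    for replica in replicas:
        if replica["healthy"]:
            replica["log"].append(txn)  # stands in for a durable write on the replica
            acks += 1
    if acks < required_acks:
        # Without a replica copy, the primary must not tell the client the commit succeeded.
        raise RuntimeError("not enough replica acknowledgments")
    return "ack"

replicas = [{"healthy": False, "log": []}, {"healthy": True, "log": []}]
result = commit("txn-1", replicas)
```

Because every acknowledged transaction exists on at least one replica, that replica can be promoted without losing acknowledged writes.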
Progressive delivery
Data plane changes are shipped gradually to progressively critical environments.
Database cluster config and binary changes are shipped database by database using feature flags
Release channels allow us to ship changes to dev branches first, and to wait a week or more before shipping those same changes to production branches
Minimizes the impact of our own mistakes on our customers.
Failure modes
How adherence to the principles, architecture, and processes above enables us to tolerate a variety of failure modes.
Non-query-path failures
Because our query path has extremely few dependencies, failures outside of the query path do not impact our customers' application queries.
As an example, a hypothetical failure in one of our cloud providers' Docker registry services might impact our ability to create new database instances, but will not impact existing instances' ability to serve queries or store data.
Likewise, failures, even total failure, of our control plane would impact our customers' ability to change their database clusters' settings, but would not impact those clusters' query service.
Cloud provider failures
We run on AWS and GCP, which can and do fail in many different ways.
Instance
If a failure impacts a primary database instance, we immediately fail over to a replica.
If a block storage database instance has a failing VM, the elastic volume is detached from that VM and reattached to a new, healthy VM.
If a PlanetScale Metal database instance has a failing VM, we surge a replacement instance with a new VM and local NVMe drive, and destroy the failing instance once its replacement is healthy.
A storage failure is handled roughly the same way for block storage and Metal clusters: we spin up a replacement database instance and scale down the unhealthy instance.
Zonal failures
As with instance-level failures, if a primary database instance resides in an availability zone that is failing, we immediately fail over to a replica in a healthy availability zone.
Our query routing layer reacts to zonal failures by shifting traffic to instances in healthy zones.
Regional failures
If an entire region goes down, so do database clusters running in that region.
However, database clusters running in other regions are unaffected.
Enterprise customers have the ability to initiate a failover to one of their read-only regions.
PlanetScale-induced failures
A bug in Vitess or the PlanetScale Kubernetes operator rarely impacts more than 1-2 customers, thanks to our extensive use of feature flags to roll out changes.
A failure resulting from an infrastructure change, like a Kubernetes upgrade, can have a bigger impact, but very rarely does because of how rigorously we test and how gradually we roll out.]]></content>
        <summary><![CDATA[The principles and processes we follow for fault tolerance.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing PlanetScale for Postgres</title>
        <link href="https://planetscale.com/blog/planetscale-for-postgres" />
        <id>https://planetscale.com/blog/planetscale-for-postgres</id>
        <published>2025-07-01T09:00:00.000Z</published>
        <updated>2025-07-01T09:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Update: PlanetScale for Postgres is now Generally Available.
Today we are announcing the private preview of PlanetScale for Postgres: the world’s fastest Postgres hosting platform.
We are already hosting customers' production workloads with incredible results. Convex, the complete backend solution for app developers, is migrating their reactive database infrastructure to PlanetScale for Postgres. Read more about their migration here.
Never say never
PlanetScale has been successful hosting some of the world’s largest relational databases, so why are we building for Postgres? The reason is simple: customer demand. In March we announced PlanetScale Metal and something wild happened. We had an immense number of companies reaching out to us asking us to support Postgres. The demand was so overwhelming that by the end of launch day we knew we had to do this. We are nothing without our customers; we do the difficult but boring bit of providing industry-leading uptime so they can build incredible products. We want more and more exciting companies building on PlanetScale.
Performance and reliability
Our initial goal was to convince ourselves we could be additive to the ecosystem and provide something of value. We spoke to over 50 customers of the current Postgres hosting platforms and we heard identical stories of regular outages, poor performance, and high cost. PlanetScale is an engineering company. We are not going to enter a market with anything but exceptional engineering.
We had to validate we could out-perform the current solutions while being more reliable. This led us to building a comprehensive benchmark methodology. After extensive testing we are proud to share that we consistently outperform every Postgres product on the market, even when giving the competition 2x the resources:
Amazon Aurora
AlloyDB
Lakebase/Neon
Supabase
Heroku
Crunchy Data
TigerData
PlanetScale for Postgres uses real Postgres running on our proprietary operator, meaning we can bring the maturity of PlanetScale and the performance of Metal to an even wider audience. Today's release already achieves true high availability with automatic failovers, query buffering, and connection pooling via our proprietary proxy layer, which includes PgBouncer. We run Postgres v17 and support online imports from any version > Postgres v13, as well as automatic Postgres version updates without downtime. Additionally, PlanetScale Metal’s locally-attached NVMe SSD drives fundamentally change the performance/cost ratio for hosting relational databases in the cloud. We’re excited to bring this performance to Postgres.
Neki: Vitess for Postgres
Vitess is one of PlanetScale’s greatest strengths and has become synonymous with database scaling. Contemporary Vitess is the product of PlanetScale’s experience running at extreme scale. We have made explicit sharding accessible to hundreds of thousands of users and it is time to bring this power to Postgres. We will not however be using Vitess to do this.
Vitess’ achievements are enabled by leveraging MySQL’s strengths and engineering around its weaknesses. To achieve Vitess’ power for Postgres we are architecting from first principles. We are well under way with building this new system and will be releasing more information and early access as we progress. As with all PlanetScale products we work with customers at scale to build and validate maturity. If your company runs Postgres at a significant scale and this is something that interests you, reach out. You can also sign up for the Neki waitlist at neki.dev to stay updated on our progress.
We are incredibly excited to be a part of the vibrant and thriving Postgres community.]]></content>
        <summary><![CDATA[PlanetScale now supports Postgres]]></summary>
      </entry>
    
      <entry>
        <title>Benchmarking Postgres</title>
        <link href="https://planetscale.com/blog/benchmarking-postgres" />
        <id>https://planetscale.com/blog/benchmarking-postgres</id>
        <published>2025-07-01T00:00:00.000Z</published>
        <updated>2025-07-28T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Today we launched PlanetScale for Postgres. For the past several months, we've been laser focused on building the best Postgres experience on the planet, performance included.
To ensure we met our high standard for database performance, we needed a way to measure and compare other options with a standardized, repeatable, and fair methodology. We built an internal tool, "Telescope", to be our go-to tool for creating, running, and assessing benchmarks.
We have used this as an internal tool to give our engineers quick feedback on the evolution of our product's performance as we built and tuned it. We decided to share our findings with the world, and give others the tools to reproduce them.
If you'd like to skip straight to the benchmark results, they are here:
PlanetScale vs Amazon Aurora
PlanetScale vs Google AlloyDB
PlanetScale vs Neon/Lakebase
PlanetScale vs Supabase
PlanetScale vs CrunchyData
PlanetScale vs TigerData
PlanetScale vs Heroku Postgres
PlanetScale vs Xata
And here is a quick overview of how PlanetScale performs compared to other Postgres vendors:
In every aspect of our business we aim for excellence and continual improvement. We encourage you to reach out at benchmarks@planetscale.com if you see any mistakes in our methodology or cost calculations.
What is benchmarking?
The way benchmarking is used is often deceptive. This applies to all technologies, not just databases.
So let's be clear, benchmarking of any kind has its shortcomings. Every OLTP workload at every organization is unique and no single benchmark can capture the performance characteristics of all such databases. Data size, hot:cold ratios, QPS variability, schema structure, indexes, and 100 other factors determine the requirements of your relational database setup.
You cannot look at a benchmark and know for sure that your workload will perform the same given all other factors are the same.
However, when a benchmark is conducted well, it is quite useful for answering the following questions:
How quickly can I reach my database? (latency)
How does the database perform under a "typical" OLTP load? (TPS, QPS, etc)
How does the database perform under high read / write pressure? (IOPS, caching)
How much does it cost to achieve some bar of performance relative to other options? (price:performance ratio)
These are the types of questions we set out to answer with our benchmarking. Therefore, we chose three benchmarks as our primary measurement tools:
Latency: A simple query-path latency benchmark. This repeatedly runs SELECT 1; statements on the database from another instance in the same region. Very simple, but effective to determine basic query-path latency.
TPCC: We used a TPCC-like benchmark developed by Percona to help answer questions 2-4. More details will be provided on the configuration later in this post.
OLTP Read-only: For a selection of the top-performers, we also run an OLTP Read-only sysbench workload. This is useful for isolating read performance, as most OLTP workloads are 80+% reads.
Fairness
In this process, we compared our product to a long list of other cloud Postgres providers. We wanted the comparisons to be as fair as possible.
Every benchmark we are releasing publicly was compared to PlanetScale running on an i8g M-320. That's 4 vCPUs, 32GB RAM, and 937GB of NVMe SSD storage. We chose an instance size and cluster configuration that represents what would be typical for a real production application with at least a moderate amount of usage. For example, a database that can serve several thousand QPS while maintaining low latency and high availability.
At PlanetScale, we give you a primary and two replicas spread across 3 availability zones (AZs) by default. Multi-AZ configurations are critical to have a highly-available database. The replicas can also be used to handle significant read load.
For the other products we compared to, we kept things simple by creating single-instance databases that matched or exceeded the vCPUs and RAM of the PlanetScale primary. To match the true capacity and availability of PlanetScale, each would also need to add replicas. We account for this when discussing pricing.
Compute
For each competitor, we either matched or exceeded both the vCPU count and RAM. Amazon Aurora, Google AlloyDB, and CrunchyData support memory-optimized RAM:CPU ratios of 8:1, so we were able to match it exactly.
Supabase, TigerData, and Neon only support 4:1 RAM:CPU ratios. For these, we opted to match the RAM, giving them double the CPU count used by PlanetScale. This is an unfair advantage to them, but as you'll see, PlanetScale still significantly outperforms with fewer resources.
Storage
All of the products we compared with use network-attached storage for the underlying drives. Of these, some do not allow configuration of the specific IOPS, such as Aurora, Neon, and AlloyDB.
Several do, however. For Supabase and TigerData, we gave boosts to the default IOPS settings.
Methodology
In the interest of full transparency, we provide full details for how we conducted our benchmarking. Every benchmark was run under the following conditions:
All databases and benchmark machine resources were run from within the same cloud region. For all but the Google products, this meant running in us-east-1. For Google, us-central1.
Except for the Latency benchmarks, we do not provide guarantees down to the availability-zone level. Not all platforms allow you to specify which AZ the database node should reside in (nor do they expose this). Thus, it is impractical to make guarantees around this for all providers.
All AWS-based benchmarks were run from a c6a.xlarge (4 vCPUs, 8 GB Memory) in us-east-1. All GCP-based benchmarks were run from an e2-standard-4 (4 vCPUs, 16 GB Memory) in us-central1.
All Postgres configuration options are left at each platform's defaults. The one exception to this is modifications to connection limits and timeouts, which may be modified to facilitate benchmarking.
For the TPCC-like benchmark, the data was generated using the Percona TPCC scripts with TABLES=20 and SCALE=250. This produces a ~500 gigabyte Postgres database. We provide instructions to reproduce these for yourself.
The OLTP benchmark uses sysbench's built-in oltp_read_only benchmark. We provide instructions to reproduce these for yourself.
The latency benchmark is quite simple: We run SELECT 1; 200 times in a row, and measure the round-trip of the query for each one. You can easily write scripts to measure such latencies.
Additional details are provided in each of the results pages, linked earlier.
An invitation
We have no intention of misrepresenting others. Our aim is to show the world the excellent performance to be gained by running Postgres on PlanetScale Metal. In our testing, Postgres on PlanetScale Metal is by far the most performant.
We invite other vendors to provide feedback. If you see anything wrong in our benchmarking methodology, let us know at benchmarks@planetscale.com.]]></content>
        <summary><![CDATA[Benchmarking Postgres in a transparent, standardized and fair way is challenging. Here, we look at the process of how we did it in-depth]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 22</title>
        <link href="https://planetscale.com/blog/announcing-vitess-22" />
        <id>https://planetscale.com/blog/announcing-vitess-22</id>
        <published>2025-04-29T00:00:00.000Z</published>
        <updated>2025-04-29T00:00:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[The Vitess maintainers are happy to announce the release of version 22.0.0, along with version 2.15.0 of the Vitess Kubernetes Operator. This release is the first to benefit from a 6-month-long development cycle, after our recent change to the release cadence.
Version 22.0.0 comes with significant enhancements to query serving and cluster management. These changes have allowed Vitess to be more performant and easier to operate compared to version 21. This blog post highlights some of the major changes that went into the release. For a more detailed description of the release, please refer to the release notes.
Summary
Query Serving: Prepared statements and new VTGate metrics
Cluster Management and VTOrc: Stalled-disk recovery, improved errant GTID discovery, and VTOrc performance improvements
Kubernetes Operator: Automated backups, Kubernetes 1.32 support, better examples
Performance: Improvements compared to v21.0.0
Query Serving
Vitess 22.0 brings a host of query‑serving enhancements focused on observability, smarter handling of prepared statements, GA support for sharded views, and feature‑complete atomic distributed transactions.
Observability
Vitess now emits metrics for query execution plans, which help you analyze execution patterns and optimize costly plans. See the Query Serving metrics guide.
Prepared Statements
We have overhauled VTGate’s prepare pipeline: prepared statements are cached as raw SQL to skip redundant parsing. Also, deferred execution plans are generated using actual bind values for more optimized, parameter‑aware planning.
Views and Atomic Distributed Transactions
Sharded view support is GA now, and the atomic distributed transaction feature is complete. Learn more in the Distributed Transactions guide.
Cluster Management and VTOrc
There were several performance and usability enhancements to VTOrc in the Vitess 22.0 release. It now operates more efficiently, with fewer topology service calls and better overall resource consumption.
VTOrc now supports specifying shard ranges in the --keyspaces-to-watch flag, making it easier to configure in large deployments.
This release also introduces stalled disk recovery to VTOrc, along with improved handling of errant GTID detection. Additionally, vttablets are now prevented from joining the replication topology if they contain errant GTIDs.
Finally, we’ve added a new semi-sync monitor to VTOrc. This helps detect and unblock vttablets that are stuck waiting for semi-sync ACKs – a situation that can occur if ACKs are lost due to network issues.
Kubernetes Operator
The v22.0.0 release of Vitess comes along with the v2.15.0 release of our Kubernetes operator. This new release follows the same release cycle as Vitess. As part of this release, the officially supported Kubernetes version is v1.32.
The original implementation of automated and scheduled backups done in v2.13.0 used an empty pod running vtctldclient BackupShard to execute the backup. This approach had trade-offs: the backup process ran faster but effectively removed a serving tablet from the available tablet pool. In v2.15.0, we have changed the implementation to use a VTBackup pod instead, to be consistent with our own recommendations for production deployments and to reduce impact on availability.
All our provided examples have been enhanced to illustrate how the VitessOperator and VitessCluster can be run in two different namespaces.
Please refer to the operator release notes to learn more about v2.15.0.
Performance
In this release, we have reduced query latency and cut memory allocations across Vitess through targeted optimizations:
gRPC Codec upgrade: Vitess now uses gRPC with Codec v2 and a pooled buffer, yielding a ~3% QPS improvement and ~13% reduction in per‑request allocations.
Normalizer AST walk reduction: Merged two normalization AST passes into one, speeding VTGate’s normalizer by ~5% and cutting allocations per operation by ~5.5%.
AST rewriting optimization: Combined RewriteAST and Normalizer into a single pass, trimming planner latency by ~2.7% and reducing allocations by ~4.3%.
Merge‑sort avoidance on single‑shard queries: VTGate now skips redundant merge sorts when only one shard is involved, eliminating unnecessary CPU work for the common single‑shard case.
Migrate and Learn More
To ease migration from a previous version to v22.0.0, it is highly recommended to read the release notes of both Vitess and the Kubernetes Operator. The entire changelog for this version is available too.
It is also recommended to explore our documentation for v22.0.0, where you can find step-by-step user guides, best practices, and tips to run Vitess.
Community
As an open-source project, we truly appreciate feedback, insights, and contributions from our community. Whether you want to share a story, ask a question, or anything else, you can reach out to us on GitHub or in our Slack.]]></content>
        <summary><![CDATA[Vitess 22 is now generally available]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale vectors is now GA</title>
        <link href="https://planetscale.com/blog/announcing-planetscale-vectors-ga" />
        <id>https://planetscale.com/blog/announcing-planetscale-vectors-ga</id>
        <published>2025-03-25T12:00:00.000Z</published>
        <updated>2025-03-25T12:00:00.000Z</updated>
        
        <author>
          <name>Patrick Reynolds</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[After a winter of focused development, we are excited to announce that PlanetScale's support for vector search and storage is GA. Since the open beta began, we have doubled query performance, improved memory efficiency eight times, and focused on robustness to ensure vector support is as solid as every other data type MySQL supports. We are especially proud of the scalability of vector indexes, which now perform well even when they are 6x larger than available memory.
It's a real relational index
We have made vectors a first-class part of MySQL, meaning that writes and queries work like a normal RDBMS. Store your vector data in a column in your existing keyspace. Build an index with an ALTER or CREATE VECTOR INDEX statement. Write SELECT statements with JOIN and WHERE clauses like any other data. Wrap your writes in transactions, and expect the data to be visible as soon as the transaction commits — or to roll back if you ask it to. Your index will remain performant as you add and remove rows, without any need for periodic rebuilds.
It's a PlanetScale database
All the PlanetScale features you rely on work with vectors, too. Create a branch. Add a vector index. Open a deploy request and merge the index into your production branch. Revert it if you change your mind. Query your vector index on other replicas, even in other regions. Sleep easy knowing that your vector data is included in scheduled backups.
It's a good vector index
We didn't just get the relational and operational aspects right. We also built advanced vector-index features to satisfy a variety of embeddings and use cases. An index can rank vectors by Euclidean (L2), inner product, or cosine distance. It can store any vector up to 16,383 dimensions. It supports both fixed and product quantization. Fixed quantization down to one bit per field is crazy fast, or just crazy, depending on your needs.
Indexes should be bigger than RAM
One of the biggest things that sets PlanetScale vector support apart is the ability to use indexes larger than RAM. We are the first to incorporate an index based on SPANN (Space-Partitioned Approximate Nearest Neighbors) into an RDBMS. SPANN is a hybrid index design that combines the best aspects of partition-based and tree-based indexes. In SPANN, vectors are assigned to small partitions, which are stored in hidden InnoDB tables. One vector from each partition, around 20% of the index, is stored in a tree structure in memory, enabling the index to quickly identify which partitions are relevant to a query. Only the tree and a small, fixed number of relevant partitions need to be in memory when building and querying the index, allowing efficient operation with vector data up to 6x bigger than available memory. Using vector indexes on PlanetScale Metal ensures that loading vector partitions from InnoDB to answer queries will be as fast as possible.
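To make the partition idea concrete, here is a heavily simplified Go sketch of a partition-probing search. It is not PlanetScale's implementation: the index type and its fields are hypothetical, the in-memory representatives are scanned linearly instead of through a tree, and a plain map stands in for the partitions that would really live in hidden InnoDB tables.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type vector []float64

// l2 returns the squared Euclidean distance between a and b.
func l2(a, b vector) float64 {
	var d float64
	for i := range a {
		diff := a[i] - b[i]
		d += diff * diff
	}
	return d
}

// index keeps one representative vector per partition in memory.
// Assumption: the partitions map stands in for on-disk storage.
type index struct {
	centroids  []vector
	partitions map[int][]vector
}

// search probes the nprobe partitions whose representatives are closest
// to q and returns the nearest vector found inside those partitions.
func (ix *index) search(q vector, nprobe int) vector {
	ids := make([]int, len(ix.centroids))
	for i := range ids {
		ids[i] = i
	}
	sort.Slice(ids, func(a, b int) bool {
		return l2(q, ix.centroids[ids[a]]) < l2(q, ix.centroids[ids[b]])
	})
	if nprobe > len(ids) {
		nprobe = len(ids)
	}
	var best vector
	bestDist := math.Inf(1)
	for _, id := range ids[:nprobe] {
		// Only these partitions would need to be loaded from disk.
		for _, v := range ix.partitions[id] {
			if d := l2(q, v); d < bestDist {
				best, bestDist = v, d
			}
		}
	}
	return best
}

func main() {
	ix := &index{
		centroids: []vector{{0, 0}, {10, 10}},
		partitions: map[int][]vector{
			0: {{0, 1}, {1, 0}},
			1: {{9, 9}, {10, 11}},
		},
	}
	fmt.Println(ix.search(vector{8, 8}, 1))
}
```

The point of the sketch is the memory profile: only the representatives and the few probed partitions are touched per query, which is why the full index does not have to fit in RAM.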
How to enable vector support
To get started with vectors, go to the "Branches" page for any database. Click on a branch you want to add vectors to, and click on the small gear icon underneath the "Connect" button on the right for that branch's settings. Click the toggle next to "Enable vectors." Follow the examples in the vectors documentation to create a vector column, add data, and create and query an index.
More resources
To learn more about vector embeddings, check out our YouTube video:
You can check out the following documentation:
PlanetScale vectors overview
Vector database terminology and concepts
Common use cases for vector search
PlanetScale vector usage with ORMs
Vector type and index reference
Vector support is ready for use in production today. As you build with vectors in your database, we welcome feedback to help us continually enhance performance and usability. You can submit a support ticket to relay any feedback or issues. We also have a vectors channel in our Discord where you can ask questions, share feedback, or chat about use cases.]]></content>
        <summary><![CDATA[You can now use vector search and storage in your PlanetScale MySQL database.]]></summary>
      </entry>
    
      <entry>
        <title>Faster interpreters in Go: Catching up with C++</title>
        <link href="https://planetscale.com/blog/faster-interpreters-in-go-catching-up-with-cpp" />
        <id>https://planetscale.com/blog/faster-interpreters-in-go-catching-up-with-cpp</id>
        <published>2025-03-20T00:00:00.000Z</published>
        <updated>2025-03-20T00:00:00.000Z</updated>
        
        <author>
          <name>Vicent Martí</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[The SQL evaluation engine that ships with Vitess, the open-source database that powers PlanetScale, was originally implemented as an AST evaluator that used to operate directly on the SQL AST generated by our parser. Over this past year, we've gradually replaced it with a Virtual Machine which, despite being written natively in Go, performs similarly to the original C++ evaluation code in MySQL. Most remarkably, the new Virtual Machine has repeatedly proven itself easier to maintain than the original Go interpreter, even though it's orders of magnitude faster. Let's review the implementation choices we've made to get these surprising results.
What's a SQL evaluation engine?
Vitess has been designed for unlimited horizontal scaling. To accomplish this, all queries to a Vitess cluster must go through a vtgate. Since you can deploy as many vtgate instances as you want, because they’re essentially stateless, this allows you to grow the capacity of your cluster linearly.  The job of each gate is the most complex part of the whole distributed system. It parses the SQL of the incoming queries and creates a shard-aware query plan, which we evaluate in one or many of the shards of the cluster. Then, we aggregate the results of these evaluations, and return them to the user.
One of the reasons why Vitess works so well in practice (in both performance and in ease of adoption) is that every shard in a cluster is backed by a real MySQL instance. Even the more complex SQL queries can be decomposed into simpler statements that are evaluated in the underlying MySQL database. Hence, the results of these queries always match what you’d expect from querying MySQL directly.
However, SQL queries in the real world can get really wild. We need to support pretty much every kind of query that a normal MySQL instance supports, but we need to evaluate it across several MySQL instances. This means that sometimes, we don’t get to fall back to MySQL to evaluate all our SQL expressions.
Think of a rather simple query such as this:
SELECT inventory.item_id, SUM(inventory.count), AVG(inventory.price) AS avg_price
FROM inventory
WHERE inventory.state = 'available' AND inventory.warehouse IN ?
GROUP BY inventory.item_id
HAVING avg_price > 100;

Assuming this query is executed in a sharded Vitess cluster, the inventoried items can exist in any of the shards. Hence, our query planner will prepare a plan that queries all shards in parallel, pushing down part of the aggregation to MySQL, and then we'll perform the aggregations (SUM and AVG) locally in the vtgate. The state and warehouse checks in the WHERE clause can and will be executed directly on the MySQL instance that powers each shard. But the last expression, avg_price > 100, applies to the result of the aggregation, which is only available inside Vitess. This is where the Vitess evaluation engine comes in.
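A rough Go sketch of that last step may help. It assumes, purely for illustration, that each shard returns a partial SUM(count), SUM(price) and COUNT(price) per item_id. The partial, result and mergeAndFilter names are hypothetical and do not reflect Vitess's actual plan or types.

```go
package main

import "fmt"

// partial is one shard's partial aggregate for a single item_id.
// Assumption: shards return sums and counts so a global AVG can be derived.
type partial struct {
	itemID     int64
	sumCount   int64
	sumPrice   float64
	priceCount int64
}

type result struct {
	itemID   int64
	sumCount int64
	avgPrice float64
}

// mergeAndFilter combines per-shard partials and then applies the
// HAVING avg_price > minAvg condition to the merged aggregate.
func mergeAndFilter(shards [][]partial, minAvg float64) []result {
	merged := map[int64]*partial{}
	var order []int64 // first-seen order keeps the output deterministic
	for _, rows := range shards {
		for _, r := range rows {
			m, ok := merged[r.itemID]
			if !ok {
				m = &partial{itemID: r.itemID}
				merged[r.itemID] = m
				order = append(order, r.itemID)
			}
			m.sumCount += r.sumCount
			m.sumPrice += r.sumPrice
			m.priceCount += r.priceCount
		}
	}
	var out []result
	for _, id := range order {
		m := merged[id]
		if m.priceCount == 0 {
			continue // AVG over no rows is NULL and never passes the filter
		}
		avg := m.sumPrice / float64(m.priceCount)
		if avg > minAvg {
			out = append(out, result{itemID: id, sumCount: m.sumCount, avgPrice: avg})
		}
	}
	return out
}

func main() {
	shardA := []partial{{itemID: 1, sumCount: 3, sumPrice: 900, priceCount: 10}, {itemID: 2, sumCount: 1, sumPrice: 80, priceCount: 1}}
	shardB := []partial{{itemID: 1, sumCount: 4, sumPrice: 1500, priceCount: 10}}
	fmt.Println(mergeAndFilter([][]partial{shardA, shardB}, 100))
}
```

In this made-up data, item 1 averages 90 on the first shard and 150 on the second, so no single shard can decide the HAVING condition on its own. Only the merged average of 120 can be compared against 100.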
Our evaluation engine is an interpreter that supports the majority of the scalar expressions in the SQL dialect used by MySQL. This does not include high level constructs such as performing a JOIN,  the grouping of a GROUP BY, etc (these are performed directly by the planner, as we’ve seen), but the actual sub-expressions that you’d see as the condition of a WHERE clause, or a GROUP BY clause. Any piece of SQL that cannot be lowered to be executed in MySQL by the planner is evaluated locally in Go by the engine.
Of course, these SQL sub-expressions are not arbitrarily complex. They are not even Turing complete (as they cannot loop!), so you may think that a statement like avg_price > ? would be trivial to evaluate, but as in most engineering problems, there’s a wealth of nuance when doing these things in the real world.
SQL is an incredibly dynamic language full of quirks, and the SQL in MySQL, doubly so. We have spent an inordinate amount of time getting every single corner case of SQL evaluation to exactly match MySQL’s behavior. In fact, our test suite and fuzzer are so comprehensive that we routinely find bugs in the original MySQL evaluation engine, which we have to fix upstream (like this collation bug, this issue in the insert SQL function or this bug when searching substrings). Nonetheless, being fully accurate is not enough. For most queries, these expressions are evaluated once or even more than once for every returned row, so in order to not introduce additional overhead, evaluation needs to be as quick as possible.
As discussed earlier, the first version of the evaluation engine in Vitess was an AST-based interpreter, operating directly on top of the SQL AST generated by our parser. This was a very straightforward design that allowed us to focus on accuracy, at the expense of performance. Let's discuss our new design for replacing this interpreter with a fully fledged virtual machine which is both faster and easier to maintain. Starting with the basics.
The shapes of an interpreter
For those new to programming language implementations, there are roughly 3 ways to execute a dynamic language at runtime. In increasing level of complexity and performance:
An AST-based interpreter, where the syntax of the language is parsed into an AST and evaluation is performed by recursively walking each node of the AST and computing the results. (this is the way the evalengine in Vitess used to work!)
A bytecode VM, where the AST is compiled into binary bytecode that can be evaluated by a virtual machine — a piece of code that simulates a CPU, but with higher-level instructions. (this is what we've recently shipped!)
A JIT compiler, in which the bytecode is compiled directly into the host platform's native instructions, so it can be executed directly by the CPU without being interpreted by a Virtual Machine. (we'll talk about this later!)
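For readers who have never written one, the first shape in the list above can be sketched in a few lines of Go. This is a toy with hypothetical node types that only handles floating point values. It is not the Vitess evalengine, which has to deal with MySQL's full type system.

```go
package main

import "fmt"

// Expr is a node in a tiny, hypothetical expression AST.
type Expr interface {
	// Eval computes the node's value by recursively evaluating its children.
	Eval(row map[string]float64) float64
}

type Literal struct{ Value float64 }
type Column struct{ Name string }
type Add struct{ Left, Right Expr }
type GreaterThan struct{ Left, Right Expr }

func (l Literal) Eval(_ map[string]float64) float64 { return l.Value }

func (c Column) Eval(row map[string]float64) float64 { return row[c.Name] }

func (a Add) Eval(row map[string]float64) float64 {
	return a.Left.Eval(row) + a.Right.Eval(row)
}

// GreaterThan yields 1 for true and 0 for false, as MySQL booleans do.
func (g GreaterThan) Eval(row map[string]float64) float64 {
	if g.Left.Eval(row) > g.Right.Eval(row) {
		return 1
	}
	return 0
}

func main() {
	// The AST for: avg_price > 100
	having := GreaterThan{Left: Column{Name: "avg_price"}, Right: Literal{Value: 100}}
	fmt.Println(having.Eval(map[string]float64{"avg_price": 120}))
	fmt.Println(having.Eval(map[string]float64{"avg_price": 80}))
}
```

Every evaluation walks the tree again through interface method calls, and that recursion is the overhead the other two shapes try to remove.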
The first thing to consider here is whether the upgrade from an AST interpreter to a virtual machine makes sense from a performance point of view. Here’s an intuition: SQL expressions are incredibly dynamic (when it comes to typing), very high level (when it comes to each primitive operation), and with very little control flow (when it comes to evaluation -- SQL expressions don't really loop, and conditionals are rare; their flow is always linear!). This can lead us to believe that there's no performance to be squeezed from translating the AST-based evaluation engine into bytecode. The AST is already well suited for high level operations and type-switching!
This is only superficially true. Lots of programming languages are highly dynamic and they manage to run in bytecode VMs much more efficiently than with an AST interpreter. A clear example of this is the now ancient transition that Ruby did from its original AST interpreter in MRI to YARV, a bytecode VM. Python also did a similar switch very early on. And you can bet that literally no JavaScript engines are using AST evaluation: even though the goal of these engines is to start running JS as soon as possible, they still compile to (very efficient) bytecode interpreters before JIT compilation kicks in.
So where’s the advantage of a virtual machine versus an AST interpreter? A lot of it boils down to instruction dispatching, which can be made very fast (more on this later!). But it is true that for SQL expressions, we’re actually going to execute very few instructions. Hence, to squeeze performance out of the VM, we’re going to have to come up with new tricks.
The initial approach I had in mind for our SQL virtual machine was based on Efficient Interpretation using Quickening by Stefan Brunthaler. The idea behind this paper is that dynamic programming languages are very hard to execute efficiently because of the lack of information about types. A simple expression such as a + 1 must be interpreted in a completely different way depending on whether a is an integer, a floating point number or even a string. To optimize these operations in practice, the paper suggests rewriting the bytecode from more generic instructions (e.g. the sum operator that needs to figure out the types of the two operands to know how to sum them) into specific static instructions which are specialized for the types they operate on at runtime (e.g. the sum operator that knows that both operands are integers and can sum them right away).
To do that, a quickening VM needs to figure out at runtime the types of the expressions being evaluated and incrementally rewrite the bytecode into instructions that operate on them. This is very hard to do in practice! But after implementing a good chunk of specialized instructions for the different types of operators in SQL and attempting to runtime rewrite them, I noticed an opportunity to take the idea even further by making it more efficient and, crucially, simpler.
It turns out that the semantic analysis we perform in Vitess is advanced enough that, through careful integration with the upstream MySQL server and its information schema, it can be used to statically type the AST of the SQL expressions we were executing. This took a lot of effort to implement, but resulted in a big win: since the planner knows the types of the actual inputs that will be used to evaluate each SQL expression, we can derive from those the types of all sub-expressions at compilation time, resulting in byte-code that is already specialized without requiring runtime rewriting.
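To make the difference between generic and specialized instructions concrete, here is a minimal sketch. The names (value, addGeneric, addInt64, pickAdd) are hypothetical and are not taken from Vitess; the point is only that when operand types are known at compilation time, the type check is paid once when picking the instruction instead of on every evaluation.

```go
package main

import "fmt"

// value is a hypothetical dynamically typed SQL value for this sketch.
type value struct {
	isFloat bool
	i       int64
	f       float64
}

// addGeneric inspects the operand types on every call, as an untyped
// interpreter has to.
func addGeneric(a, b value) value {
	switch {
	case !a.isFloat && !b.isFloat:
		return value{i: a.i + b.i}
	case a.isFloat && b.isFloat:
		return value{isFloat: true, f: a.f + b.f}
	case a.isFloat:
		return value{isFloat: true, f: a.f + float64(b.i)}
	default:
		return value{isFloat: true, f: float64(a.i) + b.f}
	}
}

// addInt64 is specialized: it assumes both operands are integers and
// never checks.
func addInt64(a, b value) value { return value{i: a.i + b.i} }

// pickAdd chooses an implementation once, from statically known types.
func pickAdd(aIsFloat, bIsFloat bool) func(a, b value) value {
	if !aIsFloat && !bIsFloat {
		return addInt64
	}
	return addGeneric
}

func main() {
	add := pickAdd(false, false) // both operands are known to be integers
	fmt.Println(add(value{i: 40}, value{i: 2}).i)
}
```

A quickening VM would make the same choice at runtime by rewriting instructions after observing the operands; with static types the choice can be made before the first row is evaluated.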
Now we just need to implement a Virtual Machine to efficiently interpret the specialized bytecode!
An efficient Virtual Machine in Go
Implementing a VM usually involves a lot of complexity. As we’ve explained, you have to write a compiler that processes the input expression AST and generates the corresponding binary instructions (you have to come up with an encoding even!) and afterwards you have to implement the actual VM, which decodes each instruction and performs the corresponding operation. And you have to constantly keep these in sync! Any mismatches between the compiler that emits the bytecode and the VM that executes it are often catastrophic and very hard to debug.
Historically, a bytecode VM has always been implemented the same way: a big-ass switch statement. You decode an instruction, and switch on its type to jump to the operation that needs to be performed. This is often a performance advantage against AST interpreters, because switching in practice is quite fast (particularly when implemented in C or C++ like most VMs are), and allows execution to happen linearly, without recursion.
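For reference, a switch-based loop for a tiny stack machine could look like the sketch below. The opcodes and the one-byte operand encoding are made up for illustration and have nothing to do with the Vitess instruction set.

```go
package main

import "fmt"

// Hypothetical opcodes for a tiny stack machine.
const (
	opPush byte = iota // followed by one operand byte
	opAdd
	opNeg
)

// run decodes one instruction at a time and switches on its opcode.
func run(code []byte) int64 {
	var stack [16]int64
	sp := 0
	for ip := 0; ip < len(code); ip++ {
		switch code[ip] {
		case opPush:
			ip++ // consume the operand byte
			stack[sp] = int64(code[ip])
			sp++
		case opAdd:
			sp--
			stack[sp-1] += stack[sp]
		case opNeg:
			stack[sp-1] = -stack[sp-1]
		}
	}
	return stack[sp-1]
}

func main() {
	// Encodes -(2 + 3).
	fmt.Println(run([]byte{opPush, 2, opPush, 3, opAdd, opNeg}))
}
```

Note how the encoding (which byte is an opcode, which is an operand) has to be agreed on by whatever produces the byte slice and by the loop that consumes it.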
This design, however, also has its fair share of shortcomings. Mike Pall, JIT-master extraordinaire and author of LuaJIT, gives a very insightful rundown of these issues on this mailing list post from 2011. Allow me to summarize for this blog: Besides the fact that the VM's instructions need to be kept in-sync with the compiler, the actual performance of the main VM loop in a language with many instructions is not great in practice because compilers usually struggle when compiling massive functions, and these functions are massive. They spill registers all over the place on each branch of the switch, because it's hard to tell which branches are hot and which ones are cold. With all the pushing and popping, the jump into the switch's branch often looks more like a function call, so a lot of the performance benefits of the virtual machine dissipate.
Mike was discussing C compilers in that post, but it's safe to assume that these problems are the same for a virtual machine implemented in Go. After a lot of testing, I can assure you that they are actually much worse because the Go compiler is not great at optimization. There’s always a trade-off between optimization and fast compile times, and the Go authors have historically opted for the latter.
One key issue for Go is that often the different branches of the switch statement are jumped to via binary search instead of a jump table. Switch jump table optimization was implemented surprisingly late in the compiler, and in practice it is very fiddly, without any way to enforce it. You have to tweak the way the VM's instructions are encoded carefully to ensure that the VM's main loop dispatches through a jump table, and you have no way to reliably check whether your virtual machine’s dispatch code has been properly optimized besides reviewing the generated assembly yourself.
Clearly, switch-based VM loops are not the state of the art for writing efficient interpreters, neither in Go nor in any other programming language. So what is the state of the art then? Well, when it comes to Go it turns out that there's nobody doing fast interpreters right now (at least nobody I can find). The people who are doing interesting work here, such as the wazero WASM implementation, are focusing their performance efforts on JIT. So we’re going to have to innovate!
Outside of Go, the most interesting approach for interpreters implemented in C or C++ is continuation-style evaluation loops, as seen in this report from 2021 that implements this technique for parsing Protocol Buffers. This involves implementing all the opcodes for the VM as freestanding functions that operate on the VM as an argument, with the return of the function being a callback to the next step of the computation. It does sound like something expensive and, huh, recursive, but the trick is that newer versions of LLVM allow us to mark functions as forcefully tail-called (see: https://en.wikipedia.org/wiki/Tail_call), so the resulting code is not recursively calling the VM loop but instead jumping between the operations and using the free-standing functions as an abstraction to control register placement and spillage. The most recent release of Python 3.14 actually ships an interpreter based on this design, boasting up to 30% improvement when executing Python code.
Unfortunately, this is not something we can do in Go because as we discussed earlier, the Go compiler is allergic to optimization. It can sometimes emit tail calls, but it needs to be tickled in just the right way, and this implementation simply does not work in practice unless the tail-calls are guaranteed at compilation time. But what if we keep the same design with free-standing functions for each instruction and instead of tail-calling, we forcefully return control to the evaluation loop after each one? This could be implemented very easily by not emitting our compiled program as “byte code”, but instead emitting a slice of function pointers to each instruction. The design may be a bit counter-intuitive, but it has a lot of very interesting properties.
First, the VM becomes trivial! It's just a few lines of code, and it doesn't have to worry about optimizing any large switch statements. It's just repeatedly calling functions one after the other! Here’s a simplified example, but if you check the actual implementation in Vitess you’ll see that a real virtual machine implementation is hardly more complicated than this.
func (vm *VirtualMachine) execute(p *Program) (eval, error) {
	code := p.code
	ip := 0

	for ip < len(code) {
		ip += code[ip](vm)
		if vm.err != nil {
			return nil, vm.err
		}
	}
	if vm.sp == 0 {
		return nil, nil
	}
	return vm.stack[vm.sp-1], nil
}

All we need to return when executing each instruction is the offset for the instruction pointer ip. Most functions return 1, which causes the next instruction to be executed, but by returning negative or positive values, you can implement all control flow, including loops and conditionals.
Besides the greatly simplified virtual machine, the second advantage of this approach is that the compiler also becomes trivial, because there is no bytecode! Instead, the compiler emits the individual instructions directly by pushing "callbacks" into a slice. There are no instruction opcodes to keep track of, no encoding to perform and nothing to keep in sync with the VM. Developing the compiler means developing the VM simultaneously, which greatly improves iteration speed and prevents a whole class of bugs that happen often when developing virtual machines.
func (c *compiler) emitPushNull() {
	c.emit(func(vm *VirtualMachine) int {
		vm.stack[vm.sp] = nil
		vm.sp++
		return 1
	})
}

As you may notice, there’s a bit of a hiccup here when it comes to modeling the instructions for a non-trivial language: if there's no instruction encoding, then we cannot have instructions with arguments.
This is a big problem in a language like C (traditionally used to implement most programming language interpreters), which is why this technique is never seen there. But it’s not a problem for us, because the Go compiler actually supports closures! We can emit any instruction we want and the Go compiler will automatically capture its arguments inside the callback. We don't have to think about how to encode our arguments in the bytecode, and in fact, our arguments can be as complex as they need to be: the resulting callback will contain a copy of them created by the Go compiler. It's essentially a poor man's JIT, aided by the compiler, and it works amazingly well in practice, both performance-wise and for ergonomics.
Check out this compiler method that generates an instruction to push a TEXT SQL object from the input rows into the stack:
func (c *compiler) emitPushColumn_text(offset int, col collations.TypedCollation) {
	c.emit(func(vm *VirtualMachine) int {
		vm.stack[vm.sp] = newEvalText(vm.row[offset].Raw(), col)
		vm.sp++
		return 1
	})
}

Both the offset in the input rows array and the collation for the text are statically baked into the generated instruction!
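Putting the pieces together, here is a self-contained sketch of the callback design with a captured argument and a conditional jump. Everything in it (the vm struct, emitPushInt, emitJumpIfZero, compileIf) is a simplified stand-in invented for this example, not the Vitess code.

```go
package main

import "fmt"

// vm is a simplified stand-in for the real virtual machine state.
type vm struct {
	stack [16]int64
	sp    int
}

// instruction returns the offset to add to the instruction pointer.
type instruction func(m *vm) int

type compiler struct{ code []instruction }

func (c *compiler) emit(in instruction) { c.code = append(c.code, in) }

// emitPushInt captures n in the closure; no operand encoding is needed.
func (c *compiler) emitPushInt(n int64) {
	c.emit(func(m *vm) int {
		m.stack[m.sp] = n
		m.sp++
		return 1
	})
}

// emitJumpIfZero pops the top of the stack and jumps by offset when it is zero.
func (c *compiler) emitJumpIfZero(offset int) {
	c.emit(func(m *vm) int {
		m.sp--
		if m.stack[m.sp] == 0 {
			return offset
		}
		return 1
	})
}

// run is the whole evaluation loop: call an instruction, apply its offset.
func run(code []instruction) int64 {
	m := &vm{}
	for ip := 0; ip < len(code); {
		ip += code[ip](m)
	}
	return m.stack[m.sp-1]
}

// compileIf builds the program "if cond != 0 then 10 else 20".
func compileIf(cond int64) []instruction {
	c := &compiler{}
	c.emitPushInt(cond)
	c.emitJumpIfZero(3)                // zero: jump straight to the else branch
	c.emitPushInt(10)                  // then branch
	c.emit(func(*vm) int { return 2 }) // skip over the else branch
	c.emitPushInt(20)                  // else branch
	return c.code
}

func main() {
	fmt.Println(run(compileIf(1)), run(compileIf(0)))
}
```

The jump offsets are plain integers captured by the closures, so control flow needs no extra machinery in the loop itself.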
Almost statically typed
With the fully static typing for SQL expressions (derived from the type information in the planner) we get to design an extremely efficient virtual machine where every single instruction is specialized for the type of the operands it executes on. This is both the optimal and the simplest design for a VM because we never have to do type switching during evaluation. But we’re dealing with SQL here (or, more accurately, the SQL dialect of MySQL), so not everything is rainbows and unicorns. Very often it’s quite the opposite.
Let’s consider this wildly complex SQL expression: -inventory.price. That is, the negation of each of the values in the inventory.price column of our query. We know (thanks to our semantic analysis, and the schema tracker) that the type of the inventory.price column is BIGINT. So what could be the type of -inventory.price? Naive readers without experience in the magical world of SQL may believe the resulting type is BIGINT, but that’s not the case in practice!
The vast majority of the time, the negation of a BIGINT indeed yields another BIGINT value. But when the actual value of the BIGINT is -9223372036854775808 (i.e. the smallest value that can be represented in a signed 64-bit integer), negating it promotes the value into a DECIMAL, instead of silently truncating it, or returning an error. You can see how this can easily throw a wrench in our statically compiled instructions for our virtual machine. Suddenly the static type checking we’ve computed is no longer valid because the types of the expression no longer depend on the types of the inputs, but on the actual values of the inputs. In order to continue evaluating the result of this negation, we’d always have to type-check again at runtime, defeating the whole point of static typing to begin with.
To work around this issue, we’re not introducing more type switches at runtime. We’re using a classic trick which can be seen all the time in JIT compiled code and very rarely, if ever, in virtual machines: de-optimization. There’s a small list of expressions where corner cases (e.g. overflow) can result in dynamic typing at runtime. Whenever this happens, we simply bail out of executing in our virtual machine and fall back to executing on the old AST evaluator, which has always performed type switching at runtime. This is very similar to what JIT compilers do when they detect that the runtime type of a value no longer matches the generated code they’ve emitted; they fall back from the native code to the virtual machine. In our case, we’re one step behind, falling back from the virtual machine to the AST interpreter, but the performance implications are the same. This design allows us to keep our interpreter executing statically typed code without any type switches at runtime. Here's an example of what integer negation looks like when compiled:
func (c *compiler) emitNeg_i() {
	c.emit(func(vm *VirtualMachine) int {
		arg := vm.stack[vm.sp-1].(*evalInt64)
		if arg.i == math.MinInt64 {
			vm.err = errDeoptimize
		} else {
			arg.i = -arg.i
		}
		return 1
	})
}

There is one significant drawback with this approach, however: the code for the AST interpreter can never be removed from Vitess. But this is, overall, not a bad thing. Just like most advanced language runtimes keep their virtual machine interpreter despite having a JIT compiler, having access to our classic AST interpreter gives us versatility. It can be used when we detect that an expression will be evaluated just once (e.g. when we use the evaluation engine to perform constant folding on a SQL expression). In those cases, the overhead of compiling and then executing on the VM exceeds the cost of a single-pass evaluation on the AST. Lastly, when it comes to accuracy, being able to fuzz both the AST interpreter and the VM against each other has resulted in an invaluable tool for detecting bugs and corner cases.
Conclusion
This technique for virtual machine implementation is not fully novel (I’ve seen it used before for a rules-based authorization engine in the wild!), but as far as I can tell it has never been used in Go. Given the constraints of the language and the compiler, the technique yields spectacular results: the new SQL interpreter in Vitess is just faster. Faster to write, faster to maintain and faster to execute. The benchmarks speak for themselves:
Evalengine performance in Vitess over time

Here we have a performance comparison of 5 different queries (ranging from very complex to very simple) between three implementations:
old, which is the original AST-based dynamic implementation of the evalengine.
ast, which is the result of adding static type checking for the virtual machine and using the resulting type information to partially optimize the AST evaluator.
vm, which is the callback-based virtual machine implementation as discussed in this post.
Recent results compared with MySQL

This is the current performance of our evaluation engine pitted against the native C++ implementation in MySQL. Note that measuring the time that MySQL spends in evaluation is very tricky; these are not the total response times for a query, but the result of manual instrumentation in the mysqld server to ensure a fair comparison.
                                      │     ast      │                 vm                  │                  mysql                   │
                                      │    sec/op    │   sec/op     vs base                │    sec/op     vs base                    │
CompilerExpressions/complex_arith-32    162.75n ± 1%   50.77n ± 1%  -68.81% (p=0.000 n=10)   49.40n ±  5%  -69.64% (p=0.000 n=10+184)
CompilerExpressions/comparison_i64-32    30.30n ± 2%   16.95n ± 1%  -44.08% (p=0.000 n=10)   26.93n ± 22%  -11.12% (p=0.000 n=10+11)
CompilerExpressions/comparison_u64-32    30.57n ± 3%   17.49n ± 1%  -42.78% (p=0.000 n=10)   18.80n ±  9%  -38.53% (p=0.000 n=10+16)
CompilerExpressions/comparison_dec-32    70.75n ± 1%   52.58n ± 2%  -25.68% (p=0.000 n=10)   46.59n ±  5%  -34.14% (p=0.000 n=10+14)
CompilerExpressions/comparison_f-32      53.05n ± 1%   25.65n ± 1%  -51.64% (p=0.000 n=10)   27.75n ± 23%  -47.69% (p=0.000 n=10)
geomean                                  56.30n        28.94n       -48.60%                  31.76n        -43.58%

                                      │    ast     │                   vm                    │
                                      │    B/op    │    B/op     vs base                     │
CompilerExpressions/complex_arith-32    96.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
CompilerExpressions/comparison_i64-32   16.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
CompilerExpressions/comparison_u64-32   16.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
CompilerExpressions/comparison_dec-32   64.00 ± 0%   40.00 ± 0%   -37.50% (p=0.000 n=10)
CompilerExpressions/comparison_f-32     16.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)

                                      │    ast     │                   vm                    │
                                      │ allocs/op  │ allocs/op   vs base                     │
CompilerExpressions/complex_arith-32    9.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
CompilerExpressions/comparison_i64-32   1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
CompilerExpressions/comparison_u64-32   1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
CompilerExpressions/comparison_dec-32   3.000 ± 0%   2.000 ± 0%   -33.33% (p=0.000 n=10)
CompilerExpressions/comparison_f-32     2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)

The results are stark: the pre-compiled SQL expressions when run in the new VM are up to 20x faster than the first implementation of SQL evaluation in Vitess, and for most cases, we've caught up with the performance of the C++ implementation in MySQL. One further detail which is not shown on the graphs, but can be seen on the raw benchmark data, is that the new virtual machine does not allocate memory to perform evaluation, a very nice side effect of the fully specialized instructions thanks to the static type checking.
Overall, we consider getting in the same performance ballpark as MySQL's C++ evaluation engine as a huge engineering success, particularly when the resulting implementation is so easy to maintain. There will always be a performance gap between Go and C++, arising from the trade-off of quality vs compilation speed in the Go compiler, and from the semantics of the language itself, but as we show here, this gap is not insurmountable. With expertise and careful design, it is possible to reap the many benefits of developing and deploying Go services without paying the performance penalty inherent in the language. In this specific case, we got there by having the capacity to perform semantic analysis and statically typing SQL expressions (something which MySQL does not do), and by choosing an efficient virtual machine design that uses the strengths of Go instead of fighting its limitations.
Addendum: So why not JIT?
Inquiring minds may be wondering: what's next? Are we doing JIT compilation next? The answer is no. Although this design for a compiler and VM looks like an exceptional starting point for implementing a full JIT compiler in theory, in practice the trade-off between optimization and complexity doesn't make sense. JIT compilers are important for programming languages where their bytecode operations can be optimized into a very low level of abstraction (e.g. where an "add" operator only has to perform a native x64 ADD). In these cases, the overhead of dispatching instructions becomes so dominant that replacing the VM's loop with a block of JITted code makes a significant performance difference. However, for SQL expressions, and even after our specialization pass, most of the operations remain extremely high level (things like "match this JSON object with a path" or "add two fixed-width decimals together"). The overhead of instruction dispatch, as measured in our benchmarks, is less than 20% (and can possibly be optimized further in the VM's loop). 20% is not the number you're targeting before you start messing around with raw assembly for a JIT. So at this point my intuition is that JIT compilation would be a needlessly complex dead optimization.]]></content>
        <summary><![CDATA[A novel technique for implementing dynamic language interpreters in Go, applied to the Vitess SQL evaluation engine]]></summary>
      </entry>
    
      <entry>
        <title>The Real Failure Rate of EBS</title>
        <link href="https://planetscale.com/blog/the-real-fail-rate-of-ebs" />
        <id>https://planetscale.com/blog/the-real-fail-rate-of-ebs</id>
        <published>2025-03-18T00:00:00.000Z</published>
        <updated>2025-03-18T00:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[PlanetScale has deployed millions of Amazon Elastic Block Store (EBS) volumes across the world. We create and destroy tens of thousands of them every day as we stand up databases for customers, take backups, and test our systems end-to-end. Through this experience, we have a unique viewpoint into the failure rate and mechanisms of EBS, and have spent a lot of time working on how to mitigate them.
In complex systems, failure isn’t a binary outcome. Cloud native systems are built without single points of failure, but partial failure can still result in degraded performance, loss of user-facing availability, and undefined behavior. Often, minor failure in one part of the stack appears as a full failure in others.
For example, if a single instance inside of a multi-node distributed caching system runs out of networking resources, the downstream application will interpret error cases as cache misses to avoid failing the request. This will overwhelm the database when the application floods it with queries to fetch data as though it were missing. In this case, a partial failure at one level results in a full failure of the database tier, causing downtime.
Defining Failure
While full failure and data loss are very rare with EBS, “slow” is often as bad as “failed”, and that happens much, much more often.
Here’s what “slow” looks like, from the AWS Console:

This volume has been operating steadily for at least 10 hours. AWS has reported it at 67% idle, with write latency measuring at single-digit ms/operation. Well within expectations. Suddenly, at around 16:00, write latency spikes to 200ms-500ms/operation, idle time races to zero, and the volume is effectively blocked from reading and writing data.
To the application running on top of this database: this is a failure. To the user, this is a 500 response on a webpage after a 10 second wait. To you, this is an incident. At PlanetScale, we consider this full failure because our customers do.
The EBS documentation is useful in helping us understand what promises AWS’ gp3 is able to make:
When attached to an EBS–optimized instance, General Purpose SSD (gp2 and gp3) volumes are designed to deliver at least 90 percent of their provisioned IOPS performance 99 percent of the time in a given year
This means a volume is expected to experience under 90% of its provisioned performance 1% of the time. That’s about 14 minutes of every day, or roughly 87 hours out of the year, of potential impact. This rate of degradation far exceeds that of a single disk drive or SSD. This is the cost of separating storage and compute and the sheer complexity of the software and networking components between the client and the backing disks for the volume.
In our experience, the documentation is accurate: sometimes volumes pass in and out of their provisioned performance in small time windows:

However, these short windows are enough to have impact on real-time workloads:

Production systems are not built to handle this level of sudden variance. When there are no guarantees, even overprovisioning doesn’t solve the problem. If this were a once-in-a-million chance, it would be different, but as we’ll discuss below, that is far from the case.
The True Rate of Failure
At PlanetScale, we see failures like this on a daily basis - the rate of failure is frequent enough that we’ve built systems that monitor EBS volumes directly to minimize impact.
This is not a secret; it's from the documentation. AWS doesn’t describe how failure is distributed for gp3 volumes, but in our experience it tends to last 1-10 minutes at a time. This is likely the time needed for a failover in a network or compute component.
Let's assume the following: Each degradation event is random, meaning the level of reduced performance is somewhere between 1% and 89% of provisioned, and your application is designed to withstand losing 50% of its expected throughput before erroring. If each individual failure event lasts 10 minutes, every volume would experience about 43 events per month, with at least 21 of them causing downtime!
In a large database composed of many shards, this failure compounds. Assume a 256 shard database where each shard has one primary and two replicas: a total of 768 gp3 EBS volumes provisioned. If we take the 50% threshold from above, there is a 99.65% chance you have at least one node experiencing a production-impacting event at any given time.
Even if you use io2, which AWS sells at 4x to 10x the price, you’d still be expected to be in a failure condition roughly one third of the time in any given year on just that one database!
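The arithmetic behind these estimates can be reproduced with a short sketch. It assumes volumes degrade independently of each other, and the io2 inputs (degraded 0.1% of the time, with half of the events crossing the 50% threshold) are assumptions made for this example rather than figures stated above. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// eventsPerMonth spreads the degraded share of a 30-day month over
// events that each last eventMinutes.
func eventsPerMonth(degradedFraction, eventMinutes float64) float64 {
	const minutesPerMonth = 30 * 24 * 60
	return degradedFraction * minutesPerMonth / eventMinutes
}

// anyImpacted is the probability that at least one of n independent
// volumes is impacted, when each one is impacted with probability p.
func anyImpacted(p float64, n int) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	// gp3: degraded 1% of the time, in 10-minute events.
	fmt.Printf("%.1f events per month\n", eventsPerMonth(0.01, 10))

	// io2 (assumed inputs): 0.1% degraded, half of that production-impacting,
	// across 768 volumes.
	fmt.Printf("%.1f%% chance of at least one impacted volume\n",
		100*anyImpacted(0.0005, 768))
}
```

The result is very sensitive to the per-volume probability you plug in, which is why even a tenfold improvement per volume still leaves a large fleet exposed.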
To make matters worse, we also frequently see these as correlated failures inside of a single zone, even using io2 volumes:

With enough volumes, the rate of experiencing EBS failure is 100%: our automated mitigations are consistently recycling underperforming EBS volumes to reduce customer-impact, and we expect to see multiple events on a daily basis.
That’s the true rate of failure of EBS: it’s constant, variable, and all by design. Because there are no performance guarantees when volumes are not operating to their specifications, it is extremely difficult to plan around for workloads that require consistent performance. You can pay for additional nines, but with enough drives over a long enough timeframe, failure is guaranteed.
Handling Failure
At PlanetScale, our mitigations have clamped down on the expected maximum time for an impact window. We monitor metrics such as read/write latency and idle % closely, and we've even developed basic tests like making sure we can write to a file. This allows us to respond quickly to performance issues, and ensures that an EBS volume isn’t ‘stuck’.
When we detect that an EBS volume is in a degraded state using these heuristics, we can perform a zero-downtime reparent in seconds to another node in the cluster, and automatically bring up a replacement volume. This doesn’t reduce the impact to zero, as it’s impossible to detect this failure before it happens, but it does ensure the majority of the cases don’t require a human to remediate and are over before users notice.
This is why we built PlanetScale Metal. With a shared-nothing architecture that uses local storage instead of network-attached storage like EBS, the rest of the shards and nodes in a database are able to continue to operate without problem.]]></content>
        <summary><![CDATA[Our experience running AWS EBS at scale for critical workloads]]></summary>
      </entry>
    
      <entry>
        <title>IO devices and latency</title>
        <link href="https://planetscale.com/blog/io-devices-and-latency" />
        <id>https://planetscale.com/blog/io-devices-and-latency</id>
        <published>2025-03-13T00:00:00.000Z</published>
        <updated>2025-03-13T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Non-volatile storage is a cornerstone of modern computer systems. Every modern photo, email, bank balance, medical record, and other critical piece of data is kept on digital storage devices, often replicated many times over for added durability.
Non-volatile storage, or colloquially just "disk", can store binary data even when the computer it is attached to is powered off. Computers also have volatile forms of storage such as CPU registers, CPU cache, and random-access memory, all of which are faster but require continuous power to function.
Here, we're going to cover the history, functionality, and performance of non-volatile storage devices over the history of computing, all using fun and interactive visual elements. This blog is written in celebration of our latest product release: PlanetScale Metal. Metal uses locally attached NVMe drives to run your cloud database, as opposed to the slower and less consistent network-attached storage used by most cloud database providers. This results in blazing fast queries, low latency, and unlimited IOPS. Check out the docs to learn more.
Tape Storage
As early as the 1950s, computers were using tape drives for non-volatile digital storage. Tape storage systems have been produced in many form factors over the years, ranging from ones that take up an entire room to small drives that can fit in your pocket, such as the iconic Sony Walkman. A tape Reader is a box containing hardware specifically designed for reading tape cartridges. Tape cartridges are inserted and then unwound, which causes the tape to move over the IO Head, which can read and write data.
Though tape started being used to store digital information over 70 years ago, it is still in use today for certain applications. A standard LTO tape cartridge has several hundred meters of 0.5 inch wide tape. The tape has several tracks running along its length, each track being further divided up into many small cells. A single tape cartridge contains many trillions of cells.
Each cell can have its magnetic polarization set to up or down, corresponding to a binary 0 or 1. Technically, the magnetic field created by the transition between two cells is what makes the 1 or 0. A long sequence of bits on a tape forms a page of data. In the visualization of the tape reader, we simplify this by showing the tape as a simple sequence of data pages, rather than showing individual bits.
When a tape needs to be read, it is loaded into a reader, sometimes by hand and sometimes by robot. The reader then spins the cartridge with its motor and uses the reader head to read off the binary values as the tape passes underneath.
Give this a try with the (greatly slowed down) interactive visualization below. You can control the speed of the tape if you'd like it faster or slower. You can also issue read requests and write requests and then monitor how long these take. You'll also be able to see the queue of pending IO operations pop up in the top-left corner. Try issuing a few requests to get a feel for how tape storage works:
If you spend enough time with this, you will notice that:
If you read/write to a cell "near" the read head, it's fast.
If you read/write to a cell "far" from the read head, it's slow.
Even with modern tape systems, reading data that is far away on a tape can take tens of seconds, because it may need to spin the tape by hundreds of meters to reach the desired data. Let's compare two more specific, interactive examples to illustrate this further.
Say we need to read a total of 4 pages and write an additional 4 pages worth of data. In the first scenario, all 4 pages we need to read are in a neat sequence, and the 4 to write to are immediately after the reads. You can see the IO operations queued up in the white container on the top-left. Go ahead and click the Time IO button to see this in action, and observe the time it takes to complete.
As you can see, it takes somewhere around 3-4 seconds. On a real system, with an IO head that can operate much faster and motors that can drive the spools more quickly, it would be much faster.
Now consider another scenario where we need to read and write the same number of pages. However, these reads and writes are spread out throughout the tape. Go ahead and click the Time IO button again.
That took ~7x longer for the same total number of reads and writes! Imagine if this system was being used to load your social media feed or your email inbox. It might take tens of seconds or even a full minute to display. This would be totally unacceptable.
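The gap between the two scenarios comes down to how far the tape has to travel between operations. Here is a rough sketch that counts that travel in pages; the page positions are made up for illustration and the model ignores everything except distance (no acceleration, no transfer time).

```go
package main

import "fmt"

// headTravel sums the distance, in pages, the tape has to move to visit
// each position in the order given, starting from start.
func headTravel(start int, positions []int) int {
	total, at := 0, start
	for _, p := range positions {
		d := p - at
		if d < 0 {
			d = -d
		}
		total += d
		at = p
	}
	return total
}

func main() {
	// Eight operations on neighboring pages versus eight scattered ones.
	sequential := []int{10, 11, 12, 13, 14, 15, 16, 17}
	scattered := []int{90, 5, 60, 20, 75, 2, 40, 99}

	fmt.Println("sequential travel:", headTravel(0, sequential))
	fmt.Println("scattered travel: ", headTravel(0, scattered))
}
```

The number of operations is identical in both cases; only the order and placement of the pages changes the total travel.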
Though the latency for random reads and writes is poor, tape systems operate quite well when reading or writing data in long sequences. In fact, tape storage still has many such use cases today in the modern tech world. Tape is particularly well-suited for situations where there is a need for massive amounts of storage that does not need to be read frequently, but needs to be safely stored. This is because tape is both cheaper per-gigabyte and has a longer shelf-life than its competition: solid state drives and hard disk drives. For example, CERN has a tape storage data warehouse with over 400 petabytes of data under management. AWS also offers tape archiving as a service.
What tape is not well suited for is high-traffic transactional databases. For these and many other high-performance tasks, other storage mediums are needed.
Hard Disk Drives
The next major breakthrough in storage technology was the hard disk drive.
Instead of storing binary data on a tape, we store it on a small circular metal disk known as the platter. This disk is placed inside of an enclosure with a special read/write head, and spins very fast (7200 RPM is common, for example). Like the tape, this disk is also divided into tracks. However, the tracks are circular, and a single disk will often have well over 100,000 tracks. Each track contains hundreds of thousands of pages, and each page contains 4k (or so) of data.
An HDD requires a mechanical spinning motion of both the reader and the platter to bring the data to the correct location for reading. One advantage of HDD over tape is that the entire surface area of the bits is available 100% of the time. It still takes time to move the needle + spin the disk to the correct location for a read or write, but it does not need to be "uncovered" like it needs to be for a tape. This combined with the fact that there are two different things that can spin, means data can be read and written with much lower latency. A typical random read can be performed in 1-3 milliseconds.
Below is an interactive hard drive. You can control the speed of the platter if you'd like it faster or slower. You can request that the hard drive read a page and write to a nearby available page. If you request a read or write before the previous one is complete, a queue will be built up, and the disk will process the requests in the order it receives them. As before, you'll also be able to see the queue of pending IO operations in the white IO queue box.
As with the tape, the speed of the platter spin has been slowed down by orders of magnitude to make it easier to see what's going on. In real disks, there would also be many more tracks and sectors, enough to store multiple terabytes of data in some cases.
Let's again consider a few specific scenarios to see how the order of reads and writes affects latency.
Say we need to write a total of three pages of data and then read 3 pages afterward. The three writes will happen on nearby available pages, and the reads will be from tracks 1, 4, and 3. Go ahead and click the Time IO button. You'll see the requests hit the queue, the reads and writes get fulfilled, and then the total time at the end.
Due to the sequential nature of most of these operations, all the tasks were able to complete quickly.
Now consider the same set of 6 reads and writes, but with them being interleaved in a different order. Go ahead and click the Time IO button again.
If you had the patience to wait until the end, you should notice how the same total number of reads and writes took much longer. A lot of time was spent waiting for the platter to spin into the correct place under the read head.
Magnetic disks have supported command queueing directly on the disks for a long time (80s with SCSI, 2000s with SATA). Because of this, the OS can issue multiple commands that run in parallel and potentially out-of-order, similar to SSDs. Magnetic disks also perform better when they can build up a queue of operations, because the disk controller can then reorder the reads and writes to suit the geometry of the disk.
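To get a feel for why reordering a queue helps, here is a minimal Python sketch. It is a deliberately simplified model, not how any real disk controller works: it only counts how many tracks the head crosses and ignores rotational position entirely. The function names and the track numbers in the queue are made up for illustration.

```python
def head_travel(start_track, requests):
    """Total number of tracks the head crosses serving requests in the given order."""
    travel, position = 0, start_track
    for track in requests:
        travel += abs(track - position)
        position = track
    return travel


def sweep_order(start_track, requests):
    """Reorder requests into one upward sweep followed by one downward sweep."""
    upward = sorted(t for t in requests if t >= start_track)
    downward = sorted((t for t in requests if t < start_track), reverse=True)
    return upward + downward


queue = [90, 5, 80, 10, 70, 20]  # hypothetical track numbers, in arrival order
fifo = head_travel(50, queue)
swept = head_travel(50, sweep_order(50, queue))
print(fifo, swept)  # 380 125
```

With the head starting at track 50, serving the queue in arrival order crosses 380 tracks, while the swept order crosses 125 for the same six requests.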
Here's a visualization to help us see the difference between the latency of a random tape read compared to a random disk read. A random tape read will often take multiple seconds (I put 1 second here to be generous) and a disk head seek takes closer to 2 milliseconds (a millisecond is one thousandth of a second).
Even though HDDs are an improvement over tape, they are still "slow" in some scenarios, especially random reads and writes. The next big breakthrough, and currently the most common storage format for transactional databases, is the SSD.
Solid State Drives
Solid State Storage, or "flash" storage, was invented in the 1980s. It was around even while tape and hard disk drives dominated the commercial and consumer storage spaces. It didn't become mainstream for consumer storage until the 2000s due to technological limitations and cost.
The advantage of SSDs over both tape and disk is that they do not rely on any mechanical components to read data. All data is read, written, and erased electronically using a special type of non-volatile transistor known as NAND flash. This means that each 1 or 0 can be read or written without the need to move any physical components, but 100% through electrical signaling.
SSDs are organized into one or more targets, each of which contains many blocks which each contain some number of pages. SSDs read and write data at the page level, meaning they can only read or write full pages at a time. In the SSD below, you can see reads and writes happening via the lines between the controller and targets (also called "traces").
The removal of mechanical components reduces the latency between when a request is made and when the drive can fulfill the request. There is no more waiting around for something to spin.
We're showing small examples in the visual to make it easier to follow along, but a single SSD is capable of storing multiple terabytes of data. For example, say each page holds 4096 bytes of data (4k). Now, say each block stores 16k pages, each target stores 16k blocks, and our device has 8 targets. This comes out to 4k * 16k * 16k * 8 = 8,796,093,022,208 bytes, or 8 terabytes. We could increase the capacity of this drive by adding more targets or packing more pages in per block.
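The capacity arithmetic is easy to check in a few lines of Python. This sketch assumes "k" means 1024 and treats each page as 4096 bytes. The constant names are invented here for readability.

```python
PAGE_BYTES = 4096               # 4k per page
PAGES_PER_BLOCK = 16 * 1024     # 16k pages per block
BLOCKS_PER_TARGET = 16 * 1024   # 16k blocks per target
TARGETS = 8

capacity_bytes = PAGE_BYTES * PAGES_PER_BLOCK * BLOCKS_PER_TARGET * TARGETS
capacity_tib = capacity_bytes / 2**40

print(capacity_bytes, capacity_tib)  # 8796093022208 8.0
```

Doubling TARGETS or PAGES_PER_BLOCK doubles the result, which is the scaling described above.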
Here's a visualization to help us see the difference between the latency of a random read on an HDD vs SSD. A random read on an SSD varies by model, but can execute as fast as 16μs (μs = microsecond, which is one millionth of a second).
It would be tempting to think that with the removal of mechanical parts, the organization of data on an SSD no longer matters. Since we don't have to wait for things to spin, we can access any data at any location with perfect speed, right?
Not quite.
There are other factors that impact the performance of IO operations on an SSD. We won't cover them all here, but two that we will discuss are parallelism and garbage collection.
SSD Parallelism
Typically, each target has a dedicated line going from the control unit to the target. This line is what processes reads and writes, and only one page can be communicated by each line at a time. Pages can be communicated on these lines really fast, but it still does take a small slice of time. The organization of data and sequence of reads and writes has a significant impact on how efficiently these lines can be used.
In the interactive SSD below, we have 4 targets and a set of 8 write operations queued up. You can click the Time IO button to see what happens when we can use the lines in parallel to get these pages written.
In this case, we wrote 8 pages spread across the 4 targets. Because they were spread out, we were able to leverage parallelism to write 4 at a time in two time slices.
Compare that with another sequence where the SSD writes all 8 pages to the same target. The SSD can only utilize a single data line for the writes. Again, hit the Time IO button to see the timing.
Notice how only one line was used and it needed to write sequentially. All the other lines sat dormant.
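The two scenarios above can be sketched with a tiny model in Python. It assumes every target has its own line, each line carries exactly one page per time slice, and all lines can work at the same moment. Real controllers are more involved, and the function name is just for illustration.

```python
from collections import Counter


def time_slices(write_targets):
    """Time slices needed when each target's line carries one page per slice."""
    if not write_targets:
        return 0
    # The busiest line determines how long the whole batch takes.
    return max(Counter(write_targets).values())


spread_out = [0, 1, 2, 3, 0, 1, 2, 3]  # 8 pages across 4 targets
one_target = [0] * 8                   # 8 pages, all to target 0

print(time_slices(spread_out), time_slices(one_target))  # 2 8
```

Under this model the batch finishes when the busiest line finishes, so spreading pages evenly across targets is what keeps the slice count low.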
This demonstrates that the order in which we read and write data matters for performance. Many software engineers don't have to think about this on a day-to-day basis, but those designing software like MySQL need to pay careful attention to what structures data is being stored in and how data is laid out on disk.
SSD Garbage Collection
The minimum "chunk" of data that can be read from or written to an SSD is the size of a page. Even if you only need a subset of the data within, that is the unit that requests to the drive must be made in.
Data can be read from a page any number of times. However, writes are a bit different. After a page is written to, it cannot be overwritten with new data until the old data has been explicitly erased. The tricky part is, individual pages cannot be erased. When you need to erase data, the entire block must be erased, and afterwards all of the pages within it can be reused.
Each SSD needs to have an internal algorithm for managing which pages are empty, which are in use, and which are dirty. A dirty page is one that has been written to but whose data is no longer needed and is ready to be erased. Data also sometimes needs to be re-organized to allow for new write traffic. The algorithm that manages this is called the garbage collector.
Let's see how this can have an impact by looking at another visualization. In the below SSD, all four of the targets are storing data. Some of the data is dirty, indicated by red text. We want to write 5 pages worth of data to this SSD. If we time this sequence of writes, the SSD can happily write them to free pages with no need for extra garbage collection. There are sufficient unused pages in the first target.
Now say we have a drive with different data already on it, but we want to write those same 5 pages of data to it. In this drive, we only have 2 pages that are unused, but a number of dirty pages. In order to write 5 pages of data, the SSD will need to spend some time doing garbage collection to make room for the new data. When attempting to time another sequence of writes, some garbage collection will take place to make room for the data, slowing down the write.
In this case, the drive had to move the two non-dirty pages from the top-left target to new locations. By doing this, it was able to make all of the pages on the top-left target dirty, making it safe to erase that data. This made room for the 5 new pages of data to be written. These additional steps significantly slowed down the performance of the write.
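Here is a toy Python model of that relocate-then-erase step. It is only a sketch of the idea, not a real garbage collector: pages are just labels in a list, and the names (reclaim, block, spare) and page layouts are hypothetical.

```python
FREE, LIVE, DIRTY = "free", "live", "dirty"


def reclaim(block, spare):
    """Erase `block` after copying its live pages into free pages of `spare`.

    Returns the number of extra page copies the erase required.
    """
    live = block.count(LIVE)
    if spare.count(FREE) < live:
        raise ValueError("not enough free pages to relocate live data")
    copies = 0
    for i, state in enumerate(spare):
        if copies == live:
            break
        if state == FREE:
            spare[i] = LIVE  # relocate one live page
            copies += 1
    block[:] = [FREE] * len(block)  # pages cannot be erased individually
    return copies


block = [LIVE, DIRTY, DIRTY, LIVE, DIRTY, DIRTY]
spare = [LIVE, FREE, FREE, LIVE]
moved = reclaim(block, spare)
print(moved, block.count(FREE))  # 2 6
```

The two page copies and the erase are work the drive must finish before the new writes can land, which is the slowdown seen in the second scenario.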
This shows how the organization of data on the drive can have an impact on performance. When SSDs have a lot of reads, writes, and deletes, we can end up with SSDs that have degraded performance due to garbage collection. Though you may not be aware, busy SSDs do garbage collection tasks regularly, which can slow down other operations.
These are just two of many reasons why the arrangement of data on an SSD affects its performance.
Storage in the cloud
The shift from tape, to disk, to solid state has allowed durable IO performance to accelerate dramatically over the past several decades. However, there is another phenomenon that has caused an additional shift in IO performance: moving to the cloud.
Though there were companies offering cloud compute services before this, the mass move to cloud gained significant traction when Amazon AWS launched in 2006. Since that time, tens of thousands of companies have moved their app servers and database systems to its cloud and to similar services from Google, Microsoft, and others.
Though there are many upsides to this trend, there are several downsides. One of these is that servers tend to have less permanence. Users rent (virtualised) servers on arbitrary hardware within gigantic data centers. These servers can get shut down at any time for a variety of reasons - hardware failure, hardware replacement, network disconnects, etc. When building platforms on rented cloud infrastructure, computer systems need to be able to tolerate more frequent failures at any moment. This, along with many engineers' desire for dynamically scalable storage volumes, has led to a new sub-phenomenon: separation of storage and compute.
Separating storage from compute
Traditionally, most servers, desktops, laptops, phones and other computing devices have their non-volatile storage directly attached. These are attached with SATA cables, PCIe interfaces, or even built directly into the same SOC as the RAM, CPU, and other components. This is great for speed, but provides the following challenges:
1. If the server goes down, the data goes down with it.
2. The storage is of a fixed size.
For application servers, 1. and 2. are typically not a big deal since they work well in ephemeral environments by design. If one goes down, just spin up a new one. They also don't typically need much storage, as most of what they do happens in-memory.
Databases are a different story. If a server goes down, we don't want to lose our data, and data size grows quickly, meaning we may hit storage limits. Partly due to this, many cloud providers allow you to spin up compute instances with a separately-configurable storage system attached over the network. In other words, network-attached storage is the default.
When you create a new server in EC2, the default is typically to attach an EBS network storage volume. Many database services including Amazon RDS, Amazon Aurora, Google Cloud SQL, and PlanetScale rely on these types of storage systems that have compute separated from storage over the network. This provides a nice advantage in that the storage volume can be dynamically resized as data grows and shrinks. It also means that if a server goes down, the data is still safe, and can be re-attached to a different server. This simplicity has come at a cost, however.
Local vs network storage
Consider the following simple configuration. In it, we have a server with a CPU, RAM, and direct-attached NVMe SSD. NVMe SSDs are a type of solid state disk that use the non-volatile memory host controller interface specification for blazing-fast IO speed and great bandwidth. In such a setup, the round trip from CPU to memory (RAM) takes about 100 nanoseconds (a nanosecond is 1 billionth of a second). A round trip from the CPU to a locally-attached NVMe SSD takes about 50,000 nanoseconds (50 microseconds).
This makes it pretty clear that it's best to keep as much data in memory as possible for faster IO times. However, we still need disk because (A) memory is more expensive and (B) we need to store our data somewhere permanent. As slow as it may seem here, a locally-attached NVMe SSD is about as fast as it gets for modern storage.
Let's compare this to the speed of a network-attached storage volume, such as EBS. Every read and write requires a short network round trip within a data center. The round trip time is significantly worse, taking about 250,000 nanoseconds (250 microseconds, or 0.25 milliseconds).
Using the same cutting-edge SSD now takes roughly five times longer to fulfill individual read and write requests. When we have large amounts of sequential IO, the negative impact of this can be reduced, but not eliminated. We have introduced significant added latency every time we need to hit our storage system.
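To put the three round-trip figures side by side, here is a short Python sketch using the approximate numbers quoted above. The helper function is hypothetical and models the worst case, where every read must wait for the previous one to finish.

```python
RAM_NS = 100                 # CPU to RAM round trip
LOCAL_NVME_NS = 50_000       # CPU to locally-attached NVMe SSD
NETWORK_VOLUME_NS = 250_000  # CPU to network-attached volume


def dependent_reads_per_second(latency_ns):
    """Reads that fit in one second when each read waits for the previous one."""
    return 1_000_000_000 // latency_ns


print(NETWORK_VOLUME_NS / LOCAL_NVME_NS)              # 5.0
print(LOCAL_NVME_NS / RAM_NS)                         # 500.0
print(dependent_reads_per_second(LOCAL_NVME_NS))      # 20000
print(dependent_reads_per_second(NETWORK_VOLUME_NS))  # 4000
```

Issuing requests in parallel hides some of this, but a chain of reads that each depend on the one before pays the full round trip every time.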
Another issue with network-attached storage in the cloud comes in the form of IOPS limits. Many cloud providers that use this model, including AWS and Google Cloud, limit the number of IO operations you can send over the wire. By default, a GP3 EBS volume on Amazon allows 3,000 IOPS. This can be configured higher, but comes at extra cost.
The older GP2 EBS volumes operate with a pool of IOPS that can be built up to allow for occasional bursts. The following visual shows how this works. Note that the burst balance size is smaller here than in reality to make it easier to see.
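A burst pool like this behaves much like a credit bucket. The Python sketch below is one simple way to model it. Every number in it (baseline, burst ceiling, bucket size) is a made-up illustrative value and not a real EBS limit, and the function name is hypothetical.

```python
def simulate_burst(demand_iops, baseline, burst, bucket_max, balance=None):
    """Return the IOPS actually served each second under a credit-bucket limit."""
    balance = bucket_max if balance is None else balance
    served = []
    for demand in demand_iops:
        # Credits accrue at the baseline rate, capped at the bucket size.
        balance = min(bucket_max, balance + baseline)
        # Serve what is asked for, limited by the burst ceiling and the credits left.
        allowed = min(demand, burst, balance)
        balance -= allowed
        served.append(allowed)
    return served


served = simulate_burst([500] * 6, baseline=100, burst=500, bucket_max=1000)
print(served)  # [500, 500, 200, 100, 100, 100]
```

The workload bursts for two seconds, drains the pool, and is then held to the baseline rate until a quiet period lets the balance build back up.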
If instead you have your storage attached directly to your compute instance, there are no artificial limits placed on IO operations. You can read and write as fast as the hardware will allow for.
For as many steps as we've taken forward in IO performance over the years, this seems like a step in the wrong direction. This separation buys some nice conveniences, but at what cost to performance?
How do we overcome issue 1 (data durability) and 2 (drive scalability) while keeping good IOPS performance?
Issue 1 can be overcome with replication. Instead of relying on a single server to store all data, we can replicate it onto several computers. One common way of doing this is to have one server act as the primary, which will receive all write requests. Then 2 or more additional servers get all the data replicated to them. With the data in three places, the likelihood of losing data becomes very small.
Let's look at concrete numbers. As a made-up value, say that in a given month there is a 1% chance of a server failing. With a single server, this means we have a 1% chance of losing our data each month. This is unacceptable for any serious business purpose. However, with three servers, this goes down to 1% × 1% × 1% = 0.0001% chance (1 in one million). At PlanetScale the protection is actually far stronger than even this, as we automatically detect and replace failed nodes in your cluster. We take frequent and reliable backups of the data in your database for added protection.
Issue 2 can be solved, though it takes a bit more manual intervention when working with directly-attached SSDs. We need to ensure that we monitor and get alerted when our disk approaches capacity limits, and then have tools to easily increase capacity when needed. With such a feature, we can have data permanence, scalability, and blazing fast performance. This is exactly what PlanetScale has built with Metal.
The solution: Metal
PlanetScale just announced Metal, an industry-leading solution to this problem.
With Metal, you get a full-fledged database cluster set up (Vitess or Postgres), with each database instance running with a direct-attached NVMe SSD drive. Each Metal cluster comes with a primary and two replicas by default for extremely durable data. We allow you to resize your servers with larger drives with just a few clicks of a button when you run up against storage limits. Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime.
Perhaps most importantly, with a Metal database, there is no artificial cap on IOPS. You can perform IO operations with minimal latency, and hammer it as hard as you want without being throttled or paying for expensive IOPS classes on your favorite cloud provider.
If you want the ultimate in performance and scalability, try Metal today.]]></content>
        <summary><![CDATA[Take an interactive journey through the history of IO devices, and learn how IO device latency affects performance.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing PlanetScale Metal</title>
        <link href="https://planetscale.com/blog/announcing-metal" />
        <id>https://planetscale.com/blog/announcing-metal</id>
        <published>2025-03-11T00:00:00.000Z</published>
        <updated>2025-03-11T00:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today we are announcing the general availability of an entirely new class of nodes available on PlanetScale: Metal.
What is Metal?
Metal instances are powered by locally-attached NVMe SSD drives and fundamentally change the performance/cost ratio for hosting relational databases on AWS and GCP — with unlimited I/O on every M- cluster type.
Metal has been in production for 3 months with some of our customers, and it has served over 5 trillion queries across 5 petabytes of storage. Workloads have seen as much as a 65% drop in p99 query latency, alongside 53% cost savings compared to Amazon Aurora.
And it's now available to everyone, today. No waitlist, no previews.
Customers on Metal
Now let's hand this launch over to some of our customers to share their experiences:
Block: Cash App on PlanetScale Metal
Intercom: Evolving Intercom's Database Infrastructure: A Progress Update
Depot: 8x faster queries on PlanetScale Metal
PlanetScale just made the fastest SQL database ever
Upgrading PlanetScale Query Insights to Metal
If you want to learn more about how we built Metal, we wrote about it here.
🤘]]></content>
        <summary><![CDATA[Database goes brrrrrrrrrrr.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale Metal: There’s no replacement for displacement</title>
        <link href="https://planetscale.com/blog/planetscale-metal-theres-no-replacement-for-displacement" />
        <id>https://planetscale.com/blog/planetscale-metal-theres-no-replacement-for-displacement</id>
        <published>2025-03-11T00:00:00.000Z</published>
        <updated>2025-03-11T00:00:00.000Z</updated>
        
        <author>
          <name>Richard Crowley</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Announcing PlanetScale Metal
Metal is the latest iteration of PlanetScale’s scalable cloud OLTP database platform powered by Vitess (MySQL-compatible) and Postgres, and now by fast, local NVMe drives that offer effectively unlimited IOPS to ensure your database is screaming fast, especially when your data doesn’t fit in RAM.
Typical PlanetScale Metal configurations are cost-neutral or less expensive than equivalent PlanetScale or Amazon Aurora configurations. When you consider the performance you get per dollar, there’s no contest.
Metal differs from the PlanetScale you already know well in exactly one way: We’ve substituted Amazon EBS and Google Persistent Disk with the fast, local NVMe drives available from the cloud providers. But this small step for an architecture represents a giant leap for performance and price without having to give up durability, reliability, or scalability.
Why network-attached storage isn’t ideal for databases
Network-attached storage products like Amazon EBS and Google Persistent Disk are a lowest-common-denominator technology that make it very easy to do an OK job of building fairly reliable systems. They can be attached to and detached from virtual machines while retaining the data they’re storing, which makes them convenient. They replicate the blocks they’re storing behind the scenes, making one of these volumes more durable than a single hard drive.
But they suffer from problems inherent with separating storage and compute: They’re high latency and that latency varies wildly. Writes to these network-attached volumes pass through a NIC, network gear, and another machine before landing on a hard drive. No amount of engineering from the cloud providers can mask the physical reality that network-attached storage is very far away.
They’re also lower throughput. Even at the very expensive upper end – EBS io2, for example – the network holds the storage hardware back. A local NVMe drive can often perform an order of magnitude more IOPS than a network-attached storage volume.
Take the EBS performance of an r6i.4xlarge EC2 instance, for example. It can perform 40,000 IOPS if the volume or volumes can keep up. (Some EC2 instance types require striping multiple EBS volumes to achieve their maximum performance.) By contrast, an i4i.4xlarge EC2 instance can perform 220,000 random write or 400,000 random read IOPS using local NVMe SSDs!
Network-attached storage is also expensive – $80 per TB for the slowest configuration to $2,573 per TB for the highest-performance EBS io2 volumes most instances can support. All that extra (hidden) hardware behind the scenes adds up. For example, that same i4i.4xlarge is less expensive than an equivalently sized EBS gp3 volume with only 16,000 IOPS attached to an r6i.4xlarge.
What Metal is for

PlanetScale Metal is for high-I/O databases. It’s for the most demanding, most critical workloads. It’s for databases where microseconds matter. And the powerful hardware that enables Metal to serve the most difficult workloads makes everything else faster and more consistent, too.
Constant, wide-ranging, random reads? Working sets that don’t fit into the InnoDB buffer pool? Metal can help you shrink or even avoid a fleet of read-only replicas. memcached? Leave it in the box.
Massive write throughput? Replicas can’t even keep up with the primary? Metal can make your workload fit on fewer shards or maybe even make sharding unnecessary at all.

Low tolerance for high latency? Metal’s directly-attached NVMe drives offer lower and more consistent read and write latency than any network-attached storage.
Why Metal is fast
The magnetic hard drives that were common when MySQL earned its production stripes could do maybe hundreds of IOPS. They were I/O-bound if you so much as looked at them funny. SSDs, first connected via SATA, then SAS, and nowadays NVMe, changed the equation. I/O latency is lower now because SSDs don’t need to seek and because the interconnects have gotten faster, too. Throughput is higher because the interconnects have higher bandwidth.
And that’s really the secret of PlanetScale Metal: Hardware is really good now. The rest is PlanetScale doing everything it takes to let that hardware shine.
Consider a real, million-QPS, production workload on PlanetScale. Its network-attached storage volumes report I/O latency around 1ms. We recently migrated it to PlanetScale Metal, using NVMe drives with I/O latency on the order of microseconds. As a result, its 99th percentile query latency dropped from 9ms to 4ms.

Why Metal databases are durable
The basis for any distributed system’s durability claim is replication. PlanetScale and PlanetScale Metal are no different. The replication that matters here is semi-synchronous, row-based, MySQL replication from a primary to two replicas distributed across three availability zones within a cloud region.
Semi-synchronous replication ensures every write has reached stable storage in two availability zones before it’s acknowledged to the client. Row-based replication integrates logically into transaction processing which allows readable replicas and backups.
PlanetScale databases, Metal or not, are backed up at least daily. More importantly, each and every backup taken is tested by actually restoring it and starting up MySQL. This allows us to automatically and quickly replace failed replicas.
Notably absent from the basis for PlanetScale’s durability is the replication built into network-attached storage products like Amazon EBS and Google Persistent Disk. These volumes fail, or get slow, to a significant enough degree that they simply aren’t good enough as a basis for the durability of a production database.
However, when one of the virtual machines serving one of the three replicas has a fault, the ability to re-attach a storage volume is a significant advantage over having to restore a backup, purely in terms of wall-clock time. In order to quantify the difference, we made some assumptions that were unfair to Metal to see where it stacks up. First of all we assume that 1% of Amazon EC2 instances will fail within 30 days; this is far more often than we observe in production. Then we assume it takes five minutes to detach an Amazon EBS volume, launch a new EC2 instance, and attach that volume to it; we think this is on the fast side of fair. Finally, we assume it takes five hours to restore a backup; this is wildly conservative compared to restore times we see in production for even terabyte-scale databases.
The first failure scenario of note is losing write availability. This requires losing two of the three replicas within the amount of time it takes to re-attach a volume or restore a backup. Given these assumptions, the probability of a PlanetScale Metal database losing write availability due to hardware failure is about 0.000001%.
The second failure scenario is losing all three replicas – data loss. Given these same assumptions, the probability of a PlanetScale Metal database losing data due to hardware failure is about 0.00000000003%.
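The exact model behind these figures isn't spelled out here. One simple reading that lands on the same order of magnitude treats instance failures as independent and evenly spread over the 30 days, then asks how likely it is that replicas fail inside the same five-hour restore window. The Python sketch below follows that assumed model, and its variable names are invented for illustration.

```python
HOURS_IN_30_DAYS = 30 * 24
MONTHLY_FAILURE = 0.01  # stated assumption: 1% of instances fail within 30 days
RESTORE_HOURS = 5       # stated assumption: five hours to restore a backup

# Chance that one given replica fails inside a single restore window,
# assuming independent failures spread evenly over the month.
p = MONTHLY_FAILURE * RESTORE_HOURS / HOURS_IN_30_DAYS

lose_writes = 3 * p**2 * (1 - p) + p**3  # at least two of the three fail
lose_data = p**3                         # all three fail

print(f"{lose_writes * 100:.7f}%")
print(f"{lose_data * 100:.12f}%")
```

Under these assumptions the results come out at roughly 1.4 in 100 million and 3.3 in 10 trillion, in line with the percentages quoted above.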
Durability through replication, backup, and restore is powerful.
This performance comes cheap
Typically, if you want performance and durability you’re expecting to have to compromise on price. As it happens, because coaxing serious performance out of network-attached storage often costs 30 times what the basic network-attached storage costs, local NVMe drives end up being the high-performance and low-dollar option. In fact, a high-performance network-attached storage volume capable of even 20,000 IOPS usually costs more than the virtual machine it’s attached to.
As a concrete measure of how cost effectively PlanetScale Metal delivers performance, consider these ratios of IOPS per dollar for a variety of configurations running in AWS on r6a and i4i instance types:
IOPS / $ (On-Demand price) | xlarge | 2xlarge | 4xlarge
r6a with EBS gp3 (3,000 IOPS) | 3.35 | 1.68 | 0.84
r6a with EBS gp3 (16,000 IOPS) | 13.2 | 7.57 | 4.11
r6a with EBS io2 (20,000 IOPS) | 3.80 | 3.18 | 2.40
r6a with EBS io2 (40,000 IOPS) | 4.45 | 3.99 | 3.31
i4i with instance storage | 58.41 | 58.48 | 58.50
In pure dollar terms, many configurations of PlanetScale Metal in AWS are less expensive even than configurations using Amazon EBS gp3 with 16,000 IOPS.
But that’s not all. Amazon EBS cannot be discounted by either Reserved Instances or Savings Plans but the instance storage that comes with the instances PlanetScale Metal uses can be. PlanetScale Managed customers that adopt Metal can realize even greater savings while still achieving performance network-attached storage can’t touch.
Start using Metal today
PlanetScale Metal is available today in both AWS and GCP. Visit your PlanetScale dashboard to create a new Metal database, migrate an existing PlanetScale database to Metal (online, naturally), or import a database from elsewhere straight into PlanetScale Metal.
Contact us to learn more about Metal.]]></content>
        <summary><![CDATA[Learn how PlanetScale Metal was built and how we ensured it is safe.]]></summary>
      </entry>
    
      <entry>
        <title>Upgrading Query Insights to Metal</title>
        <link href="https://planetscale.com/blog/upgrading-query-insights-to-metal" />
        <id>https://planetscale.com/blog/upgrading-query-insights-to-metal</id>
        <published>2025-03-11T00:00:00.000Z</published>
        <updated>2025-03-11T00:00:00.000Z</updated>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Since PlanetScale Metal is now generally available, I wanted to share a post describing our experience migrating the PlanetScale database that powers the Query Insights feature to Metal.
Data collection in Query Insights
First, a bit about how we collect the data for Query Insights. The basic steps are:
1. Collect query-pattern telemetry from the Vitess layer of your PlanetScale database.
2. Publish the data to Kafka.
3. Consume the data from Kafka and write to several MySQL tables, aggregated by time.
The primary scalability concern in the Query Insights pipeline is ensuring that we can process and write data to the database quickly enough to keep up with the inbound volume. To accomplish this, we read Kafka messages in batches, coalesce data in memory to avoid unnecessary writes, and hand the writes off to a thread pool in each Kafka consumer.
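The coalesce-then-fan-out pattern can be sketched in a few lines of Python. This is an illustration of the general idea only, not the Query Insights code: the record shape (pattern, time bucket, count), the function names, the write_row callback, and the default worker count are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def coalesce(batch):
    """Sum counts per (query pattern, time bucket) so each key is written once."""
    totals = Counter()
    for pattern, bucket, count in batch:
        totals[(pattern, bucket)] += count
    return totals


def process_batch(batch, write_row, workers=25):
    """Coalesce a batch in memory, then hand the writes to a thread pool."""
    totals = coalesce(batch)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every write and re-raises any exception from a writer.
        list(pool.map(lambda item: write_row(*item[0], item[1]), totals.items()))
    return len(totals)


written = []  # stands in for the database in this example
batch = [
    ("select_user", "12:00", 3),
    ("select_user", "12:00", 2),
    ("update_cart", "12:00", 1),
]
rows = process_batch(batch, lambda pattern, bucket, count: written.append((pattern, bucket, count)))
print(rows, sorted(written))
```

Three incoming messages turn into two writes here, and the saving grows as more messages in a batch share the same key.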
Pre-Metal: Write-heavy with provisioned IOPS
The result is that the Query Insights database is very write-heavy. As of this writing, we execute approximately 10k UPDATE/INSERT statements per second. These writes come from 32 consumer processes, each with 25 writer threads for a total max concurrency of 800 threads.
The Query Insights PlanetScale database has 8 shards and, prior to our upgrade to Metal, we'd had to provision more IOPS to the EBS volumes backing MySQL in our sharded keyspace to keep up with the telemetry volume. Since this workload had demonstrated a sensitivity to I/O latency, we figured it would be a good candidate for upgrading to Metal.
Performance improvements after migrating to Metal
To do this, we picked 1 of our 8 MySQL shards, the busiest one, to upgrade first.
The following graphs show the query latency at various percentiles. The lines show the latency for the 8 primaries of the Insights database. The purple line corresponds to our busiest shard, which was upgraded to Metal around 19:35.




Upgrading the test shard to Metal caused a substantial decrease in latency across all the measured percentiles. After the Metal upgrade, our busiest shard, which previously had the highest latencies, started executing queries faster than the other shards by a significant margin.
After letting the first upgrade soak for a few days, we upgraded the remaining shards and saw nearly identical improvement in performance.
Without making any changes to our application, architecture, or sharding configuration, we were able to realize substantial performance improvements by upgrading to PlanetScale Metal. This resulted in a lower average backlog in our Kafka consumers, and has given us additional capacity to handle increasing message volume in the future.]]></content>
        <summary><![CDATA[Our experience upgrading the Query Insights database to PlanetScale Metal]]></summary>
      </entry>
    
      <entry>
        <title>Automating cherry-picks between OSS and private forks</title>
        <link href="https://planetscale.com/blog/automating-cherry-picks-between-oss-and-private-forks" />
        <id>https://planetscale.com/blog/automating-cherry-picks-between-oss-and-private-forks</id>
        <published>2025-01-14T00:00:00.000Z</published>
        <updated>2025-01-14T00:00:00.000Z</updated>
        
        <author>
          <name>Manan Gupta</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Introduction: The challenge of staying up-to-date
One of the major challenges faced by large companies customizing open-source projects for their specific use cases is staying aligned with upstream changes. At PlanetScale, we encountered this issue firsthand while managing our private modifications alongside the continual evolution of the open-source Vitess project.
The early days: A manual approach
In the early days, when our private changes were relatively small, we used a straightforward approach to maintain the diff. Each week, a GitHub Action running on a cron would cherry-pick all private changes onto the latest updates from the main branch. While this method worked initially, it quickly became unsustainable as PlanetScale’s private diff grew with our increasing sophistication.
The situation became even more complicated when we decided to align with stable release versions of Vitess rather than the latest code from the main branch. This introduced an additional challenge: maintaining the private diff not just on the main branch but also across multiple release branches.
The first attempt: Git-replay
One of our early observations was that when cherry-picking private commits onto multiple release branches, we frequently encountered the same conflicts repeatedly. To address this, we developed a tool that could sequentially process all relevant commits and replay them on top of the open-source (OSS) branch. This tool, aptly named git-replay, could store how specific conflicts were resolved during cherry-picks and reuse that information to resolve similar conflicts in the future.
While git-replay was a significant improvement over our previous manual process, it wasn’t without its limitations. Someone still had to run the tool, manually resolve any unrecognized conflicts, and verify that all changes were correctly cherry-picked. To ensure no commits were missed, we generated a code diff between the OSS branch and the rebased private branch, followed by a painstaking manual review by code owners across different parts of the repository.
We used this tool for several releases, but as our private diffset grew larger than ever before, the limitations of git-replay became increasingly apparent. It was clear that a more comprehensive solution was needed. This realization led to the creation of the Vitess cherry-pick bot, marking a major overhaul in how we managed our private diffset.
The start of something new: The requirements
As our private diffset continued to grow, we realized the need for a more continuous and efficient process. Instead of performing a massive cherry-pick run every time we needed a new release, we envisioned a system where open-source (OSS) changes would flow seamlessly into our private fork. This led us to define a model for continuously tracking changes in the main branch of OSS Vitess, synchronizing them with the corresponding upstream branch in our private fork. We also set up private equivalents of Vitess release branches. These private branches would also include our private diff.
OSS Branch | Private Equivalent
main | upstream
release-22.0 | latest-22.0
release-x.0 | latest-x.0
Key requirements
To achieve this, we established the following requirements for the new process:
Any PR merged into OSS main should be cherry-picked into private upstream.
Any PR backported from OSS main into an OSS release branch release-x.0 should trigger the backport of the corresponding cherry-pick PR to the private branch (latest-x.0).
Private changes will continue to go into private upstream, and will be cherry-picked as needed into the private latest branches.
Whenever a new OSS release branch is cut from OSS main, a corresponding private latest branch should be created from private upstream.
Automate as much of this process as possible to minimize manual effort.
The promise of automation
This new process offered several clear benefits:
We no longer needed to explicitly maintain the private diffset.
PRs from OSS would flow continuously into the private fork, ensuring it remained up-to-date.
With these goals in mind, we began developing a bot to automate and streamline as much of this workflow as possible. This marked the beginning of the Vitess cherry-pick bot.
Grand stage entry: The Vitess cherry-pick bot
With the requirements in place, we began exploring how to bring the bot to life. Several critical design questions arose:
Should the bot be hosted on a dedicated server or leverage GitHub Actions and applications?
Should it be stateless, storing all information in PRs, or stateful with a dedicated data store?
After extensive deliberation, we opted for a solution that runs on an hourly cron schedule in GitHub Actions, with its state stored in a PlanetScale database instance. On each run, the bot performs two core tasks: cherry-picking and backporting.
Cherry-picking
Identifying PRs for Cherry-Picking
The bot uses the go-github library to interact with the GitHub API and fetch recently closed PRs from the Vitess repository.
It filters out PRs that are closed without being merged and inserts the remaining PRs into the database for cherry-picking.
To determine how far back in history it needs to fetch, the bot stops once it encounters a PR that predates any PR already present in the database.
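The discovery step can be sketched roughly as follows. The real bot is built on go-github; this Python version only illustrates the filtering and stopping logic. The PullRequest fields, the newest-first ordering, and the cutoff parameter (the closing time of the PR already in the database that serves as the stopping point) are all assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class PullRequest:
    number: int
    closed_at: datetime
    merged: bool


def select_prs_to_cherry_pick(
    recently_closed: Iterable[PullRequest],
    cutoff: Optional[datetime],
) -> List[PullRequest]:
    """Walk closed PRs newest-first and keep the merged ones.

    Stops at the first PR that was closed before `cutoff`.
    """
    selected = []
    for pr in recently_closed:
        if cutoff is not None and pr.closed_at < cutoff:
            break  # everything older has already been recorded
        if pr.merged:
            selected.append(pr)  # closed-but-unmerged PRs are skipped
    return selected


# Hypothetical page of recently closed PRs, newest first.
closed = [
    PullRequest(3, datetime(2025, 1, 3), merged=True),
    PullRequest(2, datetime(2025, 1, 2), merged=False),
    PullRequest(1, datetime(2024, 12, 1), merged=True),
]
picked = select_prs_to_cherry_pick(closed, cutoff=datetime(2025, 1, 1))
print([pr.number for pr in picked])
```

With this input only PR 3 is selected: PR 2 was closed without merging, and PR 1 predates the cutoff, so the walk stops there.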
Executing Cherry-Picks
Cherry-picking and PR creation happen entirely within the workflow.
The bot uses a GitHub token for authentication to check out the vitess-private repository and create PRs against it.
Handling Conflicts
The workflow creates a PR even if conflicts arise during cherry-picking, ensuring no PRs are missed.
PRs inherit the title and description from the original PR. The original author or the person who merged the PR is assigned as the assignee, unless they are a non-PlanetScale contributor, in which case no one is assigned.
If there are conflicts, the bot creates a draft PR with labels like do not merge and Conflict.
It comments on the PR with the Git status output, highlighting the files with conflicts, and tags the original PR author for resolution.
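One way to picture the conflict-handling rules is as a small function that decides what the cherry-pick PR should look like. This is a sketch only: the dictionary shapes and the internal_users set are hypothetical, and the assignee rule shown here (the first of author or merger who is an internal contributor) is one reading of the policy described above.

```python
from typing import Any, Dict, Set


def plan_cherry_pick_pr(
    original: Dict[str, Any],
    internal_users: Set[str],
    has_conflicts: bool,
    git_status: str = "",
) -> Dict[str, Any]:
    """Describe the PR to open for a cherry-pick, conflicted or not."""
    # Assumed rule: prefer the author, fall back to the merger, else nobody.
    candidates = [original["author"], original.get("merged_by")]
    assignee = next((u for u in candidates if u in internal_users), None)
    plan = {
        "title": original["title"],
        "body": original["body"],
        "assignee": assignee,
        "draft": has_conflicts,
        "labels": ["do not merge", "Conflict"] if has_conflicts else [],
        "comment": None,
    }
    if has_conflicts:
        # Surface the conflicting files and tag the original author.
        plan["comment"] = (
            f"Conflicts while cherry-picking:\n{git_status}\n"
            f"cc @{original['author']}"
        )
    return plan


original_pr = {
    "title": "Fix planner bug",
    "body": "Details of the fix",
    "author": "external-dev",
    "merged_by": "internal-dev",
}
conflicted = plan_cherry_pick_pr(
    original_pr, {"internal-dev"}, True, "both modified: go/vt/example.go"
)
clean = plan_cherry_pick_pr(original_pr, {"internal-dev"}, False)
```

The important property is that a PR is planned in both cases, so a conflict never causes a change to be dropped silently.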
Finalizing Cherry-Picks
Once the process is complete, the bot marks the PR as cherry-picked in the database.
Backporting
The process for backporting PRs from upstream to latest branches is similar to cherry-picking, with a few key differences:
Backports are not automatically triggered. Instead, they rely on labels applied to PRs in the vitess-private repository.
Labels like Backport to: latest-x.0 signal the bot to initiate the backport.
Once the label is applied, the bot executes the backporting steps, following a workflow akin to cherry-picking.
This approach allowed us to automate and streamline much of the manual effort involved in keeping our private fork aligned with OSS changes. The bot's stateful design and use of GitHub Actions provided a robust, scalable solution to manage the growing complexity of our workflow.
Building confidence
As with any major refactor or overhaul of a critical process, it was essential to implement safeguards to ensure reliability. To this end, we built integrity checks directly into the bot, which run weekly. These checks provide a summary of any discrepancies, allowing for manual intervention when necessary.
Weekly integrity checks
The bot performs two key reconciliation tests every week:
Upstream Reconciliation Test
This test ensures that the upstream branch remains in sync with OSS changes. It flags the following issues:
Open cherry-pick PRs against upstream.
PRs from main that were not cherry-picked into upstream.
PRs merged directly into latest branches instead of being backported from upstream.
Latest Branches Reconciliation Test
This test ensures that latest branches (e.g., latest-21.0) are consistent. It identifies:
Open backport PRs against latest branches.
PRs merged into latest that are not backports.
PRs backported to latest-x.0 but not other higher numbered latest branches (if any).
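The last check in that list lends itself to a short sketch. Assuming the bot can produce, for each latest branch, the set of upstream PR numbers that have been backported to it (a hypothetical data shape, not the bot's actual schema), finding backports that skipped a higher-numbered branch looks roughly like this:

```python
from typing import Dict, List, Set


def branch_major(branch: str) -> int:
    """Extract 21 from a branch name such as 'latest-21.0'."""
    return int(branch.split("-")[1].split(".")[0])


def missing_forward_backports(backports: Dict[str, Set[int]]) -> Dict[int, List[str]]:
    """Map each PR to the higher-numbered branches it is missing from."""
    ordered = sorted(backports, key=branch_major)
    missing: Dict[int, List[str]] = {}
    for i, branch in enumerate(ordered):
        for pr in sorted(backports[branch]):
            for higher in ordered[i + 1:]:
                if pr not in backports[higher]:
                    gaps = missing.setdefault(pr, [])
                    if higher not in gaps:
                        gaps.append(higher)
    return missing


# Hypothetical state: PR 102 reached latest-20.0 but no newer branch.
state = {
    "latest-20.0": {101, 102},
    "latest-21.0": {101},
    "latest-22.0": {101, 103},
}
report = missing_forward_backports(state)
print(report)
```

For this input the report flags PR 102 as missing from latest-21.0 and latest-22.0, while PR 103, which only exists on the newest branch, is not flagged.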
Alerts and Manual Inspections
The bot posts a summary of these checks to a dedicated GitHub issue every week, providing visibility into any issues that may require manual inspection or action.
The outcome
With these safety measures in place, we cautiously rolled out the new process and began using the Vitess cherry-pick bot. More than a year and a half later, the results have been remarkable. The bot has saved countless hours of engineering time, allowing our team to focus on building innovative features for our users rather than manually cherry-picking PRs!]]></content>
        <summary><![CDATA[Learn how PlanetScale keeps its private fork of Vitess up-to-date with OSS]]></summary>
      </entry>
    
      <entry>
        <title>Database Sharding</title>
        <link href="https://planetscale.com/blog/database-sharding" />
        <id>https://planetscale.com/blog/database-sharding</id>
        <published>2025-01-09T00:00:00.000Z</published>
        <updated>2025-01-09T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[What is sharding
Sharding is the process of scaling a database by spreading out the data across multiple servers, or shards. Sharding is the go-to database scaling solution for many large organizations managing data at petabyte scale. Your favorite companies like Uber, Shopify, Slack, and Cash App all use sharding with Vitess and MySQL to scale their massive databases.
In this article, you'll learn how sharding works and considerations for designing a performant sharded database cluster. Along the way, you'll be able to interact with database cluster diagrams, giving you the opportunity to really let the concepts sink in via hands-on examples.
With our PlanetScale Vitess clusters, you can utilize Workflows that make explicit sharding much easier than other solutions. Built on top of Vitess, this allows you to horizontally scale without adding additional logic to your application code. Sharded tables appear as a single, unified table to your application. Sign up to try it out yourself or check out our video walkthrough if you want to see how it works on PlanetScale.
Sharding basics
Most small-scale web applications will have one or more application servers that connect to a single, monolithic database server. The applications store all persistent data on this single server, and send queries to it to meet application needs. This includes user account information and whatever other data the application needs in order to operate.
Below is an interactive database cluster with this setup. There is an application server on the left and a single database server on the right. Try clicking insert row and select row to see data get added to and retrieved from the database.
Each row inserted has a user_id, name, and age respectively. We'll use rows with this formatting for several examples going forward.
As a quick aside, you can configure the speed of animations in this post by selecting your desired speed below.
The architecture described above works well for low-demand systems that only need to make a few hundred or a few thousand queries per second.
However, popular software apps often have hundreds of thousands of concurrent users. In some cases, these applications need to store petabytes of information in the database and process millions of queries per second at peak hours. Such huge workloads demand spreading out our database across many servers, rather than just one. Sharding is a popular solution to this problem.
In a sharded database, we will have multiple separate database servers, each with a portion of the total data. Below you can see a simple configuration with an application server that can send queries to one of two database servers (shards). Try inserting and selecting rows again. For any of the remaining interactive visualizations, you can also click on a row in a shard to delete it or click on a shard's label to empty it completely.
With this setup, the code running on the application server has to be aware of all of the shards, know which rows are stored where, and keep a connection open to each. This is not a huge problem with only two shards, but it becomes complex when there are hundreds of them. Storing this logic in the application code can quickly become messy and difficult to maintain.
A better option is to have the app servers connect to an intermediary server that we call a proxy. When an application server needs to use the database, it will send queries to a proxy. The proxy is then responsible for routing the query to the correct shard server.
We have such a configuration below. Go ahead and click the insert row button to add rows to the shards. You'll see visually how the proxy server is used to manage the inserts.
PlanetScale builds its sharding solution on top of Vitess. In Vitess, these proxy servers are known as Vitess Gates, or VTGates for short.
After filling up the shards, you can see that there are 6 rows in total, each database server storing 3. Of course, a real sharded database would store many millions or billions of rows per server, but we keep it small here for the sake of learning.
If you inserted more than six rows, you may have noticed one or more of the shard servers pulse red, indicating that they are being overloaded. If you inserted really fast, you may have seen the proxy server pulse red as well. Server overloading can happen when the query demand is too high or if the server is nearing its storage capacity. As the amount of data grows, we can add more shards to support it. This is a process known as resharding. Let's increase this database cluster to have three shards instead of two.
It's also important to consider that the proxy needs time to process a query as well. Consider the below sharded configuration. We have lots of database capacity, but only one proxy server. Try inserting a bunch of rows in quick succession.
You should see that the proxy flashed red, and inserts were queued up beneath it rather than all being handled at once. This is because the proxy server hit the capacity for simultaneous queries it could process, and had to queue up other inserts. This added latency would be unacceptable for a production database system. To get around this, we can add more proxy servers. Try clicking insert row multiple times again.
Since we have more proxy capacity, we have reduced the likelihood of needing to queue a query. When designing a large-scale sharded database system, it's important to understand the demands and choose the correct number of proxies and shards to handle the workload, even at peak times.
Without the right tooling, the process of migrating a database from unsharded to sharded can be challenging. Some organizations spend months transitioning from an unsharded to sharded architecture. Thankfully, PlanetScale has built tools like the Clusters page and Workflows that make sharding a MySQL database easy, even if that database is serving live, production traffic.
Sharding strategy
Now you know the basics of what a sharded database is. The next question is: how does the data get sharded?
One of the most important considerations in a sharded database setup is the sharding strategy. The sharding strategy is the set of rules used to determine which rows of data go to which shards. In the above examples, the shard each row was sent to was chosen at random. Typically, your sharding strategy will involve selecting a shard key: the column(s) that you use to determine where each row will reside. The sharding strategy and shard key you choose have a huge impact on how evenly the data is distributed across shards as well as query performance.
Let's consider several options and weigh their pros and cons.
Range sharding
In a Range sharding strategy, the proxy layer decides where each row should go based on pre-defined ranges of values.
For example, say we have four shards and want to use the user_id (the number in the first column of each row) as our sharding key. In such a setup, we could say that all rows with ID 1-25 go to the first shard, 26-50 to the second shard, 51-75 to the third shard, and 76-100 to the fourth. The database cluster below is configured in such a manner. Try inserting some rows. What do you notice?
The first 25 inserts all go to the first shard, leading to one hot shard (a shard that is over-worked) and three other cool shards (under-worked). If we continue inserting, the same problem arises for all the other shards. Using naive range-based sharding with IDs is generally a bad idea if our IDs are monotonically increasing, as we have here.
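To make the hot-shard effect concrete, here is a minimal sketch of the range rule described above (IDs 1-25, 26-50, 51-75, 76-100). The function and constant names are made up for illustration, and shards are numbered from 0; a real proxy layer such as VTGate does not work this way internally.

```python
import bisect
from collections import Counter

# Inclusive upper bound of the ID range owned by each shard (0-indexed).
RANGE_UPPER_BOUNDS = [25, 50, 75, 100]


def range_shard(user_id: int) -> int:
    """Return the shard index whose range contains user_id."""
    if not 1 <= user_id <= RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every configured range")
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)


# Monotonically increasing IDs: the first 25 inserts all land on shard 0.
first_inserts = Counter(range_shard(user_id) for user_id in range(1, 26))
print(first_inserts)
```

Every one of the first 25 rows maps to shard 0, which is exactly the hot shard seen in the visualization.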
Let's try range sharding with the second column, name. Here, we'll put all names a-f on the first shard, g-m on the second, n-u on the third, and v-z on the fourth. Let's try more insertions.
Uh-oh, more problems! None of our users have names in the v-z range, leading to a wasted shard. Such a sharding solution only works well if our users have names that are perfectly evenly distributed across the alphabet. This is rarely true in practice. The letters that names begin with vary based on factors like name popularity and nationality.
Finally, let's try using range sharding on the third column, age. Here, we'll put all ages 0-24 on the first shard, 25-49 on the second, 50-74 on the third, and 75-200 on the fourth. Go ahead and insert a handful of rows.
Again, we run into problems here. The vast majority of our users are between 25 and 74 years of age. Two of our shards are hot with lots of traffic while the other two are quite cold. We could tweak these ranges to better fit our audience, but what happens as the ages of our users shift over time? There must be a better way.
Hash sharding
One of the most popular sharding strategies is hash sharding. In hash sharding, we choose a column to be our shard key and we generate a cryptographic hash of this value for each row that needs to be inserted. Each shard is responsible for storing the rows for a range of hashes, and this process is controlled by the proxy servers.
For the purpose of example, let's say we will run the shard key column through a simple algorithm that always produces a hash between 0 and 100. Hashes 0-25 go to the first shard, 26-50 go to the second, 51-75 to the third, and 76-100 to the fourth. The nice thing about hashes is that similar inputs can produce very different outputs. We might pass in the name "joseph" and get hash 45, but the name "josephine" produces 28. Similar names, completely different hashes. This means similar values may end up on totally different servers, a good property to help the data get evenly spread out.
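A toy version of this scheme might look like the following. It uses SHA-256 reduced to the 0-100 range purely as a stand-in for the "simple algorithm" in the example, so it will not reproduce the illustrative hash values quoted above, and it is not how Vitess computes keyspace IDs. The function names are hypothetical.

```python
import hashlib


def hash_0_to_100(value: str) -> int:
    """Map a shard key value to a stable number between 0 and 100."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 101


def hash_shard(value: str) -> int:
    """Hashes 0-25 -> shard 0, 26-50 -> 1, 51-75 -> 2, 76-100 -> 3."""
    return max(hash_0_to_100(value) - 1, 0) // 25


for name in ("joseph", "josephine"):
    print(name, hash_0_to_100(name), hash_shard(name))
```

The same input always yields the same shard, which is what lets the proxy find a row again later, while nearby inputs have no particular tendency to land together.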
Below we have a server configured to use hash sharding on the name column. Insert a few rows to see how things get distributed.
This seems to work well, but is it optimal? With hash sharding, how do we know what column to choose as our shard key?
Ideally, we want something with high cardinality. The name column is not the ideal choice. There may be very popular names, and even with hashing we might end up with hotter servers than others.
Often a column like user_id is a good choice because each value is unique. We also get the added benefit of hash speed. It's faster to hash a fixed-size integer as compared to a variable-width name string.
Here's an example of using hash sharding on the id column instead.
We still achieve good data distribution, but now with a high-cardinality and fixed-size value.
Other strategies
Range and hash sharding are good choices that cover many scenarios. However, there are other options. Software like Vitess also supports lookup sharding and even custom sharding functions.
In lookup sharding, the developer can set up a table that contains information needed to map incoming data to the appropriate shard. This table is referenced when queries for such tables come in.
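Conceptually, a lookup strategy replaces the range or hash computation with an explicit mapping. The sketch below keeps that mapping in an in-memory dictionary; in a real cluster it would live in a table, and the class and method names here are invented for illustration rather than taken from Vitess.

```python
from typing import Dict


class LookupSharding:
    """Route rows using an explicit key -> shard mapping."""

    def __init__(self) -> None:
        self._lookup: Dict[int, int] = {}

    def assign(self, key: int, shard: int) -> None:
        # Record where rows with this key should live.
        self._lookup[key] = shard

    def shard_for(self, key: int) -> int:
        # Raises KeyError if the key has never been assigned a shard.
        return self._lookup[key]


router = LookupSharding()
router.assign(1, 2)   # user 1 lives on shard 2
router.assign(42, 0)  # user 42 lives on shard 0
print(router.shard_for(1), router.shard_for(42))
```

The trade-off is flexibility against an extra lookup: any key can be placed on any shard, but the mapping itself has to be stored and consulted.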
Vitess also supports custom sharding functions, allowing developers to write their own functions that take column(s) as input and produce custom output to let the system know which shard(s) need to be used.
Cross-shard queries
Another important consideration when designing a sharded database cluster is how many shards are required to fulfill each query. The ideal is that most individual queries can be handled by a single shard. Consider the following query to the user table with a user_id, name, and age column as described earlier:
SELECT name, age
  FROM user
  WHERE user_id = 1;

Assuming user_id is our primary key, this query will only select one row, which should only live on one shard. That means that the proxy layer only needs to ask one shard to get the answer for the query. If you tried clicking the select row button in some of the earlier interactive examples, you should have already seen this as it was only selecting individual rows.
Let's now consider an additional steps table in our database. This table is used for storing per-day step-count statistics tracked by a health app on our users' phones and watches. Each row has a user_id, step_count, and date, respectively. We also want to shard this table.
A common use for this data may be for a user to request their step history to view it on a line chart. In SQL, the query to fetch this data would look like this:
SELECT step_count, date
  FROM steps
  WHERE user_id = 1;

This query may return many rows. How many shards will be needed for the proxy to find all these rows? It depends on how we sharded the steps table to begin with!
Let's consider what would happen if we had done range sharding on the step_count column (the second column). Try inserting some rows and then click select where id == 1 to get all of the entries for user with id = 1.
As you can see, the SELECT query may have to spread across multiple shards to make this work! This is known as a cross-shard query. Cross-shard queries are bad for system performance, and should be avoided whenever possible. When multiple shards need to fulfill a single query, this adds excessive network and CPU overhead to the database cluster.
A better solution would be to shard based on user_id. That way, all entries for a given user will live on the same shard, which will allow us to avoid this cross-shard query. Give it a go below:
This will lead to much better performance for this query.
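The difference between the two shard keys can be checked with a few lines of code. In this sketch the sample rows, the step_count ranges, and the use of a plain modulo as a stand-in for hashing user_id are all assumptions chosen to keep the example small.

```python
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]

# Hypothetical contents of the steps table.
STEPS: List[Row] = [
    {"user_id": 1, "step_count": 1200, "date": "2025-01-01"},
    {"user_id": 1, "step_count": 7400, "date": "2025-01-02"},
    {"user_id": 1, "step_count": 12800, "date": "2025-01-03"},
    {"user_id": 2, "step_count": 7400, "date": "2025-01-01"},
]


def shard_by_step_count(row: Row) -> int:
    # Range sharding: 0-4999 -> 0, 5000-9999 -> 1, 10000-14999 -> 2, 15000+ -> 3.
    return min(row["step_count"] // 5000, 3)


def shard_by_user_id(row: Row) -> int:
    # Modulo stands in for a real hash of user_id.
    return row["user_id"] % 4


def shards_needed(rows: List[Row], shard_of: Callable[[Row], int], user_id: int) -> Set[int]:
    """Shards the proxy must contact to answer WHERE user_id = ?."""
    return {shard_of(row) for row in rows if row["user_id"] == user_id}


print(len(shards_needed(STEPS, shard_by_step_count, 1)), "shards when keyed on step_count")
print(len(shards_needed(STEPS, shard_by_user_id, 1)), "shard when keyed on user_id")
```

Keyed on step_count, user 1's three rows are spread over three shards; keyed on user_id, they all sit on one.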
Updates
Yet another thing to consider when deciding on a column to be your shard key is update frequency. Any time the value in a shard key column is changed, the row may need to be moved from one shard to another in order to maintain the integrity of our sharding strategy.
Consider again the choice between using step_count or user_id as a shard key. A step_count is a column that may be volatile. Throughout a day, the step count will change as the user continues to walk. We may even want to give a user the ability to manually update the step_count of previous days. Each time a change occurs, the database cluster has to re-evaluate which shard the row belongs on. This means a single row may end up moving around to different shards over time.
Compare that to sharding on user_id. Once we give a user a unique identifier, it is rarely changed. Therefore, once we determine the shard for a row, it will stay on that shard. The only time we will need to move it is in situations where we need to grow or shrink our number of shards.
Always take time to consider the volatility of a column before selecting it as a shard key.
Latency
Adding a proxy layer does come with a downside: added latency. By introducing the proxy, there is an additional network hop for requests coming in to our database. Consider a request from the app to the database and back without a proxy.
Now, compare that to the time it takes to make this same round-trip but with a proxy in the middle.
Clearly, it takes longer!
However, this problem can be minimized with proper consideration for server location. If the proxy and shards all live in the same data center, the added latency can be brought down to 1ms or less. For the vast majority of applications, adding 1ms is worth the scalability achieved with the sharded architecture. For example, Slack runs a massive sharded database cluster with Vitess, and reports an average query latency of only 2ms.
Data durability
A replica is a database server that is connected to the primary database server and replicates all data from it. Whether you have gigabytes or petabytes of data, running replicas is always a good idea.
For one, they increase the durability of your data. If the main server goes down, you still have copies of your data on the replicas. Adding replicas decreases the possibility of losing your database due to hardware failure.
Below is an example of a sharded database cluster that is using replication. Each shard has a single primary and two additional replicas for data durability.
Replicas are also useful for keeping your system highly available. If your primary server goes down and you have no replicas, your application could experience hours or even days of outage while the server is fixed or replaced and then brought back online. If you have replicas configured, traffic can be switched over to one of them, getting your app back in the action immediately.
PlanetScale runs on the open-source sharding solution Vitess. Vitess allows you to build sharded database clusters with a customizable number of replicas per shard. It also has an orchestration component, which can automatically detect server failures and quickly replace downed primaries. This keeps your data safe while maintaining high availability.
Fast Backups
For large-scale databases, backup time can easily get out of hand.
Say you have an unsharded 4 terabyte database running on a single server. Now, say our network dictates that a backup can run at 100 MB/s per server. In this case, backing up this database will take 4 TB / 100 MB/s = 40,000 seconds, or about 11 hours. That's a long time!
Hit the run backup button below to see a small-scale example of this.
Alternatively, what if this 4 TB was spread out across 4 shards storing 1 TB each? Each individual server can capture a backup simultaneously at 100 MB/s, allowing the cluster to back up the data at 400 MB/s, taking only ~2.8 hours. Hit the run backup button again.
Notice how much faster the backup completes. This is all thanks to spreading our data out across multiple shards.
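The arithmetic behind those two numbers is easy to reproduce. This sketch assumes decimal units (1 TB = 1,000,000 MB), an even split of data across shards, and that every shard backs up in parallel at the same per-server rate; the function name is made up for the example.

```python
def backup_hours(total_tb: float, shards: int, mb_per_second_per_server: float = 100) -> float:
    """Estimate wall-clock backup time when all shards back up in parallel."""
    total_mb = total_tb * 1_000_000        # decimal terabytes to megabytes
    mb_per_shard = total_mb / shards       # data is assumed to be evenly split
    seconds = mb_per_shard / mb_per_second_per_server
    return seconds / 3600


unsharded = backup_hours(4, shards=1)
sharded = backup_hours(4, shards=4)
print(f"unsharded: {unsharded:.1f} hours, 4 shards: {sharded:.1f} hours")
```

Under these assumptions the unsharded backup takes roughly 11 hours and the four-shard backup a little under 3, and the estimate keeps shrinking in proportion to the shard count.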
For more details about how sharding helps with backup performance, check out this blog post. Sharding your database comes with a number of benefits, not all of which are covered here. Check out our other blog post for more details.
Conclusion
Sharding is an excellent solution for scaling a database system. However, getting a sharded database to perform well requires careful attention to sharding strategy, shard key selection, and query optimization. The concepts covered here lay the foundation to help you build a quality sharded database using technologies like Vitess and PlanetScale.
If you're interested in getting up and running with a sharded database, you can sign up for a PlanetScale account and try it out yourself by following our Sharding quickstart. If you'd like hands-on support from our Solutions team, contact us.
Looking for sharded Postgres? We're building Neki — sharded Postgres by the team behind Vitess. Join the waitlist to get early access.]]></content>
        <summary><![CDATA[Learn about the database sharding scaling pattern in this interactive blog.]]></summary>
      </entry>
    
      <entry>
        <title>Anatomy of a Throttler, part 3</title>
        <link href="https://planetscale.com/blog/anatomy-of-a-throttler-part-3" />
        <id>https://planetscale.com/blog/anatomy-of-a-throttler-part-3</id>
        <published>2024-11-19T00:00:00.000Z</published>
        <updated>2024-11-19T00:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[This is the last installment of a three-part blog series.If you missed the previous articles, you can catch up by reading part one and part two.In this conclusion, we discuss throttler clients and their identities, cooperation, prioritization, and constraint issues.
Identifying clients
Our focus continues to be on asynchronous, batch, and massive operations, such as ETLs, data imports, and schema changes. The components that invoke these operations are the throttler's clients. These components need to break down the operation into reasonably small subtasks and periodically check the throttler for permission to proceed. This is a cooperative model where the client asks for permission, and we discuss an alternative design later on. But first, we'd like to propose that clients should identify themselves to the throttler.
If only for the high-level purpose of analyzing or investigating an incident or being able to generate metrics, you generally want to know which operations were being throttled at what time. You want to be able to tell that the daily aggregation ETL job was mostly throttled between 07:00 and 07:25. Or that around 12:00, the throttler was handling requests from multiple clients, including a data import, a schema migration over the customers table, and an hourly cleanup job.
But client identification also serves operational purposes. Is it possible to prioritize one specific job over others? Or perhaps tune one down, or put it entirely on hold for a while? How about prioritizing a category of clients? Such questions can only be answered if we can clearly distinguish between clients.
Does it even make sense to prioritize client requests? Let's begin with what appears to be an extreme scenario, one that will highlight the risks of prioritization.
Exemption and starvation
In a cooperative model, clients ask the throttler for permission. A rogue client might neglect to connect to the throttler and just go ahead and send some massive workload. Or perhaps the throttler has a mechanism with which to exempt requests from a specific client. The end result is the same: all clients play nicely by the rules, but one gets a free pass to operate without limitation.
Going back to replication lag, let's assume the client's workload is such that it exhausts resources and causes replication lag to spike to the scale of many minutes, well beyond the throttler's threshold. Nothing pushes back this client, and it continues to hammer the database for hours. During that time, requests from all other clients are continuously rejected. This is a starvation scenario.
Exemption is risky because it not only blocks operation of other players but can also degrade system performance, going against the very reason for the throttler's existence. In some sense it breaks the rules; yet, it has a place as we'll discuss later.
Prioritization
A safer way is to play within the rules. Instead of exempting, we can consider prioritization, or rather de-prioritization. This can be done using a dice roll: a client asks the throttler for permission. The throttler can choose to roll a die, and if the result is, say, 1 or 2, flat out reject the request, irrespective of the system metrics. We thus consider a ratio of the requests to be rejected.
A rejected client will back off, sleep for a while, then try again. The database is therefore less busy, at the expense of pushing back potential client work. But if we can selectively choose to have a high rejection ratio to one client, while having a low (or zero) rejection ratio to a second client, then we've effectively prioritized the second over the first: the first client will spend more time backing off, even if the database metrics are healthy. During such time, the second client will have more opportunity to do its own work.
It's important to highlight that both clients still play by the rules: none is given permission to act if the database has unhealthy metrics. It's just that one sometimes doesn't even get the chance to check those metrics.
In another model, one could configure the throttler to reject a ratio of requests for all clients, and then have a lower, or zero rejection ratio for a particular client. Thus, a safe way to prioritize one client over others is to de-prioritize all other clients.
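The dice roll can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Vitess implementation: RatioThrottler, set_ratio, and check are hypothetical names, and a single metric with a single threshold stands in for the real system metrics.

```python
import random
from typing import Dict, Optional


class RatioThrottler:
    """Rejects a configurable ratio of requests per client, then checks the metric."""

    def __init__(self, threshold: float, default_ratio: float = 0.0,
                 rng: Optional[random.Random] = None) -> None:
        self.threshold = threshold
        self.default_ratio = default_ratio
        self.client_ratios: Dict[str, float] = {}
        self.rng = rng if rng is not None else random.Random()

    def set_ratio(self, client: str, ratio: float) -> None:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0 and 1")
        self.client_ratios[client] = ratio

    def check(self, client: str, metric_value: float) -> bool:
        ratio = self.client_ratios.get(client, self.default_ratio)
        # The dice roll: reject this share of requests irrespective of metrics.
        if self.rng.random() < ratio:
            return False
        # Everyone who survives the roll still plays by the rules.
        return metric_value < self.threshold
```

Setting default_ratio to, say, 0.5 and then calling set_ratio with 0.0 for one client gives the second model described above: that client is prioritized by de-prioritizing everyone else, and it is still rejected whenever the metric is unhealthy.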
Throttling on different metrics
Does it make sense for different clients to throttle based on different metrics? For example, one client would throttle based on replication lag, and then a second client would throttle based on replication lag and also on load average.
Looking closely, this is an exemption scenario. While the second client throttles based on load average, the first client is effectively exempted from checking load average. If that first client's workload is such that it does indeed push load average beyond its threshold, then the second client becomes starved. It never gets a chance to operate.
And yet, this is nuanced. Not all jobs are created equal. Some copy data, others purge data. Some work on busy tables with high write contention, others deal with old data that is not in memory. These different jobs will have different impacts on the system. In practice, the engineer or administrator will be familiar with the type of impact of a specific job and can explicitly assign a specific metric to that job. Does that mean we necessarily need to apply the same metric to all other clients? Logically yes, but in practice, no. In our example above, the first client, exempted from checking load average, might not have a significant impact on the metric in the first place. If load average were to be high even without the first client, throttling that client may not have any impact at all, so we may as well just let it complete its job. There are practical considerations that we should examine as we operate our system.
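The exemption effect is easy to see in a small Python sketch. The function name, metric names, and threshold values below are all hypothetical, chosen only to mirror the two clients in the example.

```python
from typing import Dict, Iterable


def client_may_proceed(
    assigned_metrics: Iterable[str],
    current_values: Dict[str, float],
    thresholds: Dict[str, float],
) -> bool:
    """Allow the request only if every metric assigned to the client is healthy."""
    return all(current_values[name] < thresholds[name] for name in assigned_metrics)


# Hypothetical thresholds, and a snapshot where only load average is unhealthy.
thresholds = {"lag": 5.0, "loadavg": 4.0}
snapshot = {"lag": 1.2, "loadavg": 9.5}

first_client = client_may_proceed(["lag"], snapshot, thresholds)
second_client = client_may_proceed(["lag", "loadavg"], snapshot, thresholds)
```

With this snapshot the first client is allowed to proceed and the second is rejected. If the first client's own work is what keeps load average high, the second client stays rejected for as long as the first one runs.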
Where exemption makes sense
Nothing lasts forever. If a client is starved for 10 minutes out of a total runtime of 12 hours, this may not be a big deal.
If a task absolutely has to run at all costs (e.g., fixing an incident) and that pushes resources beyond what we want to see in normal times, so be it.
If the client is an essential part of the system itself, and goes through the throttling mechanism due to data flow design, and does not handle massive data changes, then we may and should exempt it altogether.
Categorization and breakdown
It is further beneficial if a client can identify itself on different levels. For example, in Vitess, a client may be identified as d666bbfc_169e_11ef_b0b3_0a43f95f28a3:vcopier:vreplication:online-ddl. This is a vreplication job with ID d666bbfc_169e_11ef_b0b3_0a43f95f28a3, specifically running the vcopier flow, on behalf of an online-ddl schema migration.
With this identity scheme, it is possible to categorically prioritize (or de-prioritize) all online-ddl jobs, or just this very specific job, or alternatively exempt all vcopier flows entirely.
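As a sketch of how such a multi-level identity could drive rules, the Python function below splits the identifier on colons and returns the first rule that matches any part. The function name, the rule format, and the left-to-right precedence are assumptions made for illustration, not a description of how Vitess resolves rules.

```python
from typing import Dict


def rejection_ratio_for(client_id: str, rules: Dict[str, float],
                        default: float = 0.0) -> float:
    """Return the rejection ratio of the first identity part that has a rule.

    Parts are checked left to right, so in an identifier that runs from the
    specific job ID to the broad category, the most specific rule wins.
    """
    for part in client_id.split(":"):
        if part in rules:
            return rules[part]
    return default


client = "d666bbfc_169e_11ef_b0b3_0a43f95f28a3:vcopier:vreplication:online-ddl"
# De-prioritize all online-ddl jobs, but exempt vcopier flows from the dice roll.
rules = {"online-ddl": 0.8, "vcopier": 0.0}
ratio = rejection_ratio_for(client, rules)
```

Here the vcopier rule matches before the broader online-ddl rule, so this particular client gets a rejection ratio of zero while other online-ddl clients are rejected 80% of the time.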
Observability-wise, this makes it easier to analyze throttler access patterns by categories of requests.
Nothing lasts forever
Jobs and operations eventually complete. But it's also a good idea to put a time limit on any rules you may have set. If you've exempted a category of clients, then it's best if that exemption expires at some point. It is useful to de-prioritize all jobs for a couple of hours during rush hour if some unexpected workload is received, or for the duration of an ongoing investigation.
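One way to give rules a time limit is to store an expiry alongside each rule and ignore the rule once it has passed. The Python class below is a hypothetical sketch of that idea; the class and method names are ours, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringRules:
    """Per-client rejection ratios that lapse after a time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # client name -> (rejection ratio, expiry time on the clock)
        self._rules: Dict[str, Tuple[float, float]] = {}

    def set_rule(self, client: str, ratio: float, ttl_seconds: float) -> None:
        self._rules[client] = (ratio, self._clock() + ttl_seconds)

    def ratio_for(self, client: str, default: float = 0.0) -> float:
        rule = self._rules.get(client)
        if rule is None:
            return default
        ratio, expires_at = rule
        if self._clock() >= expires_at:
            # The rule has expired: forget it and fall back to the default.
            del self._rules[client]
            return default
        return ratio
```

A call such as set_rule("online-ddl", 1.0, ttl_seconds=7200) puts all online-ddl jobs on hold for two hours, after which they resume without anyone having to remember to lift the rule.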
Cooperation vs. enforcement
We've discussed the potential for rogue (or malfunctioning) clients to skip throttler checks. This is a possible scenario in the cooperative design. An alternative throttling enforcement design puts the throttler between the client and the system. The throttler runs as a proxy, or integrates with an existing proxy, to be able to throttle client requests. Such is the Vitess transaction throttler, which can actively delay database query execution when system performance degrades. Clients cannot bypass the throttler, and may not even be aware of its existence. As such, it's more complicated to identify the clients, and the throttler must rely on domain-specific attributes made available by the client/connection/query to be able to distinguish between clients and implement any needed prioritization.
Conclusion
While we've mostly discussed throttling database systems, the principles laid out should be applicable to throttlers of all systems and services. Dynamic control of the throttler is absolutely critical, and the ability to prioritize or push back specific requests or jobs is essential in production systems.]]></content>
        <summary><![CDATA[Design considerations for implementing a database throttler]]></summary>
      </entry>
    
      <entry>
        <title>Introducing sharding on PlanetScale with workflows</title>
        <link href="https://planetscale.com/blog/introducing-workflows-on-planetscale" />
        <id>https://planetscale.com/blog/introducing-workflows-on-planetscale</id>
        <published>2024-11-07T10:00:00.000Z</published>
        <updated>2024-11-07T10:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We just released our new workflows functionality, which provides you with recipes that run a series of predefined steps to perform actions on your database. Our first workflow enables you to horizontally scale your databases by moving tables to a sharded keyspace, all from within the PlanetScale dashboard.
If you're familiar with Vitess, this first workflow is similar to MoveTables, a critical component to managing your data distribution in Vitess. We are excited to now offer this functionality directly in PlanetScale.
Workflows functionality is available on all PlanetScale plans.
Why did we build this?
At PlanetScale, our goal is to be the most reliable and scalable platform for running relational databases. One of the keys to achieving this is sharding, a proven architecture used by some of the web's largest properties to scale their databases to several million queries per second.
Up until recently, much of the functionality for creating and configuring sharded databases was only accessible via our Enterprise plan.
The recent release of the Cluster configuration functionality allowed users to create their own sharded keyspaces as well as configure custom VSchema and routing rules. With the addition of workflows, you can also easily migrate existing data to sharded keyspaces and smoothly switch production traffic between them with no downtime.
Over the past few decades, many companies have faced significant challenges scaling their databases, especially at the point where a single server cannot handle all traffic. Being able to easily and safely transition to a sharded environment shifts this phase of a company's existence from an extreme pain point to a smooth, well-tuned process. We want to empower our users with the tools they need for scaling their database to meet any demands.
Let's take a look at how this works.
How to shard your tables with PlanetScale
You can refer to the Sharding quickstart for a full end-to-end tutorial. If you prefer video walkthroughs, check out our latest video:
Before starting a workflow, you'll want to ensure you have a sharded keyspace set up in addition to your unsharded source keyspace. You can view, modify, and create keyspaces from our recently released Cluster configuration interface. Navigate to “Clusters” and create keyspaces as needed. In this example database, we already have two such keyspaces: gymtracker and gymtracker-sharded:

You can use a workflow to move one or more tables from the unsharded keyspace into a sharded one. To start a workflow, navigate to the “Workflow” UI:

Click “New workflow” to start a new workflow. In this menu, you'll be asked to give the workflow a name, select the source and target keyspaces, and select the table(s) you want to move.

After setting everything up, click “Validate”. PlanetScale will not allow you to start the workflow until all validation checks pass.

After clicking “Create workflow”, you enter the "copying phase" and can monitor the progress of the workflow. Initially, PlanetScale will migrate your data from the source keyspace to the target. After the initial bulk migration completes, it will continue to replicate any new rows that come in to the target. When you're ready to proceed, we recommend you first Verify data (to ensure everything migrated correctly), and then you can Switch traffic.

After switching traffic, the new sharded table is configured to handle all traffic for the migrated tables! We also give you an option to switch traffic back to the unsharded database, providing an escape hatch in case any unexpected problems arise from the sharded configuration.
With just a few clicks, you can create a sharded keyspace with however many shards you'd like, move existing tables to that keyspace, and switch production traffic to be served from the sharded keyspace.
Vitess workflows
Every PlanetScale database is powered by Vitess, which supports a number of workflows to facilitate managing your database cluster. These include:
MoveTables: Allows you to move tables between keyspaces (between logical databases within your cluster).
Reshard: Facilitates modifying the way that your data is sharded. Allows you to spread your data across more or fewer shards, depending on demand.
Materialize: Allows you to create copies, aggregations, or views of the tables in your Vitess cluster.
LookupVindex: Helps with the creation and population of Lookup Vindexes (lookup index tables to help queries execute faster).
Migrate: Allows you to move tables between distinct Vitess clusters.
For this first release, we focused on supporting MoveTables, specifically when migrating a table from an unsharded keyspace to a sharded one. We believe that this is one of the most important workflows for our users, as it unlocks the ability to horizontally scale existing unsharded databases with minimal friction.
We intend to support more types of workflows in the future. For example, the ability to reshard is also important to allow users to self-manage their growing database systems on PlanetScale. We also plan to integrate the Migrate workflow into PlanetScale to help with migrations from outside sources.
Workflows resources
We have a number of resources to help you get up and running with PlanetScale workflows.
We also have detailed documentation that walks you through important concepts and instructions for running your own unsharded to sharded workflows.
Sharding quickstart
Workflows
Avoiding cross-shard queries
Vindexes
Sharding workflow state reference
Pre-sharding checklist
Targeting the correct keyspace
What is a keyspace?
If you have questions or feedback about workflows, contact us. We'd love to chat.]]></content>
        <summary><![CDATA[Run Vitess workflows right from within PlanetScale. Migrate data from unsharded to sharded keyspaces, manage traffic cutover, and easily revert when problems arise.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 21</title>
        <link href="https://planetscale.com/blog/announcing-vitess-21" />
        <id>https://planetscale.com/blog/announcing-vitess-21</id>
        <published>2024-10-29T09:01:00.000Z</published>
        <updated>2024-10-29T09:01:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[We're delighted to announce the release of Vitess 21 along with version 2.14.0 of the Vitess Kubernetes Operator.
Version 21 focuses on enhancing query compatibility, improving cluster management, and expanding VReplication capabilities, with experimental support for atomic distributed transactions and recursive CTEs. Key features include reference table materialization, multi-metric throttler support, and enhanced online DDL functionality. Backup and restore processes benefit from a new mysqlshell engine, while vexplain now offers detailed execution traces and schema analysis. The Vitess Kubernetes Operator introduces horizontal autoscaling for VTGate pods and Kubernetes 1.31 support, improving overall scalability and deployment flexibility.
What's New in Vitess 21
Query Compatibility: Experimental support for atomic distributed transactions and recursive CTEs
VReplication: Reference table materialization, dynamic workflow configuration
Cluster Management and VTOrc: More metrics in VTOrc to track errant GTIDs
Throttler: Multi-metric support
Online DDL: Various improvements
Backup & Restore: Experimental mysqlshell engine
Vitess Operator: VTGate scaling, image customization, Kubernetes 1.31 support
VTAdmin: VReplication workflow creation and management, distributed transaction management
VExplain: vexplain trace for detailed query execution insights, vexplain keys for analyzing sharding key usage and optimizing query performance
Query Compatibility
Atomic Distributed Transactions
We’re reintroducing atomic distributed transactions with a revamped, more resilient design. This feature now offers deeper integration with core Vitess components and workflows, such as Online DDL and VReplication (including operations like MoveTables and Reshard). We have also greatly simplified the configuration required to use atomic distributed transactions. This feature is currently in an experimental state, and we encourage you to explore it and share your feedback to help us improve it further.
Recursive Common Table Expressions (CTEs)
Vitess 21 introduces experimental support for recursive CTEs, allowing more complex hierarchical queries and graph traversals. This feature enhances query flexibility, particularly for managing parent-child relationships like organizational structures or tree-like data. As this functionality is still experimental, we encourage you to explore it and provide feedback to help us improve it further.
Cluster Management and VTOrc
We have added a new metric in VTOrc that shows the count of errant GTIDs in all the tablets for better visibility and alerting. This will help operators track and manage errant GTIDs across the cluster.
VReplication
Reference Table Materialization
Vitess provides Reference Tables as a mechanism to replicate commonly used lookup tables from an unsharded keyspace into all shards in a sharded keyspace. Such tables might be used to hold lists of countries, states, zip codes, etc., which are commonly used in joins with other tables in the sharded keyspace. Using reference tables allows Vitess to execute joins in parallel on each shard, thus avoiding cross-shard joins. Previously, we recommended creating Materialize workflows for reference tables but did not provide an easy way to do so. In v21, we have added explicit support to the Materialize command to replicate a set of reference tables into a sharded keyspace.
Dynamic Workflow Configuration
Previously, many configuration options for VReplication workflows were controlled by VTTablet flags. This meant that any change required restarting all VTTablets. We now allow these to be overridden while creating a workflow or updated dynamically once the workflow is in progress.
Throttler: Multi-metric Support
The tablet throttler has been redesigned with new multi-metric support. The throttler is no longer limited to replication lag or custom queries: it can work with multiple metrics at the same time, and check different metrics for different clients or for different workflows. This gives users better control over the throttler, allowing them to fine-tune its behavior based on their specific production requirements.
Several new metrics have been introduced in v21, with plans to expand the list of available metrics in later versions.
The multi-metric throttler in v21 is backward compatible with the v20 throttler. It is possible to have a v20 primary tablet collecting throttler data from a v21 replica tablet, and vice versa. This backward compatibility will be removed in v22, where all tablet throttlers will be expected to communicate multi-metric data.
Other key throttler changes:
With the above, the sub-flags --check-as-check-self and --check-as-check-shard to the UpdateThrottlerConfig command are deprecated and slated to be removed in a future version. Similarly, SHOW VITESS_THROTTLER STATUS and SHOW VITESS_THROTTLED_APPS queries, and all /throttler/ API access points (with the exception of /throttler/check) are deprecated and slated to be removed in v22.
When enabled, the throttler ensures it leases heartbeat updates, even if heartbeat configuration is otherwise unset. In other words, the throttler overrides the configuration when it requires heartbeat information.
Throttler check response now includes a human-readable summary detailing exactly why a request was rejected (if rejected).
Online DDL
Several bug fixes and improvements, including:
Added support for the ALTER VITESS_MIGRATION CLEANUP ALL command.
More INSTANT DDL scenario analysis, going further beyond the documented limitations.
In schema changes where columns change charsets, Online DDL now converts the text programmatically rather than using a CONVERT(... USING utf8mb4) clause, thereby improving performance when such columns are part of the Primary Key or the iteration key.
Internally, more of the schema and diff analysis is now delegated to schemadiff library, which means more programmatic power and better testability.
Fixes for self-referencing foreign key tables (only relevant when using the PlanetScale MySQL build).
Backup & Restore
Introducing an experimental mysqlshell engine. With this engine, it is possible to run logical backups and restores. The mysqlshell engine can be used to create full backups, incremental backups, and point-in-time recoveries. It is also available to use with the Vitess Kubernetes Operator.
The mysqlshell engine work was contributed by the Slack engineering team.
VExplain Enhancements
VExplain Trace
The new vexplain trace command provides deeper insights into query execution paths by capturing detailed execution traces. This helps developers and DBAs analyze performance bottlenecks, review query plans, and gain visibility into how Vitess processes queries across distributed nodes. The trace output is delivered as a JSON object, making it easy to integrate with external analysis tools.
VExplain Keys
The new vexplain keys feature helps you analyze how your queries interact with your schema, showing which columns are used in filters, groupings, and joins across tables. This tool is especially useful for identifying candidate columns for indexing, sharding, or optimization, whether you’re using Vitess or a standalone MySQL setup. By providing a clear view of column usage, vexplain keys makes it easier to fine-tune your database for better performance, regardless of your backend infrastructure.
Vitess Kubernetes Operator
Vitess v21.0.0 comes with a companion release of the vitess-operator v2.14.0. In v2.14, we have added the ability to horizontally scale the VTGate deployment using an HPA. We have upgraded the supported version of Kubernetes to the latest version (v1.31). We have added a feature that allows users to select Docker images on a per-keyspace basis instead of a single setting for the entire cluster.
VTAdmin
New VTAdmin pages have been added for creating, monitoring, and managing VReplication Workflows. We have also added a dashboard to view and conclude distributed transactions.
Vitess and the Community
As an open-source project, Vitess thrives on the contributions, insights, and feedback from the community. Your experiences and input are invaluable in shaping the future of Vitess. We encourage you to share your stories and ask questions on GitHub or in our Slack community.
Getting Started
For a seamless transition to Vitess 21, we highly recommend reviewing the detailed release notes. Additionally, you can explore our documentation for guides, best practices, and tips to make the most of Vitess 21. Whether you're upgrading from a previous version or running Vitess for the first time, our resources are designed to support you every step of the way.
Thank you for your support and contributions to the Vitess project!]]></content>
        <summary><![CDATA[Vitess 21 is now generally available.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing the PlanetScale vectors public beta</title>
        <link href="https://planetscale.com/blog/announcing-planetscale-vectors-public-beta" />
        <id>https://planetscale.com/blog/announcing-planetscale-vectors-public-beta</id>
        <published>2024-10-21T10:00:00.000Z</published>
        <updated>2024-10-21T10:00:00.000Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We're excited to announce that PlanetScale vector search and storage is now available in open beta! With PlanetScale vector support, you can store your vector data alongside your application's relational MySQL data — eliminating the need for a separate specialized vector database.
What sets PlanetScale vector search and storage apart?
When we decided to add vector support to PlanetScale's MySQL fork, we knew it would be a long journey to ensure the solution met our high standards for performance and scalability.
A crucial piece of this was architecting a search algorithm that allows for both fast performance and large scale. PlanetScale's vector search is based on two innovative research papers from Microsoft Research: SPANN (Space-Partitioned Approximate Nearest Neighbors) and SPFresh. We did additional work to fully integrate our solution with InnoDB and Vitess, which allows us to support transactional operations, ensure data consistency, and efficiently manage vector indexes at terabyte-scale. This makes our vector search solution ideal for large-scale databases.
On top of that, we wanted to make sure our implementation supports the following:
Pre-filtering and post-filtering
Full SQL syntax — including JOIN, WHERE, and subqueries
ACID compliance
Our base implementation checks all of these boxes, and we will continue to improve performance leading up to GA.
Choosing a vector search algorithm
There are a few different algorithms commonly used to implement vector search, with Hierarchical Navigable Small World (HNSW) and DiskANN being two of the most popular. These algorithms, however, make technical trade-offs that we deemed inadequate for our implementation.
HNSW has very good query performance, but struggles to scale because it needs to fit its whole dataset in RAM. Most importantly, HNSW indexes cannot be updated incrementally, so they require periodically re-building the index with the underlying vector data. This is just not a good fit for a relational database. DiskANN scales well, but suffers from worse query performance, and while it can be modified to allow incremental updates, these are not particularly efficient and are hard to map to transactional SQL semantics.
Because PlanetScale is designed to support incredible performance for databases at massive scale, we knew these implementations wouldn't work for us. So we set out to find a better solution.
PlanetScale vector search is based on a novel implementation of two state-of-the-art papers from Microsoft Research: SPANN (Space-Partitioned Approximate Nearest Neighbors) and SPFresh. SPANN is a hybrid vector indexing and search algorithm that uses both graph and tree structures, and was specifically designed to work well for larger-than-RAM indexes that require SSD usage. SPFresh extends the design of SPANN with a set of concurrent background maintenance operations that allow the index to be continuously updated without losing recall or query performance.
For our implementation, we have extended SPFresh by adding transactional support to all its operations and fully integrating it inside InnoDB, MySQL's default storage engine. This means that inserts, updates, and deletes of vector data are immediately reflected in the vector index as part of committing your SQL transaction, and follow the same transactional semantics, including support for batch commits and rollbacks.
Since the indexes are fully managed and stored on-disk by InnoDB, they are always in-sync with the vector data in your tables, they survive process crashes with strong consistency guarantees, they do not need to be periodically rebuilt, and they scale all the way into terabytes, just like any other MySQL table. Together with Vitess, PlanetScale's sharding layer, this allows the construction and efficient querying of huge vector indexes that are fully integrated with all the relational data in your database and can be used with JOINs and WHERE clauses while the underlying vector data is continuously updated.
For a comparison of some of the common vector algorithms and indexes, see our Vector database terminology and indexes documentation.
How to enable vector support
To get started, go to your database settings page, click "Beta features", find Vectors and click "Enroll". Vector support is enabled at the branch level, so choose the branch you wish to enroll into the vectors beta. Click the gear icon on the branches page, and click the toggle next to "Enable vectors".

More resources
To learn more about vector embeddings, check out our YouTube video:
You can check out the following documentation:
PlanetScale vectors overview
Vector database terminology and concepts
Common use cases for vector search
PlanetScale vector usage with ORMs
Vector type and index reference
Your feedback is extremely valuable during this beta period, so don’t hesitate to reach out. You can submit a support ticket to relay any feedback or issues. We also have a vectors channel in our Discord where you can ask questions, share feedback, or chat about use cases.]]></content>
        <summary><![CDATA[You can now use the vector data type for vector search and storage in your PlanetScale MySQL database.]]></summary>
      </entry>
    
      <entry>
        <title>Anatomy of a Throttler, part 2</title>
        <link href="https://planetscale.com/blog/anatomy-of-a-throttler-part-2" />
        <id>https://planetscale.com/blog/anatomy-of-a-throttler-part-2</id>
        <published>2024-10-10T00:00:00.000Z</published>
        <updated>2024-10-10T00:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[This is part 2 of a 3-part series. You can catch up with part 1 here.
Up until now, we've only discussed the existence of the throttler and did not make assumptions about its deployment or distribution. We have referenced it in singular form, and indeed it is possible to run a throttler as a singular service. Let's consider the arguments for singular vs. distributed throttler deployments.
Singular throttler design
A singular throttler is an all-knowing service that serves all requests. It should be able to probe and collect all metrics, which means it may need direct access to all database servers and to OS metrics (such as load average, which we mentioned in part 1). Such access could be acceptable in some environments. The throttler will likely hold on to a large group of persistent connections, each of which the throttler will use to read a subset of metrics. It's a simple, monolithic, synchronous approach.
This approach immediately calls into question the issue of high availability. What do you do if the throttler host goes down? We can first assume it is possible to spin up a new throttler service. It may take a few moments until it's up and running, and until it has established all connections and has begun to collect data. In the interim, what do clients do? This is where a design decision must be made: do clients fail open, or do they fail closed? If no throttler is available, should clients consider themselves rejected, or should they proceed with full power? A common approach is to hold off up to some timeout and, from there on, proceed unthrottled, taking into consideration the possibility that the throttler may not be up for a while.
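The "hold off, then proceed unthrottled" approach can be sketched as a small client-side helper. This is a hypothetical Python illustration: wait_for_permission is our own name, check stands in for the real throttler call and is assumed to raise ConnectionError when no throttler is reachable, and resetting the deadline whenever the throttler answers is one possible design choice among several.

```python
import time
from typing import Callable


def wait_for_permission(
    check: Callable[[], bool],
    timeout_seconds: float,
    retry_seconds: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block until the throttler grants permission or has been down too long.

    Returns "granted" when a reachable throttler allows the request, and
    "fail-open" when the throttler stayed unreachable for timeout_seconds.
    """
    deadline = clock() + timeout_seconds
    while True:
        try:
            if check():
                return "granted"
            # The throttler is alive and said no: keep respecting it, and
            # restart the outage timer since we just heard from it.
            deadline = clock() + timeout_seconds
        except ConnectionError:
            if clock() >= deadline:
                return "fail-open"
        sleep(retry_seconds)
```

A fail-closed client would be the same loop without the deadline: it keeps waiting for as long as the throttler is unreachable.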
Another option is to run multiple, independent instances of the throttler. They could run in different availability zones, all oblivious to each other, and so we still consider them to be singular. They will all collect the same metrics from the same servers, albeit at slightly different timings. This means two independent throttlers will overall exhibit the same behavior allowing/refusing requests, but it is possible that at any specific point in time, two throttlers will disagree with each other.
More combinations exist, such as having active-passive throttlers, with traffic always directed to the active one. The passive throttlers will be readily available should the active one step down. They will possibly be collecting metrics while passive.
It is clear that the singular approach is susceptible to a scaling problem. There are only so many connections it can maintain while running high-frequency probing.
Metrics access/API
Another issue concerns environments where the throttler cannot be allowed direct access to database and OS metrics due to security restrictions. A common solution is to set up an HTTP server, some API access point, which the throttler can check to get the host metrics. A daemon or an agent running on the host is responsible for collecting the metrics locally to the host. The throttler's work is now made simpler, as it may need to only hit a single access point to collect all metrics for a given host. However, this design introduces many complexities.
First and foremost is the addition of a new component that needs to run reliably and continuously on the host. Metric collection is no longer in the hands of the throttler, and we must trust that component to grab those metrics. We then introduce an API or otherwise some handshake between the throttler and the metric collection component, which must be treated with care when it comes to upgrades and backwards compatibility.
Last but not least, we lose synchronicity and introduce multiple layers of collection polling intervals: the agent collects metrics at its own interval. For example, the agent might collect data once per second, while the throttler polls the API at its own interval, which could also be once per second. The throttler may now collect data that is up to 2 seconds stale, as opposed to up to 1 second stale in the monolithic, synchronous approach. Note that there is also a latency effect to the staleness of metrics, but we will leave this discussion to a later stage.
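The arithmetic behind the "up to 2 seconds" figure can be checked with a short Python sketch. It assumes an idealized setup with no network or query latency, the throttler polling at exact multiples of its interval, and the agent sampling on its own schedule shifted by a fixed offset; the function name and the millisecond units are ours.

```python
def metric_age_ms(now_ms: int, agent_interval_ms: int, poll_interval_ms: int,
                  agent_offset_ms: int = 0) -> int:
    """Age of the value the throttler holds at now_ms, in milliseconds."""
    # The throttler last polled the agent's API at this time...
    last_poll = (now_ms // poll_interval_ms) * poll_interval_ms
    # ...and the value it received was sampled by the agent at this time.
    last_sample = (
        (last_poll - agent_offset_ms) // agent_interval_ms
    ) * agent_interval_ms + agent_offset_ms
    return now_ms - last_sample


# Agent and throttler both run once per second, 1 ms out of phase.
worst_two_layers = max(metric_age_ms(t, 1000, 1000, agent_offset_ms=1)
                       for t in range(2000, 10000))
# Perfectly aligned schedules behave like the single-layer, synchronous case.
worst_aligned = max(metric_age_ms(t, 1000, 1000) for t in range(2000, 10000))
```

Under these assumptions worst_two_layers comes out at 1998 ms and worst_aligned at 999 ms: each polling layer can add up to one full interval of staleness, and the intervals add up.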
In adding this agent component, even if it's the simplest script to scrape and publish metrics, we've effectively turned our singular throttler into a distributed multi-component system.
Distributed throttler design
We now consider the case for a distributed throttler design, and again find that there are multiple approaches to distribution.
In the context of your environment, does it make sense for a throttler to probe hosts/services outside its availability zone? It is possible to run a throttler per AZ that only collects metrics within that AZ. Each throttler considers itself to be singular and is oblivious to other throttlers, but clients need to know which throttler to address.
A variation of the above could introduce some collaboration between the throttlers: a us-east-1 throttler can advertise its own collected metrics to a us-west-1 throttler. This lets the us-west-1 throttler grab all us-east-1 metrics in one go, and without having to have direct access to the underlying hosts or services. This is a glorified and scaled implementation of the agent architecture above, where all components are throttlers.
Or, we can do functional partitioning on our architectural elements. Some services are disjoint, and we can run different throttlers for different elements, grouped by functionality association. In this design throttlers are again independent of each other.
Going even more granular, there may be a throttler associated with any host, or with any probed service.
A granular throttler design case study
The Vitess tablet throttler combines multiple design approaches to achieve different throttling scopes. The throttler runs on each and every vttablet, mapping one throttler for each MySQL database server. Each such throttler first and foremost collects metrics from its own vttablet host and from its associated MySQL server. Then, the throttlers (or vttablet servers) of any shard, or a replication topology (primary and replicas) collaborate to represent the "shard" throttler. The throttler service on the primary server takes the responsibility for collecting the metrics from all of the shard's throttlers and aggregating them as the "shard" metrics.
Thus, a massive write to the primary is normally throttled by replication lag, a metric collected from all serving replicas. Clients consult with the primary throttler, which accepts or rejects their requests based on the highest lag among all replicas. In contrast, some operations only require a massive read, which can take place on a specific replica. These reads can pollute the replica's page cache and overload its disk I/O, causing it to lag. This will, however, have no effect on the replication performance of other replicas. It is therefore enough for the client to check the throttler on that specific replica, ignoring any metrics from other servers. This introduces the concept of a metric's scope, which can be an entire shard or a specific host in this scenario.
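To make the idea of scope concrete, here is a minimal sketch of a check that evaluates the same metric at either shard or host scope. The function, the scope names, and the host names are hypothetical illustrations, not the Vitess API.

```python
from typing import Dict


def check(lag_by_host: Dict[str, float], threshold_sec: float,
          scope: str, host: str = "") -> bool:
    """Return True when the request may proceed.

    scope="shard": use the highest lag among all replicas (massive writes).
    scope="self":  use only the lag of the given host (massive read on one replica).
    """
    if scope == "shard":
        lag = max(lag_by_host.values())
    elif scope == "self":
        lag = lag_by_host[host]
    else:
        raise ValueError(f"unknown scope: {scope}")
    return lag < threshold_sec


lags = {"replica-1": 0.4, "replica-2": 7.2}
print(check(lags, 5.0, scope="shard"))                   # False
print(check(lags, 5.0, scope="self", host="replica-1"))  # True
```

The write is rejected because one replica lags beyond the threshold, while a heavy read on the healthy replica is still allowed.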
Different shards represent distinct architectural elements, and there is no cross-shard throttler communication. This limits the hosts/services monitored by any single throttler to a sustainable amount.
It's important to remember that any cross-throttler communication introduces the layered collection poll interval and reduction of granularity, as discussed above.
Reducing the throttler's impact
Can the throttler itself generate load on the system? Let's avoid the infinite recursion trap of throttle-the-throttler and consider the usage pattern.
The throttler can introduce load on the system by over-probing for metric data, as well as by over-communicating between throttler nodes. Clients can introduce load by overwhelming the throttler with requests.
To begin with the low-hanging fruit, busy loops should be avoided at all times. A rejected client should sleep for a pre-determined amount of time before rechecking the throttler. Conversely, depending on the metric, a client might get a free pass for a period of time after a successful check. Consider the case for replication lag: if a client checks for lag and the lag is 0.5sec and the threshold is 5sec, then the next 4.5sec (up to metric collection granularity) are guaranteed to be successful and can be skipped.
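The free pass follows from the fact that lag cannot grow faster than one second per second of wall-clock time. A small sketch of that calculation, with a hypothetical function name and the collection interval treated as a safety margin:

```python
def free_pass_seconds(lag_sec: float, threshold_sec: float,
                      collection_interval_sec: float = 0.0) -> float:
    """How long a client may skip throttler checks after a successful one.

    Lag can rise by at most one second per elapsed second, so the gap
    between the observed lag and the threshold is safe. The collection
    interval is subtracted because the observed value may already be
    that old.
    """
    return max(0.0, threshold_sec - lag_sec - collection_interval_sec)


print(free_pass_seconds(0.5, 5.0))       # 4.5
print(free_pass_seconds(0.5, 5.0, 1.0))  # 3.5
print(free_pass_seconds(6.0, 5.0))       # 0.0
```

This shortcut only holds for metrics with a bounded rate of change, such as replication lag. A metric that can spike instantly offers no such guarantee.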
The metrics should be collected at an appropriate granularity for the situation. We mentioned the use case for massive background jobs. A job can take a few hours to run, but there may yet be a few hours break between two jobs. During that break, there isn't strictly a need for the throttler to collect data at high rate, and likewise, no strict need for throttlers to communicate with each other at high rate. The throttler can choose to slow down based on lack of requests. It could either stop collecting metrics altogether and go into hibernation, or it might just slow down its normal pace. It would take a client checking the throttler to re-ignite the high-frequency collection of metrics.
This does come at a cost because the very first check, and likely also the next few checks, will run on stale data and potentially reject requests that would otherwise be accepted. However, we defer to the expected client retry mechanism which will induce another check, at such time that the throttler is again fully engaged and has up-to-date metrics.
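One possible shape for this behavior is a pacer that picks the collection interval based on how recently a client checked in. The class name and default values below are assumptions made for illustration only.

```python
from typing import Optional


class CollectionPacer:
    """Chooses a metric collection interval based on recent client activity."""

    def __init__(self, active_interval: float = 1.0,
                 dormant_interval: float = 30.0,
                 idle_after: float = 60.0) -> None:
        self.active_interval = active_interval
        self.dormant_interval = dormant_interval
        self.idle_after = idle_after
        self.last_check: Optional[float] = None

    def on_client_check(self, now: float) -> None:
        # Any client check re-ignites high-frequency collection.
        self.last_check = now

    def interval(self, now: float) -> float:
        # Slow down when no client has checked for a while.
        if self.last_check is None or now - self.last_check > self.idle_after:
            return self.dormant_interval
        return self.active_interval


pacer = CollectionPacer()
print(pacer.interval(now=0.0))    # 30.0
pacer.on_client_check(now=100.0)
print(pacer.interval(now=101.0))  # 1.0
print(pacer.interval(now=200.0))  # 30.0
```

Timestamps are passed in explicitly so the behavior is easy to follow. A real collector would read a monotonic clock.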
Another caveat is that with a distributed throttler design, throttlers which depend on each other should be able to inform each other upon being checked. All throttlers who communicate with each other should re-ignite upon the first request to any of them.
Metric hibernation case study
Metric collection is not the only process that can hibernate. Metric generation itself can also hibernate. This may sound ludicrous at first: aren't metrics just there? Let us discuss replication heartbeats.
In Part 1, we've elaborated on the role of replication lag as the single most used throttling metric. While there are alternative techniques, it has been established in the MySQL world that the most reliable way to evaluate replication lag is by injecting timestamps on a dedicated table on the Primary server, then reading the replicated value on a replica, comparing it with the system time on said replica. This technique works well when the replicas are working well, when they are lagging, when replication is stopped, and even when replication gets broken or misconfigured.
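The computation on the replica side is just a subtraction. A sketch, with a hypothetical function name and under the assumption that the primary and replica clocks are reasonably synchronized:

```python
def replication_lag_sec(heartbeat_ts: float, replica_now: float) -> float:
    """Lag is the replica's clock minus the newest replicated heartbeat timestamp."""
    return max(0.0, replica_now - heartbeat_ts)


# Healthy replica: the heartbeat written at t=100.0 is read at t=102.5.
print(replication_lag_sec(100.0, 102.5))  # 2.5
# Replication stopped: the same heartbeat is still the newest one a minute later.
print(replication_lag_sec(100.0, 160.0))  # 60.0
```

The second call shows why the technique keeps working when replication is stopped or broken: the replicated timestamp stops advancing, so the computed lag keeps growing.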
There must be some process to routinely inject heartbeats on the Primary database server (pt-heartbeat is a popular tool to do so). You'd need to ensure heartbeats are only ever written on the Primary, and handle failovers and promotion situations.
The injection interval dictates the lag metric granularity, and you need to balance the desire to have the most granular replication lag metric with the overhead of generating those writes.
But then there is another impact to heartbeat generation using this technique: the heartbeat events are persisted in the binary logs, which are then re-written on the replicas. For some users, the introduction of heartbeats causes a significant increase in binlog generation. With more binlog events having to be persisted, more binary log files are generated per given period of time. These consume more disk space. It is not uncommon to see MySQL deployments where the total size of binary logs is larger than the actual data set. Furthermore, you may wish to retain the binary logs for an extended period of time, for recovery/auditing purposes, and you may wish to back them up. You'll need larger disks, more backup storage space, and this translates to expenses.
It thus also makes sense to avoid generating those heartbeats, or generate them much less frequently, when not absolutely needed. When no massive background job takes place, we can enjoy little to no overhead, and when there is a massive operation, the heartbeat events are but a small overhead compared with the amounts of data written to the binary logs.
Like throttler hibernation, these heartbeats must be re-ignited at the right time. Like throttler hibernation, the first few checks will read outdated heartbeats, and are most likely to be rejected. It may take a few seconds to get to a fully active operation, when the throttler has re-engaged, heartbeats re-generated, and replication is caught up with at least the very first re-generated heartbeats, and the clients must be prepared for some retries.
To be continued
In the next and final part of this series we will discuss clients, prioritization, starvation scenarios, and more.]]></content>
        <summary><![CDATA[Design considerations for implementing a database throttler with a comparison of singular vs distributed throttler deployments.]]></summary>
      </entry>
    
      <entry>
        <title>B-trees and database indexes</title>
        <link href="https://planetscale.com/blog/btrees-and-database-indexes" />
        <id>https://planetscale.com/blog/btrees-and-database-indexes</id>
        <published>2024-09-09T00:00:00.000Z</published>
        <updated>2024-09-09T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[What is a B-tree?
The B-tree plays a foundational role in many pieces of software, especially database management systems (DBMS). MySQL, Postgres, MongoDB, Dynamo, and many others rely on B-trees to perform efficient data lookups via indexes. By the time you finish this article, you'll have learned how B-trees and B+trees work, why databases use them for indexes, and why using a UUID as your primary key might be a bad idea. You'll also have the opportunity to play with interactive animations of the data structures we discuss. Get ready to click buttons.
Computer science has a plethora of data structures to choose from for storing, searching, and managing data on a computer. The B-tree is one such structure, and is commonly used in database applications. B-trees store pairs of data known as keys and values in what computer programmers call a tree-like structure. For those not acquainted with how computer scientists use the term "tree", it actually looks more like a root system.
Below you'll find the first interactive component of this blog. This allows you to visualize the structure of a B-tree and see what happens as you add key/value pairs and change the number of key/value pairs per node. Give it a try by clicking the Add or Add random button a few times and try to get an intuitive sense of how it works before we move on to the details.
If the animations above are too fast or slow, you can adjust the animation speed of everything that happens with the B-trees in this article. Adjust below:
Every B-tree is made up of nodes (the rectangles) and child pointers (the lines connecting the nodes). We call the top-most node the root node, the nodes on the bottom level leaf nodes, and everything else internal nodes. The formal definition of a B-tree can vary depending on who you ask, but the following is a pretty typical definition.
A B-tree of order K is a tree structure with the following properties:
Each node in the tree stores N key/value pairs, where N is greater than 1 and less than or equal to K.
Each internal node has at least N/2 key/value pairs (an internal node is one that is not a leaf or the root).
Each node has N+1 children.
The root node has at least one value and two children, unless it is the sole node.
All leaves are on the same level.
The other key characteristic of a B-tree is ordering. Within each node the elements are kept in order. Any child to the left of a key must only contain other keys that are less than it. Children to the right must have keys that are greater than it.
This enforced ordering means you can search for a key very efficiently. Starting at the root node, do the following:
Check if the node contains the key you are looking for.
If not, find the location in the node where your key would get inserted into, if you were adding it.
Follow the child pointer at this spot down to the next level, and repeat the process.
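The three steps above can be sketched in a few lines. This is a minimal in-memory illustration; the Node class and the sample tree are made up for the example and are not how any particular database lays out its nodes.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    keys: List[int]                       # kept in sorted order
    values: List[Any]                     # values[i] belongs to keys[i]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def search(node: Optional[Node], key: int) -> Any:
    while node is not None:
        # Step 2: find where the key is, or would be inserted, in this node.
        i = bisect_left(node.keys, key)
        # Step 1: is the key in this node?
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]
        # Step 3: follow the child pointer at that position, if there is one.
        node = node.children[i] if node.children else None
    return None


root = Node(keys=[10, 20], values=["j", "t"], children=[
    Node([3, 7], ["c", "g"]),
    Node([12, 15], ["l", "o"]),
    Node([25, 30], ["y", "z"]),
])
print(search(root, 15))  # o
print(search(root, 20))  # t
print(search(root, 8))   # None
```

Each iteration of the loop visits exactly one node, so the number of iterations is bounded by the depth of the tree.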
When searching in this way, you only need to visit one node at each level of the tree to search for one key. Therefore, the fewer levels it has (or the shallower it is), the faster searching can be performed. Try searching for some keys in the tree below:
B-trees are uniquely suited to work well when you have a very large quantity of data that also needs to be persisted to long-term storage (disk). This is because each node uses a fixed number of bytes. The number of bytes can be tailored to play nicely with disk blocks.
Reading and writing data on hard disk drives (HDDs) and solid-state drives (SSDs) is done in units called blocks. These are typically byte sequences of length 4096, 8192, or 16384 (4k, 8k, 16k). A single disk will have a capacity of many millions or billions of blocks. RAM on the other hand is typically addressable on a per-byte level.
This is why B-trees work so well when we need to organize and store persistent data on disk. Each node of a B-tree can be sized to match the size of a disk block (or a multiple of this size).
The number of values each node of the tree can store is based on the number of bytes each node is allocated and the number of bytes consumed by each key/value pair. In the example above, you saw some pretty small nodes, each storing 3 integer values and 4 pointers. If our disk block and B-tree node are both 16k, and our keys, values, and child pointers are all 8 bytes, this means we could store 682 key/values with 683 child pointers per node. A three level tree could store over 300 million key/value pairs (682 × 682 × 682 = 317,214,568).
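The arithmetic can be reproduced directly. This sketch assumes 8-byte keys, values, and child pointers in a 16k node, and ignores any per-node header overhead a real storage engine would have. The function name is hypothetical.

```python
def pairs_per_node(node_bytes: int, key_bytes: int,
                   value_bytes: int, pointer_bytes: int) -> int:
    """Key/value pairs that fit in a node holding N pairs and N+1 child pointers."""
    # Reserve room for the one extra child pointer, then divide what is left
    # by the size of one (key, value, pointer) group.
    return (node_bytes - pointer_bytes) // (key_bytes + value_bytes + pointer_bytes)


n = pairs_per_node(16 * 1024, 8, 8, 8)
print(n)       # 682
print(n + 1)   # 683 child pointers
print(n ** 3)  # 317214568, the three-level estimate used in the text
```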
The B+Tree
B-trees are great, but many database indexes use a "fancier" variant called the B+tree. It's similar to a B-tree, with the following changes to the rules:
Key/value pairs are stored only at the leaf nodes.
Non-leaf nodes store only keys and the associated child pointers.
There are two additional rules that are specific to how B+trees are implemented in MySQL indexes:
Non-leaf nodes store N child pointers instead of N+1.
All nodes also contain "next" and "previous" pointers, allowing each level of the tree to also act as a doubly-linked list.
Here's another visualization showing how the B+tree works with these modified characteristics. This time you can individually adjust the number of keys in inner nodes and in the leaf nodes, in addition to adding key/value pairs.
Why are B+trees better for databases? There are two primary reasons.
Since inner nodes do not have to store values, we can fit more keys per inner node! This can help keep the tree shallower.
All of the values can be stored at the same level, and traversed in-order via the bottom-level linked list.
Go ahead and give searching on a B+tree a try as well:
B+trees in MySQL
MySQL, arguably the world's most popular database management system, supports multiple storage engines. The most commonly used engine is InnoDB which relies heavily on B+trees. In fact, it relies so heavily on them that it actually stores all table data in a B+tree, with the table's primary key used as the tree key.
Whenever you create a new InnoDB table you are required to specify a primary key. Database administrators and software engineers often use a simple auto-incrementing integer for this value. Behind the scenes, MySQL + InnoDB creates a B+tree for each new table created. The keys for this tree are whatever the primary key was set to. The values are the remaining column values for each row, and are stored only in the leaf nodes.

The size of each node in these B+trees is set to 16k by default. Whenever MySQL needs to access a piece of data (keys, values, whatever), it loads the entire associated page (B+tree node) from disk, even if that page contains other keys or values it does not need.
The number of rows stored in each node depends on how "wide" the table is. In a "narrow" table (a table with few columns), each leaf could store hundreds of rows. In a "wide" table (a table with many columns), each leaf may only store a single-digit number of rows. InnoDB also supports rows being larger than a disk block, but we won't dig into that in this post.
Use the visualization below to see how the number of keys in each inner node and in each leaf node affects the depth of the tree. The deeper the tree, the slower it is to look up elements. Thus, we want shallow trees for our databases!
It's also common to create secondary indexes on InnoDB tables (indexes on columns other than the primary key). These may be needed to speed up WHERE clause filtering in SQL queries. An additional persistent B+tree is constructed for each secondary index. For these, the key is the column(s) that the user selected the index to be built for. The values are the primary key of the associated row. Whenever a secondary index is used for a query:
A search is performed on the secondary index B+tree.
The primary keys for matching results are collected.
These are then used to do additional B+tree lookup(s) on the main table B+tree to then find the actual row data.
Consider the following database schema:
CREATE TABLE user (
  user_id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
  username VARCHAR(256) NOT NULL,
  email VARCHAR(256) NOT NULL,
  PRIMARY KEY (user_id)
);
CREATE INDEX email_index ON user(email);

This will cause two B+tree indexes to be created:
One for the table's primary key, using user_id for the key and the other two columns stored in the values.
Another for the email_index, with email as the key and user_id as the value.
Consider what happens when a query like this is executed:
SELECT username FROM user WHERE email = 'x@planetscale.com';

MySQL will first perform a lookup for x@planetscale.com on the email_index B+tree. After it has found the associated user_id value, it will use that to perform another lookup into the primary key B+tree, and fetch the username from there.
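The two-step lookup can be modeled with a pair of maps. In this sketch plain Python dictionaries stand in for the two B+trees, and the sample rows are invented; it only illustrates the order of lookups, not InnoDB's actual implementation.

```python
from typing import Dict, Optional

# Stand-in for the primary key B+tree: user_id -> remaining columns.
primary_tree: Dict[int, Dict[str, str]] = {
    1: {"username": "alice", "email": "a@example.com"},
    2: {"username": "xavier", "email": "x@planetscale.com"},
}

# Stand-in for the email_index B+tree: email -> user_id.
email_index: Dict[str, int] = {
    row["email"]: user_id for user_id, row in primary_tree.items()
}


def username_by_email(email: str) -> Optional[str]:
    # Lookup 1: the secondary index yields the primary key.
    user_id = email_index.get(email)
    if user_id is None:
        return None
    # Lookup 2: the primary key tree yields the row data.
    row = primary_tree[user_id]
    return row["username"]


print(username_by_email("x@planetscale.com"))  # xavier
print(username_by_email("nobody@example.com"))  # None
```

Every hit on the secondary index costs a second traversal of the primary key tree, which is one reason the size and ordering of the primary key matter for secondary index performance too.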
Overall, we'd like to always minimize the number of blocks / nodes that need to be visited to fulfill a query. The fewer nodes we have to visit, the faster our query can go. The primary key you choose for a table is pivotal in minimizing the number of nodes we need to visit.
Insertions
The way your table's data is arranged in a B+tree depends on the key you choose. This means your choice of PRIMARY KEY will impact the layout on disk of all of the data in the table, and in turn performance. Choose your PRIMARY KEY wisely!
Two common choices for a primary key are:
An integer sequence (such as BIGINT UNSIGNED AUTO_INCREMENT)
A UUID, of which there are many versions.
Let's first consider the consequences of using a UUIDv4 primary key. A UUIDv4 is a mostly-random 128 bit integer.
We can simulate this by inserting a bunch of random integers into our B+tree visualization. On each insertion, all of the visited nodes will be highlighted green. You can also control the percentage of keys to keep in the existing node when a split occurs. Give it a try by clicking the Add random button several times. What do you notice?
A few observations:
The nodes visited for an insert are unpredictable ahead of time.
The destination leaf node for an insert is unpredictable.
The values in the leaves are not in order.
Issues 1 and 2 are problematic because over the course of many insertions we'll have to visit many of the nodes (pages) in the tree. This excessive reading and writing leads to poor performance. Issue 3 is problematic if we intend to ever search for or view our data in the order it was inserted.
The same problem can arise (albeit in a less extreme way) with some other UUIDs as well. For example, UUID v3 and v5 are both generated via hashing, and therefore will not be sequential and have similar behavior to inserting randomly. Alternatively, UUIDv7 actually does a good job of overcoming some of these challenges.
Let's consider using a sequential BIGINT UNSIGNED AUTO_INCREMENT as our primary key instead. Try inserting sequential values into the B+tree instead:
This mitigates all of the aforementioned problems:
We always follow the right-most path when inserting new values.
Leaves only get added on the right side of the tree.
At the leaf level, data is in sorted order based on when it was inserted.
Because of 1 and 2, many insertions happening in sequence will revisit the same path of pages, leading to fewer I/O requests when inserting a lot of key/value pairs.
The bar chart below shows the number of unique nodes visited for the previous 5 inserts on the two B+trees above. Assuming trees of the same depth, you should see the random one being higher, meaning worse performance.
If you are curious about the effect of the split percentage on sequential vs random insert patterns, check out the interactive visualization below. Use the slider to set the split percentage. The line graph will update to show how many nodes needed to be visited for the prior 5 inserts at various points in a 400-key insertion sequence. Notice that in most cases, the sequential inserts require far fewer node visits than random inserts, and are also more predictable.
Data order
It's common to search for data from a database in time-sequenced order. Consider viewing the timeline on X, or a chat history in Slack. We typically want to see the posts and chat messages in time (or reverse-time) sequence. This means we'll often read chunks of the database that are "near" each other in time. These queries take the form:
SELECT username, message_text, ...
FROM post
  WHERE sent > $START_DATETIME
  AND sent < $END_DATETIME
  ORDER BY sent DESC;

Consider what this would be like if we have UUIDv4s for our primary key. In the B+tree below, a bunch of random keys and corresponding values have been inserted into the table. Try finding ranges of values. What do you see?
Notice that the value sequences are spread out across many non-sequential leaf nodes. On the other hand, consider finding sequentially inserted values instead.
In such cases, all pages with the search results will be next to each other. It's even possible to search for several rows, and all of them will be next to each other in a single page. For this variety of query pattern, we can minimize the number of pages that need to be read by using a sequential primary key.
Primary key size
Another important consideration is key size. We always want our primary keys to be:
Big enough to never face exhaustion
Small enough to not use excessive storage
For integer sequences, we can sometimes get away with a MEDIUMINT (16 million unique values) or INT (4 billion unique values) for smaller tables. For big tables, we often jump to BIGINT to be safe (18 quintillion possible values). BIGINTs are 64 bits (8 bytes). UUIDs are typically 128 bits (16 bytes), twice the size of even the largest integer type in MySQL. Since B+tree nodes are a fixed size, a BIGINT will allow us to fit more keys per node than UUIDs. This results in shallower trees and faster lookups.
Consider a case where each tree node is only 100 bytes, child pointers are 8 bytes, and values are 8 bytes. We could fit 4 UUIDs (plus 4 child pointers) in each node. Hit the play insertion sequence button below to see the inserts.
If we had used a BIGINT instead, we could fit 6 keys (and corresponding child pointers) in each node. This would lead to a shallower tree, which is better for performance.
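The effect of key size on fan-out, and of fan-out on depth, can be estimated with a few lines. The function names are hypothetical, and the depth figure is a rough estimate that assumes completely full inner nodes with one child pointer per key.

```python
def keys_per_node(node_bytes: int, key_bytes: int, pointer_bytes: int = 8) -> int:
    """Keys that fit in an inner node storing one child pointer per key."""
    return node_bytes // (key_bytes + pointer_bytes)


def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers n_keys."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


uuid_fanout = keys_per_node(100, key_bytes=16)   # 4
bigint_fanout = keys_per_node(100, key_bytes=8)  # 6
print(uuid_fanout, bigint_fanout)                # 4 6
print(levels_needed(1000, uuid_fanout))          # 5
print(levels_needed(1000, bigint_fanout))        # 4
```

With real 16k pages the fan-out is in the hundreds, so the difference shows up at much larger row counts, but the direction of the effect is the same.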
Pages and InnoDB
Recall that one of the big benefits of a B+tree is the fact that we can set the node size to whatever we want. In InnoDB, the B+tree nodes are typically set to 16k, the size of an InnoDB page.
When fulfilling a query (and therefore traversing B+trees), InnoDB does not read individual rows and columns from disk. Whenever it needs to access a piece of data, it loads the entire associated page from disk.
InnoDB has some tricks up its sleeve to mitigate this, the main one being the buffer pool. The buffer pool is an in-memory cache for InnoDB pages, sitting between the pages on-disk and MySQL query execution. When MySQL needs to read a page, it first checks if it's already in the buffer pool. If so, it reads it from there, skipping the disk I/O operation. If not, it finds the page on-disk, adds it to the buffer pool, and then continues query execution.
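The check-then-load behavior described above resembles a read-through cache. The sketch below uses a plain least-recently-used eviction policy, which is a simplification chosen for illustration; InnoDB's real buffer pool uses a more elaborate variant, and the class and callback names here are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class BufferPool:
    """A tiny read-through page cache with least-recently-used eviction."""

    def __init__(self, capacity: int, read_page_from_disk: Callable[[int], str]) -> None:
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk
        self.pages: "OrderedDict[int, str]" = OrderedDict()
        self.disk_reads = 0

    def get(self, page_id: int) -> str:
        if page_id in self.pages:
            # Hit: mark the page as recently used and skip the disk.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Miss: load from disk, cache it, and evict the oldest page if full.
        page = self.read_page_from_disk(page_id)
        self.disk_reads += 1
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return page


pool = BufferPool(capacity=2, read_page_from_disk=lambda pid: f"page-{pid}")
pool.get(1)
pool.get(1)  # served from memory
pool.get(2)
print(pool.disk_reads)  # 2
```

Queries that touch fewer distinct pages cause fewer misses and fewer evictions, which is why a shallow tree with good locality benefits even a well-sized buffer pool.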

The buffer pool drastically helps query performance. Without it, we'd end up doing significantly more disk I/O operations to handle a query workload. Even with the buffer pool, minimizing the number of pages that need to be visited helps performance because (1) there's still a (small) cost to looking up a page in the buffer pool, and (2) it reduces the number of buffer pool loads and evictions that need to take place.
Other situations
Here, we mostly focused on comparing a sequential key to a random / UUID key. However, the principles shown here are useful to keep in mind no matter what kind of primary or secondary key you are considering.
For example, you may also consider using a user.created_at timestamp as a key for an index. This will have similar properties to a sequential integer. Insertions will generally always go to the right-most path, unless legacy data is being inserted.
Conversely, something like a user.email_address string will have more similar characteristics to a random key. Users won't be creating accounts in email-alphabetical order, so insertions will happen all over the place in the B+tree.
Conclusion
This is already a long blog post, and yet, much more could be said about B+trees, indexes, and primary key choice in MySQL. On the surface it may seem simple, but there's an incredible amount of nuance to consider if you want to squeeze every ounce of performance out of your database. If you'd like to experiment further, you can visit the dedicated interactive B+tree website. If you want a regular B-tree, go here instead. I hope you learned a thing or two about indexes!
Special thanks to Sam Rose for early review.]]></content>
        <summary><![CDATA[B-trees are used by many modern DBMSs. Learn how they work, how databases use them, and how your choice of primary key can affect index performance.]]></summary>
      </entry>
    
      <entry>
        <title>Instant deploy requests</title>
        <link href="https://planetscale.com/blog/instant-deploy-requests" />
        <id>https://planetscale.com/blog/instant-deploy-requests</id>
        <published>2024-09-04T16:01:00.000Z</published>
        <updated>2024-09-04T16:01:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[PlanetScale now offers instant deployments in eligible deploy requests, cutting down schema deployment runtime to near-instant. This option is opt-in per deploy request, and we continue to offer Online DDL as the default and safest schema change strategy.
To qualify for instant deployment, a deploy request must consist of changes all of which can be fulfilled instantly:
One or more ALTER TABLE statements that qualify for INSTANT DDL in MySQL.
Plus, optionally, creating or dropping any number of tables.
Plus, optionally, creating, modifying, or dropping any number of views.
To learn more about INSTANT DDL in MySQL, see our breakdown on The State of Online Schema Migrations in MySQL. In short, there is a limited set of changes for which MySQL supports the INSTANT algorithm, such as adding a new column, changing a column's default value, and more.
In some such cases, an Online DDL operation may take hours to deploy a schema change to a large table, whereas an instant deployment may take just a few seconds. PlanetScale pre-evaluates whether a deployment is eligible for instant deployment and presents the user with a choice.

Instant deployments do come with some caveats:
They are not revertible.
Under some workloads, users may experience a multi-second (or more) lock on the migrated table.
For these reasons, PlanetScale continues to run Online DDL as the default strategy, and users are asked to make an explicit choice when opting for instant deployments.]]></content>
        <summary><![CDATA[PlanetScale now supports instant DDL. Where eligible, you can run deploy requests that complete near-instantly.]]></summary>
      </entry>
    
      <entry>
        <title>Anatomy of a Throttler, part 1</title>
        <link href="https://planetscale.com/blog/anatomy-of-a-throttler-part-1" />
        <id>https://planetscale.com/blog/anatomy-of-a-throttler-part-1</id>
        <published>2024-08-29T00:00:00.000Z</published>
        <updated>2024-08-29T00:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[A throttler is a service or component that pushes back against an incoming flow of requests to ensure the system has the capacity to handle all received requests without being overwhelmed. In this series of posts, we illustrate design considerations for a database system throttler, whose purpose is to keep the database system healthy overall. We discuss choice of metrics, granularity, behavior, impact, prioritization, and other topics.
Which requests do you throttle?
There are different approaches to throttling requests in a database. We focus on throttling asynchronous, batch, and massive operations that are not time critical. Examples could be ETLs, data imports, online DDL operations, mass purges of data, resharding, and so forth. The throttler will push back on those operations that can span minutes, hours, or days of operation. Other forms of throttling may push back on OLTP production traffic. This discussion applies equally to both.
By way of illustration, consider a job that needs to import 10 million rows into the database. Instead of attempting to apply all 10 million in one go, the job breaks down the task into much smaller subtasks: it will try to import (write) 100 rows at a time. Before any such import, it will request access from the throttler.
Some throttler implementations are collaborative, meaning they assume clients will respect their instructions. Others act as barriers between the app and the database. Either way, if the throttler indicates that the database is overloaded, the job should hold back for a period of time and then request access again. This process repeats until granted. Each subtask should be small enough so as not to single-handedly tank the database's serving capacity, while large enough to compensate for the added throttler overhead and to enable meaningful progress.
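The request-then-write loop of a collaborative client might look like the sketch below. All names are hypothetical: throttler_allows stands in for whatever call your throttler exposes, and write_batch stands in for the actual database write.

```python
import time
from typing import Callable, List, Sequence

Batch = Sequence[dict]


def import_in_batches(
    rows: Sequence[dict],
    write_batch: Callable[[Batch], None],
    throttler_allows: Callable[[], bool],
    batch_size: int = 100,
    backoff_sec: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write rows in small subtasks, asking the throttler before each one."""
    written = 0
    for start in range(0, len(rows), batch_size):
        # Hold back and retry until the throttler grants access.
        while not throttler_allows():
            sleep(backoff_sec)
        batch = rows[start:start + batch_size]
        write_batch(batch)
        written += len(batch)
    return written


# Demo with a fake throttler that rejects once before allowing each batch.
answers = iter([False, True] * 3)
written_batches: List[Batch] = []
total = import_in_batches(
    rows=[{"id": i} for i in range(250)],
    write_batch=written_batches.append,
    throttler_allows=lambda: next(answers),
    sleep=lambda _sec: None,  # skip real sleeping in this demo
)
print(total, [len(b) for b in written_batches])  # 250 [100, 100, 50]
```

The batch size is the knob described above: small enough that one subtask cannot tank the database, large enough that the per-check overhead stays negligible.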
What does the throttler throttle on?
Some generic throttlers only allow a regulated rate of requests, in anticipation that the consuming job will be able to process them at some known, fixed rate. With databases, things are less clear. A database can only handle so many queries at any given point in time, or over some period of time. However, not all queries are created equal. The database capacity of serving queries depends on the scope of queries, any hot spots or cold spots in affected data, the state of the page cache, overlap or lack thereof of data served by queries, to name a few factors.
We therefore need to be able to determine: how do we consider our database to be "healthy"? How do we determine if it is being overwhelmed?
To do this, we look for metrics that define or predict service level objectives (SLO) for the database. But things are not always so simple. Let's start with a popular metric which is widely used as a throttling indicator, and see what's so special about that metric.
Replication lag
Replication is often used in database clusters, especially ones using a primary-secondary (aka leader-follower) architecture. Sometimes the replication is set to be asynchronous, and at other times group-communication based. In these scenarios replication lag is defined as the time passing between a write on the primary server and the time it is applied or made visible on the replica/secondary server.
In the MySQL world, replication lag is probably the single most used throttling indicator, as multiple third-party and community tools use it to push back against long-running jobs. This is for good reasons: it is easy to measure and it has clear impact on the product and the business. For example, in the case of a database failover, replication lag impacts the time it takes for a replica/standby server to be promoted and made available to receive write requests. Read-after-write can be simplified when replication lag is low, allowing secondary servers to serve some of the read traffic.
We can thus have a business limitation on the acceptable replication lag, which we can use in the throttler: below this lag, allow requests. Above this lag, push back.
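The rule above can be sketched in a few lines. This is a minimal illustration, not a real throttler: the function name, the result type, and the 5-second default threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    allowed: bool
    lag_seconds: float
    threshold_seconds: float


def check_replication_lag(lag_seconds: float,
                          threshold_seconds: float = 5.0) -> CheckResult:
    """Allow the request while lag is below the threshold, push back otherwise.

    The 5.0 second default is an illustrative assumption; the acceptable lag
    is a business decision.
    """
    return CheckResult(lag_seconds < threshold_seconds,
                       lag_seconds, threshold_seconds)


print(check_replication_lag(1.2).allowed)
print(check_replication_lag(7.5).allowed)
```

A client would call the check before each small batch of work and back off briefly whenever allowed is false.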
Other database metrics: what's in a metric?
Another common metric in the MySQL world is the value of threads_running. On any given server, this is the number of concurrent, actively executing queries (not to be confused with concurrent open transactions, some of which could be idle in between queries). This metric is frequently seen on database dashboards, and is an indicator for the database load.
But, what's an acceptable value? Are 50 concurrent queries OK? Are 100 OK? Pick a number, and you'll soon find it doesn't hold water. Some values are acceptable in early morning, while others are just normal during peak traffic hours. As your product evolves and its adoption increases, so do the queries on your database. What was true 3 months ago is not true today. And again, not all queries are created equal.
What's different about this metric compared with replication lag is that it is much more of a symptom than an actual cause. If all of a sudden we see a sharp spike in active queries, this can indicate some possible causes: perhaps all are held by the commit queue, which for some reason stalls. Or, the queries happen to compete over a specific hotspot and wait on locks. Or, they don't, they all happen to compete on very different pages, none of which is in memory, and they all congest while waiting on the page cache, etc. So what is it exactly that we need to monitor? Is the metric itself useless?
Not necessarily. An experienced administrator may only need to take one look at this metric on the database dashboard to say "we're having an issue".
A closer examination: queues
Software relies heavily on queues. They are fundamental not only in software design but also in hardware access. Requests queue on network access. They queue on disk access. They queue on CPU access. They queue on locks.
Circling back to replication lag, much like concurrent queries, it is a symptom. E.g. disk I/O is saturated on the replica, hence the replica cannot keep up replaying the changelog, thereby accumulating lag. Or perhaps the lag is caused by slow network. Or both! Whatever the case is, what's interesting is that the replication mechanism itself is a queue: the changelog event queue. A new write on the primary manifests as a "write event", which is shipped to the replica, and waits to be consumed (processed, replayed) by the replica. Replication lag is the event's time spent in the queue, where our queue is a combination of the network queue, local disk write queue, actual wait time, and finally the event's processing time. Each of these can be the major contributor to the overall replication lag, and yet, we can still look at replication lag as a whole — as a clear indicator for database health.
Armed with this insight, we take another attempt at understanding other metrics. In the case of concurrent queries, and writes in particular, we understand that a major contributor to a spike in concurrent queries is their inability to complete. Normally, this means they're held back at commit time, i.e. they wait to be written to the transaction (redo) log. And that means they're in the transaction queue, and we can hence measure the transaction queue latency (aka queue delay).
But, what's a good threshold? Transaction commit delay is typically caused by disk write/flush time, and that changes dramatically across hardware. It is a matter of knowing your metrics. Yet again, an experienced administrator should know what values to expect. But now the values are more tightly bound to hardware and slightly less affected by the app.
Queue delay is not the only metric. Another common one is the queue length: the number of entries waiting in the queue. A long queue at the airport isn't in itself a bad thing, some queues move quite fast, and yet it's often a predictor to wait times. Where wait time is impossible or difficult to measure, queue length can be an alternative.
An operating system's Load Average metric includes the number of processes waiting for CPU time. The acceptable value scales with the number of CPUs available. A common rough indicator is a threshold of 1 for (load average)/(num CPUs). This is again a metric that must agree with your own systems. Some database deployments famously push their servers to their limits with load averages soaring far above 1 per CPU.
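As a rough sketch of the (load average)/(num CPUs) indicator, the following uses Python's standard os.getloadavg() and os.cpu_count(). The helper names and the default threshold of 1.0 are assumptions for illustration, and as noted above the right threshold depends on your own systems.

```python
import os


def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def is_overloaded(load_average: float, cpu_count: int,
                  threshold: float = 1.0) -> bool:
    """True when the normalized load exceeds the threshold (1.0 is a rough default)."""
    return load_per_cpu(load_average, cpu_count) > threshold


try:
    # getloadavg() is only available on Unix-like systems.
    one_minute, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load per CPU: {load_per_cpu(one_minute, cpus):.2f}, "
          f"overloaded: {is_overloaded(one_minute, cpus)}")
except (AttributeError, OSError):
    print("load average is not available on this platform")
```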
Pool usage
Another indicator is pool usage. The single most common pool with regard to databases must be the application's database connection pool. To run a query, the app will take a connection from the pool, use it to execute the query, then return the connection to the pool. If the pool has connections to spare, getting that connection comes at no cost. But if the pool is exhausted, then either the app needs to wait for the creation of a new connection, or it gets rejected. Similarly to concurrent queries, a high pool usage indicates a congestion of operations. However, pooled connections can be used across multiple queries in a transaction, as well as across multiple transactions, and the app may run its own logic in between running queries, while still holding on to the connection.
An exhausted pool is a strong indication of excessive load, while the difference between a 60% and an 80% used pool is not as clear an indication. Taking a step back, what does it mean that we exhaust some pool? Who decides the size of the pool in the first place? If someone picked a number such as 50 or 100, isn't that number just artificial?
It may well be, but pool size was likely chosen for some good reason(s). It is perhaps derived from some database configuration, which is itself derived from some hardware limitation. And while the choice of metric could possibly change arbitrarily, it is still sensible, as far as throttling goes, to push back when the pool is exhausted. The throttler thereby relies on the greater system configuration and does not introduce any new artificial thresholds.
The case for multiple metrics
A throttler should be able to push back based on a combination of metrics, and not limit itself to just one metric. We've illustrated some metrics above, and every environment may yet have its own load predicting metrics. The administrator should be able to choose an assorted set of metrics the throttler should work with, be able to set specific thresholds for each such metric, and possibly be able to introduce new metrics either programmatically or dynamically.
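One possible shape for such a throttler is sketched below. The class, its method names, the metric names, and the threshold values are all hypothetical; the point is only that metrics are registered dynamically, each with its own threshold, and that a single violation is enough to push back.

```python
from typing import Callable, Dict, List, Tuple


class MultiMetricThrottler:
    """Pushes back when any registered metric reaches its threshold."""

    def __init__(self) -> None:
        # metric name -> (function returning the current value, threshold)
        self._metrics: Dict[str, Tuple[Callable[[], float], float]] = {}

    def register(self, name: str, read_value: Callable[[], float],
                 threshold: float) -> None:
        """Add or replace a metric; this can be called at any time."""
        self._metrics[name] = (read_value, threshold)

    def check(self) -> Tuple[bool, List[str]]:
        """Return (allowed, names of the metrics that are over threshold)."""
        violations = [name for name, (read_value, threshold)
                      in self._metrics.items()
                      if read_value() >= threshold]
        return (not violations, violations)


# Placeholder readings and thresholds, for illustration only.
current = {"replication_lag_seconds": 1.0, "threads_running": 40.0}
throttler = MultiMetricThrottler()
throttler.register("replication_lag_seconds",
                   lambda: current["replication_lag_seconds"], 5.0)
throttler.register("threads_running",
                   lambda: current["threads_running"], 100.0)

print(throttler.check())
current["threads_running"] = 250.0
print(throttler.check())
```

Returning the names of the violating metrics, rather than a bare yes/no, gives operators a starting point when a job appears to be stuck behind the throttler.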
What does a throttled system look like?
Many software developers will be familiar with the next scenario: you have a multithreaded or otherwise highly concurrent app. There's a bug, likely a race condition or a synchronization issue, and you wish to find it. You choose to print informative debug messages to standard output, and hope to find the bug by examining the log. Alas, when you do so, the bug does not reproduce. Or it may manifest elsewhere.
In adding writes to standard output, you have introduced new locks. Your debug messages now compete over those locks, which in turn incurs different context switches.
Introducing a throttler into your infrastructure resembles this synchronization example. All of a sudden, there is less contention on the database, and certain apps that used to run just fine now exhibit contention/latency behavior. The appearance of a new job suddenly affects the progress of another. But where previously you could clearly analyze database queries to find the root cause, the database now tells you little to nothing. It's now down to the throttler to give you that information. But even the throttler is limited, because all the apps do is check the throttler for health status. At that point they have not yet actually done anything.
Let's say we throttle based on replication lag, and let's assume that we want to run an operation so massive that it is bound to drive replication lag high if let loose. With the throttler keeping it under control, though, the operation will only run small batches of subtasks. But an interesting behavior emerges: the operation will push replication lag up to the throttler's threshold, then back down, and push again. As we start the operation, we expect to see the replication lag graph jump up to the threshold value, and then more or less stabilize around that value, slightly higher and slightly lower, for the duration of the operation, which could be hours.
During that time, the operation will be granted access thousands of times or more, and will likewise also be rejected access thousands of times or more. That is how a healthy system looks with a throttler engaged. No matter how many more concurrent operations we run, we expect to contain replication lag at about the same slight offset above or below the threshold. More on this when we discuss granularity.
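The behavior described above can be reproduced with a toy simulation. Everything numeric here is an assumption made up for the sketch: a 5000 ms threshold, each granted batch adding 400 ms of lag, and the replica draining 250 ms of lag per tick. Real systems are far noisier, but the shape is the same: lag climbs to the threshold and then hovers around it while requests are alternately granted and rejected.

```python
def simulate(ticks: int, threshold_ms: int = 5000,
             batch_cost_ms: int = 400, drain_ms: int = 250):
    """Return (granted, rejected, lag_history) for one throttled operation."""
    lag_ms = 0
    granted = rejected = 0
    history = []
    for _ in range(ticks):
        if lag_ms < threshold_ms:
            # Throttler check passed: run one small batch, which adds lag.
            granted += 1
            lag_ms += batch_cost_ms
        else:
            # Over threshold: the throttler pushes back and no work is done.
            rejected += 1
        # The replica catches up a little on every tick.
        lag_ms = max(0, lag_ms - drain_ms)
        history.append(lag_ms)
    return granted, rejected, history


granted, rejected, history = simulate(1000)
steady = history[100:]  # skip the initial climb toward the threshold
print(f"granted={granted} rejected={rejected} "
      f"steady-state lag between {min(steady)} and {max(steady)} ms")
```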
It is not uncommon for a system to run one or two operations for very long periods, which means what we consider as the throttling threshold (say, a 5sec replication lag) becomes the actual standard. Thankfully, not all operations and workloads are so aggressive that they necessarily push the metrics as high as their thresholds.
Check intervals and metric granularity
A throttler collects the metrics asynchronously from check requests, so that it has an immediate answer available upon request. The intervals at which the throttler collects metrics can have a significant effect on how the throttler is being put to use. Let's consider a case where the throttler collects a metric at a large interval, say every 5 seconds. The metric could be anything at all during those 5 seconds, but it is the specific sampling that takes place at the end of that period that counts.
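A minimal sketch of this decoupling follows, with hypothetical names throughout. The collect method stands in for whatever background loop samples the metric, and check answers purely from the last sample. Pushing back when no sample exists yet is a design assumption of this sketch, not something the text prescribes.

```python
import time
from typing import Callable, Optional, Tuple


class SampledMetric:
    """Serves throttler checks from the most recent sample of a metric."""

    def __init__(self, read_value: Callable[[], float], threshold: float) -> None:
        self._read_value = read_value
        self._threshold = threshold
        self._last_value: Optional[float] = None
        self._sampled_at: Optional[float] = None

    def collect(self, now: Optional[float] = None) -> None:
        """Sample the metric; intended to be called once per collection interval."""
        self._last_value = self._read_value()
        self._sampled_at = time.monotonic() if now is None else now

    def check(self, now: Optional[float] = None) -> Tuple[bool, float]:
        """Return (allowed, age of the sample in seconds) without reading the metric."""
        if self._last_value is None or self._sampled_at is None:
            return False, float("inf")  # no data yet: assume unhealthy
        current = time.monotonic() if now is None else now
        return self._last_value < self._threshold, current - self._sampled_at


# The underlying lag changes between samples, but check() cannot see that.
readings = iter([1.0, 9.0])
metric = SampledMetric(lambda: next(readings), threshold=5.0)
metric.collect(now=0.0)
print(metric.check(now=4.9))   # still answers from the sample taken at t=0
metric.collect(now=5.0)
print(metric.check(now=5.0))   # the new sample is over the threshold
```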
Similarly, other metrics could have some granularity. Namely, replication lag can be measured in different methods, and the most common one is by deliberate injection of heartbeat events on the primary, and by capturing them on a replica. More on this in another post, but the intervals in which the heartbeat events are generated dictate the granularity or the accuracy of the measured lag. Let's assume we inject heartbeats at one second intervals, and we've just injected a heartbeat at precisely noon. Let's also assume we sample the metrics once per second, and we happen to make that sample at 12:00:00.995. The sample still reads 12:00:00.000 as this was our last injected metric. A client then checks the throttler at 12:00:01.990. By now there will have been a new metric value, but one which we have not sampled yet. The throttler responds by using its last sample that is almost, but not quite, one second old, and which in itself represents a metric that is now almost, but not quite, two seconds old.
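The timeline above is easy to verify with a little date arithmetic. The calendar date below is arbitrary; only the times of day come from the example.

```python
from datetime import datetime

heartbeat_injected = datetime(2024, 1, 1, 12, 0, 0, 0)       # 12:00:00.000
throttler_sampled = datetime(2024, 1, 1, 12, 0, 0, 995000)   # 12:00:00.995
client_checked = datetime(2024, 1, 1, 12, 0, 1, 990000)      # 12:00:01.990

# How old is the throttler's sample when the client checks?
sample_age = (client_checked - throttler_sampled).total_seconds()

# How old is the heartbeat value that the sample carries?
metric_age = (client_checked - heartbeat_injected).total_seconds()

print(f"sample age: {sample_age:.3f}s")   # almost one second
print(f"metric age: {metric_age:.3f}s")   # almost two seconds
```

With one-second heartbeats and one-second sampling, the worst-case staleness approaches the sum of the two intervals.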
Long heartbeat intervals and outdated information have negative impacts on both our system health as well as the throttler's utilization.
On one hand, it is possible that in the duration of the interval, we miss noticing a significant uptick in system load. We'd only find out about it a few seconds later, at which time the throttler would be engaged. However, by that time, system performance has already degraded. It will take a few seconds before it comes down to acceptable values. But then again, once we do catch that the metrics exceed their thresholds, and for the duration of the next interval, we reject all further requests. If the metrics do turn healthy sooner than that, that's a missed opportunity to make some progress. Thus, we degrade the database's operations capacity.
When multiple operations attempt to make progress all at once, all will be throttled while metrics are above threshold, and possibly all released at once when metrics return to low values, thus all pushing the metrics up at once.
Borrowing from the world of networking hardware, it is recommended that metric interval and granularity oversample the range of allowed thresholds. For example, if the acceptable replication lag is at 5 seconds, then it's best to have a heartbeat/sampling interval of 1-2 seconds.
Lower intervals and more accurate metrics reduce spikes and spread the workload more efficiently. That, too, comes at a cost, which we will discuss in a later post.
To be continued
In the next part of this series we will be looking into singular vs. distributed throttler design, as well as the impact the throttler itself may have on your environment.]]></content>
        <summary><![CDATA[Learn about some design considerations for implementing a database throttler.]]></summary>
      </entry>
    
      <entry>
        <title>Increase IOPS and throughput with sharding</title>
        <link href="https://planetscale.com/blog/increase-iops-and-throughput-with-sharding" />
        <id>https://planetscale.com/blog/increase-iops-and-throughput-with-sharding</id>
        <published>2024-08-19T00:00:00.000Z</published>
        <updated>2024-08-19T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[When sizing resources for a database, we often focus on specs like vCPUs, RAM, and storage capacity. However, IOPS and throughput are crucial considerations for I/O intensive workloads like databases.
In this article, we'll look at how the IOPS and throughput requirements of a database affect cost and how sharding can help reduce this cost for large-scale workloads.
Before we jump into these details, let's first define what we mean by IOPS and throughput.
Since this article was written, we have released PlanetScale Metal. Metal databases give you unlimited IOPS and ultra low latency reads and writes. If you need a database with incredible IO performance, check out Metal.
What are IOPS?
IOPS is shorthand for Input/Output Operations Per Second. In other words, it measures how many times per second the system performs a read or write operation on the underlying storage volume.
But what counts as a single operation? This depends on the cloud provider and storage system being used.
AWS EBS IOPS
In AWS land, EC2 instances and RDS databases are often attached to EBS (Elastic Block Store) volumes. Even within EBS, the definition of a single IOP varies between storage classes, of which there are several options (gp2, gp3, io1, io2, st1, sc1). If we choose to use gp3, a single operation is measured as one 64 KiB disk read or write. If you have a gp3 volume with the default 3000 IOPS provisioned, that means it can handle a maximum of 64 KiB * 3000 = 192,000 KiB/s, or roughly 192 MiB/s, of I/O.
There are a few other things to consider with EBS IOPS:
EBS volumes also allow you to bank unused IOPS, up to a fixed limit. These stored IOPS can be redeemed in the future, allowing the volume to burst up beyond the set IOPS limit for stretches of time. Once the bank is depleted, it will not be able to burst until more IOPS are accumulated.
Computing the total amount of bytes we can move to and from disk per second is not as simple as calculating number_of_iops * 64 KiB. This is due to the difference between how EBS handles sequential and random reads. For sequential reads, EBS will bundle requests together, allowing you to maximize IOPS. For random reads, each read counts as a full IOP, even if it is less than 64 KiB. For example, a single random read of a 4 KiB block from disk will count as a full 64 KiB IOP. Some think that once you move to SSDs, your sequential vs random read patterns no longer matter. However, this accounting applies to workloads on both HDDs and SSDs on EBS.
The more random reads you have, the less efficiently you'll use your IOPS. The more sequential reads, the better.
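To make the difference concrete, here is a small calculation of how much useful data 3000 IOPS can move under the two extremes described above. The helper name is an assumption for this sketch. The exact KiB-to-MiB conversion gives a slightly lower sequential figure than the rounded 192 MiB/s used in this article.

```python
KIB_PER_MIB = 1024


def useful_throughput_mib_s(iops: int, useful_kib_per_operation: float) -> float:
    """Data actually moved per second, given how much each operation carries."""
    return iops * useful_kib_per_operation / KIB_PER_MIB


# Perfectly sequential: every operation carries a full 64 KiB.
sequential = useful_throughput_mib_s(3000, 64)

# Fully random 4 KiB reads: each one still consumes a whole operation.
random_4k = useful_throughput_mib_s(3000, 4)

print(f"sequential: {sequential} MiB/s")
print(f"random 4 KiB reads: {random_4k} MiB/s")
print(f"ratio: {sequential / random_4k:.0f}x")
```

Under these assumptions, the same IOPS budget moves 16 times less data when every read is a random 4 KiB read.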
Throughput vs IOPS
Throughput is the total amount of data or requests that can move through a system over a defined span of time. Though throughput is related to IOPS, a volume's IOPS does not directly translate to a volume's throughput delivery at any given time.
Each EBS volume is given a set amount of IOPS. gp3 volumes are given a default value of 3000 IOPS per volume. More IOPS can be purchased for additional per-IOP-month cost, up to a 16,000 cap. Each gp3 volume is also given a default throughput limit of 125 MiB/s (again, with the ability to pay for more, up to 1000 MiB/s). Based on the calculation from earlier, if we utilize our 3000 IOPS with perfectly sequential IO patterns, we could theoretically achieve 192 MiB/s. However, due to the default throughput limit of 125 MiB/s, we cannot actually reach this unless we purchase more throughput.
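The interaction between the two limits can be written as taking the smaller of the IOPS-derived ceiling and the throughput limit. This is a simplified model for illustration, using the 64 KiB operation size and the gp3 limits quoted in this article; the function names are assumptions.

```python
KIB_PER_MIB = 1024


def deliverable_mib_s(iops: int, io_size_kib: int,
                      throughput_limit_mib_s: float) -> float:
    """Throughput is capped by whichever limit is hit first."""
    iops_ceiling = iops * io_size_kib / KIB_PER_MIB
    return min(iops_ceiling, throughput_limit_mib_s)


def iops_to_saturate(throughput_limit_mib_s: float, io_size_kib: int) -> float:
    """Sequential operations per second needed to hit the throughput limit."""
    return throughput_limit_mib_s * KIB_PER_MIB / io_size_kib


# Default gp3 settings: the 125 MiB/s limit is reached before 3000 IOPS are used up.
print(deliverable_mib_s(3000, 64, 125))
print(iops_to_saturate(125, 64))

# Maximum gp3 settings quoted in this article: 16,000 IOPS and 1000 MiB/s.
print(deliverable_mib_s(16000, 64, 1000))
```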

In practice, database workloads often have a mix of sequential and random disk IO operations. We rarely will be operating at maximum IOPS efficiency, so 3000 IOPS will pair acceptably with 125 MiB/s in many situations.
Database workloads
Databases are one of the most I/O intensive workloads we run in the cloud. For large databases, we may need to pay for extra throughput and IOPS to get the performance we want. There are a few options for this:
Use a general-purpose EBS type such as gp3, and then pay for additional IOPS and throughput.
Upgrade to AWS's provisioned IOPS SSD volume types such as io1 or io2. These volumes are significantly more expensive but also have higher maximum IOPS and throughput. Whereas gp3 has a max of 16,000 IOPS and 1000 MiB/s, io2 has a max of 256,000 IOPS and 4000 MiB/s.

Provisioned IOPS volumes may be your only option if you are running a massive database with a single primary server.
Sharding for improved IOPS and throughput
Database sharding is an excellent technique to run huge databases efficiently, without needing to pay an EBS premium. In a sharded database, we spread out our data across many primaries. This also means that our IO and throughput requirements are distributed across these instances, allowing each to use a more affordable gp2 or gp3 EBS volume.
Baseline comparison
Let's look at some hard numbers to make this clearer. We'll start by looking at a small unsharded database, and then compare this to a larger database. All prices are based on a us-east-1 database as of August 2024. Warning: math ahead!
Let's say we are in a situation where we need a cluster with one primary and two replicas, each with approximately 8 vCPUs, 32 GB of RAM, and 500 GB of storage. We'll compare five different configurations: two from RDS MySQL, two from Aurora MySQL, and one on PlanetScale (powered by Vitess and MySQL).

As shown in the image above, the monthly cost comes out to the following for PlanetScale, RDS d class, RDS non d class, and the two Aurora I/O optimized variants, respectively: $1749.00, $2136.20, $1649.94, $1741.14, and $3369.78. Aurora is significantly more expensive, whereas the others are in a similar ballpark.
All PlanetScale Base plan databases are configured as multi-AZ clusters with one primary and two read-only replicas. On PlanetScale, we would need to spin up a PS-400 database with 500 gigabytes of storage to achieve our compute requirements.
For a comparable 3 node, multi-AZ database from RDS, we would need to choose a db.m6id.2xlarge (or similar) with one primary and two replicas to achieve comparable vCPU and RAM capabilities to a PS-400. The pricing information for this can be found on Amazon's pricing page.
It's important to note that for 3-node multi-AZ RDS clusters, RDS only supports the d class variants. These come with attached NVMe SSD storage, in addition to the vCPUs and RAM. The attached SSDs are utilized for fast I/O, but appropriately-sized EBS volumes are also used for asynchronous data writing.
Though RDS doesn't technically support it, pricing for a hypothetical 3-node cluster using db.m6i.2xlarge instances (non-d class) is also included to make the comparison more apples-to-apples. PlanetScale is cheaper than the multi-AZ RDS cluster with db.m6id.2xlarge instances, and slightly more expensive than the theoretical one with db.m6i.2xlarge instances.
Two Aurora I/O optimized configurations with one primary and two read-only nodes are included as well. Due to instance class restrictions, we cannot match the same specifications exactly. One is shown with half the vCPUs and equal RAM, the other is shown with equal vCPUs and double the RAM.
These prices assume that the default IOPS values will be sufficient to run our database. For the RDS option, the storage price is based on using General Purpose SSD storage (gp3) without paying for additional IOPS or throughput. PlanetScale also uses EBS volumes.
Higher compute, IOPS, and throughput requirements
Over time, demand on a database can grow significantly. Say that one year in the future, we now require 8x the compute, storage, IOPS, and throughput capabilities. Our new requirements are the following for the primary and replicas:
64 vCPUs
256 GB RAM
4 TB of storage
24000 IOPS
1000 MiB/s peak throughput
Let's again consider five database configurations: two with RDS MySQL, two with Aurora MySQL, and one with PlanetScale.
Using d class instances, we would need 3 db.m6id.16xlarge instances. We could accomplish this either with io1/io2, or with 4-volume striping of gp3. We'll use io1 for this example, which comes with both a per GB/mo price and a per IOPS/mo price. We do not need to pay extra to achieve 1000 MiB/s of bandwidth, so long as we have the IOPS to support it. The total for this configuration comes out to $24,196.56/mo (detailed breakdown below).
If we were able to use the more affordable db.m6i.16xlarge instances instead, this would bring the cost down a few thousand dollars to $20,519.52/mo.
For Aurora I/O optimized, we pay for instances and storage. All necessary IOPS are included.
Lower costs with sharding
Using a platform like PlanetScale, we do not need to rely on having a single, huge instance to handle a workload as high as this. If the database has grown by 8x, we could instead use sharding, and spread out data and compute across 8 shards.

To shard, we'd need to choose a good sharding strategy that spreads the data and query load evenly across the shards. Each one of the shards is essentially a multi-AZ PS-400 cluster, with 8 vCPUs, 32 GB RAM, and ~500 GB of data stored. To compute the cost of this, we multiply the cost of a single PS-400 cluster ($1749.00) by 8, giving a total of $13,992/mo. In this situation, we do not need to pay extra for additional IOPS or dedicated io1 infrastructure. The IOPS and throughput demand is spread evenly across the 8 shards, allowing us to stick with a more affordable class of EBS volumes. In the pricing diagram, one additional PS-400 database is included for storing smaller, unsharded tables.

Notice that for the RDS instances, the pricing did not grow linearly. The cost jumped by 11-13x. For Aurora, the cost grew linearly, but the base costs were high to begin with. With sharding, we are afforded a linear growth rate and acceptable costs. This is a much better option for long term scalability.
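The growth factors can be checked directly from the monthly prices quoted in this article. The sketch below only uses figures that appear in the text, so the Aurora configurations, whose scaled-up prices are shown only in the pricing diagram, are left out.

```python
# Monthly prices (USD) quoted in this article for the small and the 8x setups.
baseline = {
    "RDS d class": 2136.20,
    "RDS non-d class": 1649.94,
    "PlanetScale": 1749.00,
}
scaled = {
    "RDS d class": 24196.56,
    "RDS non-d class": 20519.52,
    "PlanetScale": 13992.00,  # 8 shards, each a PS-400 cluster
}

growth = {name: scaled[name] / baseline[name] for name in baseline}

for name, factor in growth.items():
    print(f"{name}: {factor:.2f}x the cost for 8x the requirements")
```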
Performance benchmarks are not included here. These configurations would likely all have unique performance characteristics in production workloads. However, the purpose of this comparison was not to provide a detailed performance analysis, but rather to demonstrate how pricing scales as I/O and other requirements increase.
Workload suitability
Each database has different characteristics. Some have large amounts of warm data, whereas for others it is small. Some workloads are spiky, where others are even. Some have large storage-to-compute ratios, and others are smaller. All of these factors will affect the IOPS and throughput needs of your database servers. If you are running a database where such requirements are high, you may find yourself in a situation where you need to pay significantly higher costs for this performance.
Sharding is an excellent alternative for these kinds of workloads. Spreading out your data also means spreading out IOPS and throughput demand, allowing you to get away with using more affordable volume types. Sharding also comes with a number of other benefits, including improved failure isolation, faster backups, and huge scalability.]]></content>
        <summary><![CDATA[For big databases, IOPS and throughput can become a bottleneck in database performance. Learn how sharding helps scale out IOPS and throughput beyond the limitations of a single server.]]></summary>
      </entry>
    
      <entry>
        <title>Tracking index usage with Insights</title>
        <link href="https://planetscale.com/blog/tracking-index-usage-with-insights" />
        <id>https://planetscale.com/blog/tracking-index-usage-with-insights</id>
        <published>2024-08-14T00:00:00.000Z</published>
        <updated>2024-08-14T00:00:00.000Z</updated>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[For relational databases, creating and maintaining the right indexes for your workload is critical to ensuring good performance. Unfortunately, creating good indexes isn’t a one-time activity performed only when adding new tables. Tables grow, schemas evolve, and, most importantly, the queries sent by your application change over time. To ensure your application’s queries are indexed correctly, it’s important to be able to observe which indexes are (or are not) actually being used.
To help with this, we’ve added a new capability to PlanetScale Insights: index usage tracking. With this feature, it’s easy to see which indexes your queries are using and how that usage is changing over time.
Let’s dive right in.
Feature walkthrough
To see the index usage graph for a query pattern, go to your Insights dashboard, click on a SELECT query, then click on the new Indexes tab in the upper right hand corner. Here’s an example of what the index usage page looks like.

The main graph shows a time series of the percentage of queries that use each index. The bar graph at the bottom shows cumulative usage over the entire time span. In the example above we see that MySQL is selecting from one of three indexes each time it executes the query pattern.
This view shows us which indexes a given query is using, but we can also ask the question in reverse: which queries are using a given index? To answer this, go to the main Insights page, and add index:$TABLE_NAME.$INDEX_NAME as a search term. For example:

Note: index information is only reported for SELECT queries, so it’s important to independently verify that indexes aren’t being used in UPDATE or DELETE queries before removing an index.
We can also use Insights to find a list of query patterns that use no indexes at all by entering indexed:false in the Insights search box.

These index-related search terms can be combined with existing search terms. For example, to find a list of all unindexed queries that have been executed at least a thousand times in the last 24 hours and have a p50 response time over 250ms, you’d enter indexed:false query_count:>1000 p50:>250.
To learn more about how and why we implemented this feature, read on.
Existing tools and implementation
Before building a new system to monitor index usage, we evaluated the available tools for monitoring and understanding index usage. We’ll explore these tools first to motivate the decisions we made in designing Insights usage tracking.
The first tool most developers reach for when trying to understand index usage is explain, and for good reason. Explain is an incredibly powerful tool that exposes a wealth of information about how MySQL executes your query, including index usage. It’s great for troubleshooting a problematic query or testing out a new index. Unfortunately, explain can only provide information for queries you explicitly provide. It doesn’t record information for the actual queries processed by MySQL, so it can’t show how a query pattern is using indexes over time, across shards, or with the different query parameters from your production workload.
For aggregate index usage from your production environment, MySQL’s built in performance schema provides counters for how many times each index has been used in the table_io_waits_summary_by_index_usage table. This is a useful facility, but comes with a number of limitations that make it difficult to use in practice:
Stats are in the form of global counts for each MySQL server, and are reset when MySQL is restarted. This means you can’t see usage trends over time, and counts may be reset at any time.
Counters are only provided at the index level, so it’s not clear which query patterns are using which indexes.
To make it easy to understand index usage patterns, we wanted a system that:
Breaks down index usage information per query pattern.
Stores index usage as a timeseries, so it’s obvious when something has changed.
Provides cumulative data for all your queries, without sampling or extrapolating based on explain plans.
With these goals in mind, our first task was to extract query index usage information from MySQL. Since PlanetScale databases exclusively use the InnoDB storage engine, we were able to focus our efforts there. The InnoDB storage handler includes an index-initialization function that MySQL calls (once) prior to using an index in a query. By recording the index name passed to this function in a per query data structure, we’re able to find the set of all indexes used by each query. When the query is finished we return the list of used indexes in the final packet returned by MySQL to the client, and ultimately to VTGate, Vitess’s query proxying layer.
With the per-query index information in VTGate, we aggregate index usage information per query-pattern and send it into the Insights pipeline every 15 seconds. This approach allows us to aggregate the time series count of indexes used for 100% of queries with negligible overhead in MySQL.
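To illustrate the aggregation step, here is a simplified sketch of bucketing per-query index usage into 15-second windows per query pattern. This is not the VTGate implementation: the class, method names, query pattern, and index names are hypothetical and only show the general idea.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 15  # matches the reporting interval described above

Key = Tuple[int, str]  # (window start in seconds, query pattern)


class IndexUsageAggregator:
    def __init__(self) -> None:
        self._query_counts: Counter = Counter()
        self._index_counts: Dict[Key, Counter] = defaultdict(Counter)

    def record(self, timestamp: float, query_pattern: str,
               indexes_used: Iterable[str]) -> None:
        """Record one finished query and the set of indexes it used."""
        window_start = int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        key = (window_start, query_pattern)
        self._query_counts[key] += 1
        # An index counts once per query, however often the query touched it.
        self._index_counts[key].update(set(indexes_used))

    def usage_percent(self, window_start: int,
                      query_pattern: str) -> Dict[str, float]:
        """Percentage of queries in the window that used each index."""
        key = (window_start, query_pattern)
        total = self._query_counts.get(key, 0)
        if total == 0:
            return {}
        counts = self._index_counts.get(key, Counter())
        return {index: 100.0 * n / total for index, n in counts.items()}


aggregator = IndexUsageAggregator()
pattern = "select * from users where email = ?"
aggregator.record(1.0, pattern, ["users.idx_email"])
aggregator.record(4.2, pattern, ["users.idx_email"])
aggregator.record(9.9, pattern, ["users.idx_email"])
aggregator.record(14.0, pattern, ["users.PRIMARY"])
aggregator.record(16.0, pattern, ["users.PRIMARY"])  # falls in the next window

print(aggregator.usage_percent(0, pattern))
print(aggregator.usage_percent(15, pattern))
```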
Try it out now
Index usage information is available on all PlanetScale databases. We’ve found this feature useful in managing our own databases and we hope you do too.]]></content>
        <summary><![CDATA[Learn about the new PlanetScale Insights index tracking feature.]]></summary>
      </entry>
    
      <entry>
        <title>Zero downtime migrations at petabyte scale</title>
        <link href="https://planetscale.com/blog/zero-downtime-migrations-at-petabyte-scale" />
        <id>https://planetscale.com/blog/zero-downtime-migrations-at-petabyte-scale</id>
        <published>2024-08-13T00:00:00.000Z</published>
        <updated>2024-08-13T00:00:00.000Z</updated>
        
        <author>
          <name>Matt Lord</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Scaling challenges, frequent maintenance tasks, constant fear of when it's going to all fall over, and quickly rising costs — these are all common issues that teams will put up with just to avoid the task of migrating a database that has outgrown its home.
And with good reason. There is a lot of risk in moving from one database server or system to another: extended downtime, data loss, version incompatibilities, and errors no one could have even predicted.
But what if I told you it doesn't have to be this difficult? At PlanetScale, we regularly migrate databases in the terabyte and even petabyte range to our platform — all without any downtime. Our process allows us to do heavy testing with production traffic prior to cutover and even reverse the migration back to the external database if needed.
To give you a sense of the scale involved, you can see the size of one of the larger data sets — after migrating only a subset of the databases to date — that originated from migrations here:

In this article, we'll take a look under the hood to show you how our migration process works and what allows us to handle huge data migrations with no downtime.
Data migrations
Before we get into the specifics, let's cover the high level problem and lay out the relevant terminology so that we can better understand and appreciate the details.
A data migration is the process of moving data from one database server or system to another, with the goal of transferring the responsibility for housing and serving this data from the old system to the new at a future point in time, often called the point of cutover. Each database vendor and system will have some paved roads for this process. For example, MySQL has mysqldump and mysqlimport while PostgreSQL has pg_dump and pg_restore for logical data copies. And both products have methods for setting up replication from the old to the new system if the new system is of the same type as the old (e.g. MySQL to MySQL). Common migration examples include upgrading across several major versions of the same database system (e.g. PostgreSQL 13 to 16), or moving from one database system to another (e.g. PostgreSQL to MySQL or vice versa).
How data migrations work
At a high level, this is the general flow for data migrations:
Take a snapshot of the data in the old system.
This can be done with a logical backup or a physical (volume/filesystem/block device) snapshot, with a logical backup being decoupled from the storage and offering the ability to migrate across different databases and storage types.
Restore the snapshot to the new system.
Either replay the SQL statements in the logical backup or restore the physical snapshot to the new system.
Verify the new system.
Perform at least some basic checks to ensure the new system is able to read and serve the data we expect to see.
Cutover application traffic to the new system.
Have all applicable applications start sending read and write traffic to the new system.
Decommission the old system.
Once you're satisfied with the new system behavior, you can completely shut down and optionally archive the old system.
The simplest of processes would incur extensive downtime, or at least write unavailability, across all of these steps: the system is read-only from the time that step 1 starts to the time that step 4 is complete. That can take hours, days, or even weeks, which is not always acceptable for production systems, especially those of considerable size that are expected to be available 24/7. This also offers no way to revert the cutover step if things go poorly afterward, e.g. some of your common queries may now throw errors or perform much worse than before.
An ideal process would allow us to perform all of these steps without any downtime and offer the ability to revert the cutover step if necessary, i.e. to support cutting over traffic in both directions until we're satisfied with the end state (and only at some point after that would we decommission the old system). The process which PlanetScale uses has been very intentionally designed to meet these criteria! Now that we've laid the groundwork, we can start to dive into the specifics.
Data migrations at PlanetScale
The process of migrating data at PlanetScale is done this way at a high level (you can see user guides in the PlanetScale documentation, for example, the AWS RDS migration guide):
Take a consistent non-locking snapshot of the data which includes the metadata needed to replicate changes that happen after our snapshot was taken. We will be replicating changes throughout the migration process.
This is critical as it means that the availability of your database and the applications using it are not affected by the migration process — thus no downtime.
Once the data has been copied to the new system, we continue to replicate changes from the old system to the new system so that the new system is ready for cutover at any point.
We replicate changes from the old database instance to the PlanetScale database in order to keep our consistent snapshot from step 1 up to date and in sync with the original system — thus no downtime.
Run a VDiff to verify that all of the data has been correctly copied and that the new system is in sync with the old, identifying any discrepancies that need to be addressed before the cutover.
This is a critical step that ensures nothing went wrong during the migration process and all of the data was copied correctly and the systems are in sync — providing the confidence to move on with the eventual cutover.
At some point between steps 1 and 4, the application starts sending traffic to PlanetScale rather than directly to the old system. PlanetScale then continues to route traffic back to the old system until we're ready to cutover.
This can be done any time before we perform the cutover in step 5. This allows you to test how your application behaves when PlanetScale is its target, and most importantly, it allows the eventual cutover to be fast and transparent as PlanetScale manages the routing, thus no downtime.
You will typically want to take this step as you get closer to the potential cutover point. That is because at this pre-cutover stage your query route is Application->PlanetScale->OldDB with the query response following that same path in reverse. This will incur a significant performance cost, all the more so if the application was previously connecting to the original database using a fast local connection (unix domain socket, loopback device, etc). It's for this same reason that you should not attempt application-side performance comparisons between PlanetScale and your original database until you've done the cutover: in this temporary pre-cutover stage, the bulk of the total execution time for well optimized queries will be spent in network round trips.
Please note, however, that when using PlanetScale Managed — as most customers at this scale would be doing — the additional network cost would be minimal as the application and PlanetScale database would typically be in the same cloud vendor account and sharing the same physical locations (regions and availability zones) and physical networks in those locations.
You can remain in this state as long as necessary for you to prepare for the application cutover and perform additional testing of the application and database system.
Transparently cutover application traffic to the new system so that application traffic is going to PlanetScale.
During this process we ensure that there is no data loss or drift.
Incoming queries are paused/buffered while performing the system changes for the traffic switch.
Once the cutover is complete — which would typically take less than 1 second — the paused/buffered queries are executed and the system is back to normal operation.
Reverse replication is put in place so that if for any reason we need to revert the cutover, we can do so without data loss or downtime (this can be done back and forth as many times as necessary).
The query buffering done here is the last part that allows the entire migration to be done without any downtime.
Whenever you are confident in the system and no longer require the need to cut traffic back over to your old system, you can finish or complete the migration.
This is the final step in the migration process and is the point at which you can decommission your old system whenever you like.
At no point during this process is the old system down or are application users aware that anything out of the ordinary is occurring. The only thing one would typically notice is the slight spike in query latency at the point of cutover where we briefly pause the incoming queries so that we can route them to the correct side of the migration (old vs new) once the cutover work is done.
When you've reached a certain scale (Vitess recommends sharding once a database is larger than 250GiB), horizontally sharding your database is the best way to continue scaling without incurring exponentially more expensive hardware related costs and affecting the performance of queries and various operations (e.g. schema changes and backups). This is a key feature of Vitess and PlanetScale and it is typical to shard a database as part of the data migration. So e.g. you can have an unsharded MySQL database that we then split into N shards as part of the data migration into PlanetScale. In fact, being able to do this is a common reason for migrating to PlanetScale in the first place.
A deep dive into the technical details
PlanetScale is a database-as-a-service offering a "modern MySQL" developer experience, built around Vitess. Vitess offers a suite of tools and primitives related to data migrations called VReplication.
The PlanetScale feature built around that set of Vitess primitives for imports is called (not surprisingly) Database Imports. This feature provides the web and CLI based user interfaces around VReplication's MoveTables workflow for the consistent data copy (streaming rows) and the replication (streaming binary log events).
Let's walk through the process of a data migration at PlanetScale again at a lower level (the steps are somewhat simplified here with certain details and optional behaviors left out, but it covers the main points):
Copy the existing data by taking a consistent non-locking snapshot of the data which includes the metadata needed to replicate changes that happen after our snapshot was taken — which we will be doing throughout the migration process.
a. Each of the N shards in the PlanetScale database cluster has a PRIMARY tablet. Each of those connects to the unmanaged tablet that is placed in front of the external MySQL instance (the old system) and initiates a stream where we issue a LOCK TABLE <tbl> READ query to make the table read-only just long enough to issue START TRANSACTION WITH CONSISTENT SNAPSHOT and read the @@global.GTID_EXECUTED value, then release the lock. At this point we have a consistent snapshot of the table data and the GTID set or "position" metadata to go along with it so that we can replicate changes to the table that have occurred since our snapshot was taken.
b. We then start reading all of the rows from our snapshot, at a logical point in time, ordering the results by the PRIMARY KEY (PK) columns in the table (if there are none then we will use the best PK equivalent, meaning a non-null unique key). This lets us read from the clustered index immediately: we are reading the records in order and do not need to formulate the entire result set and order it with a filesort before we can start streaming rows. The source tablet then has N streams, where each stream is going to a target shard's PRIMARY tablet, and each stream filters out any rows from our query results that are not going to the stream's target shard based on the sharding scheme defined for the table. The applicable rows are then sent to the target shard's PRIMARY tablet where they are inserted into the table and metadata for the stream is updated there (in the sidecar database's vreplication and copy_state tables) so that we can continue where we left off when the copy is interrupted for any reason.

The streams continue to copy rows until we've completed copying all of the rows in our snapshot or we hit the configured copy phase cycle duration (see Life of a Stream for more details), in which case we will pause the row copy work and catch up on all of the changes that have happened to the rows we've copied so far by streaming the applicable binary log events. The binlog events are filtered in each stream by the destination shard and by whether or not the change is applicable to a row we've already copied, as otherwise we'll get a later version of the row in a subsequent copy phase cycle. This regular catchup step is important to ensure that we don't complete the row copy only to then be unable to replicate from where we left off because the source MySQL instance has purged binary log events that we need, in which case we would be forced to start the entire migration over again. This also happens to improve the performance of the overall operation as we are replicating the minimal events needed to ensure eventual consistency.
c. We do this table by table (serially), across all of the streams (the streams running concurrently), until we're done copying the initial table data.
As you can imagine, executing all of this work on your current live database instance can be somewhat heavy or expensive and potentially interfere with your live application traffic and its overall performance (and there is that brief window in step a where we take a read-only table level lock to get a GTID set/position to go along with the consistent snapshot of the table). It's for this reason that we recommend you set up a standard MySQL replica, if you don't already have one, and use that as the source MySQL instance for the migration. This is another key factor that ensures we not only avoid downtime, but also avoid any impact whatsoever on the live production system that is currently serving your application data.
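To make the copy phase more concrete, here is a minimal Python sketch of the idea: rows are read in primary key order, each stream keeps only the rows that belong to its target shard, and progress is recorded so an interrupted copy can resume. This is an illustration under stated assumptions, not Vitess code. The `shard_for` hash function, the in-memory `source_rows` list, and the `copy_state` dict are hypothetical stand-ins for the table's sharding scheme, the source snapshot, and the sidecar copy_state table.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(pk: int, num_shards: int = NUM_SHARDS) -> int:
    """Hypothetical sharding scheme: hash the primary key to a shard index."""
    digest = hashlib.sha256(str(pk).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def copy_batch(source_rows, target, shard, copy_state, batch_size):
    """Copy up to batch_size rows for one stream, in primary key order.

    Resumes after the last PK recorded in copy_state, so an interrupted
    copy continues where it left off. Returns the number of rows copied.
    """
    last_pk = copy_state.get(shard, -1)
    copied = 0
    for pk, value in source_rows:      # source_rows is sorted by PK
        if pk <= last_pk:
            continue                   # already copied in an earlier cycle
        if shard_for(pk) != shard:
            continue                   # row belongs to another stream
        target[pk] = value
        copy_state[shard] = pk         # record progress alongside the row
        copied += 1
        if copied == batch_size:
            break
    return copied


source_rows = [(pk, f"row-{pk}") for pk in range(1, 21)]
targets = {shard: {} for shard in range(NUM_SHARDS)}
copy_state = {}

for shard in range(NUM_SHARDS):
    # Each call stands in for one copy phase cycle. A real system would
    # catch up on binary log events between cycles.
    while copy_batch(source_rows, targets[shard], shard, copy_state, batch_size=3):
        pass

copied_total = sum(len(rows) for rows in targets.values())
print(copied_total)  # 20
```

Because every row hashes to exactly one shard, the per-shard copies add up to the full table, and calling `copy_batch` again after completion copies nothing.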
Once the data has been copied to the new system, we continue to replicate changes from the old database instance to the PlanetScale database so that we are ready for the cutover at any point.
Again, there is a stream from the source (MySQL and unmanaged tablet pair) to each target shard's PRIMARY tablet. For each stream, the tablet on the source will connect to the external MySQL instance and initiate a COM_BINLOG_DUMP_GTID protocol command, providing the GTID executed snapshot that the migration/workflow has which corresponds to where we were when the copy phase completed (see Life of a Stream for more details). Each replication stream then continues to filter those binlog events based on the sharding scheme and forward them to the target shard's PRIMARY tablet where the changes are applied and persistent metadata is stored (in the sidecar database's vreplication table) to record the associated GTID set/position so that no matter what happens, we can restart our work and pick up where we left off. We continue to do this until we're ready to cutover.
Run a VDiff to verify that all of the data has been correctly copied and that the new system is in sync with the old, identifying any discrepancies that need to be addressed before the cutover (see Introducing VDiff V2 for more details). This is also done without incurring any downtime.
Each table in the workflow is diffed serially before the VDiff is complete. So all steps below are done for each table in the workflow.
a. We first get a named lock on the workflow in the target keyspace (in the topology server) to prevent any concurrent changes to the workflow while we are initializing the VDiff, as we will be manipulating the workflow to stop it, update it, and then restart it. Once we have the named lock on the workflow we stop the workflow for the table diff initialization done in steps a through e.
b. We then connect to the source (MySQL and unmanaged tablet pair) and initiate a consistent snapshot there to use for the comparison in the same way that we did in step 1a for the data copy. At this point we have the snapshot we need on the source side.
c. We then use the GTID position/snapshot from step b to start the stream on each target PRIMARY tablet until it has reached that given position and then it stops (this is the same thing a standard MySQL instance does for START REPLICA UNTIL). On each target shard we then set up a consistent snapshot just as we did on the source side. Now we have a consistent snapshot of the table on the source instance and each target shard that we can use for the data comparison.
d. Now we restart the VReplication workflow so that it can continue replicating changes from the source instance to the target shards.
e. At this point, we are done manipulating the workflow for the table diff, and we release the named lock on the workflow in the target keyspace taken in step a.
f. We then execute a full table scan on the source instance and all target shards, comparing the streamed results as we go along, noting any discrepancies as they are encountered (a row missing on either side or a row with different values) — the state of the diff being persisted in the sidecar database's VDiff tables on each target shard.
You can follow the progress as it goes, which includes an ETA, and when it's done you can see a detailed report which notes whether any discrepancies were found and provides details on what those differences were, allowing you to address them before the cutover (see the VDiff show command).
The VDiff will choose REPLICA tablets by default on the source and target for the data streaming (the work is still orchestrated by, and the state still stored on, the target PRIMARY tablets) to prevent any impact on the live production system. The VDiff is also fault-tolerant (it will automatically pick up where it left off if any error is encountered) and it can be done in an incremental fashion so that if e.g. you are in the pre-cutover state for many weeks or even months, you can run an initial VDiff, and then resume that one as you get closer to the cutover point.
While it is not required that this step is taken, it is highly recommended that at least one VDiff is run before the cutover to ensure that the data has been copied correctly and that the new system is in sync with the old.
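The comparison in step f can be pictured as a merge over primary-key-ordered streams. The sketch below is a simplified Python illustration of that general technique, not the VDiff implementation: `diff_sorted` and the sample rows are hypothetical, and the target shard streams are merged by primary key with the standard library's `heapq.merge` before being compared to the source.

```python
import heapq


def diff_sorted(source, target):
    """Compare two iterables of (pk, row) pairs, both sorted by pk.

    Yields (kind, pk), where kind is 'missing_on_target',
    'extra_on_target', or 'mismatch'.
    """
    src_iter, tgt_iter = iter(source), iter(target)
    src = next(src_iter, None)
    tgt = next(tgt_iter, None)
    while src is not None or tgt is not None:
        if tgt is None or (src is not None and src[0] < tgt[0]):
            yield ("missing_on_target", src[0])
            src = next(src_iter, None)
        elif src is None or tgt[0] < src[0]:
            yield ("extra_on_target", tgt[0])
            tgt = next(tgt_iter, None)
        else:
            if src[1] != tgt[1]:
                yield ("mismatch", src[0])
            src = next(src_iter, None)
            tgt = next(tgt_iter, None)


source = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]
# Two hypothetical target shards, each already ordered by primary key.
shard_a = [(1, "a"), (4, "d")]
shard_b = [(2, "B"), (5, "e")]
target = heapq.merge(shard_a, shard_b, key=lambda row: row[0])

report = list(diff_sorted(source, target))
print(report)
# [('mismatch', 2), ('missing_on_target', 3), ('extra_on_target', 4)]
```

Since both sides are streamed in the same order, the comparison needs only one row from each side in memory at a time, which is what makes a full table scan of a very large table practical.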
At some point between steps 1 and 4, the application starts sending traffic to PlanetScale rather than directly to their old system. PlanetScale then continues to route traffic back to the old system until we're ready to cutover.
Schema routing rules are put in place so that during the migration, queries against the tables being migrated will be routed to the correct destination — the external MySQL instance (old system) or the PlanetScale database (new system) depending on where we are in the migration process. When the migration starts, these rules ensure that all queries are sent to the source keyspace (old system) and they are updated accordingly when traffic is cutover along with if and when the cutover is reversed.
You can remain in this state as long as necessary as you prepare for the application cutover and perform additional testing of the application and database system.
Transparently cutover application traffic to the new system: application traffic is already going through PlanetScale, and PlanetScale will now route it to the new internal system rather than the old external one.
a. Under the hood, the MoveTables SwitchTraffic command is executed for the migration workflow.
b. It will first do some pre-checks to ensure that the traffic switch should succeed, such as checking the overall health of the tablets involved, the replication lag for the workflow (as the workflow has to fully catch up with the source before we can do the traffic switch and there's a timeout for that since this should be a brief period of time as the queries are being buffered), and other necessary state across the cluster. If everything looks good then we will proceed with the actual traffic switch.
c. We ensure that there are viable PRIMARY tablets in the source keyspace necessary to set up the reverse VReplication workflow which we will put in place when the traffic switch is complete so that the old system continues to stay in sync with the new and we can cut the traffic back over to the old system if needed for any reason. This offers even more flexibility and confidence: if any unexpected errors or performance issues occur (keep in mind that you may be going from one MySQL, or even MariaDB, version to another and from an unsharded database to a sharded one) then you can quickly revert the cutover and investigate the issue. Then once the issues are addressed you can attempt to cut the traffic back over to the new system again, without the pressure of the system being down or needing to complete the final cutover by any particular time.
d. We take a lock on the source and target keyspace in the topology server to prevent concurrent changes to these keyspaces in the cluster, along with a named lock on the workflow in the target keyspace to prevent concurrent changes to the workflow itself.
e. We stop writes on the source keyspace and begin buffering the incoming queries (see VTGate Buffering for more details) so that they can be executed on the target keyspace once the traffic switch is complete.
f. We wait for replication in the workflow to fully catch up so that the target keyspace has every write performed against the source and nothing is lost.
g. We create a reverse VReplication workflow that will replicate changes from the target keyspace (new system) back to the source keyspace (old system). This is the workflow that ensures that the old system is kept in sync with writes to the new system in case we need to revert the cutover for any reason (using the MoveTables ReverseTraffic command).
h. We initialize any Vitess Sequences that are being used in the target keyspace. This is done to seamlessly replace auto_increment usage, when the tables are being sharded as part of the migration, to provide the same functionality of auto generating incrementing unique values in a sharded environment.
i. We allow writes to the target keyspace.
j. We update the schema routing rules so that any queries against the tables being migrated will now be routed to the target keyspace (new system).
k. We start the reverse VReplication workflow created in step g.
l. We mark the original VReplication workflow as Frozen so that it is hidden and cannot be manipulated but we retain that information and state.
m. We release the keyspace and named locks taken in step d.
You can remain in this state for as long as you like. It's only when you are 100% confident in the migration and no longer need the option of cutting traffic back over to the old system that you can proceed to complete the migration, which uses the MoveTables complete command, to clean up the workflow and all of its migration related artifacts that were put in place (such as the routing rules).
See How Traffic Is Switched for additional details.
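The core of the traffic switch (buffer incoming queries, wait for replication to catch up, flip the routing rule, then drain the buffer) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how VTGate is implemented: the `Router` class, its `route` values, and the `caught_up` callback are all hypothetical names used only for illustration.

```python
from collections import deque


class Router:
    """Toy stand-in for the routing layer; not the VTGate implementation."""

    def __init__(self):
        self.route = "source"   # routing rule: where queries currently go
        self.buffering = False
        self.buffer = deque()
        self.executed = []      # (destination, query) pairs, for inspection

    def execute(self, query):
        if self.buffering:
            self.buffer.append(query)   # hold the query instead of failing it
        else:
            self.executed.append((self.route, query))

    def switch_traffic(self, caught_up):
        """Buffer queries, wait for replication, flip the route, then drain."""
        self.buffering = True
        try:
            if not caught_up():
                raise TimeoutError("replication did not catch up; route unchanged")
            self.route = "target"
        finally:
            # Whether or not the switch succeeded, release the buffered
            # queries to whichever side is now the active route.
            self.buffering = False
            while self.buffer:
                self.executed.append((self.route, self.buffer.popleft()))


router = Router()
router.execute("INSERT 1")


def caught_up():
    router.execute("INSERT 2")  # arrives mid-switch, so it is buffered
    return True


router.switch_traffic(caught_up)
router.execute("INSERT 3")
print(router.executed)
# [('source', 'INSERT 1'), ('target', 'INSERT 2'), ('target', 'INSERT 3')]
```

The query that arrives during the switch is delayed rather than rejected, which matches the brief latency spike described earlier. If the catch-up check fails, the buffered queries are released to the original route and nothing is lost.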
All of this work is done in a fault-tolerant way. This means that anything can fail throughout this process and the system will be able to recover and continue where it left off. This is critical for data imports at a certain scale where things can take many hours, days, or even weeks to complete and encountering some type of error (even an ephemeral network or connection related error across the fleet of processes involved in the migration) becomes increasingly likely.
Conclusion
Data migrations are a critical part of the lifecycle of any database system. They are sometimes necessary for upgrading to new versions of your existing database system, sharding your existing database system, or moving to an entirely new database system. You've likely been involved in past migrations that have caused downtime or other issues and may be thinking about the next migration you need to do and how you can avoid those issues.
In walking through how we perform data migrations at PlanetScale we hope that you can see ways to improve your own data migrations and avoid various pitfalls and issues that can lead to undesirable outcomes. We're happy to help you with your next data migration, whether directly as a customer through our work or indirectly as a member of our shared database community through the sharing of information and practices as we've done here.
Happy migrations!]]></content>
        <summary><![CDATA[Data migrations are a critical part of the database lifecycle, and are sometimes necessary for version upgrades, sharding, or moving to a new platform. In many cases, migrations are painful and error-prone. In this article, we walk through how migrations are performed at PlanetScale, and offer advice on how to improve the migration experience.]]></summary>
      </entry>
    
      <entry>
        <title>Faster backups with sharding</title>
        <link href="https://planetscale.com/blog/faster-backups-with-sharding" />
        <id>https://planetscale.com/blog/faster-backups-with-sharding</id>
        <published>2024-07-30T00:00:00.000Z</published>
        <updated>2024-07-30T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Scaling a database presents many challenges, one of which is backups. When your database is small, backups can be taken quickly and frequently. As your database grows into the terabytes, backups become more of a challenge. Taking a single backup can take many hours or even days, depending on hardware and network conditions. At PlanetScale, we make backing up huge databases both easy and fast, and this is accomplished with sharding.
With a PlanetScale database, you typically don't have to think too hard about the backup and restore process, as we handle all of the difficult parts for you. However, we know many are interested in learning more about what goes on "behind the curtains" of our backup process, and how sharding a database can drastically improve backup time.

How do we back up huge databases correctly, efficiently, and without having a significant negative impact on production workloads? This article aims to answer those questions.
How PlanetScale backups work
All PlanetScale databases are powered by Vitess, which exists as a proxy, sharding, and coordination layer that sits between your application and the MySQL instances. The default architecture of an unsharded Base plan database on PlanetScale looks like this:

By default, all query traffic is handled by the primary MySQL instance. The replicas primarily exist for high availability, but can also be used for handling read queries if desired.
The default configuration is to have a backup taken every 12 hours. However, the schedule is configurable, and you can also manually trigger a new database backup. Backups can be viewed and scheduled from the Backups page in the PlanetScale app:

Our internal PlanetScale API routinely checks for pending scheduled backups and starts them when necessary.
If you are managing your own independent Vitess cluster, you can configure the backup engine and storage service to suit your needs. Internally, PlanetScale uses the builtin backup engine since it works well for our backup procedure. It will store the backups themselves in either Amazon S3 or Google GCS, depending on which cloud the database cluster resides in.
The following occurs on the PlanetScale platform for each backup:
The internal PlanetScale API initiates a backup request for a database, which includes backing up both the production branch and all development branches.
This request gets passed on to PlanetScale Singularity, our internal service for managing infrastructure. Singularity spins up a new compute instance in the same cluster as the primaries and replicas. This instance will handle running VTBackup, which manages the backup process.
If a previous backup exists (the common case), this backup gets restored to the dedicated VTBackup instance. PlanetScale uses Vitess' builtin backup policy for these restore operations. Backups are retrieved from either Amazon S3 or Google GCS. Existing backups are encrypted at rest, so they are decrypted upon arrival.
After restoration is complete, VTBackup spins up a new instance of MySQL, running atop the backup that was just fetched from S3/GCS.
VTBackup then instructs the new MySQL instance to connect to the primary VTGate. It will request a checkpoint in time, and then all changes between the last backup and the checkpoint will be replicated. This is typically a very small % of the total database size.
After everything is caught up, the MySQL instance that was managing the catch-up replication for the backup is stopped.
The regular Vitess backup workflow is started, storing the new full backup to Amazon S3 or Google GCS.
When working with a sharded database, each shard can complete steps 2-7 in parallel. This parallelization allows backups to be taken quickly, even for extremely large databases. We'll look at some real-life examples of this in the next section.
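As a rough illustration of why per-shard parallelism helps, the Python sketch below runs a placeholder backup for several shards concurrently. It is not PlanetScale's orchestration code: `backup_shard` is a hypothetical function whose short sleep stands in for the restore, catch-up, and upload work, and the shard names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def backup_shard(shard: str) -> str:
    """Hypothetical per-shard backup: restore, catch up, upload."""
    time.sleep(0.1)  # stands in for hours of real work
    return f"{shard}: done"


shards = [f"shard-{i}" for i in range(8)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    # Each shard's backup runs independently of the others.
    results = list(pool.map(backup_shard, shards))
elapsed = time.monotonic() - start

print(len(results), f"{elapsed:.2f}s")
```

With one worker per shard, the wall-clock time is close to that of a single shard's backup rather than the sum of all eight, which is the effect the real-life examples in the next section show at a much larger scale.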
For step 5, we choose to connect to a primary to get caught up rather than a replica. Why is that? There's a minor trade-off between using the primary vs using a replica to do this catch-up replication. If taken from a primary, it will have the most up to date information. If taken from the replica, we avoid sending additional compute, I/O, and bandwidth demand to the primary server.

However, in our case, the primary is already performing replication to two other nodes. Also, unless it is the first backup, the primary does not need to send the full database contents to the backup server. It only needs to send what has changed since the last backup, ideally only 12 or 24 hours prior. Thus, having the backup server replicate from the primary is typically acceptable from a performance perspective. If this performance hit becomes an issue, backups can be scheduled to happen during lower traffic hours.
Backing up an unsharded database
Let's examine a backup of an unsharded database. This database is 161 gigabytes, in an unsharded environment, running with one primary and two replicas. Each primary and replica is provisioned with 8 vCPUs and 32 GB of RAM. Here's a screenshot of a recent backup of this database:

The backup took 30 minutes and 40 seconds to complete. Recall that for each backup, PlanetScale must:
Fetch the previous backup from storage
Catch up this backup
Send a fresh backup to storage
Given this, we can approximate the average network throughput for the backup by computing:
(previous_backup_size + new_backup_size) / duration
In this case, the previous backup was 163 GB (some data must have been pruned since). Using the formula, we get 176 MB/s as the average network throughput during this backup. Though that is not extraordinarily fast, it's quite a reasonable number for a database of that size. Backups are scheduled once every 12 hours, so there is no issue with one backup overlapping with another.
The formula above is only to be used for rough approximations. It is not a precise calculation. It does not take into account data compression, catch-up replication, schema structure, throttling, and other factors that affect data size and backup speed.
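For readers who want to check the arithmetic, here is the formula applied to the numbers above as a small Python snippet. The function name `avg_throughput_mb_s` is just an illustrative choice, and the calculation assumes decimal units (1 GB = 1000 MB).

```python
def avg_throughput_mb_s(previous_gb, new_gb, hours=0, minutes=0, seconds=0):
    """(previous_backup_size + new_backup_size) / duration, in MB/s."""
    duration_s = hours * 3600 + minutes * 60 + seconds
    return (previous_gb + new_gb) * 1000 / duration_s


# 163 GB previous backup, 161 GB new backup, 30 minutes and 40 seconds.
unsharded = avg_throughput_mb_s(163, 161, minutes=30, seconds=40)
print(round(unsharded))  # 176
```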
Backing up a sharded database
Let's now consider a backup of a much larger 20 terabyte sharded database. 20 terabytes is 124x the size of the previous 161 gigabyte database. Extrapolating from what we calculated before, we might think that backing up this database would take 63 hours. If true, it would be completely unacceptable for two reasons:
That's a long window of time for something to go wrong (poor network conditions, etc).
If our policy is to back up every 12 hours, backups would overlap or have to be delayed. Backups would either have to wait 2-3 days before beginning again, or multiple would have to be running simultaneously, bogging down the server and network resources.
Let's look at a recent backup from a real production 20 terabyte database running on PlanetScale:

Instead of the projected 63 hours, it took 1 hour, 39 minutes, and 4 seconds. How can this be? The answer is sharding.
This particular database is running on 32 shards, each of which has comparable resources to the unsharded 161 gigabyte database from earlier. In a sharded architecture, the data for the cluster is spread out across instances. If the data is distributed evenly, that means each shard contains approximately 625 gigabytes. Each of these can be backed up in parallel, managed and monitored by PlanetScale's infrastructure.
Using the same formula from before, this means the overall database backup operated at approximately 6.7 GB/s of throughput. That's a much more impressive number! However, each individual shard was operating at closer to 210 MB/s. This is only slightly faster transfer speed than our smaller example. Yet, due to the power of parallelization, we are able to back up the database quickly.
Any sharded database can benefit from this parallelization, whether you have 4 shards or 400.
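The numbers in this section can be reproduced with a few lines of Python. This is a back-of-the-envelope check, not a measurement: it assumes decimal units, an even data distribution across shards, and that the previous backup was also roughly 20 terabytes (the article does not state its size).

```python
total_gb = 20_000                     # 20 terabytes
shards = 32
duration_s = 1 * 3600 + 39 * 60 + 4   # 1 hour, 39 minutes, 4 seconds

# Naive extrapolation from the unsharded backup (161 GB in 30m40s).
projected_hours = (total_gb / 161) * (30 * 60 + 40) / 3600

# Same formula as before: (previous + new) / duration, with the
# previous backup assumed to be about the same size as the new one.
cluster_gb_s = (2 * total_gb) / duration_s
per_shard_mb_s = cluster_gb_s * 1000 / shards

print(round(projected_hours), round(cluster_gb_s, 1), round(per_shard_mb_s))
# 63 6.7 210
```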

For fun, let's take a look at one other recent backup of a large, sharded database hosted on PlanetScale:

This backup took a mere 3 hours, 37 minutes, and 11 seconds. This database is an order of magnitude larger than the last example and is running on 256 shards. Again using the same formula, the database backup operated at an average of 35 GB/s of throughput. Extremely fast! However, each shard is only responsible for ~900 gigabytes of data and has an average throughput of 137 MB/s throughout the backup. The performance comes from the power of parallelization, which a sharded database allows for.
Recovering from failure
The speed benefits that one gains from backing up in parallel with sharding also apply in reverse when performing a full database restore. All of the same parallelization can be used, each shard individually restoring the data it is responsible for. This allows the restoration of a massive database to take mere hours rather than days or weeks. Though full database restores should be a less common operation, it's good to know that it can be accomplished quickly in case of an emergency.
Why are backups important?
At first, this may seem like a question with an obvious answer: I want to back up my DB in case the server (or disk on the server) crashes! This is true in some cases, but the introduction of replication adds more nuance to this question. The two replicas already act as a form of "backups" of the primary, helping in the case of a primary server failure. If the primary goes down, there's no need to spin up a new primary directly from a backup. Instead, PlanetScale elects an existing replica to become the new primary. So if we're already "safe" from primary failure, what else are backups important for?
For one, backups are needed when creating a new replica. When the primary goes down, we promote a replica to a primary, but now we need a replacement replica! To do this, Vitess spins up a new empty replica server, restores a backup to that server, then points it to the primary to get caught-up via replication. Without the ability to restore from backup, the new replica would need to replicate the entire database from the primary. This would take a long time and have a negative impact on performance. Backups allow for new replica creation to happen with less negative performance impact on the primary.
Backups also provide a snapshot in time of the entire DB state. This is crucial for some situations.
As one example, we recently heard a story from one of our customers about how a backup was crucial for recovering deleted data. A customer on their platform accidentally deleted a bunch of information from their account, which in turn dropped many rows from their PlanetScale database. These changes go to the primary and are propagated to both replicas. The application also did not have a "soft-delete" feature, meaning that the data was really deleted, rather than just hidden. However, this data still existed in one of the recent backups, and thus was able to be restored. Accidental deletion of data can also be caused by application bugs or malicious attacks. Backups provide a safety net for these scenarios.
PlanetScale makes it simple to restore data from a backup. Backups can be restored to a development branch, and then browsed and cherry-picked as necessary. In fact, when a new replica is spun up after a primary failure, a backup is used to seed the new replica. Then, the regular replication flow is used to catch the replica up with all changes on the newly elected primary.
Backups are also needed to perform point in time recovery in Vitess.
Conclusion
Sharding has many benefits, and backup time is no exception. If backup frequency and overall backup time are important to you and your infrastructure team, using a sharded database is a great way to keep backups performant.]]></content>
        <summary><![CDATA[Sharding a database comes with many benefits: Scalability, failure isolation, write throughput, and more. However, one of the lesser-known benefits comes from improved backups and restore performance.]]></summary>
      </entry>
    
      <entry>
        <title>Building data pipelines with Vitess</title>
        <link href="https://planetscale.com/blog/building-data-pipelines-with-vitess" />
        <id>https://planetscale.com/blog/building-data-pipelines-with-vitess</id>
        <published>2024-07-29T15:00:00.000Z</published>
        <updated>2024-07-29T15:00:00.000Z</updated>
        
        <author>
          <name>Matt Lord</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Vitess is a popular CNCF project that is used to scale some of the largest MySQL installations in the world, by companies like Slack, Square, Shopify, and GitHub. It provides sharding, connection pooling, and many other features that make it easy to scale MySQL horizontally.
Vitess and MySQL are ideally suited for use as an Online Transaction Processing (OLTP) system, where the end-user interacts directly with the system and fast response times are essential as they get product and service information, generating critical business records such as orders, user profiles, and more. They are not optimized for Online Analytical Processing (OLAP) workloads and other use cases and needs that you will encounter as your product, company, and data needs grow. This is where Change Data Capture (CDC), AKA ETL or Extract-Transform-Load, and Data Pipelines more generally come into play: they allow you to maintain in-sync copies of data across various systems that serve specific needs. CDC is a technique used to track changes in a database and propagate them to other systems, which is useful for a variety of use cases, including data replication, data warehousing, and data integration. For example, you can maintain a Data Warehouse and/or Data Lake for analytics and reporting purposes (e.g. quarterly and yearly sales reports), or integrate data with other systems where you need to work with data that is initially created and updated by your OLTP system.
Vitess primitives
Vitess has a number of primitives or building blocks that make it easy to build your data pipelines. These are features of VReplication, a powerful system that allows for various types of data replication and transformation. For CDC and similar use cases, VReplication provides the VStream API in VTGates (Vitess Gateways) that allows you to stream changes from a Vitess cluster in real-time.
This low-level VStream primitive is then used by popular CDC tools like Debezium to capture changes in Vitess and propagate them to other systems. PlanetScale also uses the VStream API to build the Connect feature, using additional open source drivers for popular CDC/ETL services such as Airbyte (source) and Fivetran (source).
A look under the hood at VStream
VStream is a low-level component, provided via gRPC, that is used internally by VReplication to replicate data within Vitess for various workflow types such as MoveTables and Reshard, typically from one VTTablet to another. The VTGate VStream RPC leverages this low-level component to stream data from the Shards within a Vitess Keyspace, providing a single unified change stream spanning the logical database, which may consist of hundreds or even thousands of shards. You can see a simple example client that uses the VStream API directly here.
If you are interested in the lower-level aspects, here are commands that you can run yourself to see what the output looks like (not necessary if you're going to use an existing connector/driver such as the Debezium Connector for Vitess):
git clone git@github.com:vitessio/vitess.git
cd vitess
git checkout main
make build
cd examples/local

./101_initial_cluster.sh; mysql < ../common/insert_commerce_data.sql; ./201_customer_tablets.sh; ./202_move_tables.sh; ./203_switch_reads.sh; ./204_switch_writes.sh; ./205_clean_commerce.sh; ./301_customer_sharded.sh; ./302_new_shards.sh; ./303_reshard.sh; ./304_switch_reads.sh; ./305_switch_writes.sh; ./306_down_shard_0.sh; ./307_delete_shard_0.sh

go run vstream_client.go

# In another terminal, connecting to the VTGate that was started
for i in {1..10}; do
  command mysql --no-defaults -h 127.0.0.1 -P 15306 customer -e "insert into customer (email) values ('${i}@foo.com')"
done

# Cleanup whenever you're done testing
./401_teardown.sh

The VStream client will output the changes that are being streamed from the VTGate. The output looks like this, first snapshotting the current state of the customer table in the sharded customer keyspace and then streaming the subsequent changes to the table as they happen in real-time:
go run vstream_client.go
[type:BEGIN keyspace:"customer" shard:"80-" type:FIELD field_event:{table_name:"customer.customer" fields:{name:"customer_id" type:INT64 table:"customer" org_table:"customer" database:"vt_customer" org_name:"customer_id" column_length:20 charset:63 flags:53251 column_type:"bigint"} fields:{name:"email" type:VARBINARY table:"customer" org_table:"customer" database:"vt_customer" org_name:"email" column_length:128 charset:63 flags:128 column_type:"varbinary(128)"} keyspace:"customer" shard:"80-" enum_set_string_values:true} keyspace:"customer" shard:"80-"]
[type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-58"}} keyspace:"customer" shard:"80-"]
[type:ROW row_event:{table_name:"customer.customer" row_changes:{after:{lengths:1 lengths:14 values:"4dan@domain.com"}} keyspace:"customer" shard:"80-"} keyspace:"customer" shard:"80-" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-58" table_p_ks:{table_name:"customer" lastpk:{fields:{name:"customer_id" type:INT64 charset:63 flags:53251} rows:{lengths:1 values:"4"}}}}} keyspace:"customer" shard:"80-" type:COMMIT keyspace:"customer" shard:"80-"]
[type:BEGIN keyspace:"customer" shard:"80-" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-58"}} keyspace:"customer" shard:"80-" type:COMMIT keyspace:"customer" shard:"80-"]
[type:COPY_COMPLETED keyspace:"customer" shard:"80-"]
[type:BEGIN keyspace:"customer" shard:"-80" type:FIELD field_event:{table_name:"customer.customer" fields:{name:"customer_id" type:INT64 table:"customer" org_table:"customer" database:"vt_customer" org_name:"customer_id" column_length:20 charset:63 flags:53251 column_type:"bigint"} fields:{name:"email" type:VARBINARY table:"customer" org_table:"customer" database:"vt_customer" org_name:"email" column_length:128 charset:63 flags:128 column_type:"varbinary(128)"} keyspace:"customer" shard:"-80" enum_set_string_values:true} keyspace:"customer" shard:"-80"]
[type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-58"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-58"}} keyspace:"customer" shard:"-80"]
[type:ROW row_event:{table_name:"customer.customer" row_changes:{after:{lengths:1 lengths:16 values:"1alice@domain.com"}} keyspace:"customer" shard:"-80"} keyspace:"customer" shard:"-80" type:ROW row_event:{table_name:"customer.customer" row_changes:{after:{lengths:1 lengths:14 values:"2bob@domain.com"}} keyspace:"customer" shard:"-80"} keyspace:"customer" shard:"-80" type:ROW row_event:{table_name:"customer.customer" row_changes:{after:{lengths:1 lengths:18 values:"3charlie@domain.com"}} keyspace:"customer" shard:"-80"} keyspace:"customer" shard:"-80" type:ROW row_event:{table_name:"customer.customer" row_changes:{after:{lengths:1 lengths:14 values:"5eve@domain.com"}} keyspace:"customer" shard:"-80"} keyspace:"customer" shard:"-80" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-58" table_p_ks:{table_name:"customer" lastpk:{fields:{name:"customer_id" type:INT64 charset:63 flags:53251} rows:{lengths:1 values:"5"}}}} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-58"}} keyspace:"customer" shard:"-80" type:COMMIT keyspace:"customer" shard:"-80"]
[type:BEGIN keyspace:"customer" shard:"-80" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-58"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-58"}} keyspace:"customer" shard:"-80" type:COMMIT keyspace:"customer" shard:"-80"]
[type:COPY_COMPLETED keyspace:"customer" shard:"-80" type:COPY_COMPLETED]
[type:BEGIN timestamp:1720544456 current_time:1720544456166536000 keyspace:"customer" shard:"-80" type:FIELD timestamp:1720544456 field_event:{table_name:"customer.customer" fields:{name:"customer_id" type:INT64 table:"customer" org_table:"customer" database:"vt_customer" org_name:"customer_id" column_length:20 charset:63 flags:53251 column_type:"bigint"} fields:{name:"email" type:VARBINARY table:"customer" org_table:"customer" database:"vt_customer" org_name:"email" column_length:128 charset:63 flags:128 column_type:"varbinary(128)"} keyspace:"customer" shard:"-80"} current_time:1720544456168488000 keyspace:"customer" shard:"-80" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10001@foo.com"}} keyspace:"customer" shard:"-80" flags:1} current_time:1720544456168646000 keyspace:"customer" shard:"-80" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-59"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-58"}} keyspace:"customer" shard:"-80" type:COMMIT timestamp:1720544456 current_time:1720544456168652000 keyspace:"customer" shard:"-80"]
[type:BEGIN timestamp:1720544456 current_time:1720544456182035000 keyspace:"customer" shard:"80-" type:FIELD timestamp:1720544456 field_event:{table_name:"customer.customer" fields:{name:"customer_id" type:INT64 table:"customer" org_table:"customer" database:"vt_customer" org_name:"customer_id" column_length:20 charset:63 flags:53251 column_type:"bigint"} fields:{name:"email" type:VARBINARY table:"customer" org_table:"customer" database:"vt_customer" org_name:"email" column_length:128 charset:63 flags:128 column_type:"varbinary(128)"} keyspace:"customer" shard:"80-"} current_time:1720544456183630000 keyspace:"customer" shard:"80-" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10012@foo.com"}} keyspace:"customer" shard:"80-" flags:1} current_time:1720544456183642000 keyspace:"customer" shard:"80-" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-59"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-59"}} keyspace:"customer" shard:"80-" type:COMMIT timestamp:1720544456 current_time:1720544456183649000 keyspace:"customer" shard:"80-"]
[type:BEGIN timestamp:1720544456 current_time:1720544456197796000 keyspace:"customer" shard:"-80" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10023@foo.com"}} keyspace:"customer" shard:"-80" flags:1} current_time:1720544456197810000 keyspace:"customer" shard:"-80" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-60"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-59"}} keyspace:"customer" shard:"-80" type:COMMIT timestamp:1720544456 current_time:1720544456197814000 keyspace:"customer" shard:"-80"]
[type:BEGIN timestamp:1720544456 current_time:1720544456211383000 keyspace:"customer" shard:"80-" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10034@foo.com"}} keyspace:"customer" shard:"80-" flags:1} current_time:1720544456211392000 keyspace:"customer" shard:"80-" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-60"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-60"}} keyspace:"customer" shard:"80-" type:COMMIT timestamp:1720544456 current_time:1720544456211398000 keyspace:"customer" shard:"80-"]
[type:BEGIN timestamp:1720544456 current_time:1720544456224248000 keyspace:"customer" shard:"80-" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10045@foo.com"}} keyspace:"customer" shard:"80-" flags:1} current_time:1720544456224258000 keyspace:"customer" shard:"80-" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-60"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-61"}} keyspace:"customer" shard:"80-" type:COMMIT timestamp:1720544456 current_time:1720544456224261000 keyspace:"customer" shard:"80-"]
[type:BEGIN timestamp:1720544456 current_time:1720544456237018000 keyspace:"customer" shard:"80-" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10056@foo.com"}} keyspace:"customer" shard:"80-" flags:1} current_time:1720544456237029000 keyspace:"customer" shard:"80-" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-60"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-62"}} keyspace:"customer" shard:"80-" type:COMMIT timestamp:1720544456 current_time:1720544456237031000 keyspace:"customer" shard:"80-"]
[type:BEGIN timestamp:1720544456 current_time:1720544456249777000 keyspace:"customer" shard:"80-" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10067@foo.com"}} keyspace:"customer" shard:"80-" flags:1} current_time:1720544456250142000 keyspace:"customer" shard:"80-" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-60"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-63"}} keyspace:"customer" shard:"80-" type:COMMIT timestamp:1720544456 current_time:1720544456250150000 keyspace:"customer" shard:"80-"]
[type:BEGIN timestamp:1720544456 current_time:1720544456263391000 keyspace:"customer" shard:"80-" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10078@foo.com"}} keyspace:"customer" shard:"80-" flags:1} current_time:1720544456263407000 keyspace:"customer" shard:"80-" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-60"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-64"}} keyspace:"customer" shard:"80-" type:COMMIT timestamp:1720544456 current_time:1720544456263411000 keyspace:"customer" shard:"80-"]
[type:BEGIN timestamp:1720544456 current_time:1720544456276388000 keyspace:"customer" shard:"-80" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:9 values:"10089@foo.com"}} keyspace:"customer" shard:"-80" flags:1} current_time:1720544456276398000 keyspace:"customer" shard:"-80" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-61"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-64"}} keyspace:"customer" shard:"-80" type:COMMIT timestamp:1720544456 current_time:1720544456276402000 keyspace:"customer" shard:"-80"]
[type:BEGIN timestamp:1720544456 current_time:1720544456289697000 keyspace:"customer" shard:"-80" type:ROW timestamp:1720544456 row_event:{table_name:"customer.customer" row_changes:{after:{lengths:4 lengths:10 values:"100910@foo.com"}} keyspace:"customer" shard:"-80" flags:1} current_time:1720544456289711000 keyspace:"customer" shard:"-80" type:VGTID vgtid:{shard_gtids:{keyspace:"customer" shard:"-80" gtid:"MySQL56/90a3e2d2-3e14-11ef-bb33-30b3ef9417b6:1-62"} shard_gtids:{keyspace:"customer" shard:"80-" gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-64"}} keyspace:"customer" shard:"-80" type:COMMIT timestamp:1720544456 current_time:1720544456289714000 keyspace:"customer" shard:"-80"]

If you are interested in additional lower-level details, you can check out the VStream API documentation.
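One detail of the output above that is easy to miss: each ROW event packs all column values into a single values string, and the lengths entries say how to cut it apart. For example, lengths:4 lengths:9 with values:"10001@foo.com" is customer_id 1000 and email 1@foo.com. The following is a minimal Python sketch of that decoding. The split_row helper is hypothetical and not part of any Vitess client library, and it assumes that a negative length marks a NULL column, as described for the Vitess Row message.

```python
from typing import List, Optional


def split_row(lengths: List[int], values: bytes) -> List[Optional[bytes]]:
    """Split a concatenated row payload into per-column values.

    Assumption: a negative length means the column is NULL and
    consumes no bytes from the payload.
    """
    columns: List[Optional[bytes]] = []
    offset = 0
    for length in lengths:
        if length < 0:
            columns.append(None)
            continue
        columns.append(values[offset:offset + length])
        offset += length
    return columns


# Rows taken from the sample output above
snapshot_row = split_row([1, 14], b"4dan@domain.com")
streamed_row = split_row([4, 9], b"10001@foo.com")
print(snapshot_row)
print(streamed_row)
```

The column names and types needed to interpret each slice come from the preceding FIELD event for the same table.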
An example setup
You could use a similar setup to the one described here, but using the Debezium Connector for Vitess rather than the Debezium Connector for MySQL and an AWS RedShift instance rather than PostgreSQL as the target (with RedShift being based on PostgreSQL). This also demonstrates the general rule that in setting these kinds of systems up you would use a Vitess variant of the connector/driver rather than the MySQL one, with things otherwise being the same.
Happy streaming!]]></content>
        <summary><![CDATA[Learn the basics of Change Data Capture (CDC) and how to leverage Vitess VStream API to build data pipelines.]]></summary>
      </entry>
    
      <entry>
        <title>The State of Online Schema Migrations in MySQL</title>
        <link href="https://planetscale.com/blog/state-of-online-schema-migrations-in-mysql" />
        <id>https://planetscale.com/blog/state-of-online-schema-migrations-in-mysql</id>
        <published>2024-07-23T00:00:00.000Z</published>
        <updated>2024-07-23T00:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[How do you run non-blocking schema changes in MySQL? This is an eternal question. With a plethora of 3rd party solutions and with recent advancements in MySQL, it's difficult to track which solution is preferable for a given schema migration. In this post, we provide a high level overview of the state of MySQL online schema migrations in 2024. We limit the discussion to ALTER TABLE statements, as other DDL statements are typically fast (DROP TABLE is somewhat of an exception, but out of scope of this post).
We'll first examine the native MySQL options: INPLACE and INSTANT. For reference, see Online DDL Operations MySQL 8.0 documentation.
INPLACE, aka InnoDB Online DDL
This is MySQL's first take on non-blocking schema changes. Some types of ALTER TABLE (see above link for exhaustive list of supported changes) are eligible to run with ALGORITHM=INPLACE. An INPLACE schema change is technically non-blocking, with quite a few caveats:
On the server where the query is submitted, normally the primary database server, DML queries (SELECT, INSERT, UPDATE, DELETE, ...) are non-blocking and may proceed to execute. Other DDL statements will block, and that's expected.
The operation is resource greedy: the MySQL server will use as much CPU and disk I/O to complete the change as it can. This can and will impact performance on busy servers.
It requires extra disk space, up to as much disk space as the original table.
It is uninterruptible. The only way to abort is to kill the query aggressively. This then leads to a further massive cleanup operation, consuming more disk I/O.
On replica servers, the operation is NOT non-blocking. Meaning if the ALTER TABLE took 3 hours on the primary server, then from the moment it completes you can expect replication to stall while applying that same change for the next 3 hours or so, creating a massive 3 hour lag.
The replication issue is a deal breaker for most. One way around it is to run the ALTER TABLE on the primary with SQL_LOG_BIN=0 so that it does not replicate, and then run it the same way on each replica individually. This technique works, but it can cause inconsistencies. Did you track all servers? What if you subsequently restore (or bootstrap) a server from backup, where the change never took place? How do you track that? Moreover, this technique will take n times longer to complete, as you need to run the change individually for each server. You may parallelize some of the work, but probably not all of it.
INPLACE conclusions
For these reasons we find that INPLACE is not a good option for non-blocking changes.
INSTANT schema changes
Instant schema changes are almost a holy grail in the world of databases, and where it works, it's the next best thing after pizza (pending any bugs). MySQL offers support for some schema changes to run with ALGORITHM=INSTANT. Originally contributed to MySQL by Tencent six years ago, INSTANT DDL only supported a single type of change: ADD COLUMN. Later on MySQL added support for more changes, like expanding an enum column, or adding and dropping VIRTUAL columns. Recently (one year ago), MySQL 8.0.29 added support for arbitrary ADD COLUMN and DROP COLUMN operations.
INSTANT truly runs instantly. It does not need to copy a table, does not need extra disk space, does not hammer the CPU. There's nothing to interrupt because the operation terminates before you've blinked. It also runs instantly on the replicas.
It sounds perfect! And it mostly is, where supported, and with a bit of nuance. Consider again the documentation for supported operations. Looking closely, you can see these types of supported changes:
Changing a column default value.
Adding/removing VIRTUAL columns.
Modify an enum definition.
And more.
What these changes have in common is that they're all metadata changes. They do not affect existing rows, do not modify the data, do not restructure the table, do not affect indexes. ADD COLUMN & DROP COLUMN are the only supported changes that actually affect table data or how the data is structured. As another caveat, you cannot DROP COLUMN using INSTANT DDL if that column participates in an index.
So you can change a column's default value from 0 to 1, but you cannot make a nullable column non-nullable. You can add and drop GENERATED VIRTUAL columns, but cannot add and drop GENERATED STORED columns. You can modify an enum definition, but you cannot modify a column's type from int to bigint.
The list of unsupported changes includes:
Changing a column's data type.
Adding a column with non-literal default value.
Adding indexes.
Modifying a PRIMARY KEY definition.
Adding/removing foreign keys.
Changing a table's character set.
Making partitioning changes.
Which is to say, there's a long way to go before INSTANT DDL can satisfy the common needs of schema changes. Where possible, INSTANT DDL is wonderful, and in many situations is the preferable and recommended way to go.
INSTANT risks
At first, INSTANT appears to be risk-free. In most situations, it is! Even if you make a mistake, it can be corrected with a counter-INSTANT operation. That's true for most changes, except when data is destroyed. At this time, the one destructive statement supported by INSTANT DDL is DROP COLUMN (for "real", non-VIRTUAL columns).
Dropping a column has two main risks to it:
The obvious risk of losing important data, if executed prematurely or accidentally.
The risk of breaking existing queries.
Losing data can obviously be a massive incident, the cause for outage and for long hours or days of recovery. But why is this an INSTANT risk in particular? It's the same damage whether INSTANT or not, right?
The answer is with the human behavior of always choosing INSTANT where possible. You're an instant away from destroying your data, and with no barriers to hold you back, nor a mechanism (short of backups and delayed replicas) to take you back to safety. We'll discuss this shortly as we introduce the concept of Revertibility (specifically in Vitess).
How about breaking existing queries? Maybe the data is truly expendable, or safely aggregated elsewhere, but perhaps a bunch of SELECT or INSERT queries still reference the column?
MySQL offers invisible columns as a means to emulate what your table might look like without a given column. However, it is limited. It only affects queries that do not explicitly use the column name, such as SELECT * FROM my_table ... or INSERT INTO my_table VALUE (...). But any SELECT the_column FROM my_table query still has full access to the column. In today's world, SELECT * and blind INSERT queries are not as common. Frameworks, tooling, and modern engineering paradigms all tend to be explicit and fully qualified. Invisible columns do not help here.
If dropping the column did cause queries to break, you will then need to either fix all the queries, or attempt to re-introduce the column. Let's now discuss revertibility.
Revertibility
What are your options for undoing a change? For switching back to the previous schema? Let's illustrate using two simple examples.
Say your change was to ALTER TABLE my_table ADD COLUMN name .... This looks harmless, and yet can cause downtime. name can be a common column name. Queries selecting name in a multi-table statement, such as SELECT name, value FROM my_table JOIN another_table USING ..., could fail due to the new ambiguity of name column.
The anti-change for ADD COLUMN is a DROP COLUMN, and since both are supported by INSTANT DDL, chances are you'll be able to recover quickly and relatively safely.
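The ambiguity failure described above is easy to reproduce. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere. SQLite is not MySQL and the exact error text differs, but the failure mode is the same: a query that was valid before the ADD COLUMN stops working afterwards. The table definitions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE another_table (id INTEGER PRIMARY KEY, name TEXT)")

query = "SELECT name, value FROM my_table JOIN another_table USING (id)"

# Before the change: only another_table has a `name` column, so this works.
conn.execute(query)

# The seemingly harmless schema change.
conn.execute("ALTER TABLE my_table ADD COLUMN name TEXT")

# After the change: `name` exists in both tables, so the unqualified
# reference can no longer be resolved.
try:
    conn.execute(query)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)
```

Qualifying the column in application queries (another_table.name) avoids this class of breakage regardless of how the schema change is executed.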
What if your change was a DROP COLUMN? Lost data aside, what is the anti-change you'd apply to restore the previous schema? Not only was data lost, but also metadata. What was the column type? Length? Was it nullable? That information cannot be inferred unless you have the previous schema. In all likelihood, you use version control to manage your schema and are thus able to extract the previous definition. It is worth pointing out, though, that crafting the anti-change of a schema migration is nontrivial.
INSTANT conclusions
Where possible, INSTANT is often the best approach for making online schema changes. However, it is too limited at this time and does not support the majority of common schema changes. It does not provide revertibility in case of data destruction. The MySQL team does not publish concrete plans for INSTANT DDL support in future versions of MySQL.
Solutions external to MySQL
A number of 3rd party tools are available today for running online schema changes for MySQL. We will focus on Vitess, the technology behind PlanetScale's non-blocking schema changes. Other 3rd party tools include gh-ost, pt-online-schema-change, recent newcomer spirit, and others.
These tools all share a similar basic design, but operate differently. The major characteristics shared by all are:
They mimic an ALTER TABLE by creating a shadow table with the new schema and slowly copying over data.
They can and often will take longer to complete than a native MySQL ALTER TABLE.
They require extra disk space, about as much as the existing table (less if you consider fragmentation, more if you're adding bloated indexes, etc.)
They cause binary log bloating (essentially the entire table content goes through the binary logs).
They respect production workload, and will pause or throttle as needed so as to give way to production traffic (hence they're likely to run longer).
They operate in small batches of changes, hence are able to keep replication lag to a minimum (and throttle based on lag).
They are interruptible: the operation can be aborted at no immediate cost (cleanup can be done at a later stage).
They are capable of handling almost every single kind of schema change.
Most have foreign key limitations.
Some partitioning options are not recommended, or are plain incorrect to run using these tools.
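To make the shadow-table idea from the list above concrete, here is a deliberately simplified Python sketch that runs against an in-memory SQLite database. It is not how Vitess, gh-ost, or any other tool is implemented: real tools also capture writes that happen during the copy (for example by reading the binary log), throttle based on load and replication lag, and cut over atomically. The shadow_migrate function, the table names, and the assumption of an integer id primary key listed first are all hypothetical.

```python
import sqlite3


def shadow_migrate(conn, table, shadow_ddl, columns, batch_size=2):
    """Copy `table` into a shadow table with a new schema, then swap names.

    Assumes `columns` starts with an integer primary key named `id` and
    that nothing writes to `table` while the copy is running.
    """
    shadow = f"_{table}_shadow"
    conn.execute(shadow_ddl.format(name=shadow))

    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    last_id = 0
    while True:
        # Copy in small batches so no single statement runs for long.
        batch = conn.execute(
            f"SELECT {col_list} FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            break
        conn.executemany(
            f"INSERT INTO {shadow} ({col_list}) VALUES ({placeholders})", batch
        )
        conn.commit()
        last_id = batch[-1][0]

    # Cut-over: keep the old table around so cleanup can happen later.
    conn.execute(f"ALTER TABLE {table} RENAME TO _{table}_old")
    conn.execute(f"ALTER TABLE {shadow} RENAME TO {table}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customer (email) VALUES (?)",
    [(f"{i}@foo.com",) for i in range(1, 6)],
)
conn.commit()

shadow_migrate(
    conn,
    "customer",
    "CREATE TABLE {name} (id INTEGER PRIMARY KEY, email TEXT, name TEXT DEFAULT '')",
    ["id", "email"],
)

migrated_columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(migrated_columns, row_count)
```

Even in this toy form, several of the characteristics listed above are visible: the copy needs roughly as much space as the original table, it proceeds in small batches, and abandoning it midway only requires dropping the shadow table.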
To get it out of the way: if your table has a color enum('red','green','blue') column, and you want to add a new enum value, making it color enum('red','green','blue','orange'), you're better off using INSTANT DDL. There are a handful of such cases that are supported by INSTANT DDL, as mentioned above, where it just doesn't make sense to spend hours on a migration.
However, for the (still vast) majority of changes, these are still the go-to solutions. First, of course, we've already established that neither INSTANT nor INPLACE cover all types of changes (they cover a minority of possible changes). But this also leads to an emerging behavior: maintaining two different techniques in your flow/automation creates more complexity. If you already have to use one of the 3rd party solutions, you may as well use it all the time.
Both Vitess and spirit go the extra mile and can auto-detect when a migration can be fulfilled using INSTANT DDL, which means you don't need to think about it or be aware of which particular version supports which changes.
Vitess further supports revertibility as a first-class citizen: it can not only revert to the original schema, but also preserve the would-be lost data, while still accounting for any data added, updated, or removed since the change.
Note on partitioning
Partitioning is a strange beast, implemented in MySQL by creating a "small" table per partition. As such, operations on partitions are really operations on sets of tables. Some partitioning-related changes should only be served by MySQL, such as a DROP PARTITION statement on a RANGE partitioned table. Some other partitioning changes are better served by MySQL, and some are best served by online schema change tools.
3rd party conclusions
Most use cases are best served (or only well served) by 3rd party online schema change tools, and those are still the way to go for the foreseeable future.]]></content>
        <summary><![CDATA[Learn about the options for running non-blocking schema changes natively to MySQL, using Vitess, or other tools]]></summary>
      </entry>
    
      <entry>
        <title>Optimizing aggregation in the Vitess query planner</title>
        <link href="https://planetscale.com/blog/optimizing-aggregation-in-the-vitess-query-planner" />
        <id>https://planetscale.com/blog/optimizing-aggregation-in-the-vitess-query-planner</id>
        <published>2024-07-22T00:00:00.000Z</published>
        <updated>2024-07-22T00:00:00.000Z</updated>
        
        <author>
          <name>Andres Taylor</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Introduction
I recently encountered an intriguing bug. A user reported that their query was causing VTGate to fetch a large amount of data, sometimes resulting in an Out Of Memory (OOM) error. For a deeper understanding of grouping and aggregations on Vitess, I recommend reading this prior blog post.
The Query
The problematic query was:
select sum(user.type)
from user
    join user_extra on user.team_id = user_extra.id
group by user_extra.id
order by user_extra.id;

The planner was unable to delegate aggregation to MySQL, leading to the fetching of a significant amount of data for aggregation.
Planning and Tree Rewriting
During the planning phase, we perform extensive tree rewriting to push as much work down under Routes as possible. This involves repeatedly rewriting the tree until no further changes occur during a full pass of the tree, a state known as the fixed-point. The goal of this rewriting process is to optimize query execution by pushing operations closer to the data.
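To make the fixed-point idea concrete, here is a minimal sketch in Python. It is not the Vitess planner: the Op type, the single "push Ordering under Filter" rule, and every function name are hypothetical and exist only to show why several full passes can be needed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Op:
    """A toy plan operator: a name plus at most one input."""
    name: str
    child: Optional["Op"] = None


Rewriter = Callable[[Op], Op]


def push_ordering_under_filter(op: Op) -> Op:
    """Hypothetical rule: Ordering(Filter(x)) becomes Filter(Ordering(x))."""
    if op.name == "Ordering" and op.child is not None and op.child.name == "Filter":
        return Op("Filter", Op("Ordering", op.child.child))
    return op


def rewrite_once(op: Op, rules: list[Rewriter]) -> Op:
    """One full bottom-up pass: rewrite the input first, then this node."""
    if op.child is not None:
        op = Op(op.name, rewrite_once(op.child, rules))
    for rule in rules:
        op = rule(op)
    return op


def rewrite_to_fixed_point(op: Op, rules: list[Rewriter]) -> Op:
    """Repeat full passes until a pass leaves the tree unchanged."""
    while True:
        rewritten = rewrite_once(op, rules)
        if rewritten == op:
            return op
        op = rewritten


def shape(op: Optional[Op]) -> list[str]:
    """Operator names from root to leaf, for easy inspection."""
    names = []
    while op is not None:
        names.append(op.name)
        op = op.child
    return names
```

Starting from Ordering over two stacked Filters, a single pass moves the Ordering down by only one level. A second pass moves it again, and a third pass confirms nothing else changes, at which point the loop stops.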
Initial Plan
The first plan after horizon expansion looked like this:
Ordering (user_extra.id)
└── Aggregator (ORG sum(`user`.type), user_extra.id group by user_extra.id)
    └── ApplyJoin on [`user`.team_id | :user_team_id = user_extra.id | `user`.team_id = user_extra.id]
        ├── Route (Scatter on user)
        │   └── Table (user.user)
        └── Route (Scatter on user)
            └── Filter (:user_team_id = user_extra.id)
                └── Table (user.user_extra)

Trying to Optimize the Plan
We don't split aggregation between MySQL and VTGate in the initial phases, so we couldn't immediately push down the aggregation through the join. However, we can push down ordering under the aggregation.
Pushing Ordering Under Aggregation
By pushing ordering under aggregation, the plan changes to:
Aggregator (ORG sum(`user`.type), user_extra.id group by user_extra.id)
└── Ordering (user_extra.id)
    └── ApplyJoin on `user`.team_id = user_extra.id
...

We can't push the ordering further down since it's sorted by the right hand side of the join. Ordering can only be pushed down to the left hand side. This leaves us in an unfortunate situation: ordering is blocking the aggregator from being pushed down, which means we have to fetch all that data and sort it to do the aggregation.
The Solution
The solution I typically use in these situations involves leveraging the phases we have in the planner.
Phases
We have several phases that run sequentially. After completing a phase, we run the push-down rewriters, then move to the next phase, and so on.
Phases perform one of two functions:
Running a rewriter over the plan to perform a specific task. For example, the "pull DISTINCT from UNION" rewriter extracts the DISTINCT part from UNION and uses a separate operator for it.
Controlling when push-down rewriters are enabled. Some rewriters only turn on after reaching a certain phase.
By delaying the "ordering under aggregation" rewriter until the "split aggregation" phase, we can push down the aggregation under the join. This doesn't stop the "ordering under aggregation" rewriter from doing its job; it just has to wait a bit before doing it.
The final tree looks like this:
Aggregator (sum(`user`.type) group by user_extra.col)
└── Projection (sum(`user`.type) * count(*), user_extra.col)
    └── Ordering (user_extra.col)
        └── ApplyJoin (on [`user`.team_id = user_extra.id])
            ├── Route (Scatter on user)
            │   └── Aggregator (sum(type) group by team_id)
            │       └── Table (user)
            └── Route (Scatter on user_extra)
                └── Aggregator (count(*) group by user_extra.col)
                    └── Filter (:user_team_id = user_extra.id)
                        └── Table (user_extra)

Most of the aggregation has been pushed down to MySQL, and at the VTGate level, we are left with only SUMming the SUMs we get from each shard.
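Why the projection multiplies sum(`user`.type) by count(*) may not be obvious, so here is a small worked example in Python. The rows are made up and the functions are illustrative only. They show that summing after the join gives the same totals as summing on the left side, counting on the right side, and multiplying the two per join key.

```python
from collections import defaultdict

# Made-up rows: (team_id, type) for user, and id for user_extra.
user_rows = [(1, 10), (1, 5), (2, 7), (3, 4)]
user_extra_ids = [1, 1, 2, 2, 2]


def aggregate_after_join(users, extra_ids):
    """Join every matching row pair first, then sum: the expensive plan."""
    totals = defaultdict(int)
    for team_id, type_value in users:
        for extra_id in extra_ids:
            if team_id == extra_id:
                totals[extra_id] += type_value
    return dict(totals)


def aggregate_before_join(users, extra_ids):
    """Pre-aggregate each side, then combine a single row per key."""
    sum_by_team = defaultdict(int)   # left side: sum(type) group by team_id
    for team_id, type_value in users:
        sum_by_team[team_id] += type_value
    count_by_id = defaultdict(int)   # right side: count(*) group by id
    for extra_id in extra_ids:
        count_by_id[extra_id] += 1
    # Every left row pairs with count(*) right rows, hence the multiplication.
    return {
        key: sum_by_team[key] * count_by_id[key]
        for key in sum_by_team
        if key in count_by_id
    }
```

Both functions return the same totals, but the second one only moves one pre-aggregated row per key and side, which is what lets most of the work stay in MySQL.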
Conclusion
This optimization demonstrates the complexity of query planning and the importance of efficient tree rewriting in Vitess. By carefully pushing operations closer to the data, we can significantly improve query performance and resource utilization.
For more details on the implementation, you can check out the pull request on GitHub that addresses this optimization.]]></content>
        <summary><![CDATA[The Vitess query planner takes multiple passes over a query plan to optimize it as much as possible before execution. A recent tricky bug report led to an improvement in how the optimizer functions.]]></summary>
      </entry>
    
      <entry>
        <title>Dealing with large tables</title>
        <link href="https://planetscale.com/blog/dealing-with-large-tables" />
        <id>https://planetscale.com/blog/dealing-with-large-tables</id>
        <published>2024-07-10T00:00:00.000Z</published>
        <updated>2024-07-10T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Why do we have large tables?
Production database schemas can be complex, often containing hundreds or even thousands of individual tables. However, when dealing with very large data sets, it's often the case that only a small subset of these tables grow very large, into the hundreds of gigabytes or terabytes for a single table.
Consider the following, greatly simplified database schema for a workout-tracking application we'll call MuscleMaker:

Early on in this product's lifecycle, none of these tables are large enough to cause problems. However, what if this application becomes extremely popular? What if our userbase grows from the thousands to the hundreds of thousands and then into the millions?
In this case, the user table would start small, but eventually would contain several million rows, one per registered user. This is large but still quite a manageable size, with no significant scalability concerns so long as some care is taken with how it is queried and what kinds of indexes we maintain.
As for the exercise table, there are only so many types of exercises that people do. Therefore, the exercise table should not grow larger than a few hundred or a few thousand rows. However, the exercise_log table will have a new row added every time a user completes one exercise. A typical workout is composed of many exercises. Each user will generate 5-10 new rows in the exercise_log each time they work out. Assuming all users work out every day (which, admittedly, might be expecting a lot) this table will grow quickly. It will gain at least several million rows per day, quickly growing to a total size of many billions or even trillions of rows.
In this scenario, this is the table that could become difficult to scale. Not only will this table likely grow into the terabytes in size, it may also have a large set of hot rows (rows that are frequently accessed). Many users will want to search through their historical exercise data, meaning old rows will likely be queried and aggregated on a regular basis. We cannot treat this table as one with a small set of hot data.
One table growing at a rapid rate is not unique to exercise applications. For example, the message table for a popular chat application would also grow by millions of rows per day. A similar problem arises for a like table in a social media platform's database. There are many such situations where one table, or a small number of tables, grows much faster than the others. How do we deal with scaling such data, especially when we experience rapid growth?
Option 1: Vertical Scaling
Let's say that initially the database for MuscleMaker has all tables co-located in one PlanetScale database. PlanetScale provides fully-managed Vitess and MySQL powered databases. Vitess is a layer that sits above MySQL, and provides mechanisms for high-availability, scalability, sharding, connection pooling, and more. When connecting to a Vitess database, an application server makes a connection to a VTGate, which acts as a proxy layer. The VTGate then communicates with one or more VTTablets, which in turn communicate with MySQL. Hosted on PlanetScale and powered by Vitess, the architecture of the MuscleMaker database at this point would look like this:

Note that typically we'd also have replica nodes set up for every MySQL instance in a keyspace. These are omitted in this article for simplicity, but it is best practice to use replicas for high availability and disaster recovery.
The most straightforward solution to handling a fast growing table is to scale this database vertically. When it is time to scale vertically, we'll need to increase the capacity of the server running the musclemaker keyspace. When scaling like this is needed, we often consider both compute resources (CPUs and RAM) as well as growing the underlying storage capacity (disk). When you have a fast-growing table but your current levels of compute can handle your workload just fine, you may only need to grow your disk size. How difficult this is depends greatly on how your database is managed.
If you are managing a bare-metal machine, then this will require installing and configuring new disks, as well as migrating your data to the new disks or expanding your existing file system.
If you're running your database on a cloud VM such as an EC2 instance, this process is a bit simpler as you don't have to manage hardware yourself. To grow your disk size, you'll want to:
Take a snapshot of the disk before resizing for backup / safety purposes.
Use the AWS UI or CLI to grow the volume to the desired size.
Connect to your instance and tell the OS to grow your disk partition to utilize the added capacity.
With this system, you can do the resize online without taking down your server. However, some other solutions may require migrating to a new server with a larger disk.
On a fully managed solution like PlanetScale, storage scales automatically as usage increases.
However, for our workout app, we will need to grow compute resources along with the increasing table size. We'll be doing a huge amount of inserts and will also be executing queries that access and aggregate historic data on this large table.
With a managed service like PlanetScale, resizing to get more compute is as easy as a few clicks of a button. We also auto-resize storage as your data grows, so handling these cases is simple. If you are managing your own cloud infrastructure or managing bare-metal machines, the process of scaling vertically like this will be more tedious.
This type of scaling works well up to a point. When you have a multi-terabyte database, you typically want a large amount of RAM to keep as much as possible in memory. However, physical machines and cloud VMs can become prohibitively expensive as you start to reach machines that have hundreds of gigabytes of RAM and many CPU cores. Even with a large machine, memory contention can become a problem, slowing down your database and causing problems with your application. A good next step to consider is vertical sharding.
Option 2: Vertical Sharding
Vertical sharding is a fancy phrase with a simple meaning: moving large tables onto separate servers. When sharding vertically, you typically want to identify either the tables that are the largest or have the most reads and writes, and then isolate these onto separate servers. In MuscleMaker, the clear candidate for this is the exercise_log table.
Let's take a look at how we would go about this with a Vitess database.
Before, all of the tables are in the musclemaker keyspace powered by a server with 32 vCPUs, 64 GB of RAM, and 4 terabytes of provisioned disk space. This machine is having issues keeping up with all of the demand from our application servers.
The next step is to assess what size machine we will need to handle just the exercise_log table on its own. If this table is currently 2 terabytes and we know that it grows quickly, we will want to start off with extra disk space. We'll choose a machine with 4 TB of disk space, 32 vCPUs, and 64 GB of RAM. We will also want to downsize the machine handling the rest of the database to 16 vCPUs and 32 GB of RAM. These two new machines will need to be set up with MySQL and Vitess, and must be added to the Vitess cluster. We'll refer to these two new keyspaces as musclemaker_log and musclemaker_main respectively.
At this point, all of the data (2.5 TB in total) is in the musclemaker keyspace. In Vitess, we must use the MoveTables workflow of the vtctldclient command line tool to copy the tables into their new keyspaces. The commands to do this look something like this:
vtctldclient MoveTables --workflow musclemakerflow create --target-keyspace musclemaker_log --source-keyspace musclemaker --tables "exercise_log"
vtctldclient MoveTables --workflow musclemakerflow create --target-keyspace musclemaker_main --source-keyspace musclemaker --tables "user,exercise"

With large tables, these two steps will take a while (hours). While this is happening, all production traffic will still be routed to musclemaker. Vitess gives us the ability to separately run a command to switch query traffic to the new keyspaces. The commands look like:
vtctldclient MoveTables --workflow musclemakerflow --target-keyspace musclemaker_log switchtraffic
vtctldclient MoveTables --workflow musclemakerflow --target-keyspace musclemaker_main switchtraffic

At this point, production traffic is switched to using the two separate keyspaces. After you are confident everything is working well, the musclemaker keyspace can be taken offline. The architecture of the Vitess cluster now looks like this:

There would likely also need to be some application-level changes in order to properly use the names of the two keyspaces.
After doing this, you've gained the ability to separately scale the storage and compute for the main database and the log database. This is an excellent solution used by many organizations to divide up their large tables, allowing them to continue to scale when they start to encounter limitations with single-primary setups for heavy write workloads. Over time, the musclemaker_log keyspace can be grown to handle more load. But what happens when even one very large server can't handle a gigantic table?
Option 3: Horizontal Sharding
Horizontal sharding is the ultimate solution for scaling massive tables. Horizontal sharding takes a single table and spreads the rows out across many separate servers based on a sharding strategy. Vitess supports many sharding strategies, one of the most common of these being hashing. With this sharding strategy, we choose a table we want to shard, and then select one of the columns to be our hashed sharding key. Each time we get a new row, a hash is generated for the column value, and this hash is used to determine which server to store the row on. It's important to put some thought into which column is to be used for this, as it can have big implications on the distribution of your data, and therefore the performance of the database.
For example, for exercise_log, one option would be to shard based on a hash of the log_id column. Each time we need to write a new exercise_log row, we would generate the ID for the row, hash the ID, and then use this hash to determine which shard to send the row to. Since a hash is used, this provides a (roughly) even distribution of data across all of the shards. There is a problem with this though: the logs for any given user will be spread out across all shards, as shown in the below diagram:

This means that any time we need to query the log history for one user, we might need to access data on many or all of our MySQL instances. This will be terrible for performance.
We can solve this problem by instead using the user_id as the hashed shard key. Each user_id will produce the same hash, and thus get sent to the same server. This means that when a user adds log events or reads their log, it will all hit the same server. Take a look at how the data is distributed when hashing by user_id instead:
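The difference between the two shard keys can also be seen in a short Python sketch. It uses md5 purely as a stand-in hash; it is not the hash function Vitess uses, and the row data, shard count, and function names are assumptions made for illustration.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for this sketch


def shard_for(value: int) -> int:
    """Stand-in hash: md5 of the value, reduced to a shard number."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return digest[0] % NUM_SHARDS


# Made-up log rows: (log_id, user_id), 200 rows spread over 5 users.
logs = [(log_id, log_id % 5) for log_id in range(1, 201)]


def shards_touched(user_id: int, key: str) -> set[int]:
    """Shards that hold this user's rows under the chosen shard key."""
    touched = set()
    for log_id, owner in logs:
        if owner == user_id:
            touched.add(shard_for(log_id if key == "log_id" else owner))
    return touched
```

With user_id as the key, shards_touched always returns a single shard for a given user. With log_id as the key, the same user's rows are scattered, so a query for their history has to visit several shards.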

Continuing with the same example Vitess cluster from before, let's see how we would spread out this log table across four shards:
First, we must make changes to the schema and vschema (Vitess schema) of our cluster. We'll kick this process off by adding a table to handle ID generation for the sharded table:
create table if not exists exercise_log_id_sequence(id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
insert into exercise_log_id_sequence(id, next_id, cache) values(0, 1000, 100);

We also need to tell Vitess about this new table and that it is going to be used for ID generation. The vtctldclient ApplyVSchema command can be used for this, with the following VSchema:
{
  "tables": {
    "exercise_log_id_sequence": {
      "type": "sequence"
    }
  }
}

Next up is to tell Vitess that we want to shard the exercise_log table by user_id and use the exercise_log_id_sequence table for ID generation. This would also be applied using vtctldclient ApplyVSchema with this VSchema:
{
  "sharded": true,
  "vindexes": { "hash": { "type": "hash" } },
  "tables": {
    "exercise_log": {
      "column_vindexes": [
        {
          "column": "user_id",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "log_id",
        "sequence": "exercise_log_id_sequence"
      }
    }
  }
}

Finally, change the schema of the exercise_log table to remove auto_increment. Assigning IDs has shifted to be Vitess' responsibility instead of MySQL's:
alter table exercise_log change log_id log_id bigint not null;

Next, we need to spin up four new servers that will be our shards. These will be created within the musclemaker_log keyspace. We'll spin up four machines, each with 16 vCPUs, 32 GB of RAM, and 2 terabytes of storage. These will be named using Vitess' convention of shard hash range naming: -40, 40-80, 80-c0, c0-.
When the shards are ready, we need to issue a command to shard the data, which will copy it from the unsharded musclemaker_log server into the four shards, also in the musclemaker_log keyspace.
vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log create --source-shards '0' --target-shards '-40,40-80,80-c0,c0-'
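The shard names read as hexadecimal ranges over the start of a row's hashed key (its keyspace ID). A rough Python sketch of the lookup follows. It only inspects the first byte, which is enough for two-digit bounds like these, and the function names are hypothetical rather than part of any Vitess API.

```python
SHARD_RANGES = ["-40", "40-80", "80-c0", "c0-"]


def parse_range(name: str) -> tuple[int, int]:
    """Turn a name like '40-80' into the half-open range [0x40, 0x80)."""
    start, end = name.split("-")
    low = int(start, 16) if start else 0x00    # empty start: from the beginning
    high = int(end, 16) if end else 0x100      # empty end: through the maximum
    return low, high


def shard_for_keyspace_id(keyspace_id: bytes) -> str:
    """Pick the shard whose range contains the key's first byte."""
    first = keyspace_id[0]
    for name in SHARD_RANGES:
        low, high = parse_range(name)
        if low <= first < high:
            return name
    raise ValueError("no shard covers this keyspace ID")
```

Each of the four ranges covers a quarter of the possible first-byte values, so an evenly distributed hash sends roughly a quarter of the rows to each shard.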

When this finishes, we can switch the traffic to the sharded table:
vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log switchtraffic

The sharding process is now complete. The new architecture of the Vitess cluster looks like this:
Sharding in this way has many benefits beyond just being able to spread out the data across many disks.A few of these include:
Increased write throughput. We are no longer reliant on a single primary to handle all writes. Whenever write performance is bottlenecking, we can upsize shards or add new shards.
Backup speed. Instead of backing up a single table on 1 machine, the table is divided up across many shards. Therefore, taking backups of the table can happen in parallel, vastly speeding up overall backup time.
Failure isolation. Any node that goes down only affects a subset of the table, not the whole data set. We also would typically use replication and Vitess' auto-failover features to mitigate issues here.
Cost savings. In some instances, running many small cloud VM instances is more affordable than a single, top-of-the-line instance.
Vitess also has support for resharding an already-sharded table. This means that as the data size and I/O workload continue to grow, we can expand out to using more and more shards. We can also downsize or use fewer shards if demand decreases, or data is purged, in the future. Horizontal sharding gives us a wide range of options for scaling the capacity of a table.
Conclusion
Vertical scaling, vertical sharding, and horizontal sharding are useful techniques for scaling to handle large tables. Each technique is useful for different phases of the growth of a table. Typically, you scale your entire database vertically, then shard vertically, then use horizontal sharding for ultimate scalability of large workloads.]]></content>
        <summary><![CDATA[Large databases often have a small number of very large tables that makes scaling difficult. How can you scale with these while keeping your database performant? This article covers three techniques.]]></summary>
      </entry>
    
      <entry>
        <title>Sharding strategies: directory-based, range-based, and hash-based</title>
        <link href="https://planetscale.com/blog/types-of-sharding" />
        <id>https://planetscale.com/blog/types-of-sharding</id>
        <published>2024-07-08T15:00:00.000Z</published>
        <updated>2024-07-08T15:00:00.000Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[In the past, sharding was often a last resort for scaling a growing database. With solutions like Vitess and PlanetScale, it's much less intimidating to shard your database. However, even with our in-dashboard sharding options, it's still not exactly as simple as clicking a button.
When it's time to shard, you'll still have to come up with a sharding strategy: the procedure the database uses to distribute incoming data across shards. There's a lot that goes into this, which we'll cover in a short series of posts.
In this article, we'll start by covering some of the different types of sharding you can do: directory or lookup-based, range-based, and hash-based.
For a condensed overview of sharding strategies, you should also check out the Sharding Strategies lesson in our Database Scaling course.
Directory-based sharding
With directory-based sharding, also known as lookup-based sharding, you map your shards with something called a lookup table.
For example, if you want to shard a table based on region so that all rows in a certain region end up on the same shard, you can set up a lookup table that maps that region to the specific shard.
Let’s say we have this members table here that has name and court columns. In this example, court is essentially the region that the member is from. We want to split them up by court, so we create a lookup table that maps each court to a shard.

In this case, we’re using the court column as our shard key. The shard key is the chosen column that will be used as the basis for sharding the table.
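As a rough illustration of the lookup step, here is a minimal Python sketch. The court and shard names are made up, and the class is hypothetical; a real system would keep the lookup table in the database and cache it.

```python
class DirectoryRouter:
    """Routes a row to a shard by consulting a lookup table first."""

    def __init__(self, lookup: dict[str, str]):
        self.lookup = dict(lookup)

    def shard_for(self, court: str) -> str:
        """The extra step on every read or write: consult the directory."""
        if court not in self.lookup:
            raise KeyError(f"court {court!r} is not in the lookup table")
        return self.lookup[court]

    def assign(self, court: str, shard: str) -> None:
        """Adding a court only means adding one row to the lookup table."""
        self.lookup[court] = shard


# Hypothetical mapping of courts to shards.
router = DirectoryRouter({
    "court_a": "shard_1",
    "court_b": "shard_2",
    "court_c": "shard_3",
    "court_d": "shard_4",
})
```

Calling router.assign("court_e", "shard_1") places a new court on an existing shard without moving any other data, which is the flexibility this strategy trades against the cost of the extra lookup.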
After sharding using this technique, the data will be distributed like this:

Pros and cons of directory-based sharding
This can be a great option if you have very specific criteria for separating your data onto different shards. It also makes it pretty simple to add new courts, if needed. If 4 new courts suddenly join, we can spin up 4 new shards if the dataset is large enough. Or, if we want to stay on 4 shards, we can assign the new courts to existing shards by adding them to the lookup table.
There are, however, some downsides to this method. As you can imagine, if one court has significantly more members than another, you risk having one shard that’s much larger than the others. Additionally, if one court has one or several members that are much more active in whatever you’re doing, you can create a hotspot where that shard is being accessed much more frequently than the others. In short, data distribution can become a problem.
You also have the added factor of the lookup table itself, which results in an extra query every time you need to read or write a row. Before you do anything, you have to first consult the lookup table to see where something lives. In practice, you’d want to heavily cache the lookup table to make it faster, but you’ll still have that extra step regardless.
Range-based sharding
Range-based sharding is another type of sharding strategy that involves splitting up your shard mapping based on a range of values. This is a silly example, but let's say we wanted to shard based on the name column in that members table — so using name as the shard key.
We want any name that starts with A-G to go to shard 1, H-N to go to shard 2, and so on.
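A minimal Python sketch of this mapping follows. The A-G and H-N ranges come from the example above; the O-U and V-Z ranges for shards 3 and 4 are an assumption, as is the function name.

```python
import bisect

# Inclusive upper bound of each shard's letter range.
# A-G and H-N are from the example; O-U and V-Z are assumed.
UPPER_BOUNDS = ["G", "N", "U", "Z"]


def shard_for_name(name: str) -> int:
    """Return the 1-based shard number for a name, by its first letter."""
    first = name.strip()[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"cannot place name {name!r} in a letter range")
    # The first bound that is >= the letter identifies the range.
    return bisect.bisect_left(UPPER_BOUNDS, first) + 1
```

Because the ranges are just a sorted list of boundaries, rebalancing means editing the boundaries and moving the affected rows, not rehashing everything.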
With this sharding strategy, the rows of the table would be distributed like this:

Pros and cons of range-based sharding
The problem with this setup should be immediately clear. Depending on how you’re picking these ranges, you can easily overload shards. In this case, shard 1 has 5 records in it while shard 4 has 0. This is of course because certain letters occur much more frequently as the first letter in a name.
This is one of the downsides of range-based sharding, but there are definitely ways to avoid it. For example, if we started with this method for mapping the ranges, but then quickly noticed that shard 1 was growing much faster than the others, we may decide to break shard 1 up into multiple shards with a reshard operation. We can also make the range for the last shard larger since those letters happen less frequently.

Overall, it’s a relatively simple approach for sharding and gives you a meaningful way to map data to shards.
Hash-based sharding
The final type of sharding we’re going to discuss is hash-based sharding. With this method, you decide which shard key you want to use, then run that value through a hash function. You then shard based on the output of the hash function.
It’s similar in a way to range-based sharding, but this time the ranges are over the alphanumeric output of the hash function rather than over the first letter of a name, as in the previous example.
For this example, let’s say we have an id for each court member in our members table, and we want to use this id as our shard key. We’re going to run that id through an md5 hash function, and then distribute the output across 4 shards.
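Here is a minimal sketch of that idea in Python, assuming the id is an integer and the hash output is split into 4 equal ranges by its first byte. The function name and the way the ranges are cut are illustrative choices, not a description of a specific product.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def shard_for_id(member_id: int) -> int:
    """Hash the id with md5, then map the first byte to one of 4 ranges."""
    digest = hashlib.md5(str(member_id).encode("utf-8")).digest()
    # digest[0] is 0..255; scaling it gives shard 0, 1, 2, or 3.
    return digest[0] * NUM_SHARDS // 256


def distribution(ids) -> Counter:
    """Count how many ids land on each shard."""
    return Counter(shard_for_id(member_id) for member_id in ids)
```

Running distribution(range(1000)) should give four counts that are close to each other, even though the ids themselves are sequential. That evenness comes from the hash, not from any guess about the data.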

Pros and cons of hash-based sharding
One of the pros of this method is that you’re taking human guesswork out of the equation. Rather than relying on a human to guess at how much data will fall into each range, the hash will keep the data evenly distributed, so long as the shard key has high cardinality. There’s less chance of overloading a shard because you’re distributing the data based on a well-proven hash function. However, you still have to put some thought into what shard key you choose here, as you want it to work well with the data access patterns of your application.
With this method, it’s also a bit easier to reshard if you find that certain shards are taking more of a hit than others. The main downside is having the extra hashing step, but in practice, this isn't usually a huge deal.
Comparison of sharding types
You may be wondering how you can decide which type of sharding to go with. At PlanetScale, we typically steer customers toward hash-based sharding as the default. It generally gives you the most even distribution and isn't overly complicated.
The diagram below shows what the distribution looked like in our small example. In this case, hash-based sharding resulted in the most even distribution. Of course, this is a tiny example and not a realistic sharding scenario given the size, but in general this holds true in larger datasets as well.

If you have a database that you think is ready to shard and are curious if PlanetScale is the right solution for you, fill out our contact form, and we'll be in touch.]]></content>
        <summary><![CDATA[Learn about the different types of sharding: directory-based, range-based, and hash-based plus some of the pros and cons of each.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 20</title>
        <link href="https://planetscale.com/blog/announcing-vitess-20" />
        <id>https://planetscale.com/blog/announcing-vitess-20</id>
        <published>2024-06-27T09:01:00.000Z</published>
        <updated>2024-06-27T09:01:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[We're delighted to announce the release of Vitess 20 along with version 2.13.0 of the Vitess Kubernetes Operator.
Version 20 focuses on improving the usability and maturity of existing features while continuing to build on the solid foundation of scalability and performance established in previous versions. Our commitment remains steadfast in providing a powerful, scalable, and reliable solution for your database scaling needs.
What's new in Vitess 20
Query Compatibility: enhanced DML support including improved query compatibility, Vindex hints, and extended support for various sharded update and delete operations.
VReplication: multi-tenant imports (experimental).
Online DDL: improved support for various schema change scenarios, dropping support for gh-ost.
Vitess Operator: automated and scheduled backups.
Let's dive deeper into some key highlights of this release.
Query compatibility
The latest Vitess release enhances DML support with features like Vindex hints, sharded updates with limits, multi-table updates, and advanced delete operations.
Vindex hints enable users to influence shard routing:
SELECT * FROM user USE VINDEX (hash_user_id, secondary_vindex) WHERE user_id = 123;
SELECT * FROM `order` IGNORE VINDEX (range_order_id) WHERE order_date = '2021-01-01';

Sharded updates with limits are now supported:
UPDATE t1 SET t1.foo = 'abc', t1.bar = 23 WHERE t1.baz > 5 LIMIT 1;

Multi-table updates and multi-target updates enhance flexibility:
UPDATE t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON t1.col = t3.col SET t1.baz = 'abc', t1.apa = 23 WHERE t3.foo = 5 AND t2.bar = 7;
UPDATE t1 JOIN t2 ON t1.id = t2.id SET t1.foo = 'abc', t2.bar = 23;

Advanced delete operations with subqueries and multi-target support are included:
DELETE FROM t1 WHERE id IN (SELECT col FROM t2 WHERE foo = 32 AND bar = 43);
DELETE t1, t3 FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON t1.col = t3.col;

These features provide greater control and efficiency for managing sharded data. For more details, please refer to the Vitess and MySQL documentation.
VReplication: multi-tenant imports (experimental)
Many web-scale applications use a multi-tenant architecture where each tenant has their own database (with identical schemas). There are several challenges with this approach — like provisioning and scaling potentially tens of thousands of databases and uniformly updating database schemas across them.
A sharded Vitess keyspace is a great option for such a system with a single logical database serving all tenants. Vitess 20 adds support for importing data from such a multi-tenant setup into a single Vitess keyspace, with new --shards and --tenant-id flags for the MoveTables workflow. You would run one such workflow for each tenant, with imported tenants being served by the Vitess cluster.
Online DDL improvements
Vitess migrations now support enum definition reordering. Vitess opts to use enums by alias (their string representation) rather than by ordinal value (the internal integer representation).
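To see why mapping by alias matters when an enum definition is reordered, here is a small Python sketch. It is an illustration only, not Vitess code: the enum values and function names are hypothetical, and it assumes MySQL's 1-based enum ordinals.

```python
# Illustrative only: carrying an enum value across a reordered definition,
# by ordinal versus by alias. All names and values here are hypothetical.

OLD_ENUM = ["red", "green", "blue"]   # original definition order
NEW_ENUM = ["blue", "green", "red"]   # same members, reordered

def copy_by_ordinal(ordinal):
    """Reuse the stored 1-based ordinal against the new definition."""
    return NEW_ENUM[ordinal - 1]

def copy_by_alias(ordinal):
    """Resolve the ordinal to its string alias first, then carry the alias over."""
    alias = OLD_ENUM[ordinal - 1]
    if alias not in NEW_ENUM:
        raise ValueError(f"{alias!r} is missing from the new definition")
    return alias

stored = 1  # a row holding 'red' under the old definition
print(copy_by_ordinal(stored))  # blue (the value silently changed)
print(copy_by_alias(stored))    # red
```

Copying by ordinal would quietly turn 'red' into 'blue', while copying by alias preserves the value the application wrote.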
Vitess now has better analysis for INSTANT DDL scenarios, enabled with the --prefer-instant-ddl DDL strategy flag. It is able to predict whether a migration can be fulfilled by the INSTANT algorithm and use this algorithm if so.
It also improves support for range partitioning migrations, and opts to use direct partitioning queries over Online DDL where appropriate.
VDiffs can now be run on Online DDL workflows that are still in progress (i.e., not yet cut-over).
Release 20.0 drops support for gh-ost for Online DDL, as we continue to invest in vitess migrations based on VReplication. The gh-ost strategy is still recognized; however:
Vttablet binaries no longer bundle the gh-ost binary. The user should provide their own gh-ost binary, and supply vttablet --gh-ost-path.
Vitess no longer tests gh-ost in CI/end-to-end tests.
Vitess-operator
Automated and scheduled backups are now available as an experimental feature in v2.13.0. We have added a new user guide for this feature.
Vitess and the community
As an open-source project, Vitess thrives on the contributions, insights, and feedback from the community. Your experiences and input are invaluable in shaping the future of Vitess. We encourage you to share your stories and ask questions on GitHub or in the Slack Vitess community.
Getting started
For a seamless transition to Vitess 20, we highly recommend reviewing the detailed release notes. Additionally, you can explore the Vitess documentation for guides, best practices, and tips to make the most of Vitess 20. Whether you're upgrading from a previous version or running Vitess for the first time, our resources are designed to support you every step of the way.
Thank you for your support and contributions to the Vitess project!]]></content>
        <summary><![CDATA[Vitess 20 is now generally available.]]></summary>
      </entry>
    
      <entry>
        <title>Self-managed Vitess vs Managed Vitess with PlanetScale</title>
        <link href="https://planetscale.com/blog/self-run-vs-managed-vitess-with-planetscale" />
        <id>https://planetscale.com/blog/self-run-vs-managed-vitess-with-planetscale</id>
        <published>2024-05-24T14:00:00.000Z</published>
        <updated>2024-05-24T14:00:00.000Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[People often ask us — why would I use PlanetScale when Vitess is open source? Surely I can just run it myself? What added benefits do I get with PlanetScale? In this article, we're going to answer all of those questions and more.
PlanetScale’s relationship with Vitess
Let’s start by first defining the relationship between PlanetScale and Vitess.
History of Vitess
Vitess was created at YouTube in 2010 to scale their MySQL instances, primarily surrounding issues they faced with heavy write traffic and connection limitations. In 2015, Vitess was donated to the CNCF, and it achieved graduated status in 2018. Since then, Vitess has been adopted by hundreds of the largest sites on the internet to scale their MySQL clusters. Among these companies are Slack, Etsy, GitHub, HubSpot, Shopify, Square, Pinterest, and more.
PlanetScale on Vitess
PlanetScale was founded in 2018 by the original co-creators of Vitess from YouTube. PlanetScale is the largest contributor to the Vitess codebase, and we employ roughly 75% of the Vitess maintainers.
Our vision for PlanetScale has always been to create the database platform that much of our staff wish they had during their many years operating sites at massive scale. Essentially, making it as easy as possible to create and maintain databases — at any scale. Of course, a crucial part of that is ensuring the underlying technology itself can handle scale. Vitess was the perfect choice, but the mission doesn’t end there.
PlanetScale aims to make Vitess and MySQL easy to use, operate, and maintain. To do this, we have made it as easy as a click of a button to spin up a new Vitess cluster. We also put a lot of care into crafting the perfect developer experience. We have introduced features like database branching, deploy requests, and more to make online schema changes an afterthought.
Resources required to run and maintain Vitess
Companies at the start of their Vitess implementation journey often ask us what resources are required to set up and maintain Vitess clusters on their own. The PlanetScale team has extensive experience running thousands of MySQL instances in Vitess clusters, not just at PlanetScale, but also in previous roles at other companies.
In this section, I’ll walk through some of our findings about the cost, time, and other requirements involved to run and manage Vitess. I'll also highlight some public Vitess-user testimonials to help paint a better picture of this in practice.
Time to implement
As you can imagine, the time it takes to implement Vitess varies widely. There is often a non-trivial amount of prep work that you have to do to "ready" your system to use Vitess. You can take a look through the MySQL compatibility documentation to get an idea of what kind of prep work you may need to do. For example, we commonly see stored procedures and CTEs as blockers. The Vitess team is constantly working to close the gap to full MySQL compatibility, and has made great strides recently, but it is a good idea to look through the documentation to make sure you are not heavily using any unsupported MySQL features.
Once you do this underlying work, then the fun really begins — implementation. This involves identifying which tables (if any) you will shard, planning the sharding scheme, server provisioning, mapping out initial resource allocation, replica usage, etc. Next, you'll perform the actual migration, which can be just as time consuming if you run into issues.
Let’s look at some examples of companies who have implemented Vitess on their own.
Slack
The team over at Slack wrote a blog post about their experience implementing and working with Vitess. It is a fantastic read, so I highly recommend reading through the full post.
In short, their journey to implementing and fully moving over to Vitess began in July of 2017. It wasn’t until late 2020 that they were fully migrated over. Those three years of migration of course required a lot of resources and focus on this project, first starting with a PoC, then mapping out and implementing the migration, contributing changes to the Vitess codebase, and more.
To start, they decided to build a proof of concept — getting a small feature into production using Vitess.
We decided to build a prototype demonstrating that we can migrate data from our traditional architecture to Vitess and that Vitess would deliver on its promise. Of course, adopting a new datastore at Slack scale is not an easy task. It required a significant amount of effort to set up all the new infrastructure in place.
The authors go on to detail some of the work that went into this. Remember, this is just for a tiny service (relative to Slack's total datastores).
Our goal was to build a working end-to-end use case of Vitess in production for a small feature: integrating an RSS feed into a Slack channel. It required us to rework many of our operational processes for provisioning deployments, service discovery, backup/restore, topology management, credentials, and more. We also needed to develop new application integration points to route queries to Vitess, a generic backfill system for cloning the existing tables while performing double-writes from the application, and a parallel double-read diffing system so we were sure that the Vitess-powered tables had the same semantics as our legacy databases.
The PoC required a ton of upfront work, but proved to be the correct choice for scalability, so they decided to proceed with the full migration.
However, it was worth it: the application performed correctly using the new system, it had much better performance characteristics, and operating and scaling the cluster was simpler. Equally importantly, Vitess delivered on the promise of resilience and reliability. This initial migration gave us the confidence we needed to continue our investment in the project.

Source: https://slack.engineering/scaling-datastores-at-slack-with-vitess
Overall, their journey took a total of 3 years and required extensive work from the team. To get Vitess working for Slack, they even became one of the largest contributors to the Vitess repo.
There are many other stories to tell in these 3 years of migrations. Going from 0% to 99% adoption also meant going from 0 QPS to the 2.3 M QPS we serve today. Choosing appropriate sharding keys, retrofitting our existing application to work well with Vitess, and changes to operate Vitess at scale were necessary and each step along the way we learned something new.
Square
The team over at Square wrote a blog series detailing their experience implementing Vitess. The first post is subtitled “Ripping Vitess apart and putting it back together.” And the first sentence in the last part of the series sets the stage for what was to come with their implementation:
It has been quite the challenge bringing Vitess online over our existing MySQL database, then sharding and operating it at greater scale over time.
Back in 2016, Square's Cash product was growing quickly. They knew they needed to find a new solution, and fast.
Cash was growing tremendously and struggled to stay up during peak traffic. We were running through the classic playbook for scaling out: caching, moving out historical data, replica reads, buying expensive hardware. But it wasn’t enough. Each one of these things bought us time, but we were also growing really fast and we needed a solution that would let us scale out infinitely. We had one final item left in the playbook.
We had to shard.
Once they started researching, they stumbled upon Vitess. They, like Slack and most Vitess users, had to first deal with some underlying prep work before they could begin the migration. Once that was complete, they ran into unexpected issues, some of which they had to solve while in production: deadlocks that caused outages, handling scatter queries, keeping transactions ACID, resharding, and much more.
Sharding Cash’s database with Vitess was a massive undertaking that set us up for the future, but it was just the start of the journey.
Other users of Vitess (like YouTube) can make different trade offs — maybe dropping a comment every once in a while isn’t the end of the world for them. But not us. So the first thing we had to do was change our application code so that it wouldn’t do cross shard transactions in critical money-processing portions of the code.
Although a long, arduous process, it was an essential step in scaling to the next level. And their efforts proved it was worth it, as they knew Vitess was the correct long-term solution to set them up for near-infinite scale.
Running Vitess on your own
Now, you may not be quite at Slack or Square scale (yet!), but as you can see, implementing and maintaining a system like Vitess isn’t exactly a hands-off task.
There are of course several companies out there that have deployed and continue to maintain Vitess on their own, so don't shy away from it if you're set on implementing Vitess yourself. You can find some excellent resources on the Vitess website to support you through your work. And if you do hit any roadblocks, the Vitess community is fantastic. The Vitess Slack channel has over 4,000 members at the time of writing this, and they are always happy to provide guidance if needed.
So, in short, it can take a long time to get all the pieces in production, but it really depends on the complexity of your application and the resources you have available to dedicate to the work.
Regardless, it's always nice to have a team with Vitess expertise on-hand to assist you with the process. Especially because making an infrastructure change of this magnitude can be downright scary.
After I run this command we will have completed the first (out of many many) shard splits.
It’s not reversible without very significant data loss.
I am utterly terrified.
I stand up and pace around the room. Is this really it? Have we thought about everything? What if I missed something? What if I screw everything up?
-Jon Tirsen
If you're interested, here are some additional stories about companies implementing and working with Vitess:
Vinted Vitess Voyage
Horizontally scaling the Rails backend of Shop app with Vitess
HubSpot Improving Reliability: Building a Vitess Balancer to Minimize MySQL Downtime
How HubSpot Upgraded a Thousand MySQL Clusters at Once
Activision’s Journey to Scaling Databases with Vitess
Ongoing management tasks
Preparing for and implementing Vitess is one thing, but the work doesn't end there. PlanetScale has huge teams of engineers dedicated to managing the infrastructure that our customers run on. Version upgrades, running backups, Kubernetes cluster maintenance, 24/7 monitoring, managing integrations, and much more. If you're going to run Vitess on your own, you'll need to have a team on hand to manage all of this.
Version upgrades
One important thing to keep in mind is that the Vitess team typically releases 2-3 major versions per year. This means it's easy to fall behind, especially when you have a smaller team managing your Vitess instances. We frequently onboard customers that are 3+ versions behind who just didn't have the time to keep up with testing and implementing the upgrades.
One nice thing about running on PlanetScale's infrastructure is that we completely handle all of the version upgrades for you — without any downtime. In addition to the Vitess updates, we also handle MySQL updates as well.
Resharding
Once you're in production, your app will likely still continue to grow. Teams sometimes end up facing the challenge of resharding much sooner than anticipated. With PlanetScale's support, we are very hands-on in assisting you with this operation.
Cost
Much of the cost that comes with running Vitess amounts to the team size and time spent implementing and maintaining it. Machine-wise, you may end up saving a bit of money running on Vitess, as you can use smaller machines for each partitioned shard than you would if you just vertically scaled. This also allows you to have more predictable infrastructure pricing, as you aren't experiencing such large jumps to scale up.
To give a ballpark number, for smaller/mid-sized applications running Vitess (200GB-1TB), we usually see teams of around 3-6 DBAs focused on day-to-day database operations, plus more (closer to 10) for the initial implementation. If you have a much larger application (1+ TB), you're looking at 10 or more people at all times.
With a PlanetScale Enterprise plan, the raw infrastructure costs are usually in line with what you'd pay running Vitess on your own. Apart from your cloud bill for the infrastructure, you'll pay PlanetScale for management and enterprise support. The PlanetScale costs scale with the resources and storage that you provision.
The time and additional people costs you save by having our team manage everything for you typically make PlanetScale a more cost-effective solution than running Vitess on your own.
We are very happy with our decision to migrate to PlanetScale Metal which enabled us to achieve the rare outcome of improvements in performance, cost, and reliability – a win for our customers and our business.
Aaron Young, Engineering Manager @ Block
What additional features do you get with PlanetScale?
While we do run on Vitess under the hood, that’s not all you get with PlanetScale. We give you the Vitess cluster with a bunch of add-ons on top.
Vitess maintenance and support
A PlanetScale-managed Vitess cluster gives you much more than Vitess under the hood. One of the biggest perks is your access to in-house Vitess support and expertise. Our support team truly becomes an extension of your own team.
Our Enterprise support package gives you direct access to our team through a shared private Slack channel and recurring video calls (if desired). You're able to ask us any questions you have about performance, architecture, query optimization, and whatever else you need. We work hard to make sure that all of our customers are successful and have a great experience on PlanetScale.
You'll also get hands-on assistance at every step of the migration process. We are in close contact with you starting with the proof of concept, onto planning out the sharding scheme, planning and implementing the migration, performing version upgrades, and continuing to support you throughout our entire relationship — well beyond getting into production. We even hold the pager for you, often detecting and mitigating any issues well before your team is even aware they existed.
Databases are hard. We would rather PlanetScale manage them. We wanted the support PlanetScale offers because they are the experts in the field. We’ve seen this come to fruition in our relationship.
-Chris Karper, VP of Engineering at MyFitnessPal
Another huge perk is our close relationship with Vitess. Because we employ around 75% of the Vitess maintainers and are the largest contributors to the repository, we have a huge wealth of knowledge and a large concentration of Vitess expertise.
Dashboard for branching, deploy requests, and more
Because PlanetScale’s mission is to make it as easy as possible to manage your database clusters at any size, we put a lot of thought and care into our dashboard to make it simple to leverage both Vitess and PlanetScale features.

Vitess comes with built-in support for online schema changes. However, you cannot do schema changes through the VTAdmin UI. With PlanetScale, you get access to deploy requests. Deploy requests are used to make safe, reviewable, diffable schema changes to your database.
This enables your team to work quickly, efficiently, and safely.
PlanetScale Global Network

The PlanetScale Global Network is our edge infrastructure that is responsible for automatically routing reads to the closest replica. This is an additional layer on top of Vitess that you only get with PlanetScale. It also supports the following features:
Latency-based routing
Near-infinite connection pooling
In-app private data access
First-class serverless support
PlanetScale Connect
Compliance and security
PlanetScale is committed to delivering a powerful and easy-to-use database platform while keeping your data secure. We have a number of crucial certifications and compliance measures in place as well, which are essential for customer audits. Some of these security measures that are baked into the PlanetScale product include:
SOC 1 Type 2
SOC 2 Type 2+ HIPAA
BAAs available
PCI compliant (on PlanetScale Managed)
Compliance with GDPR and other global privacy regulations
Data locality
Private database connectivity through PrivateLink or GCP Private Service Connect
Audit and security logs
IP restrictions
And more
Learn more in our Security documentation.
Query monitoring and insights
PlanetScale Insights is our in-dashboard query performance analytics tool that allows you to track performance down to the individual query level. This gives you a great view of metrics such as query latency, errors, and anomalies: any query activity that we detect as out of the ordinary. You'll also receive schema recommendations to improve database performance, reduce memory and storage, and improve your schema based on production database traffic.

PlanetScale can help
Adding a massive piece of technology to your infrastructure can certainly be daunting. The path to production sometimes doesn't go according to plan and can often require much more time and money than originally budgeted.
At PlanetScale, we see time and time again companies who know Vitess is the right solution for them, but aren't sure if they want to spend the time implementing and managing it on their own. If you're in that boat, it's easy to set up a call with our Technical Solutions team to see if we're the right fit for you. Our sales process aims to make it as easy as possible to get all of your questions answered as fast as possible. You're able to jump straight into a technical evaluation right away — either via email or call.
There are also instances where your desired configuration isn't something that would work well with PlanetScale, and we don't shy away from telling you if so. At the very least, a quick chat with us will hopefully bring clarity that you're choosing the correct path.
Again, we employ the majority of the Vitess maintainers, and our entire team at PlanetScale has extensive experience implementing and maintaining huge Vitess clusters. If you are thinking about implementing Vitess on your own and are curious how we can help, we'd love to hear from you. Simply fill out our contact form, and we will be in touch.]]></content>
        <summary><![CDATA[PlanetScale and Vitess have a close relationship. Learn what it looks like to run Vitess on your own vs using PlanetScale. We cover cost, time to implement, management, and more.]]></summary>
      </entry>
    
      <entry>
        <title>Achieving data consistency with the consistent lookup Vindex</title>
        <link href="https://planetscale.com/blog/vitess-consistent-lookup-vindex" />
        <id>https://planetscale.com/blog/vitess-consistent-lookup-vindex</id>
        <published>2024-04-29T00:00:00.000Z</published>
        <updated>2024-04-29T00:00:00.000Z</updated>
        
        <author>
          <name>Harshit Gangal</name>
        </author>
        
        <author>
          <name>Deepthi Sigireddi</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[What are Vindexes
Vitess uses Vindexes (short for Vitess Index) to associate rows in a table with a designated address known as Keyspace ID. This allows Vitess to direct a row to its intended destination, typically a shard within the cluster.
Vindexes play a dual role: enabling data sharding through Primary Vindexes and facilitating global indexing via Secondary Vindexes. Through this mechanism, Vindexes serve as an indispensable tool for routing queries in a sharded database, ensuring optimal performance and scalability.
Lookup Vindex
Lookup Vindex as a Secondary Vindex is used to direct Select/Update/Delete queries to the appropriate shard without incurring the performance penalty associated with scatter-gather operations—wherein the query is sent to all shards for processing.
When data is inserted into a table, a separate global index table maintains the mapping of a secondary index column to the corresponding Keyspace ID. This mapping information is later used to efficiently route queries to the destination shard.
Secondary Vindexes can be unique or non-unique, and we’ll illustrate both types. Let us look at an example to see how this works.
Data table definition
USER Table
+-------+--------------+------+-----+
| Field | Type         | Null | Key |
+-------+--------------+------+-----+
| id    | bigint       | NO   | PRI |
| name  | varchar(255) | YES  |     |
| phone | bigint       | YES  | UNI |
| email | varchar(255) | YES  |     |
+-------+--------------+------+-----+

Non unique Vindex table definition
NAME_USER_VDX Table
+-------------+--------------+------+-----+
| Field       | Type         | Null | Key |
+-------------+--------------+------+-----+
| name        | varchar(255) | NO   | PRI |
| id          | bigint       | NO   | PRI |
| keyspace_id | binary(8)    | YES  |     |
+-------------+--------------+------+-----+

Unique Vindex table definition
PHONE_USER_VDX Table
+-------------+-----------+------+-----+
| Field       | Type      | Null | Key |
+-------------+-----------+------+-----+
| phone       | bigint    | NO   | PRI |
| keyspace_id | binary(8) | YES  |     |
+-------------+-----------+------+-----+

When executing a query like select id, phone, email from user where name = 'Alex', the query planner uses the lookup Vindex table, name_user_vdx, to map the value Alex to its corresponding Keyspace ID. This lets the planner direct the query to a single destination shard rather than to all shards, thus avoiding a costly scatter-gather operation.
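The routing decision described above can be sketched in a few lines of Python. This is a simplified model, not the Vitess planner: the two-shard layout, the `shard_for` helper, and the in-memory dictionary standing in for name_user_vdx are all assumptions made for illustration. The keyspace IDs are the sample values used later in this post.

```python
# Simplified model of lookup-Vindex routing. The shard layout and helper
# names are hypothetical; a dict stands in for the name_user_vdx table.

# Two shards, split on the first byte of the keyspace ID: [low, high).
SHARDS = {"-32": (0x00, 0x32), "32-": (0x32, 0x100)}

# name -> keyspace IDs of the rows that carry that name.
NAME_USER_VDX = {
    "Alex": [bytes.fromhex("313030")],
    "Emma": [bytes.fromhex("323030")],
}

def shard_for(keyspace_id):
    """Return the shard whose range covers this keyspace ID."""
    first_byte = keyspace_id[0]
    for shard, (low, high) in SHARDS.items():
        if low <= first_byte < high:
            return shard
    raise ValueError("no shard covers this keyspace ID")

def route_by_name(name):
    """Use the lookup table to find only the shards that can hold the rows."""
    keyspace_ids = NAME_USER_VDX.get(name, [])
    return sorted({shard_for(k) for k in keyspace_ids})

def route_without_vindex():
    """Without a usable Vindex, the query must scatter to every shard."""
    return sorted(SHARDS)

print(route_by_name("Alex"))     # ['-32']
print(route_without_vindex())    # ['-32', '32-']
```

With the lookup in place the query touches one shard; without it, every shard has to be asked.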
Of particular interest is the Consistent Lookup Vindex, a type of Secondary Vindex, which further enhances the efficiency and reliability of this routing mechanism.
Consistent lookup Vindex
The user data table and lookup Vindex tables are both sharded in most cases to enable optimal performance and storage. The sharding column for the user table and the Vindex tables are likely to be different. In the scenario above, let's consider the sharding columns to be:
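+----------------+-----------------------+
| Table          | Primary Vindex Column |
+----------------+-----------------------+
| User           | id                    |
| Name_User_Vdx  | name                  |
| Phone_User_Vdx | phone                 |
+----------------+-----------------------+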
Changing data in the user table through DML statements (Insert/Update/Delete) leads to changes to rows in the Vindex tables as well. To maintain consistency between the user data table and the Vindex tables, all these operations will need to occur in a transaction that spans multiple shards. This means we need to implement a costly protocol like 2PC (Two Phase Commit) to guarantee Atomicity and Isolation (A and I from ACID). Not using a proper multi-shard transaction for these operations can lead to partial commit and inconsistent data.
Consistent Lookup Vindex uses an alternate approach that makes use of careful locking and transaction sequences to guarantee consistency without using 2PC for all DML operations. This allows Vitess to provide a consistent view of the user data table even when records in the Vindex tables may be inconsistent.
When data is being modified, Vitess uses 3 connections to perform DML operations. Let’s call them Pre, Main and Post. Any transaction open on these connections will follow a well-defined sequence of operations. Committing a transaction will result in the following:
Commit on Pre
Commit on Main
Commit on Post
A failure in any of these steps rolls back the remaining open transactions in the same order.
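As a rough illustration of this commit sequence, the following Python sketch models the three connections with a stand-in class. `FakeTxn` and `commit_in_order` are hypothetical names, not Vitess APIs, and the sketch assumes one reading of the rule above: when a commit fails, the failed transaction and any transactions not yet committed are rolled back, while earlier commits stand.

```python
# Hypothetical model of the Pre -> Main -> Post commit sequence.
# FakeTxn is a stand-in for a transaction on one connection.

class FakeTxn:
    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.state = "open"

    def commit(self):
        if self.fail_on_commit:
            raise RuntimeError(f"commit failed on {self.name}")
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def commit_in_order(pre, main, post):
    """Commit Pre, then Main, then Post; on failure roll back what is still open."""
    ordered = [pre, main, post]
    for index, txn in enumerate(ordered):
        try:
            txn.commit()
        except RuntimeError:
            # Earlier transactions are already committed and stay that way.
            for still_open in ordered[index:]:
                still_open.rollback()
            return False
    return True

pre = FakeTxn("Pre")
main = FakeTxn("Main")
post = FakeTxn("Post", fail_on_commit=True)
ok = commit_in_order(pre, main, post)
print(ok, [t.state for t in (pre, main, post)])
# False ['committed', 'committed', 'rolled back']
```

The failing Post commit here is the same situation the delete example explores: the main table has changed, but the lookup cleanup has not.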
Let’s look at an example to see how this works.
Sample rows
USER:
+-----+------+------------+-----------------+
| id  | name | phone      | email           |
+-----+------+------------+-----------------+
| 100 | Alex | 8877991122 | alex@mail.com   |
| 200 | Emma | 8811229988 | emma@mail.com   |
+-----+------+------------+-----------------+

NAME_USER_VDX:
+------+-----+--------------------------+
| name | id  | keyspace_id              |
+------+-----+--------------------------+
| Alex | 100 | 0x313030                 |
| Emma | 200 | 0x323030                 |
+------+-----+--------------------------+

PHONE_USER_VDX:
+------------+--------------------------+
| phone      | keyspace_id              |
+------------+--------------------------+
| 8811229988 | 0x323030                 |
| 8877991122 | 0x313030                 |
+------------+--------------------------+

Delete operation
Deletion of Lookup Vindex table data happens through the Post connection.
Example: delete from user where id = 100
First select all the lookup columns from the User Table
Main: select id, name, phone from user where id = 100 for update
Delete the Lookup Vindex Rows
Post-Transaction:
delete from name_user_vdx where name = 'Alex' and id = 100
delete from phone_user_vdx where phone = 8877991122
Delete the User Table Row
Main: delete from user where id = 100
On Commit, suppose the Main transaction succeeds but the Post transaction fails. Let’s see how we are still able to maintain consistency.
Updated rows
USER:
+-----+------+------------+-----------------+
| id  | name | phone      | email           |
+-----+------+------------+-----------------+
| 200 | Emma | 8811229988 | emma@mail.com   |
+-----+------+------------+-----------------+

NAME_USER_VDX:
+------+-----+--------------------------+
| name | id  | keyspace_id              |
+------+-----+--------------------------+
| Alex | 100 | 0x313030                 |
| Emma | 200 | 0x323030                 |
+------+-----+--------------------------+

PHONE_USER_VDX:
+------------+--------------------------+
| phone      | keyspace_id              |
+------------+--------------------------+
| 8811229988 | 0x323030                 |
| 8877991122 | 0x313030                 |
+------------+--------------------------+

If a select query is received:
select count(*) from user where name = 'Alex'
A lookup call will happen with name = 'Alex' to the name_user_vdx Vindex, which will return the shard destination with keyspace_id of 0x313030. When the query is sent down to that shard, a matching row no longer exists in the User table, so the count is zero.
+----------+
| count(*) |
+----------+
|        0 |
+----------+

The lookup Vindex table may be inconsistent with the User table but the results returned for the query remained consistent with the User table.
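The delete scenario can be replayed with a toy in-memory model. This is only a sketch under simplifying assumptions: Python dictionaries stand in for the User and name_user_vdx tables, shards and locks are not modeled, and `delete_user` and `count_by_name` are hypothetical helpers rather than Vitess functions.

```python
# Toy replay of the delete example: Main commits, Post fails.
# Dicts stand in for tables; shards and row locks are not modeled.

user = {
    100: {"name": "Alex", "phone": 8877991122},
    200: {"name": "Emma", "phone": 8811229988},
}
# (name, id) -> keyspace_id, mirroring name_user_vdx
name_user_vdx = {("Alex", 100): "313030", ("Emma", 200): "323030"}

def delete_user(user_id, post_commit_fails):
    """Delete the user row (Main); clean the lookup row only if Post commits."""
    row = user.pop(user_id)
    if not post_commit_fails:
        name_user_vdx.pop((row["name"], user_id), None)

def count_by_name(name):
    """Route through the lookup rows, then count rows that exist in user."""
    candidate_ids = [uid for (n, uid) in name_user_vdx if n == name]
    return sum(
        1 for uid in candidate_ids
        if uid in user and user[uid]["name"] == name
    )

delete_user(100, post_commit_fails=True)
print(("Alex", 100) in name_user_vdx)  # True: the orphan lookup row remains
print(count_by_name("Alex"))           # 0: the user table decides the answer
```

The orphan row only costs an extra routing step; the result always comes from the user table itself.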
Insert operation
Insertion of Lookup Vindex table data happens through the Pre connection.
Example: insert into user(id, name, phone, email) values (300, 'Emma', 8877991122, 'xyz@mail.com')
Insert into Lookup Vindex table
Pre-Transaction:
insert into name_user_vdx(name, id, keyspace_id) values ('Emma', 300, '0x333030')
No error as name is a non-unique column.
insert into phone_user_vdx(phone, keyspace_id) values (8877991122, '0x333030')
This results in a duplicate key error as it is a unique column. Note that this row is left over from the error we got during the previous delete operation. We’ll get into the details of how this is handled in a minute.
Insert the User table Row
Main: insert into user(id, name, phone, email) values (300, 'Emma', 8877991122, 'xyz@mail.com')
Handling of Duplicate Key Error in Lookup Vindex:
Lock the lookup row so that no other transaction can race with the current operation.
Pre-Transaction: select phone, keyspace_id from phone_user_vdx where phone = 8877991122 for update
Lock the main table row to ensure that the row we want to insert does not exist yet and no other transaction can race with the current operation.
Main: select phone from user where phone = 8877991122 for update
Because we previously deleted the corresponding row for this select, it will return no results. This tells us that the lookup Vindex table has an orphan row which can be updated with the new value from the insert statement.
Pre-Transaction: update phone_user_vdx set keyspace_id = '0x333030' where phone = 8877991122
Updated rows
USER:
+-----+------+------------+-----------------+
| id  | name | phone      | email           |
+-----+------+------------+-----------------+
| 200 | Emma | 8811229988 | emma@mail.com   |
| 300 | Emma | 8877991122 | xyz@mail.com    |
+-----+------+------------+-----------------+

NAME_USER_VDX:
+------+-----+--------------------------+
| name | id  | keyspace_id              |
+------+-----+--------------------------+
| Alex | 100 | 0x313030                 |
| Emma | 200 | 0x323030                 |
| Emma | 300 | 0x333030                 |
+------+-----+--------------------------+

PHONE_USER_VDX:
+------------+--------------------------+
| phone      | keyspace_id              |
+------------+--------------------------+
| 8811229988 | 0x323030                 |
| 8877991122 | 0x333030                 |
+------------+--------------------------+
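The duplicate-key handling for the unique lookup can be sketched the same way. Again, this is a toy model under stated assumptions: dictionaries replace the tables, the `for update` row locks are not modeled, and `insert_user` and `DuplicateKey` are hypothetical names introduced for this illustration.

```python
# Toy model of inserting into a unique lookup Vindex that holds an orphan row.
# Dicts stand in for tables; the "for update" locks are not modeled.

class DuplicateKey(Exception):
    """Raised when the phone number really belongs to an existing user."""

user = {200: {"name": "Emma", "phone": 8811229988}}
# phone -> keyspace_id; the second entry is the orphan left by the failed delete
phone_user_vdx = {8811229988: "323030", 8877991122: "313030"}

def insert_user(user_id, name, phone, keyspace_id):
    if phone in phone_user_vdx:
        # Duplicate key on the lookup row: check the main table before failing.
        owner_exists = any(row["phone"] == phone for row in user.values())
        if owner_exists:
            raise DuplicateKey(phone)
        # No owner in the user table, so the lookup row is an orphan
        # and can be repointed at the new keyspace ID.
    phone_user_vdx[phone] = keyspace_id                # Pre connection
    user[user_id] = {"name": name, "phone": phone}     # Main connection

insert_user(300, "Emma", 8877991122, "333030")
print(phone_user_vdx[8877991122])  # 333030
```

A phone number that still belongs to a live row, such as 8811229988, would raise `DuplicateKey` in this model instead of being repointed.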

Update operation
Update of Lookup Vindex table data happens through a Delete operation followed by an Insert operation. We already know that Delete operation is handled through Post connection and Insert operation through Pre connection.
In the special case of an update where the Vindex column value is unchanged, this sequence would cause a lock wait timeout on the Insert operation (on the Pre connection), because the row lock is still held by the Delete operation (on the Post connection). To mitigate this, updating Vindex column data with the same value as before is turned into a no-op for lookup Vindex tables.
However, it is still possible to run into this limitation if the same lookup Vindex value is deleted and inserted as two different statements inside the same transaction.
If you want to learn more about this feature in Vitess, check out the Vitess documentation. They have docs on Vindexes, and on Unique and Non-Unique lookup Vindexes. You are also welcome to join the Vitess community.]]></content>
        <summary><![CDATA[How we implemented a consistent lookup Vindex in Vitess to ensure data consistency without using 2PC]]></summary>
      </entry>
    
      <entry>
        <title>The MySQL adaptive hash index</title>
        <link href="https://planetscale.com/blog/the-mysql-adaptive-hash-index" />
        <id>https://planetscale.com/blog/the-mysql-adaptive-hash-index</id>
        <published>2024-04-24T00:00:00.000Z</published>
        <updated>2024-04-24T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[If you're using MySQL, you likely have indexes that are powered by B-trees. The B-tree is a powerful data structure, and is frequently used to construct indexes in relational databases. If you are using the InnoDB storage engine, it is the only choice for your index, save for spatial and full-text indexes. However, MySQL has a secret weapon for making lookups with these types of indexes even faster: the Adaptive Hash Index, or AHI. Before jumping into how this works, let's take a few moments to review B-trees, the InnoDB buffer pool, and how these work together during index lookups.
B-Tree indexes
The B-tree data structure has been used by computer systems for decades. It is particularly useful in the context of file system and data storage applications, because each node can store many values. This is useful in algorithms that interface with storage systems, as the size of each node can be set to work well with the unit(s) of storage, for example aligning with the page size of a hard disk drive.
As its name suggests, a B-tree is a specific type of tree structure, but it differs from a basic binary search tree in several ways, including:
Each node contains up to N values, rather than just one
Each node can have up to N+1 children
All elements are stored in-order for a left-to-right traversal
All leaf nodes are on the same level of the tree
Let's look at an example B-tree built using first names from a database:

In this example, N = 2, and all of the value slots in all of the nodes are full. For InnoDB B-tree indexes, N is usually much larger, as it is determined by how many values can fit within an InnoDB page (this defaults to 16KB). In addition to this, nodes are often left with some buffer of extra space to allow for faster future inserts.
When you create an index on an InnoDB table, MySQL constructs a B-tree containing the values from the indexed column. The child pointer before any given value points to another node containing values that are less than it, and the pointer after it points to a node containing values that are greater. Thus, a left-to-right traversal of this structure should visit all of the data elements in order.
In the general case, the size of a B-tree node can be set to whatever the programmer desires. In InnoDB, index nodes are set to match the InnoDB page size. This is pretty large, so each node can potentially store hundreds of values.
Say we want to search for the name paul in this B-tree.

The steps to accomplish this are:
Compare paul and ian, determine ian < paul.
Check and see that paul < remy, so traverse down the middle pointer.
Compare paul to first liam and then omar. It is greater than both, so traverse down the third pointer.
Discover paul as the first element in the leaf node.
Doing this is much faster than a linear scan of all elements would have been. However, a search operation is not quite O(1), as multiple comparisons and page loads are typically needed for a single lookup. With only two values per node and a small data set, the benefit of a B-tree compared to a linear search is less apparent. But the benefits are huge for large data sets. InnoDB B-tree indexes rarely need to go beyond, say, 5 levels, even for tables with millions of rows.
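The search steps above can be sketched in a few lines of Python. This is a minimal illustration, not how InnoDB implements its indexes: the Node class and the build and search helpers are hypothetical, and apart from ian, liam, omar, paul and remy the names are made up to fill a full three-level tree with N = 2.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted values held by this node
    children: list = field(default_factory=list)  # empty for leaf nodes


def build(names):
    """Build a full 3-level tree with N = 2 from 26 sorted names."""
    def subtree(vals):  # 8 sorted names -> one inner node over three leaves
        leaves = [Node(vals[0:2]), Node(vals[3:5]), Node(vals[6:8])]
        return Node([vals[2], vals[5]], leaves)
    kids = [subtree(names[0:8]), subtree(names[9:17]), subtree(names[18:26])]
    return Node([names[8], names[17]], kids)


def search(node, target):
    """Return (found, nodes_visited) for a lookup of target."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, target)  # first slot whose key is >= target
        if i < len(node.keys) and node.keys[i] == target:
            return True, visited
        # Not in this node: follow child pointer i, or stop at a leaf.
        node = node.children[i] if node.children else None
    return False, visited


# Only ian, liam, omar, paul and remy come from the walkthrough; the rest are made up.
NAMES = ["abe", "bea", "cal", "dee", "eli", "fay", "gus", "hal", "ian", "jon",
         "kai", "liam", "mia", "ned", "omar", "paul", "quin", "remy", "sam",
         "tia", "uma", "val", "wes", "xia", "yan", "zoe"]
root = build(NAMES)
print(search(root, "paul"))
```

Looking up paul visits three nodes (the root, the middle child, then the leaf), which mirrors the walkthrough. In a real index each node visit may also be a page read, which is why tree depth matters.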
There's lots more that could be said about B-trees. However, the important take-away is that B-trees provide ultra-fast lookups for queries that utilize the index. Yet they are not quite a constant-time operation, and require multiple page reads and comparisons.
The InnoDB buffer pool
When using InnoDB, all of your data, including any and all indexes you have on your columns, is stored in pages on disk. As MySQL executes and fulfills queries, it needs to be able to load subsets of this data into RAM for faster access. In MySQL, the region of memory MySQL uses for storing InnoDB data is known as the buffer pool, and its size is configurable using the innodb_buffer_pool_size configuration option. As MySQL runs and reads rows and indexes, it grabs pages (16KB by default in InnoDB) from disk and puts them into this region of RAM.
Generally, the larger this pool is, the better the performance of your database will be, up to the point where the pool size equals the size of your data set. If the size of the data is larger than the space available in the pool, MySQL will eventually have to evict pages from the buffer pool to make room for new ones to be loaded. MySQL uses a modified Least Recently Used (LRU) algorithm for this.
Due to this structure, when MySQL starts performing B-tree index lookups, initially many pages will have to be loaded from disk, a more costly I/O operation. However, as it processes more and more lookups, many of these pages may end up residing in the buffer pool, meaning each B-tree node visit and comparison executes faster because it's primarily hitting memory rather than disk. This helps performance, but it still requires looking at multiple pages as the B-tree is used to complete a search.
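To make the caching behavior concrete, here is a minimal sketch of a page cache with plain LRU eviction. It is a simplification under stated assumptions: InnoDB uses a modified LRU rather than this textbook version, and PageCache and read_page_from_disk are hypothetical names used only for illustration.

```python
from collections import OrderedDict


class PageCache:
    """Toy buffer pool: holds at most `capacity` pages, evicts the least recently used."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # hypothetical loader callback
        self.pages = OrderedDict()  # page_id -> contents, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        page = self.read_page_from_disk(page_id)
        self.pages[page_id] = page
        return page


cache = PageCache(capacity=2, read_page_from_disk=lambda pid: f"page-{pid}")
for pid in [1, 2, 1, 3, 2]:
    cache.get(pid)
print(cache.hits, cache.misses, list(cache.pages))
```

With room for only two pages, the five reads above produce one hit and four misses. A larger capacity turns more of those misses into hits, which is the effect a bigger buffer pool has on repeated index lookups.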
Hash lookups and performance
Another approach to performing fast lookups on large sets of data is to use hashing. With a hashing approach, we store all of the data (in this case, the first names of our users) in a large table. When performing a lookup for the name paul, we run this through a hash function that generates a repeatable output index. This index can be used to do a direct lookup into the hash table.
In the absence of collisions, a hash table lookup is an O(1) operation. Assuming your hash table data is in memory, this is even faster than a lookup into a B-tree index.
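As a rough illustration of the idea, the sketch below stores names in a fixed number of slots and resolves collisions by chaining. The NameTable class, its toy hash function, and the row IDs are all hypothetical; real hash tables use far better hash functions.

```python
class NameTable:
    """Toy fixed-size hash table using chaining for collisions."""

    def __init__(self, slots=16):
        self.slots = [[] for _ in range(slots)]

    def _index(self, name):
        # Deterministic toy hash: sum of the byte values modulo the table size.
        return sum(name.encode("utf-8")) % len(self.slots)

    def insert(self, name, row_id):
        bucket = self.slots[self._index(name)]
        for i, (key, _) in enumerate(bucket):
            if key == name:
                bucket[i] = (name, row_id)  # overwrite an existing entry
                return
        bucket.append((name, row_id))

    def lookup(self, name):
        # One hash computation picks the slot; only colliding keys are compared.
        for key, row_id in self.slots[self._index(name)]:
            if key == name:
                return row_id
        return None


table = NameTable()
table.insert("paul", 16)
table.insert("liam", 12)
table.insert("lima", 99)  # an anagram of liam, so it collides under this toy hash
print(table.lookup("paul"), table.lookup("liam"), table.lookup("lima"))
```

The lookup cost depends on the length of one bucket rather than on the total number of names, which is where the O(1) behavior comes from when collisions are rare.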

Though MySQL broadly supports both BTREE and HASH indexes, the InnoDB engine actually does not support HASH. If you try to create an index with USING HASH on an InnoDB-powered table, MySQL will instead create a B-tree index.
For instance, doing the following works without error:
mysql> CREATE INDEX alias_index ON user(username) USING HASH;
Query OK, 0 rows affected, 1 warning (0.79 sec)
Records: 0  Duplicates: 0  Warnings: 1

However, the information_schema tells us that it really is a B-tree index that was created:
SELECT table_name, index_name, index_type
    FROM information_schema.statistics
    WHERE table_schema = 'quiz'
    AND table_name = 'user';
+------------+-------------+------------+
| TABLE_NAME | INDEX_NAME  | INDEX_TYPE |
+------------+-------------+------------+
| user       | alias_index | BTREE      |
| user       | PRIMARY     | BTREE      |
+------------+-------------+------------+
2 rows in set (0.00 sec)

It is a bit unfortunate that InnoDB does not support building on-disk, hash-based indexes, as this type could be useful and more performant than a B-tree in some instances. However, this lack of support does not mean that hashing is not used at all for index lookups.
The adaptive hash index
Though InnoDB does not support on-disk hash indexes, MySQL has a feature to do in-memory hash lookups for indexing. This is known as the adaptive hash index. This feature can be used to speed up already-fast B-tree lookups, accelerating the performance of the queries that utilize these indexes as a part of their query plan. This acts as a layer that sits between the execution of MySQL and the in-memory buffer pool.

As its name suggests, the adaptive hash index (AHI) is constructed at runtime, and its usage adapts to the characteristics of your workload. If MySQL observes that a particular value is getting repeatedly looked up in a B-tree index, an entry in the AHI can be created either with the full value, or a prefix of the value. For future lookups of this same value (which are likely, since MySQL observed its repeated use) it will use the AHI instead of the B-tree. The keys of the AHI are the values (or value prefixes) of the underlying index. The values are pointers, referring to where the data for this value lives within the InnoDB buffer pool.
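The "observe, then promote" behavior can be modeled with a toy class. This is only a sketch of the general idea and not InnoDB's actual heuristics: AdaptiveIndex, the btree_lookup callback, and the promotion threshold of 3 are all assumptions made for illustration.

```python
class AdaptiveIndex:
    """Toy model: serve hot keys from a dict, everything else from a slower lookup."""

    def __init__(self, btree_lookup, threshold=3):
        self.btree_lookup = btree_lookup  # hypothetical slow-path callable
        self.threshold = threshold        # hypothetical promotion threshold
        self.seen = {}                    # key -> slow-path lookups observed so far
        self.hash_index = {}              # key -> cached location
        self.hash_searches = 0
        self.non_hash_searches = 0

    def lookup(self, key):
        if key in self.hash_index:
            self.hash_searches += 1
            return self.hash_index[key]
        self.non_hash_searches += 1
        location = self.btree_lookup(key)
        self.seen[key] = self.seen.get(key, 0) + 1
        if location is not None and self.seen[key] >= self.threshold:
            self.hash_index[key] = location  # promote the repeatedly used key
        return location


# Made-up locations standing in for pointers into the buffer pool.
locations = {"paul": ("page 7", 0)}
index = AdaptiveIndex(locations.get, threshold=3)
for _ in range(10):
    index.lookup("paul")
print(index.hash_searches, index.non_hash_searches)
```

Here the first three lookups take the slow path and the remaining seven are served from the hash, loosely analogous to the hash and non-hash search counters that InnoDB reports in its status output.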

The pointers in the adaptive hash index only point to data within the buffer pool. Thus, the buffer pool needs to be sufficiently large for the AHI to kick in. If it is small and there are a lot of evictions taking place, the AHI is not worth using. Thankfully, MySQL is able to automatically adjust its use of the AHI based on the behavior it observes in the buffer pool. If conditions are not right for its use (few repeated lookups, small buffer pool, etc), MySQL will reduce or eliminate its use.
Though it can speed up queries, there is a bit of overhead to maintaining this special hash index. The feature can be enabled or disabled via the innodb_adaptive_hash_index configuration option. It is typically enabled by default, but if you have a workload that you know will not benefit from it, it can be disabled using innodb_adaptive_hash_index=0 in your configuration file.
Testing performance of the adaptive hash index
Let's run a few tests to see how the adaptive hash index can help a workload. We'll start with something very simple. I'll execute the following query 500k times in two different scenarios: with and without the AHI enabled.
SELECT user_id, username, bio FROM user WHERE username = 'willpeace1';

This is executing against a user table with a little over 390 million rows in it. With the AHI disabled, we get the following timing:
$ python3 same_query.py 500000 1
starting query load
completed in 35.6 seconds
QPS = 14043.57

This goes fast, because I have a B-tree index already set on the username column. While this workload was running, I executed SHOW ENGINE INNODB STATUS\G. This command provides information about what is going on with the InnoDB storage engine. You can inspect the INSERT BUFFER AND ADAPTIVE HASH INDEX section to see if and how much the adaptive hash index is being used.
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 11826, free list len 13206, seg size 25033, 8599 merges
merged operations:
 insert 52823, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 276707, node heap has 0 buffer(s)
...
0.00 hash searches/s, 418334.67 non-hash searches/s

The key line to observe here is the last one, which indicates that no hash lookups are occurring. This is expected, since we have disabled the feature.
Now after enabling the adaptive hash index and restarting the server, let's try again:
$ python3 same_query.py 500000 1
starting query load
completed in 29.94 seconds
QPS = 16701.1

We achieved about a 16% speedup (29.94 seconds versus 35.6). This is a small, but still quite useful performance boost. B-tree index lookups are already very fast, but we have layered an additional optimization on top to make them even faster. I grabbed the InnoDB status output while this ran, and it shows that the AHI was being used:
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 11445, free list len 13587, seg size 25033, 1881 merges
merged operations:
 insert 11507, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 276707, node heap has 3 buffer(s)
...
350953.05 hash searches/s, 50985.01 non-hash searches/s

Now, many hash searches are occurring. There are still some non-hash searches happening as well, which makes sense since the query still needs to access the actual data in the row, not just the index value.
Let's try another workload. This time, we'll re-use the same query, except instead of always searching for the same username, on each execution we'll randomly select one username from a pool of one thousand. Though this specific workload is unrealistic, this can be thought of as a database and workload with a large amount of cold data, and a small amount of hot data.
With the adaptive hash index disabled, we get:
$ python3 load.py 500000 1
---
starting query workload
completed in 54.16 seconds
QPS = 9231.62

With it enabled, we instead see:
$ python3 load.py 500000 1
---
starting query workload
completed in 43.24 seconds
QPS = 11562.05

In this workload, we got a 20% performance improvement.
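For reference, the percentages quoted for both workloads correspond to the reduction in wall-clock time. The small sketch below computes that figure alongside the change in QPS from the numbers reported above; the two helper function names are hypothetical.

```python
def time_saved_pct(before_s, after_s):
    """Percentage reduction in wall-clock time."""
    return (before_s - after_s) / before_s * 100


def throughput_gain_pct(before_qps, after_qps):
    """Percentage increase in queries per second."""
    return (after_qps / before_qps - 1) * 100


# (time saved, QPS gained) for each workload, using the timings reported above.
same_query = (time_saved_pct(35.6, 29.94), throughput_gain_pct(14043.57, 16701.1))
random_pool = (time_saved_pct(54.16, 43.24), throughput_gain_pct(9231.62, 11562.05))

print(f"same query:  {same_query[0]:.1f}% less time, {same_query[1]:.1f}% more QPS")
print(f"random pool: {random_pool[0]:.1f}% less time, {random_pool[1]:.1f}% more QPS")
```

Measured as throughput, the gains come out somewhat higher (roughly 19% and 25%), so it is worth being explicit about which of the two metrics a reported speedup refers to.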
These tests were executed on a table with over 390 million rows:
mysql> SELECT count(*) FROM user;

+-----------+
| count(*)  |
+-----------+
| 398748007 |
+-----------+

Even with such a large data set, the B-tree index for the username column was only 4 levels deep (this was checked with the help of innodb_ruby). Running these types of workloads on tables with smaller indexes, and therefore shallower B-trees, may result in a less noticeable speedup. On the other hand, workloads using deeper B-tree indexes may see even more performance improvement.
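A back-of-the-envelope calculation shows why the tree stays so shallow. The sketch below assumes a fan-out of 200 entries per page, which is a hypothetical figure chosen only to represent "hundreds of values" per node; the real number depends on key size and page fill.

```python
def levels_needed(row_count, fanout):
    """Smallest number of levels whose capacity covers row_count entries."""
    levels, capacity = 1, fanout
    while capacity < row_count:
        capacity *= fanout  # each extra level multiplies capacity by the fan-out
        levels += 1
    return levels


ROWS = 398_748_007                # row count reported above
print(levels_needed(ROWS, 200))   # hypothetical fan-out of 200 entries per node
print(levels_needed(ROWS, 2))     # two-way branching, for comparison
```

Under that assumption four levels are enough for this table, whereas two-way branching would need 29, which is why wide nodes keep the number of page visits per lookup small.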
Conclusion
When dealing with large, multi-terabyte databases and workloads with hundreds of thousands of queries per second, even small improvements like this can have a big impact on query latency and database server infrastructure needs. The adaptive hash index provides such improvements for already-fast B-tree index lookups, helping certain types of workloads get a boost in performance. Whether or not the AHI will help your workload depends heavily on data access patterns and the size of your InnoDB buffer pool.]]></content>
        <summary><![CDATA[The adaptive hash index helps to improve performance of the already-fast B-tree lookups]]></summary>
      </entry>
    
      <entry>
        <title>Introducing global replica credentials</title>
        <link href="https://planetscale.com/blog/introducing-global-replica-credentials" />
        <id>https://planetscale.com/blog/introducing-global-replica-credentials</id>
        <published>2024-04-17T16:01:00.000Z</published>
        <updated>2024-04-17T16:01:00.000Z</updated>
        
        <author>
          <name>Matt Robenolt</name>
        </author>
        
        <author>
          <name>Iheanyi Ekechukwu</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[You can now create global replica credentials that will automatically route reads to the nearest read replica from anywhere on the planet.
All PlanetScale clusters are made up of, at minimum, one primary and two replicas in the same region spread across three availability zones. With today's release, it's now much easier to utilize this extra compute by creating a single password dedicated for replica use. Global replica credentials automatically load-balance across the replicas in a region for you.
These new credentials also seamlessly route to the read-only region with the lowest latency without any code changes, even as you add or remove new read-only regions. The underlying architecture, PlanetScale Global Network, also manages this without requiring you to reconnect. This allows you to add a read-only region and start using it immediately.
If you have existing replica credentials for each of your read-only regions, you can now swap them out for a single global replica credential.
Using global replica credentials is as simple as clicking “Connect” from the PlanetScale dashboard, selecting “Replica”, and generating your new credentials. You can also generate credentials from the CLI or API. If you have the CLI installed and you want to try replica credentials out in a MySQL console, you can run pscale shell --replica. See the Replica documentation for more information.

Building PlanetScale Global Network
For the past few years, we have been quietly building and architecting a database connectivity layer structured like a CDN. We call this layer the PlanetScale Global Network.
This layer is responsible for a few things:
Terminating every MySQL connection globally.
Handling our RPC API (to support things like database-js).
Connection pooling at a near infinite scale.
TLS termination.
Routing connections to your database.
At a high level, the first step when connecting to your database is connecting to our Edge network. Where this is like a CDN is that this first hop is the one nearest to you geographically, or nearest to your application. Whether you are directly in AWS next door, or on the other side of the world, connections begin at our edge network.
Once in our network, we fully terminate the MySQL connection and TLS handshake at the outermost layer, closest to your application. Connection pooling also happens here, similar to running an instance of ProxySQL in your datacenter. From this outer layer, connections can be multiplexed and tunneled back to your origin database over a small number of long-held, encrypted connections that are already warmed up and ready to go.
The net effect of this is that the TCP and TLS handshakes are faster, since there's less network latency and fewer hops needed. Terminating them closest to you lets a handshake complete sooner. Doing this at the MySQL connection layer is what separates us from a traditional CDN or network accelerator. Similar to an HTTP proxy, the MySQL connection pooler is able to complete the full MySQL handshake closest to you as well, and keep chatter over a shorter geographic distance.
What's in a Credential?
A Credential to us is broken up into three pieces: the Credential, a Route, and an Endpoint. The Route is shared with every geographic region at our edge, and the Credential remains inside your unique database cluster region. The Endpoint is the hostname you use to connect to us, typically something like aws.connect.psdb.cloud or gcp.connect.psdb.cloud.
The Route and Credential are stored in etcd, which we are able to watch for changes in near real time and respond to mutations or deletions as soon as they happen.
The Route
In practice, a Route to us looks like this:
message Route {
  string branch = 1;
  repeated string cluster = 2;
  fixed64 expiration = 3;
  ...
}

Notice that this Route contains no authentication information and is not authoritative for auth. It is effectively a mapping of username to a list of clusters. This list of clusters covers where this database branch is running.
In the case of a normal, single Primary credential, this Route may look like:
Route(branch="abc123", cluster=["us-east-1"])

In the case of a Replica credential that has multiple read-only regions, the cluster field expands to the list of all of them.
Route(branch="abc123", cluster=["us-east-1", "us-west-2"])

Now we have a single Route with multiple options of where we could go.
The Credential
I'm not going to go into a bunch of detail here since a lot of it is internal implementation, but the Credential is effectively the source of truth for authentication and only exists inside the same cluster as your database.
It looks roughly like this:
enum TabletType {
  TABLET_TYPE_UNSPECIFIED = 0;
  TABLET_TYPE_PRIMARY = 1;
  TABLET_TYPE_REPLICA = 2;
  TABLET_TYPE_RDONLY = 3;
}

message Credential {
  string branch = 1;
  bytes password_sha256 = 2;
  psdb.data.v1.Role role = 3;
  fixed64 expiration = 4;
  TabletType tablet_type = 5;
}

Here, we hold more information about the Credential, including the Role/ACL assigned at creation, the information needed to verify the password is correct, and an additional TabletType which indicates if this is intended for a primary, replica, or read-only database. This is all automatic and is just an internal representation.
While the Route is just a mapping of username -> cluster, the Credential contains the rest of the information needed to fully connect to the underlying database with the correct ACLs and to which TabletType.
The Endpoint
Now the endpoint is where the first bit of magic happens. As you may have noticed in the product, we surface two different hostname options, a "Direct" and an "Optimized". The "Direct" has the form of {region}.connect.psdb.cloud and the "Optimized" is of the form {provider}.connect.psdb.cloud.
The Direct endpoint is the most straightforward, and represents the Edge node in that region explicitly. You can choose really any region you'd like as the first hop to route through and you'll still get to the correct destination, but we give you the endpoint that is closest to the database, not the endpoint closest to you. But really, you can pick any public endpoint in the same provider if you're clever.
The Optimized endpoint is backed by a latency-based DNS resolver. In AWS, for example, this is their Route 53 latency-based routing policy, which is most of the magic needed to resolve aws.connect.psdb.cloud to the edge region nearest to you. This means whether you're connecting from your local machine with pscale connect or from the datacenter next to your database, you get routed through the closest region to you, which gives us the CDN effect.
Putting this together
With these three bits, we can put together the story for how we can route to any of your replicas geographically.
Starting with the initial connection pool at our edge, this applies exactly the same to a connection over HTTP or the MySQL protocol.
Once a connection is established to us, regardless of where your database is located, your connection is terminated at this same edge node in our network. When a new region is added, the underlying Route is mutated to add the new cluster.
Since we maintain warm connections between all of our regions ready to go, we utilize these to measure latency continuously as a part of regular health checking. So, for example, the us-east-1 edge node is continuously pinging its peers, similar to a mesh network and measuring their latency.
Once a Route is seen over the etcd watcher, and before it becomes available for use, we simply sort its list of clusters by the latency values we are already tracking. We periodically re-sort every Route if/when latency values change. This keeps the "next hop" decision always clusters[0] in practice. In the event of a hard failure (if for some reason this entire region were down), we could go over to the next option if there were multiple choices.
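The sorting and fallback described above can be sketched as follows. This is an illustrative model rather than the actual edge implementation: the Route dataclass only mirrors two fields of the message shown earlier, and sort_route, next_hop, the health check, and the latency numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    branch: str
    cluster: list  # candidate regions, kept sorted nearest-first


def sort_route(route, latency_ms):
    """Reorder route.cluster by measured latency; unknown regions sort last."""
    route.cluster.sort(key=lambda region: latency_ms.get(region, float("inf")))
    return route


def next_hop(route, is_healthy=lambda region: True):
    """Pick the nearest healthy region, falling back down the sorted list."""
    for region in route.cluster:
        if is_healthy(region):
            return region
    return None


# Made-up latencies as one edge node might measure them against its peers.
latency_ms = {"us-east-1": 70.0, "us-west-2": 12.0}
route = sort_route(Route("abc123", ["us-east-1", "us-west-2"]), latency_ms)
print(next_hop(route))
print(next_hop(route, is_healthy=lambda region: region != "us-west-2"))
```

Because the list is sorted ahead of time, the per-query decision is just "take the first healthy entry", which keeps the routing step cheap even though it runs on every query.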
Ultimately, because the connection is already established with us during all of this, the Route is utilized on a per-query basis, thus without needing to reconnect or anything, we can route you to the lowest latency next hop in realtime.
Similarly, when read-only regions are added and removed, we only need to mutate this Route with a new set of what regions your database is in, and we just maintain a sorted list ready to go.]]></content>
        <summary><![CDATA[With global replica credentials, you can now automatically route reads to the closest replica.]]></summary>
      </entry>
    
      <entry>
        <title>Profiling memory usage in MySQL</title>
        <link href="https://planetscale.com/blog/profiling-memory-usage-in-mysql" />
        <id>https://planetscale.com/blog/profiling-memory-usage-in-mysql</id>
        <published>2024-04-11T00:00:00.000Z</published>
        <updated>2024-04-11T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[When considering the performance of any software, there's a classic trade-off between time and space. In the process of assessing the performance of a MySQL query, we often focus on execution time (or query latency) and use it as the primary metric for query performance. This is a good metric to use, as ultimately, we want to get query results as quickly as possible.
I recently released a blog post about how to identify and profile problematic MySQL queries, with a discussion centered around measuring poor performance in terms of execution time and row reads. However, in this discussion, memory consumption was largely ignored.
Though it may not be needed as often, MySQL also has built-in mechanisms for gaining a deep understanding of both how much memory a query is using and also what that memory is being used for. Let's take a deep dive through this functionality and see how we can perform live monitoring of the memory usage of a MySQL connection.
Memory statistics
In MySQL, there are many components of the system that can be individually instrumented. The performance_schema.setup_instruments table lists each of these components, and there are quite a few:
SELECT count(*) FROM performance_schema.setup_instruments;

+----------+
| count(*) |
+----------+
| 1255     |
+----------+

Included in this table are a number of instruments that can be used for memory profiling. To see what is available, try selecting from the table and filtering by memory/.
SELECT name, documentation
  FROM performance_schema.setup_instruments
  WHERE name LIKE 'memory/%';

You should see several hundred results. Each of these represents a different category of memory that can be individually instrumented in MySQL. Some of these categories contain a short bit of documentation describing what this memory category represents or is used for. If you'd like to see only memory types that have a non-null documentation value, you can run:
SELECT name, documentation
  FROM performance_schema.setup_instruments
  WHERE name LIKE 'memory/%'
  AND documentation IS NOT NULL;

Each of these memory categories can be sampled at several different granularities. The various levels of granularity are stored across several tables:
SELECT table_name
  FROM information_schema.tables
  WHERE table_name LIKE '%memory_summary%'
  AND table_schema = 'performance_schema';

+-----------------------------------------+
| TABLE_NAME                              |
+-----------------------------------------+
| memory_summary_by_account_by_event_name |
| memory_summary_by_host_by_event_name    |
| memory_summary_by_thread_by_event_name  |
| memory_summary_by_user_by_event_name    |
| memory_summary_global_by_event_name     |
+-----------------------------------------+

memory_summary_by_account_by_event_name: Summarizes memory events based on accounts (an account is a combination of a user and host)
memory_summary_by_host_by_event_name: Summarizes memory events at a host granularity
memory_summary_by_thread_by_event_name: Summarizes memory events at a MySQL thread granularity
memory_summary_by_user_by_event_name: Summarizes memory events at a user granularity
memory_summary_global_by_event_name: A global summary of memory statistics
Notice that there is no specific tracking for memory usage at a per-query level. However, this does not mean we cannot profile the memory usage of a query! To accomplish this, we can monitor the usage of memory on whatever connection the query of interest is being executed on. Because of this, we'll focus our use on the memory_summary_by_thread_by_event_name table, as there is a convenient mapping between a MySQL connection and a thread.
Finding usage for a connection
At this point, you should set up two separate connections to your MySQL server on the command line. The first is the one that will execute the query you want to monitor memory usage for. The second will be used for monitoring purposes.
On the first connection, run these queries to get your connection ID and thread ID.
SET @cid = (SELECT CONNECTION_ID());
SET @tid = (SELECT thread_id
    FROM performance_schema.threads
    WHERE PROCESSLIST_ID=@cid);

Then grab these values. Of course, yours will likely look different than what you see here.
SELECT @cid, @tid;

+------+------+
| @cid | @tid |
+------+------+
|   49 |   89 |
+------+------+

Next up, execute some long-running query you'd like to profile the memory usage for. For this example, I'll do a large SELECT from a table that has 100 million rows in it, which should take a while since there is no index on the alias column:
SELECT alias FROM chat.message ORDER BY alias DESC LIMIT 100000;

Now, while this is executing, switch over to your other console connection and run the following, replacing the thread ID with the one from your connection:
SELECT
    event_name,
    current_number_of_bytes_used
FROM performance_schema.memory_summary_by_thread_by_event_name
WHERE thread_id = YOUR_THREAD_ID
ORDER BY current_number_of_bytes_used DESC;

You should see results along the lines of this, though the details will depend highly on your query and data:
+---------------------------------------+------------------------------+
| event_name                            | current_number_of_bytes_used |
+---------------------------------------+------------------------------+
| memory/sql/Filesort_buffer::sort_keys | 203488                       |
| memory/innodb/memory                  | 169800                       |
| memory/sql/THD::main_mem_root         | 46176                        |
| memory/innodb/ha_innodb               | 35936                        |
...

This indicates the amount of memory for each category being used at the exact moment this query was executed. If you run this query several times while the other SELECT alias... query is executing, you may see differences in the results, as memory usage for a query is not necessarily constant over its whole execution. Each execution of this query represents a sample at a moment in time. Thus, if we want to see how the usage changes over time, we'll need to take many samples.
The documentation for memory/sql/Filesort_buffer::sort_keys is missing from the performance_schema.setup_instruments table.
SELECT name, documentation
    FROM performance_schema.setup_instruments
    WHERE name LIKE 'memory%sort_keys';

+---------------------------------------+---------------+
| name                                  | documentation |
+---------------------------------------+---------------+
| memory/sql/Filesort_buffer::sort_keys | <null>        |
+---------------------------------------+---------------+

However, the name indicates that it is memory used by MySQL's filesort operation, which sorts rows when an index cannot supply the requested order. This makes sense, as a large part of the expense of this query would be sorting the data so that it can be displayed in descending order.
Collecting usage over time
As a next step, we need to be able to sample this memory usage over time. For short queries this will not be as useful, as we'll only be able to execute this query once, or a small number of times, while the profiled query is executing. This will be more useful for longer-running queries, ones that take multiple seconds or minutes. These would be the types of queries we'd want to profile anyway, as these are the ones likely to use a large portion of memory.
This could be implemented fully in SQL and invoked via a stored procedure. However, in this case, let's use a separate script in Python to provide monitoring.
#!/usr/bin/env python3

import time
import MySQLdb
import argparse

MEM_QUERY='''
SELECT event_name, current_number_of_bytes_used
  FROM performance_schema.memory_summary_by_thread_by_event_name
  WHERE thread_id = %s
  ORDER BY current_number_of_bytes_used DESC LIMIT 4
'''

parser = argparse.ArgumentParser()
parser.add_argument('--thread-id', type=int, required=True)
args = parser.parse_args()

dbc = MySQLdb.connect(host='127.0.0.1', user='root', password='password')
c = dbc.cursor()

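ms = 0  # elapsed time in milliseconds, advanced by the sleep interval
while True:
    # Sample the top memory categories for the monitored thread.
    c.execute(MEM_QUERY, (args.thread_id,))
    results = c.fetchall()
    print(f'\n## Memory usage at time {ms} ##')
    for r in results:
        # r[0][7:] strips the leading 'memory/' prefix from the event name.
        print(f'{r[0][7:]} -> {round(r[1]/1024,2)}Kb')
    ms+=250
    time.sleep(0.25)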

This is a simple first stab at such a monitoring script. In summary, this code does the following:
Get the provided thread ID to monitor via command line
Set up a connection to a MySQL database
Every 250 milliseconds, execute a query to get the top 4 used memory categories and print a readout
This could be adjusted in many ways depending on your profiling needs. For example, you could tweak the frequency of the ping to the server or change how many memory categories are listed per iteration. Running this while a query is executing provides results like this:
...
## Memory usage at time 4250 ##
innodb/row0sel -> 25.22Kb
sql/String::value -> 16.07Kb
sql/user_var_entry -> 0.41Kb
innodb/memory -> 0.23Kb

## Memory usage at time 4500 ##
innodb/row0sel -> 25.22Kb
sql/String::value -> 16.07Kb
sql/user_var_entry -> 0.41Kb
innodb/memory -> 0.23Kb

## Memory usage at time 4750 ##
innodb/row0sel -> 25.22Kb
sql/String::value -> 16.07Kb
sql/user_var_entry -> 0.41Kb
innodb/memory -> 0.23Kb

## Memory usage at time 5000 ##
innodb/row0sel -> 25.22Kb
sql/String::value -> 16.07Kb
sql/user_var_entry -> 0.41Kb
innodb/memory -> 0.23Kb
...

This is great, but there are a few weaknesses. It would be nice to see more than the top 4 memory usage categories, but increasing that number increases the size of this already-large output dump. It would also be nice to have an easier way to get a picture of the memory usage at a glance via some visualizations. This could be done by having the script dump the results to a CSV or JSON file, and then loading them up later in a visualization tool. Even better, we could plot the results live, as the data is streaming in. This provides a more up-to-date view, and allows us to observe the memory usage as it is happening, all in one tool.
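As a rough illustration of the CSV option, here is a minimal sketch of how each sample could be written out for later plotting. The `write_samples` helper and the column names are assumptions made for this example, and the hard-coded rows stand in for what `c.fetchall()` would return in the monitoring script.

```python
import csv
import io

def write_samples(rows, elapsed_ms, writer):
    """Write one sample, a list of (event_name, bytes_used) rows, as CSV records."""
    for event_name, bytes_used in rows:
        writer.writerow([elapsed_ms, event_name, int(bytes_used)])

# An in-memory buffer keeps the sketch self-contained; a real script
# would more likely open a file with open('memory.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['elapsed_ms', 'event_name', 'bytes_used'])

# Hypothetical sample standing in for the rows returned by MEM_QUERY
sample = [('memory/innodb/row0sel', 25825), ('memory/sql/String::value', 16456)]
write_samples(sample, 250, writer)

csv_text = buffer.getvalue()
print(csv_text)
```

Writing one record per category per sample keeps the file easy to load into a spreadsheet or plotting tool afterwards.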
Plotting memory usage
In order to make this tool even more useful and provide visualizations, a few changes are going to be made.
The user will provide a connection ID on the command line, and the script will be responsible for finding the underlying thread.
The frequency at which the script requests memory data will be configurable, also via the command line.
The matplotlib library will be used to generate a visualization of the memory usage. This will consist of a stack plot with a legend showing the top memory usage categories, and will retain the past 50 samples.
It's quite a bit of code, but is included here for the sake of completeness.
#!/usr/bin/env python3

import matplotlib.pyplot as plt
import numpy as np
import MySQLdb
import argparse

MEM_QUERY='''
SELECT event_name, current_number_of_bytes_used
  FROM performance_schema.memory_summary_by_thread_by_event_name
  WHERE thread_id = %s
  ORDER BY event_name DESC'''

TID_QUERY='''
SELECT  thread_id
  FROM performance_schema.threads
  WHERE PROCESSLIST_ID=%s'''

class MemoryProfiler:

    def __init__(self):
        self.x = []
        self.y = []
        self.mem_labels = ['XXXXXXXXXXXXXXXXXXXXXXX']
        self.ms = 0
        self.color_sequence = ['#ffc59b', '#d4c9fe', '#a9dffe', '#a9ecb8',
                               '#fff1a8', '#fbbfc7', '#fd812d', '#a18bf5',
                               '#47b7f8', '#40d763', '#f2b600', '#ff7082']
        plt.rcParams['axes.xmargin'] = 0
        plt.rcParams['axes.ymargin'] = 0
        plt.rcParams["font.family"] = "inter"

    def update_xy_axis(self, results, frequency):
        self.ms += frequency
        self.x.append(self.ms)
        if (len(self.y) == 0):
            self.y = [[] for x in range(len(results))]
        for i in range(len(results)-1, -1, -1):
            usage = float(results[i][1]) / 1024
            self.y[i].append(usage)
        if (len(self.x) > 50):
            self.x.pop(0)
            for i in range(len(self.y)):
                self.y[i].pop(0)

    def update_labels(self, results):
        total_mem = sum(map(lambda e: e[1], results))
        self.mem_labels.clear()
        for i in range(len(results)-1, -1, -1):
            usage = float(results[i][1]) / 1024
            mem_type = results[i][0]
            # Remove 'memory/' from beginning of name for brevity
            mem_type = mem_type[7:]
            # Only show top memory users in legend
            if (usage < total_mem / 1024 / 50):
                mem_type = '_' + mem_type
            self.mem_labels.insert(0, mem_type)

    def draw_plot(self, plt):
        plt.clf()
        plt.stackplot(self.x, self.y, colors = self.color_sequence)
        plt.legend(labels=self.mem_labels, bbox_to_anchor=(1.04, 1), loc="upper left", borderaxespad=0)
        plt.xlabel("milliseconds since monitor began")
        plt.ylabel("Kilobytes of memory")

    def configure_plot(self, plt):
        plt.ion()
        fig = plt.figure(figsize=(12,5))
        plt.stackplot(self.x, self.y, colors=self.color_sequence)
        plt.legend(labels=self.mem_labels, bbox_to_anchor=(1.04, 1), loc="upper left", borderaxespad=0)
        plt.tight_layout(pad=4)
        return fig

    def start_visualization(self, database_connection, thread_id, frequency):
        c = database_connection.cursor()
        fig = self.configure_plot(plt)
        while True:
            # MEM_QUERY filters on the performance_schema thread ID, not the connection ID
            c.execute(MEM_QUERY, (thread_id,))
            results = c.fetchall()
            self.update_xy_axis(results, frequency)
            self.update_labels(results)
            self.draw_plot(plt)
            fig.canvas.draw_idle()
            fig.canvas.start_event_loop(frequency / 1000)

def get_command_line_args():
    '''
    Process arguments and return argparse object to caller.
    '''
    parser = argparse.ArgumentParser(description='Monitor MySQL query memory for a particular connection.')
    parser.add_argument('--connection-id', type=int, required=True,
                        help='The MySQL connection to monitor memory usage of')
    parser.add_argument('--frequency', type=float, default=500,
                        help='The frequency at which to ping for memory usage update in milliseconds')
    return parser.parse_args()

def get_thread_for_connection_id(database_connection, cid):
    '''
    Get a thread ID corresponding to the connection ID
    PARAMS
      database_connection - Database connection object
      cid - The connection ID to find the thread for
    '''
    c = database_connection.cursor()
    c.execute(TID_QUERY, (cid,))
    result = c.fetchone()
    return int(result[0])

def main():
    args = get_command_line_args()
    database_connection = MySQLdb.connect(host='127.0.0.1', user='root', password='password')
    # The memory summary table is keyed by thread ID, so resolve it from the connection ID
    thread_id = get_thread_for_connection_id(database_connection, args.connection_id)
    m = MemoryProfiler()
    try:
        m.start_visualization(database_connection, thread_id, args.frequency)
    finally:
        # Runs when the monitoring loop is interrupted or fails
        database_connection.close()

if __name__ == "__main__":
    main()


With this, we can do detailed monitoring of executing MySQL queries. To use it, first get the connection ID for the connection you want to profile:
SELECT CONNECTION_ID();

Then, executing the following will begin a monitoring session:
./monitor.py --connection-id YOUR_CONNECTION_ID --frequency 250

When executing a query on the database, we can observe the increase in memory usage, and see what categories of memory are the largest contributors.

This visualization can also help us to clearly see what kinds of operations are memory hogs. For example, here is a snippet of a memory profile for creating a FULLTEXT index on a large table:

The memory usage is significant, and continues to grow into using hundreds of megabytes as it executes.
For another example of how you can use MySQL to profile memory usage, check out this DBAMA presentation and the corresponding GitHub repository.
Conclusion
Though it may not be needed as often, having the ability to get detailed memory usage information can be extremely valuable when the need for detailed query optimization arises. Doing this can reveal when and why MySQL may be causing memory pressure on the system, or whether a memory upgrade for your database server may be needed. MySQL provides a number of primitives that you can build upon to develop profiling tooling for your queries and workload.]]></content>
        <summary><![CDATA[Learn how to visualize the memory usage of a MySQL connection]]></summary>
      </entry>
    
      <entry>
        <title>Summer 2023: Fuzzing Vitess at PlanetScale</title>
        <link href="https://planetscale.com/blog/summer-2023-fuzzing-vitess-at-planetscale" />
        <id>https://planetscale.com/blog/summer-2023-fuzzing-vitess-at-planetscale</id>
        <published>2024-04-09T00:00:00.000Z</published>
        <updated>2024-04-09T00:00:00.000Z</updated>
        
        <author>
          <name>Arvind Murty</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[My name is Arvind Murty, and from May to July of 2023, I worked on Vitess via an internship with PlanetScale.
I was first introduced to Vitess when I was in high school as a potential open-source project for me to work on. I had been interested in working on one because they’re a relatively easy way to get some real-world experience in large-scale software development. Vitess seemed like a good place to start, so I started contributing, mostly on internal cleanup. I had been doing this on and off until the spring of 2023, when Andrés Taylor approached me about doing an internship under his guidance. Needless to say, I agreed.
My focus at PlanetScale
When I started in mid-May, Andrés gave me my instructions: find as many bugs in the Vitess planner as possible.
We first looked into a tool called SQLancer. From its README:
SQLancer (Synthesized Query Lancer) is a tool to automatically test Database Management Systems (DBMSs) in order to find logic bugs in their implementation. We refer to logic bugs as those bugs that cause the DBMS to fetch an incorrect result set (e.g., by omitting a record).
SQLancer had been very successful at finding bugs in well-established DBMSs, such as SQLite and MySQL, so we thought it might work well for Vitess. But there were three main problems:
Vitess ideally should perfectly mimic MySQL, quirks included. SQLancer, on the other hand, compares queries to an oracle, which determines if queries are logically correct.
Vitess has the added layer of the VSchema. The VSchema has many added considerations, such as sharding keys, which change how Vitess plans the query.
It would take a lot of work to properly integrate Vitess with SQLancer, due to each DBMS tester in SQLancer essentially being written completely separately with similar logic.
Vitess planner bug hunting strategy
We decided to go for the low-hanging fruit and build our own random query generator, which turned out to not be that low-hanging since it yielded a bunch of failing queries. Andrés had already made a quick random query fuzzer that tested queries with aggregation, GROUP BY, ORDER BY, and LIMIT, so I started to build off of it in this PR. From a given set of tables, the fuzzer randomly selects a multiset of the tables, then chooses a random multiset of columns to provide to the clauses (SELECT, GROUP BY, WHERE, etc.) and the random expression generator. Once the query is generated, it’s run on both Vitess and MySQL, and the results and errors are compared. If there is a mismatch, it is reported.
Adding most types of queries was pretty straightforward (for example, for derived tables, generate a query q, then generate another query with q as a table), but there were two functionalities that were more complicated: random expressions and query simplification. Andrés had already built both of these, but for our purposes, they needed to be modified.
The query simplifier is a tool used to automatically simplify queries that produce errors. It uses a brute-force approach, removing or modifying nodes in the AST and checking if the new, simpler query still exhibits the same error. If it does, the simplifier is called on the new query. However, it was not originally intended to be used for end-to-end tests, so we had to figure out how to make it work — specifically, how to supply the VSchema information. After that, I made some minor improvements to the simplifier and refactored it in this PR.
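The brute-force idea behind the simplifier can be illustrated with a small sketch. This is not the Vitess simplifier, which works on the AST; it is a hypothetical Python version that treats a query as a flat list of parts, and `still_fails` is an assumed callback that re-runs a candidate and reports whether the same error still shows up.

```python
def simplify(parts, still_fails):
    """Greedily drop one part at a time while the failure persists."""
    for i in range(len(parts)):
        # Candidate query with the i-th part removed
        candidate = parts[:i] + parts[i + 1:]
        if candidate and still_fails(candidate):
            # The smaller query still fails, so simplify it further
            return simplify(candidate, still_fails)
    # No single removal preserves the failure; this is as small as we get
    return parts

# Hypothetical failure condition: the bug needs both 'b' and 'd' present
def fails_with_b_and_d(parts):
    return 'b' in parts and 'd' in parts

minimal = simplify(['a', 'b', 'c', 'd'], fails_with_b_and_d)
print(minimal)
```

The result is only locally minimal: the search stops once no single removal keeps the failure alive, which is usually enough to make a failing query readable.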
The original random expression generator only generated random literal expressions, so the first step was to add columns. This was fairly simple for tables I knew the schema for, but became more complicated once I added derived tables and wanted to randomly choose columns from them.
The other improvement I made was to add aggregation to the expressions. Because aggregation can only exist in the SELECT statement or the GROUP BY, ORDER BY, and HAVING clauses, I had to make sure the generator only produced aggregations for the statements and clauses in which they are allowed.
Conclusion
The fuzzer can always be improved, and I think the first step that should be taken is complicating or randomizing the schema and VSchema. All of the queries currently run on the widely-used EMP (employee) and DEPT (department) tables using a standard sharding based on EMPNO (employee number) and DEPTNO (department number), respectively. The other main improvement would be to clean up the code; currently, there is a flag testFailingQueries that prevents certain types of queries that were known to fail from being generated. With the query planner being improved since I completed my work on the fuzzer, this flag can either be deleted altogether, or at the very least be removed from many spots.
My experience on Vitess at PlanetScale, while short, was instructive in more ways than one. Not only did I get to make some meaningful contributions, but I also learned how software development as a team works. For those two and a half months I was essentially a temporary member of the query serving team. And while I mainly worked with Andrés, I participated in the daily stand-ups and occasionally worked with the other members, for which I’d like to thank Harshit, Florent, and Manan. And of course thank you to Andrés for spearheading this project and mentoring me along the way.]]></content>
        <summary><![CDATA[My experience working as an intern in the Vitess query serving team for PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>How PlanetScale makes schema changes</title>
        <link href="https://planetscale.com/blog/how-planetscale-makes-schema-changes" />
        <id>https://planetscale.com/blog/how-planetscale-makes-schema-changes</id>
        <published>2024-04-04T09:00:00.000Z</published>
        <updated>2024-04-04T09:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Engineering team velocity is one of our top priorities at PlanetScale, both for our own teams and for all developers using the tools we build.
One of our early goals when building PlanetScale was to make the absolute best schema change process for engineering teams. We've been iterating on this for the past 3 years and are increasingly happy with how easy it is to change our application's schema.
Our main API is a Ruby on Rails application that is connected to two PlanetScale databases. One of these databases contains the majority of our business and user data. The other is a larger, sharded database built to handle the massive scale needed to power PlanetScale Insights.
Our team makes schema changes to these databases nearly every day. We do this through the exact same tooling we've built for our customers, with the addition of our own "GitHub Actions Bot", which automates steps specific to our Rails application. In this post, I'll share our current process for making schema changes at PlanetScale.
Production database schema changes
The majority of Rails applications in the world update their production schema on each deployment. They will run rails db:migrate as part of their CI process immediately before deploying code to production. This process works well for many teams, but tends to suffer from growing pains as both the size of the data and the engineering team grow.
There are two primary problems we see develop:
As data size grows, running schema changes (DDL) directly against production becomes increasingly dangerous and time consuming.
Larger teams deploying frequently get blocked by each other's schema changes and suffer from higher coordination costs.
The cost of having a painful migration process is high. For teams where these problems are most severe, engineers will start finding ways to avoid making schema changes, either by designing their features differently or by abusing json columns instead of a proper schema design.
Fixing the process: The solution to this starts with severing the tie between schema migrations and code, allowing each to go out independently of the other. There are many benefits to doing this. The largest is that engineers are now forced to think more deeply about how their code and database schema changes interact with each other.
The next step in fixing the process is introducing an online schema change tool.
Online schema changes
"Online" is a term used in database circles to describe a change that does not require downtime. The Rails community has done an excellent job at building tooling and education on how to complete migrations without downtime. Every Rails developer I know has memorized which schema changes may cause locking. There are even gems that guide us into making changes safely.
Even with all this education and tooling, the problem is not fully solved without a solution at the database layer.
Many application developers are not aware that there is a whole suite of tooling built by "at scale" companies to handle this problem. These are known as online schema change tools. These tools replace rails db:migrate and run the schema changes in a way that does not interrupt production traffic to the database. The benefit of this is that application developers no longer need to keep track of which schema changes may cause a table to lock. The schema change tooling will make the change in a way that is always safe, mitigating much of the fear around schema changes.
Most large scale companies you can think of are using some variety of online schema change tools for their migrations.
Here at PlanetScale, the team which maintains the online schema change tooling for Vitess is on our staff. This is also built directly into PlanetScale through our safe migrations feature.
Timing application code with schema changes
With each schema change in the database, it is also crucial to consider how the application layer will handle the change. We can have the best schema change tooling in existence, but if we make mistakes with how our application interacts with the database, downtime can still occur.
A common misconception we see among developers is that they think they can atomically deploy both their schema and application code at the same time. This is not possible. For each schema change made, the application needs to be set up to handle both the current and future schema. Without doing so, errors will ensue.
The perfect workflow for each company involves solutions at both the database and application level.
Our schema change workflow
When developers at PlanetScale are making a schema change, the process begins locally. The application code is modified in a git branch, and any corresponding changes to the database schema are applied in a local instance of MySQL. After local development and testing is complete, they then commit their changes and open a pull request on GitHub.
Pull request bot
We've built a "pull request bot" with GitHub Actions that will detect any schema changes that need to be made as a part of a pull request. When it sees changes, it will create a PlanetScale branch, run the migrations, open a deploy request (PlanetScale's method for making a schema change), and comment the result in GitHub.

A few notes about this workflow:
We only create a PlanetScale branch when there are schema changes
Our CI runs against local MySQL for lowest latency
We use the planetscale_rails gem to run the migrations
Each of these steps will vary per application, but the general flow will be similar.
With our bot, we are able to use the PlanetScale API to detect the class of changes being made to the database. The bot then generates comments based on the characteristics of the changes, including instructions for the sequence of steps needed to make the change safely for our application. Each application is different, but they all have similar needs when making schema changes.
When removing a column, application code must be deployed before the schema is changed
When adding a column, application code must be deployed after the schema is changed
For a deeper look at how this pattern works, check out our Backward compatible database changes blog post.
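The two ordering rules above can be written down as a small lookup. This is only an illustrative sketch, not the bot's actual code: the `deploy_steps` function, the change type names, and the step descriptions are all assumptions made for this example.

```python
def deploy_steps(change):
    """Return the safe order of steps for a schema change (hypothetical helper)."""
    if change == 'add_column':
        # The column must exist before any code tries to read or write it
        return ['deploy schema change', 'deploy code that uses the column']
    if change == 'remove_column':
        # Code must stop referencing the column before it disappears
        return ['deploy code that stops using the column', 'deploy schema change']
    raise ValueError(f'unknown change type: {change}')

for change in ('add_column', 'remove_column'):
    print(change, '->', ' then '.join(deploy_steps(change)))
```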
The deploy request
The bot automatically opens a deploy request for us and leaves a comment linking to the change in GitHub. This allows our team to review both the schema change and the code, giving full context around why and what is being changed.

Before the schema change is even deployed, checks are run through a linter to catch any common mistakes.
The deploy request will make the schema changes using a Vitess online migration. This protects production and also allows us to quickly revert the schema change if we notice anything going wrong.
When we have multiple team members making schema changes at the same time, PlanetScale will create a queue for each change. This allows each change to be deployed automatically in the order it was added to the queue. It has safety benefits as well: PlanetScale runs safety checks not only on each schema change, but on the resulting database schema with all changes combined. This protects against mistakes when multiple people are making changes at once.
How can I do this?
By combining our GitHub Actions bot with PlanetScale deploy requests, we've found our schema change process to be delightful. Developers are able to make changes quickly while also feeling confident that the change they are deploying will not impact production.
The bot ties together the online schema change tooling built into PlanetScale with the specific needs of our Rails application. With these combined, developers can move quickly and confidently.
Our processes for deploying code and schema are purposefully separate, forcing our engineers to think through each step of their change. This also allows us to move quickly and never have code changes blocked behind another team member's unrelated schema migration.
We've implemented our bot using GitHub Actions; however, a similar workflow can be achieved with other CI tools as well. On the PlanetScale side, all of the API calls needed are available via the pscale CLI.
We have published many of the key pieces of our workflow in our GitHub Actions docs page. With this page, you can mix and match the examples to fit the needs of your team and application.
If you have a Rails application, we've published instructions directly in the planetscale_rails readme.]]></content>
        <summary><![CDATA[Learn how PlanetScale uses GitHub Actions and PlanetScale to automate schema changes on our own application.]]></summary>
      </entry>
    
      <entry>
        <title>Identifying and profiling problematic MySQL queries</title>
        <link href="https://planetscale.com/blog/identifying-and-profiling-problematic-mysql-queries" />
        <id>https://planetscale.com/blog/identifying-and-profiling-problematic-mysql-queries</id>
        <published>2024-03-29T00:00:00.000Z</published>
        <updated>2024-03-29T00:00:00.000Z</updated>
        
        <author>
          <name>Ben Dicken</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Though we try our best to avoid it, it's easy to let underperforming queries slip through the cracks in our workloads, negatively impacting the performance of a database system. This is especially true in large-scale database environments, with many gigabytes or terabytes of data, hundreds of tables, and thousands of query patterns being executed on a daily basis.
Thankfully, MySQL has the ability to collect data that can be leveraged for identifying problematic queries, and can also do profiling on them in order to drill into their poor performance. In this article, I'll go over several built-in techniques for how to do this in native MySQL. If you use PlanetScale, this type of information can be gathered more easily and intuitively using the PlanetScale Insights dashboard. I'll include a brief discussion of this feature later on.
For this article, I'll be using the following schema as an example.

A fake workload has been run on this database, and later on you'll see some queries that operate on these tables.
Enable performance_schema
If you have a large application that executes numerous queries on the database, it can be overwhelming to know where to start looking for problems. How do you know which queries are under-performing? One way to determine this would be to execute your queries manually in a shell session and examine the resulting timing. You could also use your web application in a browser to see which page loads and requests appear to be running slowly. However, there are more principled ways to identify these queries.
You can use information from the tables in performance_schema to help identify these. To use this, first switch over to this database.
USE performance_schema;

You also should check to ensure that performance_schema is enabled. It should be enabled by default, but it can be manually turned off or disabled if the host MySQL is running on has insufficient memory, as all of the information it tracks is stored in an in-memory PERFORMANCE_SCHEMA storage engine. If it is enabled, you should see the following:
SHOW VARIABLES LIKE 'performance_schema';

+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| performance_schema | ON    |
+--------------------+-------+

Identifying slow queries
The next step is to figure out what queries to focus our efforts on fixing. There are a few techniques you can use to identify these using the tables in performance_schema and sys. There are a LOT of tables in the performance_schema database:
SELECT COUNT(*)
    FROM information_schema.tables
    WHERE table_schema = 'performance_schema';

+----------+
| COUNT(*) |
+----------+
| 113      |
+----------+

(Your number may be different depending on your version of MySQL).
A good place to start is looking at the information provided by the events_statements_summary_by_digest table. Let's take a look at one row for starters, the one with the highest average execution time:
SELECT * FROM performance_schema.events_statements_summary_by_digest
    WHERE schema_name = 'game'
    ORDER BY avg_timer_wait
    DESC LIMIT 1 \G;

***************************[ 1. row ]***************************
SCHEMA_NAME                 | game
DIGEST                      | 8ee55a42fea07da3d5e786413a6c1bd...
DIGEST_TEXT                 | SELECT `p1` . `username` , `m` ...
COUNT_STAR                  | 2486
SUM_TIMER_WAIT              | 425862156000000
MIN_TIMER_WAIT              | 149411000000
AVG_TIMER_WAIT              | 171304165000
MAX_TIMER_WAIT              | 380467000000
SUM_LOCK_TIME               | 1306000000
...

There are quite a few columns in each row. However, columns such as SUM_TIMER_WAIT and AVG_TIMER_WAIT are good to look at if you want to see your most expensive queries. The times shown here are given in picoseconds, and thus need to be divided by 1 trillion to convert to seconds. In this case, the AVG_TIMER_WAIT (the average time for this query) is .17 seconds, and this query was only executed 2486 times (see the COUNT_STAR value in the result set above).
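To make the unit conversion concrete, here is a small sketch that converts the AVG_TIMER_WAIT and SUM_TIMER_WAIT values from the result set above into seconds. The `picoseconds_to_seconds` helper is just a name chosen for this example.

```python
PICOSECONDS_PER_SECOND = 10**12  # 1 trillion picoseconds in a second

def picoseconds_to_seconds(picoseconds):
    """Convert a performance_schema timer value to seconds."""
    return picoseconds / PICOSECONDS_PER_SECOND

# Values copied from the example row above
avg_seconds = picoseconds_to_seconds(171304165000)       # AVG_TIMER_WAIT
total_seconds = picoseconds_to_seconds(425862156000000)  # SUM_TIMER_WAIT

print(f'average: {avg_seconds:.2f} s, total: {total_seconds:.0f} s')
```

Looking at the total alongside the average helps separate queries that are slow per execution from queries that are cheap individually but expensive in aggregate.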
There are also a bunch of views in the sys database that keep useful stats for identifying slow queries. You can check out the info in views like statements_with_sorting, statements_with_runtimes_in_95th_percentile, and statements_with_full_table_scans. As an example of this, let's examine the queries with the highest average number of rows examined from the statements_with_runtimes_in_95th_percentile view:
SELECT substring(query,1,50), avg_latency, rows_examined_avg
  FROM sys.statements_with_runtimes_in_95th_percentile
  ORDER BY rows_examined_avg
  DESC LIMIT 10;

+----------------------------------------------------+-------------+-------------------+
| substring(query,1,50)                              | avg_latency | rows_examined_avg |
+----------------------------------------------------+-------------+-------------------+
| SELECT `alias` FROM `chat` . `message` LIMIT ?     | 2.13 s      | 10000000          |
| SELECT `alias` FROM `message` LIMIT ?              | 881.20 ms   | 2500004           |
| EXPLAIN ANALYZE SELECT `p1` .  ... ` WHERE `m` . ` | 223.97 ms   | 1081990           |
| SELECT `p1` . `username` , `m` ... ` WHERE `m` . ` | 57.75 ms    | 421446            |
| SELECT * FROM `message` LIMIT ?                    | 70.52 ms    | 125125            |
| EXPLAIN ANALYZE SELECT `id` FR ... E `created_at`  | 56.55 ms    | 100000            |
| SELECT `id` FROM `message` WHERE `created_at` BETW | 58.28 ms    | 100000            |
| SELECT SUBSTRING ( QUERY , ?,  ... s_examined_avg` | 76.72 ms    | 1498              |
| SELECT SUBSTRING ( QUERY , ?,  ... `rows_examined` | 75.59 ms    | 1495              |
| SELECT SUBSTRING ( QUERY , ?,  ... ned` AS DECIMAL | 78.97 ms    | 1494              |
+----------------------------------------------------+-------------+-------------------+

You may also be interested in taking a look at things from the perspective of a table instead of a query. You can look at stats on a per-table basis for how many row reads are being served via indexes versus without. Say you have a game database with a message table that you know is heavily used. You can find these stats using the table_io_waits_summary_by_index_usage table.
SELECT `OBJECT_SCHEMA`, `OBJECT_NAME`, `INDEX_NAME`, `COUNT_STAR`
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE object_schema = 'game' AND object_name = 'message';

+---------------+-------------+------------+------------+
| OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | COUNT_STAR |
+---------------+-------------+------------+------------+
| game          | message     | PRIMARY    | 0          |
| game          | message     | to_id      | 0          |
| game          | message     | from_id    | 164500     |
| game          | message     | <null>     | 2574002473 |
+---------------+-------------+------------+------------+

This indicates that there have been over 2 billion row reads that were not fulfilled by an index (the row where INDEX_NAME is <null>). Either this table needs one or more indexes added to it, or the queries using this table need to be updated, or both!
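If you want a single number to track, the share of row reads not served by an index can be computed from these counts. The sketch below hard-codes the rows from the result set above; the `unindexed_read_share` helper is a name made up for this example, and `None` stands in for the <null> INDEX_NAME.

```python
def unindexed_read_share(index_counts):
    """Return the fraction of row reads that did not use an index."""
    total = sum(count for _, count in index_counts)
    if total == 0:
        return 0.0
    # Rows with no index name represent reads done without an index
    unindexed = sum(count for name, count in index_counts if name is None)
    return unindexed / total

# (INDEX_NAME, COUNT_STAR) pairs copied from the output above
rows = [('PRIMARY', 0), ('to_id', 0), ('from_id', 164500), (None, 2574002473)]
share = unindexed_read_share(rows)
print(f'{share:.2%} of row reads were not served by an index')
```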
There's also a bunch of cool stats you can look at over in the sys schema. Here's one quick example. You can grab data on how many full table scans are being executed by each query using the sys.statements_with_full_table_scans view.
USE sys;
SELECT query, db, exec_count, total_latency
    FROM sys.statements_with_full_table_scans
    ORDER BY exec_count DESC
    LIMIT 5;

+-------------------------------------------------------------------+------+------------+---------------+
| query                                                             | db   | exec_count | total_latency |
+-------------------------------------------------------------------+------+------------+---------------+
| SELECT `class` , `size` FROM `spaceship` WHERE `size` > ?         | game | 8422       | 26.65 s       |
| SELECT `p1` . `username` , `m` ... ` WHERE `m` . `created_at` > ? | game | 6742       | 6.45 min      |
| SELECT * FROM `earned_achievem ... id` AND `ea` . `player_id` > ? | game | 6718       | 969.17 ms     |
| SELECT * FROM `item` WHERE NAME LIKE ? LIMIT ?                    | game | 5676       | 969.43 ms     |
| SELECT NAME , `size` FROM `planet` WHERE `population` > ?         | game | 5625       | 173.02 ms     |
+-------------------------------------------------------------------+------+------------+---------------+

This shows which queries are triggering full table scans. In this case, there's a ton, but this would be expected since there are no indexes on this example database other than the default ones on the primary keys.
For more information about how to work with the performance_schema and sys tables, check out this video.
Inspecting with EXPLAIN
By this point, you've hopefully gathered a collection of queries that need to be further inspected. The next step is to do some root-cause analysis into why these queries are taking a long time and reading too many rows. Of course, one great way to drill into the behavior of a query is with EXPLAIN or EXPLAIN ANALYZE. In this example, I'm going to use EXPLAIN ANALYZE.
For example, running the following query:
EXPLAIN ANALYZE SELECT p1.username, m.to_id, p2.username, m.from_id
    FROM message m
    LEFT JOIN player p1 ON m.to_id = p1.id
    LEFT JOIN player p2 ON m.from_id = p2.id
    WHERE m.created_at > '2020-10-10 00:00:00';

Gives detailed information regarding the query plan and costs of the various steps.
+--------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                    |
+--------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop left join  (cost=333272 rows=332036) (actual time=0.515..320 rows=345454 loops=1)                                           |
|     -> Nested loop left join  (cost=217059 rows=332036) (actual time=0.475..240 rows=345454 loops=1)                                       |
|         -> Filter: (m.created_at > TIMESTAMP'2020-10-10 00:00:00')  (cost=100846 rows=332036) (actual time=0.153..188 rows=345454 loops=1) |
|             -> Table scan on m  (cost=100846 rows=996208) (actual time=0.148..159 rows=1e+6 loops=1)                                       |
|         -> Single-row index lookup on p1 using PRIMARY (id=m.to_id)  (cost=0.25 rows=1) (actual time=54.5e-6..70.9e-6 rows=1 loops=345454) |
|     -> Single-row index lookup on p2 using PRIMARY (id=m.from_id)  (cost=0.25 rows=1) (actual time=137e-6..153e-6 rows=1 loops=345454)     |
+--------------------------------------------------------------------------------------------------------------------------------------------+

There are a number of things to look at when inspecting this output. Generally, if you see table scans over large tables, this is something you should try to mitigate with an index. More broadly, any time you see a node with a high cost or rows value, this probably deserves further tuning. Perhaps the query can be re-written, or perhaps one or more indexes can help improve performance.
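If you collect a lot of plans, a small script can flag the suspicious nodes for you. The following Python sketch parses the tree-format output shown above and reports table scans that read many rows. The regular expression, the table_scans helper, and the 100,000-row threshold are assumptions made for illustration, and the pattern only covers the node layout seen in this example.

```python
import re

# The EXPLAIN ANALYZE output from above, with the table borders removed.
plan = """\
-> Nested loop left join  (cost=333272 rows=332036) (actual time=0.515..320 rows=345454 loops=1)
    -> Nested loop left join  (cost=217059 rows=332036) (actual time=0.475..240 rows=345454 loops=1)
        -> Filter: (m.created_at > TIMESTAMP'2020-10-10 00:00:00')  (cost=100846 rows=332036) (actual time=0.153..188 rows=345454 loops=1)
            -> Table scan on m  (cost=100846 rows=996208) (actual time=0.148..159 rows=1e+6 loops=1)
        -> Single-row index lookup on p1 using PRIMARY (id=m.to_id)  (cost=0.25 rows=1) (actual time=54.5e-6..70.9e-6 rows=1 loops=345454)
    -> Single-row index lookup on p2 using PRIMARY (id=m.from_id)  (cost=0.25 rows=1) (actual time=137e-6..153e-6 rows=1 loops=345454)
"""

# A number as it appears in the plan: 320, 0.515, 54.5e-6, 1e+6, ...
NUM = r"\d+(?:\.\d+)?(?:e[+-]?\d+)?"
NODE = re.compile(
    rf"->\s*(?P<op>.+?)\s+\(cost=(?P<cost>{NUM}) rows=(?P<est>{NUM})\)"
    rf"\s+\(actual time={NUM}\.\.{NUM} rows=(?P<rows>{NUM}) loops=(?P<loops>\d+)\)"
)

def table_scans(plan_text, min_rows=100_000):
    """Return (operation, actual_rows) for table scans reading at least min_rows rows."""
    found = []
    for line in plan_text.splitlines():
        match = NODE.search(line)
        if match and match["op"].startswith("Table scan"):
            actual_rows = int(float(match["rows"]))
            if actual_rows >= min_rows:
                found.append((match["op"], actual_rows))
    return found

for operation, rows in table_scans(plan):
    print(f"{operation}: about {rows:,} rows read")
```

For this plan, the only node flagged is the scan on the message table, which matches what you would spot by eye.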
For more information about EXPLAIN, see our MySQL for Developers course or our How to read MySQL EXPLAINS blog post.
Preparing to instrument a query
EXPLAIN is a great tool, but you can also profile a query to determine how much time it spends in each stage of execution. To do this, first ensure that the proper instruments and consumers are enabled so that information can be gathered appropriately. To enable them, run the following (note that setup_consumers only has an ENABLED column, not a TIMED one):
UPDATE performance_schema.setup_instruments
    SET ENABLED = 'YES', TIMED = 'YES';
UPDATE performance_schema.setup_consumers
    SET ENABLED = 'YES';

If you'd like, you can be more selective at this step. Rather than enabling this setting for all instruments and consumers, you can enable subsets of the rows in these two tables. To enable it only for the stages of query execution, you could run the following.
UPDATE performance_schema.setup_instruments
  SET ENABLED = 'NO', TIMED = 'NO';
UPDATE performance_schema.setup_instruments
  SET ENABLED = 'YES', TIMED = 'YES'
  WHERE NAME LIKE '%stage/%';

Ultimately, it's up to you based on what level of comfort you have with possible performance hits of profiling. If you're not very concerned with the profiling overhead, just turn everything on and then disable later when you're finished.
Next, ensure that history tracking is enabled. If you have not configured it before, you'll probably see the following when you look at the setup_actors configuration:
SELECT * FROM performance_schema.setup_actors;
+------+------+------+---------+---------+
| HOST | USER | ROLE | ENABLED | HISTORY |
+------+------+------+---------+---------+
| %    | %    | %    | YES     | YES     |
+------+------+------+---------+---------+

If you want, you can leave these settings as-is. However, this means that performance schema and history tracking will be enabled for all users. This could have a (small) adverse effect on the overall performance of your system. You can optionally enable it only for one specific user (this would be the user you run your test queries as). If you choose to go this route, turn it off globally:
UPDATE performance_schema.setup_actors
  SET ENABLED = 'NO', HISTORY = 'NO'
  WHERE HOST = '%' AND USER = '%';

And then enable it only for the user(s) that you want to track:
INSERT INTO performance_schema.setup_actors
  (HOST, USER, ROLE, ENABLED, HISTORY)
  VALUES ('your_host', 'your_user', '%', 'YES', 'YES');

Profiling the query
Now, let's profile one of our problematic queries. First, get the thread ID of the connection you are going to run the query on.
SET @connection_thread = (
  SELECT thread_id
    FROM performance_schema.threads
    WHERE PROCESSLIST_ID = CONNECTION_ID() );

Next, execute the query you want to profile. Immediately after doing this, run the following query to list the most recent statements on that connection along with their statement IDs:
SELECT thread_id, statement_id, SUBSTRING(sql_text,1,50)
  FROM performance_schema.events_statements_history_long
  WHERE thread_id = @connection_thread
  ORDER BY event_id DESC LIMIT 20;

Find the statement you want to profile and put its ID into a variable (replace the ? below with the statement_id value from the previous result).
SET @statement_id = ?;

Finally, run the following to see the profiling information for that query execution. Event IDs are only unique within a thread, so the final query also filters on the thread ID captured earlier.
SET @eid = (SELECT event_id FROM performance_schema.events_statements_history_long WHERE statement_id = @statement_id);
SET @eeid = (SELECT end_event_id FROM performance_schema.events_statements_history_long WHERE statement_id = @statement_id);
SELECT event_name, source, (timer_end-timer_start)/1000000000 as 'milliseconds'
  FROM performance_schema.events_stages_history_long
  WHERE thread_id = @connection_thread
    AND event_id BETWEEN @eid AND @eeid;

This should provide a timing breakdown that looks something like this:
+------------------------------------------------+----------------------------------+--------------+
| event_name                                     | source                           | milliseconds |
+------------------------------------------------+----------------------------------+--------------+
| stage/sql/starting                             | init_net_server_extension.cc:110 | 0.2400       |
| stage/sql/Executing hook on transaction begin. | rpl_handler.cc:1481              | 0.0010       |
| stage/sql/starting                             | rpl_handler.cc:1483              | 0.0110       |
| stage/sql/checking permissions                 | sql_authorization.cc:2169        | 0.0020       |
| stage/sql/checking permissions                 | sql_authorization.cc:2169        | 0.0000       |
| stage/sql/checking permissions                 | sql_authorization.cc:2169        | 0.0030       |
| stage/sql/Opening tables                       | sql_base.cc:5859                 | 0.0950       |
| stage/sql/init                                 | sql_select.cc:759                | 0.0050       |
| stage/sql/System lock                          | lock.cc:331                      | 0.0110       |
| stage/sql/optimizing                           | sql_optimizer.cc:355             | 0.0270       |
| stage/sql/statistics                           | sql_optimizer.cc:699             | 0.1040       |
| stage/sql/preparing                            | sql_optimizer.cc:783             | 0.0590       |
| stage/sql/executing                            | sql_union.cc:1676                | 735.3020     |
| stage/sql/end                                  | sql_select.cc:795                | 0.0010       |
| stage/sql/query end                            | sql_parse.cc:4805                | 0.0020       |
| stage/sql/waiting for handler commit           | handler.cc:1636                  | 0.0060       |
| stage/sql/closing tables                       | sql_parse.cc:4869                | 0.0080       |
| stage/sql/freeing items                        | sql_parse.cc:5343                | 0.2810       |
| stage/sql/cleaning up                          | sql_parse.cc:2387                | 0.0010       |
+------------------------------------------------+----------------------------------+--------------+

In this case, you can see that the query spent the majority of the time in the execution stage. However, this would also reveal if the query had spent a lot of time waiting on a lock or on optimizing, in which case you could dig further into the problem.
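To make "the majority of the time" concrete, you can total the stages and compute each one's share. The Python sketch below copies the durations from the sample output above; to_ms mirrors the division used in the SQL (performance_schema timers are reported in picoseconds), and all names are illustrative.

```python
PICOSECONDS_PER_MILLISECOND = 1_000_000_000

def to_ms(timer_start, timer_end):
    """Convert a pair of performance_schema timer values (picoseconds) to milliseconds."""
    return (timer_end - timer_start) / PICOSECONDS_PER_MILLISECOND

# (event_name, milliseconds) pairs copied from the sample output above.
stages = [
    ("stage/sql/starting", 0.2400),
    ("stage/sql/Executing hook on transaction begin.", 0.0010),
    ("stage/sql/starting", 0.0110),
    ("stage/sql/checking permissions", 0.0020),
    ("stage/sql/checking permissions", 0.0000),
    ("stage/sql/checking permissions", 0.0030),
    ("stage/sql/Opening tables", 0.0950),
    ("stage/sql/init", 0.0050),
    ("stage/sql/System lock", 0.0110),
    ("stage/sql/optimizing", 0.0270),
    ("stage/sql/statistics", 0.1040),
    ("stage/sql/preparing", 0.0590),
    ("stage/sql/executing", 735.3020),
    ("stage/sql/end", 0.0010),
    ("stage/sql/query end", 0.0020),
    ("stage/sql/waiting for handler commit", 0.0060),
    ("stage/sql/closing tables", 0.0080),
    ("stage/sql/freeing items", 0.2810),
    ("stage/sql/cleaning up", 0.0010),
]

total_ms = sum(ms for _, ms in stages)
slowest_name, slowest_ms = max(stages, key=lambda stage: stage[1])

print(f"total: {total_ms:.3f} ms")
print(f"slowest stage: {slowest_name} ({slowest_ms / total_ms:.2%} of the total)")
```

A breakdown dominated by a lock or optimizer stage instead would point you at a very different kind of fix than an index.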
PlanetScale Insights
As we've seen, MySQL provides a lot of capability to drill into problematic queries in your workload, and what we've discussed here is really only scratching the surface. However, it should also be clear that gleaning this information can be tedious. Getting exactly what you want requires significant poking around and digging through tables in performance_schema and sys.
Many of these same observations can be gathered much more easily using PlanetScale Insights. Insights provides a plethora of useful visualizations to help you get an overview of the performance of your database, along with automatic detection of anomalous behavior.

You can also use it to drill in on specific queries. For example, you can look at all queries executed over some window of time, and sort by different statistics such as rows read. This can help you quickly identify slow queries and ones where adding an index might be worthwhile.

Insights allows you to gain a deep understanding of your workload. This gives you more time to focus on improving your queries, developing software, and working efficiently.]]></content>
        <summary><![CDATA[MySQL has built-in functionality for collecting statistics on and profiling your MySQL queries. Learn how to leverage these features to identify problems.]]></summary>
      </entry>
    
      <entry>
        <title>The Problem with Using a UUID Primary Key in MySQL</title>
        <link href="https://planetscale.com/blog/the-problem-with-using-a-uuid-primary-key-in-mysql" />
        <id>https://planetscale.com/blog/the-problem-with-using-a-uuid-primary-key-in-mysql</id>
        <published>2024-03-19T09:00:00.000Z</published>
        <updated>2024-03-19T09:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Universally Unique Identifiers, also known as UUIDs, are designed to allow developers to generate unique IDs in a way that guarantees uniqueness without knowledge of other systems. These are especially useful in a distributed architecture, where you have a number of systems and databases responsible for creating records. You might think that using UUIDs as a primary key in a database is a great idea, but when used incorrectly, they can drastically hurt database performance.
In this article, you'll learn about the downsides of using UUIDs as a primary key in your MySQL database.
The many versions of UUIDs
At the time of this writing, there are five official versions of UUIDs and three proposed versions. Let's take a look at each version to better understand how they work.
UUIDv1
A UUID version 1 is known as a time-based UUID and can be broken down as follows:

While much of modern computing uses the UNIX epoch time (Jan 1, 1970) as its base, UUIDs actually use a different date of Oct 15, 1582, which is the date that the Gregorian calendar started to be more widely used. The embedded timestamp within a UUID grows in 100-nanosecond increments from this date, which is then used to set the time_low, time_mid, and time_hi segments of the UUID.
The third segment of the UUID contains the version as well as time_hi, with the version occupying the first character of that segment. This is true for all versions of UUIDs as shown in subsequent examples. The reserved portion is also known as the variant of the UUID, which determines how the bits within the UUID are used. Finally, the last segment of the UUID is the node, which is the unique address of the system generating the UUID.
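To see these segments in an actual value, here is a small Python sketch that takes apart a version 1 UUID with the standard library uuid module. The sample value and the v1_timestamp helper are chosen for illustration; the helper simply counts 100-nanosecond ticks from the 1582-10-15 epoch.

```python
import uuid
from datetime import datetime, timedelta, timezone

# A sample version 1 UUID, used here only for illustration.
u = uuid.UUID("d211ca18-d389-11ee-a506-0242ac120002")

# UUID timestamps count 100-nanosecond intervals since this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def v1_timestamp(value):
    """Decode the 60-bit timestamp of a version 1 UUID into a UTC datetime."""
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)

print("version:        ", u.version)
print("time_low:       ", hex(u.time_low))
print("time_mid:       ", hex(u.time_mid))
print("time_hi+version:", hex(u.time_hi_version))
print("node:           ", hex(u.node))
print("generated at:   ", v1_timestamp(u))
```

Note how the fastest-changing part of the timestamp (time_low) sits at the front of the value, which matters later when we look at how these values sort.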
UUIDv2
Version 2 of the UUID implemented a change compared to version 1, where the time_low segment of the structure was replaced with a POSIX user ID. The theory was that these UUIDs could be traced back to the user account that generated them. Since the time_low segment is where much of the variability of UUIDs resides, replacing this segment increases the chance of collision. As a result, this version of the UUID is rarely used.
UUIDv3 and v5
Versions 3 and 5 of UUIDs are very similar. The goal of these versions is to allow UUIDs to be generated in a deterministic way so that, given the same information, the same UUID can be generated. These implementations use two pieces of information: a namespace (which itself is a UUID) and a name. These values are run through a hashing algorithm to generate a 128-bit value that can be represented as a UUID.
The key difference between these versions is that version 3 uses the MD5 hashing algorithm, and version 5 uses SHA-1.
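As a quick illustration of that determinism, here is a short Python sketch using the standard library's uuid3 and uuid5 functions. The namespace is one of the predefined ones that ships with the module, and the name is just an example string.

```python
import uuid

namespace = uuid.NAMESPACE_DNS   # a predefined namespace UUID from the standard library
name = "planetscale.com"         # any string works; this one is only an example

v3 = uuid.uuid3(namespace, name)  # MD5-based
v5 = uuid.uuid5(namespace, name)  # SHA-1-based

print("v3:", v3)
print("v5:", v5)

# The same namespace and name always produce the same UUID.
print("repeatable:", uuid.uuid5(namespace, name) == v5)
```

Because the output depends only on the inputs, two systems can arrive at the same ID for the same name without talking to each other.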
UUIDv4
Version 4 is known as the random variant because, as the name implies, the value of the UUID is almost entirely random. The exception to this is the first position in the third segment of the UUID, which will always be 4 to signify the version used.

UUIDv6
Version 6 is nearly identical to Version 1. The only difference is that the bits used to capture the timestamp are flipped, meaning the most significant portions of the timestamp are stored first. The graphic below demonstrates these differences.

The main reason for this is to create a value that is compatible with Version 1 while allowing these values to be more sortable, since the most significant portion of the timestamp comes first.
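The reordering can be sketched in a few lines of Python. The v1_to_v6 helper below is written for illustration, not taken from a library: it takes the 60-bit timestamp out of a version 1 UUID and writes it back most-significant-bits first, leaving the clock sequence and node untouched.

```python
import uuid

def v1_to_v6(v1):
    """Rearrange the timestamp bits of a version 1 UUID into the version 6 layout."""
    ts = v1.time                                   # 60-bit timestamp
    time_high = (ts >> 28) & 0xFFFFFFFF            # most significant 32 bits
    time_mid = (ts >> 12) & 0xFFFF                 # next 16 bits
    time_low_and_version = 0x6000 | (ts & 0x0FFF)  # version nibble + last 12 bits
    rest = v1.int & 0xFFFFFFFFFFFFFFFF             # variant, clock sequence, and node
    value = (time_high << 96) | (time_mid << 80) | (time_low_and_version << 64) | rest
    return uuid.UUID(int=value)

v1 = uuid.UUID("d211ca18-d389-11ee-a506-0242ac120002")
v6 = v1_to_v6(v1)
print("v1:", v1)
print("v6:", v6)
```

Both values carry the same timestamp, but in the version 6 form the leading characters change slowly over time, so values generated close together sort close together.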
UUIDv7
Version 7 is also a time-based UUID variant, but it integrates the more commonly used Unix Epoch timestamp instead of the Gregorian calendar date used by Version 1. The other key difference is that the node (the value based on the system generating the UUID) is replaced with randomness, making these UUIDs less trackable back to their source.
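Here is a rough Python sketch of that layout: a 48-bit Unix timestamp in milliseconds up front, followed by the version, the variant, and random bits. The uuid7 function below is a hand-rolled illustration; it skips details a production generator would care about, such as keeping values monotonic within the same millisecond.

```python
import os
import time
import uuid

def uuid7(unix_ms=None):
    """Build a version 7 UUID: 48-bit Unix millisecond timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the top 48 bits
    value |= 0x7 << 76                            # version
    value |= rand_a << 64
    value |= 0b10 << 62                           # variant
    value |= rand_b
    return uuid.UUID(int=value)

earlier = uuid7(1_700_000_000_000)
later = uuid7(1_700_000_000_001)
print(earlier)
print(later)
print("sorts by time:", earlier < later)
```

Since the timestamp leads the value, IDs created later compare greater than IDs created earlier, no matter which machine generated them.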
UUIDv8
Version 8 is the latest version that permits vendor-specific implementations while adhering to RFC standards. The only requirement for UUIDv8 is that the version be specified in the first position of the third segment, as in all other versions.
UUIDs and MySQL
Using UUIDs (mostly) guarantees uniqueness across all systems in your architecture, so you might be inclined to use them as primary keys for your records. Be aware that there are several tradeoffs to doing so when compared to an auto-incrementing integer.
Insert performance
Whenever a new record is inserted into a table in MySQL, the index associated with the primary key needs to be updated so querying the table is performant. Indexes in MySQL take the form of a B+ Tree, which is a multi-layered data structure that allows queries to quickly find the data they need.
The following diagram demonstrates what a relatively simple version of this structure looks like with six entries with values from 1 to 6. If a query comes asking for 5, MySQL will start at the root node and know from there that it has to traverse down the right side of the tree to find what it's looking for.
For simplicity, these diagrams display a B-Tree instead of a B+ Tree. The key difference is that in a B+ Tree, only the leaf nodes hold the actual data (or a reference to it) and the internal nodes hold just keys, while in a B-Tree, any node can hold data.

If values 7-9 are added, MySQL will split the right node and rebalance the tree.

This process is known as page splitting, and the goal is to keep the B+ Tree structure balanced so that MySQL can quickly find the data it's looking for. With sequential values, this process is relatively straightforward; however, when randomness is introduced into the algorithm, it can take significantly longer for MySQL to rebalance the tree. On a high-volume database, this can hurt user experience as MySQL tries to keep the tree in balance.
For more information about how B+ Trees work, we have a dedicated video in our MySQL for Developers course.
Higher storage utilization
All primary keys in MySQL are indexed. By default, an auto-incrementing integer will consume 32 bits of storage per value. Compare this with UUIDs. If stored in a compact binary form, a single UUID would consume 128 bits on disk. Already, that is 4x the consumption of a 32-bit integer. If instead you choose to use a more human readable string-based representation, each UUID could be stored as a CHAR(36), consuming a whopping 288 bits per UUID. This means that each record would store 9 times more data than the 32-bit integer.
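The arithmetic is easy to check for yourself. This Python sketch measures the two UUID representations and compares them to a 4-byte integer; it assumes a single-byte character set for the CHAR(36) column, and the sample UUID is just an example.

```python
import uuid

value = uuid.UUID("d211ca18-d389-11ee-a506-0242ac120002")

sizes_in_bytes = {
    "32-bit integer": 4,
    "BINARY(16) UUID": len(value.bytes),  # compact binary form
    "CHAR(36) UUID": len(str(value)),     # text form, assuming one byte per character
}

for label, size in sizes_in_bytes.items():
    print(f"{label:16} {size:2} bytes = {size * 8:3} bits ({size // 4}x the integer)")
```

Keep in mind that this is the cost per key value, and the primary key is repeated in every secondary index entry as well.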
In addition to the default index created on the primary key, secondary indexes will also consume more space. This is because secondary indexes use the primary key as a pointer to the actual row, meaning they need to be stored with the index. This can lead to a significant increase in storage requirements for your database depending on how many indexes are created on tables using UUIDs as the primary key.
Finally, page splitting (as described in the previous section) can also negatively impact storage utilization as well as performance. InnoDB assumes that the primary key will increment predictably, either numerically or lexicographically. If true, InnoDB will fill the pages to about 94% of the page size before creating a new page. When the primary key is random, the amount of space utilized from each page can be as low as 50%. Due to this, using UUIDs that incorporate randomness can lead to excessive use of pages to store the index.
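To get a feel for why random keys leave pages emptier, here is a toy Python simulation. It is a deliberately simplified model and not InnoDB's actual algorithm: pages hold a fixed number of keys, an ascending insert at the end of the index starts a fresh page, and any other insert into a full page splits that page in half. The average_fill function, the page capacity, and the key counts are all made up for illustration, so treat the resulting numbers as a trend rather than a measurement.

```python
import bisect
import random

def average_fill(keys, capacity=16):
    """Insert keys into ordered leaf pages; return the mean fraction of each page in use."""
    pages = [[]]            # each page is a sorted list of keys
    lows = [float("-inf")]  # lows[i] is the lower bound of the key range owned by pages[i]
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1
        page = pages[i]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # Ascending insert at the end of the index: start a new page
            # and leave the full one untouched.
            new_page = [page.pop()]
        else:
            # Insert into the middle of a full page: split it in half.
            mid = len(page) // 2
            new_page = page[mid:]
            del page[mid:]
        pages.insert(i + 1, new_page)
        lows.insert(i + 1, new_page[0])
    return sum(len(p) for p in pages) / (capacity * len(pages))

random.seed(42)
n = 20_000
sequential_fill = average_fill(range(n))
random_fill = average_fill(random.sample(range(n * 1000), n))

print(f"sequential keys: {sequential_fill:.0%} of page space used")
print(f"random keys:     {random_fill:.0%} of page space used")
```

In this model, sequential keys pack pages tightly, while random keys keep splitting pages that then sit partly empty, so the same number of rows needs noticeably more pages.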
Best ways to use a UUID primary key with MySQL
If you absolutely need to use UUIDs as the unique identifier for records in your table, there are a few best practices you can follow to minimize the negative side effects of doing so.
Use the binary data type
While UUIDs are often represented as 36-character strings, they can also be represented in their native binary format. If converted to a binary value, you can store it in a BINARY(16) column, which reduces the storage requirements per value down to 16 bytes. This is still quite a bit larger than a 32-bit integer, but is certainly better than storing the UUID as a CHAR(36).
create table uuids(
  UUIDAsChar char(36) not null,
  UUIDAsBinary binary(16) not null
);

insert into uuids set
  UUIDAsChar = 'd211ca18-d389-11ee-a506-0242ac120002',
  UUIDAsBinary = UUID_TO_BIN('d211ca18-d389-11ee-a506-0242ac120002');

select * from uuids;
-- +--------------------------------------+------------------------------------+
-- | UUIDAsChar                           | UUIDAsBinary                       |
-- +--------------------------------------+------------------------------------+
-- | d211ca18-d389-11ee-a506-0242ac120002 | 0xD211CA18D38911EEA5060242AC120002 |
-- +--------------------------------------+------------------------------------+

Use an ordered UUID variant
Using a UUID version that supports ordering can mitigate some of the performance and storage impacts of using UUIDs by making the generated values more sequential, which avoids some of the page splitting issues described earlier. Even when they are being generated on multiple systems, time-based UUIDs such as version 6 or 7 can guarantee uniqueness while keeping values as close to sequential as possible. The exception to this is UUIDv1, which has the least significant portion of the timestamp first.
Use the built-in MySQL UUID functions
MySQL supports generating UUIDs directly within SQL; however, it only supports UUIDv1 values. While it is not a great practice to use them by themselves, there is a helper function in MySQL called uuid_to_bin. Not only does this function convert the string value to binary, but you can also pass its optional swap flag, which will reorder the timestamp portion to make the resulting binary more sequential.
set @uuidvar = 'd211ca18-d389-11ee-a506-0242ac120002';
-- Without swap flag
SELECT HEX(UUID_TO_BIN(@uuidvar)) as UUIDAsHex;
-- +----------------------------------+
-- | UUIDAsHex                        |
-- +----------------------------------+
-- | D211CA18D38911EEA5060242AC120002 |
-- +----------------------------------+

-- With swap flag
SELECT HEX(UUID_TO_BIN(@uuidvar,1)) as UUIDAsHex;
-- +----------------------------------+
-- | UUIDAsHex                        |
-- +----------------------------------+
-- | 11EED389D211CA18A5060242AC120002 |
-- +----------------------------------+
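If you generate or convert IDs in application code, the same reordering is straightforward to reproduce. Here is a Python sketch that mirrors the swap shown above by moving the time_hi bytes to the front, followed by time_mid and then time_low. The uuid_to_bin_swapped function is a name made up for this example.

```python
import uuid

def uuid_to_bin_swapped(text):
    """Mimic UUID_TO_BIN(text, 1): put time_hi first, then time_mid, then time_low."""
    raw = uuid.UUID(text).bytes
    time_low, time_mid, time_hi = raw[0:4], raw[4:6], raw[6:8]
    return time_hi + time_mid + time_low + raw[8:]

swapped = uuid_to_bin_swapped("d211ca18-d389-11ee-a506-0242ac120002")
print(swapped.hex().upper())
```

The printed hex string should match the "with swap flag" result from the SQL example.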

Use an alternate ID type
UUIDs are not the only type of identifier that provides uniqueness within a distributed architecture. Considering they were first created in 1987, there has been plenty of time for other professionals to propose different formats such as Snowflake IDs, ULIDs, or even NanoIDs (which we use at PlanetScale).
# Snowflake ID
7167350074945572864

# ULID
01HQF2QXSW5EFKRC2YYCEXZK0N

# NanoID
kw2c0khavhql
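
Some of these formats are time-ordered in the same spirit as the newer UUID versions. As a small illustration, the Python sketch below decodes the timestamp from the ULID example above: the first 10 characters of a ULID encode a 48-bit Unix timestamp in milliseconds using Crockford's Base32 alphabet. The ulid_timestamp_ms helper is written just for this example.

```python
from datetime import datetime, timezone

# Crockford's Base32 alphabet (no I, L, O, or U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid_timestamp_ms(ulid):
    """Decode the leading 10 characters of a ULID into Unix milliseconds."""
    ms = 0
    for ch in ulid[:10].upper():
        ms = ms * 32 + CROCKFORD.index(ch)
    return ms

ms = ulid_timestamp_ms("01HQF2QXSW5EFKRC2YYCEXZK0N")
created_at = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
print(ms, created_at)
```

Because the timestamp leads the string, ULIDs sort by creation time even when compared as plain text.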

Conclusion
Using a UUID primary key in MySQL can (nearly) guarantee uniqueness in a distributed system; however, it comes with several tradeoffs. Luckily, with the many versions available and several alternatives, you have options that can better address some of these tradeoffs. After reading this article, you should be in a better position to make an informed decision about the ID type you choose when architecting your next database.]]></content>
        <summary><![CDATA[Understand the different versions of UUIDs and why using them as a primary key in MySQL can hurt database performance.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 19</title>
        <link href="https://planetscale.com/blog/announcing-vitess-19" />
        <id>https://planetscale.com/blog/announcing-vitess-19</id>
        <published>2024-03-08T09:01:00.000Z</published>
        <updated>2024-03-08T09:01:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[We're thrilled to announce the release of Vitess 19, the latest version packed with enhancements aimed at improving the scalability, performance, and usability of your database systems. With this release, the Vitess team continues our commitment to providing a powerful, scalable, and reliable database clustering solution for MySQL.
What's new in Vitess 19
Dropping Support for MySQL 5.7: As Oracle marked MySQL 5.7 end of life in October 2023, we're also moving forward by dropping support for MySQL 5.7. We advise users to upgrade to MySQL 8.0 while on Vitess 18 before making the jump to Vitess 19. However, Vitess 19 will still support importing from MySQL 5.7.
Deprecations: We're cleaning house to streamline our offerings and improve maintainability. This includes deprecating several VTTablet flags, MySQL-specific tags of the Docker image vitess/lite, and changes to the EXPLAIN statement format.
Breaking Changes: Notably, ExecuteFetchAsDBA now rejects multi-statement SQL, enforcing stricter security and stability practices.
New Metrics: We're introducing new metrics for stream consolidations and adding the build version to /debug/vars to provide deeper insights and traceability.
Enhanced Query Compatibility: This release brings support for multi-table delete operations, a new SHOW VSCHEMA KEYSPACES query, and several other SQL syntax enhancements that broaden Vitess's compatibility with MySQL.
Apply VSchema Enhancements: We've added a --strict sub-flag and corresponding gRPC field to the ApplyVSchema command, ensuring that only known parameters are used in Vindexes, enhancing error checking and config validation.
Tablet Throttler: Throttlers now communicate via gRPC only. HTTP communication is no longer used. This closes a possible vulnerability vector.
Online DDL: Support for backoff for cut-over attempts in the face of locking. Support for forced cut-over.
Incremental Backup: Support for backup names and empty backups.
Table lifecycle: Quicker cleanup flow.
Performance improvements: Including a new connection pool for the Tablets, faster hashing in sharded Vitess clusters, and faster aggregations in the Gates.
New and updated features
Let's take a closer look at some of the key features.
Query compatibility enhancements
Vitess 19 introduces several SQL syntax improvements and compatibility features, including:
Support for AVG() aggregation function on sharded keyspaces, utilizing a combination of SUM and COUNT.
Non-recursive Common Table Expressions (CTEs) support, allowing for more complex query constructions.
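To illustrate why AVG() needs SUM and COUNT under the hood, here is a conceptual Python sketch. It is not Vitess code: each shard is assumed to report a hypothetical (SUM, COUNT) pair, and the final average is computed from the combined totals rather than by averaging the per-shard averages.

```python
def sharded_avg(per_shard):
    """Combine per-shard (sum, count) pairs into a single overall average."""
    total = sum(shard_sum for shard_sum, _ in per_shard)
    count = sum(shard_count for _, shard_count in per_shard)
    return total / count if count else None

# Hypothetical results: shard 1 holds one row (10), shard 2 holds three rows (100 each).
shards = [(10.0, 1), (300.0, 3)]

correct = sharded_avg(shards)
naive = sum(s / c for s, c in shards) / len(shards)  # average of per-shard averages

print("SUM/COUNT average:  ", correct)
print("average of averages:", naive)
```

The two results differ whenever shards hold different numbers of rows, which is why the per-shard averages alone are not enough.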
Tablet throttler
Inter-throttler communication is now solely based on gRPC. HTTP communication is no longer supported.
Online DDL
Vitess migration cut-over now uses back-off in the face of table locks. If unable to cut-over, subsequent attempts take place at increasing intervals. This reduces the impact on an already overloaded production system.
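As a rough picture of what "increasing intervals" means, here is a generic back-off schedule in Python. Everything in it is hypothetical: the function name, the starting interval, the growth factor, and the cap are placeholders for illustration, not the values or the algorithm Vitess uses.

```python
def cutover_retry_intervals(attempts, base_seconds=60, factor=2, cap_seconds=1800):
    """Return the wait before each retry: grows by `factor` each time, up to a cap."""
    return [min(base_seconds * factor ** n, cap_seconds) for n in range(attempts)]

print(cutover_retry_intervals(6))
```

The point of spacing retries out like this is that each failed attempt puts less and less pressure on a system that is already struggling with locks.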
Online DDL also supports forced cut-over, at either predetermined timeout or on demand. Forced cut-over prioritizes the completion of cut-over operations over production traffic and terminates queries and transactions that conflict with the cut-over.
See this PR for more information.
Incremental backup
The flag Backup|BackupShard --incremental-from-pos now accepts a backup name as the backup starting point.
An empty incremental backup is now allowed, and the Backup|BackupShard command returns successfully, even though no backup manifest or other artifacts are created.
Table lifecycle
The table GC mechanism is now more responsive to tables that need to be garbage collected and can observe operations that generate GC tables. For example, it can capture the result of an ALTER VITESS_MIGRATION … CLEANUP command and move the table through the relevant stages within seconds rather than taking several minutes or hours.
Breaking change: ExecuteFetchAsDBA
The command ExecuteFetchAsDBA now rejects multi-statement input. Previously, multi-statement input was implicitly allowed but resulted in undefined and undesired behavior: errors were only reported for the first statement and silently dropped for successive statements. The connection was left in an undefined state and could leak results to subsequent users of the connection pool. The schema tracker would not be notified of changes until the connection was closed. We will introduce formal multi-statement support in a future version.
Performance improvements
Following the trend over the past 3 years, this new Vitess release is faster than the previous one in all the benchmarks we track in Arewefastyet. We've fixed several performance regressions from Vitess 18 and introduced significant performance improvements.
New connection pool
The connection pool for MySQL connections in the Tablets has been rewritten from scratch. The new pool is architected over several lock-free stacks and provides significantly lower query latencies, lower and more fair wait times, and more efficient usage of idle connections. This is particularly noticeable in Vitess clusters with external tablets (i.e., clusters where the Tablet and the MySQL instance are deployed on different hosts) and busy Vitess clusters with many point queries.
Faster hashing in sharded Vitess clusters
The VIndex hasher for textual columns was previously implemented using the x/text/collate package, which allocates a linear amount of memory based on the length of the column being hashed. We've replaced it with a custom, backwards-compatible implementation that is both faster and uses a constant amount of memory. This is a very significant performance improvement for sharded tables that use large textual columns as sharding keys.
Faster comparisons in cross-shard aggregations
The performance of cross-shard aggregations that use ORDER BY or GROUP BY qualifiers has been greatly improved by introducing Tiny Weights. The query executor in the VTGates now tags all the SQL values from the upstream shards with a compressed form of their weight string, allowing constant-time comparisons while performing aggregations.
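The general idea can be sketched in Python. This is a conceptual illustration only and not how Vitess implements Tiny Weights: the 4-byte prefix, the tiny_weight and compare helpers, and the fallback rule are all assumptions made for the example. Each value carries a small fixed-size integer derived from the start of its sort key, most comparisons are settled by comparing those integers, and only ties fall back to the full comparison.

```python
from functools import cmp_to_key

def tiny_weight(weight_string: bytes) -> int:
    """Pack the first 4 bytes of a sort key into an integer (zero-padded on the right)."""
    return int.from_bytes(weight_string[:4].ljust(4, b"\x00"), "big")

def compare(a: bytes, b: bytes) -> int:
    """Compare two sort keys, using the fixed-size prefix first."""
    ta, tb = tiny_weight(a), tiny_weight(b)
    if ta != tb:
        return -1 if ta < tb else 1  # decided by the fixed-size prefix alone
    return (a > b) - (a < b)         # prefixes tie: fall back to the full comparison

keys = [b"planet", b"plan", b"spaceship", b"item", b"planets", b"ab", b"ab\x00"]
ordered = sorted(keys, key=cmp_to_key(compare))
print(ordered)
```

The prefix comparison never disagrees with the full byte-wise comparison, so the final order is the same; it just gets there with cheaper comparisons in the common case.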
A call to the community
We're excited to see how you'll use Vitess 19 to scale your database systems. As always, we're eager to hear your feedback and experiences. Join us on our GitHub or Slack channel to share your stories, ask questions, and connect with the Vitess community.
Getting started
Upgrading to Vitess 19 is straightforward, but we recommend reviewing the detailed release notes for a smooth transition. Check out our documentation for comprehensive guides and tips.
Thank you for your continued support and contributions to the Vitess project. Here's to making database scaling even easier and more efficient with Vitess 19!
The Vitess Team]]></content>
        <summary><![CDATA[Vitess 19 is now generally available.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale forever</title>
        <link href="https://planetscale.com/blog/planetscale-forever" />
        <id>https://planetscale.com/blog/planetscale-forever</id>
        <published>2024-03-06T09:00:00.000Z</published>
        <updated>2024-03-06T09:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[PlanetScale is an infrastructure company. Our service is mission critical, and we value reliability above all else. Reliability isn’t just an uptime percentage on your status page. It means your business is self-sustaining. Every unprofitable company has a date in the future where it could disappear. With an ever changing world and economy, this is a situation fraught with risk. We’ve chosen to build a company that can last forever. This is why I have made the decision to prioritize profitability for PlanetScale.
As of today, PlanetScale can project profitability after the following key decisions: parting ways with members of our team (primarily Sales and Marketing) and sunsetting our Hobby plan. I would like to express my extreme gratitude for the people we are parting ways with today.
Our Hobby plan will be retired on April 8th, 2024. You can find more information about the plan deprecation, such as exporting your data or upgrading, in our Hobby plan FAQ documentation.
I understand completely that retiring our free tier will cause inconvenience to some of our users. I hope people understand that we take our mission incredibly seriously, and this decision was not made lightly. If this puts you in a difficult situation, please email support@planetscale.com and we will see what we can do to help you.
PlanetScale is the main database for companies totaling more than $50B in market cap. Whether you counted calories, paid for your morning coffee, sent a work message, or bought a new pair of shoes, it is almost certain that you have interacted with our technology today. PlanetScale has been recognized by Deloitte as one of the fastest growing tech companies in the US.
Our reliability, scalability, and customer service speak for themselves. The database space is full of unnecessary hype. We are opting out. Our technology is proven among some of the world’s largest tech companies. We do not need to give away endless amounts of free resources in order to keep growing.
To put it briefly, PlanetScale's commitment to reliability and sustainability drives every decision we make. Prioritizing profitability ensures that we can continue providing mission-critical services to our users well into the future.]]></content>
        <summary><![CDATA[PlanetScale is committed to providing a reliable and sustainable platform for our customers, not just in the short-term, but forever. For this reason, we are prioritizing profitability.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing schema recommendations</title>
        <link href="https://planetscale.com/blog/introducing-schema-recommendations" />
        <id>https://planetscale.com/blog/introducing-schema-recommendations</id>
        <published>2024-02-28T12:00:00.000Z</published>
        <updated>2024-02-28T12:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[For the last two years, we’ve been working on making PlanetScale Insights the best built-in MySQL database monitoring tool. Today, we’re releasing a significant upgrade: Schema recommendations.
With schema recommendations, you will automatically receive recommendations to improve database performance, reduce memory and storage, and improve your schema based on production database traffic.
Schema recommendations uses query-level telemetry to generate tailored recommendations in the form of DDL statements that can be applied directly to a database branch and then deployed to production.
How to use schema recommendations
To find the schema recommendations for your database, go to the “Insights” tab in your PlanetScale database and click “View recommendations.” You will see the current open recommendations for your database.
Also, if you are subscribed to your database’s weekly database report, you will get an email with your first recommendations.

Each recommendation will have the following:
An explanation of the recommended changes, including some of the benefits of the recommended change (E.g., reduced memory and storage, decreased execution time, prevent ID exhaustion)
The schema or query that it will affect
The exact DDL that will apply the recommendation
The option to apply the recommended change to a branch for testing and a safe migration
You should evaluate each recommendation based on your specific use case. Read the schema recommendations documentation for more information on each recommendation.
Once you better understand the recommendation, you can apply the recommendation by either:
Applying it directly through a database branch with a few clicks
Making the schema change directly in your application or ORM code
Learn more about applying recommendations in the documentation.
How PlanetScale detects schema recommendations in your database
We’ve built a system that we internally refer to as the “Schema Advisor.” It can make schema recommendations and understand when a schema change closes an existing open recommendation.
Each time a production branch’s schema changes within PlanetScale, an event is emitted to Kafka. This triggers a background job to examine the schema for potential recommendations.
Some recommendations, such as finding duplicate indexes, can be determined from the schema alone. Others, such as index recommendations, also use the database’s recent query performance and statistics.
We first identify potential slow query candidates for index suggestions using Insights query data. We then use Vitess’s query parser and semantic analysis utilities to extract potential indexable columns for the query.
When adding indexes, column order is critically important. To get that right, we patched our fork of MySQL to create another variant of the ANALYZE TABLE ... UPDATE HISTOGRAM command that allows us to extract the cardinalities of each column without impacting the database’s statistics table.
With all this information combined, we can make recommendations on how to improve a database’s schema.
Supported schema recommendations
Today, we are launching with four different schema recommendations, but we will add more over time.
Adding indexes for inefficient queries
Removing redundant indexes
Preventing primary key ID exhaustion
Dropping unused tables
Adding indexes for inefficient queries
Indexes are crucial for relational database performance. With no indexes or suboptimal indexes, MySQL may have to scan a large number of rows to satisfy queries that only match a few records. This results in slow queries and poor database performance. The right index can reduce query execution time from hours to milliseconds. You can read more about how database indexes work in this blog post.
To find missing indexes, Insights scans your query performance data daily to identify frequently issued queries from the past 24 hours with a high aggregate ratio of rows read to rows returned. It then parses each query to extract indexable columns, estimates each column’s cardinality (number of unique values) to determine the optimal column order, and suggests a suitable index.
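The selection and ordering steps described above can be sketched in Python. This is only an illustration: the QueryStats shape, the ratio threshold of 100, and the rule of placing higher-cardinality columns first are assumptions made for the example, not a description of how Insights is implemented.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Hypothetical per-query-pattern aggregates over the past 24 hours."""
    sql: str
    rows_read: int
    rows_returned: int


def is_index_candidate(stats: QueryStats, min_ratio: float = 100.0) -> bool:
    """Flag query patterns that read far more rows than they return.

    min_ratio is an assumed threshold, chosen only for illustration.
    """
    returned = max(stats.rows_returned, 1)  # avoid dividing by zero
    return stats.rows_read / returned >= min_ratio


def order_index_columns(cardinalities: dict[str, int]) -> list[str]:
    """Order columns from highest to lowest cardinality (assumed heuristic)."""
    return sorted(cardinalities, key=lambda col: cardinalities[col], reverse=True)


stats = QueryStats(
    sql="select id from posts where author_id = ? and status = ?",
    rows_read=2_000_000,
    rows_returned=40,
)
candidate = is_index_candidate(stats)
columns = order_index_columns({"status": 3, "author_id": 50_000})
print(candidate, columns)
```

In this sketch the query reads 50,000 rows for every row it returns, so it is flagged, and author_id is placed ahead of status because it has many more distinct values.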
Removing redundant indexes
While indexes can drastically improve query performance, having unnecessary indexes slows down writes and consumes additional storage and memory.
Insights scans your schema every time it is changed to find redundant indexes. We suggest removing two types of indexes:
Exact duplicate indexes - an index that has the same columns in the same order
Left prefix duplicate indexes - an index that has the same columns in the same order as the prefix of another index
Redundant indexes are remarkably common. Our initial set of recommendations found that 33% of PlanetScale databases have redundant indexes that they may benefit from removing.
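The two kinds of redundancy listed above can be checked with a short Python sketch. The function name and the input shape (index name mapped to a tuple of column names) are hypothetical, and the sketch ignores details such as unique constraints that a real tool would have to consider.

```python
def find_redundant_indexes(indexes: dict[str, tuple[str, ...]]) -> dict[str, str]:
    """Map each redundant index name to an index that already covers it.

    An index counts as redundant when its columns are identical to, or a
    left prefix of, another index's columns.
    """
    redundant: dict[str, str] = {}
    names = sorted(indexes)
    for name in names:
        cols = indexes[name]
        for other in names:
            if other == name:
                continue
            other_cols = indexes[other]
            covers = other_cols[: len(cols)] == cols
            is_longer = len(other_cols) > len(cols)
            # For exact duplicates, keep the first name and flag the later one.
            if covers and (is_longer or other < name):
                redundant[name] = other
                break
    return redundant


result = find_redundant_indexes(
    {
        "idx_a": ("a",),
        "idx_a_b": ("a", "b"),
        "idx_a_b_copy": ("a", "b"),
        "idx_c": ("c",),
    }
)
print(result)
```

Here idx_a is a left prefix of idx_a_b, and idx_a_b_copy is an exact duplicate of it, so both are reported while idx_a_b and idx_c are kept.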
Preventing primary key ID exhaustion
As new rows are inserted, it’s possible for auto-incremented primary keys to exceed the maximum allowable value for the underlying column type. When the column reaches the maximum value, subsequent inserts into the table will fail, which can cause an outage for your application. This has been at the root of numerous high-profile outages throughout the years. With monitoring, it is very preventable.
Insights scans all of the AUTO_INCREMENT primary keys in your database schema and checks the current AUTO_INCREMENT value daily to identify where you might be approaching primary key ID exhaustion. If Insights detects that one of the columns is above 60% of the maximum allowable value for its type, it will recommend changing the underlying column to a larger type.
Additionally, Insights scans queries to parse joins and correlated subqueries to find foreign keys and suggests increasing the column size for those columns.
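As a rough illustration of the 60% check, the following Python sketch compares a current auto-increment value against the maximum for MySQL's integer types. The function names are hypothetical, and this is a sketch of the idea rather than the actual Insights logic.

```python
# Maximum values of MySQL integer types, keyed by (type name, unsigned).
MAX_VALUES = {
    ("tinyint", False): 2**7 - 1,
    ("tinyint", True): 2**8 - 1,
    ("smallint", False): 2**15 - 1,
    ("smallint", True): 2**16 - 1,
    ("mediumint", False): 2**23 - 1,
    ("mediumint", True): 2**24 - 1,
    ("int", False): 2**31 - 1,
    ("int", True): 2**32 - 1,
    ("bigint", False): 2**63 - 1,
    ("bigint", True): 2**64 - 1,
}


def exhaustion_ratio(column_type: str, unsigned: bool, auto_increment: int) -> float:
    """Fraction of the column's ID space that is already used."""
    return auto_increment / MAX_VALUES[(column_type.lower(), unsigned)]


def needs_larger_type(
    column_type: str, unsigned: bool, auto_increment: int, threshold: float = 0.60
) -> bool:
    """True when usage is above the threshold (60%, as described above)."""
    return exhaustion_ratio(column_type, unsigned, auto_increment) > threshold


int_column = needs_larger_type("int", False, 1_500_000_000)
bigint_column = needs_larger_type("bigint", True, 1_500_000_000)
print(int_column, bigint_column)
```

A signed INT column at 1.5 billion is roughly 70% used and would be flagged, while the same value in a BIGINT UNSIGNED column is nowhere near the limit.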
Dropping unused tables
Dropping unused tables can help clean up data that is no longer needed and reduce storage. If the table is large, it can also decrease backup and restore time.
Insights scans your query performance data daily to identify if any tables are more than four weeks old and haven’t been queried in the last four weeks.
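The rule above reduces to two date comparisons. A minimal Python sketch, assuming hypothetical created_at and last_queried_at timestamps are available for each table:

```python
from datetime import datetime, timedelta
from typing import Optional

FOUR_WEEKS = timedelta(weeks=4)


def is_unused(
    created_at: datetime, last_queried_at: Optional[datetime], now: datetime
) -> bool:
    """True when a table is over four weeks old and has no query in four weeks."""
    old_enough = now - created_at > FOUR_WEEKS
    recently_queried = (
        last_queried_at is not None and now - last_queried_at <= FOUR_WEEKS
    )
    return old_enough and not recently_queried


now = datetime(2024, 2, 28)
stale = is_unused(datetime(2023, 6, 1), datetime(2023, 12, 1), now)
active = is_unused(datetime(2023, 6, 1), datetime(2024, 2, 20), now)
brand_new = is_unused(datetime(2024, 2, 20), None, now)
print(stale, active, brand_new)
```

The age check keeps newly created tables, which have not had a chance to receive traffic yet, from being flagged.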
Example: Adding a new index
Let’s walk through an example of applying a new index recommendation to a database. To start, we’ll create a simple posts table:
CREATE TABLE `posts` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(255),
  `text` text,
  PRIMARY KEY (`id`)
)

After the table is in production, two different queries start running against it. The first inserts rows into the posts table in a loop, and the second performs a lookup on posts.title in a loop:
select posts.id from posts where posts.title = ?

As we add more rows to the posts table, a pattern emerges:

As we insert more rows into the posts table, the p50 time for a posts.title lookup increases linearly. At this point, our queries are taking nearly a second, which is not good.
Luckily, our add an index recommendation runs once daily and can identify that this query pattern could benefit from an index. In the recommendation, we see a list of the queries that can use the new index, as well as a description of the DDL that will create the new index:

We can use the “Create and apply” option to create a database branch in PlanetScale and apply the recommended DDL. Then, once we have tested everything on the branch with our existing queries, we deploy it to production. With the change in production, we can now return to our initial query to see the impact of adding the new index:

As expected, the p50 query time dropped drastically after the recommendation was deployed to production due to adding the recommended index.
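To see why an index changes the picture, here is a small Python sketch that uses the standard library's sqlite3 module as a stand-in for MySQL. SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's, and the index name idx_posts_title is an assumption made for the example, so treat this as an illustration of the scan-versus-lookup difference rather than a reproduction of the scenario above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT,
        "text" TEXT
    )
    """
)
conn.executemany(
    'INSERT INTO posts (title, "text") VALUES (?, ?)',
    [(f"title-{i}", "body") for i in range(1000)],
)


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan description for the title lookup."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM posts WHERE title = ?", ("title-500",)
    ).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)
conn.execute("CREATE INDEX idx_posts_title ON posts (title)")
plan_after = query_plan(conn)
print(plan_before)
print(plan_after)
```

Before the index exists, the plan reports a scan of the whole table. Afterward, it reports a search that uses idx_posts_title.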
For more information on schema recommendations inside of PlanetScale Insights, read the schema recommendation documentation.]]></content>
        <summary><![CDATA[Automatically receive recommendations to improve database performance, reduce memory and storage, and improve your schema based on production database traffic.]]></summary>
      </entry>
    
      <entry>
        <title>Foreign key constraints are now generally available</title>
        <link href="https://planetscale.com/blog/foreign-key-constraints-are-now-generally-available" />
        <id>https://planetscale.com/blog/foreign-key-constraints-are-now-generally-available</id>
        <published>2024-02-16T09:00:00.000Z</published>
        <updated>2024-02-16T09:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        <author>
          <name>Rick Branson</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[Today, we are announcing that foreign key constraints are generally available on PlanetScale within any unsharded database.
Foreign key constraints can be used to enforce referential integrity in your database. During our beta phase in the last two months, over 2,400 PlanetScale databases have enabled foreign key constraints. You can read more about our foreign key constraints support in the PlanetScale documentation.
Previously, we wrote about how we overcame the technical challenges of supporting foreign key constraints alongside features, such as database branching, non-blocking schema changes with Online DDL, gated deployments, and database imports.
If you want to horizontally scale your database with sharding and need foreign key constraints, reach out to us, and we can chat more about your database requirements and PlanetScale.
How to enable foreign key constraints in your database
To enable foreign key constraints in your PlanetScale database, go to your database’s “Settings” page and check the box to Allow foreign key constraints.
On the database’s “Dashboard” page, you will see a loading spinner that says it is “Enabling foreign key constraints.” Once it no longer shows, you can use foreign key constraints in your PlanetScale database!
For most cases, foreign key constraints should work as expected in PlanetScale. There are a few cases to be aware of that are unsupported or result in less ideal behavior. You can read more in the limitations section of the foreign key constraints documentation.
If you don't have an existing PlanetScale database and have an existing internet-accessible MySQL or MariaDB database that uses foreign key constraints, you can also import it into PlanetScale using our database import tool.
Foreign keys versus foreign key constraints
Foreign key constraints are often confused with foreign keys, which PlanetScale has supported since day zero.
Foreign keys are a logical association between tables in a relational database based on the value of two related columns. It enables you to look up related data based on matching values between specific columns between the two tables. For example, given a simple project management schema, a tasks table may contain a column named project_id that can be used to look up the related project based on the id column of the projects table.
A foreign key constraint is a database construct that takes foreign keys a step further and forces the foreign key relationship's integrity (referential integrity). Namely, it ensures that a child table can only reference a parent table when the appropriate row exists in the parent table. This helps keep the related data consistent.
Using the same example as above, if foreign key constraints are not defined and a project is deleted, the database will take no action on any associated tasks. With foreign key constraints configured between projects and tasks, the database engine can perform cascading actions on the related tasks such as deleting them as well, or nulling the project_id field.
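The projects and tasks example can be tried out with a short Python sketch. It uses the standard library's sqlite3 module as a stand-in for MySQL, so the syntax and defaults are SQLite's (for example, enforcement has to be switched on with a PRAGMA), and the table definitions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        project_id INTEGER REFERENCES projects (id) ON DELETE CASCADE,
        title TEXT
    )
    """
)
conn.execute("INSERT INTO projects (id, name) VALUES (1, 'Website')")
conn.execute("INSERT INTO tasks (project_id, title) VALUES (1, 'Write copy')")

# A task that references a project that does not exist is rejected.
try:
    conn.execute("INSERT INTO tasks (project_id, title) VALUES (99, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent project cascades to its tasks.
conn.execute("DELETE FROM projects WHERE id = 1")
remaining_tasks = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
print(orphan_rejected, remaining_tasks)
```

Without the REFERENCES clause, the orphan insert would succeed and the task would remain after its project is deleted.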
While foreign key constraints can help ensure referential integrity, they can cause degraded performance in high concurrency workloads and introduce more complexity in the database. As with any database feature, developers should weigh the advantages and disadvantages of using foreign key constraints for their specific application.
Helpful documentation on foreign key constraints
The following documentation pages are useful whether or not you choose to use foreign key constraints in your PlanetScale database:
Foreign key constraints documentation
Operating without foreign key constraints
Strategies for maintaining referential integrity]]></content>
        <summary><![CDATA[You can now enable foreign key constraints to enforce referential integrity in your PlanetScale database.]]></summary>
      </entry>
    
      <entry>
        <title>Amazon Aurora Pricing: The many surprising costs of running an Aurora database</title>
        <link href="https://planetscale.com/blog/amazon-aurora-pricing-the-many-surprising-costs-of-running-an-aurora-database" />
        <id>https://planetscale.com/blog/amazon-aurora-pricing-the-many-surprising-costs-of-running-an-aurora-database</id>
        <published>2024-02-15T15:00:00.000Z</published>
        <updated>2026-03-09T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Amazon describes Aurora as a scalable database that's simple to manage, but if you've ever gone through the "Create database" wizard, you know that's not quite an accurate statement. From instance types to storage configurations to monitoring your database, there is much to consider when trying to price an Amazon Aurora cluster.
In this article, we'll cover all the little details you'll need to understand the Amazon Aurora pricing model to get an accurate estimate for an Aurora database.
The information provided in this article is specific to Aurora and not Aurora Serverless. Aurora Serverless is a different configuration that has its own pricing model.
What is Amazon Aurora
Amazon Aurora is a MySQL and Postgres-compatible database platform on AWS that simplifies the process of creating and managing a MySQL database. It streamlines the process of provisioning the necessary infrastructure to run a production MySQL database and provides a number of features that are not available in a standard RDS configuration. These features include automatic failover, read replicas, and the ability to scale the compute and storage resources of your database with minimal downtime.
That said, the pricing structure is not as straightforward as you may believe. There are a number of factors to consider when pricing an Aurora cluster. Let's take a look at what those are, specifically for running a MySQL workload.
Instance type
Instance type refers to the CPU and memory allocated to the underlying MySQL compute node.
When creating an Aurora cluster, one of the first things you are asked is what instance type to select. If you've ever looked at a list of instance types with no context, it's rather confusing. However, there is a naming convention that can be used to understand what each segment of the name means:

The class of instance type, along with the size, has a dramatic impact on the bottom line at the end of the month. For example, the table below outlines the cost differences between three different instance types that share the same class but have different costs per size:
Instance type | vCPUs | Memory (GiB) | Hourly cost
db.x2g.large | 2 | 32 | $0.377
db.x2g.4xlarge | 16 | 256 | $3.016
db.x2g.16xlarge | 64 | 1024 | $12.064
Burstable vs. memory-optimized
Instance types are categorized into two buckets for Aurora: burstable and memory-optimized. Burstable instance types offer some cost advantages because Amazon runs them with fewer allocated resources during typical load while allowing them to 'burst' when higher demand hits for a limited amount of time. This is in contrast to memory-optimized instance types that consistently run at what they are advertised. Because the performance of burstable instances is relatively inconsistent when compared to memory-optimized, Amazon does not recommend them for production workloads.
Reserved instances vs. on-demand pricing
Another factor to consider is opting to prepay for a certain number of hours to use your compute nodes, which is known as using 'reserved instances.' Amazon will discount the on-demand rate if you opt to prepay for reserved instances for a term of either one or three years. The discount varies by the length of the contract, as well as the specific instance type selected. If you can account for how many instances you will use for how long, it can be a great way to lower the cost of your AWS bill.
Replicas
It is a best practice for all production databases to have at least one replica of the same instance type attached to the Aurora cluster but in a different AZ. Replicas are required for quicker failover in case the writer node goes offline. Replicas can also be used to help reduce downtime while rolling out required Aurora instance modifications that require restarts, like changing the instance size or configuration parameters. Therefore, for highly critical applications where downtime can result in significant financial loss or impact on reputation, you should consider having one primary and two replicas of the same instance type in different AZs. However, this effectively multiplies the costs of your selected instance type by the number of replicas you opt to have. It also increases the labor burden of administrating correct replica placement and rollover orchestration.
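To make the multiplication concrete, here is a small Python sketch that estimates monthly compute cost for a writer plus replicas. It uses the on-demand hourly rates from the instance table earlier in this article, which may not match current AWS pricing, and it assumes a 730-hour month. Storage, I/O, and data transfer are not included.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours

# Hourly rates copied from the instance type table in this article.
HOURLY_RATES = {
    "db.x2g.large": 0.377,
    "db.x2g.4xlarge": 3.016,
    "db.x2g.16xlarge": 12.064,
}


def monthly_compute_cost(instance_type: str, replicas: int) -> float:
    """Cost of one writer plus `replicas` readers of the same instance type."""
    nodes = 1 + replicas
    return round(HOURLY_RATES[instance_type] * HOURS_PER_MONTH * nodes, 2)


writer_only = monthly_compute_cost("db.x2g.large", replicas=0)
with_two_replicas = monthly_compute_cost("db.x2g.large", replicas=2)
print(writer_only, with_two_replicas)
```

Under these assumptions, a single db.x2g.large comes to about $275 per month, and the same writer with two replicas comes to about $826.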
Storage configuration
In Amazon Aurora, storage works a bit differently when compared to RDS or even configuring your own database cluster on AWS using EC2. Those configurations store your data on underlying EBS volumes and provide a number of choices. On the other hand, Aurora uses dedicated storage appliances that auto-provision 'blocks' of data to store your data. This enables Amazon to auto-scale your storage without you having to worry about running out of space.
As a result, your storage configuration options are limited to standard vs I/O-optimized.
With a standard storage configuration, you are billed on the amount of storage used and the IOPS consumed. With I/O-optimized, Amazon increases the cost for the selected instance type but does not charge you for IOPS consumed. If your application is very I/O-heavy, selecting I/O-optimized storage may reduce your overall bill. Cost savings generally start to appear when your I/O charges exceed 25% of your total Aurora bill. To make the best choice that minimizes your cost, you'll need to have a good understanding of your application's I/O requirements as Amazon does not recommend one storage configuration over the other.
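The 25% rule of thumb can be expressed as a quick check against a past bill. This Python sketch uses hypothetical function names and example dollar amounts, and it only tells you whether a closer comparison is worthwhile, since the actual savings depend on the instance and storage rates for your configuration.

```python
def io_share(io_charges: float, total_aurora_bill: float) -> float:
    """Fraction of the total Aurora bill that comes from I/O charges."""
    if total_aurora_bill <= 0:
        raise ValueError("total_aurora_bill must be positive")
    return io_charges / total_aurora_bill


def consider_io_optimized(
    io_charges: float, total_aurora_bill: float, threshold: float = 0.25
) -> bool:
    """True when I/O exceeds the share where savings generally start to appear."""
    return io_share(io_charges, total_aurora_bill) > threshold


io_heavy = consider_io_optimized(io_charges=400.0, total_aurora_bill=1000.0)
io_light = consider_io_optimized(io_charges=100.0, total_aurora_bill=1000.0)
print(io_heavy, io_light)
```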
Data transfer costs
Another metric to consider is how much data will be sent to or read from your Aurora cluster. Amazon charges for data transfer in many scenarios, whether that's between availability zones within a region, between regions, or out to the public internet. The specific amount charged depends on the services being used.
There are a few scenarios where Amazon does not charge for data transfer, such as data replication within a region or data transferred between an Aurora writer/reader node and an EC2 instance within the same AZ. Most other scenarios incur additional data transfer costs. Ensuring that your application is hosted in the same region as the replica it is using can help reduce data transfer costs.
Cross-region replication
If you want data replicated to another region to be closer to users, Global database is the name of Amazon's service that enables this for Aurora clusters. Global database allows you to have a separate read-only Aurora cluster that your data will be replicated to. Each region is independently scalable, so understanding the compute requirements for that region and selecting an appropriate instance type can help optimize costs.
Using Global database has additional costs associated with it. To start, you'll pay for the compute instances and storage consumed just like you would in your primary region. In addition to that, Amazon charges additional fees per 1 million write operations performed, as well as standard data transfer costs to replicate data between regions.
Connection pooling
Every connection to a MySQL database consumes CPU and memory resources.
AWS offers RDS Proxy as a lightweight proxy layer between your database clients and writer/reader compute nodes. RDS Proxy will perform connection pooling as well to better manage the resources consumed by the compute nodes. In the case of a node outage, RDS Proxy can also more quickly detect the outage and redirect traffic to another node without dropping client connections.
This is an additional service on top of Aurora and thus incurs an additional charge on your AWS bill.
Change management
Many changes to your database will require your instances to be rebooted, which will incur some downtime while the operation is performed. Some operations, such as adjusting the instance type, can be performed using read replicas with minimal downtime. Another option to reduce downtime is to use a blue/green deployment to create an identical environment where changes can be made before redirecting traffic to that new environment. Amazon recommends using blue/green deployments to make various database configuration changes, such as performing version updates or making schema changes. The process of redirecting traffic is known as a switchover, and while all connections will still be dropped during the switchover, the downtime is reduced when compared to performing changes directly on the cluster.
To learn more about blue/green deployments, check out our branching comparison piece between Aurora and PlanetScale.
As mentioned earlier, when creating a blue/green deployment, you'll have an identical environment set up for you. This means that your compute costs (selected instance type and replica count) will double while both environments are online. In addition to this, Amazon does not remove the old environment after a switchover and requires that you manually remove it before your bill goes back to normal.
Backups
Backups are a critical part of a disaster recovery plan and should be prioritized when pricing your database. With automated backups, you are charged for the amount of storage you use (in GB) minus the latest size of your database. This allows you to have one full backup at no charge, but you will be charged for any subsequent incremental backups. You can also configure how long you need these backups to be kept for before being automatically deleted.
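The billing rule for automated backups described above is a simple subtraction. A minimal Python sketch, with a hypothetical function name and example sizes in GB:

```python
def billable_backup_gb(total_backup_storage_gb: float, database_size_gb: float) -> float:
    """Backup storage billed after subtracting the latest database size."""
    return max(0.0, total_backup_storage_gb - database_size_gb)


one_full_backup = billable_backup_gb(total_backup_storage_gb=100.0, database_size_gb=100.0)
with_incrementals = billable_backup_gb(total_backup_storage_gb=160.0, database_size_gb=100.0)
print(one_full_backup, with_incrementals)
```

With a 100 GB database, a single full backup is not billed, while 160 GB of accumulated backups leaves 60 GB billable.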
You are also given the option to create manual snapshots of your database. These snapshots are not part of the automated system and are retained even if your database is deleted. Charges for manual snapshots are based on the size of the snapshot, which is not part of the free automated backup allowances.
Monitoring
There are a number of different solutions used to monitor an Aurora database.
By default, Amazon Aurora will send a wide array of metrics to CloudWatch every minute for both the cluster and each individual instance at no extra cost. Metrics range from the number of active transactions to the time it takes to replicate between replicas. There are far too many default metrics to cover here; check Amazon's documentation for the full list.
To get this information in real time, you can use Enhanced Monitoring as a service add-on. Enhanced Monitoring uses CloudWatch Logs, which are charged based on how much data is sent into CloudWatch each month. Associated costs may differ drastically between clusters, but you do get more granular data to analyze as needed.
Finally, Performance Insights allows you to gather data from the actual database engine, giving you information that the MySQL process sees such as database load and query performance. Performance Insights bills based on how long you wish to retain the data, up to two years. The first seven days are included at no additional cost, but you will be billed for any data retention beyond that.
Comparing to PlanetScale
After thoroughly covering various cost considerations when creating an Aurora cluster, let's look at how PlanetScale addresses the same areas and how we bill for these features.
Instance types
When creating a PlanetScale database, we also ask you to select an instance type. However, our selection is dramatically simplified from what Aurora offers. We do not have a concept of burstable vs. memory-optimized, and we display the CPU and memory allocation along with the price when choosing the type. Below is the current pricing grid for creating a Base plan database in the AWS us-east-1 region:

Additionally, each of the displayed types is actually for a cluster of MySQL servers. We follow best practices for all of the databases created on PlanetScale, including having your data replicated across three availability zones.
Learn about the technology we build on in our deep dive into what a PlanetScale database is.
Storage configuration
Our storage configuration is also simplified in the sense that all storage on PlanetScale is the same on network-attached instance types. With PlanetScale Metal, you can choose the appropriate storage size for your workload.
One of the biggest benefits of Metal is the performance improvements. You don't have to purchase a special I/O optimized plan with PlanetScale. Metal instances come loaded with unlimited IOPS by default. On top of that, you will likely save money compared to your equivalent Aurora I/O optimized workload.
Data transfer
We don't have any data transfer costs. The only exception to this is PlanetScale Managed, where PlanetScale can be deployed within a sub-account in your AWS organization. Then, you would pay the same data transfer costs as AWS charges for Aurora databases.
Cross-region replication
PlanetScale offers the ability to create read-only regions, where your data is replicated to a selected region that is closer to your application servers in that region. It is very similar to Global database, except you don't need to pay for the data transfer costs, and you still get a full MySQL cluster with the same resources as the selected instance type.
Connection pooling
PlanetScale for Postgres clusters utilize PgBouncer for connection pooling and automated failovers.
PlanetScale also offers fully-managed Vitess clusters — a database clustering system that was originally developed at YouTube to address scaling issues in 2010. This allows us to take advantage of the connection pooling and load-balancing features that Vitess provides.
A component of all MySQL clusters managed by Vitess is known as vtgate, a lightweight proxy that routes traffic to the correct MySQL node in the cluster. This component is similar to RDS Proxy in that it speaks the MySQL protocol and manages all of the connections to the MySQL nodes, reducing the resources utilized by each node. vtgate communicates with our topology service to determine which node in the cluster traffic should be routed to. In sharded configurations, it splits up the query and sends each part to the correct node so that the requested data can be returned to the client even if it exists across multiple shards.
Since vtgate is a core component of Vitess, PlanetScale can support a nearly unlimited number of connections and it is included in the cost of running a PlanetScale database.
Change management
PlanetScale is significantly more managed than Aurora. While Aurora requires you to create a blue/green deployment, which increases complexity and cost (along with dropping clients to apply changes), many of the scenarios that blue/green deployments are designed for are handled automatically by our platform or our engineers.
For example, when new versions of MySQL are released, our engineers meticulously test them to make sure the database engine is stable and capable of running on PlanetScale's infrastructure. For instance type changes, we take advantage of the fact that our database nodes run in pods on Kubernetes, a container orchestration tool. This allows us to perform rolling upgrades on your compute nodes by spinning up replicas that are configured with the new instance type, replicating your data to the new instances, and redirecting traffic using vtgate. This process is completely automated and does not require any downtime.
For schema updates, PlanetScale users can utilize the concept of database branches and deploy requests. A branch is a completely isolated MySQL cluster that contains a copy of the schema of the upstream branch. You can use this branch to apply and test changes to your schema before using a deploy request to merge changes into the upstream database branch without taking your database offline to do so. The branches used to apply schema changes (called development branches) are designed to consume minimal resources, so you won't pay for an exact copy of your database to take advantage of these features.
All databases on PlanetScale support at least one production branch and one development branch. Base plan databases have an additional development branch included, with the ability to add more at $10.00 per month. You can also add more production branches to Base plan databases at the cost of the selected instance type, which makes the plan very customizable. Enterprise databases have even further flexibility, more of which is detailed on the Pricing page in our documentation.
Backups
Our paid plans have a preconfigured, automatic backup that runs every twelve hours, included at no additional charge. You may also create your own schedule with a predefined retention period or a manual backup of your database, just like Aurora. For backups performed or scheduled outside of the defaults, you will be charged based on the size of those backups.
Monitoring
While Aurora sends your metrics to CloudWatch and requires you to monitor your Aurora database in a completely separate tool on AWS, PlanetScale prioritizes database monitoring directly with our dashboard. To check the health of any node within your cluster, simply select it from the Overview tab of your database.
Insights is the tool we provide to all databases to monitor queries and identify performance bottlenecks. Insights allows you to select a specific time range to see all queries executed within that range, which can be extremely useful for troubleshooting performance issues.
Insights is included with all databases at no additional cost, with 7 days of query data retention.
Conclusion
Aurora is a powerful and scalable database service, but Amazon Aurora pricing is not as straightforward as it is advertised. There are a number of factors to consider when pricing an Aurora cluster, and it can be difficult to get an accurate estimate without understanding all of the little details. This is in contrast to PlanetScale, where we strive to make pricing as simple and straightforward as possible while even including some features at no additional cost.]]></content>
        <summary><![CDATA[Amazon Aurora is pitched as a straightforward and scalable database service on AWS, but there are associated costs that you might not be aware of.]]></summary>
      </entry>
    
      <entry>
        <title>Three common MySQL database design mistakes</title>
        <link href="https://planetscale.com/blog/three-common-mysql-database-design-mistakes" />
        <id>https://planetscale.com/blog/three-common-mysql-database-design-mistakes</id>
        <published>2024-02-13T09:00:00.000Z</published>
        <updated>2024-02-13T09:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Many years ago, I worked for a telematics company that ingested data from hundreds of thousands of devices worldwide. There was a point of incredible growth where we onboarded a customer that gave us a massive number of new devices and a huge bump in revenue. It was a great moment for the company's trajectory, but the increased amount of data being processed highlighted a massive flaw in our system.
The ID column of the data history table (which logged every event that occurred across all devices) was created with the INT data type, and it was quickly running out of space.
It wasn't an issue immediately, but if that column ran out of space, our entire system would come to a halt. Funnily enough, we built a quick tool called “the doomsday clock,” which would roughly calculate the date this would occur. Had we expected this, we would have designed the database with a different type that would have more easily accommodated growth like this, but of course, the results of our decisions are always more obvious in hindsight.
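To make the idea of a "doomsday clock" concrete, here is a minimal sketch of how such an estimate could be calculated. It is not the tool we actually built; the function name, the growth rate, and the current ID value are all hypothetical, and it assumes a steady number of new rows per day.

```python
from datetime import date, timedelta

INT_SIGNED_MAX = 2**31 - 1  # largest value a signed 32-bit INT can hold


def doomsday(current_max_id: int, ids_per_day: int, today: date) -> date:
    """Estimate the date a signed INT auto-increment column runs out of values."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    remaining = INT_SIGNED_MAX - current_max_id
    days_left = remaining // ids_per_day
    return today + timedelta(days=days_left)


# Hypothetical numbers: 1.9 billion IDs used, 2 million new rows per day.
print(doomsday(1_900_000_000, 2_000_000, date(2024, 2, 13)))
```

A real estimate would also need to account for growth that accelerates over time, which is exactly what caught us off guard.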
Let's take a look at this issue and a few other common database design mistakes when setting up your database.
Suboptimal data type
The scenario described in the intro of this article highlights the importance of selecting a data type that's big enough to accommodate your existing data, as well as any potential growth you might experience.
This applies to more than just numerical types, though. For example, if you attempted to write a string with 300 characters into a VARCHAR(255) column, MySQL would return an error and reject the write if it is in strict mode, which is the default setting for MySQL. If MySQL is not in strict mode, inserting string data that exceeds the column's length causes the data to be truncated, potentially losing important data.
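The difference between the two modes can be modeled in a few lines. This is only a toy model of the behavior described above, not MySQL code; the function name and error message are made up for illustration.

```python
def store_varchar(value: str, max_len: int, strict: bool = True) -> str:
    """Model what ends up in a VARCHAR(max_len) column for a given input."""
    if len(value) <= max_len:
        return value
    if strict:
        # Strict mode: the write is rejected with an error.
        raise ValueError(f"data too long for VARCHAR({max_len})")
    # Non-strict mode: the value is cut down to fit, and the rest is lost.
    return value[:max_len]


long_text = "x" * 300
stored = store_varchar(long_text, 255, strict=False)
print(len(stored), "characters stored,", len(long_text) - len(stored), "lost")
```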
If you want to learn more about the different string types in MySQL, check out our article breaking down the text options available to you.
Conversely, you can also select column types that store too much data. While this won't have as much of a negative impact as not having enough room, there are storage and performance implications with over-provisioning columns. Let's assume you have a column storing the US zip code, which is typically five digits. You could default to using INT for the column type (which stores a 32-bit integer), but you'd allocate more storage than necessary. An unsigned MEDIUMINT would be a better fit, as it stores a 24-bit integer (up to 16,777,215), which is more than enough for a five-digit number. A SMALLINT would be too small, since a 16-bit integer tops out at 65,535 even when unsigned. Because zip codes can start with a zero, a CHAR(5) column is also worth considering if you need to preserve the leading digits.
These are only a few small examples of selecting an inappropriate data type for your columns.
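One way to reason about integer sizing is to compute the range of each type from its byte width and pick the smallest one that fits. The sketch below does that using the standard storage sizes of MySQL's integer types; the helper names and the sample values are hypothetical.

```python
# MySQL integer types and their storage size in bytes.
INTEGER_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def value_range(size_bytes: int, unsigned: bool = False) -> tuple[int, int]:
    """Return the (min, max) values an integer of this width can hold."""
    bits = size_bytes * 8
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type(min_value: int, max_value: int) -> str:
    """Pick the smallest integer type that covers min_value..max_value."""
    unsigned = min_value >= 0
    for name, size in INTEGER_TYPES:
        low, high = value_range(size, unsigned)
        if low <= min_value and max_value <= high:
            return f"{name} UNSIGNED" if unsigned else name
    raise ValueError("range does not fit in any MySQL integer type")


print(smallest_type(0, 200))            # a small status code
print(smallest_type(-1, 40_000))        # needs negatives, so signed
print(smallest_type(0, 5_000_000_000))  # an ID column expected to pass 4.29 billion
```

When sizing an ID column, pass in the largest value you expect after years of growth, not the largest value you have today.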
Missing or redundant indexes
Indexes in MySQL speed up data access by building a separate structure that's optimized to return data if the query's criteria match the configuration of that index.
Indexes are very important when designing a fast database. When indexes aren't utilized, any SQL queries that do not use pagination or provide a LIMIT will perform a scan on that table. When scanning, MySQL will start reading from the first row until it has found every row that matches the criteria. If you have a heavily used query on a particularly large table, repeatedly scanning the table can have massive negative performance implications.
To see the practical effects of missing indexes, our very own Aaron Francis saved the SaaS for one of our customers and documented his process on YouTube!
On the other hand, you can have too many indexes as well.
Every index created will utilize additional storage, so having unused or duplicate indexes directly impacts how much you are paying for that storage. Whenever data is updated or inserted into a table with indexes, MySQL needs to update those indexes (along with their associated statistics) to ensure they are accurate regardless of whether they are used. This can be a time-intensive operation that can create a bad user experience.
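Both sides of this trade-off can be seen in a small in-memory model. This is a simplified sketch, not how MySQL works internally: it uses a hash map where InnoDB uses B-trees, and the class and column names are hypothetical. It counts how many rows a lookup has to examine and how many index updates each insert triggers.

```python
from collections import defaultdict


class Table:
    """A toy table that tracks rows examined and index maintenance work."""

    def __init__(self, indexed_columns=()):
        self.rows = []
        self.indexes = {col: defaultdict(list) for col in indexed_columns}
        self.index_writes = 0

    def insert(self, row: dict) -> None:
        position = len(self.rows)
        self.rows.append(row)
        # Every index must be updated on every insert, used or not.
        for col, index in self.indexes.items():
            index[row[col]].append(position)
            self.index_writes += 1

    def find(self, column: str, value):
        """Return (matching rows, number of rows examined)."""
        if column in self.indexes:
            positions = self.indexes[column].get(value, [])
            return [self.rows[p] for p in positions], len(positions)
        # No index on this column: scan every row in the table.
        matches = [r for r in self.rows if r[column] == value]
        return matches, len(self.rows)


events = Table(indexed_columns=["device_id"])
for i in range(1000):
    events.insert({"event": i, "device_id": i % 100})

_, examined_indexed = events.find("device_id", 7)
_, examined_scan = events.find("event", 7)
print(examined_indexed, examined_scan, events.index_writes)
```

Adding a second index to this table would halve nothing on reads that don't use it, but it would double the index writes on every insert.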
If you want to learn more about how to effectively use indexes, we have a whole section covering them in our MySQL for Developers course.
Improperly storing semi-structured data
Over the past 20 years, utilizing NoSQL to store semi-structured data has gained favor with companies that need to process vast amounts of data very quickly.
Plenty of dedicated solutions are available on the market for storing this kind of data. However, MySQL is actually very capable in this area as well. Most semi-structured data stored in a database is represented as JSON. The most obvious way to store this would be to store the string in a TEXT column, but this is not the optimal way.
MySQL has a dedicated JSON column type that is designed to store JSON in an efficient binary storage format.
Using JSON over TEXT has several key benefits. The first is that InnoDB, the most commonly used MySQL database engine, natively supports querying and filtering based on data within the JSON object stored in the column, removing the need to manually filter after results have been returned to your application code. MySQL also supports building indexes based on data within JSON. This enables fast searches, allowing you to return rows based on your queries more quickly.
If you want to learn more about the JSON data type, our blog has several articles on the topic, including JSON type basics and how to index JSON columns.
Conclusion
Whatever happened to that ID column that was running out of space? Well, luckily enough, the column type was a signed integer, meaning we were able to reseed it to -2,147,483,648. This effectively doubled our ID capacity by assigning negative numbers as IDs and incrementing towards 0. It's not the prettiest solution, but it did help us avoid a rather large amount of downtime that would be required to update the schema for nearly all tables in our database.
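The arithmetic behind "effectively doubled" is quick to check. This sketch only counts the values available on each side of zero for a signed 32-bit integer; the variable names are ours.

```python
INT_MIN = -(2**31)   # -2,147,483,648
INT_MAX = 2**31 - 1  #  2,147,483,647

positive_ids = INT_MAX   # IDs 1 through INT_MAX
negative_ids = -INT_MIN  # IDs INT_MIN through -1, gained by reseeding

print(f"positive IDs: {positive_ids:,}")
print(f"negative IDs: {negative_ids:,}")
print(f"total:        {positive_ids + negative_ids:,}")
```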
Designing a database for growth is no simple task, and things can get out of hand in a hurry. We've touched on only a few potential database design mistakes here, but every MySQL use case is unique and has its own challenges.
If you've encountered design issues, tell us more on Twitter, and make sure to tag @planetscale!]]></content>
        <summary><![CDATA[Learn about a few common mistakes when designing your MySQL database schema.]]></summary>
      </entry>
    
      <entry>
        <title>OAuth applications are now available to everyone</title>
        <link href="https://planetscale.com/blog/oauth-applications-are-now-available" />
        <id>https://planetscale.com/blog/oauth-applications-are-now-available</id>
        <published>2024-02-06T12:00:00.000Z</published>
        <updated>2024-02-06T12:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[One year ago, we launched OAuth applications in limited beta alongside the PlanetScale API. Starting today, everyone can create an OAuth application in public beta and build integrations that seamlessly authenticate with PlanetScale and allow management access to your users’ PlanetScale organizations and databases from your application.
An OAuth application in PlanetScale allows you to get authorization from your users for which organizations and databases the PlanetScale API can interact with.
Building on top of the PlanetScale database platform
We often describe PlanetScale as a “database platform” because it is not just a database. It is much more than that. The platform contains a whole workflow for making safe schema changes with no downtime, alongside intelligent features for monitoring, scaling, caching your queries, and more.
With the PlanetScale API, OAuth applications, and the CLI, you can build on top of the PlanetScale platform, expanding its capabilities across other tools and platforms. For example, you can use GitHub Actions workflows to create database branches from pull requests in GitHub, closely integrating into your existing workflows. The PlanetScale API and OAuth applications expand what is possible with your database.
Some examples of what you can do with the PlanetScale API:
Automatically create and delete database branches from CI/CD pipelines or data migration tooling
Programmatically build out new environments that connect to PlanetScale database branches for testing
Get information about a PlanetScale user, database, branch, organization, and deploy request
Check the status of deploy requests in the deploy queue
Automate creating and deleting database connection strings for internal users or tools
Create, update, approve, deploy, and delete deploy requests programmatically from tooling outside of PlanetScale
You can learn more about what endpoints the API contains in the PlanetScale API documentation.
Examples of live OAuth applications
During the limited beta, we have had various companies building OAuth-based integrations such as:
Netlify
Netlify built an integration that allows you to connect your PlanetScale account to a Netlify site, assign database branches to different deploy contexts, and use the connection object to insert a connection into your database call.
PlanetScale blog post on how the Netlify integration streamlines database management
Netlify integration documentation
Vantage
Vantage built an integration that allows their customers to see their overall PlanetScale costs alongside their other infrastructure providers. Vantage will automatically ingest and visualize the costs accordingly through the OAuth application.
Vantage blog post about the integration
PlanetScale Vantage integration documentation
Cloudflare
Cloudflare built a database integration that allows you to connect to a database from your Cloudflare Worker by getting the right configuration from your PlanetScale database and adding it as a secret to your Worker.
Cloudflare integration documentation
Seltzer
Seltzer is a new web-based GUI database client that integrates with PlanetScale through OAuth, allows you to interact with your PlanetScale database in the browser, and can easily switch between database branches.
Seltzer website
These are just a few examples of what can be built on top of PlanetScale’s OAuth applications. Now, what are you going to build?
Start building today
If you are interested in building on top of PlanetScale and allowing your users to authenticate with PlanetScale to gain management access to their organizations and databases, go to your organization’s OAuth applications page. You can create an OAuth application and immediately start building.
For more documentation on setting up an OAuth application, see the OAuth page in the PlanetScale API documentation.
If you build something you would like to share with us, please email us at education (at) planetscale.com. We would love to hear about your experience building the application, and we may even feature your application in future blog posts, videos, or social media.
If you encounter any issues while building an integration, please let us know since this feature is now in public beta. The PlanetScale Support team will be able to help you out.]]></content>
        <summary><![CDATA[You can now build integrations that seamlessly authenticate with PlanetScale and allow management access to your users’ organizations and databases from your application.]]></summary>
      </entry>
    
      <entry>
        <title>Deprecating the Scaler plan</title>
        <link href="https://planetscale.com/blog/deprecating-the-scaler-plan" />
        <id>https://planetscale.com/blog/deprecating-the-scaler-plan</id>
        <published>2024-02-05T09:00:00.000Z</published>
        <updated>2024-02-05T09:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[Update: The Scaler plan has now been fully deprecated. All databases that were not migrated have been upgraded to the Scaler Pro plan. Scaler Pro has since been renamed to the Base plan.
Last year, PlanetScale launched the next evolution of our pricing: Scaler Pro. Fast forward to today, our customers have spoken with their data — it’s now our fastest-growing product. Time and time again, we’ve heard that straightforward resource-based pricing just works. It leads to simpler system design and more predictable billing.
Today, we’re giving notice that we are discontinuing our serverless Scaler plan. Scaler will be removed as a product option on February 12th and existing Scaler customers will have 2 months to move to Scaler Pro. This decision is based on thousands of customers’ feedback and a belief that everyone deserves predictable pricing for their database. Everything we said in our Scaler Pro announcement post seven months ago is even more true today.
PlanetScale for serverless
This does not mean we’re not the home of the best database for serverless — it’s the opposite. Our architecture was built from the ground up to scale in all the dimensions important for infinitely scalable edge applications. We can handle millions of connections, distribute data around the globe, and soon, we will be able to automatically route incoming connections to the closest region with our intelligent edge.
Serverless has tenets like "scale to zero" and optimizes for periods of no use. PlanetScale powers websites and applications that need to be on 24/7/365 around the globe, and turning a database off is the last thing that we optimize for. Instead, we focus on horizontal scaling via sharding, change management, and smooth no-downtime upgrades to help databases with real-world usage.
Most importantly, PlanetScale scales. We are the primary database for companies totaling well over $40B in market cap, with individual clusters containing 1000s of nodes and 100s of terabytes of data. We’ve proven that from our smallest PS-10 to our largest enterprise databases, we’re built for the future of software.
Putting the "server" in "serverless"
Looking at the industry, we’ve all collectively learned a lot in the last few years.
We’ve always known that the phrase "serverless" is an oxymoron. Every request and every query from a serverless product is run on a computer somewhere, and the limitless scale and novel pricing methods may provide alternate units and easier orchestration, but we’re still using servers.
The more we’ve seen PlanetScale adopted, the more we are convinced that serverless as a deployment method is extremely useful — it encourages scaling in stateless applications and allows customers to get started faster. It also puts immense strain on legacy databases. The storm of 10,000 connections after being on the front page of Reddit is the opposite of what the creators of PostgreSQL imagined was possible 25 years ago.
That’s why we’ve continued to make sure that we’re the best place to put state for serverless applications — from effortlessly pushing the limits with one million concurrent connections to putting our database connections on the edge with HTTP/3 — we care deeply about giving developers at every stage access to the best database from wherever they deploy their applications.
It’s also why we’re removing serverless pricing and focusing on building the bedrock of the modern database to power the internet. We don’t want to hide the fact that we’re running on servers — we’re proud of it!
Simply put, being the best database for serverless workloads does not require being serverless. Holding the state for massive horizontal applications is a completely different problem, and it’s one we’re laser-focused on solving.
What does this mean for you?
If you’re using a Scaler plan database today, you can seamlessly migrate it over to a Scaler Pro plan by visiting the dashboard and selecting a Scaler Pro cluster size. On Scaler Pro, you’ll get unlimited usage of the cluster size you’ve selected. And your database will be upgraded with two replicas to improve resiliency and scale out your reads.
If you’re new to selecting database cluster sizes, we’ve made it as simple as possible by displaying CPU and memory usage on the PlanetScale dashboard. We recommend sizing up when your workload consistently reaches 70% or above of the available CPU. If you need help selecting a size, our Support team is available to make recommendations.
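As a rough illustration of that rule of thumb, the check could look like the sketch below. It is not how the dashboard or our Support team make recommendations; the function name is hypothetical, and treating "consistently" as 80% of samples is purely an assumption for the example.

```python
def should_size_up(cpu_samples, threshold=70.0, sustained_fraction=0.8):
    """Return True when CPU usage is at or above the threshold most of the time.

    cpu_samples is a list of CPU utilization percentages (0 to 100).
    sustained_fraction is an assumed definition of "consistently".
    """
    if not cpu_samples:
        return False
    above = sum(1 for sample in cpu_samples if sample >= threshold)
    return above / len(cpu_samples) >= sustained_fraction


print(should_size_up([75, 80, 72, 90, 71]))  # busy across the whole window
print(should_size_up([75, 20, 30, 10, 15]))  # a single spike
```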
In the coming weeks, we will email every current Scaler customer with recommendations on which Scaler Pro cluster size will work best for their workload. We’ll also include a snapshot of what their bill would look like under Scaler Pro to make sure that we’re transparent about the impact of these changes.
On April 12th, we will automatically migrate any remaining Scaler plan databases over to Scaler Pro. This will not change the amount of resources or disrupt your database in any way. It will just be a plan change that will impact your bill.
If you’re a happy PlanetScale customer and are interested in discounts, reach out to us to see what options you have. We have flexible options that can help you save on the database you know and use with a discount program that allows you to grow and scale without concerns.
Again, if you have any questions, do not hesitate to reach out to our support team or read our Scaler Pro upgrade FAQ.]]></content>
        <summary><![CDATA[Today, in our effort to continue being the best database for serverless and applications that require massive scale, we are deprecating the Scaler plan.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale branching vs. Amazon Aurora blue/green deployments</title>
        <link href="https://planetscale.com/blog/planetscale-branching-vs-amazon-aurora-blue-green-deployments" />
        <id>https://planetscale.com/blog/planetscale-branching-vs-amazon-aurora-blue-green-deployments</id>
        <published>2024-02-02T09:00:00.000Z</published>
        <updated>2024-02-02T09:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Blue/green deployments are Amazon's answer to performing maintenance tasks on its RDS and Aurora database offerings. Creating a blue/green deployment spins out a copy of your existing database environment to make changes to. This approach has some similarities with PlanetScale's approach, known as branching, but deep down, the solutions are very different.
This article will cover what Amazon Aurora blue/green deployments are and how they differ from PlanetScale branching.
What is a blue/green deployment?
Before we explore what blue/green deployments are in the context of Amazon Aurora, it's worth discussing what they are on their own. DevOps is traditionally an eight-step framework that allows organizations to quickly get new features into production. Step six of this process is the deployment process, where built artifacts are pushed into production and released for your users to take advantage of.
Blue/green is a deployment strategy used during the deploy phase of DevOps to get code to production seamlessly.
When new features are released as part of a blue/green strategy, you'd typically have two identical environments that can both act as production. As an example, "blue" is the current production environment, and "green" is used to identify the other environment that will become production once the new artifacts are released. When the artifacts are deployed to the green environment, you'd reroute traffic to it and make green your new production environment.
The traffic alternates back and forth between the two environments as they trade their production status based on the latest release of the code.

How do Amazon Aurora blue/green deployments work?
Aurora applies much of the same process outlined above to your database.
When you set up a blue/green deployment for a database hosted on Amazon Aurora, AWS will create a clone of your current Aurora cluster and spin up a brand new "green" environment. Once the new cluster is configured, AWS will configure binlog replication between the two clusters to keep changes synchronized between them. This new environment can then be used to change your database schema or configuration.
Once you have made the desired changes, you'd execute a switchover, promoting the "green" environment to production and routing traffic to it.
When a switchover is triggered, several things happen before traffic is directed to the new production environment. Amazon will run the environments through a series of guardrails, which ensure that the changes made are compatible and that no long-running operations are being run against the blue environment. Once that's done, Amazon will perform the following actions:
Stop new writes and drop all connections to the database.
Wait for any final writes and replication operations to execute.
Switch blue to read-only.
Rename the resources in both environments so that the green environment's resources adopt the original names of the blue environment's resources.
From this point, all transactions are processed in the green environment, and the resources in the blue environment are left online but renamed, with "-old" and a number appended to their original names.
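The order of operations matters here, so it can help to see the switchover written out as a sequence. The following is a toy model of the steps described above, not the AWS API; the class, the fields, and the exact name given to the retired environment are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    name: str
    connections: int = 0
    read_only: bool = False
    long_running_ops: int = 0


def switchover(blue: Environment, green: Environment, pending_changes: int) -> list[str]:
    """Run the switchover steps in order and return a log of what happened."""
    # Guardrail: refuse to start while long-running operations are active on blue.
    if blue.long_running_ops > 0:
        raise RuntimeError("guardrail failed: long-running operations on blue")

    log = []
    # 1. Stop new writes and drop all connections.
    log.append(f"dropped {blue.connections} connections")
    blue.connections = 0
    # 2. Wait for final writes and replication to finish (modeled as a counter).
    log.append(f"applied {pending_changes} pending changes to green")
    # 3. Switch blue to read-only.
    blue.read_only = True
    log.append("blue is read-only")
    # 4. Green takes over the original name; blue is kept under a new one.
    #    The exact name given to the retired environment is an assumption here.
    production_name = blue.name
    blue.name = f"{production_name}-old1"
    green.name = production_name
    log.append(f"green renamed to {green.name}")
    return log


blue = Environment("orders-db", connections=42)
green = Environment("orders-db-green")
for step in switchover(blue, green, pending_changes=3):
    print(step)
```

Note that step 1 is where your application feels the switchover: every open connection is dropped, no matter how quickly the remaining steps complete.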
Why use blue/green deployments on Amazon Aurora?
Amazon recommends using blue/green deployments to perform maintenance operations that Aurora clusters occasionally require. Here are a few common tasks where blue/green deployments can help reduce the downtime that is typically required.
Schema changes
Blue/green deployments are used to implement minor schema changes, such as adding new columns at the end of a table, creating indexes, or dropping indexes. However, blue/green deployment uses binlog replication to keep the two environments in sync. This means that schema changes should be limited to what is currently supported by binlog replication in MySQL.
Version upgrades
New versions of MySQL are released on a fairly regular basis. These can be major versions with new and improved features or minor versions with security and vulnerability patches. Either way, they should be installed promptly. Creating a blue/green deployment within Amazon lets you test the new version in an isolated environment that matches your current production environment.
Updating the compute instance types
If your selected Aurora instance type runs out of compute resources, you'll need to select a new instance type with more resources. This operation will typically shut down the old compute nodes and boot up new ones, causing the database to be unavailable during this operation. Blue/green deployments can also reduce downtime when scaling instances.
What is PlanetScale branching?
Each branch in PlanetScale's branching technology for Vitess constitutes its own Vitess cluster and includes several infrastructure pieces. For instance, the data resides on a tablet, a pod on the Kubernetes infrastructure that runs the MySQL process with a sidecar process called vttablet. This process enables the pod to communicate with the rest of the Vitess environment. Meanwhile, vtgate is a lightweight proxy routing service that routes MySQL traffic to the correct tablet in conjunction with our topology service. If a tablet goes down for any reason, our systems automatically reroute traffic to a functional tablet and allocate another tablet to replace the downed instance.
In production branches, at least one replica can always be used for read-only workloads or as a backup if the write node goes offline. Our paid tiers automatically have additional replicas, and the solution is fully customizable in Enterprise. On the other hand, development branches use a low-cost instance by default, but this is customizable if you need more power.
Creating a new development branch spins up a new Vitess cluster entirely isolated from your production database branch. The schema from the upstream branch is copied into the new branch, providing you with the same database structure. As a result, development branches are ideal for building new features that require schema changes. Schema changes can be merged in a non-blocking manner with the upstream branch using deploy requests, which are similar to pull requests on GitHub. This feature allows your team to collaborate and review schema changes before safely merging them with no downtime.

Data Branching®
In addition to branching, PlanetScale supports Data Branching®, allowing you to create a new branch with a copy of your production data by restoring the latest version of a production backup to the new branch. This enables developers to have an isolated environment to test new features or run analytics without affecting their production environment.
Data Branching® is available on our Base plan and above.
Comparing PlanetScale and Aurora blue/green deployment use cases
Schema changes
Aurora blue/green deployments rely on binlog replication to sync data between the two clusters, so any schema changes performed on the green side must be compatible with binlog replication; otherwise, they will prevent the switchover operation.
PlanetScale approaches this differently by analyzing the delta between the two schemas and generating the DDL statements to execute on the upstream branch in the correct order. This operation is done using a deploy request. When the deploy request is merged, a "ghost" table is created on the upstream database to apply the new schema. The data is then synchronized between the old table and this new "ghost" table until you can put it into production.
To learn more about how we calculate the changes and generate the DDL statements, check out our article where we discuss our three-way diff schema merge process.
Adjusting instance types
Blue/Green deployments can be used in Aurora to increase switchover success rates through its guardrails. Many instance modifications in Aurora commonly require downtime, such as changing the instance class to scale up or down. Switchover can reduce downtime for such operations, but it is still disruptive since all client connections will be dropped during the process.
Unlike Aurora, we can seamlessly perform rolling upgrades to the tablets without taking down your database. This is based on our use of Vitess on Kubernetes. To make instance type changes, you'd select the new instance type, and our backend systems will do the rest. This allows your applications to continue to operate without being taken offline.
Version upgrades
When you first create an Aurora cluster, you can opt to apply minor version upgrades automatically during a predefined maintenance window. However, this can be disruptive due to the downtime to apply the changes or due to undesired behavior changes in the database software. You can use blue/green deployments to have more control over the database version upgrade process, reducing downtime and associated risks. However, this is a labor-intensive task and requires significant planning.
PlanetScale engineers carefully verify that new versions are compatible with the system before they are applied. Once the changes are validated, they will be applied via rolling upgrades, taking advantage of the Vitess routing and Kubernetes technologies. This approach avoids the stress of having a disruptive maintenance window or further compatibility issues simply because Aurora is dropping support for version-specific features.
Other considerations when using Aurora Blue/Green deployments
Cost considerations
When creating an Aurora blue/green deployment, the new cluster is an exact copy of your existing cluster, including the associated costs for any configured read replicas. This means that if you have one write node and three reader nodes, there will be a point in time where you are effectively paying for eight total nodes (4 in blue and 4 in green) and only really using half of them. Once a switchover is performed, your blue environment is left running, which incurs additional costs.
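The node count in that example works out as follows. This sketch only counts nodes; it deliberately leaves prices out, since those depend on the instance type and region, and the function name is hypothetical.

```python
def nodes_during_blue_green(writers: int, readers: int) -> dict:
    """Count the nodes billed while blue and green run side by side."""
    per_environment = writers + readers
    return {
        "blue": per_environment,
        "green": per_environment,
        "total": 2 * per_environment,
        "serving_production": per_environment,
    }


print(nodes_during_blue_green(writers=1, readers=3))
```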
In PlanetScale, we only initially spin up the resources you need to make schema changes. If you want more resources or want to use your production data, you can do so, but it is not a necessity. This keeps the costs down when needing to change your database.
Failing back
Even though the blue environment remains available for read-only operations, Amazon does not permit you to fail back in any way. This means that if you need to revert the changes made in the green environment, you will need to create a new blue/green deployment, undo any previously made changes, and perform a switchover to the blue environment.
PlanetScale databases support a feature called Schema revert, which will undo the changes made to the schema while still keeping any writes that occurred since the deploy request was merged. As mentioned, we use a ghost table to apply changes and sync the data between the live and ghost tables. With the Schema revert feature enabled, we retain the former production table for a period of time but continue to sync changes into it. When you revert the changes, our system will simply flip the statuses of the two tables, making the old production table active again, but it will contain the writes since the merge.
For more information about reverts, check out our article that dives deeper into how reverts work.
Potential for data inconsistency issues
Amazon's blue/green deployment initially duplicates only compute resources and clones data storage using a copy-on-write mechanism. This can help with storage costs when running parallel environments but introduces potential data inconsistencies across environments. Since writes are allowed in the green environment, the same data can technically be changed in both environments. If this happens, Amazon has no easy or automated way to reconcile which version is correct. Resolving conflicts is challenging, and the responsibility for data consistency falls on you.
As mentioned earlier, PlanetScale branches are isolated MySQL clusters. When you merge schema changes, the data in production is unaffected. Branches with safe migrations enabled (which is required to use deploy requests) are protected from DDL statements to avoid situations like this.
Planned downtime
Even though blue/green attempts to minimize downtime for operations performed using it, some downtime is still involved since performing a switchover will drop all active connections. Long-running operations still permitted by Amazon's guardrails will cause this downtime to last even longer than the marketed "less than a minute" since the process needs to wait for those to complete.
PlanetScale does not drop connections for common maintenance tasks, as mentioned in this article, so your applications can continue to operate without being taken offline.]]></content>
        <summary><![CDATA[Learn the key differences between Amazon Aurora blue/green deployments and PlanetScale branching.]]></summary>
      </entry>
    
      <entry>
        <title>Databases at scale</title>
        <link href="https://planetscale.com/blog/databases-at-scale" />
        <id>https://planetscale.com/blog/databases-at-scale</id>
        <published>2024-01-31T13:00:00.000Z</published>
        <updated>2024-01-31T13:00:00.000Z</updated>
        
        <author>
          <name>Rick Branson</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[Learn about the three main aspects of database scaling: storage, compute, and network.]]></summary>
      </entry>
    
      <entry>
        <title>Considerations for building a database disaster recovery plan</title>
        <link href="https://planetscale.com/blog/considerations-for-building-a-database-disaster-recovery-plan" />
        <id>https://planetscale.com/blog/considerations-for-building-a-database-disaster-recovery-plan</id>
        <published>2024-01-30T09:00:00.000Z</published>
        <updated>2024-01-30T09:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[In a perfect world, the complex systems we build would run indefinitely without the risk of downtime. Unfortunately, we don't live in that world, so understanding how to recover from failures is necessary when they inevitably happen.
This article will cover some considerations and best practices to ensure you can quickly and efficiently recover your database when disaster strikes.
High availability vs disaster recovery
High availability (HA) and disaster recovery (DR) are two closely related concepts regarding downtime. HA refers to the practice of designing the system in a way that allows it to survive an outage. For example, suppose a primary database server goes offline in a MySQL cluster with data replicated across three servers. In that case, the system can automatically elect a new primary node and reroute traffic within seconds. While some users may notice a brief interruption, most of the user base remains unaffected. This failover process is a benefit of replication, which we'll touch on in more detail later.
On the other hand, DR focuses on recovering from major outages that may have a broader and longer impact on the business. It involves implementing strategies and procedures to restore the database and its associated services after a catastrophic event. Unlike HA, which aims to minimize downtime and ensure continuous operation, DR is concerned with the overall recovery process and restoring normal business operations as quickly as possible.
By combining HA and DR practices, organizations can build resilient database systems that can withstand minor and major disruptions, ensuring the availability and integrity of the data.
Defining recovery goals
Since no system is perfect and will inevitably go down for some reason or another, it makes sense to account for this ahead of time and set recovery goals for your company regarding outages.
What are Recovery Point Objectives?
A Recovery Point Objective (RPO) defines the maximum amount of data your business can afford to lose during a disaster. By setting an RPO, you are establishing the expectation that there may be a maximum of N hours, minutes, or seconds of data loss when the systems return online in the event of a catastrophe. It is crucial to align your backup strategy with your RPOs to ensure you do not exceed the acceptable data loss.
For example, if you have daily backups configured for your database and an RPO measured in hours, you may miss your target RPO if the last backup is 23 hours old. Therefore, it is essential to carefully consider your backup frequency and retention policies to meet your RPO requirements effectively.
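The relationship between backup frequency and RPO comes down to simple arithmetic. The Python sketch below assumes backups are the only recovery source and that a failure strikes just before the next backup would run, so the worst-case loss equals the backup interval. The function name and example values are illustrative.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the worst-case data loss stays within the RPO.

    Assumption: backups are the only recovery source, so the worst-case
    loss equals the full interval between two backups.
    """
    return backup_interval <= rpo


# Daily backups cannot satisfy a 4-hour RPO...
print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))
# ...but hourly backups can.
print(meets_rpo(timedelta(hours=1), timedelta(hours=4)))
```

If replication to another region is also available, the effective data loss is usually bounded by replication lag rather than backup age, which is covered later in this article.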
What are Recovery Time Objectives?
Recovery Time Objectives (RTOs) represent the targeted duration within which a specific outage should be resolved. A shorter RTO implies a more rapid response from your team to restore systems to full functionality. Often measured in hours or minutes, RTOs serve as a crucial metric for evaluating your DR plan's effectiveness and setting expectations for members of the organization.
These targets give your teams a shared set of goals for disaster recovery. They should be as small as realistically possible, but the keyword in that statement is “realistically.” It's important to understand that the smaller the RPO and RTO, the more complex and expensive the disaster recovery solution. It's a balancing act between the cost of the solution and the cost of downtime.

Databases are stateful, and code is not
Statefulness refers to a condition where the usage and functionality of a system depend directly on the previous actions taken on that system. Code is stateless, meaning that the processes that run the code don't necessarily rely on any specific underlying data to run. So, if your web application server hits a snag and goes offline, you can smoothly switch to another server, reroute the traffic, and get back to business as usual.
Now, databases operate differently. They're stateful because the data within the database results from the many transactions issued against the database from the time it was first built. If the server hosting your MySQL database crashes, you can't just swap it with a fresh MySQL server. Your application may come back online, but it's useless if your users lose all their data. The statefulness of the system is why handling database disasters requires careful planning and attention, a step up from the more agile nature of fixing application issues.
MySQL functionality used for disaster recovery
MySQL replication
Replication allows your data to replicate across multiple MySQL servers. It is crucial not only for keeping your database highly available but also for handling disasters. If your primary database location has issues, having a replicated copy in another availability zone or region lets you get things back up and running quickly. In a typical setup, one server accepts both read and write workloads (the primary or source), and the others (replicas) are for read-only workloads.
MySQL gives you two replication modes: asynchronous and semi-synchronous. In asynchronous mode, which is the default, the primary node commits the transaction and responds to the client immediately without waiting for replicas. In semi-synchronous mode, the primary waits for at least one replica to acknowledge that it has received the transaction before responding to the client.
Semi-synchronous mode works well in low-latency situations, like servers in the same region, while asynchronous mode is better for longer distances. Using asynchronous replication over distances might mean the data in the remote region is a bit behind since network traffic is slower to respond, and the primary does not wait for those replicas to acknowledge that they've seen the transaction.
Still, it's usually a fair tradeoff, as waiting for the primary region to come online may not be the most practical option. This fact is especially true when you consider the RTOs and RPOs we discussed earlier and the fact that the data in the remote region is still available, even if it's not the most recent.
To learn more about MySQL replication, check out our blog post, where we dive into the details of replication and the best practices for setting it up.
Database backups
A robust backup strategy is crucial for effective disaster recovery. There are two types of backups: logical and physical. Logical backups consist of SQL statements that can recreate the database from scratch, while physical backups are copies of the actual files stored on disk. In either case, both types represent the database at a specific point in time.
You'll also have to consider when to take full backups instead of incremental backups and when each is appropriate. Full backups, as the name suggests, are complete representations of the database from when the backup was taken. In contrast, incremental backups represent the changes to the database that have occurred between two points in time. Incremental backups use the binary log for their backups, and they need to be restored in the order they were taken to get a complete version of the database.
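The ordering requirement for incremental backups can be sketched in a few lines of Python. This is a simplified model, not a real backup tool: the `Backup` class, its integer timestamps, and the `restore_chain` function are all hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    taken_at: int  # hypothetical timestamp, e.g. epoch seconds
    kind: str      # "full" or "incremental"


def restore_chain(backups: List[Backup], target: int) -> List[Backup]:
    """Return the backups to apply, in order, to reach `target`."""
    usable = sorted((b for b in backups if b.taken_at <= target),
                    key=lambda b: b.taken_at)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup exists at or before the target time")
    base = fulls[-1]  # start from the most recent full backup
    # Every incremental taken after the base is replayed oldest-first.
    incrementals = [b for b in usable
                    if b.kind == "incremental" and b.taken_at > base.taken_at]
    return [base] + incrementals


backups = [Backup(0, "full"), Backup(10, "incremental"),
           Backup(20, "incremental"), Backup(30, "full"),
           Backup(40, "incremental")]
print([b.taken_at for b in restore_chain(backups, target=25)])
```

Restoring to time 25 uses the full backup at 0 followed by the incrementals at 10 and 20, while restoring to time 45 would start from the newer full backup at 30.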

The backup process can be resource-intensive and impact the server it is being performed on, which can negatively affect the responsiveness of the application using the database. This impact is greater for larger databases as they take longer to back up. Since replication allows you to create replicas, they are perfectly suited to handle read-only workloads. Backups are often performed on replicas to reduce the load on the primary server.
At PlanetScale, we use read-only replicas to perform backups of our database, but we go a step further and restore the most recent backup of a database to a backup-dedicated replica before allowing replication to catch it up and take another backup. This process has the added benefit of validating backups on our system to ensure our customers have a reliable backup to restore from in the event of a disaster.
Building a database disaster recovery plan
No disaster recovery plan looks the same between businesses, so here are a few things to remember when building a plan for yours.
Define your RPOs and RTOs
As we discussed earlier, defining your RPOs and RTOs is important. Doing this will help you understand what you need to do to meet these targets and the cost of doing so. This part of the process should be done in conjunction with business leadership, as they will need to understand the cost of downtime. Having other key members of the organization involved in this process will also help ensure that everyone is on the same page regarding disaster recovery, which reduces the friction between teams due to the outage.
Use cross-region replication to speed up recovery
Rerouting traffic to a live database server in a different region will always be quicker than restoring your entire database from a backup. There's also a pretty good chance that the data loss incurred from replication lag (the delay between when the primary commits a write and when replicas apply it) will be less than the data loss from having to restore a full backup that is several hours old or even an incremental backup that could be a few minutes old.
Prioritize what needs to be restored
Not all infrastructure or applications are created equal. Some systems are more important than others, including their associated databases. Defining what parts of your infrastructure are most or least important can help you understand what needs to be restored first and what can wait. Categorizing your infrastructure will help you prioritize your recovery efforts and ensure that the most important systems are back online as quickly as possible. You might even spin out separate RTOs and RPOs for different systems based on their importance to the business.
Use loss of revenue as a metric
The systems built for businesses ultimately do one of two things: save money or make money. When building your disaster recovery plan, it's important to understand the cost of downtime in terms of lost revenue. This metric should be used when defining your RPOs and RTOs and helping prioritize which systems would cost the business the most and, therefore, need to be restored quickly. If you're in the beginning stages of building a plan, using revenue loss will also help you to make a more compelling case for the resources you need to build a robust disaster recovery plan.
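As a rough illustration of using revenue loss to prioritize, the Python sketch below ranks systems by the revenue they would lose over a given outage. The system names and hourly figures are made up, and a real estimate would also account for factors such as reputational damage or contractual penalties.

```python
def rank_by_downtime_cost(systems, outage_hours):
    """Sort systems by estimated revenue lost over `outage_hours`.

    `systems` maps a system name to its revenue per hour (hypothetical
    figures you would get from business leadership).
    """
    costs = {name: per_hour * outage_hours
             for name, per_hour in systems.items()}
    return sorted(costs.items(), key=lambda item: item[1], reverse=True)


systems = {"checkout": 12_000.0, "search": 3_000.0, "internal-wiki": 0.0}
for name, cost in rank_by_downtime_cost(systems, outage_hours=2):
    print(f"{name}: ${cost:,.0f} lost")
```

The systems at the top of the list are the ones that justify the tightest RTOs and the largest share of the recovery budget.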
Automate your recovery process
Humans make mistakes, even more so when they are under pressure to recover critical business systems. It's important to minimize the possibility of human error when it comes to disaster recovery. Automating your recovery process can help ensure that the process is repeatable and can be done quickly and efficiently. This will also help to ensure that the process is done the same way every time and that it can be performed by anyone on your team, not just the most experienced members.
Test the plan regularly
The best way to ensure that your disaster recovery plan works is to test it regularly. Testing consistently will help you to understand how long it takes to recover your systems and what the impact of that recovery is on the business. It will also help you understand if your RPOs and RTOs are realistic and if they can be met with your resources. Regular testing will also help you understand if the plan needs to be updated and if any new systems need to be included in the plan.
Validating backups is also a critical aspect of a disaster recovery strategy. It is not enough for backup software to report successful backups. After all, backups are only as good as their ability to be recovered. Corrupted or misconfigured backups can lead to devastating consequences during a crisis, which can ultimately ruin a business.
By automating and regularly testing your recovery process, you can also be confident that your backups can be restored when they are needed most. Amazon calls these events "game days" where a failure is simulated to test the recovery process.
Conclusion
Disaster recovery is a crucial aspect of database management, and it is important to have a plan in place to ensure that your business can recover from an outage. Having a plan that is communicated across the organization and can be easily executed can significantly impact the downtime your business may incur whenever a disaster inevitably hits.]]></content>
        <summary><![CDATA[Learn different considerations and best practices for quickly and efficiently recovering your database when downtime hits.]]></summary>
      </entry>
    
      <entry>
        <title>Working with Geospatial Features in MySQL</title>
        <link href="https://planetscale.com/blog/geospatial-features-mysql" />
        <id>https://planetscale.com/blog/geospatial-features-mysql</id>
        <published>2024-01-25T09:00:00.000Z</published>
        <updated>2024-01-25T09:00:00.000Z</updated>
        
        <author>
          <name>Savannah Longoria</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Data is abstract. Geospatial design management is how engineers across various disciplines make sense of complex data to make more informed decisions and better understand the spatial relationships in the world around us. In this blog post, I explore how complex data and geographic features can be represented in MySQL.
Geospatial data, often referred to in technical documentation as geodata, includes information related to locations on the Earth's surface. In MySQL, geographic features represent anything in the real world with a location and are defined as either Entities or Spaces.
Type | Definition | Examples
Entities | Specific objects with defined boundaries and individual properties | Landmarks: Mountains, rivers, forests, buildings; Infrastructure: Roads, bridges, power lines; Administrative areas: Countries, cities, states; Points of interest: Restaurants, shops, ATMs
Spaces | Continuous areas defined by their location and characteristics | Land cover: Forest, grassland, urban areas; Elevation: Topography, hills, valleys; Soil types: Sand, clay, loam; Environmental data: Temperature, precipitation, air quality
MySQL spatial data types
In the paragraph above, we covered what Spaces and Entities are. How exactly are these geographic features represented in MySQL and other relational databases? These real-world objects and areas can be modeled within the database by utilizing specific data types and spatial functions. MySQL's capabilities revolve around three core geospatial object types: points, paths, and polygons.

In MySQL, spatial data types store geometry and geography values in the table column. Both single-geometry and multi-geometry types are supported. Single-geometry values include GEOMETRY, POINT, LINESTRING, POLYGON. Multi-geometry types that represent multiple objects of the same type include MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, and GEOMETRYCOLLECTION.
Type | Description | Examples
GEOMETRY | Stores any type of geometry value. It is a noninstantiable class but has a number of properties common to all geometry values. | Link to documentation
POINT | Stores a single X and Y coordinate value | POINT(-74.044514 40.689244)
LINESTRING | Stores a set of points that form a curve. An ordered list of points connected by edges | LINESTRING(0 0, 0 1, 1 1)
POLYGON | Stores a set of points in a multi-sided geometry. Similar to a linestring, but closed (must have at least three unique points, and the first and last point-pairs must be equal) | POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5)). Each ring is represented as a set of points.
MULTIPOINT | Stores a set of multiple point values | MULTIPOINT(0 0, 20 20, 60 60)
MULTILINESTRING | Stores a set of multiple LINESTRING values | MULTILINESTRING((10 10, 20 20), (15 15, 30 15))
MULTIPOLYGON | Stores a set of multiple POLYGON values | MULTIPOLYGON(((0 0,10 0,10 10,0 10,0 0)),((5 5,7 5,7 7,5 7, 5 5)))
GEOMETRYCOLLECTION | Stores a set of multiple GEOMETRY values. Note that MySQL does NOT support empty GeometryCollections except for the single GeometryCollection object itself. | GEOMETRYCOLLECTION(POINT(10 10), POINT(30 30), LINESTRING(15 15, 20 20))
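The POLYGON rule in the table above (a closed ring with at least three unique points) can be expressed as a small check. The Python sketch below is only an illustration of that rule. The function name is hypothetical, and it does not test for self-intersection, which a complete validity check would also need.

```python
def is_valid_ring(points):
    """Check the two ring rules quoted in the table above.

    `points` is a list of (x, y) tuples. This does not check for
    self-intersection, which a full validity test would also need.
    """
    closed = len(points) >= 2 and points[0] == points[-1]
    return closed and len(set(points)) >= 3


# The outer ring from the POLYGON example is closed and has four unique points.
print(is_valid_ring([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]))
# The LINESTRING example is not closed, so it cannot serve as a polygon ring.
print(is_valid_ring([(0, 0), (0, 1), (1, 1)]))
```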
Supported spatial data formats
MySQL supports several spatial data formats for storing and manipulating geospatial data within its database. Here are the three primary formats:
Well-Known Text (WKT) format: A human-readable text format for representing geometric objects. It uses keywords like POINT, LINESTRING, and POLYGON followed by coordinates and optional metadata.
Well-Known Binary (WKB) format: A compact binary format for representing geometric objects. It’s not human-readable, but it’s easily parsed by software and tends to be more efficient to store and transmit than WKT.
Internal Format: MySQL stores spatial data internally in a format similar to WKB but with an additional 4 bytes for storing the Spatial Reference Identifier (SRID). SRID defines the coordinate system of the geometry, ensuring accurate interpretation.
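To make the difference between these formats concrete, here is a minimal Python sketch that encodes a point as WKB and then adds the 4-byte SRID prefix described above. It assumes little-endian byte order for both the WKB body and the SRID. The function names are illustrative and are not part of any MySQL client library.

```python
import struct

WKB_POINT = 1  # geometry type code for POINT in the WKB specification


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian Well-Known Binary."""
    # "<"  little-endian, "B" byte-order flag (1 = little-endian),
    # "I"  uint32 geometry type, "dd" two float64 coordinates.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def point_with_srid_prefix(x: float, y: float, srid: int) -> bytes:
    """Prefix the WKB with a 4-byte SRID (assumed little-endian)."""
    return struct.pack("<I", srid) + point_to_wkb(x, y)


wkb = point_to_wkb(10.0, 20.0)
print(len(wkb))                                   # 21 bytes of WKB
print(len(point_with_srid_prefix(10.0, 20.0, 0))) # 25 bytes with the SRID
```

The equivalent WKT, POINT(10 20), is easier to read but has to be parsed as text, which is why binary formats are preferred for storage and transmission.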
Working with geospatial features in MySQL
Geospatial objects are just another data type in MySQL and can be used right alongside numbers, strings, and JSON.
PlanetScale supports geospatial objects. If you'd like to follow along with these examples, sign up to spin up a database cluster in seconds.
Creating a geospatial table
Use the CREATE TABLE statement to create a table with a spatial column. Here we have a table named locations that has a column named g that can store values of any geometry type. The column is also defined with an SRID attribute, which explicitly indicates the spatial reference system (SRS) for values stored in the column:
CREATE TABLE
    `locations` (
        `id` int NOT NULL,
        `city` varchar(255) NOT NULL,
        `city_ascii` varchar(255) NOT NULL,
        `country` varchar(255) NOT NULL,
        `iso3` varchar(3) NOT NULL,
        `admin_name` varchar(255) NOT NULL,
        `capital` varchar(255) NOT NULL,
        `population` int NOT NULL,
        `g` geometry NOT NULL SRID 4326,
        PRIMARY KEY (`id`),
        SPATIAL KEY `g` (`g`),
        FULLTEXT KEY `city_ascii` (`city_ascii`)
    );

Use the ALTER TABLE statement to add or drop a spatial column to or from an existing table (here, a table named geom):
ALTER TABLE geom ADD pt POINT;
ALTER TABLE geom DROP pt;

Querying geospatial data
Understanding geographic features and their representation in MySQL is crucial for working with spatial data effectively. This allows you to store, retrieve, analyze, and visualize geographic information efficiently within your database.
MySQL provides various spatial functions for manipulating and analyzing geographic data. Let's cover some of the common spatial functions.
Location functions
Location functions are used to build geometries and extract coordinates.
ST_GeomFromText(wkt_string) -- to convert WKT to a geometry object
ST_X(geom), ST_Y(geom) -- to extract coordinates.

Distance calculations
Distance calculations are used to measure the distance between features.
ST_Distance(geom1, geom2)

Area and perimeter calculations
Area and perimeter calculations are used to determine the area and perimeter of polygons.
ST_Area(geom) -- calculate the area of a polygon

Intersection and containment
Intersection and containment are used to find features that overlap or are contained within others.
ST_Contains(geom1, geom2) -- to check if one feature contains another
ST_Intersects(geom1, geom2) -- to check if features intersect.

Buffering
Buffering is used to create zones around features based on a specified distance.
ST_Buffer(geom, distance) -- to create a zone around a feature with a specified distance.

Analysis functions
Analysis functions can be used to combine or diff geometries.
ST_Union(geom1, geom2) -- to combine geometries
ST_Difference(geom1, geom2) -- to obtain the difference between geometries.

Relationship functions
Relationship functions are used to detect relationships between features.
ST_Touches(geom1, geom2), ST_Crosses(geom1, geom2)
ST_Overlaps(geom1, geom2) -- to determine various spatial relationships between features

Examples using these spatial functions
Remember, the specific functions and queries you use will depend on your specific data and analysis goals. Let's look through some common examples of spatial functions.
Distance between two cities
select st_distance_sphere( (
            select g
            from locations
            where city_ascii = 'Santos'
        ), (
            select g
            from locations
            where
                city_ascii = 'Sao Paulo'
        )
    );

select st_distance_sphere( (
            select g
            from locations
            where
                city_ascii = 'New York'
        ), (
            select g
            from locations
            where city_ascii = 'Blauvelt'
        )
    );
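
For intuition about what a spherical distance function computes, here is a minimal Python sketch of the haversine formula. It is an approximation for illustration and may not match MySQL's implementation exactly. The default radius of 6,370,986 meters is assumed to be the default documented for ST_Distance_Sphere, and the function name and argument order are hypothetical.

```python
import math

# Assumption: 6,370,986 m is the default radius documented for
# MySQL's ST_Distance_Sphere; pass another radius if needed.
DEFAULT_RADIUS_M = 6_370_986.0


def distance_sphere(lon1, lat1, lon2, lat2, radius=DEFAULT_RADIUS_M):
    """Great-circle distance in meters using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# A quarter of the way around the equator is (pi / 2) * radius meters.
print(distance_sphere(0.0, 0.0, 90.0, 0.0))
```

Because the result is in meters, a filter such as "within 15 km" becomes a comparison against 15000.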

Find cities in a radius
select
    city,
    st_astext(g),
    st_distance_sphere(
        g, (
            select g
            from locations
            where city_ascii = 'New York'
        )
    )
from locations
where st_distance_sphere(
        g, (
            select g
            from locations
            where
                city_ascii = 'New York'
        )
    ) <= 15000
order by 3 desc;

select
    city,
    st_astext(g),
    st_distance_sphere(
        g, (
            select g
            from locations
            where city_ascii = 'Santos'
        )
    )
from locations
where st_distance_sphere(
        g, (
            select g
            from locations
            where
                city_ascii = 'Santos'
        )
    ) <= 15000
order by 3 desc;

Other examples (not related to the table that we previously created)
Find all restaurants within 1 km of a specific point:
SELECT * FROM restaurants
WHERE ST_Distance(ST_GeomFromText('POINT(10 20)'), location) <= 1000;

Find all parks that intersect with a specified polygon:
SELECT * FROM parks
WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON((10 20, 20 30, 30 20, 10 20))'));

Find the total area of all forests:
SELECT SUM(ST_Area(geom)) FROM forests;

Additional geospatial feature notes
Spatial databases for geographic features: MySQL 8 vs 5.7 compared
Better support for spatial data handling was one of the major improvements included in the MySQL 8 release. However, it was already possible to store and process geographic features using earlier MySQL versions as well as competing database systems. The first major improvement brought forth by MySQL 8 was better support for coordinate reference systems a.k.a. spatial reference systems (SRS). In the table below, we can see that there are now three kinds of spatial reference system available in MySQL 8:
Type of spatial reference system (SRS) | Explanation | Coordinates | Units
Projected SRS | Projection of a globe onto a flat surface (a map) | Cartesian | Distance units: meters, feet, etc.
Geographic SRS | Non-projected (an ellipsoid) | Latitude-longitude | Any angular unit
SRS with SRID 0 | Default SRID for spatial data in MySQL | Infinite flat Cartesian plane | Unitless
While spatial reference systems are not a new concept in MySQL, with version 8.0 they directly affect computation. Each spatial reference system is denoted by a spatial reference system identifier (SRID). There are more than 5,000 spatial reference systems to choose from.
The second major improvement to spatial data handling in MySQL 8 pertains to spatial indexing, that is, the optimization of columns holding spatial data. Two requirements have to be met for spatial indexing to work properly:
The geometry columns to be included in the index need to be defined as NOT NULL.
Columns need to be restricted to a spatial reference system identifier (SRID), and all column values must have the same SRID.
Prior to MySQL 8.0, spatial features were stored with a spatial reference system identifier (SRID), but the database couldn't utilize this information for calculations. Instead, all functions operated on a flat plane (SRID 0). This meant users had to create custom functions to convert units and perform accurate calculations, requiring a deep understanding of math and geometry. Furthermore, many spatial relationship functions only used the minimum bounding rectangle (MBR) instead of the object's actual shape, limiting their accuracy.]]></content>
        <summary><![CDATA[In this blog post, we explore how complex data and geographic features can be represented in MySQL.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale vs Amazon Aurora replication</title>
        <link href="https://planetscale.com/blog/planetscale-vs-aws-aurora-replication" />
        <id>https://planetscale.com/blog/planetscale-vs-aws-aurora-replication</id>
        <published>2024-01-24T15:00:00.000Z</published>
        <updated>2024-01-24T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Amazon Aurora is Amazon's solution for simplifying the process of scaling up RDS databases by allowing you to implement read-only replicas to offload some of the workloads from your primary read/write node. PlanetScale also allows our users to do the same. However, the way Amazon replicates databases is drastically different from traditional MySQL replication.
In this article, we're going to cover how PlanetScale implements replication compared to Aurora, all while touching on various similarities and differences between the services.
How does MySQL replication work?
Before we compare and contrast the differences between PlanetScale and Aurora's replication strategies, it's worth exploring how MySQL replication works at a basic level.
Replication works when data is, well, replicated from one MySQL server to another. The collection of servers within a replicated environment is known as a cluster. Within most clusters, one server acts as a read/write server known as the source. All other servers in the environment, known as replicas, can be used for read-only workloads such as backups or analytical processes.
Each source contains a log of all transactions it processes in a file known as the binary log, or binlog for short.
As data is written to the binlog on the source, each replica has a process that can read those log entries so they can be replayed. As transactions are replayed, each replica is eventually brought up to date with the source to contain the same data the source has. Depending on the replication mode used, there is often a small delay between the time the data is written on the source and the time replicas are brought up to date. We call that the replication lag.
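The replay mechanism and the resulting lag can be modeled with a toy Python sketch. It is deliberately simplified: the `Source` and `Replica` classes are hypothetical, transactions are plain strings, and lag is counted in unapplied transactions, whereas real deployments usually measure it in seconds.

```python
class Source:
    def __init__(self):
        self.binlog = []  # ordered list of committed transactions

    def write(self, txn):
        self.binlog.append(txn)


class Replica:
    def __init__(self):
        self.applied = []  # transactions replayed so far

    def replay(self, source, max_events):
        """Apply up to `max_events` binlog entries not yet seen."""
        start = len(self.applied)
        self.applied.extend(source.binlog[start:start + max_events])

    def lag(self, source):
        """Replication lag measured in unapplied transactions."""
        return len(source.binlog) - len(self.applied)


source, replica = Source(), Replica()
for txn in ["INSERT 1", "INSERT 2", "UPDATE 1"]:
    source.write(txn)

replica.replay(source, max_events=2)
print(replica.lag(source))   # one transaction behind
replica.replay(source, max_events=2)
print(replica.lag(source))   # caught up
```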
PlanetScale database clusters follow this traditional style of replication, meaning each MySQL node contains a copy of the data it needs to operate properly.

Step | Description
1 | Clients that need to write data connect to the source MySQL server.
2 | Clients with read-only workloads can connect to one of the replicas.
3 | On writes to source, transactions are relayed to replicas which will apply them locally. In semi-synchronous mode, the source can be configured to wait for one or more acknowledgments from replicas before responding to the client to ensure durability.
For more info on MySQL replication, check out our post "What is MySQL replication and when should you use it?".
How does Aurora replication work?
While Aurora does use the binary log for external replication, AWS has built a closed and proprietary replication system that deviates from the traditional MySQL replication configuration for replicating within an Aurora cluster.
Instead of being stored directly on the attached volumes, redo log entries are forwarded to dedicated Aurora storage appliances in the same availability zone as the source compute node. Data on this appliance is stored within 10 GiB segments spread across three availability zones in a given region. Before the compute node responds to the application, Aurora will ensure that at least four of the six default segments have a replicated copy of the data to ensure durability should a data center be taken offline.
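The four-of-six rule is a write quorum, which can be sketched in Python as follows. This is an illustration of the quorum arithmetic only, not Aurora's implementation. The function name is hypothetical, and the example assumes two copies per availability zone across the three zones.

```python
def write_is_durable(acks, required=4, total=6):
    """Return True once enough storage copies have acknowledged a write.

    `acks` holds one boolean per copy of the data.
    """
    acks = list(acks)
    if len(acks) != total:
        raise ValueError(f"expected {total} acknowledgements, got {len(acks)}")
    return sum(1 for ok in acks if ok) >= required


# Assumption: two copies per availability zone across three zones.
one_az_down = [True, True, True, True, False, False]
az_plus_one_down = [True, True, True, False, False, False]
print(write_is_durable(one_az_down))       # quorum still reachable
print(write_is_durable(az_plus_one_down))  # quorum lost
```

Under that assumption, losing an entire availability zone still leaves four copies, which is why writes can continue when a single data center goes offline.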
Since data is replicated on the storage level, read-only compute nodes can be started at any time in an availability zone containing a copy of the data for that node to read.
For any pages that have been read to memory, the source node will directly notify any read-only nodes of updates, and the read-only nodes update their in-memory pages accordingly. As a result, the risk of reading stale data is reduced. However, replication lag still needs to be considered.

Step | Description
1 | Clients that need to write data connect to the read/write compute node.
2 | Clients with read-only workloads can connect to one of the read-only nodes.
3 | Read/write nodes can read from and write to storage segments. The R/W node will ensure four segments have acknowledged writes before responding.
4 | Read-only nodes are notified of data changes so in-memory pages can be updated.
5 | Read-only nodes have read-only connections to storage segments.
6 | Storage segments replicate to each other.
Similarities between PlanetScale and Aurora replication
Automatic node failover
Whenever a new database is created in PlanetScale, we automatically spin up a MySQL cluster containing a source and at least one replica for our production database branches, regardless of the plan you've selected. These nodes are part of a Kubernetes cluster that our backend systems use to ensure that the database cluster is online and healthy. If a replica goes offline for whatever reason, a new one will be automatically created to replace it. If a source goes offline, one of the replicas will be elected to become a source, and a new replica will be created to replace it.
Aurora handles this process in a very similar approach, where it will automatically replace nodes that crash or go offline for whatever reason.
Query proxying
In the same light, PlanetScale and Aurora have dedicated query proxy services that automatically reroute traffic away from a failed node. This minimizes any downtime clients may experience from the failure, making failovers more transparent.
Storage autoscaling
Aurora's storage appliance will automatically allocate new storage segments as needed. PlanetScale does this as well by monitoring the underlying volumes that contain data for your database and expanding or adding volumes when required. Both platforms provide a solution that essentially prevents your database from ever running out of space.
Cross-region replication
Deploying replicas across different regions can help speed up data access for users in that locale. Both platforms allow you to create read-only regions (named Global Database in Aurora) to do exactly this. When a read-only region is created, replicas will be created in the region of your choice and asynchronous replication will be configured between the home region (where the production database currently resides) and the selected region.
Run them in your own AWS account
While not strictly about replication, both platforms can be run within your own AWS account. PlanetScale's enterprise plan allows you to run the PlanetScale stack directly in your account, all while allowing you to use our beloved UI.

Where PlanetScale excels
PlanetScale is based on proven, open-source software
PlanetScale's Vitess offering is built on top of Vitess, a MySQL-compatible and horizontally scalable project that is completely open source. Vitess was built in 2010 by the engineering team at YouTube to address scaling issues based on their incredible growth at the time. This predates AWS Aurora by four years and is used by some of the largest companies in the world, like Slack, GitHub, and Pinterest, just to name a few.
Because it is open-source, you can install and test it yourself as well. To learn more about Vitess, check out the documentation portal available on their website or browse our blog for more info on how PlanetScale uses Vitess.
Version upgrades
Because we use a traditional replication setup for MySQL, we can perform rolling MySQL upgrades without taking your database offline. This contrasts with Aurora's approach, which requires upgrades to be performed within a maintenance window and can lead to downtime for your database.
Sharding
Sharding is the practice of splitting up large datasets across multiple MySQL servers or clusters to balance the load, which leads to higher data throughput at a lower price point. Currently, Amazon Aurora does not support this capability for MySQL workloads, so you'd be required to increase the cost of your compute nodes should you start hitting performance bottlenecks.
On top of that, our query proxy service known as VTGate is built with sharding in mind so it can intelligently route queries to the correct MySQL servers (or shards) even if a query needs to access the data stored on multiple shards.
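To make the routing idea concrete, here is a minimal sketch of how a proxy might pick a shard. It is an illustration only and not how VTGate is implemented: the shard names, the `shard_for` and `route` helpers, and the hash-modulo scheme are all assumptions made for this example.

```python
import hashlib

# Hypothetical shard names, used only for this illustration.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(sharding_key):
    """Map a sharding key to one shard with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(str(sharding_key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def route(sharding_key=None):
    """Return the shards a query must visit.

    With a sharding key, the query goes to exactly one shard.
    Without one, it has to be sent to every shard and the results merged.
    """
    if sharding_key is None:
        return list(SHARDS)
    return [shard_for(sharding_key)]


print(route(42))   # a single shard, always the same one for key 42
print(route())     # all four shards
```

The point of the sketch is the two paths: a query that includes the sharding key touches one server, while a query without it fans out to all of them.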
Automatically validated backups
Backups are extremely critical to a disaster recovery strategy, and we take them very seriously. While both PlanetScale and Aurora support automated backups, we also validate the backups of our databases automatically every single time a new backup is created. This is only possible because we use the traditional approach for MySQL replication.
Instead of creating a fresh snapshot of your database every time a backup is performed, we restore the most recent backup of your database to a special MySQL node in the cluster that's dedicated to this process. Once the backup is restored, we use the built-in MySQL replication to copy the latest changes into this node before creating a new backup. If a backup is unhealthy, this process will fail and a fresh backup will be triggered to take its place.
By following this process, you can always be confident that backups on our platform are validated and healthy to restore from.]]></content>
        <summary><![CDATA[Learn about how Amazon Aurora replication works, and how it compares to the traditional MySQL replication strategy used by PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing the Vantage and PlanetScale integration</title>
        <link href="https://planetscale.com/blog/introducing-the-vantage-and-planetscale-integration" />
        <id>https://planetscale.com/blog/introducing-the-vantage-and-planetscale-integration</id>
        <published>2024-01-23T12:00:00.000Z</published>
        <updated>2024-01-23T12:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We are excited to announce that you can now connect your PlanetScale account to Vantage to sync all your invoice data with the new Vantage + PlanetScale integration.
What is Vantage?
Vantage is a cloud cost observability platform that allows you to analyze, understand, and report on your organization's cloud costs.
Now, with the PlanetScale integration, you can include your PlanetScale databases in your cost reporting. With Vantage, you can create budgets, set up alerts, monitor tagged resources, receive cost recommendations, and more.
Getting started with the Vantage integration
Visit the Vantage + PlanetScale integration page to learn how to get started. You can also find this information in the PlanetScale documentation.]]></content>
        <summary><![CDATA[The Vantage + PlanetScale integration is now available.]]></summary>
      </entry>
    
      <entry>
        <title>MySQL isolation levels and how they work</title>
        <link href="https://planetscale.com/blog/mysql-isolation-levels-and-how-they-work" />
        <id>https://planetscale.com/blog/mysql-isolation-levels-and-how-they-work</id>
        <published>2024-01-08T15:00:00.000Z</published>
        <updated>2024-01-08T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[In the early 1980s, computer scientists Andreas Reuter and Theo Harder coined the term ACID to describe a set of properties related to database transactions designed to keep data stored reliably and with integrity.
Most (if not all) modern database systems are built around ACID compliance. By adhering to these fundamentals, businesses can confidently trust the data within their database, whether it’s for a small project management app, or a large banking system. Isolation levels, as well as the related concepts, are cornerstones that enable MySQL to fulfill ACID guarantees.
In this article, we’ll break down how multiple clients can work with a single database while maintaining data consistency by using isolation levels.
What is a MySQL isolation level?
A MySQL isolation level is one of four modes that can be set on a MySQL session that controls how transactions should behave when executing concurrently.
This concept relates directly to the Isolation requirement of ACID compliant databases, which states that transactions should be executed in a way that does not affect other transactions. Depending on the isolation level set on a session, various locking mechanisms are used to manage which transactions have access to what data at any given time. This also determines the different inconsistencies (known as violations) that can surface when running multiple transactions concurrently.
With that brief explanation on what a MySQL isolation level is and what benefit it provides, let’s further explore the different isolation levels, transactions, violations, locks, and how they all relate to each other.
What is ACID compliance?
Before we dive deeper into how MySQL keeps data consistent, it’s worth briefly explaining the concepts that make up ACID.
Atomicity
Atomicity means that any number of SQL operations can be executed as a single unit of work known as a transaction.
Transactions are used in MySQL to execute one or more statements as a group, satisfying the Atomicity requirement of an ACID compliant database. When statements are executed as part of a transaction, no data within the database is actually changed until the transaction is committed. Alternatively, if something goes wrong within the transaction, you are given the opportunity to roll back any changes, which can prevent issues with the underlying data if committed.
By default, MySQL has autocommit turned on, which means that individual statements issued are automatically committed without you having to manually start a transaction.
Consistency
Consistency ensures that the database can confidently be moved from one state to another, avoiding any adverse effects or corruption of the data. This is done by enforcing things like constraints, cascading effects, triggers, etc.
Isolation
Isolation states that transactions are executed independently and in a controlled and ordered way since many clients and threads are often connected to a single database and executing transactions concurrently. MySQL isolation levels relate directly to this part of ACID compliance.
Durability
Durability simply means that once a transaction completes, the data can survive system failures and outages by storing the data on persistent storage. Memcached, for example, stores all of its data in memory, so it would fail this requirement.
Example: A music store
The remainder of this article will use a common example of an online store that specializes in selling units of an antique audio medium known as compact discs, or “CDs”.
The schema will be relatively simple with four entities:
products will contain the available stock of a given CD, including the name, artist, and quantity available.
unit_status will list the quantity of CDs for any product that are currently available, on hold, or pending shipment.
customer_transactions will contain records of when sales are made.
customers will list all of the customers that have purchased CDs.
Here is a diagram of what the database will look like:

Isolation violations
Violations are various read phenomena that can occur when transactions are executed concurrently, and are specifically what isolation levels seek to prevent. Depending on the selected MySQL isolation level (each of which will be covered later in this article), one or more violations may be permitted in the name of database performance and query consistency requirements.
Let’s take a look at the three common violations you may encounter while building around your database.
Dirty read
A dirty read occurs when a query within one transaction returns inconsistent data because it reads a new version of the data that another concurrent transaction has written but not yet committed to the database.
To demonstrate this, let’s say that two customers are interested in purchasing the same CD and two units have a status of available. One customer purchases both units, which prompts the system to move the status of two units to sold_not_shipped before reducing the quantity available. At the same time, the other customer checks the inventory to see if the CD is available, and the system returns that there are still two units available.
The transaction used to update the status for the CD is as follows:
start transaction;
update unit_status set quantity = quantity + 2 where product_id = 20 and status = 'sold_not_shipped';
update unit_status set quantity = quantity - 2 where product_id = 20 and status = 'available';
commit;

Assuming the following query is executed during the above transaction just before commit; is called, the result would be 0 even though the transaction is not finished yet.
select quantity from unit_status where product_id = 20 and status = 'available';

Here is a timeline of what the above transactions would look like that would result in the customer seeing a potentially unexpected result for quantity available:

At first glance, this may not appear to be an issue since the result reflects the values of the two rows in the unit_status table after both updates occurred. However, if that transaction issued a rollback; instead of commit;, the data returned would be inaccurate since the changes were reverted after the customer was shown that no units were available.
Non-repeatable reads
Non-repeatable reads occur when a transaction with multiple select queries reads the same rows within that transaction, but the data within those rows is different between selects since another transaction has modified the data in that time.
For this example, the store owner is looking to read the number of units available for a given CD across two select queries, however they receive a different result between those two statements since it occurs at the same time that a customer is purchasing the CD.
Here are the queries used to get the quantity available:
start transaction;

-- Get the current available quantity of the CD, will return 15
select quantity from unit_status where product_id = 20 and status = 'available';

-- The same query will now return 13 (see the transaction below)
select quantity from unit_status where product_id = 20 and status = 'available';

commit;

And here is the same transaction used in the previous example to update the inventory of a given CD when it is purchased:
start transaction;

-- Move the units from available to sold_not_shipped
update unit_status set quantity = quantity + 2 where product_id = 20 and status = 'sold_not_shipped';
update unit_status set quantity = quantity - 2 where product_id = 20 and status = 'available';

commit;

Because these two transactions are executed simultaneously, they overlap each other, resulting in the second statement in the first transaction returning a different value.

Phantom reads
Phantom reads occur when the actual rows returned between select statements in a transaction differ because another transaction has inserted rows before the first can complete.
Using the CD store example, let's say that the owner wants to know the count of all products in the database. The following transaction is used to get that information:
start transaction;

-- This will return 100.
select count(*) from products;

-- Add another one for good measure, or for demo purposes, you decide.
-- Either way this would return 101.
select count(*) from products;

commit;

At the same time, an employee receives a shipment of a new CD, which needs to be added to the database. The following transaction is used to add the new product:
start transaction;

insert into products set album = 'The Battle Of Los Angeles', artist = 'Rage Against The Machine', release_year = 1999, cost = 1500;

commit;

Finally, here is what the timeline would look like to cause this phenomenon:

Because of the newly inserted record in the products table, the count returned would be different between the two select statements.
Locks and how they relate to isolation
Transaction isolation is enforced by using various types of locks, or flags that can be set on rows and tables to prevent data from being read or modified.
MySQL will use one of two types of locks depending on the isolation level: shared or exclusive locks.
Shared locks
A shared lock is one that can be created by transactions for ensuring the data they are reading doesn’t change. When a shared lock is created on a row, that row is still readable by other transactions. Other transactions, however, cannot modify the data within that row. Any number of transactions can create a shared lock on a row and all locks will need to be released before the data can be modified.
Exclusive locks
When a transaction creates an exclusive lock on a row, only that transaction can read or write the data. If another transaction attempts to read or write data to that row, it will be prevented from doing so. This is especially useful during transactions where you expect data to be updated to prevent some of the violations outlined above.
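The compatibility rules described above can be sketched in a few lines. This is a toy model and not InnoDB's lock manager: the `RowLock` class and the transaction names are made up for illustration, and a request that would normally wait simply returns False here.

```python
class RowLock:
    """Toy lock state for one row: many shared holders or one exclusive holder."""

    def __init__(self):
        self.shared = set()      # transactions holding a shared lock
        self.exclusive = None    # transaction holding the exclusive lock, if any

    def acquire_shared(self, txn):
        # A shared lock is refused while another transaction holds the exclusive lock.
        if self.exclusive is not None and self.exclusive != txn:
            return False
        self.shared.add(txn)
        return True

    def acquire_exclusive(self, txn):
        # An exclusive lock is refused while any other transaction holds any lock.
        other_readers = self.shared - {txn}
        if other_readers or self.exclusive not in (None, txn):
            return False
        self.exclusive = txn
        return True

    def release(self, txn):
        self.shared.discard(txn)
        if self.exclusive == txn:
            self.exclusive = None


lock = RowLock()
print(lock.acquire_shared("A"))      # True
print(lock.acquire_shared("B"))      # True, shared locks can coexist
print(lock.acquire_exclusive("C"))   # False, A and B still hold shared locks
lock.release("A")
lock.release("B")
print(lock.acquire_exclusive("C"))   # True, all shared locks were released
```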
Gap locking
Gap locking is a special type of lock that is used by MySQL to prevent phantom reads. Gap locking will use criteria in where clauses to lock the space around the read data, preventing rows from being inserted that may alter the query if it's run a second time.
To demonstrate this, let’s use the music store example and assume the owner wants to run a sale on any CDs released in 1999:
start transaction;
select * from products where release_year = 1999 for update;
update products set cost = 800 where release_year = 1999;
commit;

If an employee decides to add a new CD to the products table that was also released in 1999, MySQL would make that transaction wait until the previous one had completed before proceeding:
insert into products (album, artist, release_year, cost)
	values ('The Battle Of Los Angeles', 'Rage Against The Machine', 1999, 1500);


This prevents the second transaction from inserting a new row that would alter the results of the first transaction.
Controlling locks
It’s worth mentioning that select statements can actually be altered to indicate that MySQL should place a specific type of lock on a row. By adding the for share keywords on the end of a select statement, MySQL will create a shared lock on that row. And using the for update keywords on the end of a select statement creates an exclusive lock.
Here is a modified version of the “dirty read” example from above that uses a locking read. This approach would prevent another transaction from reading the data before it was updated:
start transaction;
select * from products where id = 20 for update;
update products set cost = 800 where id = 20;
commit;

The four MySQL isolation levels
Now that we’ve covered much of the prerequisite knowledge, let’s talk about how MySQL adheres to the Isolation requirement with various Transaction Isolation Levels.
Isolation levels instruct the database engine on how to manage multiple transactions being performed concurrently, and what violations are possible.
Isolation levels are set per session and each database has a default mode that will be used by every session if it is not changed.
Let’s explore the four isolation levels and what violations they each prevent and allow.
Read uncommitted
Read uncommitted is the lowest isolation level, effectively allowing all violations. This is because transactions with this isolation level will always read the latest version of a row that's been modified by any transaction, whether it's been committed or not (more on row versioning in a bit). This mode should be used when performance takes priority over data consistency.
Deviating from the example outlined above, one situation where this might fit well is in a social media app when finding the number of likes a popular post has. Many social media apps shorten their numbers after a certain point (i.e., 1k instead of 1,000). If a client needs to know how many likes the post has, returning an approximation instead of the exact value is usually acceptable.
Read committed
Read committed is the step above read uncommitted and prevents dirty reads.
In this mode, each row modified will create a version of that row which is tagged by the transaction ID. Subsequent operations on that row will use the version belonging to that transaction. Each select within that transaction will create and use a fresh snapshot (a version of the read rows at that point in time). This means that if a row is modified and has been committed by another transaction, the next select will create a new snapshot which now represents the latest committed version of that data.
The key difference between read committed and read uncommitted is that read committed will always use the latest committed version of a row, whereas read uncommitted will use the latest version of that row even if it hasn't yet been committed.
This mode will still allow non-repeatable read and phantom read violations.
Repeatable read
Repeatable read is the default MySQL isolation level used for all connections unless configured otherwise.
With InnoDB's MVCC (Multi Version Concurrency Control), a row may have multiple versions at any given time, depending on open transactions. In repeatable read level, queries will use a consistent snapshot of the data, pinning the reads to a single transaction ID throughout the transaction. Compare that with read committed where reads always pick the latest committed version of any row, and where subsequent reads may return different results.
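As a rough mental model of the difference, here is a toy versioned row. It is a sketch under simplifying assumptions and not how InnoDB stores versions: the `ToyRow` class, its commit counter, and the snapshot number are invented for this example, and rollbacks are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: int
    txn_id: int
    commit_seq: Optional[int] = None  # stays None until the writer commits


class ToyRow:
    """One row that keeps every version written to it (toy MVCC)."""

    def __init__(self, initial):
        self.versions = [Version(initial, txn_id=0, commit_seq=0)]
        self.commit_counter = 0

    def write(self, txn_id, value):
        self.versions.append(Version(value, txn_id))

    def commit(self, txn_id):
        self.commit_counter += 1
        for version in self.versions:
            if version.txn_id == txn_id and version.commit_seq is None:
                version.commit_seq = self.commit_counter

    def read(self, level, snapshot_seq=None):
        if level == "read uncommitted":
            return self.versions[-1].value          # newest version, committed or not
        committed = [v for v in self.versions if v.commit_seq is not None]
        if level == "repeatable read":
            # Only versions committed at or before the reader's snapshot are visible.
            committed = [v for v in committed if v.commit_seq <= snapshot_seq]
        return sorted(committed, key=lambda v: v.commit_seq)[-1].value


row = ToyRow(15)
snapshot = row.commit_counter              # the reader's transaction starts here
row.write(txn_id=1, value=13)              # a writer updates but has not committed
print(row.read("read uncommitted"))        # 13, a dirty read
print(row.read("read committed"))          # 15
row.commit(txn_id=1)
print(row.read("read committed"))          # 13, a non-repeatable read
print(row.read("repeatable read", snapshot))  # 15, pinned to the snapshot
```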
With locking reads, the method that MySQL uses to lock the data depends on whether a unique index is used.
If an index can be used based on the where condition, MySQL will only lock the necessary rows that match the query. If an index is NOT used and the table is scanned, MySQL will lock all of the rows it reads regardless of whether they match the where clause, as well as perform gap locking to prevent inserts that may alter the data if the query is run multiple times.
Using the repeatable read isolation level prevents dirty reads, non-repeatable reads, and phantom reads.
Serializable
Setting the isolation level to Serializable also prevents all violations, but it has the most performance impact on your MySQL server. This mode works exactly like Repeatable read, but implicitly creates a shared lock on all select statements whether you use for share or not. Due to the excessive locks used, there is a greater risk of deadlocks occurring.
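The four levels and the violations each one prevents, as described in this article, can be summarized as a small lookup. The `PREVENTED` table and the `weakest_level_preventing` helper are names made up for this sketch. They restate the text above and are not a MySQL API.

```python
# Violations prevented by each level, ordered from weakest to strongest,
# as described in this article for MySQL.
PREVENTED = {
    "read uncommitted": set(),
    "read committed": {"dirty read"},
    "repeatable read": {"dirty read", "non-repeatable read", "phantom read"},
    "serializable": {"dirty read", "non-repeatable read", "phantom read"},
}


def weakest_level_preventing(violations):
    """Return the first (weakest) level that prevents every listed violation."""
    wanted = set(violations)
    for level, prevented in PREVENTED.items():
        if wanted.issubset(prevented):
            return level
    raise ValueError(f"unknown violation in {sorted(wanted)}")


print(weakest_level_preventing(["dirty read"]))      # read committed
print(weakest_level_preventing(["phantom read"]))    # repeatable read
```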
How to set a MySQL isolation level
Setting a specific MySQL isolation level is relatively straightforward, and can be done on a specific session or on the entire server.
The following statements will set the isolation level on the current session. The SESSION keyword is optional if you run the statement prior to starting a transaction, but omitting it will cause an error if a transaction is already in progress:
set transaction isolation level read uncommitted;
set session transaction isolation level read committed; -- 'session' is optional

If you have the CONNECTION_ADMIN user permissions, you can also use the following command to set the default isolation level across the server. This only affects future sessions, and any session is free to change its own isolation level regardless of what the server has it set to:
set global transaction isolation level serializable;

Finally, isolation levels can be configured prior to MySQL starting, by using the --transaction-isolation=READ-UNCOMMITTED option or in the configuration file:
[mysqld]
transaction-isolation = REPEATABLE-READ

Conclusion
As you can see, MySQL uses a combination of several different concepts to ensure that it’s ACID compliant, making it a reliable and trustworthy database. Many of these concepts revolve around transaction isolation and the various levels available, allowing you to fine tune the balance between safety and performance!]]></content>
        <summary><![CDATA[Learn about the various isolation levels used by MySQL to allow concurrency in your database.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing the schemadiff command line tool</title>
        <link href="https://planetscale.com/blog/schemadiff-command-line-tool" />
        <id>https://planetscale.com/blog/schemadiff-command-line-tool</id>
        <published>2023-12-18T09:00:00.000Z</published>
        <updated>2023-12-18T09:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[One of the core benefits of PlanetScale database clusters is the workflow enabling zero downtime schema migrations, which are made possible by database branching.
A branch in PlanetScale is technically its own independent MySQL cluster. Deploy requests are used to apply the schema changes from one branch to another. As part of this workflow, we heavily utilize the schemadiff command provided by Vitess to calculate the schema differences between branches, determine the order of changes (including the potential for migration concurrency), and participate in three-way merge logic to apply changes to an upstream database branch.
Today we are releasing the schemadiff command line tool, a thin wrapper around Vitess's schemadiff library.
Getting started with schemadiff
The schemadiff command line tool makes it easy to validate, normalize, and diff schemas, loaded from the standard input, the file system, or from your MySQL database.
To try out schemadiff yourself, check out the schemadiff repository's releases page for the latest binaries available for Linux and Mac. The README also contains additional information on how to use the tool, when you might want to use it, and how to build it from source if you want to target another platform.
Let's take a look at a few use cases.
Describe a database
The load command can be used with a source to show the schema of your existing database:
schemadiff load --source 'myuser:mypass@tcp(127.0.0.1:3306)/test'
-- Output:
CREATE TABLE `t` (
	`id` int,
	PRIMARY KEY (`id`)
);
CREATE TABLE `t2` (
	`id` int,
	`name` varchar(128) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

It can also accept a string to prettify and normalize it, which can make a schema much easier to read:
echo "create table t (id int(11) unsigned primary key)" | schemadiff load
-- Output:
CREATE TABLE `t` (
	`id` int unsigned,
	PRIMARY KEY (`id`)
);

Attempting to load an incorrectly formatted SQL file will result in an error. This can be extremely useful as part of a CI/CD pipeline to validate change scripts before they are applied to your database:
schemadiff load --source mydb.sql > /dev/null || echo "FAIL"

Showing changes
The diff command can compare two schemas and write the necessary DDL to execute to get them in sync. This is how we generate our change statements when merging branches in PlanetScale.
schemadiff diff --source 'myuser:mypass@tcp(127.0.0.1:3306)/test' --target /path/to/repo/source/code/schema
DROP VIEW `v`;
ALTER TABLE `t` MODIFY COLUMN `id` bigint;
CREATE TABLE `t2` (
	`id` int,
	`name` varchar(128) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

For more use cases, be sure to review the README for this project. The schemadiff command line tool supports MySQL 8 syntax and is released under Apache 2.0 license.
We hope you find it useful!]]></content>
        <summary><![CDATA[We are releasing schemadiff, an open source command line tool to generate diffs between two MySQL databases.]]></summary>
      </entry>
    
      <entry>
        <title>$ pscale ping</title>
        <link href="https://planetscale.com/blog/pscale-ping" />
        <id>https://planetscale.com/blog/pscale-ping</id>
        <published>2023-12-13T15:50:00.000Z</published>
        <updated>2023-12-13T15:50:00.000Z</updated>
        
        <author>
          <name>Matt Robenolt</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[How close am I to PlanetScale? Which region should I choose for my database?
These might be oddly difficult questions to answer — for many reasons. Maybe you want a development branch that's closest to you, the developer, at your house. Or your application is deployed into an infrastructure provider that isn't AWS or GCP. Or, you're just curious.
Now, with pscale ping, you can easily determine which region is optimal based on your location. This CLI command lists the latency for every region based on your location.

Try it out yourself by installing our CLI. For best results, run pscale ping from within the same environment as your application, or wherever you'd expect to be connecting to it.
It's quite a small thing, but we think it will be super useful if you are not sure which region is best for your situation!
You can find more information in our Network latency documentation.]]></content>
        <summary><![CDATA[How close am I to PlanetScale? Use our new CLI command, pscale ping, to find out.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing foreign key constraints support</title>
        <link href="https://planetscale.com/blog/announcing-foreign-key-constraints-support" />
        <id>https://planetscale.com/blog/announcing-foreign-key-constraints-support</id>
        <published>2023-12-05T10:00:00.000Z</published>
        <updated>2023-12-05T10:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today, we're adding support for foreign key constraints on PlanetScale!
We understand that making database changes can be a daunting task. Now, you can use PlanetScale without reworking your existing application or data model to remove foreign key constraints. You can also use PlanetScale out of the box with your favorite ORM or framework that uses foreign key constraints by default, no changes needed.
You no longer have to sacrifice foreign key constraints when using database branching, non-blocking schema changes with Online DDL, database imports, and scaling your database with PlanetScale.
What is a foreign key constraint?
A foreign key constraint is a database construct and implementation that enforces the integrity of foreign key relationships (referential integrity). Specifically, it ensures that a child table can only reference a parent table when the appropriate row exists in the parent table. This is not to be confused with foreign keys, which allow you to cross-reference related data across tables.
Developers should weigh the advantages and disadvantages of using foreign key constraints for their specific application, which are discussed further in the foreign key constraints documentation.
If you want a behind-the-scenes look into foreign key constraints in PlanetScale, read our blog post about the technical challenges of supporting foreign key constraints.
Foreign key constraints beta
Update: Foreign key constraint support is now GA. You can enable foreign key constraint support in your database settings page.
The new foreign key constraints support is available today in beta on a per-database level for all unsharded databases. Sharded database support will come next year. When you opt in to the beta, we upgrade your Vitess cluster, and in some cases, we upgrade the MySQL version.
To opt in to an existing database, go to your database's "Settings" page and enroll on the "Beta features" page. Note: You must be an organization administrator to enroll in beta features.
On the database's "Overview" page, you will see a loading spinner that says it is "Enabling foreign key constraints." Once it no longer shows, you can use foreign key constraints in your PlanetScale database!
If you want to unenroll your database from the beta, make sure to first drop your foreign key constraints. We do not downgrade your database at this time.
Most users with newer databases can use foreign key constraints within minutes. Older databases may take longer.
If you have existing internet-accessible MySQL or MariaDB databases that use foreign key constraints, you can also now import them into PlanetScale using our database import tool.
For more details about the beta, see the foreign key constraints beta documentation.
Try it out today
We have tested the beta with various popular ORMs and frameworks, but if you experience any issues with your ORM or framework of choice, let us know! While we are in this beta phase, it is helpful to hear about any problems you experience using foreign key constraints in PlanetScale. Please contact us if you have anything to share or have any questions.]]></content>
        <summary><![CDATA[You can now use foreign key constraints in PlanetScale databases.]]></summary>
      </entry>
    
      <entry>
        <title>The challenges of supporting foreign key constraints</title>
        <link href="https://planetscale.com/blog/challenges-of-supporting-foreign-key-constraints" />
        <id>https://planetscale.com/blog/challenges-of-supporting-foreign-key-constraints</id>
        <published>2023-12-05T09:00:00.000Z</published>
        <updated>2023-12-05T09:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        <author>
          <name>Manan Gupta</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Today, PlanetScale announced support for foreign key constraints.
This has been in the making for quite a while — around a year in fact. So, what was the problem? Why did it take so long? It's something that any and every database supports, right? Yes and no. To support foreign key constraints is one thing. To do so while still delivering Online DDL, gated deployments, online imports, and eventually cross shard support, is another.
When we launched PlanetScale, we refrained from supporting foreign key constraints, first and foremost because we could not make them work with branching and Online DDL. It was a realization that came while we were building the product, and we took it as a fact of life, something we’d have to work around. Around a year ago, we decided to really dive in to see what it would actually take to support them.
It turns out, we had to overcome challenges in almost every single layer of the product: some due to MySQL limitations, some due to how we want to incorporate foreign key support with PlanetScale’s (through Vitess) Online DDL, and some due to branching and schema analysis logic. As you will see in this post, these are all closely intertwined.
Branching and Deploy requests
The first challenge we had to deal with was handling branching and deploy requests. With PlanetScale branching, the user works in a development environment where they’re free to make schema changes. Once they’re ready to deploy those changes, they submit a Deploy request, which lets them review the schema changes they’ve made before deploying them to production.
The Deploy request page shows the user the semantic diff between the main (base) branch and their own (head) branch. PlanetScale uses three-way merge to determine the diffs.
Supporting the foreign key constraint schema definitions is relatively simple, and is but a matter of understanding the SQL syntax involved. However, the semantic analysis is much more involved. Consider that in MySQL (or in InnoDB, rather, as we will discuss later on), the following rules must apply for any foreign key constraint definition:
A foreign key constraint is a relationship between a parent table and a child table. As a special case, a table may reference itself as its parent.
The referenced (parent) table must exist.
The foreign key reference columns in the parent table must exist.
The referenced columns in the parent must be indexed in-order. There has to be an index covering the referenced columns in the same order they’re referenced by the constraint. Optionally, the index may proceed to cover more columns.
The child’s columns must match the referenced parent columns in count and in data type. An INT column on the child may not reference a BIGINT column on the parent. Interestingly, it’s okay for VARCHAR(32) vs VARCHAR(64) in child and parent.
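The rules above can be sketched as a small in-memory check. This is only an illustration, not how schemadiff or MySQL implements validation: the `Table` model, `base_type`, and `check_foreign_key` are hypothetical names, the primary key is modeled as just another index, the child table and its columns are assumed to exist, and type comparison is simplified to ignoring a length suffix.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory model of a table (not a PlanetScale or Vitess type)."""
    columns: dict            # column name -> declared type, e.g. {"id": "INT"}
    indexes: list = field(default_factory=list)  # each index is a tuple of column names


def base_type(sql_type):
    """Drop a length suffix, so VARCHAR(32) and VARCHAR(64) compare as equal."""
    return sql_type.split("(")[0].strip().upper()


def check_foreign_key(schema, child, child_cols, parent, parent_cols):
    """Return the list of violated rules; an empty list means the definition passes."""
    if parent not in schema:
        return [f"parent table {parent} does not exist"]
    parent_table, child_table = schema[parent], schema[child]
    errors = []
    missing = [c for c in parent_cols if c not in parent_table.columns]
    if missing:
        errors.append(f"parent columns do not exist: {missing}")
    if len(child_cols) != len(parent_cols):
        errors.append("child and parent column counts differ")
    if errors:
        return errors
    # Some index must start with the referenced columns, in the same order.
    wanted = tuple(parent_cols)
    if not any(tuple(index[:len(wanted)]) == wanted for index in parent_table.indexes):
        errors.append("no parent index covers the referenced columns in order")
    for child_col, parent_col in zip(child_cols, parent_cols):
        if base_type(child_table.columns[child_col]) != base_type(parent_table.columns[parent_col]):
            errors.append(f"type mismatch: {child_col} vs {parent_col}")
    return errors


schema = {
    "parent": Table({"id": "BIGINT", "code": "VARCHAR(64)"}, [("id",), ("code", "id")]),
    "child": Table({"id": "INT", "parent_id": "INT", "code": "VARCHAR(32)"}, [("id",)]),
}
print(check_foreign_key(schema, "child", ["parent_id"], "parent", ["id"]))  # INT vs BIGINT
print(check_foreign_key(schema, "child", ["code"], "parent", ["code"]))     # passes
```

The second call passes because the `("code", "id")` index starts with the referenced column and the two VARCHAR lengths are allowed to differ.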
These rules are strongly enforced by the MySQL server (assuming FOREIGN_KEY_CHECKS=1; things get less consistent when FOREIGN_KEY_CHECKS=0), which means that by the time the user submits their Deploy request, it should be safe to assume that the branch adheres to those rules. However, when PlanetScale evaluates a Deploy request, it not only computes the diff between branches, but also the path towards converting one branch (base) into another (head). That path is a valid sequence of steps which maintain the schema in a valid state at all times. Moreover, it evaluates whether, and how, these steps may be deployed all at once.
Foreign key constraints introduce a new complexity to that evaluation. In the simplest form, they may require a specific order of deployment. As a trivial example, suppose we were to create a parent-child pair of tables, such as these (simplified):
create table parent (id int primary key);
create table child (id int primary key, parent_id int, constraint parent_id_fk foreign key (parent_id) references parent (id));

The deployment plan must of course first apply the creation of parent, and then the creation of child. The reverse order is invalid because child must reference an existing parent table.
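One way to picture this ordering requirement is as a topological sort over the "references" relationship. The sketch below uses Python's standard `graphlib` module; the `creation_order` helper and its input shape are assumptions made for illustration, not PlanetScale's actual planner.

```python
from graphlib import TopologicalSorter


def creation_order(references):
    """Order tables so that every parent is created before its children.

    `references` maps each table name to the set of parent tables it references.
    A self-reference is dropped, since a table can be created referencing itself.
    """
    graph = {
        table: {parent for parent in parents if parent != table}
        for table, parents in references.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(creation_order({"child": {"parent"}, "parent": set()}))
```

A true cycle between two distinct tables would make `static_order()` raise `graphlib.CycleError`, which matches the intuition that no valid creation order exists without splitting the change into more steps.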
Some changes are eligible to run concurrently. Suppose we have the following:
create table t1 (id int primary key, ref int, key ref_idx (ref));
create table t2 (id int primary key, ref int, key ref_idx (ref));
create table t3 (id int primary key, ref int, key ref_idx (ref));

And, suppose we add foreign key constraints onto t2 and t3, such that the diff evaluates to:
ALTER TABLE `t2` ADD CONSTRAINT `t2_ref_fk` FOREIGN KEY (`ref`) REFERENCES `t1` (`id`) ON DELETE NO ACTION;
ALTER TABLE `t3` ADD CONSTRAINT `t3_ref_fk` FOREIGN KEY (`ref`) REFERENCES `t2` (`id`) ON DELETE NO ACTION;

The two diffs may run concurrently, even though they both affect table t2 (one directly, and one indirectly). More on why this is at all possible when we discuss Online DDL.
Even more complex is a scenario where two migrations cannot execute concurrently. Assume the base schema looks like this:
create table t1 (id int primary key, info int not null);
create table t2 (id int primary key, ts timestamp);

And that our branch schema is:
create table t1 (id int primary key, info int not null, p int, key p_idx (p));
create table t2 (id int primary key, ts timestamp, t1_p int, foreign key (t1_p) references t1 (p) on delete no action);

The diffs are:
ALTER TABLE `t1` ADD COLUMN `p` int, ADD INDEX `p_idx` (`p`);
ALTER TABLE `t2` ADD COLUMN `t1_p` int, ADD INDEX `t1_p` (`t1_p`), ADD CONSTRAINT `t2_ibfk_1` FOREIGN KEY (`t1_p`) REFERENCES `t1` (`p`) ON DELETE NO ACTION;

However, it is impossible to even begin the migration on t2 before the migration on t1 is fully complete. That’s because MySQL strictly requires an index to exist on the parent table’s referenced columns. Assuming the parent table is already populated, we’re looking at a potentially significant time running the first migration before we can begin the second.
Last, and of course this is not strictly limited to foreign key constraints, you can go too far with your branch changes. PlanetScale attempts to reduce the diff to a single change per table/view. If a branch differs so much from base that it takes multiple steps to operate on the same table to get it into base, then the Deploy request is not deployable.
Courtesy of schemadiff, this is all evaluated in-memory at the time a Deploy request is created.
Handling reverts
Within branching, we also had to deal with handling reverts. PlanetScale’s deployments are revertible. If you deploy a schema change, and then realize you made a mistake, we offer you the option to revert that migration while still keeping all the data that may have changed during and after the deployment.
There exist some scenarios where reverts are not possible. For example, if you modify a column from TINYINT to INT, complete the deployment, and populate some row with the value of 256, then that value is outside TINYINT range, and the change cannot propagate back to the original table.
Foreign key constraints create a different limitation. Let’s illustrate with a simple example. Say you have parent and child with foreign key relationship. You choose to drop the child table, meaning there’s now no foreign key constraint. You then proceed to DELETE FROM parent. You then wish to revert dropping child. This is possible, but it creates an inconvenient situation: the restored child table will have orphaned rows, for we have deleted all rows from parent.
The same logic applies to any destruction or editing of a constraint. You can’t trust that the reverted table complies with the constraint, for you may have modified the data in an incompatible way while the constraint was removed.
PlanetScale will allow you to make such reverts, but we will warn you that the change may not be revertible. The schema will be fine, but orphaned rows are possible.
Query serving
The next challenge we dealt with was query serving. At this time, foreign key constraint support is limited to unsharded/single shard databases. We expect to support shard-scoped foreign key constraints in a multi-shard environment, or even cross-shard foreign keys, but we limit the current discussion to a single shard. This means that a query that uses a table that’s in a foreign key relationship will only operate on a single backend database server.
Vitess, the underlying engine behind PlanetScale, normally optimizes execution of such queries by delegating them to the backend MySQL server. However, a MySQL limitation that affects other critical components of Vitess calls for a different course of action. To understand why, let’s digress and discuss Online DDL. We will then return to complete the query serving story.
Online DDL
Perhaps the largest challenge so far has been with Online DDL. PlanetScale offers non-blocking schema changes, which are very compelling, especially when altering large tables in production. The way they work is depicted in detail in our Online schema change tools documentation, but, in short, altering a table works as follows:
We create a new, empty table, termed the "shadow" table, with the same definition as the original table.
We apply the ALTER statement to the shadow table.
We copy over all existing data from the original table to the shadow table, taking into consideration the schema difference between the two.
We follow the changelog on the original table, to capture any ongoing changes, and apply them on the shadow table.
Finally, when we are satisfied the shadow table is in sync, or almost in sync, with the original table, we take a lock to prevent writes to the original table, apply whatever changelog remains, and finalize the operation by swapping the two tables. The shadow table takes the original table’s place, and the original table becomes the shadow.
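The steps above can be simulated on plain dictionaries. This is a toy model under loud assumptions: tables are dicts keyed by primary key, the changelog is a list of `(op, primary_key, row)` tuples, and `online_alter` and `apply_event` are hypothetical helpers. Locking, batching, and the real cut-over are not modeled.

```python
def apply_event(table, event, transform=lambda row: row):
    """Apply one changelog event (op, primary_key, row) to a dict-based table."""
    op, pk, row = event
    if op == "delete":
        table.pop(pk, None)
    else:  # "insert" and "update" both write the full row image
        table[pk] = transform(row)


def online_alter(original, transform, writes_during_copy):
    """Return (new live table, old table) after a simulated online schema change."""
    # Create the shadow and copy existing rows, converting each to the new schema.
    shadow = {pk: transform(row) for pk, row in original.items()}
    # Application traffic keeps hitting the original; every write is also logged.
    changelog = []
    for event in writes_during_copy:
        apply_event(original, event)
        changelog.append(event)
    # Replay the changelog on the shadow, then swap the two tables.
    for event in changelog:
        apply_event(shadow, event, transform)
    return shadow, original


live = {1: {"id": 1, "n": "a"}, 2: {"id": 2, "n": "b"}}
add_column = lambda row: {**row, "extra": None}  # stands in for the ALTER
writes = [("update", 1, {"id": 1, "n": "z"}), ("delete", 2, None), ("insert", 3, {"id": 3, "n": "c"})]
new_table, old_table = online_alter(live, add_column, writes)
print(new_table)
```

After the swap, the new table holds the same rows as the old one, each carrying the added column.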
This technique, used by several tools for online schema changes, cannot work with the existing MySQL foreign key constraints implementation. Online DDL has issues with:
Tailing the binary logs for child table with a cascading (SET NULL or CASCADE) action.
Finalizing a foreign key parent table, as children’s foreign key constraints migrate with the old table as it is being swapped.
Backfilling the shadow of a foreign key child table, where as a copy of the child table’s schema, the shadow table itself has a foreign key referencing the same parent.
Each of these limitations calls for a bespoke solution. Let’s break it down.
Tailing the changelog
To backfill the shadow table, Online DDL uses VReplication, one of the most powerful components in Vitess. VReplication is the component behind online imports, materialization, live resharding, Online DDL, and more. It can live stream data from a source (or from multiple sources) to a target (or to multiple targets), while possibly manipulating or aggregating the data in transit.
VReplication works by both reading the existing data on the source table(s), as well as by following the changelog: the live, ongoing manipulation of data. The changelog is available in MySQL as the binary logs. VReplication subscribes as a replica and pulls binary log changes from the source server. This is where a major MySQL limitation hits.
MySQL foreign key constraint support
To the casual MySQL user, MySQL is known to support foreign key constraints. However, this is nuanced. MySQL has a pluggable storage engine architecture. Historically, the company MySQL AB focused on non-transactional storage engines. The MySQL engines did not support foreign key constraints. It was a 3rd party pluggable engine named InnoDB that fast became the engine of choice to most users, offering transactional (MVCC) support, as well as foreign key constraint support. There was some period in time when MySQL began work towards implementing foreign key constraints at the server level, but given how MySQL adopted InnoDB as its primary storage engine (now both under Oracle’s ownership), that effort was dropped.
The most important implication, for our discussion, of MySQL not supporting foreign keys at the server level, is this: any cascading of operations (say an ON DELETE SET NULL or ON UPDATE CASCADE) is only done by InnoDB. If you DELETE or UPDATE a row on a parent table, and that, in turn, affects rows in a child table, those changes to the child are done internally in InnoDB, and are never logged to the binary log.
Those changes are hidden from anyone consuming the binary log. MySQL trusts the replica server’s InnoDB engine to correctly replay those cascaded writes for it to remain consistent with the primary server.
Any Change Data Capture (CDC) tool, that tails the binary log or masquerades as a replica, will be missing data when reading a child table’s events when the child table has SET NULL or CASCADE foreign key actions. You cannot reliably replay events on those tables, and you will end up with corrupt data, or trying to apply an impossible statement. By way of experiment (not on your production environment), try removing the foreign key constraints on a replica server, then see how long it takes for replication to break.
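A toy model makes the gap concrete. In the sketch below, the parent table is a set of ids, the child table maps child id to parent id, and the "binlog" is a list of tuples; all of these names and shapes are assumptions for illustration only. The primary cascades internally but logs only the parent delete, so a consumer without foreign keys ends up with orphaned child rows.

```python
def delete_parent_with_engine_cascade(parent, child, parent_id, binlog):
    """Primary: the storage engine cascades internally; only the parent delete is logged."""
    parent.discard(parent_id)
    binlog.append(("delete", "parent", parent_id))
    for child_id in [cid for cid, pid in child.items() if pid == parent_id]:
        del child[child_id]  # cascaded delete: applied, but never logged


def replay_without_cascade(parent, child, binlog):
    """Consumer with no foreign keys, such as a CDC tool or a replica with constraints removed."""
    for op, table, key in binlog:
        if op == "delete" and table == "parent":
            parent.discard(key)
        elif op == "delete" and table == "child":
            child.pop(key, None)


primary_parent, primary_child = {7, 8}, {1: 7, 2: 7, 3: 8}
replica_parent, replica_child = set(primary_parent), dict(primary_child)
binlog = []

delete_parent_with_engine_cascade(primary_parent, primary_child, 7, binlog)
replay_without_cascade(replica_parent, replica_child, binlog)

orphans = {cid for cid, pid in replica_child.items() if pid not in replica_parent}
print(primary_child, replica_child, orphans)
```

The primary keeps only child row 3, while the consumer still holds child rows 1 and 2 pointing at a parent that no longer exists.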
For Online DDL, this means we cannot apply a change to a table that has a CONSTRAINT ... FOREIGN KEY ON [DELETE|UPDATE] [SET NULL|CASCADE].
One way of solving this issue would be to make InnoDB log cascaded statements. PlanetScale maintains a fork of MySQL, and we looked into making these changes in MySQL. However, we ended up choosing to implement foreign key constraint logic in Vitess itself. It’s a trade off, with these considerations:
The InnoDB codebase is very much detached from the MySQL codebase. There are parallel constructs, parallel entities in MySQL and in InnoDB, which are completely different and do not translate well to one another. Making the necessary changes appears to be a substantial undertaking with high risks.
Adding the necessary logic to Vitess was also a substantial amount of work. However, our Vitess maintainers are authoritative on the topic and had clarity of the scope of work. The risk was low.
In the future, we would like to be able to support foreign key constraints in a multi-shard environment. No matter how well we patch MySQL, and even if it did provide the necessary cascading changelog, foreign keys would always remain an in-server constraint. There is some technology for running cross-server queries via connectors, but it is subpar (or incapable) in its ability to collaborate inside transactions, or in its ability to apply the correct locking. By implementing foreign key logic in Vitess early on, we pave the way towards multi-shard environments in the future.
It should come as no surprise that there is a downside to implementing foreign key constraint logic at the Vitess level. It requires more locking and more communication with the MySQL server. Vitess optimizes where it can and delegates some queries directly to the underlying server where possible, but there is a non-zero performance impact to using foreign key constraints with Vitess/PlanetScale.
How Vitess implements foreign key logic is described later on, when we revisit Query serving. In the meantime, let’s proceed to discuss the next Online DDL limitation.
Altering a parent table
As described in more detail in this blog post, if you have a foreign key pair of a parent and a child, and you RENAME TABLE parent TO old_parent, the child’s foreign key constraint follows the parent table onto its new name, old_parent. In the internal InnoDB implementation, the parent table retains its memory address, and, in a way, "following the parent to its new name" really means nothing changes in the children’s foreign key references, as they keep using the same pointers. It’s mostly the external facing schema definition that changes to reflect a FOREIGN KEY (...) REFERENCES old_parent (...).
Running Online DDL on a parent table, we want to swap the parent table and its shadow. We want the children to be oblivious to the swap. We want them to point to the parent table name. Of course, we need to take it upon ourselves to ensure the replacement table, the shadow table, is compatible with the existing foreign keys. Imagine swapping in a shadow table that doesn’t have the expected referenced column(s)!
In this patch to MySQL, we introduce a new server variable, rename_table_preserve_foreign_key. When set to 1, a RENAME TABLE preserves the foreign key definition on the children by pinning it to the name of the table.
Altering a child table
Suppose we alter a foreign key child table. Say we just modify some column from INT to BIGINT. The foreign keys that exist on the table should be reflected in the shadow table. For the duration of the Online DDL operation, however, the shadow table is incomplete. It is not in sync with the original table, and may be missing rows that exist in the original table, or may contain rows that do not exist anymore in the original table. But if this shadow table has a foreign key constraint pointing to the same parent as the original table’s parent, then that parent is affected by the very existence of the shadow table. It’s possible for a DELETE on the parent to be rejected due to a matching row existing in the shadow table, even if that row does not exist in the actual child table.
When backfilling the shadow table, we may choose to SET FOREIGN_KEY_CHECKS=0, but the application and users of the database will naturally expect to work with FOREIGN_KEY_CHECKS=1.
Moreover, even if we somehow manage to pull through and swap the tables, what happens now with the old child table? It becomes stale, as production traffic keeps changing the new table. But it still keeps the foreign key constraints to the parent, again preventing legitimate queries from executing on the parent. If only it were possible to create the shadow table without foreign key constraints, and just add them at the time of the final swap. Alas, in MySQL, foreign key constraint definitions are created in the scope of a child table. To add a foreign key constraint, or to remove a foreign key constraint, is a full table rebuild operation, locking and blocking; the very thing Online DDL was designed to overcome.
In our patch to the MySQL server, we introduced the notion of internal operations tables, something already well established within Vitess. Essentially, we tell MySQL: ignore any foreign key checks for these special tables. Do not try and validate row data. Do not attempt to cascade anything. It’s as if we SET FOREIGN_KEY_CHECKS=0 for specific tables, even while the transaction otherwise applies foreign key logic to other tables.
This way, MySQL/InnoDB completely ignore our shadow table. It can be out of sync, the parent table won’t mind. And as we swap away the original table, MySQL ignores it under its new name. It will not affect production traffic.
Query serving, revisited
Earlier, we discussed the reasoning behind Vitess owning foreign key logic. What does it mean for Vitess to own that logic?
VTGate is the Vitess component that handles all query serving. It is a proxy between an app and the underlying database, which intercepts queries and creates concrete execution plans, sent to the backend databases. For example, in a multi-sharded environment, VTGate analyzes which shard or shards should be queried for a given SELECT and possibly combines results from multiple shards before returning them to the app.
VTGate can make various optimizations based on the fact that it knows the structure of the underlying schema, the sharding scheme, and the production traffic patterns. It can cache, lock, buffer or altogether modify queries. It is now also aware of foreign key constraints and the specific rule actions. We will limit the discussion to unsharded or single-shard (the two look the same to the app) environments.
Starting with the simplest scenario, an ON UPDATE RESTRICT ON DELETE RESTRICT foreign key can be simply handled by the underlying MySQL server, with no additional planning required from VTGate. A RESTRICT (aka NO ACTION) rule means if you’re deleting or updating a parent row, and such an action would invalidate a child’s row, then the action is rejected. There’s no cascading effect: either your query is rejected, or is accepted and only modifies the parent table. Since there’s no cascading effect, we do not need to account for missing entries in the binary logs.
Let’s now illustrate the case for an ON DELETE CASCADE. A user issues a DELETE FROM parent WHERE id=7. There could be entries in a child table with matching parent_id=7. If we ran the query directly in MySQL, it would only apply the DELETE on parent, letting InnoDB take care of deleting whatever matching rows are found on child. But those deletes will not be visible in the binary logs. Vitess, therefore, has to take control.
It must first explicitly DELETE FROM child WHERE parent_id=7 — thereby ensuring that any affected rows are recorded in the binary log — and only then issue a DELETE FROM parent WHERE id=7. When faced with this latter DELETE on parent, InnoDB still checks for matching rows in child, but finds none, because Vitess had already purged them.
At least, that’s the naive outline. The implementation is more complicated:
The DELETEs on parent and child must be part of the same transaction. The changes made by the statement should be atomic, i.e. all or none are applied.
While we DELETE FROM child, we must acquire a write lock on parent, specifically for row id=7.
To do that, we must first issue a SELECT col FROM parent WHERE id=7 FOR UPDATE.
Using the foreign key column values from the parent select, execute the DELETE FROM child WHERE col in (parent_column_values).
The child table could itself be a foreign key parent for another table, and let’s say it also has an ON DELETE CASCADE child. The same logic is applied recursively: lock the relevant rows on child, DELETE from the grandchild, then delete from child, and then back to parent.
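The recursive outline above can be sketched as a function that emits the ordered statements for an ON DELETE CASCADE delete. This is not VTGate's planner: `cascade_delete_plan`, the shape of the `fks` mapping, and the `:table_column_values` placeholders (standing for values read by the preceding locking SELECT) are all assumptions for illustration, and self-referencing tables are deliberately not handled.

```python
def cascade_delete_plan(table, where, fks):
    """Return the statements for a cascading delete, children first.

    `fks` maps a parent table to a list of (child_table, child_col, parent_col)
    tuples, one per ON DELETE CASCADE reference. Self-references would recurse
    forever and are out of scope for this sketch.
    """
    plan = []
    for child, child_col, parent_col in fks.get(table, []):
        # Lock the affected parent rows and read the referenced column values.
        plan.append(f"SELECT {parent_col} FROM {table} WHERE {where} FOR UPDATE")
        # The child may itself be a parent, so plan its delete recursively.
        child_where = f"{child_col} IN (:{table}_{parent_col}_values)"
        plan += cascade_delete_plan(child, child_where, fks)
    plan.append(f"DELETE FROM {table} WHERE {where}")
    return plan


def in_transaction(plan):
    """All statements must commit or roll back together."""
    return ["BEGIN", *plan, "COMMIT"]


fks = {
    "parent": [("child", "parent_id", "id")],
    "child": [("grandchild", "child_id", "id")],
}
for statement in in_transaction(cascade_delete_plan("parent", "id = 7", fks)):
    print(statement)
```

Because every child delete is issued as an explicit statement, each one would be recorded in the binary log, and the final delete on the parent leaves nothing for InnoDB to cascade.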

As you can see, this implementation introduces more back-and-forth communication between VTGate and the backend MySQL server, as well as having additional locking for correctness.
To show how weird things can get, let’s also illustrate an ON UPDATE CASCADE scenario. Let’s assume we issue an UPDATE parent SET id=8 WHERE id=7. This is a generalized example; as a rule of thumb, one should avoid modifying PRIMARY KEY values, but foreign keys do not necessarily need to refer to PRIMARY KEY columns, or even to UNIQUE values.
Similarly to the DELETE scenario, we want to ensure the cascaded updates appear in the binary log. InnoDB won’t do it for us, so we have to do it ourselves. We want to first issue an UPDATE child SET parent_id=8 WHERE parent_id=7, and only then update the parent table. However, we cannot run that update on child, because there is yet no parent row where id=8!
In this scenario, Vitess must turn off FOREIGN_KEY_CHECKS for the update. This leads Vitess to take responsibility for the integrity of the data, as disabling FOREIGN_KEY_CHECKS leads to MySQL/InnoDB skipping all the foreign key validations for child and parent. Any ON UPDATE RESTRICT for the grandchild or existence of foreign key value in a different parent table needs to be handled by Vitess.
More complex scenarios involve self-referencing tables or statements with non-literal values. They all take a great amount of attention. Later on, we will explain how we test all the above and what gives us confidence that we aren’t missing anything.
Database imports
PlanetScale lets you import your data from an external MySQL environment, and without interruption to that environment. Like Online DDL, Imports utilize VReplication. The external MySQL servers are outside PlanetScale's control, and, of course, will not include any of the changes discussed above. In light of that, how can PlanetScale use VReplication to import a database that has foreign key constraints, specifically with SET NULL or CASCADE actions?
Normally, VReplication moves data from a source (or multiple sources) to a target (or multiple targets) by alternating these two operations:
Transactionally reading existing rows from the source table(s) and copying them over.
Tailing the changelog (binary logs) and applying them to capture ongoing changes.
VReplication takes a surgical approach where it keeps track of per-transaction GTID (global transaction IDs, also known as positioning). When it reads existing rows data, it marks the GTID value that applies to the read transaction. That read will take a while on large tables, and during that time, new events will modify the tables. When VReplication switches to applying the changelog, it only evaluates events occurring as of the last read transaction. A third, "fast-forward" smaller phase then glues the next read transaction with the remaining changelog events up to that transaction. VReplication is also able to entirely skip changes to rows that it has not copied yet, by way of reducing unnecessary work.
With cascading foreign key rules, information is lost. VReplication may attempt to import a child table that has, for example, FOREIGN KEY ... ON DELETE CASCADE. As parent rows get deleted, so do matching rows on the child. However, those deletes on the child never make it to the binary logs. VReplication cannot trust that the binary log is complete.
On closer examination, if the DELETE on the parent makes it to the binary log, and is captured by VReplication and replayed on the target server, won’t the InnoDB engine also replay the cascade on the target server? Yes, assuming the relevant row is already in position on the target, but that’s not guaranteed. At any given time, only parts of any certain table will have been copied. You cannot apply the DELETE to row 1000 on the target, if you’ve only copied rows 1..500.
The solution for database Imports is to import the existing table data in a single, large bulk, followed by tailing the changelog. The two do not alternate. For people familiar with MySQL operations, this is not dissimilar to a point-in-time recovery method, where you first restore a full snapshot of the database, followed by applying a long sequence of binary logs. In this approach, all replayed events necessarily operate on existing data. And while there are still no events for some of the child table changes, we can trust the InnoDB engine on the target side to replay them appropriately.
It’s worth noting that Imports is not expected to work in conjunction with other VReplication operations, i.e. we won’t be running, in PlanetScale, an Online DDL on a table that is being imported. This distinction is important because the import process relies on InnoDB to apply cascading changes, which in turn means some changes are missing in the target’s changelog. In the future, and because some of Vitess’ workflows do integrate with each other, we can expect VReplication to apply changes to foreign key tables through the query serving mechanism discussed above.
Testing foreign key constraint support
These changes call for rigorous testing. We use multiple testing techniques, from unit testing to end-to-end (e2e) testing, from planned to unplanned scenarios. Here, we illustrate two specific test suites:
Application-oriented e2e stress tests with Online DDL.
Fuzzer stress tests, comparing Vitess to MySQL behavior.
Application stress tests
In this test suite, we create a hierarchical foreign key structure of tables, and emulate an app that writes to those tables. The app keeps track of the changes it makes and the responses it gets from the (Vitess) database. At the end of the test, we compare the app’s expectations with the actual table data.
That’s just the high level description. The suite also:
Sets up a variety of foreign key situations, with one parent, two children, and one grandchild table.
Ensures high concurrency of writes.
Ensures high contention writes take place on all tables. Changes made by different connections are highly likely to collide and conflict.
Tests all foreign key rules: ON DELETE NO ACTION / SET NULL / CASCADE and ON UPDATE NO ACTION / SET NULL / CASCADE. Cascading (SET NULL and CASCADE) rules add to the write contention even more.
Runs the stress tests first without, and then with, Online DDL, on each of the hierarchy tables.
The underlying database cluster uses MySQL replication, with foreign key constraints completely removed on the replica.
Let’s take a look at some deeper impact of this test design.
The four table structure showcases a parent-only table, two child-only tables, one table which is both parent and child, and a multi-child scenario. This hierarchy does not represent all possible variants of foreign key relationships (for example, it does not depict a multi-parent setup, or a self-referencing table), but does have enough variance to cover the MySQL fork changes, as well as the fundamental VTGate logic and locking scenarios. We use a classic foreign key relationship: child tables reference the PRIMARY KEY of the parent. This is to assist with expectation management. For other scenarios, see fuzzer tests discussion, below.
The app emulation uses classic INSERT, UPDATE, and DELETE statements, randomly generated. The app does not know in advance what end result it generates. However, it intentionally limits the range of affected rows to a small enough scope, that, with concurrency and with many cycles, creates high contention on all table rows. Some of the app’s auto-generated queries will have no effect and will go to waste. For example, the app may randomly attempt a DELETE FROM stress_parent WHERE id=7 AND updates=1. Possibly, id=7 row does not exist, and the statement affects zero rows. In high volumes, however, many queries will have actual effect. The app records the success and effect of any of its statements, and is able to finally conclude how many rows it expects to find in the table, and what actual data should be there.
We run a test where ON DELETE has NO ACTION rule, and a test where ON DELETE has CASCADE rule, etc. We run combinations of rules. In a CASCADE scenario, it is difficult, or impossible, given the random nature of the query design, for the app to know what data to expect in children tables following a DELETE on a parent. How can we verify that VTGate has made the correct choices, and has cascaded the DELETE correctly? This is where MySQL replication comes in handy.
As we illustrated before, VTGate implements an ON DELETE CASCADE by first applying the DELETE on children (and, recursively, first on grandchildren) before applying on the parent. As we mentioned, by the time InnoDB sees the DELETE on the parent, there’s nothing for it to cascade. We expect the binary logs to fully represent our cascading logic. As mentioned above, we use a MySQL replica, and on that replica we actually strip away all foreign keys on all tables via ALTER TABLE ... DROP FOREIGN KEY. MySQL allows that. However, the InnoDB engine on the replica now has no knowledge of foreign keys, and will not attempt to replay/reproduce any InnoDB cascading logic made on the primary server. That might break replication! At the very least, it will cause data drift between parent and child, and due to the high contention nature of the test this will soon lead to an unresolved inconsistency.
If VTGate does everything right, though, the data will remain consistent. If VTGate leaves nothing for the InnoDB engine to cascade on the primary server, then no binlog events will be missing on the replica. The replica will be able to replay VTGate’s explicit cascaded operations and to remain consistent with the primary server. The test suite does exactly that, and validates not only that replication is unbroken at the end of each test, but also that both primary and replica report the same data metrics.
Foreign key constraints in MySQL introduce more locking, by nature, and, as we’ve seen, Vitess adds even more locking scenarios. These workloads ultimately run into locking scenarios. The test ensures we are still able to make progress and that we don’t end up in a complete lock down. It goes without saying that any database can be overwhelmed by too much traffic or by too many connections, and it’s always possible to shoot yourself in the foot by introducing extremely contentious scenarios. The test suite keeps load under reasonably contentious control.
Some of the tests suffice with the above workload, and some add an Online DDL operation on top. While the workload runs, and under high contention, we pick any of the tables and run Online DDL, wait for the cut-over to complete, and, only then, give the green light to complete the test. This tests the changes in the MySQL server under load. We verify that no rows are lost when the original and shadow tables are swapped. We verify that no child table ends up with orphaned rows. We verify none of the shadow tables, or the aftermath artifact tables, has any logical effect on the app’s traffic.
Fuzzer tests
We initially started by adding tests for individual cases and queries as we added support for them. But we soon realized that, given the number of possible combinations of CASCADEs, SET NULLs, RESTRICTs, and their interactions with each other, we would have to add a lot of tests to ensure everything worked. So, instead, we tried a different approach. We started with a set of 20 tables having foreign key relations amongst each other, such that we had good coverage of all the different possibilities. We ensured that we had cases like a cascade rule on a child that has another cascade rule, and a restrict on a child that has a cascade rule, amongst others. Then, we wrote a fuzzer to generate different DML queries to hammer the database and verify we do the right thing in Vitess.
We introduced two distinct types of tests. The first type involves single concurrency tests, where we initiate a Vitess cluster alongside a separate MySQL instance. Both instances are initialized with identical schemas, and we execute the same queries on both through a single thread. This approach guarantees deterministic output, serving as a validation that Vitess and MySQL exhibit consistent behavior for all DML queries supported by Vitess.
The second type of tests encompasses multi-concurrency scenarios. In this case, we exclusively establish a Vitess cluster and initiate multiple threads, each executing DML queries concurrently. This particular test is designed to ensure that Vitess implements adequate locking mechanisms, preventing any correctness issues when numerous concurrent DML queries are executed. The primary goal is to ascertain that the database remains consistent throughout this concurrent operation.
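The two test types can be sketched roughly as follows. This is a minimal illustration in Ruby, not the Vitess fuzzer itself: the table names, value ranges, and the `random_dml`, `fuzz`, and `first_divergence` helpers are all hypothetical, and the two backends are modeled as plain callables.

```ruby
# Hypothetical sketch of a seeded DML fuzzer; not the actual Vitess test code.

# Build one random INSERT, UPDATE, or DELETE statement.
def random_dml(rng, tables = %w[parent child grandchild])
  table = tables.sample(random: rng)
  id = rng.rand(1..10)
  col = rng.rand(1..10)
  case rng.rand(3)
  when 0 then "insert into #{table} (id, col) values (#{id}, #{col})"
  when 1 then "update #{table} set col = #{col} where id = #{id}"
  else "delete from #{table} where id = #{id}"
  end
end

# A fixed seed makes a failing run reproducible.
def fuzz(seed:, count:)
  rng = Random.new(seed)
  Array.new(count) { random_dml(rng) }
end

# Single-threaded comparison: run each query against both backends
# (modeled as callables) and return the first query whose results differ,
# or nil if they always agree.
def first_divergence(queries, run_on_vitess, run_on_mysql)
  queries.find { |q| run_on_vitess.call(q) != run_on_mysql.call(q) }
end
```

Seeding the generator is a design choice for this sketch: when a comparison fails, the same seed replays the exact query sequence that triggered it.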
As we added support for different query types, we kept expanding the fuzzer’s range of generated queries to ensure the correct implementation of each addition.
Adding fuzzer tests to our suite was incredibly useful. They helped uncover issues that would have been hard to find using manual queries alone. Let’s take a look at some problems identified by the fuzzer and how we fixed them.
Updating a child table with CASCADE foreign key and grandchild RESTRICT foreign key
Our first test failure came when updating a table that has a child with CASCADE foreign key and a grandchild with RESTRICT foreign key, such that the update doesn’t cause an actual row change on the child.
Clearly, from the length of that header, we are talking about a very specific scenario!
Consider the scenario where you have the following schema:
create table parent(id bigint, col varchar(10), primary key (id), index(col)) Engine = InnoDB;
create table child(id bigint, col varchar(10), primary key (id), index(col), foreign key (col) references parent(col) on delete cascade on update cascade) Engine = InnoDB;
create table grandchild(id bigint, col varchar(10), primary key (id), index(col), foreign key (col) references child(col) on delete restrict on update restrict) Engine = InnoDB;

We initially have the following data in the three tables:
insert into parent (id, col) values (1, 3), (2, 2);
insert into child (id, col) values (1, 3), (2, 2);
insert into grandchild (id, col) values (2, 2);

Now, if you run a query like update parent set col = 2, it succeeds on MySQL.
But Vitess was failing this query with a Cannot delete or update a parent row: a foreign key constraint fails error.
Upon diving further in, we found out the issue. When Vitess receives a query that requires cascades, we issue a SELECT query on the parent to lock the rows that are being updated, and then update the child first. As stated before, this update would fail on MySQL if we don’t run it with foreign key checks off. But, because we are running the query with foreign key checks turned off, it falls on Vitess to validate the RESTRICT foreign keys as well.
The update we construct for the child table looks like update child set col = 2 where col in (2, 3). To validate that none of the rows in the grandchild would prevent the updates from going through, Vitess was running a JOIN query to ensure that none of the rows being updated had a foreign key constraint with matching rows.
The SELECT query we used for validation looked like this:
select 1 from grandchild
  join child on grandchild.col = child.col
  where child.col IN (2, 3)
  limit 1

From the outset, this looks correct. We are trying to find if there are any rows in the grandchild table that match the col of the child table for the rows being updated.
But here is the kicker. The row (2,2) is not actually resulting in any update! So, MySQL still allows the update to go through, because it doesn’t cause the data in grandchild to become orphaned!
The fix wasn’t too hard once we understood the problem. All we had to do was also exclude the rows that weren’t actually changing in the SELECT Vitess was running for verification.
So, we updated the query to look like this, and then everything worked as intended:
select 1
  from grandchild
  join child on grandchild.col = child.col
  where child.col IN (2, 3)
  and child.col NOT IN (2)
  limit 1

This case illustrates just how powerful the fuzzer is and how good it is at finding cases that are rare enough that we wouldn’t have written manual tests for.
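To make the difference between the two validation queries concrete, here is a small in-memory model of the check using the example data above. It is only a sketch: the helper names are hypothetical, and the real validation runs as SQL inside Vitess rather than over Ruby arrays.

```ruby
# Hypothetical in-memory model of the RESTRICT check. Values are strings
# because col is a varchar column in the example schema.

# Original check: any targeted child value that a grandchild row references
# blocks the update.
def naive_restrict_violation?(targeted_child_values, grandchild_values)
  targeted_child_values.any? { |v| grandchild_values.include?(v) }
end

# Fixed check: values that already equal the new value do not change,
# so they cannot orphan a grandchild row and are excluded first.
def restrict_violation?(targeted_child_values, new_value, grandchild_values)
  changing = targeted_child_values.reject { |v| v == new_value }
  changing.any? { |v| grandchild_values.include?(v) }
end

child_values = %w[2 3]     # child.col IN (2, 3)
grandchild_values = %w[2]  # the single grandchild row

puts naive_restrict_violation?(child_values, grandchild_values)  # true
puts restrict_violation?(child_values, "2", grandchild_values)   # false
```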
Arithmetic operations on a VARCHAR column causing 0 vs -0 problems
Consider the scenario where you have the following schema:
create table parent(id bigint, col varchar(10), primary key (id), index(col)) Engine = InnoDB;
create table child(id bigint, col varchar(10), primary key (id), index(col), foreign key (col) references parent(col) on delete cascade on update cascade) Engine = InnoDB;

And let’s say we initially have the following data in the two tables:
insert into parent (id, col) values (1, -5);
insert into child (id, col) values (1, -5);

Now, if you run a query like update parent set col = col * (col - (col)), then Vitess would end up with inconsistent data in the database:
select * from parent;
+----+------+
| id | col  |
+----+------+
|  1 | -0   |
+----+------+

select * from child;
+----+------+
| id | col  |
+----+------+
|  1 |  0   |
+----+------+

After investigation, we found out that the problem was coming from the part where we are doing arithmetic operations on a varchar column.
Vitess first runs a SELECT query to get the final updated values for non-literal updates. Vitess runs SELECT id, col, col * (col - (col)) from parent. It then uses the output of this query to cascade the update onto the child. The problem was that the result of col * (col - (col)) evaluation is a value -0 of type FLOAT.
This caused Vitess to issue a query like update child set col = -0 where col IN ('-5'). MySQL, however, interprets 0 and -0 as the same values and ends up setting the col to 0 causing inconsistencies.
Once we realized the issue, the fix was again not too difficult. All we had to do was type cast the expression into the type of the column. Now, Vitess issues a query like SELECT id, col, CAST(col * (col - (col)) AS CHAR) from parent and this in turn causes the update to look like update child set col = '-0' where col IN ('-5'), thus fixing the issue.
It is valuable to appreciate that the problem only surfaces when the arithmetic expression evaluates to -0. If it evaluates to any other value, it doesn’t cause any issues. This case illustrates just how useful the fuzzer is in running so many queries that it is eventually able to find cases that only happen in very specific scenarios requiring multiple conditions to be met.
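The underlying behavior is IEEE 754 signed zero, which is easy to reproduce outside of MySQL. The snippet below is only an illustration in Ruby, with a hypothetical helper name; note that Ruby formats the value as -0.0, whereas MySQL showed -0 in the example above.

```ruby
# Illustration of IEEE 754 signed zero with Ruby floats; not Vitess code.
def signed_zero_example(col)
  col * (col - col)  # for col = -5.0 this is -5.0 * 0.0
end

result = signed_zero_example(-5.0)

puts result == 0.0            # true: numerically, negative zero equals zero
puts result.to_s              # -0.0: the sign survives in the text form
puts result.to_s == 0.0.to_s  # false: as strings, the two values differ
```

This is why comparing or storing the value as a number loses the sign, while carrying it as a string preserves it.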
Parent table unique key locking issue
While adding REPLACE INTO support for foreign keys and making the corresponding changes to the fuzzer, we discovered that SELECT queries executed for foreign key cascades were not able to acquire an adequate level of lock on the unique key due to missing gap locks.
This led to incorrect results, and further, plan execution would result in missing cascading rows, leading to incomplete data in the binary logs.
Let’s look at a simple example:
drop table if exists some_table;
create table some_table (
    id bigint,
    col varchar(10),
    primary key (id),
    unique index(col)
) ;

insert into some_table(id, col) values (3, null), (4, 5);

-- Session 1
begin;
select col from some_table where col in (5) or id in (3)  for update;


-- Session 2
begin;
select col from some_table where col in (('5')) for update;
-- This should block


-- Session 1
delete from some_table where col in (5) or id in (3) ;
insert into some_table(id, col) values (3, 5);
commit;


-- Session 2
-- Unblocked but returns 0 rows
select col from some_table where col in (('5')) for update;
-- This repeatable query in Session 2 now returns 1 row
rollback;

As a solution, we went ahead using the NOWAIT lock to promptly acquire the lock for cascade selection. NOWAIT ensures immediate lock acquisition or failure, which may result in more foreign key-related DMLs failing, necessitating query or transaction rollback.
This approach, however, effectively addresses the problem of lock waiting and prevents incorrect results.
Foreign key constraints support limitations
There still exist some limitations in our support of foreign key constraints. Many have been touched on throughout this article, but we will recap them here:
While self-referencing tables are supported, cyclic foreign key references between different tables are not allowed.
Foreign key constraint names change on every deployment. This is largely due to the MySQL limitation (compatible with the ANSI SQL specification) where constraint names must be unique to the schema.
Foreign key constraints are currently only supported in unsharded environments.
There are some scenarios where schema reverts can create orphaned rows.
For a full list of limitations, see our foreign key constraint documentation.
Summary
In summary, we faced several challenges in supporting foreign key constraints in Vitess and PlanetScale, but are extremely pleased with where we landed.
If lack of foreign key constraint support has been a barrier to you trying PlanetScale in the past, we welcome you to enable foreign key constraint support in your database settings page, and give PlanetScale a try. If you have any questions, don’t hesitate to reach out to us.]]></content>
        <summary><![CDATA[Today, PlanetScale launched support for foreign key constraints. This article covers some of the behind-the-scenes technical challenges we had to overcome to support them.]]></summary>
      </entry>
    
      <entry>
        <title>What is HTAP?</title>
        <link href="https://planetscale.com/blog/what-is-htap" />
        <id>https://planetscale.com/blog/what-is-htap</id>
        <published>2023-12-01T17:30:00.000Z</published>
        <updated>2023-12-01T17:30:00.000Z</updated>
        
        <author>
          <name>Savannah Longoria</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[At PlanetScale, we speak to developers at all stages of their database journey. The diversity of database products on the market today can make choosing the right one for your needs extremely difficult. The purpose of this blog post is not to reduce the complex database landscape to a simplistic view, but rather to offer a framework for developers to consider as they start to think about building applications for production.
In this post, we will explore how to identify data processing methods for your workload so you can optimize the performance, scalability, and security when choosing the database for your modern application.
A brief overview of the database landscape: OLAP and OLTP
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two different types of data processing systems that are often used together in traditional architectures.
OLTP is used for processing high volumes of small, transactional data. It's often used for applications such as e-commerce, banking, and social media.
OLAP databases are designed to handle complex queries that require analysis of large amounts of data. They are often used for applications such as business intelligence and data mining.
The combination of OLTP and OLAP systems provides businesses with a powerful tool for managing and analyzing their data. OLTP systems handle the day-to-day transactions, while OLAP systems provide insights into the data that can be used to make better business decisions.

In recent years, there has been a rise of HTAP (Hybrid Transactional/Analytical Processing) databases to target new applications that require both analytics and transactions. To understand HTAP, we need to understand the history of OLTP and OLAP systems.
Relational databases have been used for both transaction processing and analytics, but OLTP and OLAP systems have different characteristics. OLTP systems are designed for individual record insert/delete/update statements and point queries that benefit from indexes. OLAP systems are designed for batch updates and table scans. Batch insertion into OLAP systems is typically done through ETL (extract transform load) systems that consolidate and transform transactional data from OLTP systems into an OLAP environment for analysis.
HTAP: benefits and categories
Through extensive marketing efforts, HTAP has been positioned as a promising new computing paradigm that will solve performance, cost, and complexity challenges that arise from managing two separate workloads. HTAP systems can be broadly classified into three categories:
Shared-everything architectures: All data is stored in a single shared storage system. Simplest to implement and ensure data consistency; can be limited in scalability.
Shared-nothing architectures: Each node in the cluster stores its own data. More scalable, with possibility to scale horizontally; can be more difficult to implement and manage.
Hybrid architectures: Combines elements of both architectures, typically storing transactional data in a shared-everything system and analytical data in a shared-nothing system. Can be difficult to ensure data consistency, especially when there are concurrent transactional and analytical operations; can be limited in scalability.
There have been many different approaches to building out these HTAP systems:
In-memory HTAP databases: In this type of architecture, each node in the cluster stores its own data in memory. This makes it possible to scale the system horizontally, but it can also make it more difficult to ensure data consistency. It also becomes more expensive because it requires a lot of memory.
Columnar HTAP databases: These systems store data in a columnar format, which is optimized for analytical queries. Columnar systems can provide good performance for analytical queries, but they can be slower for transactional queries.
Separation of storage and compute databases: This separates the storage of data from the processing of data.
Hybrid HTAP systems: These store transactional data on disk and analytical data in memory.
Challenges with HTAP
There are inherent challenges with HTAP systems that can hinder their ability to optimize efficient data processing for modern applications at a large scale. Although every use-case is different, there are some factors to consider where HTAP systems can become prohibitive.
Mixed workload complexity: HTAP databases aim to accommodate both transactional and analytical tasks within a single system. However, this leads to a complex environment where the database must juggle the demands of high-speed transaction processing and resource-intensive analytical queries. This inherent conflict in requirements can result in performance compromises.
Performance trade-offs: In HTAP setups, optimizing for one workload often comes at the expense of the other. For instance, in pursuit of quick transaction processing, analytical queries might experience slowdowns due to the shared resources. Conversely, if resources are allocated to enhance analytical performance, transactional operations could suffer, leading to increased latencies and reduced throughput.
Data model mismatch: OLTP and OLAP workloads typically involve different data models. OLTP transactions focus on updating individual records and maintaining data integrity and consistency, while OLAP operations involve complex aggregations and scans. Trying to fit both types of workloads into the same data model can lead to suboptimal design compromises that hinder efficient processing for either workload.
Scalability challenges: Large-scale modern applications often require horizontal scalability to accommodate growing data volumes and user loads. HTAP databases can face difficulties in maintaining the same level of performance and scalability as specialized solutions tailored solely for one type of workload. Balancing the expansion needs of both transactional and analytical components becomes increasingly complex as the system grows.
Resource contention: In HTAP systems, contention arises when transactional and analytical workloads vie for the same resources, such as CPU, memory, and I/O bandwidth. This contention can lead to resource bottlenecks, unpredictable performance fluctuations, and overall system instability.
Maintenance and administration complexity: HTAP databases demand more intricate administration and maintenance compared to standalone OLTP or OLAP systems. Database administrators must manage the configuration, tuning, and optimization of the system to ensure both transactional and analytical workloads perform adequately. This complexity can result in increased operational overhead and potential human error.
Limitation in analytical processing: While HTAP databases can provide insights from operational data in near real-time, their analytical capabilities might not match those of dedicated data warehousing solutions designed explicitly for complex analytical queries and reporting. Specialized analytical databases can employ more sophisticated optimization techniques for complex analytical operations, offering superior performance and richer insights.
Evolution of data processing architectures: Modern applications often incorporate distributed computing, microservices, and serverless architectures. These architectures are designed to optimize specific types of workloads, potentially making it challenging to fit a hybrid database into the larger application ecosystem and take full advantage of emerging technological trends.
The PlanetScale approach
PlanetScale does not claim to be an HTAP database, nor are we an OLAP database built for pure analytical workloads. Instead, PlanetScale offers the only managed Vitess solution and we are optimized for OLTP workloads.
As developers who have worked with databases in production at some of the largest proprietors in the world, we understand that every application is different, and finding a single database that offers a one-size-fits-all approach often means making compromises.
If you have a complex application with distinct transactional and analytical workloads that can be separated, then it may be more appropriate to use separate databases for each workload. This approach allows each database to be optimized for its specific workload and can provide better performance and scalability.
Physical resource isolation is an effective way to guarantee the performance of transactional queries. Analytical queries often consume high levels of resources such as CPU, memory, and I/O bandwidth. If these queries run together with transactional queries, the latter can be seriously delayed.
For large ETL workloads, we support and recommend data integration engines such as Airbyte, Fivetran, and Stitch, with which you can offload these processes to other platforms that are more specialized in OLAP workloads.

Sign up to try it out yourself, or reach out to talk to us if you’d like to learn how PlanetScale can fit into your data pipeline.]]></content>
        <summary><![CDATA[Learn what HTAP is, how HTAP compares to OLAP and OLTP, and some pros and cons of HTAP.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing Insights Anomalies</title>
        <link href="https://planetscale.com/blog/introducing-insights-anomalies" />
        <id>https://planetscale.com/blog/introducing-insights-anomalies</id>
        <published>2023-11-28T15:50:00.000Z</published>
        <updated>2023-11-28T15:50:00.000Z</updated>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Anyone responsible for a large production database can tell you that ensuring your database is healthy and performing optimally can be difficult and time-consuming. Even with battle-tested dashboards, the latest monitoring tools, and a deep understanding of your application, the phrase “Hey, is something up with the database?” strikes fear into the hearts of even the most experienced operators.
Today, we are launching a powerful new set of capabilities in PlanetScale called Insights Anomalies, designed to simplify answering this question. The goal is to provide a crystal clear overview of your database’s health and make it easy to troubleshoot when something goes wrong. This post will explore Insights Anomalies and show how we implemented it.
Insights Anomalies
If you head to the “Insights” tab in your PlanetScale database, you’ll see a new “Anomalies” section. There, you’ll find a graph representing your database’s health over time, as measured by the number of queries that took longer than expected.

Any periods where your database was unhealthy (represented by large values in the graph) will be highlighted with a red icon representing a performance anomaly. Clicking on an anomaly will bring up a detailed view with pertinent information to help understand the causes of an anomaly.

Database health
You may wonder what “unhealthy” means in this context, and what quantity the database health graph represents. The anomaly graph shows the percentage of queries that were unusually slow. An “unusually slow” query is defined as having an execution time exceeding two standard deviations above the mean (also known as 2σ [1] or the 97.7th percentile), for queries with the same pattern over the last week. To determine this threshold we perform the following steps:
Aggregate query response time distribution by SQL fingerprint, and store a probabilistic sketch of that distribution in MySQL.
Use the stored sketches to determine the 2σ threshold for each incoming query’s fingerprint over the last week.
Count the number of incoming queries in each pattern that exceeded the threshold.
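The steps above can be sketched in a few lines. This is a simplified illustration with hypothetical helper names, not the Insights implementation: it computes the threshold from raw latency samples and a population standard deviation, whereas the real system works from stored probabilistic sketches of the distribution.

```ruby
# Mean plus two (population) standard deviations for one query pattern.
def two_sigma_threshold(latencies_ms)
  mean = latencies_ms.sum.to_f / latencies_ms.size
  variance = latencies_ms.sum { |x| (x - mean)**2 } / latencies_ms.size
  mean + 2 * Math.sqrt(variance)
end

# baseline: { fingerprint => [latencies observed over the last week] }
# incoming: [[fingerprint, latency], ...]
# Returns the percentage of incoming queries slower than their own
# pattern's threshold. Patterns with no baseline are never counted as slow.
def slow_query_percentage(baseline, incoming)
  return 0.0 if incoming.empty?

  thresholds = baseline.transform_values { |l| two_sigma_threshold(l) }
  slow = incoming.count { |fp, ms| thresholds.key?(fp) && ms > thresholds[fp] }
  100.0 * slow / incoming.size
end

fingerprint = "select * from users where id = ?"
baseline = { fingerprint => [2, 4, 4, 4, 5, 5, 7, 9] }  # mean 5, std dev 2
puts two_sigma_threshold(baseline[fingerprint])          # 9.0
puts slow_query_percentage(baseline, [[fingerprint, 8], [fingerprint, 12]])  # 50.0
```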
It’s worth examining why it’s necessary to go to the trouble of calculating the threshold on a per-query pattern basis, instead of using a more straightforward metric like a global latency percentile, as a proxy for database health [2]. By comparing each incoming query to the response time baseline for its specific query pattern, we can make an apples-to-apples comparison for each query pattern. Queries that have always been slow will only be considered in the anomaly calculation if they are substantially slower than they have been in the past. If the outlier percentage is elevated, we know that the same query patterns are now taking longer than they did over the last week. This provides a strong signal that the database is encountering a resource bottleneck, and does not result in false positives due to shifting database workloads.
In our experience observing this metric on internal PlanetScale databases, we’ve found it to be a reliable indicator of when we’re pushing a database beyond its limits.
Troubleshooting Anomalies
Determining when an anomaly is occurring (or not occurring) is a valuable capability in its own right. Still, it’s equally important to uncover the root causes. To make this easier, Insights lists relevant metrics for each anomaly. In particular, we show:
High-level query metrics, such as the number of rows read and written per second
Utilization metrics for the underlying database resources, such as CPU and disk usage (IOPS)
A list of backups and deploy requests running when the anomaly occurred since these operations have the potential to consume shared resources
Seeing time series metrics side-by-side with overall database health before, during, and after the anomalous period often makes it very clear where the bottleneck is, as in this example, where there is an approximately 300% increase in rows written per second during the anomaly.

In some cases, we can go deeper than high-level aggregate metrics like reads/writes per second and tell you which specific query patterns are most likely to be at the root of an anomaly.
Correlated queries
Because Insights records and stores exact query counts for 100% of your database’s query patterns, we can compare the execution rate of every query pattern with the overall database health metrics and identify highly correlated queries. In the example below we see an obvious correlation between the overall database health metric and the execution rate of an expensive query run intermittently by a background job.

To find the correlated queries shown in an anomaly, we calculate the Pearson correlation coefficient between the execution rate for each query pattern and the overall database health metric during the anomaly plus a fixed window before and after. We then return the queries with the highest correlation coefficient. Not all anomalies have correlated queries, for example, those caused by running a backup on an under-provisioned database cluster, so we exclude results with a correlation coefficient below a certain threshold. When correlated queries are present, it can shave hours off the time it takes to find the root cause.
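As a rough sketch of this calculation, the snippet below computes the Pearson correlation coefficient between each pattern's execution rate and the health metric, then keeps the patterns above a cutoff. The helper names and the 0.8 cutoff are assumptions for illustration; the actual threshold and windowing used by Insights are not stated here.

```ruby
# Pearson correlation coefficient between two equal-length series.
# Returns 0.0 when either series is constant (the coefficient is undefined).
def pearson(xs, ys)
  n = xs.size
  mean_x = xs.sum.to_f / n
  mean_y = ys.sum.to_f / n
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  return 0.0 if var_x.zero? || var_y.zero?

  cov / Math.sqrt(var_x * var_y)
end

# health: the health metric sampled over the anomaly window
# rates_by_pattern: { pattern => [execution rate per sample] }
# min_correlation is a hypothetical cutoff chosen for this example.
def correlated_queries(health, rates_by_pattern, min_correlation: 0.8)
  rates_by_pattern
    .map { |pattern, rates| [pattern, pearson(rates, health)] }
    .select { |_, r| r >= min_correlation }
    .sort_by { |_, r| -r }
    .map(&:first)
end
```

For example, with a health series of `[1, 5, 1, 5]`, a pattern whose rate is `[10, 50, 10, 50]` has a coefficient of 1.0 and is returned, while a pattern with a constant rate is excluded.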
Try it today
All PlanetScale databases have access to the Anomalies tab in Insights today. You can read more in the PlanetScale Insights Anomalies documentation. User feedback helps us tune the system to improve accuracy, so please let us know about your experience, positive or not, by sharing on Twitter or contacting us.
Footnotes
[1] There’s nothing magical about 2σ or the 97.7th percentile. This value is used because it’s a fairly common choice for defining a statistical outlier.
[2] For a deeper dive on the motivations behind using 2σ outliers to gauge system health, check out this excellent talk from two Google engineers at SREcon22.]]></content>
        <summary><![CDATA[This new update to PlanetScale Insights introduces smart query monitoring to detect slower than expected queries in your database.]]></summary>
      </entry>
    
      <entry>
        <title>Webhook security: a hands-on guide</title>
        <link href="https://planetscale.com/blog/securing-webhooks" />
        <id>https://planetscale.com/blog/securing-webhooks</id>
        <published>2023-11-21T09:00:00.000Z</published>
        <updated>2023-11-21T09:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We recently released webhooks for PlanetScale.
One of the more interesting parts of building a webhooks service is making it secure and protected from abuse.
As soon as we started talking about the project internally, engineers throughout PlanetScale started sharing the different ways they have seen webhooks be abused or exploited in the past.
These collective experiences gave us a good list of things to worry about while building out our own webhooks service.
In this post, we'll go through some of the primary steps we took to build our webhooks service securely.
Server-side request forgery (SSRF)
The main vulnerability in any webhooks service is server-side request forgery (SSRF). An SSRF is when an attacker causes your service to make an internal, unintended request within your own network.
Webhooks are the perfect target for this. The user provides a URL, and then triggers your application to send a request to it.
This request could be harmful by either returning private information to the attacker, or by triggering an internal service to perform some action on their behalf.
For example, if a web server is running an internal metrics endpoint that responds to HTTP POST requests, an attacker could direct the webhook service to send a request to the service. If the webhook service displays the response in the UI, the attacker has now gained access to your internal metrics data.
Mitigating webhook SSRFs
When building a webhook service, there are two layers of defense to set up to protect against SSRFs. The first is limiting the URLs users are allowed to set up webhooks for. The second is limiting where your webhook service can make HTTP connections, via egress rules or a proxy.
Strict validation of the webhook URL
Adding validations for allowed URLs mainly benefits the user by quickly giving them feedback that the URL they entered won't work with your webhook service.
Since DNS can be easily changed, URL validation alone is not enough to protect against SSRFs.
For our service, we check for the following:
Require HTTPS
These days, running a web service without SSL is rare. We felt that making HTTPS a requirement for any webhook we send is a fair request that limits vulnerabilities and protects the potentially sensitive data being sent in our webhook payloads.
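A minimal sketch of such a check, assuming a hypothetical `https_url?` helper built on Ruby's standard `uri` library (this is not necessarily how the PlanetScale validation is written):

```ruby
require "uri"

# Returns true only for well-formed URLs that use the https scheme and
# include a host. Anything that fails to parse is rejected.
def https_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTPS) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts https_url?("https://example.com/hooks")  # true
puts https_url?("http://example.com/hooks")   # false
```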
Block private and loopback IP addresses
We used Ruby's ipaddr to identify if an IP address is private (internal) or a loopback (localhost) address.
If we see either of these, they fail the validation.
require "ipaddr"
require "uri"

uri = URI.parse(url)

# Hostnames that are not IP literals fail to parse and yield nil.
host_ip = begin
  IPAddr.new(uri.host)
rescue
  nil
end

return false if host_ip && (host_ip.private? || host_ip.loopback?)

Block our own domains
To protect against a user sending traffic to another external service owned by PlanetScale, we set up a domain blocklist which includes all of our other public services.
uri = URI.parse(url)

if BLOCKED_DOMAINS.any? { |domain| uri.host&.include?(domain) }
  return false
end

DNS resolution test
Once the URL has passed basic tests, we then resolve the DNS to further validate it is not pointing towards any private or loopback IP addresses.
Remember, the user can always update the host's DNS after this check has passed. This alone is not enough to protect from SSRFs.
require "ipaddr"
require "resolv"

def host_resolves_valid_ips?(host)
  ip_addresses = Resolv.getaddresses(host)
  return false if ip_addresses.none?

  if ip_addresses.any? { |ip| blocked_ip?(IPAddr.new(ip)) }
    return false
  end

  true
end

def blocked_ip?(ip)
  ip.private? || ip.loopback?
end

HTTP egress rules
No matter how rigorous your URL validations are, you cannot fully trust any URL provided by a user. Because of this, it's critical to isolate and limit where the webhooks service can send HTTP requests.
How this is implemented will depend on your infrastructure. Our application is deployed using Kubernetes. We set up an isolated service dedicated to sending webhooks. This service sends all HTTP requests via an Envoy Proxy, which only allows HTTP requests to destinations outside of our network. Its rules are similar to the URL validations above, but they are enforced when the webhook is being sent.
The key rules to put in place are:
Block any connections to internal/private IPs.
Limit traffic to HTTPS ports.
Mitigating distributed denial-of-service (DDoS)
Webhook services can be manipulated to send large amounts of traffic to a URL. To implement this attack, all an attacker needs to do is set up a webhook, and then find a way to trigger it in large quantities.
API-based rate limiting
One simple way to protect against this is to set reasonable rate limits at your API layer. This restricts how many actions an attacker can take and stops them from enqueueing an unlimited number of webhooks.
Our entire API service has a general rate limiter that protects all endpoints.
For our webhooks service, we have a test endpoint that triggers a test webhook. For this endpoint specifically, we added a rate limit of 1 request per 20 seconds. This felt reasonable for users who are testing their hooks while also eliminating the risk of the test webhook being abused.
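To illustrate the idea, here is a minimal in-memory sketch of a "one request per 20 seconds" limiter. The `allow_test_webhook?` helper and the plain Hash used for state are assumptions made for this example; a real deployment would typically keep this state in a shared store so the limit holds across processes.

```ruby
# Hypothetical limiter: at most one test webhook per key every `interval`
# seconds. `last_allowed` is a Hash of key => time of the last allowed call.
def allow_test_webhook?(last_allowed, key, now:, interval: 20)
  last = last_allowed[key]
  return false if last && (now - last) < interval

  last_allowed[key] = now
  true
end

state = {}
puts allow_test_webhook?(state, "database-1", now: 100)  # true
puts allow_test_webhook?(state, "database-1", now: 105)  # false
puts allow_test_webhook?(state, "database-1", now: 120)  # true
```

Passing `now` in explicitly keeps the function easy to test; callers would normally supply a monotonic clock reading.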
Webhook uniqueness/locking
Our webhook service uses a Sidekiq queue to process and send webhooks. With Sidekiq, we are able to set up a uniqueness check on each webhook that is added to the queue.
Duplicate webhooks in quick succession get rejected, resulting in only a single unique webhook being sent out from our service, as well as limiting the number of webhooks we need to process.
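The idea behind a uniqueness check can be sketched in plain Ruby, independent of Sidekiq. Everything here is hypothetical: the UniqueWebhookQueue class, the 60-second window, and the choice of webhook ID plus event name as the uniqueness key are assumptions, not how our queue is configured.

```ruby
require "digest"

# Tracks recently enqueued webhooks and rejects duplicates inside a time window.
class UniqueWebhookQueue
  attr_reader :jobs

  def initialize(window_seconds: 60)
    @window = window_seconds
    @locks = {} # uniqueness key => time the lock was taken
    @jobs = []
  end

  # Returns true when the job was enqueued, false when it was a duplicate.
  def enqueue(webhook_id, event, now: Time.now.to_f)
    key = Digest::SHA256.hexdigest("#{webhook_id}:#{event}")
    locked_at = @locks[key]
    return false if locked_at && (now - locked_at) < @window

    @locks[key] = now
    @jobs << { webhook_id: webhook_id, event: event }
    true
  end
end
```

Two identical enqueue calls inside the window produce a single job, which is the behaviour described above: fewer webhooks sent, and fewer to process.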
Isolated infrastructure
In the event that our other mitigations fail, we run our webhooks queue on isolated machines to protect against webhooks impacting the availability of other PlanetScale services.
If our webhooks are being abused, we do not want that to impact the reliability of the rest of our systems. They can be easily paused or disabled in the event of an incident.
Set strict timeouts
Sending a webhook ties up our resources while waiting for a response. One possible attack vector is queueing many webhooks that resolve very slowly. This can be mitigated by setting a short timeout on webhook requests.
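With Ruby's standard Net::HTTP client, short timeouts might be configured as in the sketch below. The helper name build_webhook_client and the 5-second values are assumptions chosen for illustration, not the values we use.

```ruby
require "net/http"
require "uri"

# Builds an HTTP client with short timeouts; the 5-second values are illustrative.
def build_webhook_client(url, open_timeout: 5, read_timeout: 5, write_timeout: 5)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout   # seconds to wait for the connection to open
  http.read_timeout = read_timeout   # seconds to wait for each read from the socket
  http.write_timeout = write_timeout # seconds to wait for each write to the socket
  http
end
```

Note that read_timeout applies to each read rather than to the whole response, so a receiver that trickles data slowly can still hold a connection open; enforcing a total deadline needs additional handling.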
Limiting number of webhooks
We set an initial limit of 5 webhooks per database. We felt this was enough for people to automate several workflows, while also protecting us from users triggering a large number of hooks for the same events. Starting with 5 is fairly conservative, but it leaves us room to grow and allow more if people have use cases for them. Adding more later is always easier than taking them away.
Conclusion
Hopefully you enjoyed this overview on how we secured PlanetScale webhooks. If you haven't tried webhooks yet, you can learn more about them in our Webhooks documentation.]]></content>
        <summary><![CDATA[Learn what went into building PlanetScale webhooks from a security perspective. This article covers SSRF, webhook validation, DDoS, and more. ]]></summary>
      </entry>
    
      <entry>
        <title>Three surprising benefits of sharding a MySQL database</title>
        <link href="https://planetscale.com/blog/three-surprising-benefits-of-sharding-a-mysql-database" />
        <id>https://planetscale.com/blog/three-surprising-benefits-of-sharding-a-mysql-database</id>
        <published>2023-11-20T15:00:00.000Z</published>
        <updated>2023-11-20T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Organizations often shard their database to scale beyond what simply adding resources to a single server can provide.
When you horizontally shard your database, you essentially break the data up and split it across multiple database servers. Hearing this, you might think that adding more servers means adding more maintenance overhead to your staff, and more expenses on your budget, with the tradeoff that your organization can handle more database traffic. While there is definitely some truth to that in certain situations, there’s oftentimes more to the story that's not as obvious.
In this article, we’ll cover three ways that sharding your database can benefit your organization beyond additional throughput.
Minimized impact on failures
There’s an old saying in architecting infrastructure: two is one, and one is none.
The implication is that you should never have one of anything, as it creates a single point of failure. This is true for your database as well, perhaps more so since it is a critical part of your application. In a typical MySQL environment, if the database server goes down, the entire application goes down with it.
In sharded environments, this failure domain is actually spread out.
Consider a scenario where you shard based on ranges of customers using the customer ID.

If shard A goes down, it will make a bad day for customers 1-5, but the remaining shards are actually still online and can serve data with no problem. Since the impact of an outage is more isolated, there is less of an impact on various teams across your organization as they work to communicate with customers and recover from the failure.
This does not consider any lost revenue from the outage, which is also minimized.
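The range-based scenario above can be sketched in a few lines of Ruby. Only "customers 1-5 live on shard A" comes from the example; the other ranges, the SHARD_RANGES constant, and both method names are hypothetical.

```ruby
# Hypothetical range-based shard map; only "customers 1-5 on shard A" comes from the article.
SHARD_RANGES = {
  "A" => (1..5),
  "B" => (6..10),
  "C" => (11..15)
}.freeze

# Returns the shard name that owns a customer ID, or nil if no range covers it.
def shard_for(customer_id)
  SHARD_RANGES.find { |_name, range| range.cover?(customer_id) }&.first
end

# Lists the customer IDs that would still be served if the given shards are down.
def customers_online(customer_ids, down_shards)
  customer_ids.reject { |id| down_shards.include?(shard_for(id)) }
end
```

Calling customers_online((1..15).to_a, ["A"]) returns customers 6 through 15, which is the smaller failure domain in practice: one shard down, two thirds of these customers unaffected.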
Maintenance tasks are more efficient
The larger a MySQL environment gets, the harder it gets to manage.
Consider backing up a 1TB database. Not only does the process take a long time, but it can have a significant impact on how fast your database responds to queries. Now let's take that same database and create a sharded environment where the data is evenly split across five shards, similar to the previous example.
Not only is backing up five 200GB databases quicker, but if you ever have to restore data from those databases, that process will be faster as well.
Backups are just one example of how sharding makes database management easier.
Schema migrations are another task that can be performed more efficiently. For example, when you merge in a Deploy Request on PlanetScale, we’ll create a new table on the target database branch with the updated version of the schema and sync data from the live table into this “ghost table”. Once the changes are merged in, the old table is dropped and the “ghost table” becomes the new production table.
Using the same scenario from above, performing this operation on the smaller databases in parallel will dramatically reduce the time it takes to complete.
You might actually save money
I know the thought going through your head right now: “How can sharding save me money if I’m adding more servers?”
Let’s first consider the vertical scaling approach. When you provision a server, you need enough resources (CPU, memory, IOPS) to run whatever it is you are trying to run, as well as the necessary overhead to accommodate usage spikes. As the application scales, you’ll eventually start reaching the limits of your server and need to bump resources along with even more overhead to support the service.
This cycle continues, resulting in you always paying for more than you actually use.
Now consider a world where you have a database that’s sharded across five servers as shown earlier in this article.
Whenever the load exceeds what the allocated resources can handle, you add another server into the environment with the same specs and rebalance the load across those servers. There may still be some overhead, but it's significantly lower than what's required when scaling vertically. Plus, since you are adding another server with the same specs, the overall cost increases more linearly and predictably, something your finance team will appreciate.

Another way that sharding can save you money is by utilizing commodity disks in cloud infrastructure.
As your database is used more and more, it increases the demand on the underlying storage in the form of more required IOPS. Lower-cost virtual disks often have a set limit to the amount of IOPS granted to them before you have to select a more costly option. This can creep up on cloud architects if it’s not accounted for.
By sharding your database across multiple, lower-cost disks, you can save money by avoiding the additional costs of their more expensive counterparts.
Conclusion
As a database grows, so do many of the struggles that are associated with databases in general, not only data contention. After reading this article, you should now have a better idea of several other key benefits of sharding beyond additional throughput.
If you’ve sharded your database, what other benefits have you found that might not be provided here? Share it on X and tag us @planetscale!]]></content>
        <summary><![CDATA[There is more to sharding than simply increasing data throughput. In this article, we explore three different benefits of sharding your database.]]></summary>
      </entry>
    
      <entry>
        <title>MySQL replication: Best practices and considerations</title>
        <link href="https://planetscale.com/blog/mysql-replication-best-practices-and-considerations" />
        <id>https://planetscale.com/blog/mysql-replication-best-practices-and-considerations</id>
        <published>2023-11-15T15:00:00.000Z</published>
        <updated>2023-11-15T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[MySQL offers a wide array of options to configure replication, but with all of those options, how can you be sure you are doing it right?
Replication is the first step to providing a higher level of availability to your MySQL database. A well-configured replication architecture can be the difference between your data being highly available and your MySQL setup becoming a management nightmare. At PlanetScale, we support hundreds of thousands of database clusters, all using replication to provide high availability, so we have a little bit of experience in this arena!
In this article, we’re going to explore some of the best practices when it comes to replication, both locally and across longer distances.
Use an active/passive configuration
When replicating with active/passive mode, one MySQL server acts as the source and all other servers are read-only replicas from that source.
In this configuration, the replicas can be used to serve up read-only queries, but all writes must be sent to the source. This helps split the load across all replicas, but it is important to note that when using the default asynchronous replication mode (covered in more detail later), there may be some delay between when data is written to the source and when it is available on the replica. Keep this in consideration when designing your application.
The alternate configuration is active/active, which means multiple servers are actively used to read and write data.
Active/active might seem like a good idea since you have two servers to process write requests, but each server is also processing the other's query workload, making write distribution more of an illusion. Failover between servers can appear seamless, but conflicts can easily occur because there is no native conflict resolution logic within MySQL. When conflicts do occur, neither node can be considered the source of truth for a rebuild without significant data loss.
We always recommend using an active/passive configuration for replication, and sharding if you need more throughput from your database.
If sharding interests you, you might also like our tech talk where Solutions Architect Liz van Dijk dives deep into sharding.
Use GTIDs
Global Transaction Identifiers (aka, GTIDs) are unique IDs attached to a given transaction within a replicated environment.
By default, replicas will read the binary log file on a source database and track the processed records based on the position within that file. As transactions are processed by the replica, the position within that file continues to advance to indicate what has and has not been processed until this point. This system is relatively fragile as issues can occur if the source crashes and the logs need to be restored.
With GTIDs enabled, each transaction is assigned an ID so replicas can concretely determine if a transaction has been processed or not.
Each GTID is a UUID that represents the source server followed by an auto-incrementing ID, formatted as 14a54b2f-2ad0-43b6-b803-72b5d7151d3b:1. As transactions are handled on the source, the UUID remains the same but the ID continues to grow. When a replica processes a transaction from the source, the GTID or GTID range (formatted as 14a54b2f-2ad0-43b6-b803-72b5d7151d3b:1-10) is stored in the gtid_executed table.
This process dramatically reduces the risk of your data getting out of sync because of transactions either not being processed, or being processed more than once.
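To show how the GTID format described above breaks down, here is a small Ruby sketch that parses a GTID or GTID range and checks whether a transaction falls inside it. The names parse_gtid and executed? are hypothetical, and the sketch only handles a single UUID with a single interval; real GTID sets can contain several of each.

```ruby
# Parses a single-interval GTID such as "uuid:1" or "uuid:1-10".
GTID_PATTERN = /\A(\h{8}-\h{4}-\h{4}-\h{4}-\h{12}):(\d+)(?:-(\d+))?\z/

def parse_gtid(gtid)
  match = GTID_PATTERN.match(gtid)
  raise ArgumentError, "not a GTID: #{gtid.inspect}" unless match

  first = match[2].to_i
  last = match[3] ? match[3].to_i : first
  { source_uuid: match[1], transactions: (first..last) }
end

# True when the transaction ID falls inside the executed range for that source.
def executed?(executed_gtid, source_uuid, transaction_id)
  parsed = parse_gtid(executed_gtid)
  parsed[:source_uuid] == source_uuid && parsed[:transactions].cover?(transaction_id)
end
```

This is the kind of check a replica can make with GTIDs and cannot make with file positions alone: given a transaction's identifier, it can tell directly whether that transaction has already been applied.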
Use the correct replication mode
MySQL has two different modes to replicate data, and you should know how they behave to ensure you have the best mode set for your environment.
By default, MySQL will be configured with asynchronous replication. With this mode, transactions will be sent to the source and then read by each replica and processed independently. There is no validation from the source that any replica in the environment processes the transaction.
The other form of replication supported by MySQL is semi-synchronous replication, enabled as a plugin.
When configured with semi-synchronous replication, transactions are received by the source and processed by each replica; however, the source will wait until at least one replica acknowledges the transaction before responding to the caller. The benefit is that data consistency is greater since at least two database servers in your environment will have the data, but it does add a bit of overhead in the response time. PlanetScale actually uses semi-synchronous replication for our databases within a given region.
With semi-sync mode enabled, there are some additional options also available that can be tweaked.
By default, the primary server will wait 10 seconds for a replica with semi-sync mode enabled to acknowledge the transaction. This value can be modified, and if you rely on semi-sync for data consistency, you should increase this value to be high enough to guarantee consistency. We set the timeout value extremely high to ensure that the data for our databases is always consistent.
It’s also worth mentioning that you can mix and match these two modes.
If you want to guarantee that one specific server always contains an up-to-date copy of your database, but also want additional replicas for more resiliency, you could configure one replica with semi-sync and one without. This means when data is written to the source, it will always make sure that the one server with semi-sync enabled has received that transaction before responding, and the other replicas in the cluster will catch up when they can. In a disaster scenario (discussed further down this article), this can help you easily identify the best candidate to recover from.
Understanding the difference between these two modes can help you make a better decision on what makes more sense for your business.
Log storage
As mentioned earlier in this article, each replica in your environment will read from the binary logs of the source as their source of truth.
By default, these logs are stored on the same disk as the database. As you can imagine, busy databases can be bogged down when you consider the amount of throughput being processed by a single disk (manipulating the database + reading binary logs for replication). The better approach is to store binary logs on a separate disk from the database.
This approach can also save you some money in cloud environments where free volumes have hard IOPS limits.
Monitor replication
All infrastructure requires monitoring to catch issues proactively, and replication is no exception.
If left unmonitored, you’d have no idea whether or not your data is actually being replicated once it's configured. SolarWinds Database Performance (formerly VividCortex) is one of the more popular database monitoring solutions available and does support monitoring replication. At PlanetScale, we use Prometheus to monitor replication, along with other metrics, for the clusters we manage.
Regardless of the solution used, make sure that when issues do occur, the proper people are notified so that things can be fixed before they become a real issue.
Create a failover strategy
Software and hardware failures are inevitable, so planning for failover can minimize the pain when they do happen.
One of the major benefits of using replication is the increase in resiliency by having more than one server containing your data online at any given time. Your team should have a good strategy ready in case the primary data source fails. The following is an example of what an unplanned failover might look like:
Take measures to ensure the downed source won't come back online. This could cause replication issues if it happens unexpectedly.
Identify the replica you want to choose as the new source and unset the read_only option. If semi-sync is used, this would be the replica you’ve configured with the plugin along with the source.
Update your application to direct queries to the newly promoted source.
Update the other replicas to start replicating from the new source.
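The second step above, choosing which replica to promote, can be sketched as follows. This is a simplified illustration under stated assumptions: the Replica struct and its fields are hypothetical, and falling back to "the replica that has executed the most transactions" when no semi-sync replica exists is our assumption rather than a rule from this article.

```ruby
# Hypothetical view of a replica's state at failover time.
Replica = Struct.new(:name, :semi_sync, :executed_transactions, keyword_init: true)

# Prefers semi-sync replicas; within the chosen pool, picks the most caught-up one.
# Returns nil when there are no replicas to promote.
def promotion_candidate(replicas)
  semi_sync = replicas.select(&:semi_sync)
  pool = semi_sync.empty? ? replicas : semi_sync
  pool.max_by(&:executed_transactions)
end
```

Deciding this rule ahead of time, and writing it down, is most of the value: during an incident nobody should be debating which replica to promote.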
Considerations when replicating to other data centers
When building in the cloud, you’ll have the ability to deploy services to almost anywhere in the world, and this includes databases too.
Each cloud provider is made up of a number of geographical regions. Within those regions are multiple data centers that are close enough to be considered in the same region, but far enough to survive many disasters. These data centers are known as availability zones or AZs for short.
If possible, replicating your database to other physical locations is definitely a best practice, but it comes with some additional considerations.
The time it takes to send data over the wire is one such consideration.
Since the network traffic needs to travel a greater distance, replicating between locations does introduce additional latency between replicas. Luckily, the latency between AZs is often not very high. In fact, AWS claims that they have single-digit millisecond latency between availability zones in the same region.
Regions are much farther from one another and often have significant latency.
At the time of this writing, cloudping.co reported that the latency between us-east-1 and us-west-1 is over 60ms. Replication in itself has a bit of a delta between the time that data is written to the source and the time it is written to a replica, known as replication lag. This is exacerbated when replicating across longer distances.
As such, replicating across regions should be done in asynchronous mode so as to not cause unnecessary delay for the application making requests.
Conclusion
Knowing the available options when configuring replication is not enough. Understanding the best configuration for these options can make a tremendous difference in how well your MySQL cluster operates and serves data.
If this article has helped you better understand replication and how it should be configured, let us know by sharing it on X and tagging @planetscale!
For more about replication, check out our article about what it is and when to use it.]]></content>
        <summary><![CDATA[Learn the best practices for configuring MySQL replication, and how to ensure your data is always available.]]></summary>
      </entry>
    
      <entry>
        <title>A guide to HTML email with Ruby on Rails and Tailwind CSS</title>
        <link href="https://planetscale.com/blog/guide-to-html-email-with-ruby-on-rails-and-tailwind-css" />
        <id>https://planetscale.com/blog/guide-to-html-email-with-ruby-on-rails-and-tailwind-css</id>
        <published>2023-11-14T08:01:46.798Z</published>
        <updated>2023-11-14T08:01:46.798Z</updated>
        
        <author>
          <name>Ayrton</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Being in someone’s inbox is a privilege. At PlanetScale, we spend a great amount of time and detail making sure we get it right.
We recently shipped weekly database report emails. These report emails are sent out weekly to give you an easy way to see database performance, activity, and actionable next steps.
In this article, we’ll cover how we built the HTML emails that power our weekly database reports using Ruby on Rails and Tailwind CSS and how we overcame some of the challenges we faced.
This article is not intended to be a full step-by-step tutorial but, rather, a reference for those exploring how to build HTML emails in your own application using Rails and Tailwind.
Set up the Rails app
If you’d like to loosely follow along, make sure you have a Rails app set up. You need to have Rails 7+ running with cssbundling-rails and configured for postcss.
You can create a new app by running:
rails new myapp --css postcss

Updating an existing Rails app
If you have an existing Rails app, but it’s not configured for postcss, follow these instructions to finish set up. If you followed the previous instructions, you can skip this step.
Add cssbundling-rails:
./bin/bundle add cssbundling-rails

Run the scaffolding:
./bin/rails css:install:postcss

You may be prompted to overwrite the file. If you already have a postcss.config.js file, you can skip overwriting it by typing n when prompted.
If you previously had an application.css file in the app/assets/stylesheets/ directory, you can undo these changes with:
git restore --staged app/assets/stylesheets/application.css
git checkout app/assets/stylesheets/application.css

Remove the generated application.postcss.css:
rm app/assets/stylesheets/application.postcss.css

You may be prompted to overwrite the file. If you already have an application.postcss.css in your app/assets/stylesheets directory, you can skip overwriting it by typing n when prompted.
Now that your Rails app is set up, let’s configure postcss.
Configuring postcss for email
We want to have two separate CSS files: one for your web application and one for your emails. Why? Email CSS is very different from web CSS: email clients have very limited support for the CSS you can use. Let’s create that separate file now.
Create your email CSS file by running:
touch app/assets/stylesheets/mailer.postcss.css

Each CSS file will be configured with different Tailwind options optimized for their own use case. Generating these separated files is straightforward with the postcss CLI.
We need an input directory, app/assets/stylesheets, and an output directory, app/assets/builds.
// package.json
 {
   "scripts": {
-    "build:css": "postcss ./app/assets/stylesheets/application.postcss.css -o ./app/assets/builds/application.css"
+    "build:css": "postcss ./app/assets/stylesheets/{application,mailer}.postcss.css --base ./app/assets/stylesheets --dir ./app/assets/builds"
   }
 }

Next, compile the CSS:
yarn build:css

You can verify that the compiled CSS exists with:
ls app/assets/builds/

Technically, these are just CSS files. These have already been processed by PostCSS and there’s nothing PostCSS about them anymore. Let’s remove the postcss extension:
mv app/assets/stylesheets/application{.postcss,}.css
mv app/assets/stylesheets/mailer{.postcss,}.css

Next, update the build:css script in package.json to point at the renamed files, then rebuild the CSS with:
yarn build:css

You can confirm it updated with:
ls app/assets/builds/

Creating an email-specific stylesheet
Next, let’s set up Tailwind and configure PostCSS. This allows us to optimize the build output differently for the web (application.css) and email (mailer.css).
Tailwind CSS
Install Tailwind:
yarn add tailwindcss postcss autoprefixer postcss-import --dev

Initialize Tailwind CSS:
yarn tailwindcss init

Let’s tell PostCSS to post-process the email CSS for an optimized build. Update the postcss.config.js file as follows:
// postcss.config.js
-module.exports = {
-  plugins: [
-    require('postcss-import'),
-    require('postcss-nesting'),
-    require('autoprefixer'),
-  ],
+module.exports = (api) => {
+  if (/mailer/.test(api.file.basename)) {
+    return {
+      plugins: {
+        'postcss-import': {},
+        'postcss-custom-properties': {
+          preserve: false
+        },
+        'tailwindcss/nesting': {},
+        tailwindcss: {
+          config: './tailwind.config.mailer.js'
+        }
+      }
+    }
+  }
+
+  return {
+    plugins: {
+      'postcss-import': {},
+      'tailwindcss/nesting': {},
+      tailwindcss: {},
+      autoprefixer: {}
+    }
+  }
 }

There’s a lot to unpack here so let’s go over the above snippet step by step.
First, we check if we’re post-processing the mailer CSS or application CSS. Depending on the file, we’ll post-process them slightly differently to make sure we’re optimizing for the platform. This is done in this code snippet:
if (/mailer/.test(api.file.basename)) {
  // ...
}

Next, let’s look at the two important configuration rules. The first one is postcss-custom-properties:
{
  'postcss-custom-properties': {
    preserve: false
  }
}

The preserve option determines whether Custom Properties and properties using custom properties should be preserved in their original form. We don’t want to preserve these because most email clients do not support CSS variables. We do this by setting preserve to false.
For example, the two snippets below illustrate what it looks like before and after setting preserve: false:
Before setting preserve: false:
:root {
  --color: red;
}

h1 {
  color: var(--color);
}

After setting preserve: false:
h1 {
  color: red;
}

Finally, we’re telling PostCSS to use a different Tailwind config:
{
  tailwindcss: {
    config: './tailwind.config.mailer.js'
  }
}

Now that that’s all covered, let’s continue on with the Tailwind configuration.
Next, create the tailwind.config.mailer.js file:
// tailwind.config.mailer.js

module.exports = {
  content: ['app/helpers/mailer_helper.rb', 'app/views/*_mailer/*.html.erb', 'app/views/layouts/mailer.html.erb'],
  future: {
    disableColorOpacityUtilitiesByDefault: true
  },
  theme: {
    extend: {
      borderRadius: {
        none: '0',
        xs: '2px',
        sm: '4px',
        DEFAULT: '6px',
        md: '8px',
        lg: '10px',
        full: '9999px'
      },
      fontSize: {
        xs: '10px',
        sm: '12px',
        base: '14px',
        lg: '16px',
        xl: '18px',
        '2xl': '22px',
        '3xl': '24px',
        '4xl': '28px',
        '5xl': '32px'
      },
      spacing: {
        0.5: '4px',
        1: '8px',
        1.5: '12px',
        2: '16px',
        2.5: '20px',
        3: '24px',
        4: '32px',
        4.5: '36px',
        5: '40px',
        6: '48px',
        7: '56px',
        8: '64px',
        9: '72px',
        10: '80px'
      }
    }
  }
}

Let’s go over what’s happening in this file.
This first content block tells Tailwind where all of our email HTML templates and helpers live.
{
  content: ['app/helpers/mailer_helper.rb', 'app/views/*_mailer/*.html.erb', 'app/views/layouts/mailer.html.erb']
}

Next, take a look at the future object:
{
  future: {
    disableColorOpacityUtilitiesByDefault: true
  }
}

This is the equivalent of saying:
{
   corePlugins: {
     backgroundOpacity: false,
     borderOpacity: false,
     divideOpacity: false,
     placeholderOpacity: false,
     ringOpacity: false,
     textOpacity: false
   }
}

What this does is favor hex values over RGBA values because, as you might have guessed, not all email clients support alpha values.
Similarly, the theme.extend block in the tailwind.config.mailer.js file favors px values over rem, since not all email clients support rem units:
{
  theme: {
    extend: {
      // ...
    }
  }
}

Inline CSS
Email clients don’t have great support for stylesheets. The easiest way to handle this is to work with inline styles, but that is error-prone and hard to work with, as you cannot use classes and/or reuse styling over your HTML.
For our emails, we used a library called roadie to do the hard work for us. It also plays nice with Tailwind CSS.
Add roadie-rails:
./bin/bundle add roadie-rails

Set up roadie-rails:
# app/mailers/application_mailer.rb
 class ApplicationMailer < ActionMailer::Base
+  include Roadie::Rails::Automatic
+
   default from: "from@example.com"
   layout "mailer"
 end

Set up the layout
The container is the main wrapper that holds your content. Typically, in email, this is a single ~600px wide center-aligned column that shrinks down on smaller viewports. Here is what ours looks like:
<!DOCTYPE html>
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="x-ua-compatible" content="ie=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <title><%= message.subject %> | PlanetScale</title>

    <%= stylesheet_link_tag "mailer" %>
  </head>

  <body>
    <div role="article" aria-roledescription="email" aria-label="<%= message.subject %>" lang="en">
      <table border="0" cellpadding="0" cellspacing="0" class="font-sans" width="100%">
        <tr>
          <td height="32"></td>
        </tr>

        <tr>
          <td align="center">
            <table border="0" cellpadding="0" cellspacing="0" class="w-full max-w-[684px] px-0 sm:w-[684px] sm-px-2" width="100%">
              <tr>
                <td>
                  <%= yield %>
                </td>
              </tr>
            </table>
          </td>
        </tr>

        <tr>
          <td height="32"></td>
        </tr>
      </table>
    </div>
  </body>
</html>

Supporting dark mode
As part of these new emails, we had to make sure we support users who prefer dark mode. Tailwind made this a breeze:
   <head>
     <meta charset="utf-8">
     <meta http-equiv="x-ua-compatible" content="ie=edge">
+    <meta name="color-scheme" content="light dark">
+    <meta name="supported-color-schemes" content="light dark only">
     <meta name="viewport" content="width=device-width, initial-scale=1">

     <title><%= message.subject %> | PlanetScale</title>

/* app/assets/stylesheets/mailer.css */
@import 'tailwindcss/base';
@import 'tailwindcss/components';
@import 'tailwindcss/utilities';

:root {
  color-scheme: light dark;
  supported-color-schemes: light dark;
}

@media (prefers-color-scheme: dark) {
  body {
    background-color: #111 !important;
    color: #fafafa !important;
  }

  a {
    color: #47b7f8 !important;
  }
}

Adding the preheader
The preheader is the perfect place to further encourage your subscribers to open your email. It is the text headline that appears next to the email subject. Here’s how we set ours up:
<% # app/views/database_mailer/database_weekly_report.html.erb %>
<% content_for :preheader do %>
  <%= @report.period_start_label(full: false) %> – <%= @report.period_end_label(full: false) %> Here’s a look at the performance of your <%= @database.display_name %> database.
<% end %>

<% # app/views/layouts/mailer.html.erb %>
<!DOCTYPE html>
<html>
  <body>
    <% if content_for?(:preheader) %>
    <div class="hidden">
      <%= yield :preheader %>
    </div>
    <% end %>
  </body>
</html>

Apple autolinking
Phone numbers, addresses, dates, and (somewhat random) words like "tonight" frequently turn blue and underlined in emails viewed on an iPhone or iPad. These links trigger app-driven events, such as making a call or creating a calendar event. While these may come in handy for some scenarios, in others, they can be a nuisance and ruin your carefully-planned branding, even decreasing legibility.
They weren’t relevant to our emails, so we removed them:
     <meta name="color-scheme" content="light dark">
     <meta name="supported-color-schemes" content="light dark only">
     <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="format-detection" content="telephone=no, date=no, address=no, email=no, url=no">
+    <meta name="x-apple-disable-message-reformatting">

     <title><%= message.subject %> | PlanetScale</title>

     <%= stylesheet_link_tag "mailer" %>
   </head>

Handling Gmail clipping
Gmail clips emails that have a message size larger than 102KB and hides the content behind a “View entire message” link.

Cut unnecessary content
The first recommendation to handle this is to cut any content that may be unnecessary. In our case, we’re limiting the number of slow queries to the first ten:
-<% @report.slow_queries.each do |query| %>
+<% @report.slow_queries.first(10).each do |query| %>
 <tr>
   <td class="font-mono text-sm text-primary">
-    <%= query.sql %>
+    <%= truncate(query.sql, length: 200) %>
   </td>

Additionally, you can truncate content and link to the full content instead:
<% @report.slow_queries.first(10).each do |query| %>
 <tr>
   <td class="font-mono text-sm text-primary">
+    <%= truncate(query.sql, length: 200) %>
+  </td>
+  <td align="right">
+    <%= link_to "View query", query %>
   </td>

Debug view source
After we went through and cut any unnecessary content, we noticed we’d still sometimes have clipped content. The email looks pretty small, so let’s take a closer look to see what sends the email size over the 102KB limit.
To determine the size of your sent email, send it to a test address. View the source code, and save the source code in a document. Then, view the file size of that document. We were sending emails of ~80KB, so we wanted to create a bit more buffer. We did this by cutting down some Tailwind imports:
-@import 'tailwindcss/base';
-@import 'tailwindcss/components';
 @import 'tailwindcss/utilities';

 @import 'mailer/base';
 @import 'mailer/theme';

We removed Tailwind’s preflight styles and component utilities to try to bring down our overall file size:
html {
  line-height: 1.5;
}

body {
  line-height: inherit;
  margin: 0;
}

img {
  border-style: none;
  display: block;
  vertical-align: middle;
  max-width: 100%;
  height: auto;
}

Testing file size again
With these changes in place, we repeated the email sending test and saw that we had reduced our email size from 80KB to 45KB.

It turned out that trying to accommodate Gmail’s clipping was a great exercise for determining what content is actually essential for your emails.
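The manual size check described above can also be scripted. The following is a minimal sketch, not part of our mailer code: the helper name is hypothetical, and it assumes the 102KB limit means 102 * 1024 bytes of raw email source.

```ruby
# Gmail's clipping threshold as stated above, assuming 1KB = 1024 bytes.
GMAIL_CLIP_LIMIT_BYTES = 102 * 1024

# Hypothetical helper: reports the size of the raw email source in bytes
# and how much headroom remains before the clipping threshold.
def email_size_report(source)
  size = source.bytesize
  {
    bytes: size,
    headroom_bytes: GMAIL_CLIP_LIMIT_BYTES - size,
    clipped: size > GMAIL_CLIP_LIMIT_BYTES
  }
end

# Example with a small placeholder body instead of a real email.
puts email_size_report("<html><body>#{'x' * 1_000}</body></html>")
```

Running a check like this against the saved source of a test email gives the same number as checking the file size by hand.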
Accommodating Gmail desktop styles

We noticed that on desktop in Gmail, our mobile styles are applied, even though our design is responsive. According to the Google Workspace guides, Gmail supports CSS media queries.
So, what’s going on here? Whenever you use responsive modifiers like sm or important modifiers like !, Tailwind CSS will escape that output.
For example:
<span class="hidden sm:!inline"> Shown on desktop only </span>

This will generate:
.hidden {
  display: none;
}

@media (min-width: 640px) {
  .sm\:\!inline {
    display: inline !important;
  }
}

While modern browsers support escaped sequences, Gmail unfortunately does not. It’s best to stick to class names that match [a-zA-Z0-9_-]: letters from a to z and A to Z, digits from 0 to 9, underscores (_), and hyphens (-).
To do this, the easiest solution is to define a handful of utility helpers:
@media (min-width: 640px) {
  .sm-block {
    display: block !important;
  }

  .sm-hidden {
    display: none !important;
  }

  .sm-inline {
    display: inline !important;
  }
}

And, finally, update our markup:
-<span class="hidden sm:!inline">
+<span class="hidden sm-inline">
  Shown on desktop only
</span>

This works well because at PlanetScale, we don’t heavily rely on differentiating between desktop and mobile styles.
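To catch class names that Gmail would ignore, the allowed character set can be checked mechanically. This is a rough sketch with a hypothetical helper name. It uses a simple regular expression that only understands double-quoted class attributes, so treat it as a quick lint rather than a real HTML parser.

```ruby
# Characters that are safe in class names for Gmail, per the text above.
SAFE_CLASS_NAME = /\A[a-zA-Z0-9_-]+\z/

# Hypothetical helper: returns the unique class names in `html` that
# contain characters outside the safe set (for example ":" or "!").
def unsafe_class_names(html)
  html.scan(/class="([^"]*)"/)
      .flat_map { |(value)| value.split }
      .reject { |name| name.match?(SAFE_CLASS_NAME) }
      .uniq
end

puts unsafe_class_names('<span class="hidden sm:!inline">Shown on desktop only</span>').inspect
```

Running it over the rendered email body before sending would flag any leftover sm:-style classes.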
An alternative is to change Tailwind’s separator option:
module.exports = {
  separator: '_'
}

However, with this option, you’d still end up having to overwrite your styles to add !important to them. It’d be cool if Tailwind CSS supported setting the important modifier on mobile styles only. You cannot override inline CSS if it has !important. It has the highest precedence, higher than the style in our external CSS file where the media queries live.
If we see that this becomes a maintenance burden down the line, we’re considering writing a Roadie transformer to automate the above process for us so we can keep using the vanilla Tailwind CSS we know and love.

Testing our emails
Now that all the upfront work is done, we enter the testing phase.
Previewing emails
Rails provides an easy way to see how emails look by visiting a special URL that renders them.
Let’s create a preview mailer in test/mailers/previews/database_mailer_preview.rb:
class DatabaseMailerPreview < ActionMailer::Preview
  def database_weekly_report
    DatabaseMailer.with(
      database: database,
      recipient: recipient,
      report: report,
      subscription: subscription,
    ).database_weekly_report # call the mailer action so the preview returns a message
  end

  private

  def database
    Database.first!
  end

  def recipient
    User.first!
  end

  def report
    DatabaseReport.new(
      database: database,
      period_start: Time.current.beginning_of_week,
      period_end: Time.current.end_of_week,
    )
  end

  def subscription
    Subscription.new(
      database: database,
      user: recipient,
    )
  end
end

The preview is now available at http://localhost:3000/rails/mailers/database_mailer/database_weekly_report.
If you change something in app/views/database_mailer/database_weekly_report.html.erb or the mailer itself, it’ll automatically reload and render it so you can see the new style instantly.

Sending test emails
Email testing ensures that your emails are rendered as intended. Not every email provider supports dark mode or media queries, so it’s important to send an actual email so that you can test in several providers. Let’s create a custom Rake task in lib/tasks/email.rake to generate the email:
# frozen_string_literal: true

namespace :dev do
  namespace :mailer do
    task :database_weekly_report, [:email] => :environment do |_t, args|
      raise ArgumentError, "Rails environment is not development" unless Rails.env.development?
      raise ArgumentError, "Email argument is missing" unless args[:email]

      mailer = DatabaseMailerPreview.new.database_weekly_report
      mailer.to = args[:email]
      mailer.deliver
    end
  end
end

We can now send a test email with:
$ bin/rails dev:mailer:database_weekly_report[info@planetscale.com]

From here, you can quickly iterate to fix anything that doesn’t work well with specific email clients.
Conclusion
While building HTML emails can be a pain, Rails and Tailwind CSS made the process quite enjoyable. We were able to build, test, and ship quickly thanks to Rails and Tailwind. If you have any questions about the implementation, don’t hesitate to reach out! You can find us on X (Twitter) or fill out the form on our Contact page.]]></content>
        <summary><![CDATA[Learn how to build HTML emails using Rails and Tailwind CSS. We also cover how to overcome some common obstacles such as Gmail message clipping, large file size, Apple autolinking, and more.]]></summary>
      </entry>
    
      <entry>
        <title>Sharding for cost-effective database management</title>
        <link href="https://planetscale.com/blog/sharding-for-cost-effective-database-management" />
        <id>https://planetscale.com/blog/sharding-for-cost-effective-database-management</id>
        <published>2023-11-13T13:00:00.000Z</published>
        <updated>2023-11-13T13:00:00.000Z</updated>
        
        <author>
          <name>David Bravant</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Sharding provides a unique solution that can optimize database cluster performance as well as the infrastructure costs associated with basic database operations.]]></content>
        <summary><![CDATA[Maximizing performance while minimizing costs is integral for engineering large-scale applications with massive data volumes. Learn more about cost-effective sharding in this tech talk.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies</title>
        <link href="https://planetscale.com/blog/planetscale-named-deloitte-top-500-fastest-growing-companies" />
        <id>https://planetscale.com/blog/planetscale-named-deloitte-top-500-fastest-growing-companies</id>
        <published>2023-11-08T09:00:00.000Z</published>
        <updated>2023-11-08T09:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[Every year, Deloitte ranks the fastest-growing companies in North America in technology, media, life sciences, fintech, telecommunications, and energy tech. The awardees are chosen based on the percentage of fiscal year growth they’ve experienced.
The Deloitte Technology Fast 500™ is now in its 29th year, and PlanetScale is thrilled and honored to share that we’ve placed 188th out of 500.
This isn’t the first time PlanetScale has been recognized for its growth trajectory. Our workplace culture and product innovation have also been highlighted via Fortune Best Small Workplaces, Redpoint InfraRed 100 — the most promising companies in cloud infrastructure, Inc. Best Workplaces, and InfoWorld Technology of the Year awards — all in the last 12 months.
Since our inception, we’ve made it our mission to be the world’s most advanced MySQL platform. We’re trusted by brands like MyFitnessPal and Barstool Sports, bringing companies all over the world scalability, performance, and reliability — without ever sacrificing the developer experience.
By leveraging PlanetScale and its cutting-edge features — like branching, real-time query analytics, and non-blocking schema changes — users can spend less time worrying about their database and more time on what actually matters: their own applications.
Thank you to Deloitte for this incredible honor, and a huge thank you to our users who make it all possible.]]></content>
        <summary><![CDATA[We are pleased to announce that PlanetScale has been named on the Deloitte Technology Fast 500™.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 18</title>
        <link href="https://planetscale.com/blog/announcing-vitess-18" />
        <id>https://planetscale.com/blog/announcing-vitess-18</id>
        <published>2023-11-07T09:01:00.000Z</published>
        <updated>2023-11-07T09:01:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Vitess 18 is now generally available, with a number of new enhancements designed to improve usability, performance, and MySQL compatibility.
MySQL compatibility improvements
Foreign key constraints
In the past, foreign key constraints had to be managed outside Vitess. This was a significant blocker for adoption. We are now able to support Vitess-managed foreign key constraints within the same shard. This includes the ability to import data into Vitess from an existing MySQL database with foreign key constraints. We plan to extend foreign key constraint support to cross-shard relationships in a future release.
Foreign key constraints are supported in PlanetScale. You can enable them in your database settings page.
General compatibility
The Vitess query planner has been significantly enhanced, paving the way for advanced query capabilities. The newly revamped version is more robust and flexible. This allows Vitess to provide better support for complex aggregations, sophisticated subqueries, and derived tables. Complex queries on sharded databases will perform better as a result of these changes.
Usability enhancements
Cobra
The Vitess CLI has been migrated to cobra. In addition to standardizing and modernizing our CLI infrastructure, this change provides two major benefits.
We now auto-generate reference documentation for both released and development versions of Vitess.
This means that developers spend less time performing mechanical documentation changes and more time on features, bug fixes, and more in-depth documentation. End users benefit from more reliably up-to-date reference docs.
In addition, Vitess commands now support shell completion.

Vtctldclient
We have completed the migration of all client commands to vtctldclient. The legacy vtctl/vtctlclient binaries are now fully deprecated and we plan to remove them in Vitess 19.
This migration provides several benefits:
Clean separation of commands makes it easier to develop new features without impacting other commands.
It presents an API that other clients (both Vitess and third-party) can use to interface with Vitess.
It enables future features. For example, we can now use configuration files and start building support for dynamic configuration.
VReplication and online DDL
We now have the ability to import data with foreign key relationships in a way that properly maintains those relationships.
Additionally, we now support near-zero downtime migration of data from an external database. Previously, there was a perceptible cutover duration during which queries would error out.
Online DDL can now provide better progress estimates.
Point in time recoveries
While Vitess has supported Point in Time Recovery for years, the functionality was dependent on running a binlog server and was not widely used. In this release, we've added the ability to restore to a specific timestamp without relying on an external binlog server. Recovery to a known GTID position without a binlog server has been supported since Vitess 17.
The old functionality that relied on a binlog server is now deprecated and will be removed in a future release.
Tablet throttler
The throttler now uses gRPC for communication with other tablets instead of HTTP, except as a fallback during version upgrades. The use of HTTP was a security concern and will be removed in the next release.
Arewefastyet
Arewefastyet, our benchmarking system, now has a new look aimed at improving the reliability and usability of the website. We have also made several bug fixes and enhancements to the benchmarking system.
Try it out
We are very pleased with the great strides we have made with v18 and hope you will be as well. We encourage all current Vitess users and everyone who has been considering it to try this new release! We also look forward to your feedback, which can be provided via Vitess GitHub issues or the Vitess Slack.]]></content>
        <summary><![CDATA[Vitess 18 is now generally available, with a number of new enhancements designed to improve usability, performance, and MySQL compatibility.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing the Fivetran integration</title>
        <link href="https://planetscale.com/blog/announcing-the-fivetran-integration" />
        <id>https://planetscale.com/blog/announcing-the-fivetran-integration</id>
        <published>2023-11-02T12:00:00.000Z</published>
        <updated>2023-11-02T12:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        <author>
          <name>Katie Sipos</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[You can now extract data from PlanetScale and use it as a data source in your extract, load, and transform (ELT) processes with Fivetran.
Today, we are announcing a Fivetran integration for PlanetScale. This fully managed data integration can extract data from your PlanetScale database and safely load it into other destinations for analysis, transformation, and more. The integration with Fivetran is quick to set up with your PlanetScale connection information and benefits from the large Fivetran ecosystem of connectors and destinations.
It is available today in private preview to Fivetran users.
The power of PlanetScale + Fivetran
Not every company can have a fully staffed team of people to manage their data pipelines. Fivetran makes it super easy to send your data to multiple destinations without building one-off bespoke solutions, which takes time and effort. Connecting your PlanetScale database to Fivetran gives you near-immediate access to insights from your database, which can be used in many ways. Here are a few ideas to get you started:
Enrich data from other parts of your business (like Marketing, Sales, and Support) with data directly from your database. Since this is the same data that runs your application, you can be sure it's up-to-date and accurate.
Easily analyze your PlanetScale database data using your preferred querying tool.
Send your data to various data warehouses, where it can be transformed or analyzed.
How to set up the PlanetScale integration in Fivetran
You can request access to the PlanetScale connector in private preview from within Fivetran. See our Fivetran integration documentation for more info.
Once you have access to the connector, you connect to Fivetran as you do for any other Fivetran connector with your PlanetScale database name, host name, username, and password. The integration was built with the new Fivetran Connector SDK, which allows PlanetScale users to benefit from Fivetran's 400+ connectors and 14 destinations. This also allows us to provide PlanetScale-specific documentation directly in the Fivetran UI you will see when setting up the connector.
If you have any questions, feedback, or issues setting up the connector, please contact us, and we will be happy to help out.
How PlanetScale uses Fivetran
At PlanetScale, we send over 30 different data sources through Fivetran to BigQuery, from tools like HubSpot, Salesforce, and Google Ads, to name a few. From there, the data is transformed to help clean up and join data sources that would otherwise involve complex queries or be unable to be joined. Since we have a small Data team, this type of solution is ideal because we could set it up in minutes and quickly enable the rest of the business with data to help them make decisions.
Having all of those sources in BigQuery as our data warehouse also helps us to create a complete customer profile using data from Salesforce, our website, ad and marketing tools, and our own PlanetScale database. This gives us a holistic view of our customers and how we can help them.
Finally, with all this newly transformed and joined data, we can set up in-depth reporting in our BI tools to allow folks from business areas outside of Engineering to easily self-serve our product analytics from our PlanetScale database. Having this setup saves time for our Engineering team, so they don't have to write one-off queries to answer questions about the business, and we don't have a bunch of folks writing one-off queries to the production database!]]></content>
        <summary><![CDATA[Learn how you can extract, load, and transform your PlanetScale data with Fivetran.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing webhooks</title>
        <link href="https://planetscale.com/blog/introducing-webhooks" />
        <id>https://planetscale.com/blog/introducing-webhooks</id>
        <published>2023-10-26T12:00:00.000Z</published>
        <updated>2023-10-26T12:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[You can now automatically trigger a webhook to get notified when certain events occur inside PlanetScale.
Webhooks are an effective way to build integrations between systems that don't necessarily have strong ties with each other. A webhook, which is an outbound HTTP POST callback, can be triggered by various events. With webhooks, you can get near real-time data through automation. While an API must be polled by external services, webhooks can push information when an event occurs.
Along with the PlanetScale CLI and API, webhooks are another feature that makes the PlanetScale database platform extendable and easy to customize for your workflows.
PlanetScale webhooks in action
Webhooks in PlanetScale can be helpful in a variety of scenarios, for example:
Creating notifications in Slack, Microsoft Teams, GitHub, and other tools
Integrating with CI/CD processes for the automation of schema changes
Updating external issue trackers like Jira
These examples can be built with the events we are releasing today for webhooks. You can trigger a webhook for the following events in PlanetScale:
branch.ready: The branch is created and ready to connect.
branch.sleeping: The branch is now sleeping.
deploy_request.opened: The deploy request has been opened.
deploy_request.queued: The deploy request has been added to the deploy queue.
deploy_request.in_progress: The deploy request has started running.
deploy_request.schema_applied: The deploy request has finished applying the schema.
deploy_request.errored: The deploy request has stopped due to an error.
deploy_request.reverted: The deploy request has been reverted.
deploy_request.closed: The deploy request has been closed.
If there is an event you want to use that is not included in this list, please contact us and let us know what event you want to trigger a webhook on.
PlanetScale Zapier example
Webhooks can notify your team about new and running deploy requests for your database. One way we've been using them internally is via Zapier and Slack. We've set up a Zapier webhook trigger, which gives us a URL to send our hooks to. In PlanetScale, we created a new webhook that triggers on all of the deploy_request events. Each time a webhook is received, we have Zapier post a message into a Slack channel notifying us of the progress of a deploy request.

With the data available in the webhook, we were able to produce messages like this:
DR 16: deploy_request.schema_applied by Mike Coutermarsh - https://app.planetscale.com/mike/example-db/deploy-requests/16
This keeps our team informed right in Slack as changes are made to our database.
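If you would rather build the message in your own code than in Zapier, the format shown above is straightforward to reproduce. This is only a sketch: the method and its keyword arguments are hypothetical names, not fields of the webhook payload, so map them from whatever your webhook handler actually receives.

```ruby
# Hypothetical helper: builds a Slack-style message in the format shown above.
# The keyword names are illustrative and are not the webhook payload's field names.
def deploy_request_message(number:, event:, actor:, url:)
  "DR #{number}: #{event} by #{actor} - #{url}"
end

puts deploy_request_message(
  number: 16,
  event: "deploy_request.schema_applied",
  actor: "Mike Coutermarsh",
  url: "https://app.planetscale.com/mike/example-db/deploy-requests/16"
)
```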
PlanetScale API and webhooks
With webhooks, you no longer need to poll the API for updates, like the latest status of a deploy request or when a branch is ready. If you need to take action or need more data, each webhook event pairs nicely with the PlanetScale API for further information or action.
Getting started with webhooks in PlanetScale
If you are on a Base plan and are a database administrator in your organization, you can start using webhooks today. You can create a webhook by visiting your database's settings page.
If you are on an Enterprise plan, contact your account manager to enable webhooks in your organization.
See our webhooks overview docs for how to create and validate a webhook and what events can trigger a webhook.
Webhooks beta
Currently, webhooks are in beta. During this beta phase, we want to hear from you. We picked the webhook events based on the feedback we have heard from users. If you want to see other events for future integrations, please contact us to share. Also, we would love to see what you are building on top of webhooks! Let us know on Twitter by tagging @planetscale!]]></content>
        <summary><![CDATA[You can now automatically trigger HTTP callbacks on events in PlanetScale to build custom integrations, notifications, and workflows.]]></summary>
      </entry>
    
      <entry>
        <title>What is MySQL replication and when should you use it?</title>
        <link href="https://planetscale.com/blog/what-is-mysql-replication-and-when-should-you-use-it" />
        <id>https://planetscale.com/blog/what-is-mysql-replication-and-when-should-you-use-it</id>
        <published>2023-10-25T15:00:00.000Z</published>
        <updated>2023-10-25T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Have you heard about MySQL replication but aren’t sure exactly why you should care?
Having multiple servers for any workload is typically considered best practice. After all, a workload split across multiple servers helps balance out the performance of any application. When it comes to working with your database, though, the benefits may not be as clear.
In this article, you’ll learn about five real-world use cases for implementing MySQL replication.
What is MySQL replication?
Before we get into its use cases, let us briefly describe what MySQL replication is.
MySQL replication is a process that is used to keep multiple MySQL servers in sync. When you first set up a MySQL environment, it is typically with a single server to run your databases. One approach to scaling your database environment is to configure additional servers to contain copies of your database (replicas) that match the primary MySQL server (source).
As data is updated, written to, or deleted from the source, those changes are also dispatched to the replicas.

MySQL replication use cases
Now that we've defined database replication, let's look at when you should use it. The following are some common scenarios where you may consider database replication for improved performance: read-only connections, backups, analytical workloads, high availability, and planned upgrades.
Read-only connections
While a MySQL cluster can contain multiple active database servers, replication is typically configured in an active/passive configuration.
In this setup, the "active" server is the one that all requests will be sent to by default, and the "passive" servers are replicas that contain a read-only copy of the database. While you can't make changes to read-only replicas, you can still connect to them to read data.
Many applications are more read-heavy than write-heavy. In these applications, setting up a read-only connection can help balance the load between the database servers, leading to higher performance overall. One thing to be aware of is that replicas may lag slightly behind the source, so keep this in mind if your application requires that data be immediately available after being written.
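As a rough illustration of the idea, here is a minimal sketch of routing reads to replicas and writes to the source. The class and its names are hypothetical, servers are represented as plain strings, and the statement check is deliberately naive (it would also send a locking read such as SELECT ... FOR UPDATE to a replica), so a real application should rely on its framework or database driver for this.

```ruby
# Hypothetical router: sends plain reads to replicas and everything else to the source.
class ConnectionRouter
  READ_STATEMENT = /\A\s*(SELECT|SHOW)\b/i

  def initialize(source:, replicas:)
    @source = source
    @replicas = replicas
    @next_replica = 0
  end

  # Returns the server that should run `sql`.
  # Pass `fresh: true` for reads that must see a write made moments ago,
  # since replicas may lag behind the source.
  def pick(sql, fresh: false)
    return @source if fresh || @replicas.empty? || !sql.match?(READ_STATEMENT)

    # Rotate through the replicas in round-robin order.
    replica = @replicas[@next_replica % @replicas.size]
    @next_replica += 1
    replica
  end
end

router = ConnectionRouter.new(source: "source-db", replicas: ["replica-1", "replica-2"])
puts router.pick("SELECT * FROM users")               # replica-1
puts router.pick("UPDATE users SET tagline = ''")     # source-db
puts router.pick("SELECT * FROM users", fresh: true)  # source-db
```

The `fresh` flag is one simple way to handle the replication delay mentioned above: reads that must reflect a just-completed write go to the source.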
Backups
Backing up your database is critical and should not be taken lightly.
If something happens and your data gets corrupted or modified unintentionally, good backups can be the difference between your application going down for a period of time or your business collapsing. As important as backups are, they can create a substantial load on your database server. Since replicas typically contain a copy of your database, you can configure your backup system to take a snapshot of a replica.
This can reduce the load on your overall database infrastructure, reducing any kind of performance hit users might see while backing up your database.
Analytical workloads
In analytics solutions, the data from your database is often scanned at a specific point in time and transferred to a data warehouse.
While this process runs, users of your database may notice performance degradation as data is loaded into the data warehouse. Replicas are an excellent solution to get around this performance hit. By configuring your analytics solutions to read from a replica and not the primary database server, users of your application won’t be affected by this process.
This (and the previous use cases) can be performed on one or even multiple replicas!
High availability
Deviating from read-only workloads, a well-architected replicated environment can help your application stay online.
It’s inevitable that all hardware will fail at one point or another, and your database server is no exception. Even if you have good backups, if your database goes down and it takes several hours to stand up a new server and restore the data, you’re still looking at a potentially lengthy outage. Luckily, with replication enabled, it's as simple as performing a configuration change on a replica to promote it to a new primary server, allowing for data writes.
Paired with a good load-balancing solution, you can potentially bring your database back online in minutes instead of hours or even days!
Planned upgrades
As new versions of MySQL are released, your IT teams should have a strategy to keep your servers updated with the latest version.
In a single-server environment, this often includes maintenance windows where your application is taken offline so the database can be updated. If you have replication configured, you can actually test and validate upgrades on your replicas to ease the pain of performing upgrades. This can also facilitate a rolling upgrade situation where your replicas are upgraded, a replica is promoted to the primary, and finally, the original source is updated.
With this approach, you can perform major upgrades to your MySQL environment with little to no downtime.
If you enjoyed this article, you might like our overview on MySQL partitioning.]]></content>
        <summary><![CDATA[Learn about what database replication is and some real-world use cases of MySQL replication that can benefit your database scalability strategy.]]></summary>
      </entry>
    
      <entry>
        <title>Sync user data between Clerk and a PlanetScale MySQL database</title>
        <link href="https://planetscale.com/blog/sync-user-data-between-clerk-and-a-planetscale-database" />
        <id>https://planetscale.com/blog/sync-user-data-between-clerk-and-a-planetscale-database</id>
        <published>2023-10-20T15:00:00.000Z</published>
        <updated>2023-10-20T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[A common strategy of modern development, especially in serverless applications, is to offload user management to third-party authentication services, commonly referred to as Identity Providers (IdP). While this shifts the focus and responsibility of managing your users’ data to another organization, implementing this strategy brings several tradeoffs:
You don’t have direct access to user data within your own systems.
You can’t perform secondary actions when a user changes their data in some way.
Any API calls where the user’s info is used would require you to call the IdP, making the responses take longer.
In this article, you’ll learn how to address each of these issues using webhooks, a pattern where the IdP can send an HTTP call to your API to inform you of any changes made on their end. To do this, you’ll be using a combination of Clerk, Netlify, and PlanetScale.
For the demo, we’ll be using Orbytal.ink, an open-source “link in bio” web application that uses the services listed above. By the end of this guide, Clerk will be configured to send an HTTP request to a Netlify Function designed specifically for accepting these requests whenever a user is created, updated, or deleted.
To follow along, make sure you have the following:
A PlanetScale account and the PlanetScale CLI installed and configured
A GitHub account
A Netlify account
A Clerk account
Set up the project
Create a PlanetScale database
Start by creating a PlanetScale database using the PlanetScale CLI.
pscale database create orbytalink

Next, create a password for that database. The name is arbitrary, but the following command uses “defaultpass”:
pscale password create orbytalink main defaultpass

Take note of the username and password — you’ll be using those values when setting up Netlify.
Now, open a shell session to the new database you just created:
pscale shell orbytalink main

To configure the tables for the database, run the following statements in the pscale shell session:
CREATE TABLE `users` (
	`id` bigint unsigned PRIMARY KEY AUTO_INCREMENT,
	`username` varchar(120),
	`tagline` varchar(250),
	`display_name` varchar(250),
	`img_url` varchar(500)
);
CREATE TABLE `blocks` (
	`id` bigint unsigned PRIMARY KEY AUTO_INCREMENT,
	`url` varchar(200),
	`type` int,
	`user_id` int,
	`label` varchar(200)
);

For a high level overview on PlanetScale, check out our Introduction to PlanetScale Tech talk.
Set up a Clerk application
Head over to Clerk and create a new application. Make sure to only check "Username" in the "How will your users sign in?" section, then click "Create application".

Next, you’ll have to configure a few more options before the project is properly configured. Select "Users & Authentication" > "Email, Phone, Username" on the left side. Then click the "gear" next to "Email address".

In the modal, enable the "Require" option and make sure "Email verification code" is enabled. Everything else should be disabled. Click "Continue" to accept the settings.

Scroll down a bit and enable "Name", then click the "gear icon".

In the modal, toggle "Require" to on and click "Continue".

Scroll to the bottom of the page and click "Apply changes".
Now you’ll need to grab the API keys for the project so the application will work when deployed. Select "API Keys" on the left and take note of both the "Publishable key" and the "Secret key". You’ll need these values as well when setting up Netlify.
Anyone who has these values can access your Clerk project, so make sure to keep these values secret!

Fork the project and deploy to Netlify
Next, log into GitHub and fork the orbytal.ink project to your own account. Make sure "Copy the main branch only" is NOT enabled as you’ll want all branches forked into your account.
Log into your Netlify account, and click "Add new site" > "Import an existing project".

Select "Deploy with GitHub".

Next, select "Orbytal.ink" from the list of projects in your account.

Make sure to select the clerk-blog-post branch under "Branch to deploy". Then, scroll to the bottom and click "Deploy Orbytal.ink".

Once the initial deployment is done, you’ll need to configure a few environment variables. Select "Site configuration" from the sidebar, and then "Environment variables". Add the following variables:
DATABASE_URL: The connection string for your PlanetScale database, formatted as mysql://<DB_USERNAME>:<DB_PASSWORD>@aws.connect.psdb.cloud/orbytalink. Replace DB_USERNAME and DB_PASSWORD with the values created earlier in the guide.
CLERK_API_KEY: The "Secret key" acquired when configuring the Clerk project.
VITE_CLERK_PUBLISHABLE_KEY: The "Publishable key" acquired when configuring the Clerk project.
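If it helps to see the DATABASE_URL format spelled out, here is a minimal TypeScript sketch that assembles it. The host and database name come from this guide. The buildDatabaseUrl helper and the placeholder credentials are hypothetical, and percent-encoding the credentials is a precaution I'm assuming is wanted rather than something the project requires.

```typescript
// Hypothetical helper: assembles the DATABASE_URL value described above.
// The host and database name come from this guide; the function name is illustrative.
function buildDatabaseUrl(username: string, password: string): string {
  const host = 'aws.connect.psdb.cloud'
  const database = 'orbytalink'
  // Percent-encode credentials so characters such as "@" or "/" cannot break the URL
  const user = encodeURIComponent(username)
  const pass = encodeURIComponent(password)
  return `mysql://${user}:${pass}@${host}/${database}`
}

// Example with placeholder credentials (not real values)
const exampleUrl = buildDatabaseUrl('my_user', 'pscale_pw_example')
console.log(exampleUrl)
```

Paste the resulting string into the DATABASE_URL field in Netlify rather than committing it to the repository.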

Finally, re-deploy the project with the new settings by going back to "Deploys", and then "Trigger deploy" > "Deploy site".

To test that your deployment works, navigate to the URL provided by Netlify and you should see the following if everything was built and deployed correctly.

Add webhooks
Now that your project is up and running, let's add the webhook so we can pass specific user information to the PlanetScale database as users are created, updated, or deleted.
Configure Clerk webhooks
In the overview of the Clerk project, select "Webhooks" from the sidebar, and then "Add endpoint".

Enter the "Endpoint URL" formatted as <NETLIFY_URL>/.netlify/functions/clerk_webhook, select the topmost user element under "Message Filtering", and scroll to the bottom and click "Create".

Create and deploy the Netlify Function
Next, you’ll need to create a Netlify Function that will act as the endpoint to receive messages from our Clerk project. Luckily, all of this work can be done directly within GitHub. With the project open in GitHub, start by switching to the clerk-blog-post branch.

Click "Add file" > "Create new file".

Name the file functions/clerk_webhook.ts and paste the following into that file. Note the comments in the code that describe what the important parts of the function do.
// functions/clerk_webhook.ts
import { HandlerEvent, HandlerContext } from '@netlify/functions'
import { getDb } from './utils/lib'
import { blocks, users } from './utils/db/schema'
import { eq } from 'drizzle-orm'
// This type describes the structure of the incoming webhook
type ClerkWebhook = {
  data: {
    first_name: string
    last_name: string
    image_url: string
    username: string
  }
  type: string
}
const handler = async (event: HandlerEvent, context: HandlerContext) => {
  if (event.body) {
    // 👉 Parse the incoming event body into a ClerkWebhook object
    const webhook = JSON.parse(event.body) as ClerkWebhook
    try {
      const db = getDb()
      // 👉 `webhook.type` is a string value that describes what kind of event we need to handle

      // 👉 If the type is "user.updated" the important values in the database will be updated in the users table
      if (webhook.type === 'user.updated') {
        await db
          .update(users)
          .set({
            display_name: `${webhook.data.first_name} ${webhook.data.last_name}`,
            img_url: webhook.data.image_url
          })
          .where(eq(users.username, webhook.data.username))
      }

      // 👉 If the type is "user.created" create a record in the users table
      if (webhook.type === 'user.created') {
        await db.insert(users).values({
          display_name: `${webhook.data.first_name} ${webhook.data.last_name}`,
          img_url: webhook.data.image_url,
          username: webhook.data.username
        })
      }

      // 👉 If the type is "user.deleted", delete the user record and associated blocks
      if (webhook.type === 'user.deleted') {
        const dbuser = await db.query.users.findFirst({
          where: eq(users.username, webhook.data.username)
        })
        console.log('dbuser', dbuser)
        if (dbuser) {
          await Promise.all([
            db.delete(users).where(eq(users.id, dbuser.id)),
            db.delete(blocks).where(eq(blocks.user_id, dbuser.id))
          ])
        }
      }

      return {
        statusCode: 200
      }
    } catch (err) {
      console.error(err)
      return {
        statusCode: 500
      }
    }
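  }
  // 👉 Requests without a body have nothing to process, so reject them explicitly
  return {
    statusCode: 400
  }
}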
export { handler }

Click on "Commit changes" to open a modal. Feel free to add an extended description if you like, and click "Commit changes" in the modal to create the file in the branch.

Once the file is saved, Netlify should automatically deploy the latest version of the web application.
Test the function
Now that the new code is deployed, we can test the three main operations that were configured in Clerk. Open the web app using the Netlify address and click "Create your profile".

Create an account using your own email address. When prompted for a verification code, grab it from your email and enter it.

If you get a message stating "The authentication settings are invalid", be sure to double check your configuration in Clerk.
Once the account is created, you should be redirected back to your version of Orbytalink asking for some more details. Add a tagline and click "New block". Select "Twitter" and enter your Twitter username. Finally, click "Save".

You should be redirected to your profile, which shows not only the tagline and Twitter block but also the name and username you entered into the Clerk sign-up form. That’s because when you created your account, Clerk sent a message to the Netlify Function you created earlier (clerk_webhook.ts), which saved this information into the PlanetScale database.

If you explore the Netlify function that is used by the home page to retrieve user data, you’ll notice that there are no API calls to Clerk. It grabs information directly from the PlanetScale database and returns it to the React front end.
// functions/profiles.ts
import { HandlerEvent, HandlerContext } from '@netlify/functions'
import { users } from './utils/db/schema'
import { eq } from 'drizzle-orm'
import { createResponse } from './utils/netlify_helpers'
import { getDb } from './utils/lib'
const handler = async (event: HandlerEvent, context: HandlerContext) => {
  const { username } = event.queryStringParameters as any
  const db = getDb()
  if (username) {
    const user = await db.query.users.findFirst({
      where: eq(users.username, username),
      with: {
        blocks: true
      }
    })
    return createResponse(200, user)
  } else {
    const user_rows = await db.select().from(users).limit(30)
    return createResponse(200, user_rows)
  }
}
export { handler }

The following sequence diagram explains exactly how this overall system works:
The user creates an account in Clerk.
Clerk sends a message to the Netlify function once the user is created.
Netlify writes that user’s info to the PlanetScale database.

Using this flow, we can utilize Clerk to handle authentication and user management, and still have the users’ information available to us directly in our PlanetScale database!
Conclusion
Webhooks are extremely useful when using third-party systems where you need to be notified if a specific event happens in a place you don’t have full access to. In this guide, I showed you how you can use webhooks to receive user information from an IdP, but that’s only one example where these can be used in a production system.
Have you used webhooks in your own projects? Let us know on Twitter by tagging @planetscale!
If you enjoyed this article, you might also like our comprehensive guide on integrating AWS Lambda functions with PlanetScale.]]></content>
        <summary><![CDATA[Learn how to sync user data from a Clerk project into your PlanetScale MySQL database with webhooks using Netlify and Netlify Functions.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing database reports</title>
        <link href="https://planetscale.com/blog/introducing-database-reports" />
        <id>https://planetscale.com/blog/introducing-database-reports</id>
        <published>2023-10-16T16:01:00.000Z</published>
        <updated>2023-10-16T16:01:00.000Z</updated>
        
        <author>
          <name>Frances Thai</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[There is a new way to stay up-to-date with your PlanetScale databases: database reports curated by us and delivered weekly to your inbox.
Starting today, you will have a report for your most active database in your inbox.
What's in the report?
These weekly reports give you an easy way to see database performance and activity and make this information actionable. Initially, these reports will consist of:
Slow database queries

Powered by Query Insights, the report will show your database's slowest queries in the last week. Clicking on these queries brings you directly into Query Insights, where you can find detailed information to help you make your queries more performant.
Deploy requests

The report also helps you understand what's changed in your database in the last week. You'll see deploy requests from the previous week, what schema changes were made, and have easy access to the deploy request itself.
Database storage changes

Whether your database storage has increased or decreased, you'll see a table-by-table summary of the current storage size and how much that size has changed in the last week.
Managing your weekly reports

Initially, all users will be subscribed to weekly reports for the most active database in their organizations.
While we believe these reports will be helpful to you, we don't like getting too many emails either. That's why we've made it easy to unsubscribe from these reports if you need to take a break. You can unsubscribe directly from the email or in your user settings.
When you're interested in receiving weekly reports again or in receiving weekly reports about any of your databases, you can subscribe from the same settings page.
What's next?
We're always looking for ways to help you succeed by getting the most out of your PlanetScale databases.
Look out for future iterations and additions to the weekly review email, and if you have additional questions or have other metrics you would like to see in your weekly reports, please don't hesitate to contact us.]]></content>
        <summary><![CDATA[Bringing weekly reports of the most important parts of your database, straight to your inbox.]]></summary>
      </entry>
    
      <entry>
        <title>Distributed caching systems and MySQL</title>
        <link href="https://planetscale.com/blog/distributed-caching-systems-and-mysql" />
        <id>https://planetscale.com/blog/distributed-caching-systems-and-mysql</id>
        <published>2023-10-11T15:00:00.000Z</published>
        <updated>2023-10-11T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[As the usage of your app grows, performance can steadily decline.
There’s nothing necessarily surprising about that statement, but what is surprising is the number of bottlenecks that can surface and the options available to you to fix them. One such bottleneck can be directly related to the time it takes to read from and write to your database. After all, behind the complexities of a relational database, you’re still working with storage systems that have inherent IO latency.
This is where a well-architected caching system can help.
A good caching system can reduce the load on your database and increase the general performance of your application. A faster application results in happier users and potentially more revenue, which is always a good thing! However, caching systems have their own setup complexities, along with a number of gotchas that might creep up unexpectedly.
In this article, we’ll explore backend caching, how to implement it, how caching works with MySQL, and potential issues to watch out for.
What is a cache?
At its very core, a cache is a component that stores data so that the data may be accessed more quickly.
Caches can be either hardware or software-based. Oftentimes, caching systems store data in memory, which allows accessing and manipulating that data to be much faster since it’s not working with traditional storage systems like solid state or spinning hard drives. These caching systems can be run locally on a server but can also be configured to work independently on their own system.
Caches that run independently of other services are known as distributed caches.
Redis and Memcached are two very popular distributed caching systems that can be accessed by external systems. These systems leverage the memory of the machine they are running on to store data in a key/value setup, allowing developers to quickly retrieve data based on a specific key. They can even be configured on multiple servers as a cluster, adding to their distributed nature.
So how can caching and MySQL work together for a faster application?
Want to learn about how Instagram scaled performance in the early days? Check out this Database Caching Tech talk from Rick Branson, Instagram's first full-time backend engineer, where he spills it all.
Implement caching into a MySQL environment
Older versions of MySQL did some minimal caching in the form of the query cache, but it left much to be desired and was removed in MySQL 8.0.
A caching system is only useful once it holds data. MySQL does not have any built-in mechanisms to hydrate (or fill) a cache, so the responsibility for this task lies in the application code. Ultimately, you’ll need to build a system that returns data from the cache if it exists there, or returns it from the database and hydrates the cache if it doesn’t.
There are two common patterns that can be used when designing a caching system: look-aside and look-through.
Example: retrieving follower count for a user
For the examples that follow, we’ll be using the following database diagram. It mimics a social media platform with two tables: users and followers.

Each example will show how caching can improve load times for viewing the number of followers a given user has. This may sound like a relatively simple problem, but consider the load that would be placed on a database if a user has a particularly high number of followers. Every time the user’s profile is viewed, a SELECT COUNT query would have to be run against a table to simply return a number.
The query to look up the number of followers would be this:
SELECT COUNT(*) FROM followers WHERE follower_id = ?

Look-aside cache
A look-aside cache is a system that sits outside of the data access path of your database.
Typically this setup has two distinct steps in its workflow. Using our example of retrieving follower count, the application would first check to see if the cache contains the follower count for the requested user. If the cache does not contain that information (this is known as a cache miss), the code would grab the value directly from the database, populate the cache for future requests, and finally, return it to the caller.
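The look-aside flow can be sketched in TypeScript as follows. This is a minimal in-memory model, not production code: the Map objects stand in for the followers table and for a distributed cache such as Redis, and names like countFollowersInDb and getFollowerCount are hypothetical.

```typescript
// Stand-in for the followers table: user id -> follower count (illustrative data)
const followerCounts = new Map<number, number>([[1, 1120]])
let dbQueries = 0

// Hypothetical database call; in a real app this would run the SELECT COUNT(*) query
async function countFollowersInDb(userId: number): Promise<number> {
  dbQueries++
  return followerCounts.get(userId) ?? 0
}

// Stand-in for a distributed cache
const cache = new Map<string, number>()

async function getFollowerCount(userId: number): Promise<number> {
  const key = `followers:${userId}`
  const cached = cache.get(key)
  if (cached !== undefined) {
    return cached // cache hit: the database is never touched
  }
  // Cache miss: the application reads from the database...
  const count = await countFollowersInDb(userId)
  // ...and populates the cache itself for future requests
  cache.set(key, count)
  return count
}
```

Calling getFollowerCount(1) twice only queries the database once. The second call is served from the cache.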
Here is what that flow might look like visually:

Look-through cache
A look-through cache is a system that sits in line with the data access path, in front of the database.
This scenario would have the code hit the cache directly. If a cache miss is experienced, it would be up to the cache software to request the data from the database and populate itself. During this time, the caller would wait for that part of the process to complete before returning to the user.
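A rough TypeScript sketch of the look-through idea is below. The key difference from look-aside is that the loader lives inside the cache, so callers only ever talk to the cache. The ReadThroughCache class and its loader are hypothetical, and the loader returns hard-coded data where a real one would query MySQL.

```typescript
// Hypothetical loader signature: how the cache fetches a missing value from the database
type Loader<V> = (key: string) => Promise<V>

class ReadThroughCache<V> {
  private store = new Map<string, V>()
  private load: Loader<V>

  constructor(load: Loader<V>) {
    this.load = load
  }

  async get(key: string): Promise<V> {
    if (this.store.has(key)) {
      return this.store.get(key) as V
    }
    // On a miss the cache, not the caller, requests the data and populates itself
    const value = await this.load(key)
    this.store.set(key, value)
    return value
  }
}

let loads = 0
const followerCache = new ReadThroughCache<number>(async (key) => {
  loads++ // counts how often the "database" is consulted
  return key === 'followers:1' ? 1120 : 0
})
```

Application code calls followerCache.get('followers:1') and waits for the result, without knowing whether the value came from memory or from the database.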
This diagram demonstrates what a look-through cache setup might look like:

Potential issues with caching
In the following sections, we’ll explore some potential issues with caching systems, along with ways that these problems can be mitigated.
Inconsistent data
One issue is that data within the cache is not accurate or up to date.
Let's say our caller requests the follower count at 9:00 am and a value of 1,120 is returned. Then at 9:15 am, that user gains an extra 1,500 followers because a post went viral. You’ll need to consider a method by which the cache is updated.
The first potential solution is to simply update the cache whenever a new follower is added.
The benefit of this approach is that it is relatively straightforward. Instead of only writing the new follower to the database, you’d also increment the value currently stored in the cache. Where this becomes problematic is that for those 1,500 new followers, you’re updating your cache (and your database) 1,500 times.
The second solution is to store a cache expiration with each value.
In this scenario, you could have a cache expiration value of 20 minutes. When the cache is populated at 9:00 am, that value will stay in the cache until 9:20 am. When the user’s follower count increases by 1,500, any callers between 9:15 am and 9:20 am will receive the old count, but it's a short enough window that it may be an acceptable price for avoiding systemic issues with your architecture.
Each solution has its tradeoffs, which is something to consider for your environment when configuring a cache.
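To make the expiration approach concrete, here is a small TypeScript sketch of a cache that stores an expiry time with each value. The TtlCache class is hypothetical, and the clock is injected so the 9:00 am to 9:20 am timeline from the example can be simulated without waiting.

```typescript
type Entry = { value: number; expiresAt: number }

class TtlCache {
  private entries = new Map<string, Entry>()
  private ttlMs: number
  private now: () => number

  // `now` is injectable so the example can simulate the passage of time
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now
  }

  set(key: string, value: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }

  get(key: string): number | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key) // expired: treat as a cache miss
      return undefined
    }
    return entry.value
  }
}

// Simulated clock, in minutes past 9:00 am
let minutes = 0
const ttlCache = new TtlCache(20 * 60_000, () => minutes * 60_000)
ttlCache.set('followers:1', 1120) // populated at 9:00 am
minutes = 15
const at915 = ttlCache.get('followers:1') // still the old count
minutes = 20
const at920 = ttlCache.get('followers:1') // expired, so the caller must re-read the database
```

Dedicated caches such as Redis offer key expiration as a built-in feature, so in practice you would usually set a TTL on the key rather than track expiry yourself.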
Thundering herd
The thundering herd problem refers to a time in which too many callers are trying to update the cache at the same time, causing performance issues.
Using the same example from the previous section, let's assume you had 3,000 clients attempt to request data from the cache at exactly 9:20 am. All 3,000 processes would determine that the cache is expired and they’d attempt to rehydrate the cache simultaneously. Not only could this cause issues, but it's entirely unnecessary.
A solution to this issue would be to configure what's called a “cache lease.”
Essentially, each client would have a unique identifier. When the first caller attempts to rehydrate the cache, the system will note its unique ID as the process responsible for updating that specific value. All other clients would receive the old value while the cache is being updated.
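The lease idea can be sketched in TypeScript like this. It is a single-process model under simplifying assumptions: the LeasedValue class and client IDs are hypothetical, and a real distributed cache would need an atomic operation to hand out the lease across machines.

```typescript
type Loader = () => Promise<number>

class LeasedValue {
  private value: number
  private leaseHolder: string | null = null

  constructor(initial: number) {
    this.value = initial
  }

  // Called by a client that found the cached value expired
  async refresh(clientId: string, load: Loader): Promise<number> {
    if (this.leaseHolder !== null) {
      // Another client holds the lease: serve the old value instead of hitting the database
      return this.value
    }
    this.leaseHolder = clientId // take the lease
    try {
      this.value = await load()
      return this.value
    } finally {
      this.leaseHolder = null // release the lease once the refresh finishes
    }
  }
}

// 3,000 clients notice the expired value at the same moment
let dbReads = 0
const followers = new LeasedValue(1120)
const loadFromDb: Loader = async () => {
  dbReads++
  return 2620
}
const allResults = Promise.all(
  Array.from({ length: 3000 }, (_, i) => followers.refresh(`client-${i}`, loadFromDb))
)
```

In this model only the first client triggers a database read. The other 2,999 receive the previous count while the refresh is in flight.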
Once the value is updated, the lease is released until the value in the cache expires again, and future requests will receive the most up-to-date value.]]></content>
        <summary><![CDATA[Learn what distributed caching systems are, how they complement MySQL databases, and potential issues you might face when implementing them.]]></summary>
      </entry>
    
      <entry>
        <title>What is MySQL partitioning?</title>
        <link href="https://planetscale.com/blog/what-is-mysql-partitioning" />
        <id>https://planetscale.com/blog/what-is-mysql-partitioning</id>
        <published>2023-10-10T15:00:00.000Z</published>
        <updated>2023-10-10T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[When the performance of your database server starts to decline from general usage, you have several options for optimizing it.
One common method for optimizing your MySQL database is through partitioning. In this article, we’ll cover the basics of MySQL partitioning, how to apply partitioning to your database, and we'll discuss how it’s related to sharding.
The basics of MySQL partitioning
Partitioning is the idea of splitting something large into smaller chunks. In MySQL, the term “partitioning” means splitting up individual tables of a database.
When you partition a table in MySQL, the table is split up into several logical units known as partitions, which are stored separately on disk. When data is written to the table, a partitioning function will be used by MySQL to decide which partition to store the data in. The value for one or more columns in a given row is used for this sorting process.
MySQL provides several partitioning functions out of the box, a few of which we’ll explore in the next section.

How to partition your tables in MySQL
There are a variety of methods available to database developers to partition their data based on the needs of the application.
Partitioning with RANGE
One of the simplest examples is the RANGE partitioning strategy. By using RANGE, you provide a set of numerical ranges that MySQL will use to place the data. Each partition you create gets an upper bound via VALUES LESS THAN, and MAXVALUE can be used to catch everything above the highest bound.
The following CREATE TABLE sample uses the concept of a library, where books are sorted into separate partitions based on their publication_year:
CREATE TABLE library_books (
    book_id INT AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255),
    publication_year INT NOT NULL,
    isbn VARCHAR(13),
    genre VARCHAR(50),
    checked_out BOOLEAN DEFAULT FALSE,
    checked_out_date DATE,
    due_date DATE,
    shelf_location VARCHAR(50),
    -- MySQL requires the partitioning column to be part of every unique key, including the primary key
    PRIMARY KEY (book_id, publication_year)
)
PARTITION BY RANGE (publication_year) (
    PARTITION p0 VALUES LESS THAN (2001),
    PARTITION p1 VALUES LESS THAN (2011),
    PARTITION p2 VALUES LESS THAN (2021),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);
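To illustrate how the VALUES LESS THAN bounds route a row, here is a rough TypeScript model of the rule. It is only a mental model of the bounds in the statement above, not how MySQL implements partitioning internally, and pickPartition is a hypothetical name.

```typescript
// Upper bounds copied from the PARTITION BY RANGE example; Infinity stands in for MAXVALUE
const partitions: { name: string; lessThan: number }[] = [
  { name: 'p0', lessThan: 2001 },
  { name: 'p1', lessThan: 2011 },
  { name: 'p2', lessThan: 2021 },
  { name: 'p3', lessThan: Infinity }
]

// Returns the first partition whose upper bound is greater than the value
function pickPartition(publicationYear: number): string {
  for (const p of partitions) {
    if (publicationYear < p.lessThan) return p.name
  }
  throw new Error(`No partition accepts ${publicationYear}`)
}

console.log(pickPartition(2007)) // a book published in 2007 lands in p1
```

Because each bound is exclusive, a book published in 2000 goes to p0 while one published in 2001 goes to p1.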

Partitioning with LIST
Another relatively simple example is the LIST strategy.
When using LIST, you’d provide a fixed set of values along with each partition to inform MySQL how to store the data. This strategy works well in situations where you know exactly what values your columns should contain. One of the downsides, however, is that attempting to insert a row with an invalid value will cause an error.
The following statement shows the same example as above but partitioned based on a set of known author values. Because author is a string column, it uses the LIST COLUMNS variant; plain LIST only accepts integer expressions.
CREATE TABLE library_books (
    book_id INT AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255) NOT NULL,
    publication_year INT,
    isbn VARCHAR(13),
    genre VARCHAR(50),
    checked_out BOOLEAN DEFAULT FALSE,
    checked_out_date DATE,
    due_date DATE,
    shelf_location VARCHAR(50),
    -- The partitioning column must be part of the primary key
    PRIMARY KEY (book_id, author)
)
PARTITION BY LIST COLUMNS (author) (
    PARTITION p0 VALUES IN ('William Shakespeare', 'Jane Austen', 'George Orwell'),
    PARTITION p1 VALUES IN ('J.K. Rowling', 'Agatha Christie', 'Stephen King'),
    PARTITION p2 VALUES IN ('J.R.R. Tolkien', 'Gabriel García Márquez', 'Toni Morrison'),
    PARTITION p3 VALUES IN ('Haruki Murakami', 'Neil Gaiman', 'Chimamanda Ngozi Adichie')
);

Partitioning with KEY
So what happens if you want to take advantage of partitioning but just want to let MySQL figure it out?
This is where the KEY partitioning strategy works well. By using KEY, you are letting MySQL use the primary key (or a unique key) of a table to determine how to sort the data. An internal algorithm will be used to determine how to best sort the data evenly across a specified number of partitions.
The following example uses KEY on the same library_books table from the previous examples:
CREATE TABLE library_books (
    book_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255),
    publication_year INT,
    isbn VARCHAR(13),
    genre VARCHAR(50),
    checked_out BOOLEAN DEFAULT FALSE,
    checked_out_date DATE,
    due_date DATE,
    shelf_location VARCHAR(50)
)
PARTITION BY KEY()
PARTITIONS 4;

These are only a few select types that can be used in partitioning. To explore all possible strategies, please refer to the MySQL docs.
Benefits and drawbacks of MySQL partitioning
Partitioning provides several benefits when applied correctly.
One of these benefits is helping MySQL query data in large tables. Using the RANGE example from above, if you wanted to query all books published in 2007, MySQL would only need to scan p1 and not the entire table to find the requested data. If you need to query across partitions, these operations can be performed in parallel, which can be even faster if the partitions are stored across multiple storage devices.
Another potential benefit of MySQL partitioning can be seen with select maintenance operations.
For example, truncating all of the data in a partition is faster than performing a DELETE ... WHERE statement on the same data. Backup operations can also be performed on individual partitions, which can reduce the load on the overall server. Rebuilding indexes or reclaiming unused space can also be done per partition instead of on the entire table.
Partitioning is no silver bullet when it comes to improving performance as there are several drawbacks.
Depending on the partitioning strategy you want to use, you may be limited on the available data types for columns you want to partition on. For instance, ENUMs are not permitted on most strategies. Additionally, you need to properly understand your access patterns as unbalanced partitions can limit the performance gains you’d get from partitioning in the first place.
If you are in the process of deciding whether to partition your data in MySQL or not, it’s worth performing the work on a test system to ensure it will provide a tangible benefit to your environment.
Sign up for our free Database scaling course to learn more about scaling strategies like partitioning, replication, sharding, and more!
Partitioning vs. sharding
It’s no secret that PlanetScale has a focus on the ability to shard databases, but how does that differ from partitioning?
The concepts behind partitioning and sharding are very similar. The key differences are that partitioning occurs on the same server and is supported by MySQL natively, whereas sharding a database splits tables across different servers and requires external mechanisms to achieve this. Since partitioning involves one server, it is considered scaling vertically, whereas sharding is scaling horizontally.
PlanetScale offers fully managed Vitess clusters with explicit horizontal sharding.
Each production database branch has at least one primary MySQL node along with one failover node. While sharding is not configured out of the box, it’s something that Vitess handles very well using a combination of a stateless proxy which routes queries to the proper MySQL node, and a topology server to keep the entire system aware of any changes to individual nodes.
Because we’re built with a focus on sharding and horizontal scalability, partitioning is not supported in PlanetScale.
Partitioning, in reality, is a stepping stone to greater performance. Since the focus is put on a single server, it still creates a single point of failure should something catastrophic happen to your MySQL server. Because of our expertise in sharding, partitioning adds little value to what we already do.
Combined with resource-based pricing, you get the best of both horizontal and vertical scaling with your PlanetScale database.
To explore another scalability method known as sharding, read "How does database sharding work" on our blog.]]></content>
        <summary><![CDATA[Learn the basics of MySQL partitioning, including partitioning with range, list, and key strategies, as well as how partitioning relates to database sharding.]]></summary>
      </entry>
    
      <entry>
        <title>MySQL High Availability: Connection handling and concurrency</title>
        <link href="https://planetscale.com/blog/mysql-high-availability-connection-handling-concurrency" />
        <id>https://planetscale.com/blog/mysql-high-availability-connection-handling-concurrency</id>
        <published>2023-10-10T13:00:00.000Z</published>
        <updated>2023-10-10T13:00:00.000Z</updated>
        
        <author>
          <name>Matthias Crauwels</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[Deep dive into MySQL’s connection handling mechanisms for optimal connection pooling and improved concurrency.]]></summary>
      </entry>
    
      <entry>
        <title>Personalizing your onboarding with Markdoc</title>
        <link href="https://planetscale.com/blog/personalizing-your-onboarding-with-markdoc" />
        <id>https://planetscale.com/blog/personalizing-your-onboarding-with-markdoc</id>
        <published>2023-10-06T12:00:00.000Z</published>
        <updated>2023-10-06T12:00:00.000Z</updated>
        
        <author>
          <name>Ayrton</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We recently released a new and improved onboarding flow for PlanetScale. The goal of our new onboarding is to help developers quickly connect and query their PlanetScale databases.
How to connect varies greatly by the language and framework your application uses. Each framework has small quirks, so it was important to us that no matter what language/framework your application uses, we offer a one-track path for you to get connected.

The key to building out this onboarding was Markdoc. This post will go into more detail about how we built our product onboarding using Markdoc.
More flexibility with Markdoc
Markdoc, created by Stripe, is a Markdown-based syntax for creating custom documentation sites.
One of the reasons we're able to ship so quickly at PlanetScale is because we prioritize easy-to-use tools that allow anyone at the company to contribute. For this reason, using Markdown and GitHub for our product onboarding was an obvious choice for us.
As we started building, we quickly realized we wanted to have more interactive and personalized content. Static Markdown content just wasn't going to cut it. This is when we had the idea to explore using Markdoc.
The Markdoc syntax is a superset of Markdown, specifically the CommonMark specification. This means you can write content in the Markdown you know and love, but also allows you to extend it to add custom attributes, custom tags, and use functions and/or variables.
Building out the onboarding
Let's dive into some of the code. The following snippet is part of the Markdown used to render a connection tutorial in the onboarding flow.

You'll notice that we are using variables here inside the Markdown, i.e., $user, $host, $database, and $password. The goal was for each path in onboarding to be custom to the user and as easy as possible for them to follow. Providing code that is easy to copy and paste, and specific to the selected framework, was key. This is where variables became important.
Variables in Markdoc
Variables let you customize your Markdoc documents at runtime. We use variables to inline the user's credentials directly into the content rather than using a static placeholder value.
This is a lot like Blade templates in Laravel and ERB templates in Ruby on Rails. The following snippet shows how we set up variables to populate those Markdown fields.
All credentials have been invalidated prior to publishing this. We are leaving the full, unblurred credentials in for clarity.
import React from 'react'
import { parse, renderers, transform } from '@markdoc/markdoc'

export default function Page() {
  const config = {
    variables: {
      host: 'us-east.connect.psdb.cloud',
      user: 'mpl0y3jv3a92h4qc4ufn',
      database: 'beam',
      password: 'pscale_pw_V8db13jnq5mrOWcGFn6GTs6AerDI7A0womsmnJ1qxOc',
      ssl_ca: '/etc/ssl/certs/ca-certificates.crt'
    }
  }

  const doc = `# Configure your application\n…`
  const ast = parse(doc)
  const content = transform(ast, config)
  const children = renderers.react(content, React, {})

  return <div>{children}</div>
}

Nodes in Markdoc
To help contextualize the file that the onboarding code snippets live in, we wanted to extend the code blocks to accept an additional file attribute. Markdoc nodes enable you to customize how your document renders without using any custom syntax.
The following example extends the code snippet from the previous section by adding a new fence node, which displays the filename above a code snippet.
import React from 'react'
import { parse, renderers, transform } from '@markdoc/markdoc'

export default function Page() {
  const config = {
    nodes: {
      fence: Fence.scheme
    },
    variables: {
      host: 'us-east.connect.psdb.cloud',
      user: 'mpl0y3jv3a92h4qc4ufn',
      database: 'beam',
      password: 'pscale_pw_V8db13jnq5mrOWcGFn6GTs6AerDI7A0womsmnJ1qxOc',
      ssl_ca: '/etc/ssl/certs/ca-certificates.crt'
    }
  }

  const doc = `# Configure your application\n…`
  const ast = parse(doc)
  const content = transform(ast, config)
  const children = renderers.react(content, React, { components: { Fence } })

  return <div>{children}</div>
}

function Fence({ children, file, language }) {
  return (
    <div>
      <div>{file}</div>

      <pre>
        <code className={`language-${language}`}>{children}</code>
      </pre>
    </div>
  )
}

Fence.scheme = {
  render: Fence.name,
  children: ['pre', 'code'],
  attributes: {
    file: {
      type: String
    },
    language: {
      type: String
    }
  }
}

Further customizations
One challenge we have seen users face is deciding which SSL certificate to use when connecting securely to PlanetScale. To address this, we built a common component that will swap out the certificate based on the user's detected operating system.
This also extends the snippet from the previous section, adding in functions to detect the user's operating system and return the correct string for the ssl_ca variable.
import React from 'react'
import { parse, renderers, transform } from '@markdoc/markdoc'

export default function Page({ userAgent }) {
  const platform = connectPlatform(userAgent)
  const sslCertificate = connectSslCertificate(platform)

  const config = {
    nodes: {
      fence: Fence.scheme
    },
    variables: {
      host: 'us-east.connect.psdb.cloud',
      user: 'mpl0y3jv3a92h4qc4ufn',
      database: 'beam',
      password: 'pscale_pw_V8db13jnq5mrOWcGFn6GTs6AerDI7A0womsmnJ1qxOc',
      ssl_ca: sslCertificate
    }
  }

  const doc = `# Configure your application\n…`
  const ast = parse(doc)
  const content = transform(ast, config)
  const children = renderers.react(content, React, { components: { Fence } })

  return <div>{children}</div>
}

function connectPlatform(userAgent) {
  userAgent = userAgent.toLowerCase()

  switch (true) {
    case /linux/.test(userAgent):
      return 'linux'
    case /mac/.test(userAgent):
      return 'mac'
    case /windows/.test(userAgent):
      return 'windows'
    default:
      return 'ubuntu'
  }
}

function connectSslCertificate(platform) {
  switch (platform) {
    case 'linux':
      return '/etc/ssl/certs/ca-certificates.crt'
    case 'mac':
      return '/etc/ssl/cert.pem'
    default:
      return '/etc/ssl/certs/ca-certificates.crt'
  }
}

function Fence({ children, file, language }) {
  return (
    <pre>
      <div>{file}</div>

      <code className={`language-${language}`}>{children}</code>
    </pre>
  )
}

Fence.scheme = {
  // …
}
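
To make the behavior of the two helpers easier to follow, here is a standalone sketch that runs them against a few sample user agent strings. The helper bodies mirror the snippet above; the user agent strings and the `samples` and `describe` names are illustrative assumptions, not values taken from our onboarding code.

```javascript
// Same detection logic as the snippet above, reproduced so this runs on its own.
function connectPlatform(userAgent) {
  userAgent = userAgent.toLowerCase()

  switch (true) {
    case /linux/.test(userAgent):
      return 'linux'
    case /mac/.test(userAgent):
      return 'mac'
    case /windows/.test(userAgent):
      return 'windows'
    default:
      return 'ubuntu'
  }
}

function connectSslCertificate(platform) {
  switch (platform) {
    case 'linux':
      return '/etc/ssl/certs/ca-certificates.crt'
    case 'mac':
      return '/etc/ssl/cert.pem'
    default:
      return '/etc/ssl/certs/ca-certificates.crt'
  }
}

// Hypothetical user agent strings, for illustration only.
const samples = {
  mac: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  windows: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  linux: 'Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'
}

// Hypothetical helper: resolve a user agent to the platform and ssl_ca path.
function describe(userAgent) {
  const platform = connectPlatform(userAgent)
  return { platform, ssl_ca: connectSslCertificate(platform) }
}

for (const [label, userAgent] of Object.entries(samples)) {
  console.log(label, describe(userAgent))
}
```

Note that the Windows sample falls through to the default certificate path, since `connectSslCertificate` has no Windows-specific case. A default path alone may not be the right answer for every user.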

Because a user's development environment is often different from their production environment, we also had to add a selector that allows users to select the certificate on their own. The final code for that is shown below.
import React, { createContext, useContext, useState } from 'react'
import { parse, renderers, transform } from '@markdoc/markdoc'

const Platform = createContext({ platform: null, setPlatform: () => {} })

export default function Page({ userAgent }) {
  const initialPlatform = connectPlatform(userAgent)
  const [platform, setPlatform] = useState(initialPlatform)

  const sslCertificate = connectSslCertificate(platform)

  const config = {
    nodes: {
      fence: Fence.scheme
    },
    variables: {
      host: 'us-east.connect.psdb.cloud',
      user: 'mpl0y3jv3a92h4qc4ufn',
      database: 'beam',
      password: 'pscale_pw_V8db13jnq5mrOWcGFn6GTs6AerDI7A0womsmnJ1qxOc',
      ssl_ca: sslCertificate
    }
  }

  const doc = `# Configure your application\n…`
  const ast = parse(doc)
  const content = transform(ast, config)
  const children = renderers.react(content, React, { components: { Fence } })

  return (
    <Platform.Provider value={{ platform, setPlatform }}>
      <div>{children}</div>
    </Platform.Provider>
  )
}

function connectPlatform(userAgent) {
  // …
}
function connectSslCertificate(platform) {
  // …
}

function Fence({ children, file, language }) {
  const { platform, setPlatform } = useContext(Platform)

  const isDotEnvFile = file === '.env'
  const isWindowsPlatform = platform === 'windows'

  return (
    <div>
      <div>
        <div>{file}</div>

        {isDotEnvFile && (
          <select
            onChange={(event) => {
              setPlatform(event.target.value)
            }}
            value={platform}
          >
            <option value='linux'>Linux</option>
            <option value='mac'>macOS</option>
            <option value='windows'>Windows</option>
          </select>
        )}
      </div>

      <pre>
        <code className={`language-${language}`}>{children}</code>
      </pre>

      {isDotEnvFile && (
        <div>
          {isWindowsPlatform && (
            <>
              For Windows you may need to download a root certificate to connect securely.{' '}
              <a href='/docs/vitess/connecting/secure-connections#windows'>Learn more</a>
            </>
          )}
          {!isWindowsPlatform && (
            <>
              View the{' '}
              <a href='/docs/vitess/connecting/secure-connections#ca-root-configuration'>
                certificate authority root
              </a>{' '}
              paths for the SSL CA details.
            </>
          )}
        </div>
      )}
    </div>
  )
}

Fence.scheme = {
  // …
}

Outcomes using Markdoc
We are extremely happy with how the onboarding turned out, and based on some early data, it seems to be a huge win for new users as well. Working with Markdoc made the building process incredibly simple and straightforward. We're already finding that maintenance, like adding new frameworks, is very manageable as well.
We'd love to hear if you've been able to get your hands on Markdoc yet. If you'd like to experience our onboarding process first-hand, make sure you sign up for a PlanetScale account to give it a go.]]></content>
        <summary><![CDATA[Learn how we utilized Markdoc to create custom, extendable product onboarding at PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale vs. Amazon Aurora</title>
        <link href="https://planetscale.com/blog/planetscale-vs-amazon-aurora" />
        <id>https://planetscale.com/blog/planetscale-vs-amazon-aurora</id>
        <published>2023-10-05T19:59:23.344Z</published>
        <updated>2023-10-05T19:59:23.344Z</updated>
        
        <author>
          <name>PlanetScale</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[What is PlanetScale?
PlanetScale is an extremely fast, scalable, and reliable database platform for Postgres and Vitess. Vitess is the popular open-source database management technology created at YouTube that enables horizontal sharding of MySQL abstracted from the application layer. It’s designed to improve database management and provide a performant, fault-tolerant database that can handle large workloads.
PlanetScale provides customers with the power of Vitess, offering a fully managed and performant MySQL database service with horizontal sharding, Git-like schema change workflows, automatic backup, recovery, advanced query analytics, and multi-region replication capabilities. PlanetScale can be deployed on multiple cloud platforms, including Amazon Web Services (AWS) and Google Cloud Platform (GCP).
What is Amazon Aurora?
Amazon Aurora is a cloud-native relational database service developed by Amazon Web Services (AWS). It provides a scalable and high-performance database solution compatible with MySQL and PostgreSQL database management systems. It offers several performance enhancements that improve the speed and reliability of these databases. It uses a distributed architecture that replicates data across multiple storage nodes, providing fast and reliable read and write access to your data. It offers multi-region replication features. As a fully managed database service, Amazon Aurora takes care of many of the tasks associated with traditional database management, such as provisioning, patching, backup, and recovery. It also integrates with other AWS services, making it easy to manage your databases and applications within the AWS ecosystem.
Comparisons: PlanetScale vs. Amazon Aurora
PlanetScale vs. Aurora architecture and deployment
PlanetScale is MySQL-compatible and democratizes all of the data management and scalability features of Vitess. It's a highly scalable database platform with numerous tenancy and deployment options. Multi-tenant is the default tenancy for basic PlanetScale plans. Both multi-tenant and single-tenant deployments are available on PlanetScale Enterprise and Managed plans. PlanetScale Managed is a packaged data plane. It includes all compute, live data, and backups required to run the PlanetScale product inside of an AWS or GCP sub-account owned by the user.
Amazon Aurora supports both MySQL and PostgreSQL deployments. It only supports AWS cloud infrastructure and is intentionally hyper-compatible with AWS tooling. It supports both multi and single-tenant deployments. In comparison to Amazon RDS, which is Amazon’s other managed relational database offering, Aurora is more performant and designed for more intensive use cases.
PlanetScale vs. Amazon Aurora scale and performance
PlanetScale and Amazon Aurora approach database scale and performance differently. PlanetScale is built for high availability, performance, and scale. It’s able to handle bursts in traffic and heavy IOPS with ease, all while reducing the overhead associated with database management as a managed service. Amazon Aurora is AWS’s more performant managed relational database solution. This is due to Aurora’s approach to point-in-time recovery (PITR), multi-availability zone (AZ) replication, and the usage of S3 as the unified underlying storage.
Both PlanetScale and Aurora can scale horizontally, but the method differs between the two. PlanetScale horizontally shards your data, abstracted from the application layer. This is often a more performant, reliable, and cost-conscious way to shard MySQL. With Aurora, users can horizontally scale read operations or add instances across which to distribute database operations. Additionally, both PlanetScale and Aurora users can vertically scale up instance size and resources to meet increased demand.
When a new MySQL node is added in PlanetScale, load balancing is automatically implemented. The way that PlanetScale load balances is possible because of a few Vitess components called VTTablet and VTGate as well as PlanetScale’s edge infrastructure. VTGate is an application-level query routing layer while VTTablet behaves as a middleware between VTGate and MySQL. This facilitates the flow of connections from the application, to a load balancer, to VTGate, to VTTablet, and then finally to MySQL. The VTTablet will manage connection pooling and perform health checks for MySQL instances, updating its status in a topo-server. Meanwhile, VTGate determines available tablets and their roles via the topo server and reroutes traffic as needed. PlanetScale’s edge infrastructure then acts as a frontend load balancer, terminating MySQL connections in the closest edge location.
Every single PlanetScale database gets all of this underlying infrastructure to ensure the database remains available. Load balancers will distribute traffic between the production branch and replicas as well as balance connections, IOPS, and resource usage.
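As a rough mental model of the routing described above, the following toy sketch shows tablets recording their health in a shared topology map and a gate choosing among the healthy tablets for a given role. It is not Vitess code: the `topo` map, the tablet names, and the `reportHealth`, `routableTablets`, and `route` functions are all hypothetical, and real VTGate routing is considerably more involved.

```javascript
// Toy model only. Names, roles, and fields are hypothetical, not Vitess APIs.
const topo = new Map([
  ['tablet-1', { role: 'primary', healthy: true }],
  ['tablet-2', { role: 'replica', healthy: true }],
  ['tablet-3', { role: 'replica', healthy: false }]
])

// Stand-in for a tablet health check: record the latest status in the topo.
function reportHealth(tabletName, healthy) {
  const entry = topo.get(tabletName)
  if (entry) entry.healthy = healthy
}

// Stand-in for the gate's view: healthy tablets that serve the wanted role.
function routableTablets(role) {
  return [...topo.entries()]
    .filter(([, tablet]) => tablet.role === role && tablet.healthy)
    .map(([name]) => name)
}

// Simple round-robin over whatever is currently routable.
let cursor = 0
function route(role) {
  const candidates = routableTablets(role)
  if (candidates.length === 0) throw new Error(`no healthy ${role} tablet`)
  const choice = candidates[cursor % candidates.length]
  cursor += 1
  return choice
}

const first = route('replica') // only tablet-2 is healthy at this point
reportHealth('tablet-3', true) // tablet-3 recovers and becomes routable
const second = route('replica')
console.log(first, second)
```

The point of the sketch is the separation of concerns: health is written in one place and read in another, so traffic shifts as soon as the recorded status changes.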
Aurora load balances and manages connections by splitting reads/writes and using connection pooling. Readers in the replica database cluster can accept writes and forward to the primary writer instance. To perform queries, users can use a reader endpoint to automatically load balance amongst all of their available Aurora read replicas. Additionally, Aurora maintains high availability by self-healing, rebalancing, maintaining write capability, and providing quick recovery from crashes and failures.
With technically unlimited connections, PlanetScale is equipped to handle high concurrency. PlanetScale offers connection pooling, which scales with your cluster and enables connection requests to queue. Aurora scales connections by splitting reads/writes and uses connection pooling as well.
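To illustrate what it means for connection requests to queue, here is a minimal sketch of a fixed-size pool that hands out a connection when one is idle and otherwise holds the request until a connection is released. This is a generic toy, not how PlanetScale or Aurora implement pooling; the `QueueingPool` class and the `conn-N` identifiers are assumptions made for the example.

```javascript
// Toy pool: hypothetical names, no real database connections involved.
class QueueingPool {
  constructor(size) {
    this.idle = Array.from({ length: size }, (_, i) => `conn-${i + 1}`)
    this.waiting = [] // callbacks waiting for a connection, oldest first
  }

  // Hand over a connection right away if one is idle, otherwise queue the request.
  acquire(onConnection) {
    const conn = this.idle.pop()
    if (conn !== undefined) onConnection(conn)
    else this.waiting.push(onConnection)
  }

  // Return a connection; the oldest waiter, if any, receives it immediately.
  release(conn) {
    const next = this.waiting.shift()
    if (next) next(conn)
    else this.idle.push(conn)
  }
}

const pool = new QueueingPool(2)
const served = []
pool.acquire((conn) => served.push(['a', conn]))
pool.acquire((conn) => served.push(['b', conn]))
pool.acquire((conn) => served.push(['c', conn])) // pool is empty, so this waits
pool.release(served[0][1]) // request 'c' now reuses the connection 'a' held
console.log(served)
```

The third request is delayed rather than rejected, which is the trade-off pooling makes: a little extra latency under load instead of a failed connection.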
PlanetScale vs. Amazon Aurora pricing
PlanetScale costs are built for scale. With the first paid tier starting at $5 per month with options for resource-based pricing, users can linearly grow their workload and infrastructure costs. PlanetScale Enterprise and Managed plans are completely custom and will cater to the user’s specific needs.
For users that experience high workload variability, Amazon Aurora offers Serverless V2. There are many ways to purchase Aurora, including paying-as-you-go, reserving resources, and opting into I/O Optimized. I/O Optimized is a pricing model available for customers with high Input/Output operations.
PlanetScale plans tend to be cheaper on infrastructure than running on Amazon Aurora. This is because PlanetScale right-sizes resources on many commodity-grade AWS EC2 instances, or the equivalent on other cloud providers, which prevents overprovisioning and keeps cost in line with the user’s actual workload. Additionally, users get more infrastructure with PlanetScale, including the immense power of Vitess, and a database cluster that provides the same if not greater fault tolerance than other commercial solutions.
PlanetScale vs. Amazon Aurora operations
Both PlanetScale and Amazon Aurora are managed database systems that aim to reduce complex database administration tasks. PlanetScale and Aurora automate basic database operations and provide monitoring, logging, and auditing solutions.
PlanetScale provides users with production branches, which are highly available databases intended for production traffic. They are conceptually equivalent to the primary instance in other commercial databases. Production branches automatically failover to one of two default replicas to improve redundancy. The two replicas reduce the load on the primary branch, and enable users to scale read and write operations. Additional replicas and read-only regions are configurable.
PlanetScale additionally offers built-in auditing and log retention, as well as integrations with popular third-party monitoring tools such as Datadog.
Aurora supports up to 15 replicas per primary instance across different availability zones. The system automatically scales storage in 10GB increments and keeps six copies of the data across three availability zones on a single unified storage layer. It does this without affecting compute resources.
Aurora has a tight integration with other AWS services to help achieve a holistic monitoring solution, such as Database Activity Streams for cluster activity monitoring, and AWS CloudTrail for auditing logs. However, the choice of monitoring and logging tooling varies per organization. If you have other workloads such as web applications or data warehouses, you might want to consolidate these to a centralized monitoring platform, which both PlanetScale and Aurora can support.
PlanetScale vs. Amazon Aurora change management
Database change management processes vary by team. Changes performed on the database are complex, as you risk irreversible data corruption and consistency issues. PlanetScale and Amazon Aurora provide different methods to configure an environment to test these changes prior to pushing to production.
Additionally, with a Git-like workflow for change management and in combination with CI/CD tooling, PlanetScale makes it easy to build, test, and deploy database changes to production with minimal risk. Amazon Aurora does not provide native tooling for schema changes, and does not support online schema changes. Although this is not technically a schema change revert, Aurora Backtrack can be used to restore the database to a point-in-time (PIT) to recover from DDL operations that introduce a breaking change.
Frequently Asked Questions
Is Amazon Aurora the same as MySQL?
Amazon Aurora is not the same as MySQL. MySQL is an open-source relational database management system (RDBMS). It is a database engine that is supported by services like Aurora, and allows you to manage data in a relational structure.
Is PlanetScale the same as MySQL?
PlanetScale's Vitess offering is a MySQL-compatible database platform. It's built on open-source Vitess, a middleware technology designed to scale and manage very large MySQL deployments. PlanetScale offers a managed database service that provides horizontal scaling of MySQL, multi-cloud deployments, and other features that enable high availability and make it easier to manage large-scale MySQL deployments. In addition to the improved tooling, scaling, and schema management offered, PlanetScale will soon introduce vector search and storage, which is not currently available in MySQL.
Is PlanetScale better than Amazon Aurora?
PlanetScale's fully-managed database platform supports both Postgres and Vitess. It’s the only solution built on open-source Vitess, the technology that scaled YouTube to billions of monthly active users. Every single database spun up on PlanetScale gets all of the power of Vitess without having to implement and maintain it yourself.
PlanetScale is cloud-agnostic and supports multiple cloud platforms. It tends to be more cost-effective than many solutions. This is because of the commodity-grade instance types that PlanetScale uses to optimally serve a user’s unique workload. If you are curious what this might look like for your organization, talk to PlanetScale’s Solutions team.
On top of improved scale and performance, PlanetScale is built with CI/CD and database change management in mind. This mitigates the risk associated with changes made on stateful workloads.
“Better” is subjective, but because Aurora does not provide the scale, portability, or cost-effectiveness of Vitess, PlanetScale is often viewed as the superior platform.
Who uses PlanetScale?
PlanetScale is used by hyper-growth organizations like Block, Cursor, Intercom, MyFitnessPal, and many more. It’s built for high availability, performance, and massive scale. It’s the solution for high performing companies with intensive workloads, equally great for users just getting started with a MySQL database, and can effectively support the workloads of everything in between.
What is the difference between Vitess and PlanetScale?
Vitess is the open-source middleware technology developed at YouTube in 2010. It is designed to horizontally shard MySQL abstracted from the application-layer, and generally improves the process of managing the database. As Vitess is open-source, organizations can implement and maintain it on their own. This is complex and requires a level of expertise about the technology that is sparse in the infrastructure world.
PlanetScale is the only solution that is built on Vitess and democratizes many of its features. Deploying a PlanetScale database is the easiest way to deploy Vitess in the cloud. With a PlanetScale database, you can horizontally shard MySQL, obtain unlimited connections, perform online schema changes, revert schema changes, and much more.
Additionally, PlanetScale was co-founded by one of the creators of Vitess, Sugu Sougoumarane. The PlanetScale team is also one of the top contributors to Vitess, employing several of the Vitess maintainers.]]></content>
        <summary><![CDATA[Compare PlanetScale’s multi-cloud database platform for Postgres and Vitess with Amazon Aurora, one of AWS’s managed relational database services. Discover the strengths, features, and benefits of each platform to choose the best fit for your database needs.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale vs. Amazon RDS</title>
        <link href="https://planetscale.com/blog/planetscale-vs-amazon-rds" />
        <id>https://planetscale.com/blog/planetscale-vs-amazon-rds</id>
        <published>2023-10-05T18:23:38.312Z</published>
        <updated>2023-10-05T18:23:38.312Z</updated>
        
        <author>
          <name>PlanetScale</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[What is PlanetScale?
PlanetScale is an extremely fast, scalable, and reliable database platform for Postgres and Vitess. Vitess enables horizontal sharding of MySQL abstracted from the application layer. It’s designed to improve database management and provide a performant, fault-tolerant database that can handle large workloads.
PlanetScale provides customers with the power of Vitess, offering a fully managed and performant MySQL database service with horizontal sharding, Git-like schema change workflows, automatic backup, recovery, advanced query analytics, and multi-region replication capabilities. PlanetScale can be deployed on multiple cloud platforms, including Amazon Web Services (AWS) and Google Cloud Platform (GCP).
What is Amazon RDS?
Amazon RDS (Relational Database Service) is a managed relational database service offered by Amazon Web Services (AWS). It supports several database engines including MySQL, PostgreSQL, MariaDB, Oracle, and Microsoft SQL Server. With RDS, users can set up, operate, and scale a relational database in the cloud. It facilitates and automates operational tasks like resource provisioning, patching, replication, and backups. It's designed to integrate well with other AWS native services.
Comparisons: PlanetScale vs. Amazon RDS
PlanetScale vs. Amazon RDS architecture and deployment
PlanetScale is MySQL-compatible and democratizes all of the data management and scalability features of Vitess. It's a highly scalable database platform with numerous tenancy and deployment options. Multi-tenant is the default tenancy for basic PlanetScale plans. Both multi-tenant and single-tenant deployments are available on PlanetScale Enterprise and Managed plans. PlanetScale Managed is a packaged data plane. It includes all compute, live data, and backups required to run the PlanetScale product inside of an AWS or GCP sub-account owned by the user.
Amazon RDS supports a number of commercially available and open-source software (OSS) database engines. It only supports AWS cloud infrastructure and is intentionally hyper-compatible with AWS tooling. Although RDS is a managed solution, users are still required to manually provision resources. This requires a level of knowledge about resource utilization and the user’s data model to provision correctly.
PlanetScale vs. Amazon RDS scale and performance
PlanetScale and Amazon RDS approach database scale and performance differently. PlanetScale is built for high availability, performance, and scale. It’s able to handle bursts in traffic and heavy IOPS with ease, all while reducing the overhead associated with database management as a managed service. Amazon RDS was built to reduce database management overhead for users that would otherwise run and manage their database in EC2. Amazon RDS is the less performant of AWS’s managed relational database offerings. This is in comparison to Amazon Aurora, which Amazon claims is 5× faster than standard MySQL and more performant than RDS.
Both PlanetScale and RDS can scale horizontally, but the method differs between the two. PlanetScale horizontally shards your data, abstracted from the application layer. This is often a more performant, reliable, and cost-conscious way to shard MySQL. With RDS, users can horizontally scale read operations or add instances across which to distribute database operations. Additionally, both PlanetScale and RDS users can vertically scale up instance size and resources to meet increased demand.
When a new MySQL node is added in PlanetScale, load balancing is automatically implemented. The way that PlanetScale load balances is possible because of a few Vitess components called VTTablet and VTGate as well as PlanetScale’s edge infrastructure. VTGate is an application-level query routing layer while VTTablet behaves as a middleware between VTGate and MySQL. This facilitates the flow of connections from the application, to a load balancer, to VTGate, to VTTablet, and then finally to MySQL. The VTTablet will manage connection pooling and perform health checks for MySQL instances, updating its status in a topo-server. Meanwhile, VTGate determines available tablets and their roles via the topo server and reroutes traffic as needed. PlanetScale’s edge infrastructure then acts as a frontend load balancer, terminating MySQL connections in the closest edge location.
Every single PlanetScale database gets all of this underlying infrastructure to ensure the database remains available. Load balancers will distribute traffic between the production branch and replicas as well as balance connections, IOPS, and resource usage.
With technically unlimited connections, PlanetScale is equipped to handle high concurrency. PlanetScale offers connection pooling, which scales with your cluster and enables connection requests to queue.
Amazon RDS does not automatically load balance. To manage an influx of connections, users can utilize the RDS Proxy to connection pool and reuse connections. This may increase latency, but will not lead to an application failure and will prevent overwhelming the database with too many connections.
PlanetScale vs. Amazon RDS pricing
PlanetScale costs are built for scale. With the paid tier starting at $5 per month with options for resource-based pricing, users can linearly grow their workload and infrastructure costs. PlanetScale Enterprise and Managed plans are completely custom and will cater to the user’s specific needs.
Amazon RDS is priced based on the type and sizes of compute nodes and the time that these resources are used for. There are options to pay based on usage, and options to reserve resources at a discounted rate.
PlanetScale plans tend to be cheaper on infrastructure than running on Amazon RDS. This is because PlanetScale right-sizes resources on many commodity-grade AWS EC2 instances, or the equivalent on other cloud providers, which prevents overprovisioning and keeps cost in line with the user’s actual workload. Additionally, users get more infrastructure with PlanetScale, including the immense power of Vitess, and a database cluster that provides the same if not greater fault tolerance than other commercial solutions.
PlanetScale vs. Amazon RDS operations
Both PlanetScale and Amazon RDS are managed database systems that aim to reduce complex database administration tasks. PlanetScale and RDS automate basic database operations and provide monitoring, logging, and auditing solutions.
PlanetScale provides users with production branches, which are highly available databases intended for production traffic. They are conceptually equivalent to the primary instance in other commercial databases. Production branches automatically failover to one of two default replicas to improve redundancy. The two replicas reduce the load on the primary branch, and enable users to scale read and write operations. Additional replicas and read-only regions are configurable.
PlanetScale additionally offers built-in auditing and log retention, as well as integrations with popular third-party monitoring tools such as Datadog.
Amazon RDS integrates tightly with other AWS services, including Database Activity Streams for activity monitoring, AWS CloudTrail for auditing logs, and more. Similar to PlanetScale, replicas are utilized to distribute load across multiple nodes. Additional read replicas can be manually configured. Backups are automatically configured by default, with retention periods and backup windows manually defined by the user. Snapshots can be set up manually. RDS does not have an individual load balancer per database cluster. To distribute traffic between primary and replica instances, logic can be manually defined by the user at the application level to direct reads and writes to either instance. Failover is automatic to a standby in a different AZ.
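The application-level routing mentioned above might look something like the following sketch, which sends read-only statements to replicas in rotation and everything else to the primary. It is a simplified illustration under stated assumptions: the endpoint hostnames are placeholders, `isReadOnly` and `pickEndpoint` are hypothetical helpers, and the statement check is a naive prefix match rather than a real SQL parser.

```javascript
// Placeholder endpoints, for illustration only.
const endpoints = {
  writer: 'primary.example.internal',
  readers: ['replica-1.example.internal', 'replica-2.example.internal']
}

let nextReader = 0

// Naive check: treats SELECT/SHOW/EXPLAIN as reads, except locking reads.
function isReadOnly(sql) {
  if (/\bfor\s+update\b/i.test(sql)) return false
  return /^\s*(select|show|explain)\b/i.test(sql)
}

// Writes, locking reads, and anything inside a transaction go to the writer.
// Plain reads rotate across the replicas.
function pickEndpoint(sql, { inTransaction = false } = {}) {
  if (inTransaction || !isReadOnly(sql)) return endpoints.writer
  const reader = endpoints.readers[nextReader % endpoints.readers.length]
  nextReader += 1
  return reader
}
```

Even this small version has to special-case transactions and locking reads, and it does nothing about replication lag or unhealthy replicas. That is the kind of logic the application ends up owning when the database does not route traffic for it.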
While Amazon RDS is a managed solution, a common complaint about this service is that many users end up managing database operations manually. In contrast to PlanetScale, RDS does not automatically load balance or handle major version upgrades. It requires user input for backup windows and its managed services do not fine-tune cluster resources, create comprehensive disaster recovery plans, or design a horizontal sharding scheme unique to your data model.
PlanetScale vs. Amazon RDS change management
Database change management processes vary by team. Changes performed on the database are complex, as you risk irreversible data corruption and consistency issues. PlanetScale and Amazon RDS provide different methods to configure an environment to test these changes prior to pushing to production.
Additionally, with a Git-like workflow for change management and in combination with CI/CD tooling, PlanetScale makes it easy to build, test, and deploy database changes to production with minimal risk. Amazon RDS does not provide native tooling for schema changes, and does not support online schema changes or schema reverts.
Frequently Asked Questions
Is Amazon RDS the same as MySQL?
Amazon RDS is not the same as MySQL. MySQL is an open-source relational database management system (RDBMS). It is a database engine that is supported by services like RDS, and allows you to manage data in a relational structure.
Is PlanetScale the same as MySQL?
PlanetScale's Vitess offering is a MySQL-compatible database platform. It's built on open-source Vitess, a middleware technology designed to scale and manage very large MySQL deployments. PlanetScale offers a managed database service that provides horizontal scaling of MySQL, multi-cloud deployments, and other features that enable high availability and make it easier to manage large-scale MySQL deployments. In addition to the improved tooling, scaling, and schema management offered, PlanetScale will soon introduce vector search and storage, which is not currently available in MySQL.
Is PlanetScale better than Amazon RDS?
PlanetScale's Vitess offering is a fully-managed MySQL-compatible database platform that supports the growth of organizations of all sizes. It’s the only solution built on open-source Vitess, the technology that scaled YouTube to billions of monthly active users. Every single database spun up on PlanetScale gets all of the power of Vitess without having to implement and maintain it yourself.
PlanetScale is cloud-agnostic and supports multiple cloud platforms. It tends to be more cost-effective than many solutions. This is because of the commodity-grade instance types that PlanetScale uses to optimally serve a user’s unique workload. If you are curious what this might look like for your organization, talk to PlanetScale’s Solutions team.
On top of improved scale and performance, PlanetScale is built with CI/CD and database change management in mind. This mitigates the risk associated with changes made on stateful workloads. Additionally, while both solutions are managed, RDS users often complain about having to manually manage database operations while PlanetScale is fully-managed.
While “better” is subjective, PlanetScale is often viewed as the superior platform because RDS does not provide the power, scale, portability, or cost-effectiveness of Vitess.
Who uses PlanetScale?
PlanetScale is used by hyper-growth organizations like Block, Cursor, Intercom, MyFitnessPal, and many more. It’s built for high availability, performance, and massive scale. It’s the solution for high performing companies with intensive workloads, equally great for users just getting started with a MySQL database, and can effectively support the workloads of everything in between.
What is the difference between Vitess and PlanetScale?
Vitess is the open-source middleware technology developed at YouTube in 2010. It is designed to horizontally shard MySQL in a way that is abstracted from the application layer, and it generally improves the process of managing the database. As Vitess is open-source, organizations can implement and maintain it on their own. This is complex and requires a level of expertise in the technology that is scarce in the infrastructure world.
PlanetScale is the only solution that is built on Vitess and democratizes many of its features. Deploying a PlanetScale database is the easiest way to deploy Vitess in the cloud. With a PlanetScale database, you can horizontally shard MySQL, obtain unlimited connections, perform online schema changes, revert schema changes, and much more.
Additionally, PlanetScale was co-founded by one of the creators of Vitess, Sugu Sougoumarane. The PlanetScale team is also one of the top contributors to Vitess, employing several of the Vitess maintainers.]]></content>
        <summary><![CDATA[Compare PlanetScale’s multi-cloud database platform for Postgres and Vitess with Amazon RDS, AWS’s managed relational database service. Discover the strengths, features, and benefits of each platform to choose the best fit for your database needs.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale is bringing vector search and storage to MySQL</title>
        <link href="https://planetscale.com/blog/planetscale-is-bringing-vector-search-and-storage-to-mysql" />
        <id>https://planetscale.com/blog/planetscale-is-bringing-vector-search-and-storage-to-mysql</id>
        <published>2023-10-03T09:00:00.000Z</published>
        <updated>2023-10-03T09:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[We’re excited to announce that we are adding support for vector storage and search into MySQL and PlanetScale. Soon, you’ll be able to use PlanetScale as a vector database for all of your AI needs without needing to adopt a second tool.
You can sign up to be notified of release at planetscale.com/ai.
What are vectors?
A vector is a one-dimensional array of real number values: [1, 1] is a vector, as is [1.5, 8.889, 9.234]. Each element represents an attribute or a dimension, and the position of that element in the array represents a “point.” In three-dimensional space, a vector of [2, 3, 5] would have 2 as the x-coordinate, 3 as the y-coordinate and 5 as the z-coordinate. In real-world applications of artificial intelligence, vectors are in significantly higher dimensionality — well above 1,000!
Modern databases are already very good at storing lists of numbers, so storing vectors as a raw datatype is not interesting on its own. A raw vector is like a bare point on a globe or a square on a chessboard. Vectors become useful when applied, the way latitude and longitude locate a place, or the positions of a queen and the opponent’s rook describe a game.
Applying vectors to machine learning
What makes vectors useful is a technique called embedding, which uses machine learning to transform arbitrary data like a picture, song, or sensor data into a vector. This creates a uniform numerical representation of that data which can be transmitted over the network and stored. Inside of that storage engine, they can be compared to other transformed data and analyzed for similarity, using mathematical operations like the cosine similarity.
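As a rough illustration of the comparison step, here is a minimal Python sketch of cosine similarity. The three-dimensional vectors and their names are made up for the example (real embeddings have far more dimensions), and this is not how PlanetScale or MySQL computes similarity internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
doc_a = [1.0, 0.0, 0.0]
doc_b = [2.0, 0.0, 0.0]  # same direction as doc_a, different length
doc_c = [0.0, 1.0, 0.0]  # perpendicular to doc_a

same_direction = cosine_similarity(doc_a, doc_b)
perpendicular = cosine_similarity(doc_a, doc_c)
```

Vectors pointing the same way score close to 1 regardless of their length, while unrelated (perpendicular) vectors score close to 0.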
For a long read that covers this and a lot more, I recommend reading Stephen Wolfram’s What Is ChatGPT Doing … and Why Does It Work?. I recommend the entire blog post, but you can skip to the section titled “The Concept of Embeddings” which does a great job of explaining the concept, and how they are applied.
Vector storage in MySQL
We know what a vector is, and storing the data is still straightforward — you can use a BLOB type and start writing arrays into MySQL today! So what extra support does MySQL need anyway?
That’s where vector-specific indexing comes into play. This is what we’re adding to MySQL, along with a first-class vector data type. Specifically, we’ll be implementing the state-of-the-art Hierarchical Navigable Small World (HNSW) algorithm, which constructs optimized graph structures that make it efficient to search vector similarity in large datasets.
Imagine a database containing a record for every document written at your company, each embedded by a machine learning model that captures attributes like what project it’s for, what team wrote it, and other useful information. If a user opens up one of the documents, a common task would be to find everything similar: documents for the same project, written by the same team, or that cover the same workstreams.
Without an index, you would have to iterate over every document’s vector in the database and compare them for similarity. At scale, this could take a while, and the performance would be awful! Using an index, you can efficiently traverse the graphs of vectors, and quickly present the user their meeting notes from the status meeting last week, or the design document for one component of their project. This is “vector search” in a nutshell.
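To make the unindexed case concrete, here is a small Python sketch of the brute-force scan described above. It is the baseline an index avoids, not an implementation of HNSW, and the document names and vectors are invented for illustration.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, documents, k=2):
    """Score the query against every stored vector, then keep the top k.

    The work grows linearly with the number of stored documents.
    """
    scored = ((cosine_similarity(query, vec), name) for name, vec in documents.items())
    return [name for _, name in heapq.nlargest(k, scored)]

# Hypothetical document embeddings.
documents = {
    "status-meeting-notes": [0.9, 0.1, 0.0],
    "design-doc": [0.8, 0.2, 0.1],
    "lunch-menu": [0.0, 0.1, 0.9],
}
results = brute_force_search([1.0, 0.0, 0.0], documents, k=2)
```

Every search touches every row, which is why a graph-based index that visits only a small part of the data matters at scale.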
How are we doing this?
PlanetScale already maintains a fork of MySQL and we’ll be adding vector types and indexes to it. When released, we’ll run that MySQL fork in PlanetScale as we do today. We will publish packages and containers for our PlanetScale-flavored MySQL that will allow users to test and develop locally.
If you’re a current PlanetScale customer, this will be transparent: one day you’ll automatically gain the ability to do vector storage and retrieval.
Who should use vector storage and search?
AI/ML apps that want to harness the power, stability, reliability, and scalability of MySQL. Instead of adopting a second database just for vectors, you’ll be able to do the same storage and retrieval right in PlanetScale, reducing cost and operational burden significantly.
When will this launch?
It’s exciting to see vector workloads working on MySQL. We are committed to maintaining a stable, reliable, and highly available product. We will continue to test our new vector support under rigorous workloads to ensure it meets our high standards before release.
Until then, you can sign up for updates at https://planetscale.com/ai. If you have additional questions, don’t hesitate to contact us.]]></content>
        <summary><![CDATA[We are adding vector storage and search to MySQL, enabling you to use PlanetScale for your AI use cases.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale Managed is now PCI compliant</title>
        <link href="https://planetscale.com/blog/planetscale-managed-is-now-pci-compliant" />
        <id>https://planetscale.com/blog/planetscale-managed-is-now-pci-compliant</id>
        <published>2023-10-02T09:00:00.000Z</published>
        <updated>2023-10-02T09:00:00.000Z</updated>
        
        <author>
          <name>Frank Fink</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[PlanetScale is committed to building the best database, and a significant part of that commitment is building in enterprise-grade security at every step along the way.
We are pleased to announce that PlanetScale Managed on AWS has been issued an Attestation of Compliance (AoC) and Report on Compliance (RoC), certifying our compliance with the PCI DSS 4.0 as a Level 1 Service Provider. This enables PlanetScale Managed to be used via a shared responsibility model across merchants, acquirers, issuers, and other roles in storing and processing cardholder data.
What is the PCI Data Security Standard?
The Payment Card Industry Data Security Standard (PCI DSS) is a widely known set of security standards designed to ensure that all companies that accept, process, store, or transmit credit card information maintain a secure environment. Version 4.0 is the latest iteration of the PCI DSS, bringing with it several key improvements around continuous monitoring, authentication and authorization, secure software development, and response to evolving threat landscapes.
In order to achieve compliance after sufficient security controls are in place, an entity must be audited by a qualified security assessor (QSA). PlanetScale Managed on AWS has completed this process and is compliant with Version 4.0 today, as 4.0 becomes the required version of the standard in early 2024.
PlanetScale’s compliance journey
This PCI milestone is the result of a months-long, cross-functional collaboration between our Security, Engineering, and Operations teams, resulting in the significant evolution of our policies, practices, and systems around authentication, logging, access management, and network security. It represents not only our heightened commitment to compliance, but also a tangible improvement to the core standards and procedures supporting our products.
Strengthening trust with customers
As we grow, so do the security and compliance needs and expectations of our customers, and we are continuously committed to showcasing security as a core tenet of both our culture and our products. While compliance and security are not one and the same, we consider our commitment to the PCI DSS another marker of trust between PlanetScale and our customers, enabling them to rely on PlanetScale as a trusted component in their increasingly complex data environments.
Learn more
Read more about our security and compliance programs in PlanetScale’s documentation, or contact us to learn more about getting started with PlanetScale Managed.]]></content>
        <summary><![CDATA[PlanetScale Managed on AWS is now PCI compliant.]]></summary>
      </entry>
    
      <entry>
        <title>Guide to scaling your database: When to shard MySQL and Postgres</title>
        <link href="https://planetscale.com/blog/how-to-scale-your-database-and-when-to-shard-mysql" />
        <id>https://planetscale.com/blog/how-to-scale-your-database-and-when-to-shard-mysql</id>
        <published>2023-09-28T08:00:00.000Z</published>
        <updated>2023-09-28T08:00:00.000Z</updated>
        
        <author>
          <name>Jonah Berquist</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Scaling a database presents challenges. As you grow, you might begin having trouble handling ever-increasing throughput or data size. You might find that query latency is getting worse. You might be pushing the limits of your hardware. When this happens, a classic option is vertically scaling your database by getting better hardware, but is there a better way? And what happens when you reach the vertical limits?
This is where horizontal sharding comes in. In this article, we'll cover some common indicators that your database may be ready for horizontal sharding. We'll also look at some measures you can implement until then. Let's dig in.
Signs your database is hitting its scaling limits
There are lots of different limits that you can run into when you're scaling up. At the database level, you might be maxing out CPU, memory, disk space, or IOPS.
Running into these limits can have real consequences for your business. Database operations like schema changes will start taking longer, making it harder to ship new features. Query latency will increase, leading to sluggish responsiveness. As things get worse, hitting these limits leads to incidents and you start facing outages.
What are your options for scaling your database?
Before you begin to think about sharding, let's make sure you've first exhausted some of the other options.
After a single server is maxed out, you need to spread the load across more nodes. There are several approaches that you can use.
Option 1: Scale reads with replicas
A tried-and-true method for scaling MySQL or Postgres is using replicas for reads. In addition to setting up the replicas, this involves application changes to split reads and writes to different connection strings. Most web applications are very read-heavy, and this method allows you to continue scaling reads by adding more replicas.
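The application change can be as small as a routing helper. The following Python sketch assumes hypothetical connection strings and a deliberately naive rule (statements starting with SELECT go to a replica). A real application would also need to send locking reads and reads that must see their own recent writes to the primary.

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary and spread plain reads across replicas."""

    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replica_dsns) if replica_dsns else None

    def dsn_for(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary_dsn

# Hypothetical connection strings.
router = ReadWriteRouter("mysql://primary", ["mysql://replica-1", "mysql://replica-2"])
first_read = router.dsn_for("SELECT * FROM users WHERE id = 1")
second_read = router.dsn_for("select * from notifications")
write = router.dsn_for("INSERT INTO users (name) VALUES ('a')")
```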
Option 2: Reduce load with vertical sharding (data segmentation)
After that, another strategy is adding more clusters by segmenting logical groups of tables. This means taking all of the tables used by a certain service or product area (for example, users or notifications) and separating them into a new cluster. We sometimes call this vertical sharding or vertical partitioning.
The diagrams below show what it would look like to break a cluster containing users and notifications tables down into vertical shards by moving notifications to a separate cluster.

While vertical sharding is a viable option, it does come with some downsides. In addition to the application changes required for these new connection strings, there may be more complex changes to account for the fact that, without a framework like Vitess, you would be unable to perform JOINs between tables that now live on different servers.
Having broken down your databases into their smallest logical groups of tables, you may find yourself in a bit of a jam when one of those clusters starts hitting limits. This is where horizontal sharding comes in.
Option 3: Scale writes and storage with horizontal sharding
Horizontal sharding differs from the vertical sharding described previously. Instead of splitting up a cluster by moving whole tables elsewhere, with horizontal sharding, each underlying cluster shares the same schema and has different rows distributed to it.
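One way to picture row distribution is a function that maps a sharding key to a shard. The Python sketch below uses a simple hash-modulo scheme with made-up shard names. It is an illustration only and not the routing scheme Vitess uses.

```python
import hashlib

# Hypothetical shard names; every shard holds the same schema.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(user_id, shards=SHARDS):
    """Pick the shard that stores the row for this sharding key."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]

# The same key always lands on the same shard.
placements = {user_id: shard_for(user_id) for user_id in range(1, 9)}
```

A plain modulo like this moves most rows whenever the shard count changes, which is one reason real sharding layers use more careful key-to-shard mappings.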

For a more in-depth guide about what sharding is and how it works, check out What is database sharding and how does it work on our blog.
Historically, you needed to be one of the largest webscalers in the world to require sharding, and when you hit those limits, you had to build it yourself. Examples of these include TAO at Facebook, Gizzard at Twitter, and Vitess at YouTube. Sharding was a last resort after you'd exhausted all other options and still needed to handle growth.
When should you shard your MySQL or Postgres database?
Today, we think about it differently. Since its creation at YouTube in 2010, Vitess has become a widely adopted open-source solution that has made sharding much more accessible. Sharding is no longer a last resort, and in fact, if adopted earlier, can help you avoid other larger application changes.
So, how do you know when to shard your database? Some good indicators that it may be time to consider sharding are when you've started to max out data size, write throughput, and/or read throughput. Let's walk through each of these categories.
Shard when: data size strains memory and operations
One of the original reasons to shard was because disks were not large enough to hold all of your data. These days, that's not the problem. For example, Backblaze purchased over 40,000 16TB drives!
Data size can still be a driving factor for sharding though. One thing to consider is how large your working set is, and how much of that fits into RAM. As less of your active data fits in memory and more queries need to read from disk, query latency will increase.
Other database operations are also affected by the data size of a single MySQL or Postgres cluster. The larger the database, the longer backups (and restores!) take. The same is true for other operational tasks like provisioning new replicas and making schema changes. This is the logic behind guidelines Vitess has made for shard sizing. Smaller data size per shard improves manageability.
Shard when: write throughput maxes out your primary
Another reason to consider sharding is when you've maxed out the write throughput of your cluster. This can show up in a couple of ways.
When the primary is maxed on IOPS, writes will become less performant. Usually before that, however, replication lag becomes a problem. While there have been significant improvements in replication within MySQL clusters, there will always be a small amount of delay between the time the data is written to the primary and that same data is written to a replica. You may be depending on replicas being up-to-date for disaster recovery, or you may be using replicas to scale out your reads as discussed earlier. When replicas fall behind the primary, this can look like inconsistent or stale data to your users, and may also result in errors if your application expects to be able to read data that it has just written.
With PlanetScale Metal, write throughput is significantly less of a concern. Unlike other solutions such as RDS and CloudSQL that separate storage and compute, Metal keeps them together on the same hardware. This reduces network hops and provides better hardware, delivering substantially higher IOPS. If you're on Metal, you can often delay sharding and continue scaling vertically much further (into the several TB range) than traditional cloud database architectures allow.
When you're hitting your write throughput limits, other database operations like schema changes and batch jobs will be slower as well.
Shard when: read throughput outpaces your replicas
While running out of read throughput capacity can be solved through read-write splitting and the addition of read replicas, that isn't without its own challenges. As mentioned in the previous section, replication lag can make this complex or lead to a poor experience for your users.
This is typically earlier than most teams think about sharding. However, by scaling read capacity through horizontal sharding instead of by using replicas, application code does not need to account for potential replication lag or manage multiple connection strings depending on the data set you are trying to access. Plus, sharding at this stage sets you up for future growth, so you don't have to come back and shard later when write throughput or data size would otherwise become an issue.
Operational benefits of horizontal sharding
Sharding can and should be considered as a solution not just for scaling large data sizes, but also for scaling throughput of reads and writes.
In addition to being able to handle larger workloads, sharding provides other benefits, including:
Better cost optimization as you scale.
Increased parallelization leading to faster backups, restores, and schema changes.
Better SLAs by reducing the blast radius of various failure domains to a single shard.
Predictable horizontal scalability.
If you're unsure whether or not you're ready to shard, don't hesitate to contact us. We'd love to hop on a call to discuss your workload and scaling options.
FAQs
What is database sharding?
Database sharding is an approach to horizontally scaling a MySQL or Postgres database by splitting data across multiple database instances (called shards), each holding a subset of the total data. Sharding allows you to scale beyond a single server. It improves performance because queries only hit one shard, rather than scanning a massive single table. Finally, it provides fault isolation, since a problem with one shard doesn't necessarily take down others.
When should I consider sharding my database?
You should consider sharding when a single MySQL or Postgres instance can no longer handle your data volume or query throughput, even after optimizing indexes, queries, and adding read replicas. A common signal is when your dataset grows into the hundreds of gigabytes or terabytes and write performance degrades, since read replicas only help with reads. You might also consider it when you hit connection limits or when table locks and contention become a bottleneck that caching alone can't solve. That said, sharding adds significant operational complexity, so exhaust simpler scaling options first.
Does sharding require changes to my application code?
It depends on your approach. If you implement sharding at the application level, yes — your code needs to determine which shard to route each query to, manage multiple connection strings, and handle cases where data spans shards. However, using a sharding layer like Vitess abstracts much of this away: your application talks to a single endpoint and the routing happens transparently, significantly reducing the code changes required. Either way, you'll want to be thoughtful about choosing a shard key early, as changing it later is costly and disruptive.
What solutions are available to implement horizontal sharding?
PlanetScale offers a sharded MySQL option through Vitess, which handles query routing for your shards at the proxy layer so there is no sharding logic in your application code. We're also currently building Neki, our Vitess for Postgres solution for horizontally sharding Postgres.]]></content>
        <summary><![CDATA[Not sure when to shard your MySQL or Postgres database? This article covers when you should consider horizontal sharding as a scaling strategy in MySQL and Postgres, and some other scaling options before then.]]></summary>
      </entry>
    
      <entry>
        <title>Scaling hundreds of thousands of database clusters on Kubernetes</title>
        <link href="https://planetscale.com/blog/scaling-hundreds-of-thousands-of-database-clusters-on-kubernetes" />
        <id>https://planetscale.com/blog/scaling-hundreds-of-thousands-of-database-clusters-on-kubernetes</id>
        <published>2023-09-27T15:00:00.000Z</published>
        <updated>2023-09-27T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Containers have made an incredible impact when it comes to making, deploying, and distributing applications.
When building containerized applications, you no longer have to be concerned about dependency mismatches or the age-old “works on my machine” argument. And Kubernetes has simplified deploying and scaling these containerized applications. If a container crashes, Kubernetes can easily spin a new one up to handle the load!
But have you ever wondered if you can run a database in Kubernetes?
The short answer is “yes.” In fact, we utilize Kubernetes extensively at PlanetScale to support hundreds of thousands of databases all over the world. When deploying a database workload to Kubernetes, special considerations need to be made regardless of how big the workload is.
Let's explore how you might want to deploy databases on Kubernetes, and how PlanetScale does it.
A crash course on Kubernetes
Kubernetes is a container orchestration tool used by some of the largest enterprises around the world to manage their fleet of containerized applications.
In a Kubernetes cluster, multiple servers (nodes in Kubernetes parlance) are configured to work together to ensure that the containers deployed to them are always online and available. The smallest deployable unit in Kubernetes is known as a pod, which represents one container or a collection of containers. When a pod crashes for whatever reason, the environment is smart enough to spin up a new instance of that pod to keep the application online, whether that's on the same node or a different one.
This process is known as the Control Loop.
The Control Loop is run by Kubernetes controllers. They use configuration files (written in YAML) that define what pods need to be running for a given application to stay online. A controller does this by comparing what's defined in the config file vs. what’s actually deployed on Kubernetes and taking the necessary steps to reconcile the differences, while the Kubernetes Scheduler decides which node each new pod runs on.
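The reconciliation idea can be sketched in a few lines of Python. This is a toy model with invented workload names and plain dictionaries standing in for cluster state. It does not call the Kubernetes API.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` pod counts toward `desired`."""
    actions = []
    for name in sorted(set(desired) | set(actual)):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    return actions

# Hypothetical desired state (from config) and observed state (from the cluster).
desired = {"web": 3, "worker": 2}
actual = {"web": 1, "worker": 2, "old-job": 1}
actions = reconcile(desired, actual)
```

In a real cluster this comparison runs continuously, so a crashed pod simply shows up as a new difference to reconcile.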
This type of setup works great for applications, but what happens when your database is running on a pod that crashes?

The issue with running databases on Kubernetes
The main concern with databases on Kubernetes is that they are stateful.
The statefulness of a specific workload refers to whether the data related to that workload is ephemeral or not. Applications deployed to Kubernetes should be built with this in mind. If a container within a pod crashes, any data being stored within the container is essentially lost. With databases, the state of the data within the database is kind of important, which is why special considerations need to be taken when deploying the database to Kubernetes.
According to the official docs, StatefulSets are the recommended way to run a database on Kubernetes.
StatefulSets in Kubernetes allow you to define a set of pods that maintain the state of the data within a pod regardless of its online status. Kubernetes does this by attaching persistent storage to the pod for it to read and write data to, as well as ensuring that when pods come online, they do so in the same order, with the same name and network address every time. This is in comparison to something like a deployment, which aims to keep a specific number of a given pod online but doesn’t care what order they come up in as long as they are accessible.
Whenever you want to deploy a database to Kubernetes, StatefulSets should be used, but other considerations are needed to run a database on Kubernetes properly.
Considerations when deploying MySQL to Kubernetes
Deploying a well-architected MySQL setup to Kubernetes is more complicated than just deploying a bunch of pods running MySQL.
Replication and node failures
MySQL can be launched with replication enabled, which permits data to be synchronized across the pods in a given StatefulSet, but what happens when a node experiences an outage?
Source of truth
Within the replicated environment, how do you know which of the pods has the most up-to-date data, and how should your application determine which pod to query from?
Backups and restores
Backups are always necessary to ensure that your organization can roll back accidental writes, but in an environment with multiple MySQL instances running, where do the backups come from, how do you know they are complete, and how should you restore them?
How PlanetScale manages hundreds of thousands of MySQL database clusters
Until this point, much of what has been discussed in this article is considered best practices when running databases on Kubernetes, but we do things a bit differently here at PlanetScale.
Leveraging Vitess on Kubernetes
It’s no secret that PlanetScale uses Vitess to manage and operate our databases, but there’s more to the story than that.
Vitess is an open-source, MySQL-compatible project that is designed to scale beyond the traditional capabilities of MySQL. It does this by providing additional components on top of MySQL such as a stateless proxy (known as VTGate) and a topology server. In a Kubernetes environment, these two components are used together to determine how many MySQL instances exist, how to access them, and (in a horizontally sharded configuration) on which pod the requested data lives. Each of those MySQL pods is known as a tablet, which is a pod running MySQL along with a sidecar process known as vttablet. The sidecar allows the management plane (vtctld) to manage the tablets and keeps the rest of the cluster informed of any topology changes. All of this functionality is bundled into Vitess, but it also raises the question of how Vitess itself is managed.
This is where PlanetScale Vitess Operator for Kubernetes is used.
Kubernetes offers a great deal of automation, but some workloads require a bit more logic than out-of-the-box Kubernetes is prepared to handle. Operators allow developers to extend Kubernetes by adding custom resources that add to the Control Loop. The Vitess Operator addresses the issues outlined in the previous section by enabling Kubernetes to streamline these otherwise complicated management tasks.
The combination of Kubernetes, Vitess, the PlanetScale Operator, and our global infrastructure is what has enabled us to scale and manage hundreds of thousands of MySQL database clusters.
Deploying new databases
When a user needs to deploy a new database into our infrastructure, the first step is that our API sends a request to our custom orchestration layer asking for that database to be created.
Provided the request is valid, the orchestrator will create a custom resource that defines the specifications of a PlanetScale database. The Operator discussed in the previous section will use the built-in mechanism of Kubernetes (specifically, the Control Loop and the API) to detect that the current state and desired state do not match. The Operator will then create the necessary resources to run and operate a Vitess cluster for the requested database.
Once this process is completed, our orchestrator will notify the rest of the system that the creation process is done and the user can start using the database.
Storage management
Here is where things start deviating from what's considered common knowledge when running databases on Kubernetes.
As discussed earlier, the recommended best practice is to use StatefulSets to run databases since the state is automatically tracked by Kubernetes. We actually don’t do this and opt instead to use the logic built into the Vitess Operator to spin up pods that attach directly to cloud storage using a persistent volume claim (PVC). Because we already have a routing mechanism in place (VTGate), we don’t need to be concerned about the name or address of a given pod.
Attaching directly to cloud storage also allows us to programmatically manage how much storage is allocated.
As database utilization increases, the storage required to operate a database often increases with it. To address this, we have monitoring mechanisms in place to detect when the provisioned cloud storage that serves a database starts nearing capacity. When this occurs, our internal systems will use the cloud providers’ APIs to automatically allocate additional space so the databases that are being served by that storage do not stop because of capacity issues.
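The decision at the heart of this kind of automation can be sketched as a threshold check. The 80% threshold and 1.5x growth factor in the Python below are assumptions chosen for illustration, not PlanetScale's actual values, and the real system would follow the decision with a call to the cloud provider's resize API.

```python
import math

def next_volume_size_gb(used_gb, provisioned_gb, threshold=0.8, growth_factor=1.5):
    """Return the size a volume should be resized to.

    Returns the current provisioned size when usage is below the threshold.
    The threshold and growth factor are illustrative assumptions.
    """
    if provisioned_gb <= 0:
        raise ValueError("provisioned_gb must be positive")
    if used_gb / provisioned_gb < threshold:
        return provisioned_gb
    return math.ceil(provisioned_gb * growth_factor)

unchanged = next_volume_size_gb(used_gb=50, provisioned_gb=100)
expanded = next_volume_size_gb(used_gb=85, provisioned_gb=100)
```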
This process occurs entirely behind the scenes so our users never have to worry about running out of space for their database.
Database backups
Regardless of where you run your database, backups should be first-class citizens as they can literally save your business.
One consideration around backing up your database, especially with more extensive databases, is the performance impact that running a backup can have. To avoid this issue, we actually utilize Vitess to create a special type of tablet that's ONLY used for backing data up. To back up a database on PlanetScale, our system will restore the latest version of the backup to this tablet, replicate all of the changes that have occurred since the backup was taken, and then create a brand new backup based on that data.
This not only significantly decreases the impact on the production database but also has the added benefit of automatically validating your existing backups on PlanetScale!
Mitigating failures
Most databases in PlanetScale operate on either AWS or GCP and as robust as those platforms are, we also have to consider how to address failures beyond our control.
Using Kubernetes and the Vitess Operator allows us to automatically handle instances when a pod or container within that pod gets stopped. The configuration for each database in Kubernetes defines how many resources and of what type should be running at any time. If a deviation is detected, the Operator will automatically take the necessary steps to spin up new resources to ensure things are always running smoothly.
Another type of outage we protect against for our paid production databases is when availability zones go offline.
Using Base plan databases as an example, we automatically provision a Vitess cluster across three availability zones in a given cloud region. This includes the tablets that serve up data for your database, as well as the VTGate instances that route queries to the proper tablets. This means that if an AZ gets knocked offline, the Vitess Operator will automatically detect the outage and apply the necessary infrastructure changes to keep our databases online.
In fact, we don't even support cloud regions with fewer than three availability zones!
Check out 'What is Vitess: resiliency, scalability, and performance' for more info on how PlanetScale uses Vitess]]></content>
        <summary><![CDATA[Explore what to consider when deploying databases to Kubernetes, and how PlanetScale utilizes Kubernetes and Vitess to run hundreds of thousands of databases.]]></summary>
      </entry>
    
      <entry>
        <title>The art and science of database sharding</title>
        <link href="https://planetscale.com/blog/the-art-and-science-of-database-sharding" />
        <id>https://planetscale.com/blog/the-art-and-science-of-database-sharding</id>
        <published>2023-09-19T13:00:00.000Z</published>
        <updated>2023-09-19T13:00:00.000Z</updated>
        
        <author>
          <name>Liz van Dijk</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[Grasping when and how to shard, selecting the ideal shard keys, and managing cross-shard queries effectively can prove difficult even for seasoned engineers. Learn how to do this, and more in this tech talk.]]></summary>
      </entry>
    
      <entry>
        <title>Streamline database management using the PlanetScale Netlify integration</title>
        <link href="https://planetscale.com/blog/planetscale-netlify-integration" />
        <id>https://planetscale.com/blog/planetscale-netlify-integration</id>
        <published>2023-09-13T12:00:00.000Z</published>
        <updated>2023-09-13T12:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[With the PlanetScale Netlify integration, connecting and deploying applications on Netlify that use a PlanetScale database is now easier. Today, the integration is generally available and will allow you to deploy faster with an even better developer experience. The integration follows the release of the new Netlify SDK, enabling the development of new features such as this.
Let's walk through a few of the benefits of the integration and how you can get started today.
No more copying and pasting connection strings
With the Netlify PlanetScale integration, you no longer need to copy and paste connection strings from PlanetScale to Netlify. The integration uses PlanetScale's OAuth applications feature to log you into PlanetScale securely. Then, it allows you to select which database and branch you want to use. No more copying and pasting or accidentally inputting the wrong connection string for your database's environment variables. This can help prevent mistakes and speed up the deployment process.
Different branches for different deploy contexts
It is easy to mix up connection strings when you have different deploy contexts while application and database changes are occurring simultaneously. The integration also allows you to assign a branch to a specific deploy context, such as production, deploy preview, or local development, all while handling the connection strings for each behind the scenes. When your application builds in Netlify, the integration will handle wiring up the correct branch for you.

Works seamlessly with PlanetScale's Fetch API-compatible database driver
The MySQL binary protocol does not work well in edge compute environments. This is why we have created a JavaScript library that can be used in any Fetch API-compatible environment. The library uses HTTP instead of the MySQL binary protocol over a raw TCP socket. Even better, the @netlify/planetscale library uses @planetscale/database so you can connect to and make calls to your PlanetScale database from your code with Netlify Functions.
First, you will need to install the necessary dependencies:
npm install @netlify/planetscale @planetscale/database

And then, you can use the library in your Netlify Functions. Here is a TypeScript example:
import type { Handler } from '@netlify/functions'
import { withPlanetscale } from '@netlify/planetscale'

export const handler: Handler = withPlanetscale(async (event, context) => {
  const {
    planetscale: { connection }
  } = context
  const { body } = event
  if (!body) {
    return {
      statusCode: 400,
      body: 'Missing body'
    }
  }
  const { email, name } = JSON.parse(body)
  await connection.execute('INSERT INTO users (email, name) VALUES (?, ?)', [email, name])
  return {
    statusCode: 201
  }
})

Getting started with the Netlify integration
See the PlanetScale integration page in the Netlify docs to get started with the integration.
If you have any suggestions for future features related to the integration, please let us know by tweeting at @planetscale or dropping us a note.]]></content>
        <summary><![CDATA[Learn how to use the new PlanetScale Netlify integration to simplify the process of wiring up a database to your Netlify applications.]]></summary>
      </entry>
    
      <entry>
        <title>Emulating foreign key constraints with Drizzle relationships</title>
        <link href="https://planetscale.com/blog/working-with-related-data-using-drizzle-and-planetscale" />
        <id>https://planetscale.com/blog/working-with-related-data-using-drizzle-and-planetscale</id>
        <published>2023-09-06T17:30:00.000Z</published>
        <updated>2023-09-06T17:30:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[PlanetScale now supports foreign key constraints. We still do not recommend using them for performance reasons, but you do not need to disable them to use Drizzle with PlanetScale.
This tutorial is still relevant if you'd like to use Drizzle without foreign key constraints.
Drizzle is a fantastic ORM that is quickly gaining popularity among TypeScript developers. It maintains type safety while striving to use a syntax very familiar to those already comfortable with writing SQL. The team has also built a CLI companion that can generate SQL migrations or apply schema change directly to a database based on the schema definition used by code within the project.
In this article, we will cover how to use virtual relationships in Drizzle and how to apply those changes to a PlanetScale database. We will use the following table diagram as a point of reference. It is for a simple “link in bio” service where users can create a profile containing links to their favorite websites or social media profiles:

Foreign keys and foreign key constraints
When designing your database, you’ll typically have one or more tables that contain data related to one another. Using the schema shown above, there is a one-to-many relationship between the users table and the blocks table, where a single user record can be referenced by multiple blocks. This is done by linking the values between the users.id column and the blocks.user_id column. In this situation, blocks.user_id is a foreign key that references users.id.
Foreign keys allow you to define logical relationships in the database. These relationships can be made more rigid by adding a constraint. When you define a foreign key constraint, you are telling the database that these two tables are related through the specified columns, AND you would like the database engine to maintain the integrity of the table by automatically performing operations on related data when specific actions are taken.
Using the same sample schema above, if we were to create a foreign key constraint between these two columns, we could ask the database to automatically delete any records in the blocks table when the associated users record is deleted:
ALTER TABLE blocks
	ADD CONSTRAINT fk_users_blocks
	FOREIGN KEY (user_id)
	REFERENCES users(id)
	ON DELETE CASCADE;

Querying related data without foreign key constraints
While foreign key constraints are the traditional way of maintaining integrity in a database, PlanetScale was built with a focus on scalability and zero-downtime schema updates, something that foreign key constraints interfere with. Fortunately, virtual relationships within ORMs build in similar logic but let the code handle the heavy lifting instead of the database engine.
Typically in Drizzle, you’d use the references method on a field, passing in the related entity and its field. This tells the ORM that entities are related based on specific columns:
import { int, mysqlTable, serial, varchar } from 'drizzle-orm/mysql-core'

export const users = mysqlTable('users', {
  id: serial('id').primaryKey(),
  username: varchar('username', { length: 120 }),
  tagline: varchar('tagline', { length: 250 }),
  display_name: varchar('display_name', { length: 250 }),
  img_url: varchar('img_url', { length: 500 })
})

export const blocks = mysqlTable('blocks', {
  id: serial('id').primaryKey(),
  url: varchar('url', { length: 200 }),
  block_type: int('type'),
  // 👇 The following line will create a foreign key constraint
  user_id: int('user_id').references(() => users.id),
  label: varchar('label', { length: 200 })
})

Since Drizzle works across a number of different relational databases, using this method will automatically attempt to add foreign key constraints in the schema. Running the following command to apply this schema to a PlanetScale database without foreign key constraint support enabled using drizzle-kit results in an error:
drizzle-kit push:mysql --schema functions/utils/db/schema.ts --connectionString="$DATABASE_URL" --driver mysql2

# Output:
# Error: VT10001: foreign key constraints are not allowed [...]
# {
#   code: 'ER_UNKNOWN_ERROR',
#   errno: 1105,
#   sql: 'ALTER TABLE `blocks` ADD CONSTRAINT `blocks_user_id_users_id_fk` FOREIGN KEY (`user_id`) REFERENCES `users`(`id`) ON DELETE no action ON UPDATE no action;',
#   sqlState: 'HY000',
#   sqlMessage: 'VT10001: foreign key constraints are not allowed'
# }

With just a bit more code, Drizzle can be configured to query the data in a child table using a virtual relationship instead of a foreign key constraint. The following code accomplishes the same results as above, allowing you to query a user and also get their associated blocks:
import { relations } from 'drizzle-orm'
import { int, mysqlTable, serial, varchar } from 'drizzle-orm/mysql-core'

export const users = mysqlTable('users', {
  id: serial('id').primaryKey(),
  username: varchar('username', { length: 120 }),
  tagline: varchar('tagline', { length: 250 }),
  display_name: varchar('display_name', { length: 250 }),
  img_url: varchar('img_url', { length: 500 })
})

export const blocks = mysqlTable('blocks', {
  id: serial('id').primaryKey(),
  url: varchar('url', { length: 200 }),
  block_type: int('type'),
  user_id: int('user_id'),
  label: varchar('label', { length: 200 })
})

//👇 This code block will tell Drizzle that users & blocks are related!
export const usersRelations = relations(users, ({ many }) => ({
  blocks: many(blocks)
}))

//👇 This code block defines which columns in the two tables are related
export const blocksRelations = relations(blocks, ({ one }) => ({
  user: one(users, {
    fields: [blocks.user_id],
    references: [users.id]
  })
}))

Applying these changes using the same command as above now succeeds:
drizzle-kit push:mysql --schema functions/utils/db/schema.ts --connectionString="$DATABASE_URL" --driver mysql2

# Output:
# drizzle-kit: v0.19.12
# drizzle-orm: v0.27.2
#
# Reading schema files: orbytal-ink/functions/utils/db/schema.ts
#
# [✓] Changes applied

Finally, when you want to return a user along with their associated blocks, you can use the following example:
const user = await db.query.users.findFirst({
  where: eq(users.username, username),
  // Providing `with` tells Drizzle you want to return related data
  with: {
    blocks: true
  }
})

// Contents of `user`:
// {
//     "id": 5,
//     "username": "brianmmdev",
//     "tagline": "Developer Educator @ PlanetScale",
//     "display_name": "Brian Morrison II",
//     "img_url": "https://img.clerk.com/eyJ0eXBlIjoicHJveHkiLCJzcmMiOiJodHRwczovL2ltYWdlcy5jbGVyay5kZXYvdXBsb2FkZWQvaW1nXzJUbzRXVjRkaFZRU0J2bTlxdnpsOXFiWWNyYS5qcGVnIn0",
//     "blocks": [
//         {
//             "id": 9,
//             "url": "brianmmdev",
//             "block_type": 2,
//             "user_id": 5,
//             "label": null
//         },
//         {
//             "id": 8,
//             "url": "brianmmdev",
//             "block_type": 4,
//             "user_id": 5,
//             "label": null
//         },
//         {
//             "id": 7,
//             "url": "brianmmdev",
//             "block_type": 1,
//             "user_id": 5,
//             "label": null
//         }
//     ]
// }

What about cascading actions?
One feature of foreign key constraints is cascading actions. If you are not using foreign key constraints, it’s not possible to specify these actions when designing your database schema.
Luckily, the solution is relatively straightforward. The responsibility shifts to the part you, as a developer, are likely most familiar with: the code. Earlier in this article, I suggested that foreign key constraints can be used to delete blocks associated with a user when that user is deleted. Below is the code that would need to be used to accomplish essentially the same thing:
// This line will delete a user based on the passed in `userId`
await db.delete(users).where(eq(users.id, userId))

// And this line will delete the associated blocks
await db.delete(blocks).where(eq(blocks.user_id, userId))

As you can see, it’s only one more line of code that deletes the user’s blocks when that user is deleted. While this is definitely a simple example, you might ask “Doesn’t this require more work to accomplish the same thing?”
Yes and no. It does indeed require more code on the part of the developer to maintain the integrity of the data within the database. However, in a more complicated schema, you’ll likely have nested parent/child table relationships that can go several layers deep. If a topmost record is deleted, there is no guarantee that every single nested record can be deleted, since ALL constraints on the nested tables need to be considered by the database engine. In this situation, the database may return an error that the developer will have to handle anyway, or worse yet, the application will error out, resulting in a poor user experience. By handling the task of maintaining data integrity explicitly in your code, you’re less likely to encounter these issues over time.
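For nested parent/child relationships, one way to keep application-level deletes manageable is to compute the delete order from a description of the relationships. The sketch below is an assumption-laden illustration rather than a Drizzle feature: the `childTables` map is hand-written, and `block_clicks` is a made-up table added only to show a third level of nesting.

```typescript
// Hypothetical parent -> children map for a nested schema.
// `users` and `blocks` come from the article; `block_clicks` is invented for illustration.
const childTables: Record<string, string[]> = {
  users: ['blocks'],
  blocks: ['block_clicks'],
  block_clicks: []
}

// Walk the relationships depth-first so every child table is listed before its parent.
function deleteOrder(root: string): string[] {
  const order: string[] = []
  const visited: Record<string, boolean> = {}
  const visit = (table: string): void => {
    if (visited[table]) {
      return
    }
    visited[table] = true
    for (const child of childTables[table] ?? []) {
      visit(child)
    }
    order.push(table)
  }
  visit(root)
  return order
}

// The application would issue one DELETE per table in this order,
// ideally inside a single transaction.
console.log(deleteOrder('users'))
```

Deleting children before parents is a design choice here: if a step fails partway through, the parent row still exists and no orphaned child rows are left behind.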
Conclusion
After reading this, you should be well-equipped to establish relationships using Drizzle without foreign key constraints. What are your thoughts on using Drizzle with PlanetScale? Let us know on Twitter and tag @planetscale!]]></content>
        <summary><![CDATA[Learn how to build virtual relationships between tables in PlanetScale while using the Drizzle TypeScript ORM.]]></summary>
      </entry>
    
      <entry>
        <title>Horizontal sharding for MySQL made easy</title>
        <link href="https://planetscale.com/blog/horizontal-sharding-for-mysql-made-easy" />
        <id>https://planetscale.com/blog/horizontal-sharding-for-mysql-made-easy</id>
        <published>2023-08-31T17:27:00.000Z</published>
        <updated>2023-08-31T17:27:00.000Z</updated>
        
        <author>
          <name>Lucy Burns</name>
        </author>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[For developers building out an application, a transactional datastore is the obvious and proven choice, but with success comes scale limitations. A monolithic database works well initially, but as an application sees growth, the size of its data will eventually grow beyond what is optimal for a single server.
Implementing read replicas can improve your performance but will likely add lag between your primary and your replicas, leading to performance or correctness issues for your application. These complexities can sometimes require major architectural changes, leading to a suboptimal user experience and difficult compromises between application performance and data consistency. Scaling write traffic is even more challenging: even the largest MySQL database will see performance issues at a certain point.
Horizontal sharding
This is not a new challenge; organizations have faced it for years, and horizontal sharding is one of the key patterns for solving it. Horizontal sharding refers to taking a single MySQL database and partitioning the data across several database servers, each with an identical schema. This spreads the workload of a given database across multiple database servers, which means you can scale linearly simply by adding more database servers as needed. Each of these servers is called a “shard.” Having multiple shards reduces the read and write traffic handled by a single database server and makes it possible to keep the data on a single database server at an optimal size. However, since you are dealing with multiple servers rather than one, this adds additional complexity to query routing and operational tasks like backup and restore, schema migration, and monitoring.
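As a rough illustration of the query-routing side of this, the sketch below maps a row key to one of several shards with a hash and a modulo. It is a generic example, not how Vitess routes queries: the shard host names are made up, and the hash is an FNV-1a style function applied to UTF-16 code units.

```typescript
// Hypothetical list of shard hosts; each one holds the same schema.
const shards = ['shard-0.db.internal', 'shard-1.db.internal', 'shard-2.db.internal', 'shard-3.db.internal']

// Simple deterministic string hash (FNV-1a style, 32-bit).
function hashKey(key: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  // Convert to an unsigned 32-bit integer so the modulo below is never negative.
  return hash >>> 0
}

// Every query for the same key is sent to the same shard.
function shardFor(key: string): string {
  return shards[hashKey(key) % shards.length]
}

for (const userId of ['user:1', 'user:2', 'user:3']) {
  console.log(`${userId} -> ${shardFor(userId)}`)
}
```

A plain modulo like this also shows where the operational complexity comes from: changing `shards.length` remaps most keys, so adding shards means moving data while traffic continues.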
Vertical vs. horizontal scaling for MySQL databases
When you are first getting started, you are likely only vertically scaling your database by increasing the size of your cloud instance or buying bigger machines with more CPU cores, RAM, and storage space. This can improve the speed and capacity of your MySQL database, enabling it to handle more connections, execute queries faster, and scale up more effectively.
This seems great, but it’s the age-old short-term solution: just purchase more resources to make room for scale. There comes a point when there are more benefits to horizontally scaling your database for performance and cost reasons.
Cost of overprovisioning your database
It’s inefficient to make big leaps in hardware to overprovision for potential spikes in traffic or usage. With this method, you will end up paying for resources that you don’t have an immediate need for just to prepare for anticipated spikes in traffic. Once you outgrow your current machine, the next one you invest in could easily be 50% larger, while you really only end up using 10% of it.
When infrastructure costs no longer align with the business requirements due to constant over-provisioning, teams often explore horizontal scaling methods where, instead of adding more resources on a single instance, you add more instances to handle increasing workloads. This level of granularity enables you to add smaller hosts and invest in your infrastructure more efficiently.

The problems with sharding at the application layer
Some companies have implemented horizontal sharding at the application level. In this approach, all of the logic for routing queries to the correct database server lives in the application. This requires additional logic at the application level, which must be updated whenever a new feature is added. It also means that the application needs to implement cross-shard features. Additionally, as data grows and the initial set of shards runs out of capacity, “resharding” or increasing the number of shards while continuing to serve traffic becomes a daunting operational challenge.
Pinterest took this approach after trying out the available NoSQL technology and determining that it was not mature enough at that time. Marty Weiner, a software engineer who worked on the project, noted, “We had several NoSQL technologies, all of which eventually broke catastrophically.” Pinterest mapped their data by primary key and used it to map data to the shard where it resided. Sharding in this way provided scale but traded off cross-shard joins and the use of foreign keys. Similarly, Etsy took this approach when moving to a sharded database system but added a two-way lookup primary key to the shard_id and packed shards onto hosts, automating some of the work of managing shards. In both cases, however, ongoing management of shards, including splitting shards after the initial resharding, presented significant challenges.
From experiences like these, there is an increasing need to separate sharding logic from the application as it introduces a plethora of complexity, making the application and your database harder to manage, which, in turn, drains developer capacity and pulls your team away from building and improving on great products for your customer base.
Horizontal sharding with Vitess
Alongside sharding at the application layer, another approach to horizontal sharding emerged. Engineers at YouTube began building out the open source project Vitess in 2010. Vitess sits between the application and MySQL databases, allowing horizontally sharded databases to appear monolithic to the application. In addition to removing the complexity of query routing from the application, Vitess provides master failover and backup solutions that remove the operational complexity of a sharded system, as well as features like connection pooling and query rewriting for improved performance.
Companies like Square (read about their journey of sharding Cash App), Slack, JD.com, HubSpot, and many more have used Vitess to scale their MySQL databases. JD.com, one of the largest online retailers in China, saw 35 million queries per second (QPS) run through Vitess during a peak in traffic on Singles Day. Slack has migrated all their databases to Vitess, surviving the massive influx of traffic from the transition to work from home in 2020. Both Etsy and Pinterest have moved some of their workloads to Vitess because of the management benefits Vitess provides. Vitess has repeatedly demonstrated its ability to run in production against high workloads with a better experience than sharding at the application layer.
See Deepthi Sigireddi, a maintainer and tech lead for Vitess, talk more about Vitess features in an excerpt from a recent tech talk:
Watch the full recording of the Vitess talk here.
Using PlanetScale for MySQL sharding
“We wanted PlanetScale and Vitess to bring to MyFitnessPal what Kubernetes brought to application delivery and deployment. Databases are hard. We would rather PlanetScale manage them.”
However, running Vitess at scale still requires a whole engineering team with the right experience. Not all organizations have the depth that Slack and Square do. Because of this, PlanetScale democratizes many Vitess features and capabilities, including horizontal sharding, online schema migrations, and more. With PlanetScale, you can unlock all of the power of Vitess in a much shorter time and without all of the required expertise, risk, and potential errors that come with running it yourself. PlanetScale is the only MySQL-compatible database platform built on top of Vitess.
At PlanetScale, we’ve built a managed database platform offering both Postgres and Vitess databases. With Vitess, anyone can access this level of scale for their MySQL databases. You can start small with a single MySQL instance and scale up as you grow. When the time comes to horizontally shard, you’ll design a sharding scheme on top of your existing PlanetScale database; there is no need to change databases just because you need to scale. At the same time, you gain the benefits of the PlanetScale database platform with features like database branching, safe and non-blocking schema changes, schema reverts, built-in query performance analytics with Insights, and much more alongside the benefits of Vitess.
This blog post was initially posted on October 22, 2020. We have since revised it with updated information.]]></content>
        <summary><![CDATA[Historically, there has been the belief that you cannot horizontally scale and shard MySQL. Learn how Vitess has made MySQL sharding at the database layer a reality.]]></summary>
      </entry>
    
      <entry>
        <title>Deploying multiple schema changes at once</title>
        <link href="https://planetscale.com/blog/deploying-multiple-schema-changes-at-once" />
        <id>https://planetscale.com/blog/deploying-multiple-schema-changes-at-once</id>
        <published>2023-08-29T09:00:00.000Z</published>
        <updated>2023-08-29T09:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[PlanetScale's database branching uses a declarative schema approach, but we take it even further and treat all the changes in your branch as a single deployment. As much as possible, PlanetScale deploys your entire set of changes near-atomically, which means the production database schema remains stable throughout the deployment process and changes almost all at once when all changes are ready. In this post, we will discuss the benefits of atomic multi-change deployments and work through the technical challenges of making them possible.
Why we choose to use "near-atomically"
Atomicity ensures an all-or-nothing change. A data transaction is an obvious example: you can change data in two tables, say insert into one and update the other, and enforce that either both changes happen or neither does. You run both in a transaction and finalize the change with a COMMIT. The database keeps a transaction counter along with change journaling. If there's a crash halfway through the transaction, its recovery process can reliably identify that the transaction is incomplete and undo the partial changes. But what's trivial for data changes is not so trivial for schema changes.
PlanetScale uses MySQL under the hood through Vitess. With MySQL, it is not possible to transactionally and atomically make changes to multiple table schema definitions. If you want to CREATE one table, ALTER another, and DROP a third, you must run these changes in some order. For this reason, we use the term "near-atomically." We also use this with PlanetScale's gated deployments, and we often use the term "gated" to indicate that all changes complete together.
Why atomic multi-change deployments?
We like to think of schema changes as deployments, similar to code deployments. Existing relational database systems have trained us to think that schema changes are necessarily dangerous, disruptive, irreversible, and sequential.
Consider that a change to our code requires changing three different large tables:
Adding and modifying a column on one.
Adding a column and an index on another.
Adding a new check constraint on the third.
The database system considers these three unrelated changes, but we know they are semantically related.
Schema changes on large tables may take a long time, sometimes hours or more, to complete. Let's assume each of our changes runs for 8 hours. Historically, we are accustomed to accepting that we must run one ALTER TABLE after the other. Once we start our deployment, we expect 24 hours until it's complete. But during that time, the database is in a semantically inconsistent state. The deployment is partially done and partially queued up.
In an ideal world, we can wait out these 24 hours and call it a day (no pun intended). But in reality, we might find that our design was flawed, or perhaps there's an incident that takes priority, and we want to cancel the deployment. Has it been 10 hours? One of the changes will have been applied, the others are still pending. With traditional databases, you can't just cancel that completed schema change.
We're also used to the notion that a schema change is dangerous. Have we dropped the wrong column? Did we make a bad assumption about the data type? Did we miss a constraint? Bad schema changes are notorious for causing production system outages. And at 8 hours each, it looks like our three-table deployment will be risky in every time zone.
Gated deployments
Gated deployments offer a change of concept, where all your changes are staged for however long it takes for all of them to be ready. In our example above, we can assume around 24 hours for all changes to become ready. At that point, we complete the deployment, applying all the changes in production all at once. And because it is impossible to ensure atomicity, the changes are applied a few seconds apart.
With this approach, there is only one "major event" to this deployment. Since gated deployments allow you to pick your preferred time to complete the changes, you can control the time of the "event." And, if there is a change of heart during the staging period or an incident that takes over priorities, the deployment may be canceled without impacting production. The friction point, where a schema may only be partially deployed, is reduced from days or hours to seconds.
A technical overview, and when things get complicated
Some schema change operations are immediate, for example creating a new table or modifying a view definition. Those changes have no data directly associated and are very fast to perform. Some ALTER TABLE changes are also eligible for fast execution. And yet, many are not. If we want to deploy a CREATE TABLE, an ALTER VIEW, and an ALTER TABLE as part of one branch and one deployment, we need to time them such that they all complete together. If we have multiple ALTER TABLE changes on large tables, we need to find a way to not only time those to complete together but also run them all concurrently without putting too much load on the production database.
And on top of it, some of the changes in the migration might actually have dependencies. The user is given a free hand to change their branch, and when the time comes to deploy the branch's changes to production, we may find that one change assumes another has already been applied.
So, the task is to be able to run some changes concurrently, time long-running changes with immediate changes, and resolve any conflicts that imply ordering changes — all while running concurrently.
Concurrency of long-running changes
As described in How Online Schema Change tools work, PlanetScale uses an elaborate copy-and-swap algorithm to emulate an ALTER TABLE operation. We create a new table in the likeness of the original table, we modify that table, we bring the new table in sync with the original table by both copying over the existing rows as well as applying the ongoing changes, and we finalize by swapping the two from under the hands of the application.
This emulation mechanism is what allows us some concurrency and control over the cut-over timing. As we complete copying over a table's existing data set, we can continue to watch the ongoing changes to the table, technically indefinitely, or until we decide that it's time to cut over. We impose a brief lock to finalize the last few changes to make the swap. This allows us to run multiple concurrent operations on different tables and keep pushing the final cut-over until we know all operations are ready to complete.
And we don't have to overwhelm the production database during that time. We may alternate between the copying phases — the heavy-lifting part of the emulation — of the different tables and only parallelize the tailing of the ongoing changes.
Timing long-running changes with immediate changes
When we stage a deployment request, we begin by running — but not completing — all long-running changes. When we find, possibly hours later, that all long-running changes are ready to complete, we then introduce the immediate changes — like CREATE TABLE, ALTER VIEW, and similar statements. We can then apply the final cut-over for all long-running changes and the immediate changes, near-atomically, a few seconds apart.
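The gate described above can be sketched as a check that every long-running change is ready before anything is finalized. This is a simplified illustration and not PlanetScale's implementation: the `Change` type, its `readyToComplete` flag, and the example statements are assumptions made for the example.

```typescript
type ChangeKind = 'long-running' | 'immediate'

// Hypothetical representation of one change in a deployment request.
interface Change {
  statement: string
  kind: ChangeKind
  // For long-running changes: has the new table caught up with the original?
  readyToComplete: boolean
}

// The gate opens only when every long-running change is ready to cut over.
function gateIsOpen(changes: Change[]): boolean {
  return changes
    .filter((change) => change.kind === 'long-running')
    .every((change) => change.readyToComplete)
}

// Returns the statements to finalize, or an empty list while the gate is closed.
function completeDeployment(changes: Change[]): string[] {
  if (!gateIsOpen(changes)) {
    return []
  }
  return changes.map((change) => change.statement)
}

const changes: Change[] = [
  { statement: 'ALTER TABLE a ...', kind: 'long-running', readyToComplete: true },
  { statement: 'ALTER TABLE b ...', kind: 'long-running', readyToComplete: false },
  { statement: 'CREATE TABLE c ...', kind: 'immediate', readyToComplete: true }
]

// Nothing is applied yet because the change on table b is still catching up.
console.log(completeDeployment(changes))
```

The sketch returns statements in the order given; choosing a valid order when changes depend on each other is a separate problem.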
Alas, what happens when one change depends on another?
Resolving dependencies, and supporting concurrency of in-order statements
Consider these two simplified changes:
ALTER TABLE t ADD COLUMN info VARCHAR(128) NOT NULL DEFAULT '' AFTER id;
ALTER VIEW v AS SELECT id, info FROM t;

In production, the column info does not exist. The view v in production does not read from this column. To deploy these two changes, we absolutely have to first apply the change to t, and only then the change to v. This seems pretty straightforward: complete the migration on t, and immediately apply the change to v.
However, how do we go about the reverse?
ALTER TABLE t DROP COLUMN info;
ALTER VIEW v AS SELECT id FROM t;

If we first make the change to t, then v becomes invalid. At first sight, it may appear that we should first apply the change to v, followed by the change to t. However, the table t may be large enough that it takes hours to migrate. If we want to apply the changes together, then the way to go is:
Begin the change on t.
Wait until the change is ready to complete.
Issue the immediate change on v.
Follow by completing (cutting-over) the change on t.
The scenarios may be more complex when multiple, nested views are involved, each based on several of the tables being changed in the deployment request.
Using schemadiff
PlanetScale continues to utilize Vitess's schemadiff library, which can determine, where possible, a valid sequence (read: ordering) of changes given two schemas. When schemadiff reads a schema, it maps and validates any dependency between entities. For example, it can validate that table and columns referenced by some view exist or that there are no cyclic view definitions (v1 reads from v2, which reads from v1).
When schemadiff compares two schemas and generates the diff statements, it also analyzes the dependencies between those statements. If any two diff statements affect entities with a dependency relationship in the schema(s), then schemadiff knows it needs to resolve the ordering of those two diffs. If yet another diff affects entities used by either of these two, then schemadiff needs to resolve the ordering of all three. All the diffs are thus divided into equivalence classes: distinct sets where nothing is shared between any two sets and where the total union of all sets is the total set of diffs.
If you take a sample diff from one equivalence class and then some sample diff from a different equivalence class, you know there's absolutely no dependency between the two. They can be executed in any order. However, any two diffs within the same equivalence class can have a dependency and should be treated as if they do, although in some cases, the two could be executed in any different order. To that effect, for each equivalence class, schemadiff finds a permutation of the diffs such that if executed in order, the validity of the entire schema is preserved. It's worth reiterating that changes to the underlying database can only be applied sequentially. Thus, we must validate that the schema remains valid throughout the in-order execution. schemadiff achieves this by running in-memory schema migration and validation at every step.

a. Given a set of diffs,
b. Group them into equivalence classes, where changes to elements that have dependencies are grouped together.
c. Ordering of equivalence classes is arbitrary.
d. Within an equivalence class there must be a valid ordering.
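Steps a through d above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not schemadiff's actual code: the toy schema model, the diff tuples, and the helper names (`equivalence_classes`, `valid_ordering`, `apply`, `is_valid`) are all hypothetical, and the brute-force search over permutations stands in for whatever search strategy the library really uses.

```python
from itertools import permutations


def equivalence_classes(diffs, dependencies):
    """Group diffs so that any two diffs linked by a dependency share a class."""
    parent = {d: d for d in diffs}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in dependencies:
        parent[find(a)] = find(b)

    classes = {}
    for d in diffs:
        classes.setdefault(find(d), []).append(d)
    return list(classes.values())


def valid_ordering(diff_class, schema, apply, is_valid):
    """Return the first permutation that keeps the schema valid after every step."""
    for order in permutations(diff_class):
        current = schema
        for diff in order:
            current = apply(current, diff)
            if not is_valid(current):
                break
        else:
            return list(order)
    return None


# Toy in-memory schema: tables map to column sets, views map to (table, columns).
def apply(schema, diff):
    tables = {name: set(cols) for name, cols in schema["tables"].items()}
    views = dict(schema["views"])
    kind = diff[0]
    if kind == "create_table":
        tables[diff[1]] = set(diff[2])
    elif kind == "drop_column":
        tables[diff[1]].discard(diff[2])
    elif kind == "alter_view":
        views[diff[1]] = (diff[2], diff[3])
    return {"tables": tables, "views": views}


def is_valid(schema):
    """Every view must read from an existing table and only existing columns."""
    return all(
        table in schema["tables"] and set(columns) <= schema["tables"][table]
        for table, columns in schema["views"].values()
    )


schema = {"tables": {"t": {"id", "info"}}, "views": {"v": ("t", ("id", "info"))}}
drop_info = ("drop_column", "t", "info")
alter_v = ("alter_view", "v", "t", ("id",))
create_u = ("create_table", "u", ("id",))

classes = equivalence_classes([drop_info, alter_v, create_u], [(drop_info, alter_v)])
ordering = valid_ordering(classes[0], schema, apply, is_valid)
```

In this toy run, the DROP COLUMN and ALTER VIEW diffs land in one class and the unrelated CREATE TABLE in another. The only ordering that keeps the schema valid at every step applies the view change first, which matches the DROP COLUMN example earlier in this post.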
Orchestrating Vitess
PlanetScale then takes that valid ordering of diffs as the blueprint for a deployment where it runs the migrations concurrently via Vitess, staging the changes until it determines that all deployments are ready to complete. At this time, it seals the change near-atomically.
When all migrations are complete, PlanetScale then stages tentative reverts for all migrations. The user has a 30-minute window to undo those schema changes without losing the data accumulated in the meantime. If the user does choose to revert (say, some parts of the app still appear to require the old schema, or performance tanks due to wrong indexing), then those reverts are likewise applied near-atomically. Notably, the reverts are finalized in reverse ordering to the original deployment. There is no need for computation here: we rely on the fact that the original deployment was found to have a step-by-step valid ordering. Undoing those changes in reverse order mathematically maintains that validity.
Limitations
Resources are not infinite, and only so many changes can run concurrently. Altering a hundred tables in one deployment request is not feasible and possibly not the best utilization of database branching. It is possible to go too far with a branch so that the changes are logically impossible to deploy (or rather, so complex that it is not possible to determine a reliably safe path). Like code, schema changes should be made and deployed with measures in place.
Conclusion
Treating a deployment request, a group of schema changes, as a unit that should be deployed all or none is a difficult task that requires complex validation, scheduling, and execution. But the effort pays off: we know that the deployment is cancellable up to the very last moment and without making any impact on production. We only have one potential point in time that requires our attention. We actually control that point in time. We don't need to tell the database how to go about the changes; we only need to tell it what we would like to have.]]></content>
        <summary><![CDATA[Why PlanetScale deploys branch changes near-atomically, and how it applies concurrency and dependency resolution without impacting production databases.]]></summary>
      </entry>
    
      <entry>
        <title>What makes up a PlanetScale Vitess database?</title>
        <link href="https://planetscale.com/blog/what-makes-up-a-planetscale-database" />
        <id>https://planetscale.com/blog/what-makes-up-a-planetscale-database</id>
        <published>2023-08-23T15:00:00.000Z</published>
        <updated>2023-08-23T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Hosting and providing performant, reliable, and scalable databases is no easy feat. PlanetScale strives to provide a straightforward and easy-to-use interface, but behind the scenes it takes a lot of complicated technology to create that simplicity. In this article, we’re going to explore what exactly makes up a PlanetScale database and how each component is leveraged by PlanetScale.
It all begins with Vitess and Kubernetes
Before we can dive into what a PlanetScale database is under the hood, there are a few concepts we need to explore first. Specifically, we’re going to touch on what Vitess is, what Kubernetes is, and how they contribute to your database on PlanetScale.
What is Vitess?
Vitess is a horizontally-scaling, MySQL-compatible database platform originally designed to address scalability at YouTube. In a nutshell, Vitess allows you to run multiple instances of MySQL across multiple servers and have them appear as a single MySQL instance to your application. Along with scalability, Vitess offers a number of other benefits:
Vitess can automatically rewrite and optimize queries that may impact the performance of your database.
Each Vitess cluster has a load balancer that can handle millions of simultaneous connections.
Caching logic is built-in to handle situations where identical queries are sent in at the same time. Instead of querying the database multiple times, the same dataset is returned to each client.
Throughout the rest of the article, we'll dive a bit deeper into how Vitess is used by PlanetScale, along with other features built on top of Vitess that enhance our users' experience. That said, the biggest takeaway is:
Every database and branch on PlanetScale is an independent cluster.
Vitess components used by PlanetScale
There are several primary components used by PlanetScale within each Vitess cluster we host:
Each MySQL instance in a Vitess cluster is called a Tablet, which is a mysqld instance with a sidecar process called vttablet. Each cluster has at least one and it's where your data lives.
The vtgate is responsible for accepting queries and routing them to the proper tablet, breaking up the query, and dispatching it to multiple tablets if needed.
The entire cluster is controlled by a vtctld instance, a management interface that our internal systems communicate with to perform administrative operations.
While this section of the article provides a brief introduction to some of the Vitess components used by PlanetScale, you can find a more in-depth explanation in our What is Vitess: resiliency, scalability, and performance article, or in the Vitess documentation.
Vitess on Kubernetes
Containers provide a way to host applications and systems in an easily distributable format, regardless of the host system they may be running on. Kubernetes is an orchestration tool that provides a way for infrastructure engineers to define the containers and resources they need to run those applications and keep them online. When running in a cluster, Kubernetes will attempt to automatically distribute the load of your application across multiple hosts, as well as handle outages by automatically spinning up new resources to address offline ones as needed. Individual resources in Kubernetes are known as pods, which are collections of one or more containers running within the cluster.
Each of the Vitess components described in the previous section can be run within a Kubernetes environment, combining the benefits of resilient infrastructure with the horizontal scalability of Vitess for each database on PlanetScale.
How PlanetScale uses Vitess
Kubernetes is used by PlanetScale internally to spin up the necessary resources to host databases within Vitess clusters. Each database within PlanetScale gets its own dedicated Vitess cluster, including all of the necessary infrastructure required to keep it online and resistant to failures. When you create a database, we signal to our own Kubernetes environment that we need to create a new Vitess cluster. Once your database is created, it has at least one vttablet pod to store data and serve queries, a vtctld pod used for controlling the Vitess cluster, and one or more vtgate pods to proxy traffic to the vttablet pod(s), load balancing as needed.
Vitess and database branching
If every database in PlanetScale utilizes all of the above tech when it's created, then you might ask yourself “How does PlanetScale handle branching my database?”. Here’s the neat part: each branch in PlanetScale IS its own database, meaning that it also gets its very own Vitess cluster, as described earlier in this article. The only key difference is that when you create a branch, we’ll spin up a new Vitess cluster for it and (using the vtctld component of the two clusters) apply the schema of the source database branch to the one you just created!
Advanced edge connectivity and routing
The MySQL protocol was designed and built way before the era of cloud computing. Typically MySQL servers were hosted in the same network as the applications connecting to them and not over the public internet. Connections to the database can easily be broken if the quality of the connection suffers, resulting in application outages or data loss. This can be especially problematic when connecting to your database over large geographical distances.
In order to help solve this issue, PlanetScale has an edge routing infrastructure across supported cloud providers and regions around the world. When applications connect to databases in PlanetScale, connections are established at the node closest to them. Those connections are then proxied over a global network back to your database in its home region, where the actual data lives.
Since TLS is terminated closer to the code and our own internal network is used when routing traffic to the database, the quality of the connections to the database is improved, resulting in lower latency and faster data access for your application.
Supercharging your database
At this point, you have a powerful, highly available database ready to go, but we don't stop there. Let's look at some of the additional functionality we tack on to your database.
Online schema change infrastructure
Making schema changes to a database can be stressful as some changes may cause tables to be locked, which results in your application stalling since it needs to wait for the changes to be applied to the database. Online schema change tools are used to perform schema changes in various methods that avoid table locking. These tools, however, also require a certain skill set to manage properly.
Instead of having to implement online schema change tools such as gh-ost or pt-online-schema-change yourself, PlanetScale provides this functionality out of the box. By using the power of Vitess, we can schedule and execute online schema changes, as well as clean up old tables that are no longer being used after the migration process. This allows us to support the concept of database branching and deploy requests.
Database branches and deploy requests
When using a PlanetScale database, you can use database branches to apply and test schema changes without affecting your production database. When you want to merge changes between two branches, you'd open a deploy request to review changes and apply them to the upstream branch. Deploy requests also have a number of other benefits. Our system can detect if changes to a table will be destructive to the underlying data which avoids accidental data loss. A series of other checks are run to ensure that the changes being made are compatible with the overall system and won't cause any issues with the target database branch.
Schema reverts
When a deploy request is merged, we use a concept called a "shadow table", which is effectively a hidden table that contains the updated schema of the original table. During the process, data is synchronized between the live and shadow tables. When the deploy request is applied, we flip the status of the two tables, so the shadow table becomes the live table, and vice versa.
Schema reverts give you a window in which the old live table (now the shadow table) will remain in the system so it is available to be utilized in a situation where the applied schema changes cause an issue with your application. By reverting the schema, the status of both tables is flipped once more. Since data is still being synchronized between the two tables, any writes that have occurred during the revert window will be retained, but with the old schema. This gives you peace of mind that even if unintended changes occur to your database schema, you can quickly recover your application with no data loss.
Automated, validated backups
All PlanetScale databases come preconfigured with automated, encrypted backups. You can also configure additional backup schedules as needed.
Whenever a backup occurs, the previous backup for that database is restored to a separate MySQL instance in the database architecture, and the data changed since then is replicated into that instance before a new backup is taken. This validates every backup in PlanetScale, preventing data loss due to corrupted backups.
Primary and replicas
In database architecture, replicas are used to add high availability and improve performance by providing an additional copy of your database to read from. However, adding and maintaining replicas can be challenging in a traditional MySQL configuration.
PlanetScale handles this complexity for you. Every single production branch of a database in PlanetScale comes with at least one additional replica.
Availability zones and automated failovers
Databases on our Base plan are set up with a more advanced infrastructure configuration that provides increased resiliency by distributing the Vitess components of a given database across multiple availability zones in the selected region.
In cloud infrastructure, availability zones (AZs) are separate data centers in a given geographical region. For example, whenever you create an EC2 instance in AWS in the us-east-1 region, you are prompted to select from 6 different AZs, provided no others are enabled.
Spanning a single service across multiple AZs is considered a best practice for high availability. Replicas of your database are automatically stored in separate AZs to avoid disasters that may occur at a single data center. In the situation where a disaster does occur and renders your primary MySQL node inaccessible, the Vitess instance running your database branch will automatically failover to one of the replicas and elect it as the new primary, preventing any downtime that would otherwise occur.
A further breakdown of this is available in our Architecture doc.
Built-in monitoring with Insights
Monitoring the performance of queries sent to any database is critical to identifying which queries are not performing optimally, or worse yet, those that will overload your database resulting in slow application performance. To solve this issue, every database in PlanetScale comes with built-in query performance monitoring.
Some databases in PlanetScale receive millions of queries per second. Instead of tracking the statistics on individual queries, we run all queries through a normalization process that allows them to be identified based on their patterns. This allows us to fingerprint a specific query pattern and emit aggregated telemetry for that pattern such as the total execution time, number of queries, rows read, rows written, etc. Through this normalization process, the query data is anonymized by default so we don't track query parameters or the data itself.
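To make the normalization idea concrete, here is a minimal sketch, assuming a deliberately simple regex-based normalizer. It is not how Insights or Vitess parse SQL; the function names and the `PatternStats` fields are hypothetical. Literals are replaced with `?` so that queries differing only in their parameters collapse into one pattern, and statistics are aggregated per pattern.

```python
import re
from collections import defaultdict

# Assumption: literals are single-quoted strings or plain numbers.
_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def normalize(sql):
    """Replace literal values with placeholders and collapse whitespace."""
    sql = _STRING.sub("?", sql)   # strings first, so digits inside them vanish too
    sql = _NUMBER.sub("?", sql)
    return _SPACE.sub(" ", sql).strip()


class PatternStats:
    """Aggregated telemetry for one query pattern."""

    def __init__(self):
        self.count = 0
        self.total_ms = 0.0
        self.rows_read = 0


def aggregate(executions):
    """executions: iterable of (sql, elapsed_ms, rows_read) tuples."""
    stats = defaultdict(PatternStats)
    for sql, elapsed_ms, rows_read in executions:
        entry = stats[normalize(sql)]
        entry.count += 1
        entry.total_ms += elapsed_ms
        entry.rows_read += rows_read
    return dict(stats)


patterns = aggregate([
    ("SELECT email FROM users WHERE id = 42", 1.5, 1),
    ("SELECT email FROM users WHERE id = 7", 2.5, 1),
    ("SELECT * FROM t1 WHERE name = 'a  b' AND age > 30", 8.0, 120),
])
```

Because only the normalized pattern is used as the aggregation key, the parameter values never need to be stored, which is the anonymization property described above.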
Beyond aggregated statistics, Insights also captures details about individual executions of queries that take more than 1 second to execute, read more than 10k rows, or result in an error. If "complete query collection" is enabled we also record the raw SQL for these queries to provide additional context for debugging. Keeping a log of expensive or errored queries makes it easier to troubleshoot problematic queries and keep a handle on errors that your application may encounter.
The Insights tab of your database provides summary statistics and interactive time series graphs for your database as a whole, and for individual query patterns. With this data in hand, it’s easy to identify which queries your application is executing at any given time and keep an eye on query performance.
Availability monitoring
Monitoring how queries affect the performance of your database is just one important thing that needs to be considered when designing robust database infrastructure. Another consideration would be ensuring that the database can send and receive network traffic using the infrastructure external to the database.
For databases hosted with PlanetScale, we also have custom monitoring systems built to ensure that traffic is flowing properly and the database can respond to queries as intended. This helps in avoiding situations where the database itself may be functioning perfectly fine, but upstream networking components or other infrastructure are impacted.
Security and access
Connecting your application to your database branch works exactly like connecting to any other MySQL instance: with a connection string. Connection strings can also have roles assigned to them, ranging from read-only to full schema-changing capabilities.
Administering your PlanetScale database has additional security features that are accessible in an easy-to-use dashboard. User accounts can have granular permissions assigned to them on an organization and database level. Single sign-on is available to enable users to log in with their existing ID provider.
Service tokens offer a way to automate the administration of your PlanetScale organization or database programmatically using our API or pscale CLI. Service tokens also have granular permissions, enabling you to lock them down based on their intended use.
We are also a GitHub Secrets Scanning partner. If service tokens or connection strings are accidentally published into a repository, the permissions for that given secret will automatically be revoked and you'll be notified.
Automatic MySQL version updates
Using the combined power of Vitess and Kubernetes, we're able to keep your database up to date with the latest version of MySQL, removing the difficulty of having to perform minor or major updates when new versions of MySQL are released. We're also able to make sure updates are applied successfully and easily roll back if needed.
Conclusion
We've made it very simple to create a database on PlanetScale, but as you can see, a lot goes into both creating databases and building in the common and sometimes necessary tooling that keeps them running optimally. If you have any questions or comments about what you've read, feel free to send us a note on Twitter at @planetscale!]]></content>
        <summary><![CDATA[Learn about all of the tech driving every PlanetScale database.]]></summary>
      </entry>
    
      <entry>
        <title>Vitess for us all</title>
        <link href="https://planetscale.com/blog/vitess-for-us-all" />
        <id>https://planetscale.com/blog/vitess-for-us-all</id>
        <published>2023-08-22T13:00:00.000Z</published>
        <updated>2023-08-22T13:00:00.000Z</updated>
        
        <author>
          <name>Deepthi Sigireddi</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[Learn how middleware technology works, the pitfalls of application-level sharding, and how Vitess enables horizontal sharding of MySQL for near infinite scale.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing IP restrictions</title>
        <link href="https://planetscale.com/blog/introducing-ip-restrictions" />
        <id>https://planetscale.com/blog/introducing-ip-restrictions</id>
        <published>2023-08-15T12:00:00.000Z</published>
        <updated>2023-08-15T12:00:00.000Z</updated>
        
        <author>
          <name>Iheanyi Ekechukwu</name>
        </author>
        
        <author>
          <name>David Graham</name>
        </author>
        
        <author>
          <name>Ayrton</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Starting today, you can add an extra layer of security in connecting to your database by defining which IP addresses can connect to each database branch. Giving your organization the tools you need to operate your databases securely is a top priority, and IP restrictions, launching today in beta, are one additional way to provide an extra layer of security.
While PlanetScale passwords have limited access by default, with each password valid for a single specific branch, some organizations may prefer or even require more stringent access control. IP restrictions let admins specify the ranges of IP addresses allowed to connect for a given password.
Examples of when you may want to use IP restrictions:
You want to segment database access so that the production database can only be connected to from production environments or development branches.
You use a bastion in production and want to ensure that all database connections originate or pass through the bastion.
You want to allow a single client to be able to access your database (e.g., for debugging) and want to provide the least amount of access for them to do so.
You have compliance requirements that require implementing a more stringent access control list in your database.
We know that security is of utmost importance in operating databases, and we hope that IP restrictions provide one more tool for you to help manage your production systems. To learn more, please visit our IP restrictions documentation.]]></content>
        <summary><![CDATA[PlanetScale now supports IP restrictions for database passwords as another tool to operate your database securely.]]></summary>
      </entry>
    
      <entry>
        <title>Storing time series data in sharded MySQL to power Query Insights</title>
        <link href="https://planetscale.com/blog/storing-time-series-data-in-sharded-mysql" />
        <id>https://planetscale.com/blog/storing-time-series-data-in-sharded-mysql</id>
        <published>2023-08-10T18:00:00.000Z</published>
        <updated>2023-08-10T18:00:00.000Z</updated>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Every day PlanetScale processes more than 10 billion of our customers’ queries. We need to collect, store, and serve telemetry data generated by these queries to power Insights, our built-in query performance tool. This post describes how we built a scalable telemetry pipeline using Apache Kafka and a sharded PlanetScale database.
Insights requirements
To show you Insights, we pull from the following datasets:
Database-level time series data (e.g., queries per second across the entire database).
Query pattern-level time series data (e.g., p95 for a single query pattern like SELECT email FROM users where id = %)
Data on specific query executions for slow/expensive queries (the “slow query log”).
The database-level data fits well into a time series database like Prometheus, but we run into several issues when trying to store the query-pattern level data in a time series database.
On any given day, there are tens of millions of unique query patterns being used across PlanetScale, and we anticipate that this number will continue growing. Most time series databases start having issues when dimension cardinality is that high. We evaluated the cost of storing this data in Prometheus and found that it would be expensive enough to be fairly unattractive.
We need to store additional data for each query pattern aggregation, such as the complete normalized SQL, the tables used, and the last time the query was executed.
To power Insights search, we need to be able to filter query patterns in sophisticated ways like substring matching against normalized SQL.
Given these requirements, we built a hybrid solution that uses Prometheus for database-level aggregations (where cardinality is suitably low) and a sharded PlanetScale database, backed by MySQL and Vitess, to store query-pattern level statistics and individual slow query events.
Insights pipeline

The Insights pipeline begins in VTGate. VTGate is a Vitess component that proxies query traffic to the underlying MySQL instances. We’ve added instrumentation to our internal build of Vitess that does the following (in addition to serving metrics that Prometheus scrapes):
Sends an aggregate summary for each query fingerprint (for more on how we determine fingerprints, read this blog post about query performance analysis) to Kafka every 15 seconds. 15 seconds is a good balance between keeping the number of messages manageable and providing a near real-time experience.
Sends slow query events to a Kafka topic immediately.
A primary design goal for Insights is that the instrumentation should never slow your database down or cause unavailability. We impose several limits at the instrumentation site to ensure this.
We set a limit for the number of unique query patterns per interval. Since every unique query requires memory to track in VTGate, we need to ensure that we don’t consume an unbounded amount of memory if a database sees an enormous number of unique query patterns very quickly. We monitor VTGates to ensure that even our largest customers aren’t regularly exceeding this threshold.
We limit the number of recorded slow query log events using a continuously refilled token bucket rate limiter with a generous initial capacity. This allows us to capture bursts of slow queries but limit overall throughput. Typically you don’t need to see hundreds of examples of the same slow query, so this doesn’t detract from the product.
Data submitted in VTGate is published to a bounded memory buffer and flushed to Kafka asynchronously. Asynchronous publication minimizes per-query overhead and ensures we continue to serve queries even during a Kafka outage. We guard against a temporary Kafka unavailability by buffering up to 5MB, which will be sent when Kafka becomes available again.
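The token bucket limiter mentioned in the list above can be sketched as follows. This is a minimal illustration, not the VTGate instrumentation: the class name, the capacity, and the refill rate are assumptions, and timestamps are passed in explicitly so the behavior is easy to follow (a real caller would likely pass something like `time.monotonic()`).

```python
class TokenBucket:
    """Continuously refilled token bucket: allows bursts, caps sustained throughput."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.tokens = float(capacity)  # start full so an initial burst is captured
        self.updated_at = now

    def allow(self, now):
        """Return True if an event at time `now` (in seconds) may be recorded."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical numbers: a burst of 3 events, then 1 event per second.
bucket = TokenBucket(capacity=3, refill_per_second=1.0)
burst = [bucket.allow(0.0) for _ in range(4)]   # fourth event in the burst is dropped
later = [bucket.allow(2.0) for _ in range(3)]   # two tokens refilled after 2 seconds
```

The initial capacity controls how large a burst of slow queries is captured, while the refill rate bounds the long-run number of events sent downstream.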
Kafka Consumers
Data is read from Kafka by our application and written to MySQL. The query pattern data is aggregated by time in the database. We store the query pattern data in both per-hour and per-minute roll-up tables to serve large time periods efficiently and small periods with high granularity. Slow query events are written one-to-one into a MySQL table. For both the aggregate and slow query topics, we track the offset and partition from the underlying Kafka messages in the MySQL tables and use uniqueness constraints to avoid duplicating data if Kafka consumers retry batches following a failure.
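The deduplication approach described above can be illustrated with a small sketch. The assumptions are significant: SQLite stands in for MySQL so the example runs anywhere, the table and column names are hypothetical, and `INSERT OR IGNORE` is SQLite syntax (MySQL would need its own equivalent, such as `INSERT IGNORE`). The point is only that a uniqueness constraint on the Kafka partition and offset makes a retried batch harmless.

```python
import sqlite3


def open_store():
    """Create an in-memory table keyed by the Kafka coordinates of each event."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE slow_query_events (
            kafka_partition INTEGER NOT NULL,
            kafka_offset    INTEGER NOT NULL,
            fingerprint     TEXT    NOT NULL,
            duration_ms     REAL    NOT NULL,
            UNIQUE (kafka_partition, kafka_offset)
        )
        """
    )
    return conn


def write_batch(conn, messages):
    """Write (partition, offset, fingerprint, duration_ms) tuples; return row count."""
    with conn:  # one transaction per consumer batch
        conn.executemany(
            "INSERT OR IGNORE INTO slow_query_events VALUES (?, ?, ?, ?)",
            messages,
        )
    return conn.execute("SELECT COUNT(*) FROM slow_query_events").fetchone()[0]


conn = open_store()
batch = [(0, 100, "a1b2", 1500.0), (0, 101, "c3d4", 2100.0)]
first = write_batch(conn, batch)
retried = write_batch(conn, batch)  # simulated consumer retry of the same batch
```

After the retry, the table still holds two rows, because the second attempt collides with the unique key instead of inserting duplicates.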
Aggregate query data is mapped to Kafka partitions by setting the Kafka key to a deterministic hash of the database identifier and the query fingerprint. Because of this, all messages for a given database/query pattern will arrive in the same partition and we can merge aggregate Kafka messages in memory for each consumer batch to avoid unnecessary database writes. In practice, we’ve found that in-memory coalescing decreases database writes by about 30%–40%. Larger batches yield better write coalescing but require more memory in the consumer and increase end-to-end latency. Under normal operations, the average batch size is around 200 messages but can go as high as 1,000 if there is a load spike or we’re working through a Kafka backlog. The higher coalesce rate in larger batches helps us quickly burn down message backlogs when they occur.
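Here is a rough sketch of the keying and coalescing steps described above. It makes several assumptions: SHA-256 stands in for whatever hash actually maps a Kafka key to a partition, and the message fields (`database_id`, `fingerprint`, `minute`, `query_count`, `total_ms`) are hypothetical names chosen for illustration.

```python
import hashlib


def partition_for(database_id, fingerprint, num_partitions):
    """Deterministically map a database/query-pattern pair to a partition."""
    key = f"{database_id}:{fingerprint}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def coalesce(batch):
    """Merge messages for the same database, pattern, and time bucket in memory."""
    merged = {}
    for message in batch:
        key = (message["database_id"], message["fingerprint"], message["minute"])
        row = merged.setdefault(key, {"query_count": 0, "total_ms": 0.0})
        row["query_count"] += message["query_count"]
        row["total_ms"] += message["total_ms"]
    return merged


batch = [
    {"database_id": "db1", "fingerprint": "f1", "minute": 0, "query_count": 10, "total_ms": 12.5},
    {"database_id": "db1", "fingerprint": "f1", "minute": 0, "query_count": 4, "total_ms": 7.5},
    {"database_id": "db2", "fingerprint": "f9", "minute": 0, "query_count": 1, "total_ms": 3.0},
]
rows = coalesce(batch)  # three messages become two database writes
```

Since every message for a given database and pattern lands in the same partition, a single consumer sees all of them and can merge them before writing, which is where the reduction in database writes comes from.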
Sharding
The Kafka consumers issue about 5k writes per second to the MySQL database, and we need to be ready to scale this out as PlanetScale grows. To ensure that the database doesn’t become a bottleneck, we sharded the Insights database cluster based on the customer database ID. (If you want to learn more about sharding, read our blog post on how database sharding works). Database ID works well as a shard key because we never need to join data across customer databases, and it results in a fairly even distribution of data across shards.
Insights originally shipped with four shards, but we increased this to eight earlier this year to keep up with increased write volume and to build operational experience with resharding. Vitess can re-shard an actively used database, but we opted to provision a new, larger, PlanetScale database when we needed to increase the number of shards. Since Insights currently stores eight days of data, we provisioned a new set of consumers, let the new branch receive duplicate writes for eight days, and then cut the application over to read from the new database. This method allowed us to test and gain confidence in the new cluster before placing it in the critical path. Based on load tests and resource utilization metrics in production, we’ve found that our maximum write throughput has so far scaled linearly with the number of shards.
We’ve successfully run the Insights database cluster on fairly small machines (2 vCPUs and 2GB memory). A larger number of smaller shards keeps backups and schema changes fast, gives us the option of quickly scaling up to larger hardware if we encounter an unexpected throughput increase, and gives us breathing room to backfill a new cluster with more shards when necessary.
Percentile Sketches
Time series latency percentiles are critical at the database level, to monitor overall health, and at the per query-pattern level to spot problematic queries. The database-level data is stored in Prometheus, so we can use the built-in quantile estimation tools. Since we’re storing the query pattern data in MySQL, though, we had to find a way to store and retrieve percentile data in MySQL without the help of any built-in functions.
As a brief refresher, a percentile is a summary statistic generated from a set of observations. If the 95th percentile of query latency is 100ms, 95% of the observed queries will be faster than 100ms, and 5% will be slower. Percentiles are typically more useful than other simpler statistics like the mean because they give you a more concrete idea of the actual performance of your system. For example, if the mean response time for a simple lookup query is 100ms — is your query fast enough? It could be that response time clusters tightly around 100ms in which case you probably need to find a way to improve performance. Or it could be that the vast majority of queries are taking a few milliseconds but a single query took 30s, in which case there’s probably nothing to be improved. If you know the 50th percentile query latency is 100ms, on the other hand, you know half of the time your query executes, it’s taking more than 100ms and there’s definitely room for improvement.
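To make the comparison concrete, here is a small Ruby sketch. The latencies are made-up numbers chosen for illustration, not measurements: both sets have the same 100ms mean, but only the median reveals which one describes a consistently slow query.

```ruby
# Hypothetical latencies in milliseconds; both sets have a mean of exactly 100ms.
clustered = [95, 98, 100, 100, 102, 105]
skewed    = Array.new(299, 2) + [29_402] # 299 fast queries and one ~30s outlier

def mean(values)
  values.sum.to_f / values.size
end

# Nearest-rank percentile: the value p% of the way into the sorted observations.
def percentile(values, p)
  sorted = values.sort
  rank = (p * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

[clustered, skewed].each do |latencies|
  puts "mean=#{mean(latencies)}ms p50=#{percentile(latencies, 50)}ms"
end
# prints "mean=100.0ms p50=100ms" and then "mean=100.0ms p50=2ms"
```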
Calculating percentiles is harder than averaging though. The most straightforward way to determine the nth percentile is to record every observation, sort them, and then return the value n% into the sorted observations. This would require collecting and storing raw latencies for every single query which is impractical at scale. Another approach would be to precompute percentiles at the instrumentation site, but then we run into another problem: we need to be able to combine percentiles to merge data from multiple sources or roll percentiles up to larger time buckets. Sadly, averaging percentiles does not yield statistically meaningful results.
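The following Ruby sketch shows both points with invented data: the sort-everything approach, and why averaging precomputed percentiles misleads. The two "sources" and their latencies are assumptions made purely for illustration.

```ruby
# Straightforward approach: keep every observation, sort, and index p% of the way in.
def percentile(observations, p)
  sorted = observations.sort
  rank = (p * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical latencies (ms) reported by two sources for the same query pattern.
source_a = (1..100).to_a        # 100 observations between 1ms and 100ms
source_b = Array.new(10, 1000)  # 10 observations of 1000ms

averaged = (percentile(source_a, 95) + percentile(source_b, 95)) / 2.0
combined = percentile(source_a + source_b, 95)

puts "average of per-source p95s: #{averaged}ms" # 547.5
puts "p95 of all observations:    #{combined}ms" # 1000
```

The averaged figure (547.5ms) is a latency that no query actually had, while the true 95th percentile of the combined data is 1000ms.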
To efficiently collect and store percentile data that can be combined in a valid way, we decided to use DDSketch. DDSketch is a probabilistic data structure and set of algorithms built to compute error-bounded quantile estimates. DDSketches are fast to compute, bounded in size, and can be merged without losing statistical validity. The core idea is that a set of observations can be approximated by grouping values into buckets with exponentially increasing widths, and then storing a count of occurrences for each bucket. Quantiles can be calculated by storing the buckets in sorted order and finding the bucket key which contains the nth percentile value. Sketches can then be merged by summing the bucket counts. The accuracy of a DDSketch is determined by a parameter, α, which controls bucket width and bounds the relative error of quantile estimates. Setting a lower α yields more accurate quantiles at the cost of increased sketch size. We’re using α=0.01 which is sufficiently accurate (estimates can be off by at most 1%) and yields suitably small sketches.
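The bucketing, merging, and quantile lookup described above can be sketched in a few lines of Ruby. This is a simplified illustration of the DDSketch idea for positive values only, not the implementation Insights uses; the class name `LatencySketch` and its methods are hypothetical, and it omits the bucket-count limit that keeps real sketches bounded in size.

```ruby
# Minimal DDSketch-style structure (illustrative only; positive values, unbounded buckets).
class LatencySketch
  attr_reader :alpha, :buckets, :count

  def initialize(alpha = 0.01)
    @alpha = alpha
    @gamma = (1 + alpha) / (1 - alpha) # ratio between consecutive bucket boundaries
    @log_gamma = Math.log(@gamma)
    @buckets = Hash.new(0)             # bucket index => number of observations
    @count = 0
  end

  # Bucket i covers (gamma^(i-1), gamma^i], so bucket widths grow exponentially.
  def add(value)
    raise ArgumentError, "value must be positive" unless value.positive?

    @buckets[(Math.log(value) / @log_gamma).ceil] += 1
    @count += 1
    self
  end

  # Merging sums the counts bucket by bucket; both sketches must share alpha.
  def merge(other)
    raise ArgumentError, "alpha mismatch" unless other.alpha == @alpha

    other.buckets.each { |index, n| @buckets[index] += n }
    @count += other.count
    self
  end

  # Walk the buckets in sorted order until the cumulative count passes the target
  # rank, then return a representative value for that bucket.
  def quantile(q)
    return nil if @count.zero?

    rank = q * (@count - 1)
    seen = 0
    @buckets.keys.sort.each do |index|
      seen += @buckets[index]
      return 2 * @gamma**index / (@gamma + 1) if seen > rank
    end
  end
end

sketch = LatencySketch.new(0.01)
(1..1000).each { |ms| sketch.add(ms) }
puts sketch.quantile(0.5) # within 1% of the exact value, 500
```

Because each bucket's representative value sits within a factor of α of every value in the bucket, the estimate's relative error is bounded by α regardless of how the data is distributed.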
Each VTGate instance records a sketch of the latencies for each query pattern and sends it along with the other aggregates every 15 seconds. The sketches are read from Kafka and written to MySQL in a custom binary format. We’ve implemented a small library of loadable C++ MySQL functions that know how to read and write the binary format, allowing us to aggregate sketches and compute arbitrary percentiles in MySQL. Performing these functions in MySQL allows us to calculate percentiles without needing to pull the underlying sketches into our application. It also lets us use the full expressive power of SQL to get the data we need. We plan to open source the MySQL function library in the near future.
MySQL as a time series database
MySQL is not typically the first data store that comes to mind for time-series data. MySQL was not explicitly designed as a time series database. It requires schemas and provides all manner of durability and transactional guarantees that are critical for application data but not strictly necessary in the time series domain. So, why are we storing time series data in MySQL? There are several reasons why this made sense in our case:
The high cardinality of our primary dimension (query pattern fingerprint) made using Prometheus and many other time-series databases prohibitively expensive.
Our set of dimensions is well-known and changes infrequently.
The product requires the ability to filter the dataset in ways that many time series databases do not support.
We have a natural shard key.
A wide variety of OLAP databases could also serve our needs here, but all of them involve significant operational overhead and a steep learning curve. We were pleased that our problem fit nicely into sharded Vitess and MySQL and we could avoid deploying and maintaining an additional storage system. With Kafka and Vitess sharding, we can scale all of the components of the Insights pipeline as volume increases and we’re well positioned to keep up with PlanetScale’s growth.]]></content>
        <summary><![CDATA[How we built a scalable telemetry pipeline with Apache Kafka and PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>Is your database bleeding money?</title>
        <link href="https://planetscale.com/blog/database-bleeding-money" />
        <id>https://planetscale.com/blog/database-bleeding-money</id>
        <published>2023-08-08T09:00:00.000Z</published>
        <updated>2023-08-08T09:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Your database should work for you — not the other way around. And yet, many companies are (sometimes unknowingly) losing millions of dollars every year on their databases. What gives?
There are three unexpected, but significant ways that your database might be burning cash:
Downtime
Database management
Database DevOps
The result? You might be bleeding money due to errors (both technical and human), delays, and downtime. Your bottom line suffers. Your users aren’t happy. And it puts unnecessary stress on your engineers.
3 ways your database might be costing you significant money
Let’s dive into exactly how databases can be outrageously expensive in more detail.
1 - Database-caused downtime
Downtime can happen as a result of human error, system immaturity, and application issues. “So we had an hour of downtime — it can’t be that bad, right?” Wrong. Even just minutes of downtime can cost you in various ways:
There’s the actual cost of lost revenue during the time that you’re down: For example, if you’re a major online retailer, even one hour of downtime can easily translate to millions of dollars in sales lost.
There’s also the cost of lost contracts: This is especially pertinent for SaaS service providers — for example, a company that provides authentication. If your database goes down, then all of your customers go down with it. Imagine what’ll happen when it comes time for their renewal: They’ll simply find a competitor who didn’t have those performance issues.
The research tells us that even an hour of downtime can add up to an average of $300,000. It might all sound extreme, but we’ve seen it happen before.
You might recall how in 2019, Costco had website issues during the holiday shopping season. The cost? $11 million in sales alone.
In March of 2015, Apple had a 12-hour outage that cost them $25 million.
Facebook experienced a 14-hour outage in March 2019 that ended up costing them $90 million.
Think of the ripple effect, too. If Facebook goes down, sure, its everyday users can’t log on. What about the brands that pay Facebook to run ads? These same brands rely on the social media platform to boost their sales. How much money did they lose during this outage?
It’s difficult to even quantify the full financial impact of these pricey database outages, but one thing is clear: downtime should be avoided at all costs.
In addition to the financial hit, downtime also comes with a severe loss of trust. These days, downtime is completely unacceptable to users. Your site has to be fully functional and accessible 24/7/365. If users open your app and it doesn’t load or you’re constantly scheduling maintenance windows, they’re going to end up frustrated and switch to a more reliable competitor.
Have you ever dropped the wrong table or index? Did your queries dramatically slow or fail, as a result? The result can be site-wide outages. Then, you have to deal with restoring the right backup. To prevent this, PlanetScale warns you if the table to be dropped was recently queried, so you can avoid dropping a table that is in use. And if there is a mistake made, you can revert schema changes while retaining the data in most cases.
PlanetScale offers fully-managed Vitess clusters. Vitess is an open-source database clustering system that enhances the scalability and manageability of MySQL. Vitess has been pressure-tested at scale. It is widely adopted among the hyperscalers and is the primary datastore at companies like Slack, HubSpot, and Etsy. In the time it takes you to read this blog post, Vitess clusters will have served 10s of millions of users and 100s of millions of queries across 100s of petabytes of data.
2 - Database management
Who’s managing your database? How many hours a week are they spending on it? It’s not uncommon for Engineering teams — especially those belonging to small and medium-sized businesses — to not have dedicated database administrators (DBAs) or anyone else whose primary responsibility is managing the database infrastructure.
While an app or company is small, this may be feasible. But as you grow, your database needs to be able to grow alongside you — and it’s going to require more and more attention. If there’s no one dedicated to managing it, then the burden will fall on your engineers.
The average engineer is typically not a database expert. Learning how to deal with scaling, uptime, backups, monitoring, version updates, security, compliance, and so on becomes time-consuming, especially considering that your database always needs to remain up. This means that for your engineers, managing your database can easily keep them on the clock 24/7.
Most importantly, if your engineers are spending their time managing the database, they have less time to make crucial application updates that actually set your business apart from your competitors, who are busy speeding ahead of you shipping new features that customers actually care about. At the end of the day, the customer only sees the end product, not all the time you’re spending on infrastructure.
What’s the next step, then?
If you’re at the point where you’re actively trying to scale, you might think that the next natural step is to hire a DBA or even a team of people, especially if your database has become very large and performance is starting to become an issue. So, what’s that going to cost you?
Hiring a DBA isn’t cheap. In fact, there was a 6.9% salary increase for this role in 2022. Meanwhile, there was a 50% drop year-over-year in venture funding, as of Q3 of 2022. In other words, DBAs are expecting more, while budgets are shrinking.
It’s not just the salaries you’re paying, either. You still have the astronomical fees you pay just to host your database. Why settle for this?
What if your database provider could do more for you? This is where PlanetScale shines. With the PlanetScale platform, you essentially get a built-in DBA:
Git-like workflows for schema changes
Automatic no-downtime version updates
Automated and pre-tested backups
Replicas included on every production branch
Built-in query monitoring
Revert button to easily undo schema changes with no downtime
Horizontal sharding with minimal application changes
Easy-to-configure options for read-only regions, additional replicas, backups, and more
World-class support options
Built-in caching for some frequently used queries
Barstool’s CTO, Andrew Barba, knew that if he wanted to scale rapidly and increase velocity, he’d have to hire a lot more engineers. He was also mindful of their many outages — one of which cost the company a couple of million dollars in just 45 minutes. They ended up moving over to PlanetScale completely. “In the end, we saved 20-30% by switching to PlanetScale.”
3 - Database DevOps
Making database schema changes safely is a multi-step process that can be prone to errors and downtime if there are no safeguards in place.
Manually reviewing and validating every database schema change frequently takes hours, if not days or weeks. It’s not something you can really avoid, either: Schema changes are included in roughly 57% of all application changes. This tedious review process often becomes a huge blocker to shipping application changes. The more time your engineers are spending on this task, the less time they have to continue innovating on your application.
And, again, not making changes to a database isn’t exactly an option. So, companies need a way to do it safely, quickly, and cost-effectively. The problem is, there isn’t really an easy and universal solution to safeguarding this process.
The ongoing management and deployment of database changes is by far the slowest and riskiest part of the application release process.
Why? Most of these changes are performed manually by database administrators (DBAs) who spend countless hours to create, review, rework, and deploy database changes in support of rapid application delivery. This creates a huge bottleneck for the overall release process, as database changes happen every day.
In software development, processes around continuously deploying application code have evolved and matured to a degree that even hobby developers typically have some kind of robust CI/CD process in place. Surprisingly, this level of maturity hasn’t fully reached the database world. The database is often treated as a separate function of shipping application changes – even though it’s an important and essential part of the process.
“Nearly all (92%) report that they face difficulties. The top challenge participants cited is the lack of tools to automate the database deployment process (50%).
This was followed closely by long database change review and approval cycles (49%) and having a very manual deployment process with many steps that can fail (48%).”
PlanetScale offers extra protection with safe migrations. This means that branching enables you to have zero-downtime schema migrations, the ability to revert a schema, and protection against accidental schema changes. With our non-blocking schema change process, we make a copy of affected tables and apply changes to that copy — instead of directly modifying tables when you deploy a request. And importantly, our revert feature allows us to go back in time and revert a schema migration deployed to your database and even retain lost data.
The bottom line
Can databases be exceedingly costly? Yes. Do they have to be? Absolutely not. With the right engine operating behind the scenes — plus the important guardrails you need to keep your site up and running — your database can work swiftly, efficiently, and proactively.]]></content>
        <summary><![CDATA[Databases can cost your company millions if they don’t function as they should. What are the biggest pitfalls, and how can you avoid them?]]></summary>
      </entry>
    
      <entry>
        <title>How PlanetScale unlocks developer productivity</title>
        <link href="https://planetscale.com/blog/how-planetscale-unlocks-developer-productivity" />
        <id>https://planetscale.com/blog/how-planetscale-unlocks-developer-productivity</id>
        <published>2023-07-26T14:01:00.000Z</published>
        <updated>2023-07-26T14:01:00.000Z</updated>
        
        <author>
          <name>Justin Gage</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[You and your team have worked tirelessly on finding and hiring the best software engineers around, but are they as productive as they want to be? More and more organizations are focusing on how to improve developer productivity, and we believe it starts with your database. This guide will walk through what slows developers down, why it matters, and how a better database can unlock developer productivity — and ergo, innovation — for your organization.
What slows developers down?
You can hire the most talented engineers out there, but if your processes and tech stack are inefficient and outdated, it’s going to slow the team down.
Technical debt is going to be a part of any company, but too much of it — and a lack of clear guidelines around how your team is tackling it — can waste precious engineering hours.
Friction between teams responsible for different parts of the stack can stick code in limbo. Waiting on a DBA to approve schema changes or run a migration is costly.
Inefficient processes for building, testing, and deploying code can waste engineering time on things that aren’t building customer-facing features.
Clunky tools can require extra time to create something that would be much faster and easier with modern, streamlined tooling.
This guide focuses on a specific, often unsung culprit of lost engineering productivity: your database. You might be surprised to see precisely how impactful it can be when your database and the processes around it prevent your developers from spending their time on what matters.
Your database might be the problem
If you look at most of the things that slow developers down, there’s a theme: they can (and usually do) relate to your application database. Your team just wants to ship code, but a majority of the time, that requires interacting with your database. And this significantly slows down the pace of your team.
Interfacing with a separate team
In sufficiently large organizations, an independent team of DBAs manages the database, reviewing and tuning every desired change from engineers. That separation of responsibilities makes sense — at a certain size, the database becomes too complex for a single engineering team to understand intimately — but completely slows down any momentum for features your engineers are building. 49% of organizations report a lengthy process for database review and approval cycles:
The ongoing management and deployment of database changes is by far the slowest and riskiest part of the application release process.
Why? Most of these changes are performed manually by database administrators (DBAs) who spend countless hours to create, review, rework, and deploy database changes in support of rapid application delivery. This creates a huge bottleneck for the overall release process, as database changes happen every day.
More than half (57%) of code changes end up requiring an accompanying database change. Waiting weeks for a DBA team to approve and adjust your code significantly slows your developers’ productivity.
Genuine fear around schema changes
For smaller organizations without a dedicated team to manage the database, you have the opposite problem: Engineers are well aware of how delicate the database is and are cautious (perhaps overly so) about making any changes. 48% of organizations say their database change process has several steps that can fail.
Chances are, your developers have been stung by downtime or data loss before, but the extra caution around making database changes significantly slows down how quickly you can ship code.
Ongoing monitoring and improvements
Effective organizations have performance and scalability built into their testing processes from the start, but it’s impossible to predict the future. Your team might ship a new feature that loads quickly and works well initially, but that begins to change as the underlying table grows in size and complexity. When your app becomes slow and your users complain, developers start working on rewriting queries, refactoring application code, modifying the schema, and even caching (good luck). Whatever works, it’s yet another task your team is spending time on instead of working on new features.
The way we build, test, and deploy software has changed massively over the past decade, but the way we interact with the database has stayed largely the same.
What if your database made you more productive?
For as long as software engineering has been around, the database has existed outside the scope of DevOps. Our application code is tightly version controlled, reviewed, tested, and deployed, but we still make schema changes manually and hope for the best. The solution is to bring the database into the DevOps cycle and create a dedicated workflow for managing changes.
Here’s what a traditional workflow looks like for making a database change with a DBA team:

Your workflow to make a schema change starts with a team building a new feature. Most meaningful new features will require a database change. If you have a DBA team, a request gets sent their way to go through their review process. If not, your team does extensive testing before gingerly making the change themselves. If the change goes well, great! If not, you’re faced with the same cycle all over again.
Throughout this entire process, your database itself acts as an almost innocent bystander. We’ve built these clunky, backward change processes around a tool that has stayed roughly the same for the past 20 years, while the way we create and deploy code — and the tools we use to do it — have gotten better by leaps and bounds.
Bringing your database into the DevOps cycle
The database you need has features built in that make change management as simple as a comment and a click, letting you bring it into your DevOps cycle natively. One of the core insights we built PlanetScale on is that databases have been largely untouched by the changes in how we ship software over the past 10 years. It’s time to change that.
Let’s cover a few flagship features that help PlanetScale unlock your developers’ productivity.

What if your database had branches?
The first thing you do when work starts on a new feature is git checkout -b "new-branch-name". Branching is part and parcel of modern software development and delivery. PlanetScale brings that same functionality to the database, allowing you to create independent, merge-able branches of your data.

Branches take the guesswork out of schema changes. When you create a branch, an isolated database instance gets initialized (schema only for development) where you can make and test your changes. When you’re ready, you can promote it to main just like you would with code.
Deploy requests and database CI/CD
When your database lives outside your DevOps cycle, changes get made (and reverted) statically and non-collaboratively. PlanetScale’s deploy requests feature brings the database into the cycle by giving your team a singular view of the proposed changes and their impact.

After reviewing your request and the impact it will have on tables, your admin can merge the request and promote it to production (all of which is stored in history, by the way) — just like a Pull Request on GitHub. It’s like a review process for your database, built right in.
Reverting schema changes easily
In a perfect world, development would mirror production identically and schema changes that were tested properly would work every time. In the real world though, no matter how smoothly your database handles changes, you’ll need to revert one every now and then. PlanetScale makes that process criminally easy: You can one-click undo a schema change with no downtime or data loss.
Insights and automated monitoring
So you’ve used branching and deploy requests to seamlessly get your database change into production. The next step is monitoring your changes and making sure that the query you wrote that utilizes the new data is performant, keeping your users happy. PlanetScale Insights gives you a remarkably granular view of query performance, all built-in natively to the database.

You can choose from metrics like rows read, query latency, number of times a query has run, etc. Seeing your recent queries, how they performed, and any errors they ran into in a single place makes it easy for your developers to quickly analyze and make adjustments on the fly.
Don’t just take our word for it
We built PlanetScale because we were frustrated with the database being the bottleneck for moving customer-facing features forward. Developers at organizations across the globe, from startups to Fortune 500 companies, are using PlanetScale features like branching, deploy requests, and insights to unlock developer productivity and ship things faster.
Planning for massive scale while also rapidly iterating on our feature set was a demanding place to be. PlanetScale delivers world-class operations, a scalable platform, and the flexibility the business needed.
We used to check the AWS dashboard nightly, now we never think about PlanetScale. We don’t want to be DevOps experts and we don’t need DevOps database work. We want to always focus on making our product better.
Having Insights where it captures 100% of queries scratches an itch like nothing else can. You can get the complete picture.]]></content>
        <summary><![CDATA[A guide on what slows developers down, why it matters, and how a better database can unlock developer productivity.]]></summary>
      </entry>
    
      <entry>
        <title>Incorporating databases into your CI/CD pipeline</title>
        <link href="https://planetscale.com/blog/databases-ci-cd-pipeline" />
        <id>https://planetscale.com/blog/databases-ci-cd-pipeline</id>
        <published>2023-07-18T13:00:00.000Z</published>
        <updated>2023-07-18T13:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[In this tech talk, we demo how to build your own CI/CD pipelines, where to incorporate the database, and why the database has historically been left out of CI/CD.]]></summary>
      </entry>
    
      <entry>
        <title>Performant database tree traversal with Rails</title>
        <link href="https://planetscale.com/blog/performant-database-tree-traversal-with-rails" />
        <id>https://planetscale.com/blog/performant-database-tree-traversal-with-rails</id>
        <published>2023-07-12T17:30:00.000Z</published>
        <updated>2023-07-12T17:30:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We recently solved an interesting performance problem in our Rails API when performing a tree traversal.
We have some code that figures out a 3-way merge for your database schemas. To do this, we need to calculate a merge base between two schemas (like git!).
Our Rails API keeps track of changes to each PlanetScale database schema. We call each of these changes a "schema snapshot," similar to a git commit that stores the state of the schema at a specific time. Each snapshot can have one or two parents. When merging branches, we perform a breadth-first search on the history of each change until we find the common ancestor between both branches. This is the merge base.
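As a rough illustration of the search, here is a self-contained Ruby sketch that works on an in-memory hash instead of database records. The `merge_base` helper and the `parents` hash (snapshot id to parent ids) are hypothetical stand-ins, and the two-pass approach is a simplification rather than our actual implementation.

```ruby
require "set"

# parents maps a snapshot id to its parent ids (zero, one, or two of them).
def merge_base(parents, from_id, into_id)
  # Pass 1: breadth-first walk collecting every ancestor of from_id, itself included.
  from_ancestors = Set.new
  queue = [from_id]
  until queue.empty?
    id = queue.shift
    next unless from_ancestors.add?(id) # add? returns nil when already visited

    queue.concat(parents.fetch(id, []))
  end

  # Pass 2: breadth-first walk from into_id; the first snapshot that is also
  # reachable from from_id is a common ancestor.
  visited = Set.new
  queue = [into_id]
  until queue.empty?
    id = queue.shift
    next unless visited.add?(id)
    return id if from_ancestors.include?(id)

    queue.concat(parents.fetch(id, []))
  end
  nil
end

# Snapshots 4 and 6 sit on two branches that diverged after snapshot 3.
history = { 1 => [], 2 => [1], 3 => [2], 4 => [3], 5 => [3], 6 => [5] }
puts merge_base(history, 4, 6) # 3
```

Every `parents.fetch` here is a cheap hash lookup. In the real API each step is a row lookup, which is where the cost discussed below comes from.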

Going one-by-one is slow
Finding the merge base involves checking each snapshot in the history of both branches.
Usually, this is fast because most databases only have a few changes.
Others, though, would have thousands. This is where we'd run into performance issues, since we were doing a database query for every step as we traversed the tree. Even though each query was fast, the network time alone added up quickly.
select * from schema_snapshots where id = 20
select * from schema_snapshots where id = 21
select * from schema_snapshots where id = 22
select * from schema_snapshots where id = 23
select * from schema_snapshots where id = 24
-- thousands more queries

Solving the N+1 problem
This is a classic "N+1" performance problem. For each step in the process, we trigger another query.
Usually, fixing these in Rails is quite simple. There is an includes method for preloading the data we need. Unfortunately, in this case, due to the data structure, the normal preloading techniques do not work. We had to invent something new.
In-memory cache
First, we started by creating an in-memory cache for storing the snapshots. This is a temporary cache that only exists to find the merge base. In-memory is important here because our primary performance issue was caused by the sheer number of network requests.
class SnapshotMergeBaseCache
  attr_reader :store, :hits, :misses

  def initialize
    @store = {} # snapshot id => snapshot
    @misses = 0
    @hits = 0
  end

  # Return the cached snapshot, falling back to a database lookup on a miss.
  def get(id)
    if @store[id]
      @hits += 1
      return @store[id]
    end

    @misses += 1

    # Cache the freshly loaded snapshot so later lookups are hits.
    add(SchemaSnapshot.find(id))
  end

  def add(node)
    @store[node.id] = node
  end
end

This gives us a place to store all of our snapshots in memory. It uses a simple hash where the id is the key, and the value is the snapshot data.
The add method puts a snapshot into the cache. The get method is for retrieving it. The get method also keeps stats on the hits/misses. We used these stats to understand how well it worked once in production.
Preloading the cache
Now that we have the cache, the next step is bulk-loading it with snapshots. Conveniently, the snapshot history is pretty predictable when finding the merge base. We could preload the X most recent snapshots for each branch and drastically reduce the number of trips back to the database.
cache = SnapshotMergeBaseCache.new

from_branch.recent_snapshots.limit(FROM_PRELOAD_COUNT).each do |snap|
  cache.add(snap)
end
into_branch.recent_snapshots.limit(INTO_PRELOAD_COUNT).each do |snap|
  cache.add(snap)
end

Now, when running our breadth-first search, we can use cache.get(id) to find the next node. It hits the cache in most cases, avoiding the network request and solving our performance problem.
Rolling out & testing
Making changes like this can be tricky. There is often a wide gap between what you expect to happen and the reality of production.
First, we needed to ensure that it was accurate. We ran a few tests where we calculated the merge base using both the old and new methods for thousands of databases. This made us confident the new code was returning the correct results.
We then used feature flags to test rolling out the new code path and recorded data on how it performed. The hits and misses data proved useful for fine-tuning the number of snapshots we preloaded. After a couple of iterations, we released it to 100% of our customers.
Alternative solutions
Adding an in-memory cache is just one way of solving this problem. This worked out best for us due to the high number of snapshots we needed to traverse for some databases. It was also simple to layer this solution onto our existing code without many major changes. This reduced the risk when rolling it out.
Database recursive CTE
One option for solving this is letting the database do the work. This can be accomplished with a recursive common table expression. With this, the database could follow the pointer to each record until it finds the common merge base.
Materialized path
The materialized path technique is a way to represent a graph in a SQL database. It stores the relationship history in a single column, such as 20/19/15/10/5/3/1. By doing this, you can then look at the history of two nodes and find their common parent.
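A minimal Ruby sketch of that lookup, assuming each path lists a snapshot's ancestry from newest to oldest and that every snapshot has a single parent (the `common_ancestor` helper is hypothetical, not part of our codebase):

```ruby
# Return the most recent id that appears in both materialized paths, or nil.
def common_ancestor(path_a, path_b)
  # Array#& keeps the order of the first array, so the first shared id is the
  # newest common ancestor as seen from path_a.
  shared = path_a.split("/") & path_b.split("/")
  shared.first&.to_i
end

puts common_ancestor("20/19/15/10/5/3/1", "18/17/10/5/3/1") # 10
```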
This is a great option that works well for tree structures with a known limit. In our case, storing thousands of relationships didn't make this feasible.]]></content>
        <summary><![CDATA[Learn how to solve a tree traversal N+1 query problem in your Rails application.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing PlanetScale Scaler Pro</title>
        <link href="https://planetscale.com/blog/announcing-scaler-pro" />
        <id>https://planetscale.com/blog/announcing-scaler-pro</id>
        <published>2023-07-06T09:00:00.000Z</published>
        <updated>2023-07-06T09:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[Note: The Scaler Pro plan has since been renamed to the Base plan.
Today, we're announcing a new generation of PlanetScale plans and pricing. Most prominently: we're replacing our Teams plan with a new "Scaler Pro" offering that combines the best of our current plans and enterprise offerings for companies of all sizes. These plans allow customers to select exactly the resources they need for their workloads.
As we've onboarded thousands of customers, large and small, one of the most common pieces of feedback we've received is that customers are unsure how to map our pricing model onto their business. Reads and writes are easy to understand conceptually, but they don't map back to capacity planning that businesses are accustomed to. Many customers want more control than the traditional serverless model provides and clarity on how it compares to products like Amazon RDS and Google Cloud SQL.
The serverless pricing model is an interesting framework for databases, but many real-world workloads need to be delineated in more concrete metrics with as little variability as possible. When predictability, stability, and availability matter most, worrying about unbounded and unknown costs is a distraction nobody can afford.
Scaler Pro gives you the controls needed to understand how to grow and scale. Our new plans come with the ability to scale up and down on demand, all with cutting-edge resiliency and availability features. With Scaler Pro, you get a competitive price compared to established Database-as-a-Service (DBaaS) solutions while gaining significantly more features.
This is something we haven't done lightly. We pride ourselves on the innovations we've made in our serverless database pricing model and still aim to be the best database for serverless.
Scaler Pro pricing
Scaler Pro databases are priced on a combination of resources (CPU and memory) and disk storage. Every database has a 'cluster size' encompassing the components that make up a PlanetScale database. Much like our existing plans, we charge only for the storage allocated to your tables, not for binary logs or other metadata.
Refer to our Plans documentation for a full list of Scaler Pro configuration options and pricing.
Now let's dive into what goes into a PlanetScale Scaler Pro database, how it stacks up next to common alternatives that you might already be using, and what you're buying when you provision one.
What is in a PlanetScale database?
When you think of your PlanetScale database, what do you imagine? For some, a 'serverless database' means something mystical; others are probably picturing a complex web of virtual machines, hard drives, network switches, and all of the pieces in between.
Let's look at a Scaler Pro database in the PlanetScale product and break it down a bit:

In this diagram, the most important components are the primary and replicas. Each of these is an instance of MySQL and the Vitess components around it to handle orchestrating failovers, backups, etc.
Why is this important? It's where the data is! Underneath it all, PlanetScale uses standard MySQL replication to replicate your data, which exists in multiple availability zones. By default, all Scaler Pro production branches are:
Replicated across three availability zones in GCP or AWS
Configured to use MySQL's semi-sync to ensure that writes are persisted across at least two availability zones before being acknowledged to the client
Monitored continuously with Orchestrator, which handles planned and unplanned failovers
For Scaler Pro databases, this means that any piece of data you write will exist in at least two redundant block storage volumes that span failure domains, ensuring that the data will be there when you need it. With the Vitess-native Orchestrator looking over your database, the loss of a virtual machine or other problem will be swiftly remediated without a human involved.
Beyond that, PlanetScale runs even more infrastructure behind the scenes: backups are taken by spinning up MySQL instances on-demand that validate existing backups and create new ones, a bespoke pipeline to power PlanetScale Insights, and much more.
Amazon RDS vs. Scaler Pro and Availability Zones
In order to create logical and geographic redundancy, cloud providers split up their regions into multiple isolated datacenters that they call availability zones (AZs). In us-east-1, there are six of them: us-east-1a through us-east-1f. Each of these has isolated power and networking, and can be leveraged to create more resilient applications and infrastructure by expecting the failure of one or more of them in your code.
It may come as no surprise, but Amazon is very familiar with the tradeoffs necessary to run MySQL databases. To better understand what Scaler Pro is offering, we can compare it to the different tiers of RDS databases:
Single-AZ
Multi-AZ with one standby
Multi-AZ with two readable standbys
We can quickly eliminate Single-AZ from consideration for production workloads: it offers no failover capabilities and no redundancy against the degradation of an availability zone, and its resiliency against data loss is limited to backups. A Single-AZ database might be great for small or staging workloads, but not much more. This would be the equivalent of a development branch on PlanetScale.
Looking at Multi-AZ, things get more interesting. This option includes a "standby" instance to which all writes are synchronously replicated, ensuring that if the primary availability zone were to vanish, all of the acknowledged queries would be contained in the second availability zone, just a failover away. This is great for resiliency but not availability. Failover times are a minute or longer, and no read replicas are available for scaling database traffic.
Finally, we land on Multi-AZ with two readable standbys. This option includes a primary and two replica databases and ensures that any writes to the primary have been acknowledged by at least one replica.
If you've been reading along, this will sound familiar — a PlanetScale Scaler Pro database has the same semantics! The biggest benefits of this option are that you gain two replica instances that you can use to scale your reads, and much faster failovers due to the live databases that are running and ready to take traffic.
Because there's such a direct comparison, it's now much easier to understand how much switching to PlanetScale could save you. Let's take a look at an apples-to-apples comparison:

As you can see, PlanetScale offers all of the benefits of a Multi-AZ with two readable standby databases, at the price of a Multi-AZ with one standby database. This is before we add PlanetScale's industry-leading Insights and non-blocking schema changes.
Wrapping up
Whether you're a current PlanetScale customer or are considering switching from RDS or Cloud SQL, as many have already done, we hope the Scaler Pro pricing scheme works for you. It's as easy as clicking a button to migrate an existing Developer or Scaler database today. For more in-depth coverage of the Scaler Pro plan, check out the plans article in our docs.
If you want to give it a try with a new database, plug your RDS database into our Imports tool today and give it a whirl — we don't think you'll be disappointed!
If your RDS database requires more scale than any of the sizes on our current price sheet, reach out to our sales team today — we can accommodate workloads that require terabytes of memory and petabytes of storage.]]></content>
        <summary><![CDATA[See how Scaler Pro combines the best of our current plans for companies of all sizes while enabling you to grow with the best database for serverless.]]></summary>
      </entry>
    
      <entry>
        <title>Sharding vs. partitioning: What’s the difference?</title>
        <link href="https://planetscale.com/blog/sharding-vs-partitioning-whats-the-difference" />
        <id>https://planetscale.com/blog/sharding-vs-partitioning-whats-the-difference</id>
        <published>2023-06-30T17:10:05.128Z</published>
        <updated>2023-06-30T17:10:05.128Z</updated>
        
        <author>
          <name>PlanetScale</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Sharding and partitioning are techniques to divide and scale large databases. Sharding distributes data across multiple servers, while partitioning splits tables within one server.
Database sharding and partitioning
Partitioning and sharding are two common ways to improve performance, manageability, and availability of larger databases. You need to understand the differences between the two solutions in order to determine the most appropriate approach for your database architecture.
What is sharding?
Sharding, also known as horizontal partitioning, is an approach that divides a database's data into smaller parts and distributes them across multiple instances or servers, where each part is faster and easier to manage. When a database is sharded, the schema is replicated on each instance, and the data is divided among the instances based on a "shard key". The shard key is the logic or identifier that determines which specific instance or server holds the data for a given query.
How does sharding work?
To gain a deeper understanding of how sharding operates, and how you can use it, let's consider a scenario to illustrate its effectiveness. Imagine a social media platform with millions of users worldwide. In this case, you can implement sharding based on geographical regions. For example, users from North America would have their data stored in instance 1, while users from Europe would be allocated to instance 2, and so forth.
These code snippets demonstrate a basic implementation of database sharding based on geographical regions in MySQL. The users table holds user data, while the user_regions table maps each region to a specific database instance. This allows for distributing user data across multiple database instances based on their respective regions.
-- Create the shard mapping table for users

-- Create the shard mapping table for user regions

CREATE TABLE user_regions (
  region VARCHAR(255) NOT NULL,
  instance_id INT NOT NULL,
  PRIMARY KEY (region, instance_id)
);

INSERT INTO user_regions (region, instance_id) VALUES ('North America', 1), ('Europe', 2), ('Asia', 3);

Here is a query that retrieves the user record for the username "johndoe" from the database instance responsible for storing users in the North America region. It relies on database sharding, which splits the user tables across multiple instances. The function assumes the users table has a region column.
-- Create a function to get the instance ID based on the username
DELIMITER $$

CREATE FUNCTION get_user_instance_id(p_username VARCHAR(255)) RETURNS INT
READS SQL DATA
BEGIN
  -- Prefixed names keep the parameter and variable from clashing with column names
  DECLARE v_region VARCHAR(255);

  SELECT region INTO v_region FROM users WHERE username = p_username;

  RETURN (SELECT instance_id FROM user_regions WHERE region = v_region);
END $$

DELIMITER ;

SELECT * FROM users WHERE username = 'johndoe';

By geographically dividing the data, sharding allows for localized access and efficient management of user information. This approach proves particularly beneficial when it comes to optimizing performance for user interactions. Imagine a user in Europe trying to retrieve their profile information. Instead of traversing through the entire database, the system can use a shard key to quickly pinpoint the specific shard (instance) where the data is located, leading to faster response times and a better user experience.
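The lookup described above can be sketched in application code. This is only an illustration: the USER_REGIONS mapping mirrors the user_regions table from the earlier example, while the instance_for_region and route_query helpers are hypothetical names, not part of any PlanetScale or Vitess API.

```python
# Mirrors the rows inserted into the user_regions table above.
USER_REGIONS = {"North America": 1, "Europe": 2, "Asia": 3}


def instance_for_region(region):
    """Use the region as the shard key to find the instance that holds the data."""
    try:
        return USER_REGIONS[region]
    except KeyError:
        raise ValueError(f"no shard configured for region {region!r}") from None


def route_query(region, username):
    """Return the target instance plus a parameterized query to run on it."""
    instance_id = instance_for_region(region)
    sql = "SELECT * FROM users WHERE username = %s"
    return instance_id, sql, (username,)


instance_id, sql, params = route_query("Europe", "janedoe")
print(instance_id)  # prints 2
```

Only one instance is consulted per request, which is where the faster response time comes from.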
However, it's important to consider certain factors to ensure fair distribution of data across instances. The varying user populations across different regions should be taken into account. For instance, North America may have a significantly larger user base compared to other regions. To address this, a more intelligent sharding strategy can be used, such as a combination of geographical and demographic factors. This way, the distribution of data can be more balanced and reflective of the user distribution across the regions.
What are the advantages of using sharding?
Some benefits of sharding include:
Improved response time
Maintenance tasks, like backups, take less time to complete
Schema migrations complete faster
Increased read/write throughput
Increased storage capacity
Improved availability
Outages are more isolated and less impactful
To learn more about the benefits of sharding, check out our Benefits of sharding blog post.
What are the disadvantages of using sharding?
Sharding brings the complexity of managing tables that are distributed across multiple servers. It can be difficult to manage database queries. As the data grows, merging shards can become more complicated to handle. Using the wrong sharding architecture can slow down performance. Be sure to choose a sharding technique that allows a balanced data distribution across all shards.
If you’re looking for an elegant, safe, and easy-to-implement database sharding solution, PlanetScale has you covered. PlanetScale supports Vitess, the technology that helped scale YouTube and continues to scale massive companies like Etsy and GitHub. PlanetScale Vitess clusters offer nearly-infinite scale through horizontal sharding.
For a more in-depth look at database sharding, check out our “How does database sharding work?” blog post.
What is partitioning?
Partitioning is just a general term referring to the process of dividing tables in a database instance into smaller sub-tables or partitions. These partitions can be accessed and managed separately to enhance performance, maintainability, and availability of the database.
The remainder of this article will cover partitioning.
When to use partitioning on a database
Querying a record in a database that has millions of records can be costly. Database partitioning can help reduce query response time and resource usage. Here is an example of how you can use database partitioning to make a query faster.
-- Create the 'users' table with partitioning
CREATE TABLE users (
  id INT NOT NULL AUTO_INCREMENT, -- Unique ID for each user
  username VARCHAR(255) NOT NULL, -- User's username
  email VARCHAR(255) NOT NULL, -- User's email address
  password VARCHAR(255) NOT NULL, -- User's password
  PRIMARY KEY (id)
)

The clause that follows completes the CREATE TABLE statement and partitions the table based on the id column. The PARTITION BY RANGE clause divides a table into multiple partitions. In this case, three partitions are defined: p_0, p_1, and p_2.
PARTITION BY RANGE (id) (
  PARTITION p_0 VALUES LESS THAN (150000),
  PARTITION p_1 VALUES LESS THAN (250000),
  PARTITION p_2 VALUES LESS THAN (MAXVALUE)
);

Insert user data into the users table:
INSERT INTO users (username, email, password) VALUES
  ('johndoe', 'john.doe@example.com', 'password123'),
  ('janedoe', 'jane.doe@example.com', 'password456');

Now that the partitions are created, you can retrieve user data from the appropriate partition based on the ID range.
-- Example query to retrieve users with IDs less than 150,000
SELECT * FROM users PARTITION (p_0);

-- Example query to retrieve users with IDs from 150,000 up to, but not including, 250,000
SELECT * FROM users PARTITION (p_1);

-- Example query to retrieve users with IDs of 250,000 or greater
SELECT * FROM users PARTITION (p_2);
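
To make the range boundaries concrete, here is a small sketch that mimics how VALUES LESS THAN assigns an id to a partition. It only models the rule for the three partitions defined above; the partition_for helper is a hypothetical name, and MySQL does this work internally.

```python
from bisect import bisect_right

# Upper bounds from the PARTITION BY RANGE clause; p_2 uses MAXVALUE.
UPPER_BOUNDS = [150000, 250000]
PARTITIONS = ["p_0", "p_1", "p_2"]


def partition_for(user_id):
    """Return the partition whose range contains user_id."""
    # VALUES LESS THAN is exclusive, so an id equal to a bound
    # belongs to the next partition.
    return PARTITIONS[bisect_right(UPPER_BOUNDS, user_id)]


for user_id in (1, 149999, 150000, 250000):
    print(user_id, partition_for(user_id))
```

An id of 150000 lands in p_1 rather than p_0, which is why the boundary values matter when you pick a partition by hand.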

What are the advantages of database partitioning?
Partitioning a database allows you to distribute data across multiple physical or logical storage units called partitions. By dividing the data, you can improve query performance by reducing the amount of data that needs to be scanned or accessed. Database partitioning adds efficiency to querying the database. It also makes maintenance operations easier. When a database is partitioned, you can target a specific partition to query a record, rather than traversing through the entire dataset. For example, if you partition a database by date, queries that only need to access data from the last month can be executed much faster than if they had to access all the data in the database.
Dividing a database into parts can also help with access control. If you store confidential information in the database, you can restrict a certain group of users to the partitions that don't contain it.
What are the disadvantages of database partitioning?
One of the disadvantages of database partitioning is the complexity it brings. Partitioning can streamline certain maintenance tasks, but it can also complicate other aspects. Complexity can increase the chances for errors. For instance, the management of backups and recovery procedures can become more difficult to handle when you have multiple partitions. It can also lead to a false sense of security: If you're not careful, having multiple partitions could lead to a data loss disaster. Juggling partitions can lead to wasted space, and it may be unnecessary for the average user.
What are the differences between sharding and partitioning?
While sharding and partitioning share the common goal of dividing a large database into smaller pieces, they take different approaches to achieve it. When you shard a database, the data is distributed across multiple servers, so each server holds its own copy of the tables with a subset of the rows. Partitioning, on the other hand, splits tables within the same database instance. Sharding is a form of horizontal scaling: you can add machines to handle user traffic as it grows. Partitioning splits a table based on column value(s). Every partition retains all of the table's columns; only the rows differ from one partition to the next. It is also easier to manage data with partitioning, as all partitions are in one database instance.
Conclusion
In conclusion, both sharding and partitioning are powerful techniques that enable scaling and efficient data management in large databases. By understanding their differences and considering factors such as data distribution, performance optimization, and manageability, you can choose the most appropriate approach for your specific database architecture. Implementing sharding or partitioning can significantly enhance the performance and scalability of your database, allowing it to handle increasing user traffic and serve millions of requests effectively.]]></content>
        <summary><![CDATA[Sharding and partitioning are two common ways to improve performance, manageability, and availability of larger databases.]]></summary>
      </entry>
    
      <entry>
        <title>Introduction to PlanetScale</title>
        <link href="https://planetscale.com/blog/introduction-to-planetscale" />
        <id>https://planetscale.com/blog/introduction-to-planetscale</id>
        <published>2023-06-29T13:00:00.000Z</published>
        <updated>2023-06-29T13:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[This video is an introduction to the PlanetScale database platform and how it can increase developer velocity, including a demo of features, such as branching, deploy requests, and query insights.]]></summary>
      </entry>
    
      <entry>
        <title>How PlanetScale keeps your data safe</title>
        <link href="https://planetscale.com/blog/how-planetscale-keeps-your-data-safe" />
        <id>https://planetscale.com/blog/how-planetscale-keeps-your-data-safe</id>
        <published>2023-06-28T00:03:57.138Z</published>
        <updated>2023-06-28T00:03:57.138Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Keeping data safe and durable should be a top priority for any business that depends on databases to store and manage critical information. At PlanetScale we take data safety extremely seriously. In this post, we will walk you through our multi-layered approach to ensure your data is safe.
Vitess and MySQL
Whenever you create a database on PlanetScale you are actually creating a complete Vitess cluster. Vitess is an open-source database clustering system that enhances the scalability and manageability of MySQL.
Vitess is very widely adopted among the hyperscalers and is the primary datastore at companies like Slack, HubSpot, and Etsy. In the time it takes you to read this blog post, Vitess clusters will have served tens of millions of users and hundreds of millions of queries across hundreds of petabytes of data.
MySQL is well-known for its support of ACID (Atomicity, Consistency, Isolation, Durability) compliance, which ensures that data is reliably stored and retrieved in a consistent manner. The transactional nature of MySQL’s database engine ensures that transactions are serializable and predictable. This means that even if the database is interrupted by a system failure or network issue, transactions are either executed in full or not at all. ACID compliance ensures that the integrity of the data is maintained at all times, guaranteeing the reliability and durability of MySQL databases.
MySQL’s semi-synchronous replication further enhances data durability by ensuring that transactions are replicated to multiple servers. Semi-synchronous replication is a mode of replication in which the primary waits until at least one replica acknowledges receipt of a transaction before confirming the commit to the client. This ensures that if the primary node fails, the replica that has received the transaction is up to date and can be promoted as the new primary without data loss.
Finally, we mount the MySQL data volume on cloud block storage, such as Amazon Web Services (AWS) Elastic Block Store (EBS) and Google Cloud Persistent Disk (GCPD), which are designed to be highly durable and reliable. Both EBS and GCPD use data replication to ensure that data is stored redundantly across multiple drives, which helps to reduce the risk of data loss due to hardware failures or other issues.
In addition, both EBS and GCPD are designed to be self-healing, meaning they can detect and repair data inconsistencies automatically without user intervention. This makes it easier to ensure that data is always available and up-to-date, even in the face of hardware failures or other issues.
Safe migrations and Revert
PlanetScale allows you to enable safe migrations, which protects against potentially destructive actions such as accidentally dropping a column or table. Safe migrations forces all schema changes to go through a deploy request, which is auditable, rate limited, and, most importantly, revertable.
If you drop the wrong column or table, Revert allows you to instantly undeploy a schema change without any data loss. This turns multi-hour outages into a couple of seconds.
Backups and validation
All PlanetScale databases have a mandatory backup schedule included with every database plan at no additional cost. Backups are essential safeguards against application bugs that delete data and can go undetected for a long time.
To ensure our backups are valid, each new mandatory backup restores from a previous backup to validate that it was taken properly and ensure that there is always at least one healthy backup before your database’s binary logs are rotated out.
You can configure additional backup and retention policies to suit your needs.
PlanetScale security
A strong security posture is one of the most important requirements of a database platform. All PlanetScale databases are encrypted at rest and in transit. It is impossible to connect to a PlanetScale database without an SSL certificate and we ensure all credentials are generated by PlanetScale to guarantee they meet the strictest complexity requirements.
If you accidentally push a PlanetScale database credential into a public GitHub repository, it will be automatically invalidated within seconds to prevent unwanted data access.
Maturity
Finally, one of the most important reasons startups and enterprises choose PlanetScale is maturity. MySQL has been serving mission-critical applications at web scale for 28 years. Layering on Vitess, which has served some of the largest sites on the planet for over a decade, you know that every code path has been battle hardened.
Database storage engines take a long time to get right. If you are trusting a storage engine that has been around for less than a decade, you are taking extreme risk with your most important asset: your data.]]></content>
        <summary><![CDATA[A detailed description of the multi-layered approach PlanetScale takes to ensure your data is safe.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 17</title>
        <link href="https://planetscale.com/blog/announcing-vitess-17" />
        <id>https://planetscale.com/blog/announcing-vitess-17</id>
        <published>2023-06-27T09:01:00.000Z</published>
        <updated>2023-06-27T09:01:00.000Z</updated>
        
        <author>
          <name>Matt Lord</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[GA announcements
The VTTablet settings connection pool feature, introduced in v15, is now enabled by default in this release. This feature simplifies the management and configuration of system settings, providing users with a more streamlined and convenient experience.
The new Topology Service-based Tablet Throttler (also known as lag throttler) is now GA and enabled by default.
MySQL compatibility improvements
Vitess now supports additional statements such as Prepare, Execute, and Deallocate along with many additional functions including comparison operators, numeric functions, date and time functions, JSON functions, and more.
The query planner has undergone several improvements resulting in more efficient query plans, especially for complex operations such as aggregation, grouping, and ordering – leading to improved query performance. The evaluation engine used when executing queries has also been significantly improved – showing over a 2× performance improvement. We also added a new virtual machine-based engine which will eventually replace the AST-based one and offer even greater performance improvements (not enabled by default in v17).
Schema tracking has also been enhanced in this release, enabling the Vitess query planner to quickly detect any changes in the database schema. This ensures that queries remain up-to-date with the latest schema modifications, improving overall data consistency.
Replication enhancements
Vitess now supports much more efficient MySQL replication within each replica set that corresponds to a Vitess shard.
We have added support for the noblob binlog_row_image type. If you are using TEXT, BLOB, or JSON columns, this can drastically reduce the overall size of your binary logs, reducing disk I/O and storage along with network I/O and related CPU overhead. Unlike the default (image type full), where each row change event contains the full BEFORE and AFTER images for all columns, with noblob these large columns are only included in the event if they are modified.
We have also added support for the binary log transaction compression introduced in MySQL 8.0. Zstandard is used to compress the contents of each transaction before the compressed events are stored in the binary log. This greatly reduces disk I/O and storage along with network I/O, at the cost of some extra CPU cycles when reading and writing the log.
These features can also be combined for even greater efficiency gains. Aside from the reduced hardware/service costs around disk, network, and CPU resources, these new features make it practical to retain binary logs for a longer period of time. This can aid in backups, restores, and disaster recovery-related operations.
Usability enhancements
Traffic throttling improvements
The transaction throttler can now throttle DMLs even in autocommit mode. Previously it only throttled on explicit BEGIN statements.
The transaction throttler has a new --tx-throttler-tablet-types flag to control the types of tablets influencing the throttler.
VTOrc improvements
VTOrc has had many bug fixes and is now able to handle dead primary recoveries much faster than before.
VTAdmin improvements
We migrated vtadmin-web from create-react-app to Vite, which allows us to easily keep dependencies up to date and vulnerability-free.
Other improvements
You can find the full set of fixes and improvements in the release notes: https://github.com/vitessio/vitess/releases/tag/v17.0.0.
Try it out
We are very pleased with the great strides we have made with v17 and hope that you will be as well. We encourage all current users of Vitess and everyone who has been considering it to try this new release! We also look forward to your feedback, which can be provided via Vitess GitHub issues or the Vitess Slack.]]></content>
        <summary><![CDATA[In this release of Vitess, several significant enhancements have been introduced to improve the compatibility, performance, and usability of the system.]]></summary>
      </entry>
    
      <entry>
        <title>Action on your product data in real time</title>
        <link href="https://planetscale.com/blog/action-on-your-product-data-in-real-time" />
        <id>https://planetscale.com/blog/action-on-your-product-data-in-real-time</id>
        <published>2023-06-22T13:00:00.000Z</published>
        <updated>2023-06-22T13:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[PlanetScale and Hightouch cover how you can make use of one of your most valuable company assets: product data.]]></summary>
      </entry>
    
      <entry>
        <title>Datetimes versus timestamps in MySQL</title>
        <link href="https://planetscale.com/blog/datetimes-vs-timestamps-in-mysql" />
        <id>https://planetscale.com/blog/datetimes-vs-timestamps-in-mysql</id>
        <published>2023-06-22T00:03:57.138Z</published>
        <updated>2023-06-22T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[There are several different ways to store dates and times in MySQL, and knowing which one to use requires understanding what you'll be storing and how MySQL handles each type.
There are five column types that you can use to store temporal data in MySQL. They are:
DATE
DATETIME
TIMESTAMP
YEAR
TIME
Each column type stores slightly different data, has different minimum and maximum values, and requires different amounts of storage.
In the table below, you'll see each column type and its attributes.

| Column    | Data        | Bytes | Min                 | Max                 |
|-----------|-------------|-------|---------------------|---------------------|
| DATE      | Date only   | 3     | 1000-01-01          | 9999-12-31          |
| DATETIME  | Date + time | 8     | 1000-01-01 00:00:00 | 9999-12-31 23:59:59 |
| TIMESTAMP | Date + time | 4     | 1970-01-01 00:00:01 | 2038-01-19 03:14:07 |
| YEAR      | Year only   | 1     | 1901                | 2155                |
| TIME      | Time only   | 3     | -838:59:59          | 838:59:59           |

Dates, years, and times
Based on this table, the first question you must ask yourself is: "What kind of data do I need to store?"
The DATE column type
If you are storing a date only, with no time information, the choice is easy! You would use the DATE column. This column is helpful for things like birthdays, anniversaries, employee start dates, etc. The DATE column is a no-fuss affair: it stores dates.
It can store a massive range, from the beginning of the year 1000 to the end of the year 9999, in a minimal footprint, only 3 bytes.
The YEAR column type
Likewise, if you're storing a year with no date information, there is a unique column just for that: the YEAR column. This column is not widely used, but it serves its purpose exceedingly well when you need it. The YEAR column is compact at only 1 byte, but the range is not very large, from 1901 to 2155. If you need a wider range, you'd probably migrate to a signed or unsigned small integer, depending on your needs.
The TIME column type
Finally, rounding out the single-purpose column types is the TIME column. The TIME column stores time only, in the hhh:mm:ss format. The legal range of this column is much wider than 00:00:00 to 23:59:59 because you can use it to store intervals of time as well as wall-clock time. The valid range spans from -838:59:59 to 838:59:59, approximately 35 days in either direction.
If you are storing a time or duration irrespective of date, the TIME column is made for that. If you are storing a date and time together, you should use either the DATETIME or TIMESTAMP columns to keep it as one logical unit instead of splitting it into two columns.
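The "approximately 35 days" figure for the TIME range is simple arithmetic. As a quick sanity check, the sketch below converts the maximum TIME value to seconds and then to days. The helper name timeToSeconds is made up for illustration and only handles non-negative hhh:mm:ss strings.

```javascript
// Convert an 'hhh:mm:ss' string to a number of seconds.
// Hypothetical helper for illustration; not a MySQL or driver API.
function timeToSeconds(time) {
  const [hours, minutes, seconds] = time.split(':').map(Number);
  return hours * 3600 + minutes * 60 + seconds;
}

const maxTimeSeconds = timeToSeconds('838:59:59');
const maxTimeDays = maxTimeSeconds / 86400; // 86,400 seconds per day

console.log(maxTimeSeconds);         // 3020399
console.log(maxTimeDays.toFixed(2)); // 34.96
```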
Legal ranges of TIMESTAMP and DATETIME in MySQL
DATETIMEs and TIMESTAMPs both serve a similar purpose: storing a reference to a specific point in time. If you were tasked with storing a reference to February 14th, 2029, at 8:47am, you could use either type to satisfy that requirement!
Again, the question of whether to use a DATETIME or TIMESTAMP in MySQL depends on what you are trying to do. Neither is inherently bad, and neither is inherently good.
The first difference is that the storage size for a TIMESTAMP is 4 bytes, which is half the size of a DATETIME at 8 bytes. Storage space is cheap, but when designing table schemas, we always want to choose the smallest possible column type that fits the full range of our data. When creating your tables, there is no need to be generous!
Because the storage size of the TIMESTAMP is much smaller than the DATETIME, the range of legal values is likewise much smaller.
The 2038 problem in MySQL
A TIMESTAMP ranges from 1970-01-01 00:00:01 to 2038-01-19 03:14:07 (UTC). The year 2038 might sound familiar to you! This is one incarnation of the famous Year 2038 Problem. You may have also seen it written as Y2038, Y2K38, or even The Epochalypse, which deserves credit for its clever wordplay.
The 2038 issue arises in systems that calculate Unix time, defined as the seconds that have passed since the Unix epoch at 00:00:00 UTC on January 1, 1970. These systems use a signed 32-bit integer to store this time, which can only hold integer values between -2^31 and 2^31 - 1. The maximum timestamp that can be accurately represented corresponds to 2^31 - 1 seconds after the epoch, which is 03:14:07 UTC on January 19, 2038.
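To make the limit concrete, here is a small sketch that converts the largest signed 32-bit value to a date and shows what happens one second later. It uses JavaScript's Date purely as a calculator; the function name unixSecondsToIso is an illustrative choice, and the `| 0` trick is just a convenient way to mimic 32-bit overflow.

```javascript
// Largest value a signed 32-bit integer can hold.
const MAX_INT32 = 2 ** 31 - 1; // 2147483647

// JavaScript Dates count milliseconds since the Unix epoch,
// so seconds are multiplied by 1000 before conversion.
function unixSecondsToIso(seconds) {
  return new Date(seconds * 1000).toISOString();
}

// `| 0` truncates to a signed 32-bit integer, mimicking the overflow.
const wrapped = (MAX_INT32 + 1) | 0; // -2147483648

console.log(unixSecondsToIso(MAX_INT32)); // 2038-01-19T03:14:07.000Z
console.log(unixSecondsToIso(wrapped));   // 1901-12-13T20:45:52.000Z
```

One second past the maximum, a 32-bit counter wraps around to a date in 1901, which is why the problem cannot simply be ignored.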
For our purposes, this immediately eliminates TIMESTAMP as a contender for storing dates in the far future. If there is even a reasonable chance that you might need to store dates beyond 2038, it's best to use a DATETIME.
For now, there are instances where a TIMESTAMP is perfectly adequate! One such scenario is when you are recording the current time when an action takes place. Because we're still more than a decade away from the year 2038, it's a perfectly reasonable choice to use TIMESTAMP for columns like:
updated_at
created_at
deleted_at
archived_at
posted_at
These columns are usually populated by your application. As long as these columns record the current time at which an event happened, you don't have to worry about the 2038 problem for a long time. And surely, by then, we'll have a solution for it. (Surely, right?)
When to use DATETIME
Given the relatively narrow range of legal values for a TIMESTAMP column, you should consider a DATETIME for any datetime where you might need to populate a date outside the 1970 to 2038 range. If the field is open to user input, you'll likely want to use a DATETIME or validate that the input falls within the 1970-2038 window for a TIMESTAMP.
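If you do keep a TIMESTAMP behind user input, the validation mentioned above could look something like the following sketch. It assumes the input has already been parsed into a JavaScript Date, and the helper name fitsInTimestamp is hypothetical.

```javascript
// Bounds of MySQL's TIMESTAMP type, expressed as UTC instants in milliseconds.
const TIMESTAMP_MIN = Date.UTC(1970, 0, 1, 0, 0, 1);   // 1970-01-01 00:00:01 UTC
const TIMESTAMP_MAX = Date.UTC(2038, 0, 19, 3, 14, 7); // 2038-01-19 03:14:07 UTC

// Hypothetical helper: true if the Date can be stored in a TIMESTAMP column.
function fitsInTimestamp(date) {
  const ms = date.getTime();
  return !Number.isNaN(ms) && ms >= TIMESTAMP_MIN && ms <= TIMESTAMP_MAX;
}

console.log(fitsInTimestamp(new Date('2029-02-14T08:47:00Z'))); // true
console.log(fitsInTimestamp(new Date('2040-01-01T00:00:00Z'))); // false
console.log(fitsInTimestamp(new Date('1965-06-01T00:00:00Z'))); // false
```

Anything that fails the check is a candidate for a DATETIME column instead.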
Timezone treatment of DATETIME versus TIMESTAMP in MySQL
The storage size and the legal range are the most significant, most obvious differences between the DATETIME and TIMESTAMP columns, but there is one slightly more pernicious difference: timezones. Timezones are the bane of almost every developer's existence, and you're not free from them in MySQL either.
The DATETIME column does absolutely nothing concerning timezones. If you write a value of 2029-02-14 08:47 into the database, you will always and forever receive a value of 2029-02-14 08:47. No matter what your server or connection timezones are set to, you'll always get 2029-02-14 08:47 back.
TIMESTAMP, on the other hand, tries to "help you out" by always converting to and from UTC. From the documentation:
MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server's time.
Let's do a small example to prove this. We will create a table with DATETIME and TIMESTAMP columns and insert some data.

CREATE TABLE timezone_test (
    `timestamp` TIMESTAMP,
    `datetime` DATETIME
);

Now we'll explicitly set our connection timezone to UTC and insert 2029-02-14 08:47:

SET SESSION time_zone = '+00:00';

INSERT INTO timezone_test VALUES ('2029-02-14 08:47', '2029-02-14 08:47');

SELECT * FROM timezone_test;

-- | timestamp           | datetime            |
-- |---------------------|---------------------|
-- | 2029-02-14 08:47:00 | 2029-02-14 08:47:00 |

Looking good so far; the value we inserted is the value we got back! Let's change our session timezone to -05:00 now and see what happens:

SET SESSION time_zone = '-05:00';
SELECT * FROM timezone_test;

-- | timestamp           | datetime            |
-- |---------------------|---------------------|
-- | 2029-02-14 03:47:00 | 2029-02-14 08:47:00 |

You'll notice that the timestamp column has shifted five hours, while the datetime column remains the same. In practice, this shouldn't matter too much, as hopefully your server and connection timezones are always set to UTC, but if they aren't, you might be in for a big surprise when the values you put in the database are not the values you get back out.
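The five-hour shift in the example can be modeled outside the database. The sketch below is only an analogy: it treats the TIMESTAMP as a stored UTC instant and the DATETIME as a stored literal, and renderAtOffset is a made-up helper that formats an instant for a session at a given offset from UTC.

```javascript
// A TIMESTAMP stores an instant in UTC; a DATETIME stores the literal value.
const timestampInstant = Date.UTC(2029, 1, 14, 8, 47, 0); // months are 0-based, so 1 = February
const datetimeLiteral = '2029-02-14 08:47:00';

// Hypothetical helper: render a UTC instant as 'YYYY-MM-DD hh:mm:ss'
// for a session whose offset from UTC is `offsetMinutes`.
function renderAtOffset(epochMs, offsetMinutes) {
  const shifted = new Date(epochMs + offsetMinutes * 60000);
  return shifted.toISOString().slice(0, 19).replace('T', ' ');
}

console.log(renderAtOffset(timestampInstant, 0));    // 2029-02-14 08:47:00
console.log(renderAtOffset(timestampInstant, -300)); // 2029-02-14 03:47:00
console.log(datetimeLiteral);                        // 2029-02-14 08:47:00 (never shifts)
```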
Hopefully this helps demystify the differences between all of the columns you can use to store temporal data in MySQL! If you'd like to watch a video on the same topic, you can find that and much more in our free MySQL for Developers course.]]></content>
        <summary><![CDATA[Learn the differences between datetimes and timestamps in MySQL, the DATE, YEAR, and TIME columns, timezones, and when to use each.]]></summary>
      </entry>
    
      <entry>
        <title>Generated Hash Columns in MySQL</title>
        <link href="https://planetscale.com/blog/generated-hash-columns-in-mysql" />
        <id>https://planetscale.com/blog/generated-hash-columns-in-mysql</id>
        <published>2023-06-15T00:03:57.138Z</published>
        <updated>2023-06-15T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[One of the hidden gems in the MySQL documentation is this note in section 8.3.6:
As an alternative to a composite index, you can introduce a column that is “hashed” based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a “wide” index on many columns.
We will build on this idea by creating generated hash columns for indexed lookups on large values and enforcing uniqueness across many columns. Instead of creating huge composite indexes, we'll index the compact generated hashes for fast lookups.
Before diving into generated hash columns, let's look at generated columns in general.
Generated columns in MySQL
A generated column can be considered a calculated, computed, or derived column. It is a column whose value results from an expression rather than direct data input. The expression can contain literal values, built-in functions, or references to other columns. The result of the expression must be scalar and deterministic.
To create a generated column, you give MySQL an expression, and then the database is in charge of populating the column with the result of that expression. The value in the generated column is always up to date and can never fall out of sync.
When creating a generated column, you can make it a STORED or VIRTUAL column. When using a STORED generated column, the value of the expression is written to disk as if it were a regular column. A VIRTUAL generated column is calculated at runtime every time. Performance metrics of VIRTUAL or STORED are highly dependent on the situation, but a good rule of thumb is that if it's expensive to calculate the value, store it.
The syntax for creating a generated column is:

col_name data_type [GENERATED ALWAYS] AS (expr)
  [VIRTUAL | STORED] [NOT NULL | NULL]
  [UNIQUE [KEY]] [[PRIMARY] KEY]
  [COMMENT 'string']

Let's create a simple circles table that declares the diameter as a regular column and then uses generated columns to calculate the radius and area columns according to their mathematical formulas:

CREATE TABLE circles (
  diameter DOUBLE,
  radius DOUBLE AS (diameter / 2),        -- Generated radius
  area DOUBLE AS (PI() * POW(radius, 2))  -- Generated area
);

Both columns are VIRTUAL as that is the default.
Now if we insert 3 different diameters and then read the table back, we'll see that all columns are populated with values.

INSERT INTO circles (diameter) VALUES(1), (5), (10);

SELECT * FROM circles;

-- | diameter | radius | area               |
-- |----------|--------|--------------------|
-- |        1 |    0.5 | 0.7853981633974483 |
-- |        5 |    2.5 | 19.634954084936208 |
-- |       10 |      5 |  78.53981633974483 |

Now that we know how generated columns work, let's move on to making generated hash columns.
Generated hash columns for strict equality lookups
A generated hash column is nothing special. It's merely a generated column in which the value generated is a hash of something else. Usually, this is an MD5 hash, but it could be a CRC32, SHA256, or any other hash that suits your needs. Importantly, this technique has nothing to do with securely storing passwords or other sensitive information, which is an entirely separate topic. We're using hashing functions to create very small, deterministic results for speedy lookups, not to protect information.
The first scenario where you might want a generated hash column is when you need to do a strict equality lookup on a value that is too large for a B-tree index. Any B-tree index added to a BLOB or TEXT column requires a prefix length because these columns are too large to be indexed in their entirety. Again, from the MySQL documentation:
Rather than testing for equality against a very long text string, you can store a hash of the column value in a separate column, index that column, and test the hashed value in queries. (Use the MD5() or CRC32() function to produce the hash value.)
An example of a column that might need this is a url column. Since URLs can be incredibly long, you might store them in a TEXT column to adequately accommodate the full possible length. To perform a strict equality, indexed lookup on this column, we'll create a generated hash in an example table called visits.

CREATE TABLE visits (
  url TEXT,
  url_md5 CHAR(32) AS (MD5(url)),
  -- [other columns...]

  KEY(url_md5)
);

In this table, we've created the hash and put an index on that hash. If we were to look up the value for https://planetscale.com, we could take advantage of the index by looking up the hash instead of the URL:

SELECT * FROM visits WHERE url_md5 = MD5("https://planetscale.com");

-- | url                     | url_md5                          |
-- |-------------------------|----------------------------------|
-- | https://planetscale.com | fccb2478f30ecea53d4f2f294b5e891a |

Alternatively, you could calculate the hashed value on the application side instead of MySQL, resulting in the same outcome.

SELECT * FROM visits WHERE url_md5 = "fccb2478f30ecea53d4f2f294b5e891a";

-- | url                     | url_md5                          |
-- |-------------------------|----------------------------------|
-- | https://planetscale.com | fccb2478f30ecea53d4f2f294b5e891a |

I prefer to let MySQL do the hashing, making the queries more readable.
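If you do hash on the application side, a minimal Node.js sketch might look like this. It uses Node's built-in crypto module; the md5Hex helper is an illustrative name, and the sketch assumes the url column's character set encodes the value to the same bytes as UTF-8, otherwise the two hashes would differ.

```javascript
const { createHash } = require('node:crypto');

// Hex-encoded MD5 digest, in the same 32-character lowercase format as MySQL's MD5().
// Assumes the column's character set yields the same bytes as UTF-8 for this value.
function md5Hex(value) {
  return createHash('md5').update(value, 'utf8').digest('hex');
}

const urlMd5 = md5Hex('https://planetscale.com');

// Pass the hash as a bound parameter rather than concatenating it into the SQL string.
const sql = 'SELECT * FROM visits WHERE url_md5 = ?';
const params = [urlMd5];

console.log(urlMd5.length); // 32
console.log(sql, params);
```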
You may be worried about hash collisions using this strategy. While the odds of a randomly occurring MD5 collision are vanishingly small, they are not strictly zero. The odds are much higher if you use a different algorithm like CRC32.
To work around every possibility of a collision, no matter how unlikely, you can treat the MD5 hash as a redundant condition. A redundant condition is a condition that does not change the results of the query but does provide MySQL with the option to choose a different execution path, usually an index-assisted path.
Using our example from earlier, we'll add another condition on url alone to eliminate any potential hash collisions but still use the url_md5 index.

SELECT * FROM visits
  WHERE
  url_md5 = MD5("https://planetscale.com")  -- This allows MySQL to use the index
  AND
  url = "https://planetscale.com";          -- This eliminates any MD5 hash collisions

-- | url                     | url_md5                          |
-- |-------------------------|----------------------------------|
-- | https://planetscale.com | fccb2478f30ecea53d4f2f294b5e891a |
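
The same two-step idea, narrow by hash and then confirm against the full value, can be sketched outside the database. This is a toy analogy rather than how MySQL executes the query: the Map stands in for the url_md5 index, and insertVisit and findVisits are hypothetical names.

```javascript
const { createHash } = require('node:crypto');

const md5Hex = (value) => createHash('md5').update(value, 'utf8').digest('hex');

// Toy stand-in for the index on url_md5: hash -> rows sharing that hash.
const index = new Map();

function insertVisit(url) {
  const key = md5Hex(url);
  if (!index.has(key)) index.set(key, []);
  index.get(key).push({ url });
}

function findVisits(url) {
  // Step 1: narrow the search by hash (the "index" condition).
  const candidates = index.get(md5Hex(url)) ?? [];
  // Step 2: confirm against the full value (rules out hash collisions).
  return candidates.filter((row) => row.url === url);
}

insertVisit('https://planetscale.com');
insertVisit('https://planetscale.com/blog');

console.log(findVisits('https://planetscale.com').length); // 1
console.log(findVisits('https://example.com').length);     // 0
```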

We can make one more optimization with the current setup before we move on to hashing multiple columns.
Storing MD5 hashes in binary columns
Every character column in MySQL is defined with a character set and a collation. The character set determines what characters are allowed to go in the column, and the collation is a set of rules that determines how those characters compare to each other.
In our example above, we defined the url_md5 column as a CHAR(32) character column. We're not interested in the character aspect of the MD5 hash, i.e., we won't be sorting the hashes or doing partial string matches. We're only interested in strict equality checks against a sequence of bytes. Since that is true, storing the data in a character representation is unnecessary. Storing the value of the hash as a BINARY(16) is more efficient than storing it as a CHAR(32).
To store the values as a binary string of data, we need to use the UNHEX function to convert the characters into a sequence of bytes.

CREATE TABLE visits (
  url TEXT,
  url_md5 BINARY(16) AS (UNHEX(MD5(url))), -- Binary instead of character
  -- [other columns...]

  KEY(url_md5)
);

We need to carry this change through to the query itself as well, by using the UNHEX function there too:

SELECT * FROM visits WHERE url_md5 = UNHEX(MD5("https://planetscale.com"));

-- | url                     | url_md5                          |
-- |-------------------------|----------------------------------|
-- | https://planetscale.com | FCCB2478F30ECEA53D4F2F294B5E891A |

Even though url_md5 is now a binary column, we still see the results as a string of characters! If this is the case for you, your client is likely using the --binary-as-hex option to give you a more human-readable display. You can use the --skip-binary-as-hex option to see the raw bytes.
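The size difference between the two representations is easy to see in application code. In this Node.js sketch, the raw digest plays the role of UNHEX(MD5(url)) and the hex string plays the role of MD5(url); it illustrates the byte counts only and says nothing about how any particular driver sends binary values.

```javascript
const { createHash } = require('node:crypto');

const hash = createHash('md5').update('https://planetscale.com', 'utf8');
const raw = hash.digest();       // Buffer of raw bytes, like UNHEX(MD5(url))
const hex = raw.toString('hex'); // hex characters, like MD5(url)

console.log(raw.length); // 16
console.log(hex.length); // 32

// Buffer.from(hex, 'hex') goes the other way, the same job UNHEX() does in SQL.
console.log(Buffer.from(hex, 'hex').equals(raw)); // true
```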
We've reached the point of maximum performance for a single hash column. Let's look at the use cases for hashing multiple columns into one.
Generated hashes across multiple columns
The same basic principle applies to hashing multiple columns as hashing single columns: compact indexes can be faster than wide indexes. We can use the compact hash for strict equality searches or to enforce uniqueness across multiple columns at once.
Composite indexes in MySQL can consist of up to 16 columns, which is rarely necessary! Composite indexes have many advantages over the hashing strategy, primarily because each piece of the composite index can be used, provided you work left to right and don't skip any columns. (For an in-depth guide, see our video on composite indexes in MySQL.) However, a hash can be exactly what you're looking for in certain circumstances.
To create a composite hash, we first need to generate a concatenated string to hash. We'll do this using MySQL's CONCAT_WS function.
The WS in CONCAT_WS stands for "with separator," allowing us to define a separator to be used in between the column values. If we were to use the standard CONCAT function, we might end up with ambiguous concatenated values:

SELECT CONCAT("a", "b", "c"), CONCAT("ab", "c");

-- | CONCAT("a", "b", "c") | CONCAT("ab", "c") |
-- |-----------------------|-------------------|
-- | abc                   | abc               |

The CONCAT function also doesn't handle NULL values very elegantly, turning the result completely NULL.

SELECT CONCAT("a", "b", "c"), CONCAT("ab", NULL, "c");

-- | CONCAT("a", "b", "c") | CONCAT("ab", NULL, "c") |
-- |-----------------------|-------------------------|
-- | abc                   |                         |

The CONCAT_WS function solves both of these issues:

SELECT CONCAT_WS("|", "a", "b", "c"), CONCAT_WS("|", "ab", "c");

-- | CONCAT_WS("|", "a", "b", "c") | CONCAT_WS("|", "ab", "c") |
-- |-------------------------------|---------------------------|
-- | a|b|c                         | ab|c                      |

SELECT CONCAT_WS("|", "a", "b", "c"), CONCAT_WS("|", "ab", NULL, "c");

-- | CONCAT_WS("|", "a", "b", "c") | CONCAT_WS("|", "ab", NULL, "c") |
-- |-------------------------------|---------------------------------|
-- | a|b|c                         | ab|c                            |
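
For readers more at home in application code, the behavior of the two functions can be approximated in a few lines of JavaScript. These are rough stand-ins written for illustration, not part of any MySQL driver, and they cover only the cases shown in the queries above.

```javascript
// Rough JavaScript stand-ins for MySQL's CONCAT and CONCAT_WS (illustration only).
function concat(...values) {
  // CONCAT returns NULL if any argument is NULL.
  return values.includes(null) ? null : values.join('');
}

function concatWs(separator, ...values) {
  // CONCAT_WS skips NULL arguments and joins the rest with the separator.
  return values.filter((v) => v !== null).join(separator);
}

console.log(concat('a', 'b', 'c'), concat('ab', 'c')); // abc abc (ambiguous)
console.log(concat('ab', null, 'c'));                  // null
console.log(concatWs('|', 'a', 'b', 'c'));             // a|b|c
console.log(concatWs('|', 'ab', 'c'));                 // ab|c
console.log(concatWs('|', 'ab', null, 'c'));           // ab|c
```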

Using the CONCAT_WS function, we can now create a composite, generated hash. Let's make an addresses table that stores CASS standardized U.S. addresses.

CREATE TABLE `addresses` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `primary_line` varchar(255),
  `secondary_line` varchar(255),
  `urbanization` varchar(255),
  `last_line` varchar(255),

  PRIMARY KEY (`id`)
);

Each standardized address consists of the primary and secondary lines, the urbanization, and the last line. Here is a random sampling of what this table looks like:

| id | primary_line          | secondary_line    | urbanization      | last_line               |
|----|-----------------------|-------------------|-------------------|-------------------------|
|  1 | 457 HAZELWOOD CV      | KATHLEEN LAMARRE  |                   | COPPELL TX 78759-2042   |
|  2 | 5322 MUSKET RDG       | MARK E STEKLE     |                   | AUSTIN TX 75019-2042    |
|  3 | M543 PASEO DEL MONTE  |                   | URB MONTE CLARO   | BAYAMON PR 00961-5806   |
|  4 | PO BOX 990            | 3-D SIGNS         |                   | CYPRESS TX 77410-0852   |
|  5 | 949 CALLE PORTO MAYOR | STE D3            | URB PORTOBELLO    | TOA ALTA PR 00953-5407  |
|  6 | 621 CARR 382          |                   | PARC SOLEDAD      | MAYAGUEZ PR 00682-7603  |
|  7 | 1602 CIMARRON TRL     | TEN C ENTERPRISES |                   | HURST TX 76053-3921     |
|  8 | 45C CALLE REINA       |                   | PARC CARMEN       | VEGA ALTA PR 00951-5806 |
|  9 | 1910 ROSEMONT ST      | CABILLO DANIEL    |                   | MESQUITE TX 75149-1549  |
| 10 | 110 N FM 3083 RD      | ATTN PROPERTY TAX |                   | CONROE TX 00738-1866    |
| 11 | C2 CALLE B            |                   | VILLAS DE CAPARRA | BAYAMON PR 00959-7605   |
| 12 | 18307 STEDMAN DR      | 3330 TRADITION    |                   | DALLAS TX 75252-5745    |
| 13 | 291 CALLE ZIRCONIA    |                   | URB COSTA BRAVA   | ISABELA PR 00662-6307   |
| 14 | 7206 INDIAN DIVIDE RD | FAMILY TRUST      |                   | SPICEWOOD TX 79169-1646 |
| 15 | 4992 CALLE HIGUERILLO |                   | URB FAJARDO GDNS  | FAJARDO PR 75149-3088   |

To prevent a duplicate normalized address from being added to the table, we're going to create a generated column that holds a composite hash of all four address parts:

ALTER TABLE addresses ADD COLUMN address_hash BINARY(16) GENERATED ALWAYS AS (
  UNHEX(MD5(
    CONCAT_WS('|', primary_line, secondary_line, urbanization, last_line)
  ))
);
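
As a rough application-side analogue of that expression, the sketch below joins the address parts with a separator, hashes the result, and shows that identical addresses collapse to the same 16-byte value. The addressHash helper is hypothetical, and its output is not guaranteed to match MySQL's byte for byte, since that depends on the column character set and on whether blank parts are stored as NULL or as empty strings.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper mirroring UNHEX(MD5(CONCAT_WS('|', ...))).
// null parts are skipped, as CONCAT_WS does; empty strings are kept.
function addressHash(parts) {
  const joined = parts.filter((p) => p !== null).join('|');
  return createHash('md5').update(joined, 'utf8').digest(); // 16-byte Buffer
}

const a = addressHash(['PO BOX 990', '3-D SIGNS', null, 'CYPRESS TX 77410-0852']);
const b = addressHash(['PO BOX 990', '3-D SIGNS', null, 'CYPRESS TX 77410-0852']);
const c = addressHash(['PO BOX 990', null, null, 'CYPRESS TX 77410-0852']);

console.log(a.length);    // 16
console.log(a.equals(b)); // true: same address, same hash
console.log(a.equals(c)); // false: a different address hashes differently
```

One thing to keep in mind with this approach: because NULL parts are skipped, two rows that hold the same text in different columns (with NULL in the other) concatenate to the same string and therefore share a hash.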

Now we have a compact hash that represents the entirety of the address in one discrete value:

| id | primary_line          | secondary_line    | urbanization      | last_line               | address_hash                     |
|----|-----------------------|-------------------|-------------------|-------------------------|----------------------------------|
|  1 | 457 HAZELWOOD CV      | KATHLEEN LAMARRE  |                   | COPPELL TX 78759-2042   | 25EE1343804992671067D73BDD50B27E |
|  2 | 5322 MUSKET RDG       | MARK E STEKLE     |                   | AUSTIN TX 75019-2042    | 5492C5B33E3F769FD1CAA5EBB2FF56F8 |
|  3 | M543 PASEO DEL MONTE  |                   | URB MONTE CLARO   | BAYAMON PR 00961-5806   | EEE4D086755BEAB36171054F08BA0436 |
|  4 | PO BOX 990            | 3-D SIGNS         |                   | CYPRESS TX 77410-0852   | F19757F95985A0F6105AA4E781F46E14 |
|  5 | 949 CALLE PORTO MAYOR | STE D3            | URB PORTOBELLO    | TOA ALTA PR 00953-5407  | 2ADBA74C2D742420C60CAE46D42FFA7D |
|  6 | 621 CARR 382          |                   | PARC SOLEDAD      | MAYAGUEZ PR 00682-7603  | 99033D25A9C3D0109A7370CE0885EB77 |
|  7 | 1602 CIMARRON TRL     | TEN C ENTERPRISES |                   | HURST TX 76053-3921     | 153FE5967FB62138DE23F518B3600C4A |
|  8 | 45C CALLE REINA       |                   | PARC CARMEN       | VEGA ALTA PR 00951-5806 | CCF810BCD560CD0142D09EFCC2349D7F |
|  9 | 1910 ROSEMONT ST      | CABILLO DANIEL    |                   | MESQUITE TX 75149-1549  | 48BA2E351F5C58A1B976CEC55ABDB270 |
| 10 | 110 N FM 3083 RD      | ATTN PROPERTY TAX |                   | CONROE TX 00738-1866    | F98D0C9EE23D25D9B40EC4EC59DAC776 |
| 11 | C2 CALLE B            |                   | VILLAS DE CAPARRA | BAYAMON PR 00959-7605   | 8577AF223E10CFF124574D97F8C35BF9 |
| 12 | 18307 STEDMAN DR      | 3330 TRADITION    |                   | DALLAS TX 75252-5745    | B082CA1BE2B30C6B9D2101565C5F5DF2 |
| 13 | 291 CALLE ZIRCONIA    |                   | URB COSTA BRAVA   | ISABELA PR 00662-6307   | 17101B8E6420B8FC93682B53BDDBD74D |
| 14 | 7206 INDIAN DIVIDE RD | FAMILY TRUST      |                   | SPICEWOOD TX 79169-1646 | 103B79EE14A330F5D2D79BA2764F738D |
| 15 | 4992 CALLE HIGUERILLO |                   | URB FAJARDO GDNS  | FAJARDO PR 75149-3088   | 75102EA91197F96E20242EEE0264F2CA |

To prevent duplicates from entering the table, we'll add a unique index to the generated column:

ALTER TABLE addresses ADD UNIQUE INDEX (address_hash);

Again, if you're worried about hash collisions, you might use a different hashing algorithm like SHA256. Note that the column size must change to a BINARY(32).

ALTER TABLE addresses ADD COLUMN address_hash BINARY(32) GENERATED ALWAYS AS (
  UNHEX(SHA2(
    CONCAT_WS('|', primary_line, secondary_line, urbanization, last_line),
    256
  ))
);

Functional indexes in MySQL
All of the work we've done so far has been using generated columns to calculate the hashes for us. As of MySQL 8.0.13, we no longer have to use a generated column to create an index on the result of a function. We can use a functional index.
Functional indexes are implemented as virtual generated columns by MySQL, so there is no performance difference. It's merely a preference!
To create a functional index without a generated column, you can create an index as you usually would, but use a second set of parentheses to denote that it's functional:

ALTER TABLE addresses ADD INDEX address_hash_functional ((  -- Note the two parentheses here!
  UNHEX(SHA2(
    CONCAT_WS('|', primary_line, secondary_line, urbanization, last_line),
    256
  ))
));

Whether you use the generated column or functional index is up to you! Regardless of which method you choose, hopefully this strategy can be helpful when building your applications!]]></content>
        <summary><![CDATA[Creating generated hash columns in MySQL for faster strict equality lookups.]]></summary>
      </entry>
    
      <entry>
        <title>Using PlanetScale with Serverless Framework Node applications on AWS</title>
        <link href="https://planetscale.com/blog/using-planetscale-with-serverless-framework-node-apps-on-aws" />
        <id>https://planetscale.com/blog/using-planetscale-with-serverless-framework-node-apps-on-aws</id>
        <published>2023-06-13T17:03:57.138Z</published>
        <updated>2023-06-13T17:03:57.138Z</updated>
        
        <author>
          <name>Matthieu Napoli</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[The Serverless Framework is great for building Node applications on AWS Lambda. The only thing missing is a serverless database. In this article, we will explore how to use PlanetScale as the database for a serverless Node application.
Prerequisites
Before deploying to AWS Lambda, you will need:
an AWS account (to create one, go to aws.amazon.com and click Sign up),
the serverless CLI installed on your computer.
While the example in this article should stay within the AWS free tier (1 million free AWS Lambda invocations per month), be advised that building on AWS can incur costs.
You can install the serverless CLI using NPM:

npm install -g serverless

If you don't have NPM or want to learn more, read the Serverless documentation.
Now connect the serverless CLI to your AWS account via AWS access keys. Create AWS access keys by following this guide, then set them up on your computer using the following command:

serverless config credentials --provider aws --key <key> --secret <secret>

Creating a new serverless Node application
Serverless Framework is a CLI tool that helps us create and deploy serverless applications. Its configuration is stored in a serverless.yml file, which describes what will be deployed to AWS.
To deploy a Node application, we can create a simple serverless.yml file:

service: demo # name of the application

provider:
  name: aws
  runtime: nodejs18.x
  region: us-east-1

functions:
  api:
    handler: index.handler
    url: true

In the configuration above, we define a single AWS Lambda function called api, running Node.js 18, with a public URL.
Our API handler will be a handler() function exported by index.js (learn more about handlers in the AWS Lambda documentation). Let's create the index.js file:

export async function handler(event) {
  return {
    hello: 'world'
  }
}

Note that we will be using ESM features (like export and import), so let's create a package.json file with "type": "module":

{
  "type": "module"
}

Our simple Node example is ready to be deployed with serverless deploy, but let's add PlanetScale into the mix first!
Connecting to PlanetScale
In your PlanetScale account, start by creating a database in the same region as the AWS application (us-east-1 in our example). Then, click the Connect button and select "Connect with: @planetscale/database". That will let us retrieve the database username and password.
To connect to the database in our code, we will use the PlanetScale serverless driver. Let's install it with NPM:

npm install @planetscale/database

Now that the driver is installed, we can connect to our PlanetScale database with the connect() function:

import { connect } from '@planetscale/database'

const conn = connect({
  // With the serverless driver, the host is always 'aws.connect.psdb.cloud'
  host: 'aws.connect.psdb.cloud',
  username: '<user>',
  password: '<password>'
})

export async function handler(event) {
  const result = await conn.execute('SELECT * FROM records')

  return result.rows
}

Note the following details:
We are connecting to the database outside the handler() function. This is to reuse the same connection for all HTTP requests. If we were to connect inside the handler() function, a new connection would be created for each request, which would be inefficient.
We are querying the records table. This table doesn't exist yet; we will create it below.
We don't want to store the database credentials in the code. We will use environment variables instead.
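The first point, connecting outside the handler, is worth a closer look. The sketch below replaces the real driver with a hypothetical fakeConnect() stand-in so it can run without credentials; it only illustrates that module-scope code runs once per execution environment, while the handler body runs on every invocation.

```javascript
let connectionsCreated = 0;

// Hypothetical stand-in for a database client, so the sketch runs without credentials.
function fakeConnect() {
  connectionsCreated += 1;
  return { execute: async () => ({ rows: [] }) };
}

// Module scope: runs once when the Lambda execution environment starts.
const conn = fakeConnect();

// Handler scope: runs on every invocation and reuses `conn`.
async function handler(event) {
  const result = await conn.execute('SELECT 1');
  return result.rows;
}

// Simulate three invocations in the same (warm) environment.
Promise.all([handler({}), handler({}), handler({})]).then(() => {
  console.log(connectionsCreated); // 1
});
```

Had fakeConnect() been called inside handler(), the counter would have reached 3 instead.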
Let's update our code to use environment variables. For the sake of the example, we will also create the records table on the fly:

import { connect } from '@planetscale/database'

const conn = connect({
  // With the serverless driver, the host is always the same
  host: 'aws.connect.psdb.cloud',
  username: process.env.DATABASE_USERNAME,
  password: process.env.DATABASE_PASSWORD
})

// Create the table if it doesn't exist (just for demo purposes)
// In a real application, we would run database migrations outside the function
await conn.execute('CREATE TABLE IF NOT EXISTS records (id INT PRIMARY KEY auto_increment, name VARCHAR(255))')

export async function handler(event) {
  // Insert a new record
  const queryParameter = event.queryStringParameters?.name ?? 'test'
  await conn.execute('INSERT INTO records (name) VALUES (?)', [queryParameter])

  // Retrieve all records
  const result = await conn.execute('SELECT * FROM records')

  return result.rows
}
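
The expression that reads the name parameter relies on optional chaining and the nullish coalescing operator. As a small sketch of how it behaves for different event shapes, here it is wrapped in a hypothetical recordNameFromEvent() helper; the event objects are simplified examples, not complete Lambda payloads.

```javascript
// Hypothetical helper wrapping the expression used in the handler above.
function recordNameFromEvent(event) {
  // `?.` yields undefined when queryStringParameters is missing or null;
  // `??` then falls back to 'test' for undefined or null only.
  return event.queryStringParameters?.name ?? 'test';
}

console.log(recordNameFromEvent({}));                                           // test
console.log(recordNameFromEvent({ queryStringParameters: null }));              // test
console.log(recordNameFromEvent({ queryStringParameters: { name: 'hello' } })); // hello
console.log(recordNameFromEvent({ queryStringParameters: { name: '' } }));      // (empty string, not 'test')
```

Note the last case: an empty name is kept as-is, so add an explicit check if empty names should also fall back to the default.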

We now need to set the DATABASE_USERNAME and DATABASE_PASSWORD environment variables. We can define them in serverless.yml and use AWS SSM to store the database password securely:

provider:
  name: aws
  runtime: nodejs18.x
  region: us-east-1
  environment:
    DATABASE_USERNAME: <username-here>
    DATABASE_PASSWORD: ${ssm:/planetscale/db-password}

The database password will be stored in AWS SSM (at no extra cost) so it is not visible in the code. The ${ssm:/planetscale/db-password} variable will retrieve the value from SSM on deployment. The SSM parameter can be created with the AWS CLI via the following command:

aws ssm put-parameter --region us-east-1 --name '/planetscale/db-password' --type SecureString --value 'replace-me'

# replace the `replace-me` string with the database password!

If you don't use the AWS CLI, you can also create the parameter in the AWS Console (under Systems Manager, Parameter Store).

Our application is now ready! Let's deploy it:

serverless deploy

When finished, the deploy command will display the URL of our Node application. The URL should look like this: https://<id>.lambda-url.us-east-1.on.aws/. We can open it in the browser or request it with curl:

curl https://<id>.lambda-url.us-east-1.on.aws/

The response should list the records in the records table. A new record will be created every time the URL is requested. We can also provide a name parameter to change the name of the record inserted in the database (the URL is quoted so the shell does not interpret the ? character):

curl 'https://<id>.lambda-url.us-east-1.on.aws/?name=hello'

Stage parameters
Besides the incredible scalability provided by the combination of AWS Lambda and PlanetScale, another benefit we get from this setup is the ability to combine Serverless Framework stages and PlanetScale branches.
We could imagine, for example, a dev stage for development and a prod stage for production. The dev stage would use a development branch in PlanetScale, while the prod stage would use the main production branch.
Using stage parameters, we can use different credentials to connect to PlanetScale depending on the stage:
provider:
  name: aws
  runtime: nodejs18.x
  region: us-east-1
  environment:
    DATABASE_USERNAME: ${param:dbUser}
    DATABASE_PASSWORD: ${param:dbPassword}

params:
  dev:
    dbUser: <dev-username-here>
    dbPassword: ${ssm:/planetscale/dev/db-password}
  prod:
    dbUser: <prod-username-here>
    dbPassword: ${ssm:/planetscale/prod/db-password}

When deploying, we can specify the stage to deploy via the --stage option:
serverless deploy --stage dev

serverless deploy --stage prod

Each stage (dev and prod) will result in entirely separate infrastructures on AWS, and each one will use its own PlanetScale branch.
That setup makes it easy to test code changes and database schema changes in a development environment that is identical to and isolated from the production environment. Once approved, schema changes can be applied to the production branch with a PlanetScale deploy request, and code changes can be deployed to production via the serverless deploy command.
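To make the stage lookup concrete, here is a rough JavaScript model of what the params block expresses. It is not how the Serverless Framework is implemented. The params object mirrors the YAML above with placeholder usernames, and resolveStageParams is a hypothetical name:

```javascript
// Hypothetical in-memory copy of the `params` block from serverless.yml.
const params = {
  dev: { dbUser: 'dev-user', dbPassword: 'ssm:/planetscale/dev/db-password' },
  prod: { dbUser: 'prod-user', dbPassword: 'ssm:/planetscale/prod/db-password' },
};

// Pick the parameters for one stage and map them onto the environment
// variables that the function reads at runtime.
function resolveStageParams(allParams, stage) {
  if (!Object.prototype.hasOwnProperty.call(allParams, stage)) {
    throw new Error(`No parameters defined for stage "${stage}"`);
  }
  const { dbUser, dbPassword } = allParams[stage];
  return { DATABASE_USERNAME: dbUser, DATABASE_PASSWORD: dbPassword };
}
```

Each stage resolves to its own pair of credentials, which is what lets the dev stage connect to a development branch while prod connects to the production branch.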
Next steps
In this article, we learned how to integrate PlanetScale with Node applications built using the Serverless Framework on AWS. This gives us a completely serverless stack that is extremely scalable yet simple to set up and maintain.
Now that we have a basic application running we can explore more complex topics, such as:
Creating multiple HTTP routes
Setting up a complete deployment workflow for the Node application
Diving into the PlanetScale workflow for branching databases, non-blocking schema changes, and more
Feel free to explore the PlanetScale documentation as well as the Serverless Framework documentation to learn more.]]></content>
        <summary><![CDATA[Learn how to integrate PlanetScale with Node applications built using the Serverless Framework on AWS.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale joins AWS ISV Accelerate</title>
        <link href="https://planetscale.com/blog/planetscale-joins-aws-isv-accelerate" />
        <id>https://planetscale.com/blog/planetscale-joins-aws-isv-accelerate</id>
        <published>2023-06-12T09:00:00.000Z</published>
        <updated>2023-06-12T09:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[We have good news: PlanetScale has deepened its relationship with AWS to benefit customers who use AWS services, officially joining the AWS ISV Accelerate Program!
The Amazon Web Services (AWS) Independent Software Vendor (ISV) Accelerate Program is a co-sell program for AWS Partners that provides software solutions that run on or integrate with AWS. Joining AWS ISV Accelerate will allow PlanetScale to more easily connect with AWS field sellers globally. For PlanetScale customers, this means a more seamless integration, expanded access to AWS services, and increased reliability.
PlanetScale has been available in the AWS Marketplace since June 2022, and this is the next step in expanding our relationship with AWS.
PlanetScale Enterprise plans
You can purchase any of our Enterprise offerings in AWS Marketplace.
With PlanetScale Enterprise, you get all of the PlanetScale features — our branching workflow, unlimited connection pooling, sharding options, and more — deployed in an isolated AWS account.
With our Managed plan, you own the AWS account, and we manage the infrastructure for you through our shared control plane. Single-tenant is similar, but instead, we create and manage the AWS account for you. Your data is isolated from our regular shared-tenancy customers, but it runs on our own resources.
With both of these options, you get:
Private database connectivity via PrivateLink or VPC Peering
Isolated Amazon Elastic Compute Cloud (Amazon EC2) instances
The ability to deploy in any AWS Region with three Availability Zones
Robust sharding available
Resource-based pricing
SSO
Business support — upgrades available
You can learn more in our Products and plans documentation.
If you’d like more information on what this could mean for your business, contact us and we’ll be happy to assist you.]]></content>
        <summary><![CDATA[PlanetScale has deepened its relationship with AWS to benefit customers who use AWS services, officially joining the AWS ISV Accelerate Program!]]></summary>
      </entry>
    
      <entry>
        <title>Announcing the Hightouch integration</title>
        <link href="https://planetscale.com/blog/announcing-the-hightouch-integration" />
        <id>https://planetscale.com/blog/announcing-the-hightouch-integration</id>
        <published>2023-06-08T15:00:00.000Z</published>
        <updated>2023-06-08T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Hightouch offers an easy way to sync data between multiple platforms using simple connections that anyone can set up. We recently partnered with Hightouch to provide direct access to your PlanetScale account, which simplifies the process of syncing data to and from your database.
The power of Hightouch + PlanetScale
Connecting your PlanetScale database to a Hightouch workflow can be used in many different ways. Here are a few ideas to get you started:
Automatically update customer records in your CRM when they are created or changed in your database.
Send a notification to your Slack workspace when data is changed in your PlanetScale database.
Execute asynchronous operations by sending data from your PlanetScale database to a data queue like AWS SQS.
Automate paid ad targeting, suppression, and conversion uploads by syncing audiences from your database to ad platforms.
Automatically sync account and user health scores to your customer success platforms.
How to use the PlanetScale Hightouch integration
In Hightouch, you set up a sync by configuring a data source, a destination to send the data, and a sync method. You can use your PlanetScale database as either a source or destination for your syncs. Let's take a look at how to use PlanetScale in both scenarios.
PlanetScale as a source
Using a PlanetScale database as a source allows you to take the data from your database and sync it to all of your downstream tools. At the time of this writing, Hightouch natively supports over 200 destinations. In the following example, I’ll take a small table named hotels from my bookings_db database and sync it to a spreadsheet in my Google Drive account.
When configuring the connection to your PlanetScale database, you'll provide a set of credentials to establish a connection between Hightouch and your database. Then you can either use a SQL query that gathers the data to sync or specify a table to scan for changes.

Opting for "Table selector" lets you search for the specific table you want and preview the results so you know you're looking at the right data.

When connecting to Google Sheets, you can choose to create a snapshot of the data with each run or mirror the data to a specific document and sheet.

You have the ability to set automated recurring syncs based on a time interval or trigger, or run them manually via the Hightouch dashboard or API.

Once the sync runs, all of the data from the hotels table in my PlanetScale database is now mirrored in Google Sheets.

PlanetScale as a destination
You can also configure data to be sent to a PlanetScale database from Google Sheets. In the following example, I created a new sheet called "New Hotels" that I will send to my bookings_db database.

Hightouch allows you to map fields between different systems, so you aren't required to make them match. If you do have matching field names, Hightouch can automatically map the fields for you.

Once this sync runs, I can validate that the data from Google Sheets was synced to my database by running a simple SELECT query on the hotels table:

Try it out
For more information on how to use PlanetScale and Hightouch, check out the Hightouch article on our documentation portal or the PlanetScale article in the Hightouch docs. You can get started with Hightouch today, as it takes users an average of only 23 minutes to start their first sync.
Do you have any interesting ways you could use Hightouch with your PlanetScale database? Share them on Twitter and tag @PlanetScale and @HightouchData!]]></content>
        <summary><![CDATA[Learn how to sync data between PlanetScale and other platforms using Hightouch.]]></summary>
      </entry>
    
      <entry>
        <title>Using redundant conditions to unlock indexes in MySQL</title>
        <link href="https://planetscale.com/blog/redundant-and-approximate-conditions" />
        <id>https://planetscale.com/blog/redundant-and-approximate-conditions</id>
        <published>2023-06-07T00:03:57.138Z</published>
        <updated>2023-06-07T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[When working with MySQL (or any database!), it's essential to understand how indexes work and how they can be used to improve the efficiency of queries. An index is a separate data structure that maintains a copy of part of your data, structured to allow quick data retrieval. Usually, this structure is a B+ Tree. We have an entire post on how indexes work if you want to go into greater detail.
Obfuscated indexes
Creating indexes is only part of the battle. You must also know how to write your queries so that you allow MySQL to use your indexes. One common mistake people make when writing queries is that they obfuscate their indexes. Obfuscating an index simply means that you're hiding the indexed value from MySQL.
Let's say you have a todos table with a created_at column that records a timestamp of when the record was created.
CREATE TABLE `todos` (
  `id` int NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `created_at` (`created_at`)
)

In this table, we've added an index to the created_at column to quickly filter by that timestamp. When we query against the created_at column to find records created in the last 24 hours, we see that MySQL is using the index as we'd expect:
EXPLAIN SELECT * FROM todos WHERE created_at > NOW() - INTERVAL 24 HOUR;

-- | id | type  | possible_keys | key        | key_len | ref | rows | filtered | Extra                 |
-- |----|-------|---------------|------------|---------|-----|------|----------|-----------------------|
-- |  1 | range | created_at    | created_at | 4       |     |    1 |   100.00 | Using index condition |

However, if we wrap this column in a function, we're obfuscating the column from MySQL, and it can no longer use the index.
EXPLAIN SELECT * FROM todos WHERE YEAR(created_at) = 2023;

-- | id | type | possible_keys | key | key_len | ref | rows  | filtered | Extra       |
-- |----|------|---------------|-----|---------|-----|-------|----------|-------------|
-- |  1 | ALL  |               |     |         |     | 39746 |   100.00 | Using where |

By wrapping the created_at column in a YEAR function, we're asking MySQL to do an index lookup on YEAR(created_at), which is not an index MySQL maintains. It is only maintaining the created_at index.
In some cases, there are ways around index obfuscation. In this example, we could use a range scan instead of the YEAR function to obtain the same result.
EXPLAIN SELECT * FROM todos WHERE created_at BETWEEN '2023-01-01 00:00:00' AND '2023-12-31 23:59:59';

-- | id | type  | possible_keys | key        | key_len | ref | rows | filtered | Extra                 |
-- |----|-------|---------------|------------|---------|-----|------|----------|-----------------------|
-- |  1 | range | created_at    | created_at | 4       |     |    1 |   100.00 | Using index condition |

By unwrapping the created_at column and changing the comparison to a range scan, we've unlocked the index and allowed MySQL to use it effectively.
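If the year comes from application code, the range bounds can be computed there and passed as query parameters. The sketch below is one way to do that in JavaScript. The yearRangeCondition helper is hypothetical, and it uses a half-open range (>= start of the year, < start of the next year) rather than BETWEEN, an assumption chosen so that the condition also covers columns that store fractional seconds:

```javascript
// Hypothetical helper: build an index-friendly replacement for
// `YEAR(column) = year`. The column name is interpolated into the SQL,
// so it must be a trusted identifier, never user input.
function yearRangeCondition(column, year) {
  if (!Number.isInteger(year)) {
    throw new TypeError('year must be an integer');
  }
  const pad = (y) => String(y).padStart(4, '0');
  return {
    sql: `${column} >= ? AND ${column} < ?`,
    params: [`${pad(year)}-01-01 00:00:00`, `${pad(year + 1)}-01-01 00:00:00`],
  };
}

const condition = yearRangeCondition('created_at', 2023);
// condition.sql    is 'created_at >= ? AND created_at < ?'
// condition.params is ['2023-01-01 00:00:00', '2024-01-01 00:00:00']
```

Because the column is compared directly against constants, MySQL can still treat the condition as a range scan on the created_at index.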
Unfortunately, it's not always possible to de-obfuscate your indexes. In some scenarios, you simply cannot avoid wrapping the column in a function. In these cases, you might see if there is a redundant condition that could potentially unlock an existing index.
Redundant conditions in MySQL
A redundant condition is a condition that seems superfluous, extra, or not needed. It is a condition that can be added and removed without changing the results that MySQL returns.
Let's take a look at a contrived example to illustrate the point. In this example, we're selecting the todos with an id of less than five.
SELECT
  *
FROM
  todos
WHERE
  id < 5

In this case, a redundant condition might be id < 10.
SELECT
  *
FROM
  todos
WHERE
  id < 5
  and
  id < 10 -- This does... nothing

This is a redundant condition because it does not change the results! Anything with an ID of less than five necessarily has an ID of less than ten also. You can add or remove this condition, and nothing will change. It's also silly to add because it doesn't provide us any benefit.
We're going to expand our todos table definition a little bit to add due_date and due_time columns. (Storing date and time separately is usually not advised, but it helps us prove the point.)
CREATE TABLE `todos` (
  `id` int NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `due_date` date NOT NULL,
  `due_time` time NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `due_date` (`due_date`),
  KEY `created_at` (`created_at`)
)

Given this table, if you want to query for todos that are due in the next day, you're stuck using the ADDTIME function:
SELECT
  *
FROM
  todos
WHERE
  ADDTIME(due_date, due_time) BETWEEN NOW() AND NOW() + INTERVAL 1 DAY

We do have an index on due_date, but the index cannot be used because we're performing an operation on it (adding the time). Unlike our previous example, there is no easy way to de-obfuscate this column either since the due_time is different for every row.
We can confirm that the index is not being used by running an EXPLAIN on the previous query:
| id | type | possible_keys | key | key_len | ref | rows  | filtered | Extra       |
|----|------|---------------|-----|---------|-----|-------|----------|-------------|
|  1 | ALL  |               |     |         |     | 39746 |   100.00 | Using where |

To work around this, let's add a redundant condition on due_date alone. When adding the condition, we need to make sure that it's logically impossible to change the result set, which means our redundant condition should be broader than our actual condition.
Since we're looking for todos due in the next 24 hours, we can add a condition that looks for todos due today or tomorrow. That will contain the entire subset of todos that we're looking for and a few that we're not.
EXPLAIN SELECT
  *
FROM
  todos
WHERE
  -- The real condition
  ADDTIME(due_date, due_time) BETWEEN NOW() AND NOW() + INTERVAL 1 DAY
  AND
  -- The redundant condition
  due_date BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL 1 DAY

The redundant condition here returns a broader subset of todos than we need, but importantly it allows MySQL to use the index. Running an EXPLAIN on this query, we see that the due_date index was used:
| id | type  | possible_keys | key      | key_len | ref | rows | filtered | Extra                              |
|----|-------|---------------|----------|---------|-----|------|----------|------------------------------------|
|  1 | range | due_date      | due_date | 3       |     |    1 |   100.00 | Using index condition; Using where |

MySQL will first use the index to eliminate most of the table, then the slower ADDTIME will be used to eliminate the few remaining false positives. The redundant condition is doing its job perfectly!
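To see why the broader condition cannot change the result, here is a small JavaScript simulation of the two filtering phases. It is only a model of the idea, not of how MySQL executes the query. The sample rows, the fixed "now", and all function names are hypothetical, and every timestamp is treated as UTC:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Combine 'YYYY-MM-DD' and 'HH:MM:SS' into a UTC timestamp in milliseconds.
const dueAt = (todo) => Date.parse(`${todo.due_date}T${todo.due_time}Z`);

// The real condition: due within the next 24 hours.
const isDueWithinDay = (todo, now) =>
  dueAt(todo) >= now && dueAt(todo) <= now + DAY_MS;

// The redundant condition: due_date is today or tomorrow. Only the date
// column is inspected, which is what makes it usable with an index.
function isDueTodayOrTomorrow(todo, now) {
  const today = new Date(now).toISOString().slice(0, 10);
  const tomorrow = new Date(now + DAY_MS).toISOString().slice(0, 10);
  return todo.due_date >= today && todo.due_date <= tomorrow;
}

// Phase 1 narrows the rows cheaply; phase 2 removes the false positives.
function prefilterThenExact(todos, now) {
  const candidates = todos.filter((t) => isDueTodayOrTomorrow(t, now));
  const result = candidates.filter((t) => isDueWithinDay(t, now));
  return { candidates, result };
}

const now = Date.parse('2023-06-07T12:00:00Z');
const todos = [
  { id: 1, due_date: '2023-06-07', due_time: '09:00:00' }, // today, already past
  { id: 2, due_date: '2023-06-07', due_time: '18:00:00' }, // today, upcoming
  { id: 3, due_date: '2023-06-08', due_time: '11:00:00' }, // tomorrow, in range
  { id: 4, due_date: '2023-06-08', due_time: '20:00:00' }, // tomorrow, too late
  { id: 5, due_date: '2023-06-10', due_time: '08:00:00' }, // outside both
];

const { candidates, result } = prefilterThenExact(todos, now);
const exactOnly = todos.filter((t) => isDueWithinDay(t, now));
```

With this data the prefilter keeps ids 1 through 4, and the exact check then narrows them to ids 2 and 3, the same rows that the exact check alone returns from the full list.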
Domain-specific redundant conditions
Until now, we've been working with redundant conditions that logically cannot change the result set. These are nice because they are easy to reason about and require no further domain knowledge. There are scenarios where you, as a human, might have more knowledge than the database does. (For now, at least.) In those situations, you might be able to add a redundant condition that is not logically incapable of changing the output, but you know, based on your knowledge, that it won't change the output.
In the case of our todos table, let's add an updated_at column that will be populated with the timestamp of the last time the record was changed.
CREATE TABLE `todos` (
  `id` int NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `created_at` (`created_at`)
)

In this scenario, we still only have an index on created_at, but if we want to query against updated_at, we might be able to add a redundant condition based on our knowledge of the application. If, given our understanding of the application, we can be sure that created_at is always equal to or earlier than updated_at, we can use this to our advantage.
This query, which looks for records that were last modified before January 1st of 2023, will scan the entire table because there is no index on updated_at:
SELECT
  *
FROM
  todos
WHERE
  updated_at < '2023-01-01 00:00:00'

This query will return the same results but uses the created_at index to eliminate records and then filters out the false positives.
SELECT
  *
FROM
  todos
WHERE
  updated_at < '2023-01-01 00:00:00'
  AND
  created_at < '2023-01-01 00:00:00'

This redundant condition works only because we know that a record cannot be modified before it's created. Depending on your application, you might be able to find more examples of "domain-specific" redundant conditions.
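Because this kind of condition depends on an assumption about the data, it is worth seeing what happens when the assumption breaks. The JavaScript sketch below uses hypothetical in-memory rows and helper names, with ISO 8601 timestamps in UTC. It compares the real condition with the version that adds the created_at check:

```javascript
const CUTOFF = Date.parse('2023-01-01T00:00:00Z');

// The real condition from the query above.
const realCondition = (row) => Date.parse(row.updated_at) < CUTOFF;

// The real condition plus the domain-specific redundant condition.
const withRedundant = (row) =>
  realCondition(row) && Date.parse(row.created_at) < CUTOFF;

// The domain assumption: a row is never updated before it is created.
const violatesInvariant = (row) =>
  Date.parse(row.updated_at) < Date.parse(row.created_at);

const ids = (rows, predicate) => rows.filter(predicate).map((row) => row.id);

const cleanRows = [
  { id: 1, created_at: '2022-03-01T00:00:00Z', updated_at: '2022-05-01T00:00:00Z' },
  { id: 2, created_at: '2022-11-01T00:00:00Z', updated_at: '2023-02-01T00:00:00Z' },
  { id: 3, created_at: '2023-03-01T00:00:00Z', updated_at: '2023-03-02T00:00:00Z' },
];

// A row that breaks the assumption, for example after a faulty import.
const badRow = {
  id: 4,
  created_at: '2023-02-01T00:00:00Z',
  updated_at: '2022-12-01T00:00:00Z',
};
const dirtyRows = [...cleanRows, badRow];
```

On cleanRows both predicates select only id 1. On dirtyRows the real condition also selects id 4, while the version with the redundant condition silently drops it, so a check like violatesInvariant is a cheap safeguard before relying on this technique.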
When to use a redundant condition
The optimal indexing strategy always depends on the application, but in general, it's best to have indexes on the conditions you are frequently querying against. Redundant conditions are nice because they require no changes to the database! You can modify the query or the application generating the query, and suddenly everything gets faster. This makes them useful for queries that are only sometimes run or where indexes can't be easily added to the main conditions.
If you'd like to learn more about indexing strategies, we have 17 videos on indexes as a part of our larger course on MySQL for Developers.
If you do end up using the redundant condition strategy, please let us know on Twitter how you did it. We'd love to add more examples to this article!]]></content>
        <summary><![CDATA[Using redundant conditions as a method to unlock obfuscated indexes and improve performance in MySQL.]]></summary>
      </entry>
    
      <entry>
        <title>Optimizing query planning in Vitess: a step-by-step approach</title>
        <link href="https://planetscale.com/blog/optimizing-query-planning-in-vitess-a-step-by-step-approach" />
        <id>https://planetscale.com/blog/optimizing-query-planning-in-vitess-a-step-by-step-approach</id>
        <published>2023-06-01T15:00:00.000Z</published>
        <updated>2023-06-01T15:00:00.000Z</updated>
        
        <author>
          <name>Andres Taylor</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Introduction
In this blog post, we will discuss an example of a change to the Vitess query planner and how it enhances the optimization process. The new model focuses on making every step in the optimization pipeline a runnable plan. This approach offers several benefits, including simpler understanding and reasoning, ease of testing, and the ability to use arbitrary expressions in ordering, grouping, and aggregations.
Vitess distributed query planner
VTGate is the proxy component of Vitess. It accepts queries from users and plans how to spread the query across multiple shards and/or keyspaces. The leaf level of the VTGate query plans are "routes," which are operators that will send a query to one or more shards.
When something can be pushed into the route, it means that MySQL will do the work, and we don't have to do much work on the VTGate side. The aim is always to push as much as possible down to the much faster MySQL process. This approach helps to offload processing to MySQL and keep the VTGate layer efficient. This also reduces the risk of compatibility differences between Vitess and plain MySQL, since MySQL is doing most of the work.

Changes in the query planning model
In our query planning model, the optimization process begins by determining the join order between the tables. The "join order" refers to the sequence in which tables are joined to form the final result set.
Once the join order is established, the planner proceeds with horizon planning. A “horizon” operator contains the SELECT expressions, aggregations, ORDER BY, GROUP BY, and LIMIT. If we can push the entire operator to MySQL, we don’t need to plan this at all. If we can’t delegate it to MySQL in a single piece, we have to plan these components separately.
In a database query planner where everything is evaluated locally, this part of query planning is straightforward — we add the necessary Sort/GroupBy/Limit/Project operators and we are pretty much done. Naturally, there are additional optimizations one could perform, but these would typically yield only marginal improvements to the performance of the query plan. In a distributed query planner, the cost of transmitting data means that it's essential to push down as much of these operations as possible to the data.
You can read more about how we plan grouping and aggregations while pushing down work in our Grouping and Aggregations on Vitess blog post.
In this new model for our planner, we are still performing the same optimizations as before, but we are going about it in a very different way.
In the old model, we performed the optimization more in a top-down approach — we planned the full aggregation, and all ordering needed to support it, in one go. We would start with the join order tree, and do a lot of logic, and then output a new tree that performed the correct aggregations. In between the two, most of the current state was kept in arguments, local variables, and in the stack.

In the new query planning model, every step in the optimization pipeline results in a runnable plan. This means that developers working on the planner can inspect the plan at any stage, allowing for a better understanding of the optimization process at each step. By having runnable plans at every step, it becomes easier to identify potential issues, inefficiencies, or areas where further optimization is possible.
Each step is also simpler — it's a tree transformation taking two operators as input, one being the input of the other, and producing a new subtree that replaces the two inputs.
This improvement not only simplifies the optimization process but also enhances the ability to reason about the impact of each optimization step.
Other benefits of the new process
Visualization improvements
Compared to the old model, the new query planning model offers better visualization of the optimization steps. In the old model, the current optimization state was kept in the stack and local variables, making it harder to visualize and understand the process. With the new model, each step is represented as a full query plan, which provides a clearer picture of the optimization process.
Testability
Another benefit of this model is the possibility of running both the unoptimized plan and the optimized version and comparing their results. It should not matter if we have to evaluate a WHERE predicate on the VTGate side with our excellent evalengine support for most MySQL expressions, or if we can delegate it to the underlying database. The result should be the same.
Flexibility with expressions
The new query planning model allows for arbitrary expressions to be used for ordering, grouping, and aggregations. This provides greater flexibility when crafting complex queries and enables developers to write more efficient and optimized queries. In comparison, the old model had limitations in terms of the expressions that could be used in these operations.
Example query and optimization steps
To illustrate the benefits of the new query planning model, let's examine the optimization steps while planning a query. This is done using a so-called fixed point rewriter — the planner will continue rewriting the plan tree until it stops changing.
Let’s look at an example query:
SELECT u.foo, ue.bar
FROM user u JOIN user_extra ue ON u.uid = ue.uid
ORDER BY u.baz

Step 1
In the first step of planning, we have an operator tree that looks like this:
Horizon
└── ApplyJoin (u.uid = ue.uid)
   ├── Route (Scatter on user)
   │   └── Table (user.user)
   └── Route (Scatter on user)
       └── Filter (:u_uid = ue.uid)
           └── Table (user.user_extra)

Everything under a route will be turned into SQL and sent to MySQL.
Step 2
In the next step, we decided that we can't push the Horizon and instead need to expand it into its components.
Ordering (u.baz asc)
└── Projection (u.foo, ue.bar)
   └── ApplyJoin (u.uid = ue.uid)
       ├── Route (Scatter on user)
       │   └── Table (user.user)
       └── Route (Scatter on user)
           └── Filter (:u_uid = ue.uid)
               └── Table (user.user_extra)

The Horizon is split into an Ordering and a Projection operator.
Step 3
We continue to push things down: the Projection is split and pushed to both sides of the join, and the Ordering is sent to the left side of the join.
ApplyJoin (u.uid = ue.uid)
├── Ordering (u.baz asc)
│   └── Projection (u.foo)
│       └── Route (Scatter on user)
│           └── Table (user.user)
└── Projection (ue.bar)
   └── Route (Scatter on user)
       └── Filter (:u_uid = ue.uid)
           └── Table (user.user_extra)

Step 4
Finally, we are able to push both Projection and Ordering into the Route on the LHS of the join.
ApplyJoin (u.uid = ue.uid)
├── Route (Scatter on user)
│   └── Ordering (u.baz asc)
│       └── Projection (u.foo)
│           └── Table (user.user)
└── Route (Scatter on user)
   └── Projection (ue.bar)
       └── Filter (:u_uid = ue.uid)
           └── Table (user.user_extra)
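
The move from Step 3 to Step 4 can be sketched as a tiny fixed-point rewriter. This is a simplified JavaScript model, not Vitess's actual planner code (Vitess is written in Go). The node shape, the single pushUnderRoute rule, and every function name are hypothetical, and only the left-hand side of the join is modeled:

```javascript
// Operator nodes are plain objects: { op, detail, inputs }.
const node = (op, detail, ...inputs) => ({ op, detail, inputs });

// One rewrite rule: an Ordering or Projection sitting directly on top of a
// Route is moved underneath it, so it becomes part of the SQL sent to MySQL.
function pushUnderRoute(n) {
  const child = n.inputs[0];
  const pushable = n.op === 'Ordering' || n.op === 'Projection';
  if (pushable && n.inputs.length === 1 && child.op === 'Route') {
    return node('Route', child.detail, node(n.op, n.detail, ...child.inputs));
  }
  return null; // the rule does not apply to this node
}

// Apply the rule once, bottom-up. Returns [newTree, changed].
function rewriteOnce(n, rule) {
  let changed = false;
  const inputs = n.inputs.map((input) => {
    const [next, didChange] = rewriteOnce(input, rule);
    changed = changed || didChange;
    return next;
  });
  const rebuilt = { ...n, inputs };
  const replaced = rule(rebuilt);
  return replaced ? [replaced, true] : [rebuilt, changed];
}

// Keep rewriting until a full pass leaves the tree unchanged.
function rewriteToFixedPoint(tree, rule, maxPasses = 100) {
  let current = tree;
  for (let pass = 0; pass < maxPasses; pass++) {
    const [next, changed] = rewriteOnce(current, rule);
    if (!changed) return current;
    current = next;
  }
  throw new Error('rewriter did not converge');
}

// Compact rendering of the tree shape, for inspection.
const shape = (n) =>
  n.inputs.length ? `${n.op}(${n.inputs.map(shape).join(',')})` : n.op;

// Left-hand side of the join as it looks in Step 3.
const lhs = node('Ordering', 'u.baz asc',
  node('Projection', 'u.foo',
    node('Route', 'Scatter on user',
      node('Table', 'user.user'))));

const optimized = rewriteToFixedPoint(lhs, pushUnderRoute);
// shape(optimized) is 'Route(Ordering(Projection(Table)))'
```

Every intermediate tree produced along the way is itself a complete operator tree, which mirrors the property described above: each step of the pipeline yields a plan that can be inspected on its own.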

So the VTGate plan is ultimately just a join. One query will be sent to the left-hand side, and for each row we get from those results, we will issue a query on the right-hand side of the join.
The two queries are:
-- LHS
SELECT u.foo, u.uid, u.baz, weight_string(u.baz)
FROM `user` AS u
ORDER BY u.baz ASC


-- RHS
SELECT ue.bar
FROM user_extra AS ue WHERE ue.uid = :u_uid
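
The way these two queries cooperate can be sketched in JavaScript. This is an illustration of the apply-join idea only, under the assumption that two in-memory arrays stand in for the sharded tables. The sample rows and the runLhs, runRhs, and applyJoin names are hypothetical:

```javascript
// In-memory stand-ins for the `user` and `user_extra` tables.
const users = [
  { uid: 1, foo: 'a', baz: 20 },
  { uid: 2, foo: 'b', baz: 10 },
];
const userExtra = [
  { uid: 1, bar: 'x' },
  { uid: 2, bar: 'y' },
  { uid: 2, bar: 'z' },
];

// LHS query: project the needed columns and order by baz.
function runLhs() {
  return [...users]
    .sort((a, b) => a.baz - b.baz)
    .map(({ foo, uid, baz }) => ({ foo, uid, baz }));
}

// RHS query: takes the :u_uid bind variable and returns the matching bars.
function runRhs(bindVars) {
  return userExtra
    .filter((row) => row.uid === bindVars.u_uid)
    .map(({ bar }) => ({ bar }));
}

// Apply join: for each LHS row, run the RHS query with that row's uid.
function applyJoin(lhs, rhs) {
  const out = [];
  for (const left of lhs()) {
    for (const right of rhs({ u_uid: left.uid })) {
      out.push({ foo: left.foo, bar: right.bar });
    }
  }
  return out;
}

const joined = applyJoin(runLhs, runRhs);
```

Because the left-hand rows arrive already ordered by baz and the join emits results in that order, no separate sorting step is needed after the join in this model.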

Conclusion
The new query planning model in Vitess brings several advantages over the previous model, making it easier for us to understand and work with one of the most complicated parts of Vitess. With runnable plans at every step, improved visualization, and increased flexibility with expressions, we hope that this will form a design that we can grow with.
As Vitess continues to evolve, we can expect even more enhancements and optimizations to its query planning capabilities.]]></content>
        <summary><![CDATA[See how Vitess acts as a database proxy that creates an illusion of a single database when technically, the query is sent to multiple MySQL instances.]]></summary>
      </entry>
    
      <entry>
        <title>Pulling back the curtain: the new database overview page</title>
        <link href="https://planetscale.com/blog/our-new-database-overview-page" />
        <id>https://planetscale.com/blog/our-new-database-overview-page</id>
        <published>2023-05-31T09:00:00.000Z</published>
        <updated>2023-05-31T09:00:00.000Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Have you ever wondered what's actually under the hood of a PlanetScale database? Starting today, you no longer have to wonder! We just released a huge update to your database overview page in the PlanetScale dashboard.
The power of PlanetScale
Our goal at PlanetScale has always been to provide the most powerful, yet easy-to-use database that "just works." We've packed a ton of power under the hood, and with this newest release, we're pulling back the curtain to make it clear exactly what you're getting with your PlanetScale database.
To start, we've moved some elements around so you now have a complete overview of the state of your database, database health, and important metrics, all on a single page:
Recent deploy requests
Query insights graph including latency, QPS, rows read/written, and errors
All queries during the last 24 hours
Whether safe migrations is enabled
Whether the selected branch is production or development
 The "Connect" button where you previously found your connection strings is now labeled "Get connection strings". 
This view is included for every branch of your database. You can select which branch you want to view from the dropdown in the top left of the interactive diagram.
In addition, we've added some new information: load balancers, primary, replicas, and shards. Let's dive into each of these.
Load balancers
In the diagram on your database overview page, you'll see the number of load balancers included with your database as well as the number of availability zones.
Click on "Load balancers" to see a more detailed breakdown of metrics and performance: number of connections, latency, and queries per second.

Primary, replicas, and read-only regions
Underneath "Load balancers" in the diagram, you'll see it branching down to your primary database. Here, you can see which region your primary database is in, and if you click on that, we show you some more metrics for your primary database: reads/writes per second, queries per second, input/output operations per second, CPU, and memory.

Below the primary database, you'll see branches down to all replicas. Every production branch in a database comes with at least one replica. You can click on each replica to check the performance of those as well.
 This diagram reveals why we recommend using a production branch for your production workloads. Production branches include additional replicas and load balancers to handle the increased workload of your production database. 
Above each replica (0s in the image below), you can see the replication lag.
Additionally, if you have read-only regions, those will appear in the diagram as well with the label "RO".

Sharded databases
If your database is sharded, we include a breakdown of this on the overview page as well. You'll see tabs at the bottom of the diagram where you can click through your different keyspaces, if you have more than one.

Within each sharded keyspace, you'll see the total number of shards, and clicking on the diagram pulls up a list of each shard with max replication lag. You can also click into each shard, allowing you to view information about the primary and replicas for that shard, including:
Region
Availability zone
Reads/writes per second
QPS
IOPS
CPU
Memory


Check it out
If you have an existing PlanetScale database, you can head to your dashboard right now to see this new update. Remember, nothing has changed with your database, we're simply revealing what has always been under the hood. And if you haven't explored PlanetScale yet, you can sign up for an account now.
If you have any questions or feedback, don't hesitate to reach out! Head to our Contact page or find us on Twitter at @planetscale.
You can learn more about PlanetScale architecture in our Architecture documentation.]]></content>
        <summary><![CDATA[Learn about the latest updates we made to our database overview page: load balancers, shards, and more.]]></summary>
      </entry>
    
      <entry>
        <title>Increase developer productivity with Database DevOps</title>
        <link href="https://planetscale.com/blog/developer-productivity-database-devops" />
        <id>https://planetscale.com/blog/developer-productivity-database-devops</id>
        <published>2023-05-25T13:00:00.000Z</published>
        <updated>2023-05-25T13:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Discover innovative approaches to schema changes, recovery, and monitoring to enhance developer productivity.]]></content>
        <summary><![CDATA[Explore the evolution of “Database DevOps” and its impact on development workflows in this tech talk.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale is now available on the Google Cloud Marketplace</title>
        <link href="https://planetscale.com/blog/planetscale-is-now-available-on-the-google-cloud-marketplace" />
        <id>https://planetscale.com/blog/planetscale-is-now-available-on-the-google-cloud-marketplace</id>
        <published>2023-05-22T09:00:00.000Z</published>
        <updated>2023-05-22T09:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[Our goal at PlanetScale is to make it as easy as possible for developers around the world to adopt a serverless database, reduce their development time, cut back on the ongoing database-related maintenance costs, and provide the best workflow for managing schema changes. In an effort to make our platform even more accessible, we’re excited to share that we are officially available on the Google Cloud Marketplace.
With our availability on Google Cloud Marketplace, we are significantly extending our reach to the wider developer community. Onboarding and engaging with PlanetScale is easier than ever, and developers can increasingly reap the benefits of heightened performance, unparalleled scalability, and substantial cost savings.
PlanetScale is taking the headache out of database management. We’re on a mission to increase developer velocity while decreasing database fragility. Reduce your application’s time to market, manage your database infrastructure with less time and money invested, and enjoy the power of Vitess behind the scenes — the same open-source project used by the likes of GitHub, HubSpot, Slack, and Square to serve millions of queries per second.
Questions? We’d love to hear from you. Fill out our contact form, and we’ll be in touch. See you on the Google Cloud Marketplace!]]></content>
        <summary><![CDATA[We’re excited to announce that PlanetScale is now available on the Google Cloud Marketplace.]]></summary>
      </entry>
    
      <entry>
        <title>Character sets and collations in MySQL</title>
        <link href="https://planetscale.com/blog/mysql-charsets-collations" />
        <id>https://planetscale.com/blog/mysql-charsets-collations</id>
        <published>2023-05-18T00:03:57.138Z</published>
        <updated>2023-05-18T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Character sets and collations are fundamentally important concepts to understand when dealing with string columns in MySQL. A slight misunderstanding of either can lead to poor performance or unexpected errors when inserting data.
A character set defines the characters allowed to go in a column. A collation is a set of rules for comparing those characters. Each character set can have multiple collations, but a collation may only belong to one character set.
Character sets in MySQL
MySQL supports a wide range of character sets, which you can view by selecting from the information_schema database.

SELECT * FROM information_schema.character_sets ORDER BY character_set_name;

This will list out all of the character sets, along with their default collations. Every character set has one default collation.

| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                     | MAXLEN |
|--------------------|----------------------|---------------------------------|--------|
| armscii8           | armscii8_general_ci  | ARMSCII-8 Armenian              |      1 |
| ascii              | ascii_general_ci     | US ASCII                        |      1 |
| big5               | big5_chinese_ci      | Big5 Traditional Chinese        |      2 |
| binary             | binary               | Binary pseudo charset           |      1 |
| cp1250             | cp1250_general_ci    | Windows Central European        |      1 |
| cp1251             | cp1251_general_ci    | Windows Cyrillic                |      1 |
| cp1256             | cp1256_general_ci    | Windows Arabic                  |      1 |
| cp1257             | cp1257_general_ci    | Windows Baltic                  |      1 |
| cp850              | cp850_general_ci     | DOS West European               |      1 |
| cp852              | cp852_general_ci     | DOS Central European            |      1 |
| cp866              | cp866_general_ci     | DOS Russian                     |      1 |
| cp932              | cp932_japanese_ci    | SJIS for Windows Japanese       |      2 |
| dec8               | dec8_swedish_ci      | DEC West European               |      1 |
| eucjpms            | eucjpms_japanese_ci  | UJIS for Windows Japanese       |      3 |
| euckr              | euckr_korean_ci      | EUC-KR Korean                   |      2 |
| gb18030            | gb18030_chinese_ci   | China National Standard GB18030 |      4 |
| gb2312             | gb2312_chinese_ci    | GB2312 Simplified Chinese       |      2 |
| gbk                | gbk_chinese_ci       | GBK Simplified Chinese          |      2 |
| geostd8            | geostd8_general_ci   | GEOSTD8 Georgian                |      1 |
| greek              | greek_general_ci     | ISO 8859-7 Greek                |      1 |
| hebrew             | hebrew_general_ci    | ISO 8859-8 Hebrew               |      1 |
| hp8                | hp8_english_ci       | HP West European                |      1 |
| keybcs2            | keybcs2_general_ci   | DOS Kamenicky Czech-Slovak      |      1 |
| koi8r              | koi8r_general_ci     | KOI8-R Relcom Russian           |      1 |
| koi8u              | koi8u_general_ci     | KOI8-U Ukrainian                |      1 |
| latin1             | latin1_swedish_ci    | cp1252 West European            |      1 |
| latin2             | latin2_general_ci    | ISO 8859-2 Central European     |      1 |
| latin5             | latin5_turkish_ci    | ISO 8859-9 Turkish              |      1 |
| latin7             | latin7_general_ci    | ISO 8859-13 Baltic              |      1 |
| macce              | macce_general_ci     | Mac Central European            |      1 |
| macroman           | macroman_general_ci  | Mac West European               |      1 |
| sjis               | sjis_japanese_ci     | Shift-JIS Japanese              |      2 |
| swe7               | swe7_swedish_ci      | 7bit Swedish                    |      1 |
| tis620             | tis620_thai_ci       | TIS620 Thai                     |      1 |
| ucs2               | ucs2_general_ci      | UCS-2 Unicode                   |      2 |
| ujis               | ujis_japanese_ci     | EUC-JP Japanese                 |      3 |
| utf16              | utf16_general_ci     | UTF-16 Unicode                  |      4 |
| utf16le            | utf16le_general_ci   | UTF-16LE Unicode                |      4 |
| utf32              | utf32_general_ci     | UTF-32 Unicode                  |      4 |
| utf8               | utf8_general_ci      | UTF-8 Unicode                   |      3 |
| utf8mb4            | utf8mb4_0900_ai_ci   | UTF-8 Unicode                   |      4 |

At the bottom of this table, you'll notice two character sets described as UTF-8 Unicode. The utf8 charset has a MAXLEN of 3 while the utf8mb4 has a MAXLEN of 4. What's being described here is the maximum allowed length, in bytes, per character.
According to the UTF-8 spec, a character can take up to four bytes, meaning MySQL's utf8 charset was never actually full UTF-8, since it only supports up to three bytes per character. In MySQL 8, utf8mb4 is the default character set and the one you will use most often. utf8 is kept for backwards compatibility and should no longer be used.
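
To make the three-byte versus four-byte distinction concrete, here is a small Python sketch (not MySQL code) that measures how many bytes a few sample characters need in UTF-8. The sample names are only illustrative labels.

```python
# Byte length of a few characters when encoded as UTF-8.
samples = {
    "latin_a": "A",                # basic Latin letter
    "e_acute": "\u00e9",           # Latin small letter e with acute
    "euro_sign": "\u20ac",         # euro sign
    "grinning_face": "\U0001F600"  # emoji outside the Basic Multilingual Plane
}

byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# Anything that needs four bytes does not fit in a charset capped at three
# bytes per character, which is why utf8mb4 is needed for such characters.
needs_four_bytes = [name for name, size in byte_lengths.items() if size > 3]

print(byte_lengths)
print(needs_four_bytes)
```

In this sample only the emoji needs four bytes, so it is the kind of character that requires utf8mb4.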
How do you define a character set?
There are a few ways to define the character set of a column. If you don't specify a character set at the table or column level, the server default of utf8mb4 will be applied (unless you've explicitly declared a different server or database default).
We can prove this by creating a table with no character set information and then reading it back:

CREATE TABLE no_charset (
    my_column VARCHAR(255)
);

SHOW CREATE TABLE no_charset;

The resulting CREATE TABLE statement shows that the default charset and collation have been applied.

CREATE TABLE `no_charset` (
  `my_column` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Defining at the table level
Instead of allowing the database or server default to apply, you can explicitly set the character set at the table level by using the CHARSET=[charset] notation. Here, we'll create a table where all character columns have the latin1 charset:

CREATE TABLE `latin1_charset` (
  `my_column` VARCHAR(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Defining at the column level
Finally, you can set the character set at the column level. This is the most specific and overrides any table-level settings.

CREATE TABLE `mixed_collations` (
    `explicitly_set` VARCHAR(255) CHARACTER SET latin1,
    `implicitly_set` VARCHAR(255)
);

Reading this table back with the SHOW CREATE TABLE statement makes it clear that the table is utf8mb4, but the explicitly_set column is latin1:

CREATE TABLE `mixed_collations` (
  `explicitly_set` VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  `implicitly_set` VARCHAR(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Because the column level is the most specific, a column-level specification overrides a table-level specification, a table-level specification overrides the database default, and a database-level charset overrides the server default.
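
As a rough mental model of that precedence (this is not how MySQL implements it), the lookup can be sketched in Python as "return the first value that was explicitly set". The function name effective_charset and its parameters are hypothetical and exist only for this illustration.

```python
from typing import Optional


def effective_charset(column: Optional[str] = None,
                      table: Optional[str] = None,
                      database: Optional[str] = None,
                      server: str = "utf8mb4") -> str:
    """Return the first charset that is set, from most to least specific."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    # Nothing more specific was declared, so the server default applies.
    return server


print(effective_charset())                                 # server default
print(effective_charset(table="latin1"))                   # table-level wins
print(effective_charset(column="latin1", table="utf8mb4")) # column-level wins
```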
Collations in MySQL
While character sets define the legal characters that can be stored in a column, collations are rules that determine how string comparisons are made. If you are sorting or comparing strings, MySQL uses the collation to decide the order and whether the strings are the same.
You can show all the collations by querying information_schema again. There are a lot of collations, so we'll restrict the results to only collations that apply to the utf8mb4 charset.

SELECT
  *
FROM
  information_schema.collations
WHERE
  character_set_name = 'utf8mb4'
ORDER BY
  collation_name

This query will display all the collations, related character set names, whether they are default, and a few other pieces of information. Notice that for each character set, there is one default collation. For example, utf8mb4_0900_ai_ci is the default collation for the utf8mb4 character set.

| COLLATION_NAME             | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
|----------------------------|--------------------|-----|------------|-------------|---------|---------------|
| utf8mb4_0900_ai_ci         | utf8mb4            | 255 | Yes        | Yes         |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4            | 305 |            | Yes         |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4            | 278 |            | Yes         |       0 | NO PAD        |
| utf8mb4_0900_bin           | utf8mb4            | 309 |            | Yes         |       1 | NO PAD        |
| utf8mb4_bin                | utf8mb4            |  46 |            | Yes         |       1 | PAD SPACE     |
| utf8mb4_croatian_ci        | utf8mb4            | 245 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_cs_0900_ai_ci      | utf8mb4            | 266 |            | Yes         |       0 | NO PAD        |
| utf8mb4_cs_0900_as_cs      | utf8mb4            | 289 |            | Yes         |       0 | NO PAD        |
| utf8mb4_czech_ci           | utf8mb4            | 234 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_danish_ci          | utf8mb4            | 235 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_da_0900_ai_ci      | utf8mb4            | 267 |            | Yes         |       0 | NO PAD        |
| utf8mb4_da_0900_as_cs      | utf8mb4            | 290 |            | Yes         |       0 | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4            | 256 |            | Yes         |       0 | NO PAD        |
| utf8mb4_de_pb_0900_as_cs   | utf8mb4            | 279 |            | Yes         |       0 | NO PAD        |
| [omitted for brevity]      | ...                | ... |            | ...         |     ... | ...           |
| utf8mb4_vietnamese_ci      | utf8mb4            | 247 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_vi_0900_ai_ci      | utf8mb4            | 277 |            | Yes         |       0 | NO PAD        |
| utf8mb4_vi_0900_as_cs      | utf8mb4            | 300 |            | Yes         |       0 | NO PAD        |
| utf8mb4_zh_0900_as_cs      | utf8mb4            | 308 |            | Yes         |       0 | NO PAD        |

Collations follow a naming scheme whereby the character set forms a prefix, and the suffix combines attributes of the collation.
Here is a breakdown of a few of the suffixes you might see:

| Suffix | Meaning            |
|--------|--------------------|
| _ai    | Accent-insensitive |
| _as    | Accent-sensitive   |
| _ci    | Case-insensitive   |
| _cs    | Case-sensitive     |
| _ks    | Kana-sensitive     |
| _bin   | Binary             |

Let's take the default utf8mb4 collation of utf8mb4_0900_ai_ci and expand it slightly.
The utf8mb4 part declares that the collation belongs to the utf8mb4 charset. The 0900 refers to version 9.0.0 of the Unicode Collation Algorithm (UCA) weight keys. _ai means the collation is accent-insensitive, while _ci declares it case-insensitive.
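
The naming scheme can be read mechanically. The Python sketch below is a hypothetical helper (describe_collation is not part of MySQL or any library) that splits a collation name and translates the trailing suffixes from the table above. It only inspects the end of the name, because a token such as cs can also appear earlier as a language code, as in utf8mb4_cs_0900_ai_ci.

```python
SUFFIX_MEANINGS = {
    "ai": "accent-insensitive",
    "as": "accent-sensitive",
    "ci": "case-insensitive",
    "cs": "case-sensitive",
    "ks": "kana-sensitive",
    "bin": "binary",
}


def describe_collation(name: str) -> dict:
    """Split a collation name into its charset prefix and trailing attributes."""
    charset, *rest = name.split("_")
    attributes = []
    # Walk backwards from the end and stop at the first token that is not a
    # known suffix, so earlier tokens are never read as sensitivity flags.
    for token in reversed(rest):
        if token not in SUFFIX_MEANINGS:
            break
        attributes.append(SUFFIX_MEANINGS[token])
    attributes.reverse()
    return {"charset": charset, "attributes": attributes}


print(describe_collation("utf8mb4_0900_ai_ci"))
print(describe_collation("utf8mb4_bin"))
```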
This allows us to confidently answer the question, "are string comparisons case-sensitive?" The answer, of course, is: it depends! It depends on the collation.
Let's prove this by explicitly casting strings using the COLLATE keyword.

SELECT "MySQL" COLLATE utf8mb4_0900_ai_ci = "mysql" COLLATE utf8mb4_0900_ai_ci;

Running this statement gives us a value of 1, meaning MySQL treats the two strings as equal. If we were to rerun it with a case-sensitive collation, we'd expect (and obtain!) a different result:

SELECT "MySQL" COLLATE utf8mb4_0900_as_cs = "mysql" COLLATE utf8mb4_0900_as_cs;

This query returns a value of 0, meaning MySQL sees these strings as different because they are cased differently.
The same logic holds for accent sensitivity. With an accent-insensitive collation, résumé and resume would be deemed identical because the accents would be ignored.
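
If it helps to see the idea outside the database, here is a loose Python analogy. It is only an approximation: MySQL collations use UCA weight tables, not this normalization trick. The helper name fold_accents_and_case is made up for this sketch.

```python
import unicodedata


def fold_accents_and_case(text: str) -> str:
    """Strip combining accent marks and ignore case, roughly like _ai_ci."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


resume_with_accents = "r\u00e9sum\u00e9"

# An exact comparison sees two different strings.
same_when_exact = resume_with_accents == "resume"

# After folding accents and case, the two strings compare as equal.
same_when_folded = (
    fold_accents_and_case(resume_with_accents) == fold_accents_and_case("Resume")
)

print(same_when_exact, same_when_folded)
```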
How do you define a collation?
Like character sets, collations can be set at both the table and column levels. If a collation is not explicitly defined, MySQL uses the default collation of the character set.
To define a collation at the table level, you can use the COLLATE clause in the CREATE TABLE statement. For example, you can create a table where all character columns use the utf8mb4_bin collation:

CREATE TABLE table_level_collation (
    my_column VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

If you want to define the collation at the column level, you can do so in the column definition. The following example creates a table with two columns: explicitly_set uses the utf8mb4_general_ci collation, and implicitly_set uses the default collation from the utf8mb4 charset, which is utf8mb4_0900_ai_ci.

CREATE TABLE table_with_collation (
    `explicitly_set` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `implicitly_set` varchar(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

You can also change the collation of a column in an existing table using the ALTER TABLE statement:

ALTER TABLE table_with_collation
    CHANGE `explicitly_set` `explicitly_set` varchar(255)
        CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Summary
Understanding character sets and collations is fundamental when dealing with string data in MySQL. A character set defines the legal characters that can be stored in a column, while a collation determines how string comparisons are made.
A character set can be defined at the column level, the table level, or it can be inherited from the database or server default. The most specific level (column > table > database > server) is used.
A collation can be defined at the column level, the table level, or it can be inherited from the character set default. Again, the most specific level is used.
The character set and collation of a column affect how data is stored and how it is compared and sorted. Be mindful of these settings to ensure the correct behavior and optimal performance when designing your database.
If you are unsure which character set or collation to use, the MySQL default utf8mb4 character set and its default utf8mb4_0900_ai_ci collation are usually good choices. They support all Unicode characters and provide case-insensitive and accent-insensitive comparisons.]]></content>
        <summary><![CDATA[Understanding the differences between character sets and collations in MySQL.]]></summary>
      </entry>
    
      <entry>
        <title>MariaDB vs. MySQL</title>
        <link href="https://planetscale.com/blog/mariadb-vs-mysql" />
        <id>https://planetscale.com/blog/mariadb-vs-mysql</id>
        <published>2023-05-16T13:00:00.000Z</published>
        <updated>2023-05-16T13:00:00.000Z</updated>
        
        <author>
          <name>Matt Lord</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[MariaDB began as a fork of MySQL. Since then, it has evolved into an entirely separate database. In this Tech Talk, we go over some of the main differences between MariaDB and MySQL, why you might choose one over the other, scaling options, and how Vitess fits into the picture.]]></content>
        <summary><![CDATA[Learn the main differences between MariaDB and MySQL, why you might choose one over the other, scaling options, and how Vitess fits into the picture.]]></summary>
      </entry>
    
      <entry>
        <title>Backward compatible database changes</title>
        <link href="https://planetscale.com/blog/backward-compatible-databases-changes" />
        <id>https://planetscale.com/blog/backward-compatible-databases-changes</id>
        <published>2023-05-09T16:10:00.000Z</published>
        <updated>2023-05-09T16:10:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[A common question I often hear is, “Should I make my application code changes before, after, or at the same time as my database schema changes?”
The reality is that neither our application nor our database lives in a bubble. In almost every case, you should avoid coupling your database schema and application code changes together. While shipping them simultaneously might seem like a great idea, it often leads to pain for you and your users. There are five main reasons for this:
Risk: By deploying changes to two critical systems at once, such as your database and application, you double the risk of something going wrong.
Deployments: It’s impossible for application code and database schema changes to deploy together atomically. If they are ever dependent on each other going out simultaneously, the application will error briefly until the other catches up.
Migration time: As data size grows, migrations can take longer. It can go from 30 seconds to a few hours to even more than a day! You don’t want the app deployment blocked for this.
Blocking the development pipeline: If something goes wrong with the database schema change when coupled together, the deployment of the application is now blocked. A single change can stop the pipeline from going into production until it’s fixed.
Best practices: Having them separate forces database best practices for ensuring backward compatible changes, which we will discuss in this blog post.
So how should you change your application code when it also requires changes in the database schema?
This blog post will answer this question and break down the steps you need to follow to ensure you are safely making changes to your database and ensuring no downtime or disruptions for your users. A word of advice: This process can feel complex the first time you do it, but after some practice, it gets easy and you’ll be able to move quickly and confidently.
While PlanetScale can help you make safe schema changes alongside this pattern, the pattern applies to schema changes in any relational database.
Different types of database schema changes
Some of the common types of database schema changes are:
Adding a table or view
Adding a column
Changing an existing column, table, or view
Removing an existing column, table, or view
Generally, adding a table, column, or view is low-risk and doesn't require much, other than deploying the schema change before your application code that might use the change. You can read more in the schema change documentation about handling each type of change.
Things are riskier when changing or removing a column or table. This is where backward compatible changes are essential. The most commonly used pattern is expand, migrate, and contract. You might see this pattern under similar names, like parallel or backward compatible changes. I like the “expand, migrate, and contract” name because it visually describes what it is doing. Let’s break that down.
The expand, migrate, and contract pattern
Backward compatible changes should be used for any operation that touches schema your production application is already using. This ensures that at any step of the process, you can roll back without data loss or significant disruptions to users. This greatly reduces the risk and allows you to move faster and with confidence.
For example, this applies when you are:
Renaming an existing column or table
Changing the data type of an existing column
Splitting or otherwise modifying the data of an existing column or table
If you only add a column or table that does not affect the existing schema, you do not need to follow this pattern.
Here’s a helpful diagram to help you think about the pattern and where the changes are occurring through the steps:

Let’s break down the pattern.
Expand
Step 1 - Expand the existing schema
The first step in the pattern is to add to the schema. You will create a new column or table, depending on the change needed in the application.
As I describe in my previous blog post about safely making schema changes, you should consider making smaller, incremental changes to your database schema to ensure your changes are safe. Big changes are riskier.
In most cases, adding a new column will not affect your existing application if you make the column nullable and/or provide default values. If you don’t do this, when the application creates a new row, you could potentially cause a database error.
You can test locally or with a database branch alongside your application and then deploy the changes.
If you are using PlanetScale branching, you can make the change in a development branch, open a deploy request, and deploy it to your production database, which will increase the safety of each step.
Step 2 - Expand the application code

The second step in the pattern is to update the application code to write to both the old and new schema. Before this step, your application only wrote to the old column or table.
You want it to write to both the old and new schema because you want to make sure it can safely write to the new schema without error. If you plan on changing how the data is stored — e.g., if you want to store the user ID instead of the username — you will write the new form of the data to the new column or table while continuing to write the old form of the data to the old column or table. If there are issues, the application can continue to write and read from the old column without any user impact.
After you deploy this change, the application should continue to behave as before.
Migrate
Step 3 - Migrate the data
The third step in the pattern is data migration. At this step, you know that your application is successfully writing data to the new schema, but what about the data that was written before you started writing to both the old and new schema?
You will need to run a data migration script that backfills the new schema with the data written to the old schema before the double writes started. There are two ways you might have to handle this situation:
If you are making any changes to the data, you must include this in the script. For example, if you are splitting up a string based on a product requirement, this is when you would do it before storing it in the new column.
If you are only moving existing data and making no changes, then you can have the script insert the exact data with no mutations.
Also, if there is a lot of data to move, consider spreading it over an extended period using background jobs. This will prevent it from affecting your production database performance and users.
If it isn’t clear what changes are needed to the data, this is when it would be a good idea to think it through. It can be a pain to go back and change the data again in your new schema.
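
Here is one way a batched backfill could look. This is a sketch under stated assumptions, not a production script: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for your real database, and the customer table, its columns, and the backfill_batch function are hypothetical. It splits a full_name string into two new columns, a small batch at a time.

```python
import sqlite3

BATCH_SIZE = 2  # tiny for the demo; a real job would pick a larger batch

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL,  -- old column
        first_name TEXT,          -- new columns added in the expand step
        last_name TEXT
    );
    INSERT INTO customer (full_name) VALUES
        ('Sam Rivera'), ('Lee Chen'), ('Ana Silva'), ('Kim Park'), ('Jo Novak');
""")


def backfill_batch(conn, batch_size):
    """Backfill one batch of rows and return how many rows were updated."""
    rows = conn.execute(
        "SELECT id, full_name FROM customer "
        "WHERE first_name IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    with conn:  # commit the whole batch together
        for row_id, full_name in rows:
            # The data change happens here, before storing in the new columns.
            first, _, last = full_name.partition(" ")
            conn.execute(
                "UPDATE customer SET first_name = ?, last_name = ? WHERE id = ?",
                (first, last, row_id),
            )
    return len(rows)


batches = 0
while backfill_batch(conn, BATCH_SIZE) > 0:
    batches += 1  # a background job could pause here between batches

remaining = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE first_name IS NULL"
).fetchone()[0]
print(batches, remaining)
```

Because each batch only selects rows where the new column is still empty, the job can be stopped and restarted without redoing work.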
Step 4 - Migrate the application code

The fourth step is to update the application code to read from the new schema. Before this step, your application was reading from only the old schema.
This is your last chance, before you deploy the application code changes in this step, to confirm that the data migration and new schema are accurate, not missing data, and ready for production read traffic. Once you deploy this change, if you notice serious production issues, it could have some user impact depending on the issue. For this reason, consider testing the performance at this step.
A nice benefit of this approach is that you can always roll back the application code after it is deployed since the schema is still in a backward compatible state.
For performance monitoring, you can use Insights or another application performance monitoring tool to make sure everything is working as expected.
Contract
Step 5 - Contract the application

In the fifth step, you will start contracting your changes and update the application to only write to the new schema.
You are ready to deploy this step when you have confirmed that everything is working as expected in the production application, and you are ready to stop writing data to the old schema.
Step 6 - Contract the schema
In the sixth step, it is finally time to delete the old column or table. Your application should work as expected for both write and read traffic, and you should feel confident that you can delete the old data without losing anything you still need.
This is optional, but if you do have concerns about another team or application that might be using this column, you have two options:
If it is a column you are changing, make the column invisible in MySQL, which hides it from SELECT * queries.
If it is a column or table, you can change the name of the column or table so if it is used, there is an error but no data loss. (Note: You cannot do a rename without creating a new column in PlanetScale, but PlanetScale does warn you if a table has been recently queried in a deploy request.)
You’ve reached the end of the pattern. Congrats! You made a much safer schema change than trying to deploy it simultaneously with your code.
Example walkthrough
Let’s see the expand, migrate, and contract pattern in action with the following example:
I have an application that keeps track of GitHub stars across repositories, a nice vanity metric that can be useful for different signals. Before I make any changes, I have a repo table with information about different GitHub repos with columns such as:
id
repo_name
organization
And others
I also have a star table with information about stars for GitHub repos:
id
repo_name
organization
star_count
When I first created the application, I made these separate tables, but now I want to combine them so I only have to maintain one table. A single table can make queries easier to write and stores less data. All of the data in the star table is also in the repo table, except star_count.
So, I need to do two things:
Create a new star_count column in my repo table and migrate the data.
Delete the star table without any data loss or disruption to my users.
Step 1 - Expand the existing schema
The first step will be to expand the schema and add a star_count column to the repo table of the production database.

ALTER TABLE repo
ADD COLUMN star_count INT;

After the first step, the repo table looks like:

| id | repo_name   | organization | ... | star_count |
|----|-------------|--------------|-----|------------|
|  1 | vtprotobuf  | planetscale  | ... |            |
|  2 | beam        | planetscale  | ... |            |
|  3 | database-js | planetscale  | ... |            |

And the star table remains unchanged:

| id | repo_name   | organization | star_count |
|----|-------------|--------------| ---------- |
|  1 | vtprotobuf  | planetscale  | 637        |
|  2 | beam        | planetscale  | 1837       |
|  3 | database-js | planetscale  | 854        |

Step 2 - Expand the application code
Since the star_count column now exists in the repo table, I can update my application code to write to star_count in both the repo and star tables whenever I’m writing to the database in my application code. The application code for this depends on your database client or ORM. Once this is tested locally or in a database branch to confirm that writes are successfully working, you can deploy the code to production.
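
As a sketch of what the double write could look like, here is a Python version using the built-in sqlite3 module and an in-memory database as a stand-in for the real one. The record_star_count function is hypothetical, and wrapping both writes in one transaction is my assumption rather than a requirement of the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo (
        id INTEGER PRIMARY KEY, repo_name TEXT, organization TEXT,
        star_count INTEGER
    );
    CREATE TABLE star (
        id INTEGER PRIMARY KEY, repo_name TEXT, organization TEXT,
        star_count INTEGER
    );
    INSERT INTO repo (id, repo_name, organization)
        VALUES (1, 'vtprotobuf', 'planetscale');
    INSERT INTO star (id, repo_name, organization, star_count)
        VALUES (1, 'vtprotobuf', 'planetscale', 637);
""")


def record_star_count(conn, organization, repo_name, star_count):
    """Write the count to both the old (star) and new (repo) locations."""
    with conn:  # one transaction: both writes commit or neither does
        conn.execute(
            "UPDATE star SET star_count = ? "
            "WHERE organization = ? AND repo_name = ?",
            (star_count, organization, repo_name),
        )
        conn.execute(
            "UPDATE repo SET star_count = ? "
            "WHERE organization = ? AND repo_name = ?",
            (star_count, organization, repo_name),
        )


record_star_count(conn, "planetscale", "vtprotobuf", 638)

old_value = conn.execute("SELECT star_count FROM star WHERE id = 1").fetchone()[0]
new_value = conn.execute("SELECT star_count FROM repo WHERE id = 1").fetchone()[0]
print(old_value, new_value)
```

Reads still come from the star table at this point, so the application behaves exactly as before while the new column starts filling up.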
Step 3 - Migrate the data
Since I am moving the data from the old star_count column in the star table to the new star_count column in the repo table, I need to write a script to backfill the column.
There’s not much data, so it is safe not to use background jobs for the inserts. Once it is done running, I will run some test queries against the database and spot-check the data to ensure nothing looks wrong that I might need to fix.
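
For a table this small, the backfill can be a single statement. The sketch below again uses Python's sqlite3 module with an in-memory database as a stand-in, so the exact SQL syntax may differ from what you would run against MySQL. It copies star_count from the star table into any repo row that does not have a value yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo (
        id INTEGER PRIMARY KEY, repo_name TEXT, organization TEXT,
        star_count INTEGER
    );
    CREATE TABLE star (
        id INTEGER PRIMARY KEY, repo_name TEXT, organization TEXT,
        star_count INTEGER
    );
    INSERT INTO repo (id, repo_name, organization) VALUES
        (1, 'vtprotobuf', 'planetscale'),
        (2, 'beam', 'planetscale'),
        (3, 'database-js', 'planetscale');
    INSERT INTO star (id, repo_name, organization, star_count) VALUES
        (1, 'vtprotobuf', 'planetscale', 637),
        (2, 'beam', 'planetscale', 1837),
        (3, 'database-js', 'planetscale', 854);
""")

with conn:
    # Only touch rows the double writes have not already filled in.
    conn.execute("""
        UPDATE repo
        SET star_count = (
            SELECT star.star_count
            FROM star
            WHERE star.organization = repo.organization
              AND star.repo_name = repo.repo_name
        )
        WHERE star_count IS NULL
    """)

backfilled = conn.execute(
    "SELECT repo_name, star_count FROM repo ORDER BY id"
).fetchall()
print(backfilled)
```

Matching on organization and repo_name rather than id is a deliberate choice in this sketch, since nothing guarantees the two tables share ids.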
After the migration script is done, the repo table looks like:

| id | repo_name   | organization | ... | star_count |
|----|-------------|--------------|-----|------------|
|  1 | vtprotobuf  | planetscale  | ... | 637        |
|  2 | beam        | planetscale  | ... | 1837       |
|  3 | database-js | planetscale  | ... | 854        |

Step 4 - Migrate the application code
Now that all the data is in the new column, I can update my application code to read only from the new column in the repo table. I will make sure I have tests to ensure the behavior is as expected because once I deploy, users using the production application will get data from the new column.
After I deploy, I will check out my database performance metrics to ensure everything is working as expected.
Step 5 - Contract the application
If everything looks good, I can now remove the double write to the old and new columns and only write to the new column. Again, I will test everything out and deploy it. Since I used the expand, migrate, and contract pattern, users have fully migrated to the new column and never experienced downtime or failed queries during the switch.
Step 6 - Contract the schema
Lastly, after a few days of no issues with the change, I will delete the star table since I no longer have any reads or writes going to it.
Safely making database schema changes
This pattern is part of a few techniques for safely making database schema changes. In my previous blog post, you can read more about other techniques and some PlanetScale features to ensure you safely but quickly ship needed database changes. In the PlanetScale docs, you can also see how to make other database changes and the associated risks.]]></content>
        <summary><![CDATA[Learn about safely using the expand, migrate, and contract pattern to make database schema changes without downtime and data loss.]]></summary>
      </entry>
    
      <entry>
        <title>Why isn’t MySQL using my index?</title>
        <link href="https://planetscale.com/blog/why-isnt-mysql-using-my-index" />
        <id>https://planetscale.com/blog/why-isnt-mysql-using-my-index</id>
        <published>2023-05-04T00:03:57.138Z</published>
        <updated>2023-05-04T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[One of the most frustrating experiences when dealing with databases is when you've designed the perfect index, but MySQL still doesn't use it. There are several reasons why this could be the case, and in this article, we'll explore some of the most common ones.
Throughout this article, we'll be working with a very simple people table that looks like this:
CREATE TABLE `people` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(50) NOT NULL,
  `last_name` varchar(50) NOT NULL,
  `state` char(2) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `first_name` (`first_name`),
  KEY `state` (`state`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

We'll be adding and dropping keys throughout to show different scenarios, but this is a good starting place.
Determining what index is being used
Before you can determine why your index isn't being used, you must first determine that your index isn't being used. You can run an EXPLAIN on your query to understand what indexes are considered and which index is ultimately used.
EXPLAIN select * from people where first_name = 'Aaron';

Running the EXPLAIN gives you a look at how MySQL is planning to execute the query:
| id | table  | type | possible_keys | key        | key_len | ref   | rows | filtered | Extra |
|----|--------|------|---------------|------------|---------|-------|------|----------|-------|
|  1 | people | ref  | first_name    | first_name | 202     | const |  180 |   100.00 |       |

We can see that the first_name index is the only index considered (possible_keys) and that it is also the index that is chosen (key).
Check out this article for a more in-depth guide on how to read an EXPLAIN output.
Our index on first_name was considered, and it was chosen. These are separate pieces of information, both of which are valuable! Before your index can be chosen, it must first be considered. MySQL's query optimizer, responsible for determining the best way to execute a query, looks at the query and the available indexes and decides which indexes are applicable. Having decided on the indexes that apply to the query, it must then choose between those indexes as to which one is the most efficient.[1]
Now that we know how to determine what index is (or isn't!) being used, let's look at some of the reasons why your index might not be used.
Another index is better
In our example above, only one possible index could satisfy the query, so the optimizer doesn't have to choose which one is best. In the case where multiple indexes might work, the optimizer must make a decision between multiple viable options.
Consider the following query, which searches for people named "Aaron" that live in Texas:
SELECT * FROM people
    WHERE first_name = 'Aaron'
    AND state = 'TX'

Running an EXPLAIN on this query, we see two possible indexes that could be used: first_name and state.
| id | table  | type | possible_keys    | key        | key_len | ref   | rows | filtered | Extra       |
|----|--------|------|------------------|------------|---------|-------|------|----------|-------------|
|  1 | people | ref  | first_name,state | first_name | 202     | const |  180 |    50.00 | Using where |

In this case, the optimizer has decided that first_name is the best choice because it is the more selective of the two indexes.
Selectivity and cardinality of indexes
Selectivity and cardinality go hand in hand but differ slightly. Cardinality refers to the number of distinct values in a particular column, while selectivity is the ratio of distinct values to total rows, which tells you how unique those values are.
To calculate the cardinality of a column, you can use the COUNT(DISTINCT column) function:
SELECT
  COUNT(DISTINCT first_name) as first_name,
  COUNT(DISTINCT state) as state
FROM
  people

Running this query will tell you how many unique values are in each column:
| first_name | state |
|------------|-------|
|       3009 |     2 |

We can see here that the first_name column has way more unique values than the state column. This is interesting information, but it's not that helpful without another piece of information: the total number of rows. Knowing that there are 3,009 distinct values in the first_name column is interesting, but we have no idea if that's a relatively high or low number compared to the whole table!
This is where selectivity comes into play. Selectivity is a measure of how unique the values in a column are. The higher the selectivity of an index, the better it is for optimizing query performance.
To calculate the selectivity of a column, you can use the formula COUNT(DISTINCT column) / COUNT(*). This will give you a decimal between zero and one that represents how unique the values in this column are on average.
SELECT
  COUNT(DISTINCT first_name) / COUNT(*) as first_name,
  COUNT(DISTINCT state) / COUNT(*) as state
FROM
  people

By running this query, you can see that the first_name column has a higher selectivity than the state column. The state column has such poor selectivity that it rounds down to zero!
| first_name | state  |
|------------|--------|
|     0.0060 | 0.0000 |

This statistic tells us that in this table, filtering by state is less useful than filtering by first_name. More people share a common state than share a common first_name, so it would be faster to use the first_name index. MySQL keeps track of this information and uses it when planning which index to use when presented with multiple options.
All unique indexes are, by their nature, perfectly selective, meaning that only one record will be returned for each value. We can prove this by adding our id to our selectivity calculation:
SELECT
  COUNT(DISTINCT id) / COUNT(*) as id,
  COUNT(DISTINCT first_name) / COUNT(*) as first_name,
  COUNT(DISTINCT state) / COUNT(*) as state
FROM
  people

Here you'll see that the id has far higher selectivity than any other column, as you might expect!
| id     | first_name | state  |
|--------|------------|--------|
| 1.0000 |     0.0060 | 0.0000 |

Selectivity is query-dependent
Calculating selectivity across an entire table can be misleading if the data is not evenly distributed. In some cases, an index might be highly selective for one query and not selective at all for another.
Consider a table of one million users, where 99% are of type = "user" and 1% are of type = "admin". In this case, an index on type might seem useless because it's not very selective on average. But when you're querying for admins, it is highly selective. So while checking average selectivity is a good rule of thumb, pay careful attention to unevenly distributed data.
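One way to see an uneven distribution is to look at the share of rows behind each value rather than the table-wide average. The sketch below assumes a hypothetical users table with a type column, matching the example above, and MySQL 8.0 or later for the window function.

```sql
-- Share of the table that each value of `type` accounts for.
SELECT
  `type`,
  COUNT(*) AS row_count,
  COUNT(*) / SUM(COUNT(*)) OVER () AS fraction_of_table
FROM users
GROUP BY `type`
ORDER BY row_count DESC;
```

With the distribution described above, "user" would account for roughly 0.99 of the table and "admin" for roughly 0.01, which is why an index on type can pay off for admin lookups despite its poor average selectivity.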
All other things being equal, MySQL will choose the most selective index possible. To speed this choice up, MySQL keeps statistics about the data's rough shape, which can become outdated.
Outdated or inaccurate statistics
We've been calculating cardinality by running COUNT(DISTINCT column) each time. It would be inefficient for MySQL to calculate that each time, so instead, it keeps track of the cardinality over time using random sampling. You can see this stored value by running SHOW INDEXES FROM [table]:
| Table  | Non_unique | Key_name   | Column_name | Collation | Cardinality | Index_type | Visible |
|--------|------------|------------|-------------|-----------|-------------|------------|---------|
| people |          0 | PRIMARY    | id          | A         |      491583 | BTREE      | YES     |
| people |          1 | first_name | first_name  | A         |        3028 | BTREE      | YES     |
| people |          1 | state      | state       | A         |           1 | BTREE      | YES     |

The Cardinality column shows you the stored value MySQL will use to make its selectivity decisions. These statistics are automatically updated after 10% of a table has changed. This happens in the background, and you shouldn't ever notice it. If the statistics are so outdated that it is causing problems, you can force an update by running ANALYZE TABLE [table].
If you find that the statistics need to be more accurate for a particular table and are continually causing the optimizer to make poor decisions, you can change how the sampling is done. This is beyond the scope of this article, but the MySQL documentation has a comprehensive page on the topic.
Usually, you won't need to worry about the table sampling or updating the statistics manually.
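If you do need to refresh the statistics by hand, the sequence might look like the sketch below. It uses the people table from this article and reads the stored estimates back from information_schema.STATISTICS, which holds the same cardinality values that SHOW INDEXES reports.

```sql
-- Rebuild the stored statistics for the table.
ANALYZE TABLE people;

-- Read back the stored cardinality estimate for each index.
SELECT INDEX_NAME, COLUMN_NAME, CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'people'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
```

Because the values come from sampling, expect them to be close to the real COUNT(DISTINCT ...) results rather than identical.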
Scanning the table is faster
Paradoxically, an index is not always the fastest way to access data! The MySQL optimizer will always try to pick the quickest way to get the data it needs, which sometimes means choosing a slightly counter-intuitive method. For small tables or queries that select a large portion of a table, it can be faster for MySQL to skip the index scan and scan the table directly. Sometimes the dreaded table scan is the best access method possible!
An index is a secondary data structure (a B+ tree) apart from the table that must be traversed to find the matching row IDs. Once the IDs have been found, those rows must be found and read from disk. In situations where most of the rows will be fetched, reading all of the rows in order off of the disk is faster than going to the index first.
Usually, a table scan is bad news, but sometimes, even if rarely, it's the best possible outcome.
Index limitations
In all the preceding scenarios, our index was at least considered, even if it wasn't chosen. In the rest of this article, we will look at instances where your favorite index isn't even considered for use.
Indexes may seem magical, but they aren't magic. Without going too deeply into how a B+ tree works, it is essential to understand that there are some queries that an index cannot satisfy.
We have a 5-minute video overview of how B-trees work if you'd like to dive in further.
Let's look at three situations where indexes cannot be considered, due to the nature of their underlying structure.
Wildcard searching
MySQL allows you to search against string columns using wildcards, which is commonly used for searching for strings that start with a particular substring.
Searching for first_names that start with "Aa," you'll see that MySQL considers and uses our index on the first_name column.
explain select * from people where first_name like 'Aa%';

-- | id | table  | type  | possible_keys | key        | key_len | ref | rows | filtered | Extra                 |
-- |----|--------|-------|---------------|------------|---------|-----|------|----------|-----------------------|
-- |  1 | people | range | first_name    | first_name | 202     |     |  356 |   100.00 | Using index condition |

However, if we search for names that end with "ron," we have no such luck.
explain select * from people where first_name like '%ron';

-- | id | table  | type | possible_keys | key | key_len | ref | rows   | filtered | Extra       |
-- |----|--------|------|---------------|-----|---------|-----|--------|----------|-------------|
-- |  1 | people | ALL  |               |     |         |     | 493889 |    11.11 | Using where |

MySQL can use the index until it reaches the first wildcard character. The index is not considered if the search string starts with a wildcard character.
If you need more robust string searching, we have videos on strategies for indexing wildcard searches and an introduction to fulltext indexes that may be helpful.
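As one possible strategy for suffix searches (not necessarily the one covered in the linked videos), you can index a reversed copy of the column so that "ends with" becomes "starts with". The column and index name first_name_reversed is made up for this sketch.

```sql
-- Hypothetical: store the reversed name in a generated column and index it.
ALTER TABLE people
  ADD COLUMN first_name_reversed varchar(50)
    GENERATED ALWAYS AS (REVERSE(first_name)) STORED,
  ADD INDEX first_name_reversed (first_name_reversed);

-- "Ends with 'ron'" becomes "reversed value starts with 'nor'".
EXPLAIN SELECT * FROM people WHERE first_name_reversed LIKE 'nor%';
```

The trade-off is extra storage and write cost for the additional column and index, so this is only worth it if suffix searches are common.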
Composite indexes
Like wildcard searches, there are precise rules for how composite indexes can be used. A composite index is an index that covers more than one column. Instead of creating an index on first_name, we create one that covers two columns: (first_name, state). Let's drop all of our indexes and create a new one.
ALTER TABLE people drop index first_name;
ALTER TABLE people drop index state;

ALTER TABLE people ADD INDEX multi (first_name, state);

When creating a composite index, think carefully about the order you put the columns in, because MySQL will only be able to use the columns starting on the left and working toward the right. It cannot skip any columns.
This means that our new multi index is useful when we're querying for both first_name and state but useless when we're only querying against state.
Let's run an EXPLAIN on both queries to prove that. The multi key is considered and chosen for the first query.
explain select * from people where first_name = 'Aaron' and state = 'TX'

-- | id | table  | type | possible_keys | key   | key_len | ref         | rows | filtered | Extra |
-- |----|--------|------|---------------|-------|---------|-------------|------|----------|-------|
-- |  1 | people | ref  | multi         | multi | 210     | const,const |  178 |   100.00 |       |

The key is not considered for the second query that doesn't include a condition on first_name.
explain select * from people where state = 'TX'

-- | id | table  | type | possible_keys | key | key_len | ref | rows   | filtered | Extra       |
-- |----|--------|------|---------------|-----|---------|-----|--------|----------|-------------|
-- |  1 | people | ALL  |               |     |         |     | 493889 |    10.00 | Using where |

You must form what is called a "leftmost prefix" for the index to be used. Start at the left and work your way toward the right. You don't have to use every part of a composite key, but it is only accessible from the left side. If your index isn't being considered, ensure you've formed a leftmost prefix.
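A few more queries against the multi (first_name, state) index illustrate the rule. This is a sketch to run yourself; the comments describe what the leftmost-prefix rule predicts, not captured output.

```sql
-- Forms a leftmost prefix: only the first column of multi is needed.
EXPLAIN SELECT * FROM people WHERE first_name = 'Aaron';

-- Also forms a leftmost prefix: the order of conditions in the WHERE clause
-- does not matter, only which index columns they cover.
EXPLAIN SELECT * FROM people WHERE state = 'TX' AND first_name = 'Aaron';

-- If state-only lookups matter, give them an index of their own.
ALTER TABLE people ADD INDEX state (state);
EXPLAIN SELECT * FROM people WHERE state = 'TX';
```

After the ALTER, the state index should at least appear under possible_keys. Whether it is chosen still depends on selectivity, as discussed earlier.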
Joining on mismatched columns
MySQL can use an index to speed up the operation when joining two tables. There are, again, a few rules you must pay attention to, though! If the columns are not of the same type and size, this will preclude using an index.
For this purpose, VARCHAR(10) and CHAR(10) would be considered the same type and size, but VARCHAR(10) and CHAR(15) would not be. It may, in fact, be beneficial to lengthen the VARCHAR column to match the CHAR and allow the use of an index, even if the data won't be 15 characters long.
String columns must also use the same charset for an index to be used. If one column uses utf8mb4 and the other uses latin1, this will also preclude the use of an index.
There are cases when dissimilar types can be compared using an index, but it's always best to declare columns that you plan to use in joins as the same size and type.
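To check whether two join columns line up, you can compare their definitions in information_schema.COLUMNS. The sketch below assumes a hypothetical states table with a code column that people.state joins against; only the people table comes from this article.

```sql
-- Compare type, length, charset, and collation of the two join columns.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND (
    (TABLE_NAME = 'people' AND COLUMN_NAME = 'state')
    OR (TABLE_NAME = 'states' AND COLUMN_NAME = 'code')
  );

-- If they differ, bring the hypothetical column in line with people.state.
ALTER TABLE states
  MODIFY code char(2) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
```

Changing a column definition rewrites data, so treat the ALTER like any other schema change and test it on a branch or copy first.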
Index obfuscation
Index obfuscation refers to the scenario where you've wrapped your indexed column in a function, thereby hiding it from MySQL.
This can happen easily, especially when using a SQL abstraction like an ORM or query builder. Given a table of people where you want to find people created this year, you might wrap a created_at column in the year function:
SELECT * FROM people WHERE YEAR(created_at) = 2023;

This is a prime example of index obfuscation. Even if you have an index on created_at, MySQL cannot use it! Because you've wrapped the indexed column in a function, it no longer matches the data stored in the index, which is the full timestamp value of created_at. You've hidden the indexed column, and MySQL cannot see it.
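In this case, it would be better to unwrap that function and use a range scan on the index. A half-open range also covers every timestamp on December 31, which BETWEEN '2023-01-01' AND '2023-12-31' would cut off at midnight:
SELECT * FROM people WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';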

Not all obfuscation can be unwrapped or undone. Sometimes you need to use functions on columns, and that's perfectly ok! As a general rule of thumb, try to leave your columns untouched and move any operations to the other side of the comparison.
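The same rule of thumb applies to arithmetic. As a small sketch, still using the hypothetical created_at column, here is a "created in the last 30 days" filter written both ways:

```sql
-- Obfuscated: the arithmetic is applied to the indexed column.
SELECT * FROM people WHERE created_at + INTERVAL 30 DAY > NOW();

-- Equivalent, with the arithmetic moved to the other side of the comparison.
SELECT * FROM people WHERE created_at > NOW() - INTERVAL 30 DAY;
```

Both queries return the same rows, but only the second compares the bare column against a constant, which is the form an index on created_at can serve with a range scan.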
Invisible indexes
This is perhaps the least common reason your index wouldn't be used, but it is possible! You can explicitly make an index invisible, preventing it from being considered for query usage. Usually, this is done to test the effects of dropping an index before committing to it.
When you make an index invisible, MySQL maintains it even though it won't be used for queries. If you realize that making it invisible negatively affects performance, you can turn it back on immediately without rebuilding the index.
To make an index invisible, you can alter it and add the INVISIBLE keyword:
ALTER TABLE people ALTER INDEX first_name INVISIBLE;

Now, running SHOW INDEXES will show you that the first_name index is no longer Visible:
| Key_name   | Column_name | Collation | Cardinality | Index_type | Visible |
|------------|-------------|-----------|-------------|------------|---------|
| PRIMARY    | id          | A         |      493889 | BTREE      | YES     |
| first_name | first_name  | A         |        2965 | BTREE      | NO      |
| state      | state       | A         |           1 | BTREE      | YES     |

If your index isn't being considered when you think it should be, double-check that it hasn't been turned invisible.
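Checking and reversing visibility could look like the sketch below, which assumes MySQL 8.0 or later (where invisible indexes and the IS_VISIBLE column are available).

```sql
-- List each index on the table with its current visibility.
SELECT DISTINCT INDEX_NAME, IS_VISIBLE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'people';

-- Turn the index back on; no rebuild is needed because MySQL kept maintaining it.
ALTER TABLE people ALTER INDEX first_name VISIBLE;
```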
Forcing an index
The optimizer is a complicated and sophisticated piece of software written by talented people over decades. It usually makes the right decision. Usually... but not always. If you're entirely sure that the optimizer is wrong and you're right, you can force an index to be used. Forcing an index is as easy as putting USE INDEX([name]) in your query:
EXPLAIN SELECT * FROM people USE INDEX (state) WHERE first_name = 'Aaron' AND state = 'TX'

| id | table  | type | possible_keys | key   | key_len | ref   | rows   | filtered | Extra       |
|----|--------|------|---------------|-------|---------|-------|--------|----------|-------------|
|  1 | people | ref  | state         | state | 8       | const | 246944 |    10.00 | Using where |

Taking control away from the optimizer should be done with caution. If you understand what the optimizer is doing and why it's making a bad choice, telling it which index to use is a good escape hatch. Remember that as your data changes over time, you'll need to reevaluate if forcing a particular index is still the most performant option.
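MySQL has three related index hints, and it can help to see them side by side. The sketch below reuses the query from this section; the comments summarize the behavior described in the MySQL documentation rather than output from this table.

```sql
-- USE INDEX: consider only the named index (a table scan is still allowed).
EXPLAIN SELECT * FROM people USE INDEX (state)
WHERE first_name = 'Aaron' AND state = 'TX';

-- FORCE INDEX: like USE INDEX, but a table scan is treated as very expensive.
EXPLAIN SELECT * FROM people FORCE INDEX (state)
WHERE first_name = 'Aaron' AND state = 'TX';

-- IGNORE INDEX: the reverse, rule out an index you know is a poor fit.
EXPLAIN SELECT * FROM people IGNORE INDEX (state)
WHERE first_name = 'Aaron' AND state = 'TX';
```

IGNORE INDEX is often the gentler option, since it removes one bad choice while leaving the optimizer free to pick among the rest.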
Learn more
Indexing is a broad and deep topic about which many books have been written. If you'd like to learn even more, here are some good resources for you:
How MySQL uses indexes (MySQL Documentation)
Video course on indexes in MySQL (PlanetScale)
High Performance MySQL Chapter 4 (O'Reilly)
Footnotes
[1] Technically more than one index can be used to satisfy a single query, but it's not an ideal strategy. An optimization called the index merge optimization can combine the results of two index scans. It's still better to plan your indexes so that only one is used, but it's good to know this optimization exists!]]></content>
        <summary><![CDATA[There are several reasons why MySQL might not consider your index, and in this article we’ll explore some of the most common ones.]]></summary>
      </entry>
    
      <entry>
        <title>Serverless Laravel applications with AWS Lambda and PlanetScale</title>
        <link href="https://planetscale.com/blog/serverless-laravel-app-aws-lambda-bref-planetscale" />
        <id>https://planetscale.com/blog/serverless-laravel-app-aws-lambda-bref-planetscale</id>
        <published>2023-05-03T17:03:57.138Z</published>
        <updated>2023-05-03T17:03:57.138Z</updated>
        
        <author>
          <name>Matthieu Napoli</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[In general, PHP-based applications, like Laravel, are deployed on servers. It is also possible to run them serverless — for example, on AWS Lambda.
This approach provides several benefits:
Instant and effortless autoscaling to handle incoming traffic.
Redundant and resilient infrastructure out of the box, without extra complexity.
Pay-per-request billing.
PlanetScale is a great database to pair with serverless Laravel applications running on Lambda. In this article, we will create a new Laravel application, run it on AWS Lambda using Bref, connect to a PlanetScale MySQL database, and do a load test to look at the performance.
Creating a new Laravel application
Let's start from scratch and create a new Laravel project with Composer:
composer create-project laravel/laravel example-app

cd example-app

We can run our application locally with php artisan serve, but let's run it in the cloud on AWS Lambda instead.
Deploying Laravel to AWS Lambda
To deploy Laravel to AWS Lambda, we can use Bref. Bref is an open-source project that provides support for PHP on AWS Lambda. It also provides a Laravel integration that simplifies configuring Laravel for Lambda.
Prerequisites
Before deploying to AWS Lambda, you will need:
An AWS account (to create one, go to aws.amazon.com and click "Sign up").
The serverless CLI installed on your computer.
You can install the serverless CLI using NPM:
npm install -g serverless

If you don't have NPM or want to learn more, read the Serverless documentation.
Now, connect the serverless CLI to your AWS account via AWS access keys. Create AWS access keys by following the guide, and then set them up on your computer using the following command:
serverless config credentials --provider aws --key <key> --secret <secret>

Getting started with Bref and Laravel
Now that everything is ready, let's install Bref and its Laravel integration:
composer require bref/bref bref/laravel-bridge --update-with-dependencies

Then, let's create a serverless.yml configuration file:
php artisan vendor:publish --tag=serverless-config

This configuration file describes what will be deployed to AWS. Let's deploy now:
serverless deploy

When finished, the deploy command will display the URL of our Laravel application.
Using PlanetScale as the database
Now that Laravel is running in the cloud, let's set it up with a PlanetScale database. Start in PlanetScale by creating a new database in the same region as the AWS application (us-east-1 by default).
Click the Connect button and select "Connect with: PHP (PDO)". That will let us retrieve the host, database name, user, and password.
Edit the .env configuration file to set up the database connection:
DB_CONNECTION=mysql
DB_HOST=<host url>
DB_PORT=3306
DB_DATABASE=<database_name>
DB_USERNAME=<user>
DB_PASSWORD=<password>
MYSQL_ATTR_SSL_CA=/opt/bref/ssl/cert.pem

For DB_DATABASE, you can use your PlanetScale database name directly if you have a single unsharded keyspace. If you have a sharded keyspace, you'll need to use @primary. This will automatically direct incoming queries to the correct keyspace/shard. For more information, see the Targeting the correct keyspace documentation.
Don't skip the MYSQL_ATTR_SSL_CA line: SSL certificates are required to secure the connection between Laravel and PlanetScale. Note that the path in AWS Lambda (/opt/bref/ssl/cert.pem) differs from the one on your machine (likely /etc/ssl/cert.pem). If you run the application locally, you will need to change this environment variable back and forth.
Next, redeploy the application:
serverless deploy

Now that Laravel is configured, we can run database migrations to set up DB tables. To do so, we can run Laravel Artisan commands in AWS Lambda using the serverless bref:cli command:
serverless bref:cli --args="migrate --force"

That's it! Our database is ready to use.
Creating sample data
To test the database connection, let's create sample data in the users table created out of the box by Laravel.
Edit the database/seeders/DatabaseSeeder.php class and uncomment the following line so that we can seed our database with 10 fake users:
\App\Models\User::factory(10)->create();

Now, let's create a public API route that returns all the users from the database. Add the following code to routes/api.php:
Route::get('/users', function () {
    return \App\Models\User::all();
});

Let's deploy these changes:
serverless deploy

Now, let's seed the database with 10 fake users:
serverless bref:cli --args="migrate:fresh --seed --force"

We can now retrieve our 10 users via the API route we created:
curl https://<application url>/api/users

Performance with a simple load test using PlanetScale
The execution model of AWS Lambda gives us instant autoscaling without any configuration. To illustrate that, I have performed a simple load test against the application we deployed above.
The only change I made is to disable Laravel's default rate limiting for API calls (ThrottleRequests middleware) in app/Http/Kernel.php because it would get in the way of my load test.
Furthermore, I did not ramp up traffic progressively because I wanted to show Lambda's instant scalability. I used ab (Apache's benchmarking tool) to request the /api/users endpoint with 50 threads (50 HTTP requests made in parallel continuously):
ab -c 50 -n 10000 https://<my-api-url>/api/users

When looking at the AWS Lambda and API Gateway metrics, we see the following numbers:
Laravel scaled instantly from zero to 3,800 HTTP requests/minute.
100% of HTTP requests were handled successfully.
The median PHP execution time (p50) for each HTTP request is 75ms.
95% of requests (p95) are processed in less than 130ms.
PlanetScale processed up to 180 queries/s.
The median PlanetScale query execution time is 0.3ms.

The load test was performed against a freshly deployed application. That means the first requests were cold starts: New AWS Lambda instances started and scaled up to handle the incoming traffic. The cold starts usually have a much slower execution time (one second instead of 75ms). However, we do not see them in the p50 or p95 metrics because they only impacted 1% of the requests in the first minute. After the first 50 requests (cold starts), all the other requests were warm invocations.
Note that we are looking at the AWS Lambda duration instead of HTTP response time: This is to exclude any latency related to networking (and thus have reproducible and comparable results). Real users would see a higher HTTP response time because, as with any server, the network adds latency to HTTP responses.
After a few minutes, I dropped the traffic from 50 requests in parallel to one. The PHP execution time stayed identical. This illustrates that the load did not impact the response time.

Improving performance to speed up the SSL connection
For many web applications, responding in about 100ms is more than satisfactory. However, some use cases may require lower latency.
Since Laravel connects to PlanetScale over SSL, creating the SSL connection can take longer than running the SQL query itself. PlanetScale itself can easily handle unlimited connections using built-in connection pooling, which massively improves performance by keeping those database connections open between requests.
However, PHP, by design, shares nothing across requests. This means at the end of every request, PHP will close the connection to the database.
To circumvent this problem, we can use Laravel Octane to gain performance in two ways:
Keeping the Laravel application in memory across requests.
Reusing SQL connections across requests (instead of reconnecting every time).
Bref supports Laravel Octane natively. We need to change the serverless.yml configuration to enable it. Change the web function configuration to this:
web:
  handler: Bref\LaravelBridge\Http\OctaneHandler
  runtime: php-81
  environment:
    BREF_LOOP_MAX: 250
    OCTANE_PERSIST_DATABASE_SESSIONS: 1
  events:
    - httpApi: '*'

Let's redeploy with serverless deploy and run the load test again:

We notice the following improvements:
The median PHP execution time (p50) went from 75ms to 14ms.
95% of requests (p95) are processed in less than 35ms.
Laravel handled 1,000 more requests/minute, though this number is not important: We could simply send more requests in our load test to reach a higher number anytime.
Going further and next steps
Here are some next steps:
Download and run the code used in this blog post.
Learn more about using PlanetScale with Bref: MySQL compatibility, data imports, and schema changes workflow.
Learn more about running Laravel on AWS Lambda: running Artisan commands, setting up assets, queues, and more.
You can also learn more about PlanetScale and AWS Lambda with Bref in the respective documentation.]]></content>
        <summary><![CDATA[Learn how to create serverless Laravel applications by deploying them to AWS Lambda and using PlanetScale as the database.]]></summary>
      </entry>
    
      <entry>
        <title>Database branching: three-way merge for schema changes</title>
        <link href="https://planetscale.com/blog/database-branching-three-way-merge-schema-changes" />
        <id>https://planetscale.com/blog/database-branching-three-way-merge-schema-changes</id>
        <published>2023-04-26T17:03:57.138Z</published>
        <updated>2023-04-26T17:03:57.138Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[You may be familiar with Git's three-way merge as a way to resolve source code changes made by developers on their independent branches. PlanetScale offers three-way merge for your schema branches, making schema change collaboration simpler and safer. It's similar in concept, but completely different in implementation. In the remainder of this post, we illustrate the technical implementation and the nuances of diffing schemas vs. diffing code.
What does it mean to merge schema changes in the first place?
PlanetScale offers a model of schema branching and deploy requests. In short, a developer may branch the main database, creating a copy of the schema in a dev environment, where they are free to make any changes without affecting production. Multiple developers can do the same, concurrently. At some point, the developer wants to apply their schema changes to production. They create a deploy request on PlanetScale, similar to a pull request on GitHub.
The deploy request is where the developer and their team review the changes. The deploy request page presents a semantic diff of the changes made — e.g., an ALTER TABLE foo ..., a CREATE TABLE bar (...), etc. PlanetScale uses Vitess's schemadiff library to generate the semantic diff between the main (production) branch and the developer's branch. If approved, the changes are enqueued in the deploy queue, to be eventually deployed in a non-blocking fashion.
The case for three-way merge arises when multiple developers do the same, concurrently. Say Dev 1 created a branch a couple of days ago. During this time, Dev 2 created and deployed their own branch, merging it into main, the production branch. Dev 1's branch and changes now may or may not be compatible with main. Not only do they not reflect or contain the new schema in main, but they may also outright conflict with the newly made changes!
As long as Dev 1 still works on their branch, that's fine. But at some time, they will want to deploy their changes. It's time to put their changes in the deployment queue. But, are the changes at all valid? This is where three-way merge is invoked. It is essentially a mechanism that determines whether branches slated to be merged conflict with one another, overlap one another, or are completely unrelated and have no impact on one another.
Setting the database branching terminology
In Git, we use terminology such as merge-base, topic-head, etc. But we now illustrate a solution tailored to schema changes, and we may as well use different terminology. Let's use main for production: this is what everyone branches from and eventually deploys to. And let's use branch1 and branch2 as branch names created by Dev 1 and Dev 2, respectively.
It's worth pointing out that nothing tracks the changes on a development branch while it's open. Dev 1 may CREATE, ALTER, and DROP all they want. PlanetScale follows up on any changes they make, but, simplified for the purposes of this post, it's only when the developer creates a deploy request that PlanetScale examines their schema to compute the diff in their branch. The diff is one or more SQL statements (we ignore the case where the schema is unchanged here) that would get main to look like branch1. For example, consider these schemas in main and in branch1:
-- main:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

The git diff of the two would be:
 CREATE TABLE `customer` (
        `id` int,
+       `name` varchar(255) NOT NULL DEFAULT '',
        PRIMARY KEY (`id`)
 );

But that's not something a database can work with. Instead, the deploy request generates this semantic SQL diff:
ALTER TABLE `customer` ADD COLUMN `name` varchar(255) NOT NULL DEFAULT ''

This semantic diff is generated for any deploy request.
Schema three-way merge
Assume main is the base branch, and branch1 and branch2 are both enqueued to deploy.
Three-way merge compares the two branches and uses main (hence, three branch comparison) like so:
Compute diff1 as diff(main, branch1). This is similar to main..branch1 in Git notation. We can consider diff1 as a function — i.e., diff1(main) => branch1.
Likewise, compute diff2 as diff(main, branch2).
Look at diff1(diff2(main)). If running diff1 over diff2(main) is invalid (examples to follow), there's a conflict.
Likewise, attempt diff2(diff1(main)). If that's invalid, there's a conflict.
If both are valid but diff1(diff2(main)) != diff2(diff1(main)), there is a conflict.
If both are valid and diff1(diff2(main)) == diff2(diff1(main)), there is no conflict between the two branches.
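The steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not PlanetScale's or schemadiff's implementation: a schema is a dictionary of table name to an ordered tuple of column definitions, and the helper names (add_column, create_table, conflicts, InvalidDiff) are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Toy schema model (an assumption for illustration):
# table name -> ordered tuple of column definitions.
Schema = Dict[str, Tuple[str, ...]]
Diff = Callable[[Schema], Schema]


class InvalidDiff(Exception):
    """Raised when a diff cannot be applied to a schema."""


def add_column(table: str, column_name: str, definition: str) -> Diff:
    """Build a diff that appends a column to the end of an existing table."""
    def apply(schema: Schema) -> Schema:
        if table not in schema:
            raise InvalidDiff(f"no such table: {table}")
        if any(col.split()[0] == column_name for col in schema[table]):
            raise InvalidDiff(f"duplicate column: {column_name}")
        new_columns = schema[table] + (f"{column_name} {definition}",)
        return {**schema, table: new_columns}
    return apply


def create_table(table: str, columns: Tuple[str, ...]) -> Diff:
    """Build a diff that creates a new table."""
    def apply(schema: Schema) -> Schema:
        if table in schema:
            raise InvalidDiff(f"table already exists: {table}")
        return {**schema, table: columns}
    return apply


def conflicts(main: Schema, diff1: Diff, diff2: Diff) -> bool:
    """Apply the diffs in both orders; any failure or mismatch is a conflict."""
    try:
        one_after_two = diff1(diff2(main))
        two_after_one = diff2(diff1(main))
    except InvalidDiff:
        return True
    # Tuples compare in order, so column position matters;
    # dict comparison ignores the order in which tables were added.
    return one_after_two != two_after_one


main = {"customer": ("id int",)}
add_name = add_column("customer", "name", "varchar(255) NOT NULL DEFAULT ''")
add_delivery = create_table("delivery", ("id int", "customer_id int"))
add_enum_type = add_column("customer", "subscription_type", "enum('free', 'promotional', 'paid')")
add_int_type = add_column("customer", "subscription_type", "int unsigned NOT NULL DEFAULT 0")
add_joined_at = add_column("customer", "joined_at", "timestamp NOT NULL")
```

In this model, add_name and add_delivery do not conflict, add_enum_type and add_int_type conflict because the second application is invalid, and add_enum_type and add_joined_at conflict because the two orderings produce different column positions.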
The algorithm is, in fact, more elaborate. But let's first walk through a few examples to understand how the diffs and three-way-merge work, and what SQL nuances we might hit.
Example: no conflict
Consider this simplified schema for the three branches:
-- main:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);
CREATE TABLE `delivery` (
	`id` int,
	`customer_id` int,
	PRIMARY KEY (`id`)
);

The diffs are:
-- diff1:
ALTER TABLE `customer` ADD COLUMN `name` varchar(255) NOT NULL DEFAULT ''

-- diff2:
CREATE TABLE `delivery` (
	`id` int,
	`customer_id` int,
	PRIMARY KEY (`id`)
)

Clearly, the two branches do not conflict with one another. One adds a column to customer, and the other creates the delivery table. Applying the two diffs in either order ends up with the same end result:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);
CREATE TABLE `delivery` (
	`id` int,
	`customer_id` int,
	PRIMARY KEY (`id`)
);

Example: clear conflict
In the next example, both branches introduce a new column under the same name but with a different type:
-- main:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`subscription_type` enum('free', 'promotional', 'paid'),
	PRIMARY KEY (`id`)
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	`subscription_type` int unsigned NOT NULL DEFAULT 0,
	PRIMARY KEY (`id`)
);

The diffs are:
-- diff1:
ALTER TABLE `customer` ADD COLUMN `subscription_type` enum('free', 'promotional', 'paid')

-- diff2:
ALTER TABLE `customer` ADD COLUMN `subscription_type` int unsigned NOT NULL DEFAULT 0

Clearly, applying both diffs on top of each other is destined to fail. You cannot add two columns under the same name.
Example: subtle conflict
How about adding two completely different columns?
-- main:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`subscription_type` enum('free', 'promotional', 'paid'),
	PRIMARY KEY (`id`)
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`)
);

The diffs are:
-- diff1:
ALTER TABLE `customer` ADD COLUMN `subscription_type` enum('free', 'promotional', 'paid')

-- diff2:
ALTER TABLE `customer` ADD COLUMN `joined_at` timestamp NOT NULL DEFAULT current_timestamp()

It's possible to apply both diffs, in any order. However, the resulting schema looks different depending on the order. It may look like either of:
-- diff1(diff2(main)):
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	`subscription_type` enum('free', 'promotional', 'paid'),
	PRIMARY KEY (`id`)
);

-- diff2(diff1(main)):
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`subscription_type` enum('free', 'promotional', 'paid'),
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`)
);

The order of columns in a table matters. Queries that run a SELECT * FROM customer and use positional arguments will get different columns at positions 3 and 4. The two branches conflict with each other. This is similar to a Git merge conflict where two branches append different lines to the end of a file.
We could avoid the conflict if one of the branches positioned the new column anywhere but last. For example, branch1 could be:
CREATE TABLE `customer` (
	`id` int,
	`subscription_type` enum('free', 'promotional', 'paid'),
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

The above would lead to a non-conflicting diff:
-- diff1:
ALTER TABLE `customer` ADD COLUMN `subscription_type` enum('free', 'promotional', 'paid') AFTER `id`

Nuance: no conflict
The same cannot be said for index changes. We now add a column and a matching index in one migration, and another index in the second migration:
-- main:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`),
	KEY `name_idx` (`name`(16))
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`),
	KEY `joined_idx` (`joined_at`)
);

The diffs are:
-- diff1:
ALTER TABLE `customer` ADD KEY `name_idx` (`name`(16))

-- diff2:
ALTER TABLE `customer` ADD COLUMN `joined_at` timestamp NOT NULL DEFAULT current_timestamp(), ADD KEY `joined_idx` (`joined_at`)

Strictly speaking, the table structure looks different based on the order we apply the diffs. It can be either of:
-- diff1(diff2(main)):
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`),
	KEY `joined_idx` (`joined_at`),
	KEY `name_idx` (`name`(16))
);

-- diff2(diff1(main)):
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`),
	KEY `name_idx` (`name`(16)),
	KEY `joined_idx` (`joined_at`)
);

However, for practical purposes, the order of indexes is inconsequential. All queries against the table will behave and perform in exactly the same way, irrespective of the ordering of the keys. The only difference is in the output of SHOW CREATE TABLE and in INFORMATION_SCHEMA introspection.
PlanetScale disregards index ordering.
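One way to picture this rule is a comparison that treats columns as an ordered sequence and indexes as an unordered set. The sketch below is an illustrative model only; the Table class and the table helper are hypothetical names, not part of schemadiff.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Table:
    columns: Tuple[str, ...]   # ordered: position is significant
    indexes: FrozenSet[str]    # unordered: position is ignored


def table(columns: Iterable[str], indexes: Iterable[str]) -> Table:
    """Build a Table, normalizing columns to a tuple and indexes to a set."""
    return Table(tuple(columns), frozenset(indexes))


columns = ["id int", "name varchar(255)", "joined_at timestamp"]

# The same indexes listed in a different order compare as equal...
one_after_two = table(columns, ["KEY joined_idx (joined_at)", "KEY name_idx (name(16))"])
two_after_one = table(columns, ["KEY name_idx (name(16))", "KEY joined_idx (joined_at)"])

# ...while the same columns in a different order do not.
reordered = table(
    ["id int", "joined_at timestamp", "name varchar(255)"],
    ["KEY name_idx (name(16))", "KEY joined_idx (joined_at)"],
)
```

With this model, one_after_two == two_after_one holds, while two_after_one == reordered does not.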
Overlapping changes
The algorithm is more elaborate than described thus far. To reduce developer friction as much as possible, it also considers identical, partial overlap between diffs. For example:
-- main:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);
CREATE TABLE `tbl1` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);
CREATE TABLE `tbl2` (
	`id` int,
	PRIMARY KEY (`id`)
);

Both branches create the same name column on customer, and then each branch proceeds to make other unrelated changes. Thanks to schemadiff, each of the changes (ALTER, CREATE, ...) is fully formalized and we can analyze the changes one by one.
Is there a conflict with the new name column? Given that both branches completely agree on that particular change, PlanetScale's three-way merge considers this an overlap and allows it. Should branch1 merge first, branch2's diff auto-adapts and is reduced to the creation of tbl2 only.
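As a rough illustration of how a pending diff shrinks once an overlapping branch has merged, the sketch below drops statements that were already applied. It assumes each diff is a list of statement strings compared by exact text; the real comparison works on formalized diffs, and remaining_after_merge is a hypothetical helper name.

```python
from typing import List


def remaining_after_merge(merged_diff: List[str], pending_diff: List[str]) -> List[str]:
    """Return the statements of pending_diff that the merged diff did not already apply."""
    already_applied = set(merged_diff)
    return [statement for statement in pending_diff if statement not in already_applied]


add_name = "ALTER TABLE `customer` ADD COLUMN `name` varchar(255) NOT NULL DEFAULT ''"
diff1 = [add_name, "CREATE TABLE `tbl1` (`id` int, PRIMARY KEY (`id`))"]
diff2 = [add_name, "CREATE TABLE `tbl2` (`id` int, PRIMARY KEY (`id`))"]

# After branch1 merges, only the creation of tbl2 is left for branch2.
left_for_branch2 = remaining_after_merge(diff1, diff2)
```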
Further reducing friction
Schema changes may take time to run, during which more developers will want to deploy their own changes. There is a deployment queue, first come first served, that only allows a single deploy request at a time to run.
When a developer submits their deploy request, their change is validated against all queued changes. This avoids the situation where the developer waits for hours in queue, only to learn the one deployment before theirs caused a conflict. PlanetScale shoots an early warning so that developers can better use their time in queue.
Conclusion
Schema changes and source code changes share enough similarities that we can offer developers schema lifecycle workflows they are familiar with from their source code workflows. With some adaptations to the obvious differences and challenges a schema change deployment poses, we are able to utilize familiar and trusted logic to manage developer collaboration around schema branching.]]></content>
        <summary><![CDATA[Learn how PlanetScale uses Git-like three-way diff to resolve schema change conflicts across database branches.]]></summary>
      </entry>
    
      <entry>
        <title>Query performance analysis with Insights</title>
        <link href="https://planetscale.com/blog/query-performance-analysis-with-insights" />
        <id>https://planetscale.com/blog/query-performance-analysis-with-insights</id>
        <published>2023-04-20T12:00:00.000Z</published>
        <updated>2023-04-20T12:00:00.000Z</updated>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Analyzing the performance of your database can be a tricky business. Even if your database is healthy and the majority of queries are fast, a handful of slow or improperly indexed query patterns can frustrate your users or spiral into bigger problems as your dataset and traffic grow.
To help developers identify and troubleshoot problematic queries, PlanetScale Insights now shows time-series metrics on a per-query pattern basis.
Query patterns
The fundamental unit of analysis in Insights is a query pattern, so it's worth discussing how and why Insights defines this concept. As you probably know, relational databases receive queries in the form of SQL statements:
select * from users where id = 123

Because PlanetScale databases often receive thousands or even millions of queries per second, reporting performance stats for every individual query isn't usually what we want. Instead, we'd like aggregate data for similar queries over time. It's better to know “This is how long id-based user lookups took over the last hour” versus “This is how long it took to look up user 123.”
To establish what queries should be grouped together, we use Vitess's SQL parser to find a normalized pattern for every query. The Vitess query serving layer converts SQL into an abstract syntax tree, which we then walk to replace literals with generic placeholders. Applied to the query above, we find the following normalized representation:
select * from users where id = ?

Beyond literal extraction, Vitess's AST normalization also helps eliminate surface-level syntactic differences, such as casing differences or the presence of redundant parentheses. With the normalized query in hand, we can calculate a fingerprint (a hash of the normalized SQL) and use it to group queries and emit aggregated telemetry, such as the number of queries, total execution time, and total rows read, returned, and written. We also send along a sketch of query execution times that allows us to show error-bounded percentiles (e.g., median and p95).
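To make the idea of a normalized pattern and its fingerprint concrete, here is a deliberately crude sketch. It swaps in a regular expression for the AST walk described above, so it will mishandle plenty of real SQL; the function names and the choice of SHA-256 are assumptions for illustration, not what Insights uses.

```python
import hashlib
import re

# Matches single-quoted string literals and standalone numeric literals.
# A regex is a crude stand-in for walking a real SQL abstract syntax tree.
_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\b\d+(?:\.\d+)?\b")


def normalize(sql: str) -> str:
    """Collapse whitespace, replace literals with '?', and lowercase the result."""
    collapsed = " ".join(sql.split())
    return _LITERAL.sub("?", collapsed).lower()


def fingerprint(sql: str) -> str:
    """Hash the normalized SQL so similar queries share one grouping key."""
    return hashlib.sha256(normalize(sql).encode("utf-8")).hexdigest()


pattern = normalize("select * from users where id = 123")
same_group = fingerprint("SELECT * FROM users WHERE id = 456") == fingerprint(
    "select * from users   where id = 123"
)
```

Here pattern is "select * from users where id = ?", and same_group is True because both queries normalize to that pattern.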
Time-series query data in action
To get an idea of how this data is useful, let's walk through a situation where PlanetScale engineers used this feature to troubleshoot a query in our primary production database.
As a matter of routine maintenance, we regularly review the most expensive and frequently executed queries in our production database. During this process, we noticed the following query, which deletes chunks of expired rows from a fairly large table. According to the main Insights tab in PlanetScale, this query takes approximately 8 seconds to run on average.

Clicking on this query, we see a graph of performance (median and p95 latency) over the last 24 hours. Looking at the median and 95th-percentile query latency data, we see an interesting pattern:

The background jobs to delete expired data are kicked off at 10 past the hour. For the first 10-minute period after the kickoff (10–20 past the hour), the median and p95 latencies are fairly low (a few hundred milliseconds). In the next bucket (20–30 past the hour), however, the execution times explode to almost 15 minutes. There's definitely room for improvement here!
The query in question is a delete with a where clause and a limit that specifies only the first n should be deleted. The limit is added to avoid long-running deletes that could slow down other queries. In our case, the limit is set to 500. When the hourly run starts, there are many rows that meet the conditions in the where clause, so the first 500 matches are found relatively quickly. As the job progresses, and more and more of the matching rows have already been deleted, it becomes more expensive to find deletable rows. Toward the end of an hourly run, the query approaches needing to do a full table scan for each execution. Since there are over 100 million rows in the table, this operation becomes untenably expensive.
So what can we do to speed this up? Add an index, of course! Adding an index to the minute column lets the database quickly identify and delete the rows that match the where clause. Since we're using PlanetScale, it's easy to add an index without a performance hit or downtime while the index is building, even on busy tables with hundreds of millions of rows. After opening a deploy request with the index, getting approval from our team, and clicking “deploy changes,” our database starts building the index. When it's finished, we see a deploy marker on the query latency graph, labeled #505 — our 505th deploy request to this database. From this graph, we can confirm that query latencies have dropped so dramatically that they're not even visibly discernible from zero on the graph: 
If we fast forward to the next day, we see that query latencies are consistently under a few hundred milliseconds. Success!

Try it out now!
We've found that the ability to easily analyze time-series data for query patterns is a powerful tool for identifying and troubleshooting performance issues. Having this ability automatically built into your PlanetScale database makes it easy for teams of all sizes to decide where and when to optimize their database usage.
To see query pattern metrics in your database right now, click on a query from the table in your database's Insights tab. Try it out, and let us know what you think!
For more information, check out the Insights docs.]]></content>
        <summary><![CDATA[You can now use Insights to view time-series performance data on a per-query pattern basis.]]></summary>
      </entry>
    
      <entry>
        <title>MySQL for application developers</title>
        <link href="https://planetscale.com/blog/mysql-application-developers" />
        <id>https://planetscale.com/blog/mysql-application-developers</id>
        <published>2023-04-20T00:03:57.138Z</published>
        <updated>2023-04-20T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[Everything you need to know about MySQL as an application developer, with a focus on improving query performance. After covering the high-level overview, we’ll put the learnings to the test with some hands-on examples.]]></summary>
      </entry>
    
      <entry>
        <title>Pagination in MySQL</title>
        <link href="https://planetscale.com/blog/mysql-pagination" />
        <id>https://planetscale.com/blog/mysql-pagination</id>
        <published>2023-04-18T00:03:57.138Z</published>
        <updated>2023-04-18T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Any good DBA will tell you to "select only what you need." It's one of the most common aphorisms, and for good reason! We don't ever want to select data that we're just going to throw away. One way this advice manifests itself is to not use SELECT * if you don't need all the columns. By limiting the columns returned, you're selecting only what you need.
Pagination is another way to "select only what you need." Although, this time, we're limiting the rows instead of the columns. Instead of pulling all the records out of the database, we only pull a single page that we're going to show to the user.
There are two primary ways to paginate in MySQL: offset/limit and cursors. Which method you choose depends on your use case and your application's requirements. Neither is inherently better than the other. They each have their own strengths and weaknesses.
The importance of deterministic ordering
Before we talk about the wonders of pagination, we need to talk about deterministic ordering. When your query is ordered deterministically, it means that MySQL has enough information to order your rows in the exact same way every single time. If you sort your rows by a column that is not unique, MySQL gets to decide which order to return these rows in. Let's look at an example.
Given this table full of people named Aaron:
| id | first_name | last_name |
|----|------------|-----------|
|  1 | Aaron      | Francis   |
|  2 | Aaron      | Smith     |
|  3 | Aaron      | Jones     |

Let's run a query to order those people by their first name:
SELECT
  *
FROM
  people
ORDER BY
  first_name

Because all three people have the same first name, MySQL gets to decide which order to return the rows in! Depending on certain factors, the order may change. This is because the ordering is not deterministic enough.
This result set is valid because it is ordered by first_name:
| id | first_name | last_name |
|----|------------|-----------|
|  2 | Aaron      | Smith     |
|  1 | Aaron      | Francis   |
|  3 | Aaron      | Jones     |

But so is this result set, because it also is ordered by first_name:
| id | first_name | last_name |
|----|------------|-----------|
|  3 | Aaron      | Jones     |
|  2 | Aaron      | Smith     |
|  1 | Aaron      | Francis   |

We haven't given MySQL specific enough instructions to produce a deterministically ordered set of results. We've asked it to order the rows by first_name, and it has dutifully complied, but it may not put them in the same order every time.
The easiest way to produce deterministic ordering is to order by a unique column because every value will be distinct, and MySQL will have no choice but to return the rows in the same order every time. Of course, that's not very helpful if you need to order by a column that's not unique! In that case, appending a unique column to your ordering does the trick. In most cases, simply adding the id is the best way to go.
SELECT
  *
FROM
  people
ORDER BY
  first_name, id -- Add ID to ensure deterministic ordering

Now MySQL knows that when given two first_name values that are the same, it should then look at the id column to determine the order. This is deterministic ordering, and it's a prerequisite to effective pagination.
Offset/limit pagination
Offset/limit pagination is likely the most common way to paginate in MySQL because it's the easiest to implement. With offset/limit pagination, we're taking advantage of two SQL keywords: OFFSET and LIMIT. The LIMIT keyword tells MySQL how many rows to return, while OFFSET tells MySQL how many rows to skip over.
SELECT
  *
FROM
  people
ORDER BY
  first_name, id
LIMIT
  10 -- Only return 10 rows
OFFSET
  10 -- Skip the first 10 rows

In this example, we're selecting all the people from the people table, ordering them by first_name and id, and then limiting the result set to 10 rows. We're also skipping the first 10 rows, returning rows 11-20.
To construct an offset/limit query, you need to know the page size and the page number. The page size is how many records you want to show per page, and the page number is what page you want to show. The LIMIT is determined by the page size, and the OFFSET is determined by both the page size and the page number.
To calculate the correct offset, multiply (page_number - 1) by page_size. This ensures that when your user is on the first page, the offset calculates to 0, meaning you're not skipping any rows. For example, page 2 with a page size of 10 gives an offset of 10:
SELECT
  *
FROM
  people
ORDER BY
  first_name, id
LIMIT
  10 -- page_size
OFFSET
  10 -- (page_number - 1) * page_size

 We have a video overview of offset/limit pagination, if you prefer that medium. 
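The offset arithmetic described above is small enough to capture in a helper. This is a minimal sketch; the function name limit_and_offset and the decision to reject values below 1 are assumptions, not part of any particular framework.

```python
from typing import Tuple


def limit_and_offset(page_number: int, page_size: int) -> Tuple[int, int]:
    """Translate a 1-based page number and a page size into (LIMIT, OFFSET)."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must both be at least 1")
    return page_size, (page_number - 1) * page_size


first_page = limit_and_offset(page_number=1, page_size=10)
second_page = limit_and_offset(page_number=2, page_size=10)
```

The first page yields (10, 0), so no rows are skipped, and the second page yields (10, 10), matching the query above.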
Strengths of offset/limit pagination
One of the great strengths of offset/limit pagination is that it's easy to implement and easy to understand. It doesn't require tracking any state over time; each request can stand alone. It doesn't matter what pages the user has visited before. The query construction is always the same. The math is simple. The query is simple.
Another strength of this method is that pages are directly addressable. Users who want to navigate from page 1 directly to page 10 can do so quite easily, provided your interface exposes page links. (This is not the case with cursor pagination.) Convincing arguments have been made that directly addressable pages shouldn't ever be exposed to users because they have no semantic meaning. For example, what does page 84 mean? Why not just expose "next" and "back" buttons? That's a decision that you'll have to make for your application! Many users are used to seeing directly addressable page numbers, and it can be helpful to skip several pages ahead instead of one page at a time. It's up to you to decide what's best for your application, but if you need directly addressable pages, you will need to use offset/limit pagination.
Offset/limit pagination and drifting pages
One weakness of offset/limit pagination is that pages can drift. This is true of cursor-based pagination as well, but it's more likely to happen with offset/limit pagination.
Let's look at an example in which your user is viewing page one with ten records. The last person they see on this page is "Judge Bins." They don't see her yet, but "Sonya Dickens" should be the first person on page 2.
| id | first_name | last_name |
|----|------------|-----------|
|  1 | Phillip    | Yundt     |
|  2 | Aaron      | Francis   |
|  3 | Amelia     | West      |
|  4 | Jennifer   | Becker    |
|  5 | Macy       | Lind      |
|  6 | Simon      | Lueilwitz |
|  7 | Tyler      | Cummerata |
|  8 | Suzanne    | Skiles    |
|  9 | Zoe        | Hill      |
| 10 | Judge      | Bins      |
|----|------------|-----------| Page break
| 11 | Sonya      | Dickens   |
| 12 | Hope       | Streich   |
| 13 | Kristian   | Kerluke   |
| 14 | Stanton    | Fisher    |
| 15 | Rasheed    | Little    |
| 16 | Deron      | Koss      |
| 17 | Trevor     | Daniel    |
| 18 | Vernie     | Friesen   |
| 19 | Jody       | Littel    |
| 20 | Jorge      | Nienow    |

While your user is viewing the page, the person with the id of 2 (Aaron Francis) is deleted.
| id | first_name | last_name |
|----|------------|-----------|
|  1 | Phillip    | Yundt     |
|  3 | Amelia     | West      |
|  4 | Jennifer   | Becker    |
|  5 | Macy       | Lind      |
|  6 | Simon      | Lueilwitz |
|  7 | Tyler      | Cummerata |
|  8 | Suzanne    | Skiles    |
|  9 | Zoe        | Hill      |
| 10 | Judge      | Bins      |
| 11 | Sonya      | Dickens   | <-- Sonya is now on page one!
|----|------------|-----------| Page break
| 12 | Hope       | Streich   | <-- This is now the first person on page two
| 13 | Kristian   | Kerluke   |
| 14 | Stanton    | Fisher    |
| 15 | Rasheed    | Little    |
| 16 | Deron      | Koss      |
| 17 | Trevor     | Daniel    |
| 18 | Vernie     | Friesen   |
| 19 | Jody       | Littel    |
| 20 | Jorge      | Nienow    |
| 21 | Mara       | Grady     |

The user navigates to page two, and the first person they see is Hope Streich. Because we're naively skipping over the first ten rows, Sonya Dickens has been skipped altogether. Sorry, Sonya. Your user never sees her unless they navigate back to page one.
Paginating ever-changing data is not an easy problem to solve, and this may be an acceptable tradeoff for you. Even cursor-based pagination is prone to some of these movements, but it's less likely to happen.
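The drift scenario above can be replayed with a plain Python list standing in for the table. This is only a simulation under simple assumptions: rows are represented by their ids, and fetch_page is a hypothetical helper that mimics LIMIT and OFFSET with slicing.

```python
from typing import List


def fetch_page(rows: List[int], page_number: int, page_size: int) -> List[int]:
    """Mimic LIMIT/OFFSET by slicing an already ordered list of ids."""
    offset = (page_number - 1) * page_size
    return rows[offset:offset + page_size]


people = list(range(1, 22))             # ids 1 through 21
page_one = fetch_page(people, 1, 10)    # the user sees ids 1 through 10

people.remove(2)                        # id 2 is deleted while page one is on screen

page_two = fetch_page(people, 2, 10)    # the offset now skips ids 1 and 3 through 11
seen_with_offset = set(page_one) | set(page_two)

# For contrast, filtering on the last id the user saw does not skip id 11.
last_seen_id = page_one[-1]
page_two_by_id = [row for row in people if row > last_seen_id][:10]
```

After the deletion, page_two starts at id 12, so id 11 never appears in seen_with_offset, whereas page_two_by_id starts at id 11.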
Performance drawbacks of offset/limit pagination
The way that the OFFSET keyword works is that it discards the first n rows from the result set. It doesn't simply skip over them. Instead, it reads the rows and then discards them. This means that as you work into deeper and deeper pages of your result set, the performance of your query will degrade. This is because the database must read and discard more rows as you move through the result set.
Very deep pages can take multiple seconds to load. This is a big issue with offset/limit pagination, and it's one reason cursor-based pagination is so popular. Cursor-based pagination doesn't have this performance drawback because it doesn't use the OFFSET keyword.
Deferred joins for faster offset/limit pagination
There is a technique known as a "deferred join" that can optimize offset/limit pagination.
The deferred join technique is an optimization solution that enables more efficient pagination. It performs the pagination on a subset of the data instead of the entire table. This subset is generated by a subquery, which is joined with the original table later. The technique is called "deferred" because the join operation is postponed until after the pagination is done.
SELECT * FROM people
    INNER JOIN (
      -- Paginate the narrow subquery instead of the entire table
      SELECT id FROM people ORDER BY first_name, id LIMIT 10 OFFSET 450000
    ) AS tmp USING (id)
ORDER BY
  first_name, id

This technique has been widely adopted, and there are libraries available for popular web frameworks such as Rails (FastPage) and Laravel (Fast Paginate).
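To check that a deferred join returns the same page as the plain offset/limit query, here is a small self-contained sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the sample data is made up; it only demonstrates that the two queries are equivalent, not the performance difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
# Made-up sample rows with many duplicate first names.
conn.executemany(
    "INSERT INTO people (id, first_name, last_name) VALUES (?, ?, ?)",
    [(i, f"first{i % 7}", f"last{i}") for i in range(1, 101)],
)

STANDARD = """
SELECT id, first_name, last_name FROM people
ORDER BY first_name, id
LIMIT ? OFFSET ?
"""

DEFERRED = """
SELECT people.id, people.first_name, people.last_name FROM people
INNER JOIN (
  -- Paginate the narrow subquery instead of the entire table
  SELECT id FROM people ORDER BY first_name, id LIMIT ? OFFSET ?
) AS tmp USING (id)
ORDER BY people.first_name, people.id
"""

page_size, offset = 10, 40
standard_rows = conn.execute(STANDARD, (page_size, offset)).fetchall()
deferred_rows = conn.execute(DEFERRED, (page_size, offset)).fetchall()
```

Both queries return the same ten rows in the same order, which is what makes the deferred join a drop-in replacement for the standard query.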
Here is a graph showing the performance of a deferred join vs. the standard offset/limit pagination method, taken from our blog post introducing the FastPage Rails gem.

As you can see, the deferred join method is much faster than the standard offset/limit pagination method, especially for deeper pages.
If you do decide that offset/limit is the right choice for your application, then you should consider using a deferred join technique to optimize your queries.
Cursor pagination
Now that we're thoroughly versed in the offset/limit method, let's talk about cursor-based pagination. Cursor-based pagination is a method of pagination that uses a "cursor" to determine the next page of results. It's important to note that this differs from a database cursor, which is a different concept. When discussing cursors in the context of pagination, we're using the word to mean a pointer, an identifier, a token, or a locator.
 We also have a video overview of cursor pagination, if you prefer that medium. 
The idea behind cursor-based pagination is that you have a cursor that points to the last record that the user saw. When the user requests the next page of results, they must send along the cursor, which we use to determine where to start the next page of results.
Instead of using the OFFSET keyword, we use the cursor to construct a WHERE clause that filters out all the rows that the user has already seen.
Let's start with a simple example. Let's say we have a table of people and want to paginate the results by the id. When the user requests the first page of results, there is no cursor, so we return the first ten rows.
SELECT
  *
FROM
  people
ORDER BY
  id
LIMIT
  10

MySQL returns the following result set:
| id | first_name | last_name |
|----|------------|-----------|
|  1 | Phillip    | Yundt     |
|  2 | Aaron      | Francis   |
|  3 | Amelia     | West      |
|  4 | Jennifer   | Becker    |
|  5 | Macy       | Lind      |
|  6 | Simon      | Lueilwitz |
|  7 | Tyler      | Cummerata |
|  8 | Suzanne    | Skiles    |
|  9 | Zoe        | Hill      |
| 10 | Judge      | Bins      |

Here is where cursor and offset-based pagination begin to diverge. With cursor-based pagination, we must construct and send the cursor out to the frontend. The cursor is a pointer to the last record that the user has seen. Since we are only sorting by id, the cursor is the id of the last record in the result set. Usually, it would be base64 encoded, but for simplicity, we'll just leave it unencoded.
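For reference, one possible way to base64 encode a cursor is to serialize the sort values as JSON first. This is a sketch under assumptions: the JSON payload shape and the names encode_cursor and decode_cursor are illustrative choices, not a standard format.

```python
import base64
import json


def encode_cursor(values: dict) -> str:
    """Serialize the last row's sort values into an opaque, URL-safe token."""
    raw = json.dumps(values, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> dict:
    """Reverse encode_cursor, returning the original sort values."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


token = encode_cursor({"id": 10})
round_trip = decode_cursor(token)
```

Base64 only makes the cursor opaque and safe to pass in a URL. It is not encryption, so a decoded cursor should be validated like any other user input.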
The backend sends out the results and a cursor of id=10, usually called next_page or something similar.
{
  "next_page": "(id=10)",
  "records": [
    // ...
  ]
}

When the user requests the next page of results, they must return the cursor to the server. The cursor is used to construct a WHERE clause that filters out all the rows the user has already seen.
SELECT
  *
FROM
  people
WHERE
  id > 10 -- The last id that the user saw was 10, so we start at the next id after 10
ORDER BY
  id
LIMIT
  10

You can see that in this query, we're not using the OFFSET keyword at all, but instead, we're jumping straight to the next record after the last record that the user saw. This is the key difference between cursor and offset-based pagination!
It gets a bit more complicated if we go back to our original example of sorting by first_name and then id. Since we're sorting by both columns, the cursor must contain both values for the last record that the user has seen.
Let's take this example set of records, which is 20 people sorted by first name, and then ID.
| id    | first_name | last_name  |
|-------|------------|------------|
|     2 | Aaron      | Francis    |
|   589 | Aaron      | Streich    |
|  3896 | Aaron      | Corkery    |
|  8441 | Aaron      | Kreiger    |
|  9179 | Aaron      | Wolf       |
| 10970 | Aaron      | Reichert   |
| 13082 | Aaron      | Collier    |
| 13704 | Aaron      | Braun      |
| 19399 | Aaron      | Watsica    |
| 25995 | Aaron      | Runte      |
|-------|------------|------------| Page break
| 26794 | Aaron      | Mayer      |
| 32075 | Aaron      | Hahn       |
| 32471 | Aaron      | Bahringer  |
| 40612 | Aaron      | Abbott     |
| 41202 | Aaron      | Willms     |
| 41571 | Aaron      | Nienow     |
| 46556 | Aaron      | Glover     |
| 48501 | Aaron      | Boyle      |
| 50628 | Aaron      | Schmeler   |
| 51656 | Aaron      | Williamson |

In this case, the last record the user sees on page 1 has an id of 25995. This information alone is not enough for the cursor! We must also add the first_name since it is part of the sort order. The cursor for the last record on page 1 is (first_name=Aaron, id=25995).
When the user sends back the cursor, we can construct a WHERE clause that filters out all the rows the user has already seen. This time, it requires a little more thought because we're sorting by two columns. We'll add a first_name filter to show any names after "Aaron," but since first_name has many duplicates, we'll also add an id filter to show any "Aaron"s that have an id after the last id that the user saw.
SELECT
  *
FROM
  people
WHERE
  (
    (first_name > 'Aaron')                -- Names after Aaron
    OR
    (first_name = 'Aaron' AND id > 25995) -- Aarons, but after the last id that the user saw
  )
ORDER BY
  first_name, id
LIMIT
  10

As you add more columns to the sort order, you'll need to add more filters to the WHERE clause.
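The pattern generalizes: for each column in the sort order, all earlier columns must tie and that column must be strictly greater. Below is a rough sketch of a builder for that filter. The function name keyset_where is hypothetical, and it assumes every column sorts ascending, none are NULL, and the column names come from your own code rather than from user input.

```python
def keyset_where(columns, last_values):
    """Build a WHERE fragment and parameters for an ascending multi-column sort."""
    if not columns or len(columns) != len(last_values):
        raise ValueError("columns and last_values must be non-empty and the same length")
    clauses, params = [], []
    for i, column in enumerate(columns):
        # Earlier sort columns must tie; the i-th column must be strictly greater.
        parts = [f"{tied} = ?" for tied in columns[:i]] + [f"{column} > ?"]
        clauses.append("(" + " AND ".join(parts) + ")")
        params.extend(last_values[: i + 1])
    return "(" + " OR ".join(clauses) + ")", params

# The cursor from the example above: (first_name=Aaron, id=25995).
where, params = keyset_where(["first_name", "id"], ["Aaron", 25995])
```

For the two-column cursor this produces the same shape as the hand-written query: one branch for names after "Aaron" and one for "Aaron"s with a larger id.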
Drawbacks to cursor-based pagination
As you've seen, cursor-based pagination is more complicated to implement than offset-based pagination. Constructing the cursor and the WHERE clause requires more thought. You also have to keep track of that little piece of state: the cursor. This isn't inherently bad, and not all complexity is reducible, but it's something to keep in mind. Most frameworks have cursor-based pagination built in, so you may not have to implement it manually.
Another drawback to cursor-based pagination is that it's impossible to address a specific page directly. For instance, if the requirement is to jump directly to page five, it's not possible to do so since the pages themselves are not explicitly numbered, and there is no way to create a cursor without knowing the last record that has been seen. You can only navigate to the next page.
Benefits of cursor-based pagination
One of the advantages of cursor-based pagination is its resilience to shifting rows. For example, if a record is deleted, the next record that would have followed is still displayed since the query is working off of the cursor rather than a specific offset.
Let's go back to our Sonya Dickens example. The last person the user sees on page one is "Judge Bins." They don't see her yet, but "Sonya Dickens" should be the first person on page 2.
| id | first_name | last_name |
|----|------------|-----------|
|  1 | Phillip    | Yundt     |
|  2 | Aaron      | Francis   |
|  3 | Amelia     | West      |
|  4 | Jennifer   | Becker    |
|  5 | Macy       | Lind      |
|  6 | Simon      | Lueilwitz |
|  7 | Tyler      | Cummerata |
|  8 | Suzanne    | Skiles    |
|  9 | Zoe        | Hill      |
| 10 | Judge      | Bins      | <-- The cursor points here
|----|------------|-----------| Page break
| 11 | Sonya      | Dickens   |
| 12 | Hope       | Streich   |
| 13 | Kristian   | Kerluke   |
| 14 | Stanton    | Fisher    |
| 15 | Rasheed    | Little    |
| 16 | Deron      | Koss      |
| 17 | Trevor     | Daniel    |
| 18 | Vernie     | Friesen   |
| 19 | Jody       | Littel    |
| 20 | Jorge      | Nienow    |

While they are viewing page one, "Aaron Francis" is deleted.
| id | first_name | last_name |
|----|------------|-----------|
|  1 | Phillip    | Yundt     |
|  3 | Amelia     | West      | <-- Aaron Francis is deleted
|  4 | Jennifer   | Becker    |
|  5 | Macy       | Lind      |
|  6 | Simon      | Lueilwitz |
|  7 | Tyler      | Cummerata |
|  8 | Suzanne    | Skiles    |
|  9 | Zoe        | Hill      |
| 10 | Judge      | Bins      | <-- The cursor *still* points here
|----|------------|-----------| Page break
| 11 | Sonya      | Dickens   | <-- Sonya is the first person after the cursor
| 12 | Hope       | Streich   |
| 13 | Kristian   | Kerluke   |
| 14 | Stanton    | Fisher    |
| 15 | Rasheed    | Little    |
| 16 | Deron      | Koss      |
| 17 | Trevor     | Daniel    |
| 18 | Vernie     | Friesen   |
| 19 | Jody       | Littel    |
| 20 | Jorge      | Nienow    |

This time, it doesn't matter! The cursor points to the last record that the user saw, and the next record is still Sonya Dickens. We tell the database, "the last record I saw was ID 10, and I want to see the next ten records." The database doesn't care that some records were deleted. It just knows that the next record is Sonya Dickens.
This is true even if the cursor is pointing to a record that was deleted. If the cursor points to a record that was deleted, we're still telling the database, "the last record I saw was ID 10, and I want to see the next ten records." Again, the database doesn't care that the record was deleted. It just knows that the next record is Sonya Dickens.
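If you want to see the difference for yourself, the following sketch replays the scenario against a throwaway table. It assumes SQLite via Python's sqlite3 module in place of MySQL, and placeholder names instead of the people in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(i, f"Person{i}") for i in range(1, 21)],
)

# The user has already seen ids 1-10 on page one. Then id 2 is deleted.
conn.execute("DELETE FROM people WHERE id = 2")

# Offset-based page two: every row shifts up by one, so id 11 is skipped.
offset_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM people ORDER BY id LIMIT 10 OFFSET 10")
]

# Cursor-based page two: still starts right after the last id the user saw.
cursor_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM people WHERE id > ? ORDER BY id LIMIT 10", (10,)
    )
]

# Deleting the row the cursor points at does not change the result either.
conn.execute("DELETE FROM people WHERE id = 10")
cursor_ids_after = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM people WHERE id > ? ORDER BY id LIMIT 10", (10,)
    )
]
```

The offset query starts at id 12, while both cursor queries start at id 11.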
Cursor-based pagination performance
Cursor-based pagination can be much more performant than offset/limit simply because it accesses much less data. Instead of generating a result set and throwing away everything before the offset, the database can seek directly to the row after the cursor and return the next N records. The difference is most pronounced deep in the result set, where the equivalent offset would be large. You will need to consider a proper indexing strategy (for example, an index that covers the columns in the sort order) to ensure the database can efficiently find the necessary records.
Conclusion
Pagination is a common requirement for almost every web application or API. Now you understand the different types of pagination and the tradeoffs that come with each.
Offset/limit is nice because it's easy to implement and understand, and you can directly address pages. Some downsides are that it can be slower as you navigate deeper into the pages, and it is more prone to drift.
Cursor-based pagination is nice because it is more performant and more resilient to shifting rows. Some of the downsides are that it is more complicated to implement, and you cannot directly address pages.
Which method you choose is up to you, but hopefully, this article has given you a better understanding of the tradeoffs, and you can now make an informed decision.]]></content>
        <summary><![CDATA[An overview of the different ways to paginate in MySQL including limit/offset pagination and cursor pagination plus the pros and cons of each.]]></summary>
      </entry>
    
      <entry>
        <title>Safely making database schema changes</title>
        <link href="https://planetscale.com/blog/safely-making-database-schema-changes" />
        <id>https://planetscale.com/blog/safely-making-database-schema-changes</id>
        <published>2023-04-13T14:00:00.000Z</published>
        <updated>2023-04-13T14:00:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Back in the day, as a junior software developer, I was terrified to make database changes. We all hear the horror stories of one wrong command being run, and everything is wiped out. Or a database change took longer to release than expected, and users were severely affected.
At the same time, I was never pushed to get “good” at databases. I worked at a small startup. I had product features to release. I wanted to be better at Python and JavaScript, not databases. The database was just not my focus, so it stayed scary.
As many of you know, not making changes in the database is never an option. Your database has to grow and change with your application and product. The velocity of our database changes cannot be slow if we want to ship new features regularly, but historically moving safely has meant moving slowly. These bottlenecks can come from different sources, like having a limited number of database experts on the team or being restrictive of what and when changes can occur.
In this blog post, I want to focus on safely making database schema changes without these blockers. I'll touch on both database best practices and PlanetScale features that ensure safe database schema changes.
Smaller, frequent changes
Before we get into PlanetScale-specific features, I want to start with the size of schema changes. Modern software development teaches us to release code early and often. This should apply to our database changes too.
We aim for smaller, more frequent schema changes instead of big, complex changes. Smaller changes are easier to test, verify, and are safer for the whole system. It's hard to verify large changes that span across the schema and into many different tables. A small SQL change is easier to understand and verify than one with hundreds of lines of SQL.
You and your team might be the only database consumers from an application code perspective when you are small. However, as the company grows, multiple teams or departments will interact with the database. This can make bigger changes even riskier. (If this includes an Analytics or Data team, you should check out PlanetScale Connect.)
Backwards compatible changes
Anytime you change an existing schema used by application code, you should ensure it is backward compatible at all steps of the process. You may have also heard of this referred to as the “expand and contract” model. An oversimplification of this model is that you:
Expand the schema: Update the existing schema with new changes
Update the application code to write to both old and new schema
Migrate data from the old schema to the new schema
Update the application code to only read from the new schema
Update the application code to only write to the new schema
Contract the schema: Remove the old schema that is no longer used
The expand and contract model ensures that at any time throughout a schema change, your application code will always work and not break while you are changing the existing schema. You can learn more about making backwards compatible database changes in this blog post.
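As a rough illustration of the first four steps, here is a small sketch that moves data from an old column to a new one. Everything in it is an assumption made for the example: SQLite via Python's sqlite3 module, a users table, and a hypothetical rename of name to display_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Grace",)])

# 1. Expand: add the new column next to the old one.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# 2. Dual-write: new writes populate both the old and the new column.
def create_user(conn, name):
    conn.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)", (name, name)
    )

create_user(conn, "Linus")

# 3. Migrate: backfill rows that were written before dual-writing started.
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")

# 4. Read only from the new column.
def list_display_names(conn):
    return [
        row[0] for row in conn.execute("SELECT display_name FROM users ORDER BY id")
    ]

# Steps 5 and 6 (write only to the new column, then drop the old one) are left
# out here; they follow once nothing reads or writes `name` any more.
```

At every point in this sequence, code that still reads name keeps working, which is the whole point of the model.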
Feature flags
You might be wondering, what do feature flags have to do with database schema changes? The reality is that they can help us make incremental changes to the database more safely. Using feature flagging, we can incrementally roll out new features built on top of the updated database schema. If something goes wrong in that process, we can roll back in many cases. Of course, not all changes can be rolled back, and you must implement feature flags properly. Still, they are a helpful tool to leverage alongside your database when you want to make safer changes incrementally.
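In its simplest form, a flag just decides which column a code path reads from. The sketch below assumes a hypothetical flag named use_display_name passed in as a plain dictionary; a real flagging service would replace that dictionary.

```python
def user_label(user, flags):
    """Return the name to display, preferring the new column only when the flag is on."""
    if flags.get("use_display_name") and user.get("display_name") is not None:
        return user["display_name"]
    # Flag off, or the new column is not populated yet: fall back to the old column.
    return user["name"]

user = {"name": "Ada", "display_name": "Ada L."}
```

Turning the flag off sends reads back to the old column without a deploy, which is what makes the rollout reversible.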
Branching
A core part of safely making database schema changes with PlanetScale is branching. A database branch provides an isolated copy of your production database schema, where you can make changes, experiment, and test. With safe migrations turned on in PlanetScale, branching enables you to have zero-downtime schema migrations, the ability to revert a schema, protection against accidental schema changes, and more with deploy requests.
Deploy requests
When using safe migrations to deploy a database branch to production, you must open a deploy request in PlanetScale. Behind the scenes, PlanetScale runs automated linting to ensure your schema change is compatible and can be safely deployed.
PlanetScale will even check that tables are truly unused during deploy requests and warn you if the table to be dropped was recently queried.
A deploy request automatically checks for:
Conflicts that would prevent the schema from being migrated, such as invalid column or table charsets, non-null unique keys, and more
Data loss from proposed schema changes
Conflicts with other schema changes from teammates
This removes the burden of a human needing to catch these in a deploy request. It saves time and mental energy, and it prevents possible errors in a schema migration. Making schema reviews easier helps reduce the workload of schema changes. It does not require a whole team of DBAs to review developers' schema changes.

Schema reviews
Historically, schema changes have not gotten the same treatment as code changes. Doing a schema review in a PlanetScale deploy request helps give you a transparent schema review process. It is visual and deeply rooted in how we review pull requests for code.
It clearly shows what tables have been created, altered, or dropped. It also shows the exact SQL that will run for the change. This is also useful if the SQL migration was created by an ORM where you only sometimes see the actual SQL being run. Like in a pull request, you can see what is being added and removed from the existing schema.

Non-blocking schema changes
Non-blocking schema changes allow you to update database tables without locking or causing downtime for production databases. For example, a direct ALTER TABLE is a blocking operation, which renders the table completely inaccessible, even for reads. One way to ensure you do not cause downtime for your application is to use an online schema change tool. You can read this comparison of online schema change tools or how online schema change tools work. With deploy requests in PlanetScale, an online schema change tool is built-in.
Historically, adding a column to a large table has been seen as a possibly problematic change, but it is safe in PlanetScale with non-blocking schema changes. Database maintenance windows are a thing of the past. Not only are they bad for users and businesses, but they also cause changes to be often batched up, which gets us in trouble because we are releasing multiple schema changes at once that are harder to test and verify.
Gated deployments
As part of our non-blocking schema change process, instead of directly modifying tables when you deploy a deploy request, we make a copy of the affected tables and apply changes to the copy. We get the data from the original and copy tables in sync. Once complete, we initiate a quick cutover where we swap the tables. Some schema migrations can take less than a minute, and some can take a more extended period of time. Deploying a schema change can take several hours for very large or complex databases.
In some situations, you might not want a schema change to cut over immediately when the deployment process completes, such as when a schema migration completes outside your work hours. You want to be a part of the process to monitor the change and ensure there are no issues. With Gated Deployments, you can start the deployment process by adding your deploy request to the queue. Once it is done, you can hold off on the cutover. Instead, when it is safer to make the change, you can click a button to swap the tables and complete the deployment.

Revert
What happens when you change a schema and things don't go as planned? We know as developers that nothing happens as we expected 100% of the time. Bad migrations occur all the time. We shouldn't be scared of them. Instead, we should have tools that help us deal with bad migrations.
The revert feature in PlanetScale allows us to go back in time to revert a schema migration deployed to your production database and, in many cases, even retain lost data.
This is possible because of VReplication in Vitess, the database clustering and management system that powers PlanetScale databases alongside MySQL. VReplication uses a lossless sync in the background between valid states of the database. It copies data from the source to the destination table in a consistent fashion. VReplication's implementation is unique because it allows us to go down to the MySQL transaction level, ensuring no data is lost and that your database schema returns to its previous state before the schema change. If you want the behind-the-scenes look at how this is possible, check out this blog about how schema reverts work.
Insights
Some tools use abstractions to improve the developer experience. Insights is the opposite; it is a complexity-revealing tool. Insights is PlanetScale's in-dashboard query performance analytics tool. It is built into the platform and doesn't require extra setup to monitor the performance of every database query.
This is useful for safety because it allows you to monitor query performance anytime. What if you release a schema change and notice your users experiencing degraded performance? There could be many causes for this up and down the stack. With Insights, you can quickly confirm if one of your database queries is the cause. It gives you the safety to make schema changes and perform queries while ensuring they do not have problematic database performance.

Insights will surface SQL comments on queries, so you can tag your queries with additional information to track down where they came from. You can see how to add SQL comments in Laravel or Ruby on Rails.
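If your framework doesn't add comments for you, tagging a query by hand can be as small as the sketch below. The helper name tag_query and the key='value' layout are assumptions made for illustration, not a format this post specifies.

```python
def tag_query(sql, **tags):
    """Append a SQL comment carrying key='value' tags to a query string."""
    if not tags:
        return sql
    for key, value in tags.items():
        text = f"{key}{value}"
        # Refuse anything that could close the comment or the quoting early.
        if "*/" in text or "'" in text:
            raise ValueError("tag keys and values must not contain \"*/\" or \"'\"")
    # Sort the tags so the same call always produces the same comment.
    body = ",".join(f"{key}='{value}'" for key, value in sorted(tags.items()))
    return f"{sql} /* {body} */"

tagged = tag_query(
    "SELECT * FROM people WHERE id = ?", controller="people", action="show"
)
```

The comment travels with the query text, so it shows up wherever the query itself is reported.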
Unless you deeply understand the inner workings of a database, it can be hard to know precisely how some queries will perform. There are the basics, like understanding how an index can improve performance or that you should only SELECT the information you need. But often, our ORMs create queries that we might not fully understand. Having built-in query performance metrics gives us the safety of knowing we do not have problematic queries in our applications.
For example, a developer noticed they were reading a lot of data. The data in Insights clearly showed that full table scans were occurring on specific queries, which made it clear to the developer that their schema could benefit from adding an index. Insights allows you to focus your database performance optimization work.
Automatic backups
Having to use a database backup is the last resort, but it is a necessary one. PlanetScale allows you to create a branch with the data from a backup, test the branch, and when you are ready, promote it to production. The following checklist helps you walk through this process:

Base plans include a backup every 12 hours, but you can easily change the schedule with a few clicks and create manual backups from the UI.
Going forward with safe database schema changes
As a core tenet of developer experience, you should never have to decide between safety and shipping more features. Hopefully, this blog post informed you how database schema changes could safely be made inside and outside PlanetScale. If you are interested in safety features in different frameworks with excellent developer experience, check out our blog posts on Laravel and Ruby on Rails safety features.
        <summary><![CDATA[How to prevent schema changes from being scary with database best practices and PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>What is database sharding and how does it work?</title>
        <link href="https://planetscale.com/blog/what-is-database-sharding-and-how-does-it-work" />
        <id>https://planetscale.com/blog/what-is-database-sharding-and-how-does-it-work</id>
        <published>2023-04-06T09:00:00.000Z</published>
        <updated>2023-04-06T09:00:00.000Z</updated>
        
        <author>
          <name>Justin Gage</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[What is sharding?
If you’ve used Google or YouTube, you’ve probably accessed sharded data.
Sharding is a strategy for scaling out your database by storing partitions of your data across multiple servers instead of putting everything on a single giant one. Each partition of data is called a shard. Splitting your database out into shards can help reduce the load on your database, leading to improved performance.
This post will help you understand exactly what database sharding is by walking through how sharding works, how to think about implementing your own sharded database, and some useful tools out there that can help, with a particular focus on MySQL and Postgres.
Sharding to scale out relational databases
Scene: you’ve upsized your MySQL on RDS instance for the 3rd time this quarter and your CFO just put 30 minutes on your calendar to “chat budget.” It might be time to scale out instead of scaling up! [1] Read replicas in RDS seem straightforward enough, but reading data is only half of the problem. What is an overwhelmed developer to do?
Sharding — a term that probably originally came from a video game — is how you scale out relational databases. You’ve probably seen this table before, about how scaling out helps you take this users table, all stored on a single server:
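| user_id           | first_name | last_name | email               | ... |
|-------------------|------------|-----------|---------------------|-----|
| ZpaDr20TTD4ZL7Wma | Peter      | Gibbons   | peter@initech.net   | ... |
| bI32htQ1PsEQioC7G | Bill       | Lumbergh  | bill@initech.net    | ... |
| 99J3x257SGP7J4IkF | Milton     | Waddams   | stapler@initech.net | ... |
| 0SH0pyi9bO5RM4I03 | Lawrence   |           | two@onetime.com     |     |
| ...               | ...        | ...       | ...                 | ... |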
And turn it into this users table, stored across 2 (or 1,000) servers:
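| user_id           | first_name | last_name | email               | Server   |
|-------------------|------------|-----------|---------------------|----------|
| ZpaDr20TTD4ZL7Wma | Peter      | Gibbons   | peter@initech.net   | Server A |
| bI32htQ1PsEQioC7G | Bill       | Lumbergh  | bill@initech.net    | Server B |
| 99J3x257SGP7J4IkF | Milton     | Waddams   | stapler@initech.net | Server B |
| 0SH0pyi9bO5RM4I03 | Lawrence   |           | two@onetime.com     | Server A |
| ...               | ...        | ...       |                     | ...      |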
But that’s only one type of sharding (row level, or horizontal). There are tons of different ways to split up your data across servers to best match how your business and data model works. Vertical sharding, for example, is when you split things at the schema or table level. More on this later!
Partitioning has existed – especially in OLAP setups – for a long time, primarily as a mechanism for improving query speed. Nightmares of sifting through HDFS partitions to find the missing snapshot pervade my sleep schedule... Anyway, sharding takes that concept and applies it to distributed systems: in addition to splitting up data into logical groups, let’s put those groups across multiple servers that talk to each other. Even Oracle does it!
For as long as relational databases have existed, they’ve been designed to run on a single server. Partially because of that, and partially because of fundamental laws of physics, sharding your data properly is, uh, not very easy.
How database sharding works under the hood
To shard your database, you’ll need to do a few things:
Decide on a sharding scheme — What data gets split up, and how? How is it organized?
Organize your target infrastructure — How many servers are you sharding to? How much data will be on each one?
Create a routing layer — How does your application know where to store new data, and query existing data?
Plan and execute the migration — How do you migrate from a single database to many with minimal downtime?
There’s no hard and fast playbook for each; everyone’s data model and business constraints are different. Let’s dive in.
Sharding schemes and algorithms
How you decide to split up your data into shards – also referred to as your partition strategy – should be a direct function of how your business runs, and where your query load is concentrated. For a B2B SaaS company where every user belongs to an organization, sharding by splitting up organization-level data probably makes sense. If you’re a consumer company, you may want to shard based on a random hash. Notion manually sharded their Postgres database by simply splitting on team ID. All of this is to say that sharding can be as simple or as complicated as you make it.
With that in mind, there are a few popular “algorithms” to decide which rows are stored together and on which servers:
Hash based sharding (also known as key based) – Take a value from the row, hash it, and send buckets of hashes to the same server. Whichever column you choose to hash is your shard key.
Range based sharding – Pick a column, create ranges, and allocate shards based on those ranges. Most useful for numerical columns that are (somewhat) randomly or evenly distributed.
Directory based sharding – Pick a column, allocate shards manually, and maintain a lookup table so you know where each row is stored.
If your sharding scheme isn’t random (e.g. hash based), you can begin to see why query profiling and understanding how your load is distributed can be useful.
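Here is a minimal sketch of what each of the three schemes above might look like as a function from key to shard. The shard names, range boundaries, and directory entries are all made up for the example.

```python
import hashlib
from bisect import bisect_right

SHARDS = ["shard_a", "shard_b", "shard_c", "shard_d"]  # hypothetical shard names

def hash_shard(shard_key):
    """Hash based: hash the shard key and bucket the result."""
    # A stable hash keeps routing consistent across processes and restarts,
    # unlike Python's built-in hash(), which is randomized for strings.
    digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# shard_a: below 1000, shard_b: 1000-1999, shard_c: 2000-2999, shard_d: 3000 and up
RANGE_BOUNDARIES = [1000, 2000, 3000]

def range_shard(numeric_key):
    """Range based: find which range the value falls into."""
    return SHARDS[bisect_right(RANGE_BOUNDARIES, numeric_key)]

# Directory based: a manually maintained lookup table.
DIRECTORY = {"org_initech": "shard_a", "org_initrode": "shard_c"}

def directory_shard(org_id):
    return DIRECTORY[org_id]
```

Only the hash version spreads keys without any knowledge of your data; the other two put the distribution, and the hotspots, in your hands.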
Imagine you’re Amazon, and you want to shard your MySQL database that stores customer orders. On the surface, there seems to be no meaningful clustering: sure, you’ve got customers who order a lot of stuff, but that volume (and the associated reads during the shopping process) are basically random. It might make sense to use hash based sharding, and use the order ID as the shard key.
A big part of your sharding scheme is considering which tables are stored together. Joins across databases in a distributed system are difficult and costly, so ideally all of the data you need to answer a particular query exists on the same physical machine. For Amazon, that means the orders table and the products table containing the products in the orders table need to be physically colocated. This also requires incremental maintenance: if a customer makes a new order, the product data for that order needs to be included in the new shard so it can be read quickly later on.
Sharding maintenance is an oft underappreciated piece of scaling out your relational database. Depending on what your partition strategy is, you’ll likely end up with hotspots, where a particular server in your cluster is either storing too much data or handling too much throughput. In our Amazon example, it could be because a large business started ordering a metric-ton of stuff, and all of their data is on one server. Managing those hotspots, redistributing data and load, and reorganizing your partition strategy to prevent future issues is part of what you’re signing up for when you shard.
Deciding on what servers to use
With your sharding scheme set, it’s time to decide on how many machines you want to store data on, and how big you need them to be. There’s no formula here; this decision depends on your budget, projections for future database load, cloud provider, etc.
A common approach is maximizing flexibility. Start with a small number of hosts, and add more as needed. To maintain an even distribution of shards across your servers, you’ll need to re-balance every time you add a host. This is why companies like to choose a number of shards that’s divisible by a lot of smaller numbers; it allows you to scale out the number of servers incrementally while maintaining that smooth, even distribution.
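A quick way to see why divisibility matters is to count how many shards land on each server. The sketch below uses 60 shards purely as an example number, and a round-robin assignment that is only meant for counting, not as a rebalancing strategy.

```python
from collections import Counter

def assign_shards(num_shards, num_servers):
    """Map each logical shard number to a server index, round-robin."""
    return {shard: shard % num_servers for shard in range(num_shards)}

def shards_per_server(num_shards, num_servers):
    """Return the sorted number of shards each server ends up holding."""
    counts = Counter(assign_shards(num_shards, num_servers).values())
    return sorted(counts.values())

# Server counts from 1 to 12 that split 60 shards perfectly evenly.
even_server_counts = [n for n in range(1, 13) if 60 % n == 0]
```

With 60 shards, 4 servers hold 15 shards each, while 7 servers end up with a mix of 8 and 9.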
Routing your sharded queries to the right databases
With your data distributed across multiple databases (imagine 20 of them), how does your application know which database to query? You need to build some sort of routing layer that decides. But how?
For those building sharding from scratch, the most common answer is in the application layer. You need to build logic into your application code that decides which database (and schema) to connect to for a particular query, conditional on the data inside that query and where it belongs in your sharding scheme. The logic looks something like:
if data.sharding_key in database_1.sharding_keys:
  …connect to database_1
else if data.sharding_key in database_2.sharding_keys:
  …connect to database_2

Depending on how you’ve partitioned your data and the number of physical machines / databases you’re working with, this logic can be relatively simple and stored in a JSON blob, config file, etc. More commonly, teams will use some sort of key value store or a lookup table in a database. The important thing is to have the information that ties a piece of data to its destination encoded somewhere so your application knows where to issue the query.
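A lookup-table version of that routing logic might look roughly like this. The bucket count, database names, and function names are hypothetical, and the dictionary stands in for whatever config file, key value store, or table you would actually use.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical number of logical buckets

# Lookup table tying each bucket to a database.
BUCKET_TO_DATABASE = {
    bucket: ("database_1" if bucket < 4 else "database_2")
    for bucket in range(NUM_BUCKETS)
}

def bucket_for(sharding_key):
    """Map a sharding key to a stable bucket number."""
    return zlib.crc32(str(sharding_key).encode("utf-8")) % NUM_BUCKETS

def database_for(sharding_key, table=BUCKET_TO_DATABASE):
    """Look up which database holds the data for this sharding key."""
    return table[bucket_for(sharding_key)]

def move_bucket(bucket, new_database, table=BUCKET_TO_DATABASE):
    """Record that a bucket now lives elsewhere (the data move itself is separate)."""
    table[bucket] = new_database
```

Keeping the mapping in data rather than in if/else branches means a rebalance changes a table entry instead of application code.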
Building this for the first time is actually not that difficult; it’s the operational maintenance that becomes the real problem over time. If you move shards from database to database, rebalance, add new machines, remove machines, change any database properties…you’ll need to update that application logic to account for it. ProxySQL isn’t a full fledged solution for this, but it could be classified as a rough “shard routing” service.
Planning and executing your migration to a sharded solution
Once you’ve taken care of all of the above and have your physical servers running with empty databases on them, plus a plan for routing in your application logic, you’re faced with the age-old problem of how to migrate without (too much) downtime. Unlike a (potentially) more straightforward migration to a single new database provider, moving to sharding introduces a lot more things that can go wrong and in more ways.
Notion’s engineering team suggested a useful framework for thinking about the migration in their post about how they implemented sharding:
Double-write: Incoming writes get applied to both the old and new databases.
Backfill: Once double-writing has begun, migrate the old data to the new database.
Verification: Ensure the integrity of data in the new database.
Switch-over: Actually switch to the new database. This can be done incrementally, e.g. double-reads, then migrate all reads.
Each of these steps still introduces the possibility of downtime; it’s just a risk you’re going to have to take for changes at this scale.
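To show how the four steps fit together, here is a toy sketch that uses plain dictionaries in place of the old and new databases. The class and method names are invented for the example, and it ignores concurrency, failures, and partial writes, which are exactly the hard parts in a real migration.

```python
class ShardingMigration:
    """Toy model of a double-write, backfill, verify, switch-over migration."""

    def __init__(self, old_db):
        self.old_db = old_db   # existing single database
        self.new_db = {}       # stand-in for the sharded cluster
        self.read_from_new = False

    def write(self, key, value):
        # Double-write: every incoming write goes to both databases.
        self.old_db[key] = value
        self.new_db[key] = value

    def backfill(self):
        # Copy old rows, without overwriting newer double-written values.
        for key, value in self.old_db.items():
            self.new_db.setdefault(key, value)

    def verify(self):
        # Verification: both sides must hold identical data.
        return self.old_db == self.new_db

    def switch_over(self):
        if not self.verify():
            raise RuntimeError("new database does not match the old one yet")
        self.read_from_new = True

    def read(self, key):
        source = self.new_db if self.read_from_new else self.old_db
        return source[key]

migration = ShardingMigration({1: "peter", 2: "bill"})
migration.write(3, "milton")   # arrives while the migration is running
migration.backfill()
migration.switch_over()
```

The order matters: backfilling before double-writing begins would miss any write that lands in between.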
Sharding frameworks and tools
Though many teams do build sharding for their database of choice from scratch, there is an ecosystem of tools, albeit perhaps less mature than the database software they’re built on.
Vitess
Vitess was built at YouTube when they needed to shard MySQL, and is now available to you and me. It’s basically a layer on top of MySQL that gives you sharding, and a lot of other neat stuff related to really big workloads: connection pooling, dynamic re-sharding and balancing, and monitoring tools, among other things. For a technical overview of how Vitess improves on vanilla MySQL, check out their comparison here.
As far as I’m aware, Vitess is the most mature and the most popular OSS sharding layer for a relational database. It served all YouTube DB traffic for years, and is in production at Slack, GitHub, NewRelic, Pinterest, Square, etc.
PlanetScale offers fully-managed Vitess clusters. If you're looking for a pain-free sharding solution for your MySQL database, we can help. Contact us and we'll be in touch shortly. 
Citus
Citus does what Vitess does for MySQL, but for Postgres (minus some more flashy features). It’s open source, designed as a Postgres extension, and can be run as a single node or several. It’s in production at Algolia, Heap, Cisco, and a few more. Their docs have good general advice for picking your sharding scheme, Citus or otherwise.
If you don't need sharding, but are interested in ultra fast NVMe-backed instances, check out PlanetScale for Postgres.
The serverless database wave
I suppose the more fundamental question is: why are you not using a database that does sharding for you? Over the past few years the so-called “serverless” database has gotten a lot more traction. Starting with the infamous Spanner paper, many have been thinking about how running a distributed system should be native to the database itself, the foremost of which has been CockroachDB. You can even run cloud Spanner on GCP.
You’re reading this blog on the PlanetScale website. They sell a shard-native (did I just coin this?) database built on MySQL and Vitess. I’m not a PlanetScale employee, but I am a big proponent of what they’re doing, specifically shifting the focus in databases to developer experience instead of infrastructure maintenance.
The question is starting to become: if you’re paying someone like AWS to run your database for you, why are you busy figuring out how to scale out that database? And I think that’s a good question the major cloud providers should be asking themselves.
References
[1] There is no shortage of opinion pieces on the web arguing against premature sharding. This post assumes an educated reader who can judge when scaling out is the right decision and when it isn’t.
FAQs
What is database sharding?
Database sharding is a strategy for scaling out a database by splitting its data across multiple servers instead of storing everything on one. Each chunk of data is called a shard. Rather than upgrading to a bigger single server (scaling up), sharding lets you add more servers (scaling out) to handle more data and traffic.
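To make the idea concrete, here is a minimal sketch of routing rows to shards by hashing a key. The shard count, the `shard_for` helper, and the in-memory dictionaries standing in for servers are all assumptions made for illustration; managed layers such as Vitess use their own routing schemes.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each "server" is just an in-memory dict standing in for a database.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key: str, row: dict) -> None:
    """Store a row on the shard that owns its key."""
    shards[shard_for(key)][key] = row


def get(key: str):
    """Read a row back from the shard that owns its key."""
    return shards[shard_for(key)].get(key)


put("user:1", {"name": "Ada"})
put("user:2", {"name": "Grace"})
print(get("user:1"))
```

Note that plain modulo hashing like this moves most keys when the shard count changes, which is one reason resharding needs careful planning.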
What's the difference between database sharding and replication?
Sharding and replication are complementary but distinct strategies. Sharding splits data across multiple servers so each server holds only a subset of the total data, reducing write load and storage pressure. Replication copies the same data to multiple servers, improving read performance and redundancy. Many production systems use both: sharded databases are often replicated within each shard for high availability.
When should you not shard your database?
Sharding can add significant operational complexity — you're responsible for choosing a sharding scheme, building a routing layer, managing hotspots, and executing a careful migration. It's generally not the first option you should pursue if your database can still scale vertically (bigger machines), if read replicas solve your bottleneck, or if your data model relies heavily on cross-table joins (which become expensive across shards).
That said, there are managed databases like PlanetScale (Vitess and Neki) that handle sharding at a proxy layer, transparent to your application code. This significantly decreases the complexity of operating a sharded database, and these services are worth considering before implementing app-level sharding yourself.]]></content>
        <summary><![CDATA[Learn what database sharding is, how sharding works, and some common sharding frameworks and tools.]]></summary>
      </entry>
    
      <entry>
        <title>An update to our workflow: safe migrations</title>
        <link href="https://planetscale.com/blog/update-to-our-workflow-safe-migrations" />
        <id>https://planetscale.com/blog/update-to-our-workflow-safe-migrations</id>
        <published>2023-04-05T15:50:00.000Z</published>
        <updated>2023-04-05T15:50:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We just released an update to our branching workflow: safe migrations.
Safe migrations give you the option to enable direct DDL on your production branches. Prior to this release, direct DDL was prohibited on all PlanetScale production branches to enable non-blocking schema changes.
 Direct DDL refers to the execution of DDL (Data Definition Language) SQL statements directly on your production database. This can lead to table locking and downtime. 
How this affects existing databases
If you have an existing production branch, safe migrations have been automatically enabled. Your database has not experienced any changes, and no action is necessary. Going forward, any time you want to enable non-blocking schema changes on a new production branch, you must turn on safe migrations by clicking the "Enable safe migrations" toggle when promoting a branch to production.

 Another consequence of disabling safe migrations is that you will no longer be able to use deploy requests, schema reverts, and gated deploys, and you will not receive data loss warnings via the deploy request workflow. For more information, see our safe migrations documentation. 
Why did we introduce safe migrations?
PlanetScale went GA on November 16th, 2021. Since then, we’ve continued to innovate by releasing new features that make it easier to manage your database. At our core, we are a customer-driven company, so we love hearing about how people are using our product, as well as feedback about how we can make it better.
We want to ensure we remain the best place to run MySQL, so listening to and acting on user feedback is important to us. This is why we’re constantly releasing improvements and tweaks like safe migrations — to ensure that we're making PlanetScale accessible to all users at all stages of their application development.
Some of our feedback indicated that a small number of users were using workarounds to run direct DDL on their production databases. In some cases, users were utilizing development branches for production workloads, which is never recommended. Stability and availability are the most important product features that PlanetScale offers. This is what led to the development of safe migrations. We want to make sure that customers can use our databases in a highly available configuration via production branches while still maintaining the ability to use their existing tools and processes.
While we still strongly recommend turning on safe migrations for production databases, we ultimately want to give you more power to utilize the workflow that makes sense for you while still providing the tools that allow for safer schema changes, should you decide to use them.
Let’s look at a few of the scenarios where safe migrations may benefit users.
Early development and/or low-traffic applications
If you're in the early "move fast and break things" stage of launching your application, then there's a good chance you're still experimenting with the schema of your database. You're optimizing for speed of change rather than safety. In this case, it makes sense to start your application with safe migrations disabled, allowing you to directly iterate on your schema in production.
Laravel, Rails, and other frameworks with built-in migrations
This update can also benefit developers who have an existing preferred schema migration workflow and would like to keep using it. For frameworks such as Rails and Laravel, it's common for developers to run their schema migrations directly against their production database as part of their deployment process. This workflow is fast and convenient, but does have the potential for downtime as your data size grows.
When your application needs more advanced schema migration tooling, you'll be able to enable safe migrations and integrate it into your deployment process.
Check out the Rails workflow we use in-house for migrating our own database with safe migrations.
WordPress
The last scenario where users may see better support is using PlanetScale with some CMS platforms like WordPress.
In many WordPress setups, you might be used to clicking a button in your WordPress dashboard to add or update a plugin, which may require running DDL on your production database. We've found that not having an option for this workflow is a barrier to entry for some WordPress developers, so you can now turn off safe migrations when you need to update a plugin.
Wrap up
Ultimately, we want to unblock and support our users at every point of their journey. PlanetScale is built to accommodate your application or business from pre-launch all the way to millions and millions of users. Our pricing plans reflect that, and now our workflow options do too.
If you have any questions, we'd love to hear from you. You can reach out to our Support team or find us on Twitter @planetscale.]]></content>
        <summary><![CDATA[Learn about our latest update safe migrations and how it affects our branching workflow.]]></summary>
      </entry>
    
      <entry>
        <title>Declarative schema migrations</title>
        <link href="https://planetscale.com/blog/declarative-schema-changes" />
        <id>https://planetscale.com/blog/declarative-schema-changes</id>
        <published>2023-04-05T14:00:00.000Z</published>
        <updated>2023-04-05T14:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[The DevOps world has embraced the concept of Infrastructure as Code (IaC) as a way to define infrastructure in configuration files. These configuration files can then be used with orchestration tools to automatically deploy and configure architecture in the hosting provider of your choice.
As an example, the following code snippet can be used by the AWS Serverless Application Model (SAM) CLI and will deploy a Lambda function to AWS, and configure an API Gateway instance to execute the function over HTTP:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  sam-go-sample

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello-world/
      Handler: hello-world
      Runtime: go1.x
      Events:
        CatchAll:
          Type: Api
          Properties:
            Path: /hello
            Method: GET

Performing the above actions manually, while not prohibitively difficult, would certainly take more time than deploying this configuration with a simple CLI command. This is also a fairly simple example. Consider how much manual effort it would take to configure and deploy 20 Lambda functions!
Declarative SQL Schemas
Several tools can manage your database schema in a very similar way to IaC tools. Using these tools, you can define your SQL schema in a specially-crafted file that the tool can understand, and simply apply the changes using the CLI. For example, the following file can be used by the Atlas CLI to define a schema:
table "hotels" {
  schema = schema.hotels_db
  column "id" {
    null           = false
    type           = int
    unsigned       = true
    auto_increment = true
  }
  column "name" {
    null = false
    type = varchar(50)
  }
  column "address" {
    null = false
    type = varchar(50)
  }
  primary_key {
    columns = [column.id]
  }
}
schema "hotels_db" {
  charset = "utf8mb4"
  collate = "utf8mb4_0900_ai_ci"
}

Making a change to the schema is as simple as modifying the file and applying the changes using the CLI tool.
table "hotels" {
  schema = schema.hotels_db
  column "id" {
    null           = false
    type           = int
    unsigned       = true
    auto_increment = true
  }
  column "name" {
    null = false
    type = varchar(50)
  }
  column "address" {
    null = false
    type = varchar(50)
  }
  # Adding the "stars" column.
  column "stars" {
    null     = true
    type     = float
    unsigned = true
  }
  primary_key {
    columns = [column.id]
  }
}
schema "hotels_db" {
  charset = "utf8mb4"
  collate = "utf8mb4_0900_ai_ci"
}

Refer to our blog post on how to use the Atlas CLI with PlanetScale for more detail.
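To illustrate what a declarative tool does when you apply a changed file, here is a rough sketch that compares the current and desired column definitions and plans the DDL needed to reconcile them. This is not how Atlas works internally; the `plan_changes` helper and the representation of columns as plain strings are assumptions made for this example.

```python
def plan_changes(table: str, current: dict[str, str], desired: dict[str, str]) -> list[str]:
    """Return the ALTER statements that move `current` toward `desired`."""
    statements = []
    for column, definition in desired.items():
        if column not in current:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {column} {definition}")
        elif current[column] != definition:
            statements.append(f"ALTER TABLE {table} MODIFY COLUMN {column} {definition}")
    for column in current:
        if column not in desired:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {column}")
    return statements


# Column definitions loosely mirroring the "hotels" table from the example.
current = {
    "id": "int unsigned NOT NULL AUTO_INCREMENT",
    "name": "varchar(50) NOT NULL",
    "address": "varchar(50) NOT NULL",
}
# The desired state adds the "stars" column.
desired = dict(current, stars="float unsigned NULL")

plan = plan_changes("hotels", current, desired)
print(plan)
```

The key point is that you only describe the end state; the tool works out the statements, including destructive ones such as dropping a column, which is why reviewing the plan before applying it matters.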
Benefits of a declarative approach
Managing schema migrations with this approach has some benefits. The first major benefit is that it fits the Single Source of Truth approach encouraged by DevOps, where there is one place that contains the main file used to control the schema.
It is also easier for developers to read than a series of versioned migrations. In addition to being easier to understand, it may eliminate the need to learn DDL, the subset of SQL used to define the schema. This lowers the barrier to entry for developers who may not be experienced with SQL yet.
Finally, automating the process of applying changes is fairly simple since many of the tools used to apply changes can be scripted. This makes it easy to implement the process of upgrading your schema into your continuous deployment tools.
Drawbacks of this strategy
While eliminating the need to learn DDL can be a benefit, using tools to circumvent the process of learning may act as a crutch for developers.
Conflicting schema definitions are also a concern with this approach. If you consider that multiple developers may be making changes to the schema definition files at the same time on separate machines, you may run into a scenario where one developer's changes will overwrite another's, causing conflicts in what the database schema should be.
It’s also worth considering that databases are inherently stateful, where the data that is stored by the database is just as important as the structure of the database. Because of this, some care needs to be taken when applying changes so there are no undesired results of migrating the schema.
How to use declarative migrations with PlanetScale
The branching flow used by databases hosted in PlanetScale is a form of schema migration in itself. When making changes to a database in PlanetScale, developers will typically create a working branch of the production database branch to make changes to.
A best practice on PlanetScale is to enable safe migrations on your production branch to prevent accidental changes to your database schema. Since branches with safe migrations enabled restrict the use of DDL (something that these tools ultimately use to make changes), the development branch described above is where these tools can be used to control the schema.
One possible strategy that teams can use is to open a new branch each time code changes are required, typically at the beginning of a development cycle. When a change needs to be made to the database schema, a dedicated repository (let’s call it the db repository) can be used for developers to check in changes to the definition file. Automated tools can be used to monitor the db repository for changes, apply the schema changes to the active development branch, and notify the development team that the schema has changed so they can act accordingly.
When changes need to be applied to the production database branch, deploy requests can then be used to review and apply the changes before deploying the latest release.]]></content>
        <summary><![CDATA[Learn how the schema migrations are performed using a single state definition.]]></summary>
      </entry>
    
      <entry>
        <title>Versioned schema migrations</title>
        <link href="https://planetscale.com/blog/versioned-schema-migrations" />
        <id>https://planetscale.com/blog/versioned-schema-migrations</id>
        <published>2023-04-05T14:00:00.000Z</published>
        <updated>2023-04-05T14:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Schema versioning tools have existed long before their declarative counterparts. Instead of having a single file describing the state of the database schema, versioned schema migrations consist of multiple files or scripts that iterate on each other to describe the database as it moves through time. As changes are made to the schema, new files are added to describe those changes. It works very similarly to a system you may already be familiar with: git.
Migration files are typically stored along with the code and, using third-party tooling, are applied to the database incrementally as needed. Those files are usually numbered in the order they need to be applied. The system will use a dedicated table within your database to track which scripts have been applied, and which ones still need to be applied.
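The tracking-table mechanism can be sketched in a few lines. The example below is an assumption-heavy illustration, not any particular tool's implementation: it uses SQLite from Python's standard library instead of MySQL, and the `MIGRATIONS` dictionary stands in for files on disk.

```python
import sqlite3

# Hypothetical migrations keyed by file name; sorting the names gives the order.
MIGRATIONS = {
    "2014_10_12_000000_create_users_table":
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "2023_01_13_000001_add_new_column":
        "ALTER TABLE users ADD COLUMN nickname TEXT",
}


def migrate(conn: sqlite3.Connection) -> list[str]:
    """Apply pending migrations in name order and return the names applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrations (migration TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT migration FROM migrations")}
    ran = []
    for name in sorted(MIGRATIONS):
        if name in applied:
            continue  # already recorded in the tracking table
        conn.execute(MIGRATIONS[name])
        conn.execute("INSERT INTO migrations (migration) VALUES (?)", (name,))
        ran.append(name)
    conn.commit()
    return ran


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies both scripts
second_run = migrate(conn)  # nothing left to apply
print(first_run)
print(second_run)
```

Running the function twice shows why the tracking table matters: the second call finds every script already recorded and applies nothing.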
If you're already well versed in versioned schema migrations and just want to see how they work using PlanetScale, skip to the How to use versioned schema migrations with PlanetScale section.
Example with Laravel and Artisan
The following example uses the default Laravel example application with the artisan command to perform versioned migrations. When the application is scaffolded, a database/migrations folder will be created within the project that contains a base set of migration scripts.

Here are the contents of that first file. It is using PHP to define the structure of a table. When read by artisan, it will be converted to the DDL that is required to create the same structure in MySQL.
# 2014_10_12_000000_create_users_table.php

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up()
    {
        Schema::create('users', function (Blueprint $table) {
            $table->id();
            $table->string('name');
            $table->string('email')->unique();
            $table->timestamp('email_verified_at')->nullable();
            $table->string('password');
            $table->rememberToken();
            $table->timestamps();
        });
    }

    public function down()
    {
        Schema::dropIfExists('users');
    }
};

To create the basic structure of the database, run the following command. Notice how ALL migration scripts within that folder are run sequentially based on the file name.
~❯ ./vendor/bin/sail artisan migrate

# Output:
   INFO  Preparing database.

  Creating migration table .............................. 45ms DONE

   INFO  Running migrations.

  2014_10_12_000000_create_users_table .................. 45ms DONE
  2014_10_12_100000_create_password_resets_table ........ 64ms DONE
  2019_08_19_000000_create_failed_jobs_table ............ 38ms DONE
  2019_12_14_000001_create_personal_access_tokens_table . 44ms DONE

Next, we can explore the structure of the database. Notice how a migrations table exists now and it contains the name of each of the migration scripts, along with a batch number stored in the batch column to signal to artisan that it's been run previously.
mysql> show tables;
+------------------------+
| Tables_in_example_app  |
+------------------------+
| failed_jobs            |
| migrations             |
| password_resets        |
| personal_access_tokens |
| users                  |
+------------------------+
5 rows in set (0.01 sec)

mysql> select * from migrations;
+----+-------------------------------------------------------+-------+
| id | migration                                             | batch |
+----+-------------------------------------------------------+-------+
|  1 | 2014_10_12_000000_create_users_table                  |     1 |
|  2 | 2014_10_12_100000_create_password_resets_table        |     1 |
|  3 | 2019_08_19_000000_create_failed_jobs_table            |     1 |
|  4 | 2019_12_14_000001_create_personal_access_tokens_table |     1 |
+----+-------------------------------------------------------+-------+
4 rows in set (0.01 sec)

Now to upgrade the schema, we can run another migration script that follows the same naming convention as the others. This script will add a nickname column to the users table.
# 2023_01_13_000001_add_new_column.php

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up()
    {
        Schema::table('users', function (Blueprint $table) {
            $table->string('nickname');
        });
    }

    public function down()
    {
        Schema::table('users', function (Blueprint $table) {
            $table->dropColumn('nickname');
        });
    }
};

Now we'll run the same migrate command as before. The output is much shorter since only one script is run.
~❯ ./vendor/bin/sail artisan migrate

   INFO  Running migrations.

  2023_01_13_000001_add_new_column ...................... 32ms DONE

Reviewing the migrations table again shows that the script was run successfully.
mysql> select * from migrations;
+----+-------------------------------------------------------+-------+
| id | migration                                             | batch |
+----+-------------------------------------------------------+-------+
|  1 | 2014_10_12_000000_create_users_table                  |     1 |
|  2 | 2014_10_12_100000_create_password_resets_table        |     1 |
|  3 | 2019_08_19_000000_create_failed_jobs_table            |     1 |
|  4 | 2019_12_14_000001_create_personal_access_tokens_table |     1 |
|  5 | 2023_01_13_000001_add_new_column                      |     2 |
+----+-------------------------------------------------------+-------+

And if we inspect the users table, the nickname column now exists.
mysql> describe users;
+-------------------+-----------------+------+-----+---------+----------------+
| Field             | Type            | Null | Key | Default | Extra          |
+-------------------+-----------------+------+-----+---------+----------------+
| id                | bigint unsigned | NO   | PRI | NULL    | auto_increment |
| name              | varchar(255)    | NO   |     | NULL    |                |
| email             | varchar(255)    | NO   | UNI | NULL    |                |
| email_verified_at | timestamp       | YES  |     | NULL    |                |
| password          | varchar(255)    | NO   |     | NULL    |                |
| remember_token    | varchar(100)    | YES  |     | NULL    |                |
| created_at        | timestamp       | YES  |     | NULL    |                |
| updated_at        | timestamp       | YES  |     | NULL    |                |
| nickname          | varchar(255)    | NO   |     | NULL    |                |
+-------------------+-----------------+------+-----+---------+----------------+

If we wanted to undo the previous migration for whatever reason, the following command essentially executes the down() function from that migration.
~❯ ./vendor/bin/sail artisan migrate:rollback --step=1

   INFO  Rolling back migrations.

  2023_01_13_000001_add_new_column ...................... 41ms DONE

Reviewing the same tables one more time shows that the column has now been removed.
mysql> select * from migrations;
+----+-------------------------------------------------------+-------+
| id | migration                                             | batch |
+----+-------------------------------------------------------+-------+
|  1 | 2014_10_12_000000_create_users_table                  |     1 |
|  2 | 2014_10_12_100000_create_password_resets_table        |     1 |
|  3 | 2019_08_19_000000_create_failed_jobs_table            |     1 |
|  4 | 2019_12_14_000001_create_personal_access_tokens_table |     1 |
+----+-------------------------------------------------------+-------+
4 rows in set (0.01 sec)

mysql> describe users;
+-------------------+-----------------+------+-----+---------+----------------+
| Field             | Type            | Null | Key | Default | Extra          |
+-------------------+-----------------+------+-----+---------+----------------+
| id                | bigint unsigned | NO   | PRI | NULL    | auto_increment |
| name              | varchar(255)    | NO   |     | NULL    |                |
| email             | varchar(255)    | NO   | UNI | NULL    |                |
| email_verified_at | timestamp       | YES  |     | NULL    |                |
| password          | varchar(255)    | NO   |     | NULL    |                |
| remember_token    | varchar(100)    | YES  |     | NULL    |                |
| created_at        | timestamp       | YES  |     | NULL    |                |
| updated_at        | timestamp       | YES  |     | NULL    |                |
+-------------------+-----------------+------+-----+---------+----------------+
8 rows in set (0.01 sec)

Benefits of this strategy
As stated in the previous section, versioned schema migrations have been around for much longer than declarative migrations. This means developers are likely more familiar with how they work and may be more comfortable working in this environment.
Many tools that support versioned migrations support going both directions, upgrading and/or downgrading the schema. This makes reverting changes simpler since a single script will have instructions on performing a downgrade, assuming the developers or database administrators include those details in the migration scripts.
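As a rough sketch of how going in both directions can work, the example below pairs each migration with an "up" and a "down" statement and undoes the most recently applied ones. It is an illustration under assumptions, not artisan's implementation: it uses SQLite instead of MySQL, and the migration names and the `rollback` helper are hypothetical.

```python
import sqlite3

# Hypothetical migrations; each carries instructions for both directions.
MIGRATIONS = {
    "0001_create_users": {
        "up": "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "down": "DROP TABLE users",
    },
    "0002_create_nicknames": {
        "up": "CREATE TABLE nicknames (user_id INTEGER, nickname TEXT)",
        "down": "DROP TABLE nicknames",
    },
}


def migrate(conn):
    """Apply every migration that is not yet in the tracking table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrations "
        "(id INTEGER PRIMARY KEY, migration TEXT UNIQUE)"
    )
    applied = {row[0] for row in conn.execute("SELECT migration FROM migrations")}
    for name in sorted(MIGRATIONS):
        if name not in applied:
            conn.execute(MIGRATIONS[name]["up"])
            conn.execute("INSERT INTO migrations (migration) VALUES (?)", (name,))
    conn.commit()


def rollback(conn, steps=1):
    """Undo the most recently applied migrations, newest first."""
    rows = conn.execute(
        "SELECT id, migration FROM migrations ORDER BY id DESC LIMIT ?", (steps,)
    ).fetchall()
    undone = []
    for row_id, name in rows:
        conn.execute(MIGRATIONS[name]["down"])
        conn.execute("DELETE FROM migrations WHERE id = ?", (row_id,))
        undone.append(name)
    conn.commit()
    return undone


def table_names(conn):
    """List the tables that currently exist in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
migrate(conn)
undone = rollback(conn, steps=1)
print(undone, sorted(table_names(conn)))
```

A rollback is only as good as its "down" statement: if that statement is missing or wrong, the tracking table and the real schema will disagree.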
Finally, it's easier to track incremental changes without using a version control system. Since all of the migration scripts are stored alongside each other, diagnosing migration issues may be a bit more straightforward when compared to the declarative approach.
Drawbacks of this strategy
Since the schema is managed incrementally via scripts, it may be hard to get a full picture of what the database schema looks like at any given point in time. You’d essentially have to replay all of the previous scripts against a live system to see the schema in full.
Depending on the tool, it may not validate the current state of the schema before attempting to apply changes. This can cause major issues if the schema was modified outside of the tool and DDL was issued directly to the database.
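One way to guard against this is to compare the live schema with what the migrations expect before applying anything. The sketch below shows the idea using SQLite's `PRAGMA table_info`; the `check_drift` helper and the expected column list are assumptions for illustration, and a MySQL setup would read `information_schema` instead.

```python
import sqlite3


def actual_columns(conn, table):
    """Read column names from the live database (SQLite's PRAGMA here)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def check_drift(conn, table, expected):
    """Return (missing, unexpected) columns relative to what migrations expect."""
    actual = set(actual_columns(conn, table))
    missing = sorted(set(expected) - actual)
    unexpected = sorted(actual - set(expected))
    return missing, unexpected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
# Simulate DDL issued directly against the database, outside the tool.
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")

missing, unexpected = check_drift(conn, "users", ["id", "name", "email"])
print("missing:", missing, "unexpected:", unexpected)
```

If either list is non-empty, the safest response is usually to stop and investigate instead of applying further scripts on top of an unknown state.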
How to use versioned schema migrations with PlanetScale
How you use versioned migrations on PlanetScale ultimately depends on whether safe migrations is enabled for your production branch.
Without safe migrations
If safe migrations is not enabled for your production branch, versioned migrations work with PlanetScale branches just as they would with any other MySQL environment. Ideally, you would use different database branches to match your different environments. When your code is ready for production, simply run the upgrade command for your migration tool with the connection string for the branch you want the changes applied to, and your tooling should apply the changes.
That said, enabling safe migrations is a best practice: it prevents unintended schema changes and unlocks other useful features.
With safe migrations
When safe migrations is enabled on your production branch, use of branching and deploy requests is enforced to enable zero-downtime migrations, and use of direct DDL is restricted as a result. In this scenario, you would create a development branch, connect your development environment to the PlanetScale development branch, and run your migrations there. Your development branch will now have the updated schema, and is ready to merge into your production database via a PlanetScale deploy request.
Typically when deploy requests are used to merge database branches, it's only the schema that is changed in the target without writing or altering any data. While this may seem like an issue at first (since a table is used to track what changes have been applied), PlanetScale offers a setting in every Vitess database to automatically copy migration data between branches. This can be set to several preconfigured ORMs, or you can provide a custom table name to sync between database branches.
For additional examples of handling versioned schema changes with PlanetScale, see the following blog posts:
Building PlanetScale with PlanetScale
Zero downtime Laravel migrations
]]></content>
        <summary><![CDATA[Learn how the schema migrations are performed iteratively by evolving change scripts.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing the PlanetScale GitHub Actions</title>
        <link href="https://planetscale.com/blog/announcing-the-planetscale-github-actions" />
        <id>https://planetscale.com/blog/announcing-the-planetscale-github-actions</id>
        <published>2023-03-31T09:00:00.000Z</published>
        <updated>2023-03-31T09:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[With our database branching and deploy request workflow, PlanetScale was built with DevOps pipelines in mind. However, anyone who's worked in DevOps long enough knows it is a very "choose your own adventure" practice. While guidelines exist, many companies build their pipelines very differently according to the needs of their products. Integrating PlanetScale into this flow is no exception. Today, we're lowering the barrier to entry by publishing the first wave of official PlanetScale GitHub Actions for you to use in your projects.
What are GitHub Actions?
Before we cover the available Actions, it's worth taking a moment to understand what GitHub Actions is. GitHub Actions allows you to automate processes directly within your repository by defining jobs in YAML that will perform operations based on the triggers you define. These YAML files are referred to as "workflows," whereas the individual operations within are called "steps." Developers are free to write their own steps using Bash or PowerShell manually, or they can search the GitHub Actions Marketplace for a pre-defined step that performs the operations they need without having to build the functionality themselves.
The setup-pscale-action GitHub Action
The planetscale/setup-pscale-action allows you to make pscale, the PlanetScale CLI tool, available within your GitHub Actions workflows.
Once installed, you can use pscale on Linux, Windows, or Mac runners to automate workflows such as:
Creating database branches
Creating branch passwords and connecting to your database
Opening deploy requests
Auto commenting schema diffs on pull requests
and more...
Let's look at some examples.
Create a new database branch when a GitHub branch is created
If your team opens a new database branch whenever a feature branch is created in your GitHub repository, you can use pscale to create a branch and a password. The branch credentials can then be used as environment variables in a preview or staging environment.
- name: Create branch
  env:
    PLANETSCALE_SERVICE_TOKEN_ID: ${{ secrets.PLANETSCALE_SERVICE_TOKEN_ID }}
    PLANETSCALE_SERVICE_TOKEN: ${{ secrets.PLANETSCALE_SERVICE_TOKEN }}
  run: |
    set +e
    pscale branch show ${{ secrets.PLANETSCALE_DATABASE_NAME }} ${{ env.PSCALE_BRANCH_NAME }}
    exit_code=$?
    set -e

    if [ $exit_code -eq 0 ]; then
      echo "Branch exists. Skipping branch creation."
    else
      echo "Branch does not exist. Creating."
      pscale branch create ${{ secrets.PLANETSCALE_DATABASE_NAME }} ${{ env.PSCALE_BRANCH_NAME }} --wait
    fi

Notice that we first check if the branch exists. If it does, we do nothing. Otherwise, we create it and pass the --wait flag.
This is useful when running in CI, as the workflow may run multiple times and you'll want the branch ready if you are running schema migrations immediately after creating the branch.
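The same check-then-create pattern can be expressed outside of a shell step. The sketch below wraps the two pscale commands from the step above in a Python function; the `ensure_branch` helper, the injectable `run` parameter, and the fake runner are assumptions added so the example can run without the CLI installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(args: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(args)).returncode


def ensure_branch(database: str, branch: str, run: Runner = run_command) -> bool:
    """Create the branch only if `pscale branch show` fails; True if created."""
    if run(["pscale", "branch", "show", database, branch]) == 0:
        return False  # branch exists, nothing to do
    code = run(["pscale", "branch", "create", database, branch, "--wait"])
    if code != 0:
        raise RuntimeError(f"branch create failed with exit code {code}")
    return True


# Stand-in for the CLI so the sketch runs without pscale installed.
existing = {"main"}


def fake_run(args: Sequence[str]) -> int:
    action, branch = args[2], args[4]
    if action == "show":
        return 0 if branch in existing else 1
    existing.add(branch)
    return 0


created_first = ensure_branch("mydb", "feature-x", fake_run)
created_again = ensure_branch("mydb", "feature-x", fake_run)
print(created_first, created_again)
```

Because the function does nothing when the branch already exists, it is safe to call on every workflow run.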
Create a password for the branch
You can use pscale password create to generate credentials for your database branch.
- name: Generate password for branch
  env:
    PLANETSCALE_SERVICE_TOKEN_ID: ${{ secrets.PLANETSCALE_SERVICE_TOKEN_ID }}
    PLANETSCALE_SERVICE_TOKEN: ${{ secrets.PLANETSCALE_SERVICE_TOKEN }}
  run: |
    response=$(pscale password create ${{ secrets.PLANETSCALE_DATABASE_NAME }} ${{ env.PSCALE_BRANCH_NAME }} -f json)

    id=$(echo "$response" | jq -r '.id')
    host=$(echo "$response" | jq -r '.access_host_url')
    username=$(echo "$response" | jq -r '.username')
    password=$(echo "$response" | jq -r '.plain_text')
    ssl_mode="verify_identity"  # Assuming a default value for ssl_mode
    ssl_ca="/etc/ssl/certs/ca-certificates.crt"  # Assuming a default value for ssl_ca

    # Set the password ID, allows us to later delete it if wanted.
    echo "PASSWORD_ID=$id" >> $GITHUB_ENV

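    # Create the DATABASE_URL
    database_url="mysql://$username:$password@$host/${{ secrets.PLANETSCALE_DATABASE_NAME }}?sslmode=$ssl_mode&sslca=$ssl_ca"
    # Mask the value before exporting it. Values written to $GITHUB_ENV only
    # reach later steps, so mask the local variable, not $DATABASE_URL.
    echo "::add-mask::$database_url"
    echo "DATABASE_URL=$database_url" >> $GITHUB_ENV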
- name: Use the DATABASE_URL in a subsequent step
  run: |
    echo "Using DATABASE_URL: $DATABASE_URL"

This example shows creating the password and getting back a response in JSON. The JSON is then parsed to create a DATABASE_URL which can be used in later steps, such as using the branch as the database for a preview environment or to connect and run migrations that were included in the GitHub pull request.
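For comparison, here is a minimal sketch of the same parsing step in Python. It reads the JSON fields used above (`access_host_url`, `username`, `plain_text`); the `build_database_url` helper and the sample values are made up for illustration. Percent-encoding the credentials is an addition not present in the shell version, included because special characters in a password can otherwise break URL parsing, and the SSL query parameters are omitted for brevity.

```python
import json
from urllib.parse import quote


def build_database_url(response_json: str, database: str) -> str:
    """Build a mysql:// URL from the JSON printed by `pscale password create`."""
    data = json.loads(response_json)
    username = quote(data["username"], safe="")
    password = quote(data["plain_text"], safe="")  # percent-encode special characters
    host = data["access_host_url"]
    return f"mysql://{username}:{password}@{host}/{database}"


# Made-up sample response; real values come from the CLI.
sample = json.dumps({
    "id": "abc123",
    "access_host_url": "example.invalid",
    "username": "user",
    "plain_text": "p@ss/word",
})

url = build_database_url(sample, "mydb")
print(url)
```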
Open a deploy request
You can use pscale deploy-request create to open a new deploy request from GitHub Actions. This can be useful after running migrations against a branch.
- name: Open DR if migrations
  env:
    PLANETSCALE_SERVICE_TOKEN_ID: ${{ secrets.PLANETSCALE_SERVICE_TOKEN_ID }}
    PLANETSCALE_SERVICE_TOKEN: ${{ secrets.PLANETSCALE_SERVICE_TOKEN }}
  run: pscale deploy-request create ${{ secrets.PLANETSCALE_DATABASE_NAME }} ${{ env.PSCALE_BRANCH_NAME }}

Get deploy request diff and comment on pull request
We can use pscale deploy-request diff to see the full schema diff of a deploy request.
This example is useful when combined with opening a deploy request for a git branch. You can then automatically comment the diff back to the GitHub pull request.
- name: Comment on PR
  env:
    PLANETSCALE_SERVICE_TOKEN_ID: ${{ secrets.PLANETSCALE_SERVICE_TOKEN_ID }}
    PLANETSCALE_SERVICE_TOKEN: ${{ secrets.PLANETSCALE_SERVICE_TOKEN }}
  run: |
    echo "Deploy request opened: https://app.planetscale.com/${{ secrets.PLANETSCALE_ORG_NAME }}/${{ secrets.PLANETSCALE_DATABASE_NAME }}/deploy-requests/${{ env.DEPLOY_REQUEST_NUMBER }}" >> migration-message.txt
    echo "" >> migration-message.txt
    echo "\`\`\`diff" >> migration-message.txt
    pscale deploy-request diff ${{ secrets.PLANETSCALE_DATABASE_NAME }} ${{ env.DEPLOY_REQUEST_NUMBER }}  -f json | jq -r '.[].raw' >> migration-message.txt
    echo "\`\`\`" >> migration-message.txt
- name: Comment PR - db migrated
  uses: thollander/actions-comment-pull-request@v2
  with:
    filePath: migration-message.txt

This writes the diff to the migration-message.txt file and then creates a comment on the pull request that triggered the workflow.
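For readers who prefer to see the message assembly outside of shell, here is a minimal Python sketch of the same steps. It assumes the diff JSON is a list of objects with a raw field, as implied by the jq filter above; the function name build_migration_message and the sample inputs are hypothetical.

```python
import json


def build_migration_message(deploy_request_url: str, diff_json: str) -> str:
    """Build the pull request comment body from the JSON diff output."""
    # The workflow reads the `raw` field of every element (jq -r '.[].raw').
    raw_diffs = [entry["raw"] for entry in json.loads(diff_json)]
    fence = "`" * 3  # Markdown code fence, built here to keep the example readable
    lines = [f"Deploy request opened: {deploy_request_url}", "", f"{fence}diff"]
    lines.extend(raw_diffs)
    lines.append(fence)
    return "\n".join(lines) + "\n"


# Hypothetical inputs, for illustration only.
example = build_migration_message(
    "https://example.test/deploy-requests/1",
    json.dumps([{"raw": "+ADD COLUMN age int"}]),
)
print(example)
```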
How to use the PlanetScale GitHub Actions
To get started using PlanetScale + GitHub Actions, see our full guide and complete page of examples here.]]></content>
        <summary><![CDATA[Easily integrate common PlanetScale operations directly into your GitHub Actions workflows.]]></summary>
      </entry>
    
      <entry>
        <title>Building SaaS applications with PlanetScale + Netlify</title>
        <link href="https://planetscale.com/blog/building-saas-applications-planetscale-netlify" />
        <id>https://planetscale.com/blog/building-saas-applications-planetscale-netlify</id>
        <published>2023-03-30T13:00:00.000Z</published>
        <updated>2023-03-30T13:00:00.000Z</updated>
        
        <author>
          <name>Liz van Dijk</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[PlanetScale and Netlify join forces to cover how to integrate PlanetScale into Netlify Functions for common SaaS application use cases.]]></summary>
      </entry>
    
      <entry>
        <title>How to read MySQL EXPLAINs</title>
        <link href="https://planetscale.com/blog/how-read-mysql-explains" />
        <id>https://planetscale.com/blog/how-read-mysql-explains</id>
        <published>2023-03-29T09:00:00.000Z</published>
        <updated>2023-03-29T09:00:00.000Z</updated>
        
        <author>
          <name>Savannah Longoria</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[In the MySQL world, EXPLAIN is a keyword used to gain information about query execution. This blog post will demonstrate how to utilize MySQL EXPLAIN to remedy problematic queries.
On the Technical Solutions team here at PlanetScale, we frequently talk with users who seek advice regarding query performance. Although creating an EXPLAIN plan is relatively simple, the output isn’t exactly intuitive. It’s essential to understand its features and how to leverage it best to achieve performance goals.
EXPLAIN vs. EXPLAIN ANALYZE
When you prepend the EXPLAIN keyword to the beginning of a query, it explains how the database executes that query and the estimated costs. By leveraging this internal MySQL tool, you can observe the following:
The ID of the query — The column always contains a number, which identifies the SELECT to which the row belongs.
The SELECT_TYPE — If you are running a SELECT, MySQL divides SELECT queries into simple and primary (complex) types, as described in the table below.
SELECT_TYPE value: Definition
SIMPLE: The query contains no subqueries or UNIONs.
PRIMARY (complex): Complex types can be grouped into three broad classes: simple subqueries, derived tables (subqueries in the FROM clause), and UNIONs.
DELETE: If you are explaining a DELETE, the select_type will be DELETE.
The table on which your query was running
Partitions accessed by your query
Types of JOINs used (if any) — Please keep in mind that this column gets populated even on queries that don’t have joins.
Indexes from which MySQL could choose
Indexes MySQL actually used
The length of the index chosen by MySQL — When MySQL chooses a composite index, the length field is the only way you can determine how many columns from that composite index are in use.
The number of rows accessed by the query — When designing indexes inside of your database instances, keep an eye on the rows column too. This column displays how many rows MySQL accessed to complete a request, which can be useful when designing indexes. The fewer rows your query accesses, the faster your queries will be.
Columns compared to the index
The percentage of rows filtered by a specified condition — This column shows a pessimistic estimate of the percentage of rows that will satisfy some condition on the table, such as a WHERE clause or a join condition. If you multiply the rows column by this percentage, you will see the number of rows MySQL estimates it will join with the previous tables in the query plan.
Any extra information relevant to the query
To recap, by using EXPLAIN, you get the list of things expected to happen.
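The arithmetic behind the rows and filtered columns can be sketched in a few lines of Python. The function name and the sample numbers are hypothetical; only the formula (rows multiplied by the filtered percentage) comes from the description above.

```python
def estimated_joined_rows(rows: int, filtered: float) -> float:
    """Rows MySQL estimates will be joined with the previous tables.

    `rows` and `filtered` are the values from the EXPLAIN columns of the
    same names; `filtered` is a percentage from 0 to 100.
    """
    return rows * filtered / 100.0


# Hypothetical plan row: 299,202 rows examined, 10% expected to pass the filter.
print(estimated_joined_rows(299_202, 10.0))
```

A low filtered value next to a large rows value is a hint that an index on the filtering column may be worth considering.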
What is EXPLAIN ANALYZE
In MySQL 8.0.18, EXPLAIN ANALYZE was introduced, a new concept built on top of the regular EXPLAIN query plan inspection tool. In addition to the query plan and estimated costs, which a normal EXPLAIN will print, EXPLAIN ANALYZE also prints the actual costs of individual iterators in the execution plan.
EXPLAIN ANALYZE actually runs the query, so if you don’t want to run the query against your live database, do not use EXPLAIN ANALYZE.
For each iterator, the following information is provided:
Estimated execution cost (the cost model does not account for some iterators, so they aren’t included in the estimate)
Estimated number of returned rows
Time to return first row
Time spent executing this iterator (including child iterators, but not parent iterators), in milliseconds. When there are multiple loops, this figure shows the average time per loop.
Number of rows returned by the iterator
Number of loops

If you use EXPLAIN ANALYZE before a statement, you get both the estimation of what the planner expected (highlighted in yellow above) and what actually happened when the query was run (highlighted in green above).
EXPLAIN ANALYZE formats
EXPLAIN ANALYZE can be used with SELECT statements, with multi-table UPDATE and DELETE statements, and, as of MySQL 8.0.19, with TABLE statements.
It automatically selects FORMAT=tree and executes the query (with no output to the user). It focuses on how the query is executed in terms of the relationship between parts of the query and the order in which the parts are executed.
In this case, EXPLAIN output is organized into a series of nodes. At the lowest level, the nodes scan the tables or search indexes. Higher-level nodes take the operations from the lower-level nodes and operate on them.
Although the MySQL CLI can print EXPLAIN results in table, tabbed, vertical format, or as pretty or raw JSON output, raw JSON format is not supported for EXPLAIN ANALYZE today.
When to use MySQL EXPLAIN or EXPLAIN ANALYZE
EXPLAIN queries can (and should) be used when you are unsure whether your query is performing efficiently. So, if you think you have indexed and partitioned your tables properly, but your queries still refuse to run as fast as you want them to, it might be time to tell them to EXPLAIN themselves. Once you tell your queries to EXPLAIN themselves, the output you should keep an eye on will depend on what you want to optimize.
Keys, possible keys, and key lengths: When working with indexes in MySQL, keep an eye on the possible_keys, key, and key_len columns. The possible_keys column tells us what indexes MySQL could potentially use. The key column tells us what index was chosen. And the key_len column tells us the length of the selected key (index). This information can be handy for designing our indexes, deciding what index to use on a specific workload, and dealing with index-related challenges like choosing an appropriate length for a covering index.
Fulltext index + JOIN: If you want to confirm that a JOIN operation is using a FULLTEXT index, keep an eye on the type column; the value of this column should be fulltext.
Partitions: If you have added partitions to your table and want to ensure that partitions are used by the query, observe the partitions column. If your MySQL instance is using partitions, in most cases, MySQL chooses the partitions itself, and you do not have to take any further action, but if you want your queries to use specific partitions, you could use queries like SELECT * FROM TABLE_NAME PARTITION(p1,p2).
We already have some great resources about indexing best practices:
Indexing JSON in MySQL
What are the disadvantages of database indexes?
MySQL for Developers video course: Indexes
How do database indexes work
EXPLAIN limitations
EXPLAIN is an approximation. Sometimes it’s a good approximation, but at other times, it can be very far from the truth. Let's look at some of the limitations:
EXPLAIN doesn’t tell you anything about how triggers, stored functions, or UDFs will affect your query.
It doesn’t work for stored procedures.
It doesn’t tell you about the optimization MySQL does during query execution.
Some of the statistics it shows are estimates and can be very inaccurate.
It doesn’t distinguish between some things with the same name. For example, it uses “filesort” for in-memory sorts and on-disk sorts, and it displays “Using temporary” for temporary tables on disk and in memory.
PlanetScale does not support Triggers, Stored Procedures, and UDFs. More information can be found in the MySQL compatibility docs.
SHOW WARNINGS statement
One thing worth noting: If the query you used with EXPLAIN does not parse correctly, you can type SHOW WARNINGS; into your MySQL query editor to show information about the last statement that was run and was not diagnostic. While it cannot give a proper query execution plan like EXPLAIN, it might give hints about the query fragments it could process.
SHOW WARNINGS; includes special markers which can deliver useful information, such as:
<index_lookup>(query fragment): An index lookup would happen if the query had been properly parsed.
<if>(condition, expr1, expr2): An if condition is occurring in this specific part of the query.
<primary_index_lookup>(query fragment): An index lookup would be happening via primary key.
<temporary table>: An internal table would be created here for saving temporary results — for example, in subqueries prior to joins.
MySQL EXPLAIN join types
The MySQL manual says this column shows the “join type”, which explains how tables are joined, but it’s really more accurate to say the "access type". In other words, this “type” column lets us know how MySQL has decided to find rows in the table. Below are the most important access methods, from best to worst, in terms of performance:

Type value: Definition
🟢 NULL: This access method means MySQL can resolve the query during the optimization phase and will not even access the table or index during the execution stage.
🟢 system: The table is empty or has one row.
🟢 const: The value of the column can be treated as a constant (there is one row matching the query). Note: Primary key lookup, unique index lookup.
🟢 eq_ref: One row is read from this table for each combination of rows from the previous tables, using all parts of an index that is either a PRIMARY KEY or a UNIQUE INDEX with all key columns defined as NOT NULL.
🟢 ref: The indexed column was accessed using an equality operator. Note: The ref_or_null access type is a variation on ref. It means MySQL must do a second lookup to find NULL entries after doing the initial lookup.
🟡 fulltext: Operation (JOIN) is using the table’s fulltext index.
🟡 range: A range scan is a limited index scan. It begins at some point in the index and returns rows that match a range of values. Note: This is better than a full index scan because it doesn’t go through the entire index.
🟡 index: The entire index is scanned to find a match for the query. Note: The main advantage is that this avoids sorting. The biggest disadvantage is the cost of reading an entire table in index order. This usually means accessing the rows in random order, which is very expensive.
🔴 ALL: MySQL scans the entire table to satisfy the query.
Green indicates better performance, yellow indicates okay performance, and red indicates bad performance.
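One way to put this ranking to work is to flag the least efficient access type in a plan. The sketch below is a hypothetical helper, not part of MySQL or PlanetScale; it lists the access types from the table above from best to worst, with range placed ahead of index in line with the note that a range scan beats a full index scan.

```python
# Access types ordered from best to worst.
ACCESS_TYPES_BEST_TO_WORST = [
    "NULL", "system", "const", "eq_ref", "ref",
    "fulltext", "range", "index", "ALL",
]


def worst_access_type(types):
    """Return the least efficient access type among the rows of an EXPLAIN plan.

    Raises KeyError for access types that are not in the ranking above.
    """
    rank = {name.lower(): i for i, name in enumerate(ACCESS_TYPES_BEST_TO_WORST)}
    # A SQL NULL usually arrives as None from a database driver.
    normalized = ["null" if t is None else str(t).lower() for t in types]
    return max(normalized, key=lambda t: rank[t])


# Hypothetical type values taken from a three-row plan.
print(worst_access_type(["const", "ref", "ALL"]))
```

If the helper reports all or index for a query on a large table, that row of the plan is usually the first place to look.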
There are also a few other types that you might want to be aware of:
index_merge: This join type indicates that the Index Merge optimization is used. In this case, the key column in the output row contains a list of indexes used. It indicates a query can make limited use of multiple indexes on a single table.
unique_subquery: This type replaces eq_ref for some IN subqueries of the following form: value IN (SELECT primary_key FROM single_table WHERE some_expr)

index_subquery: This join type is similar to unique_subquery. It replaces IN subqueries, but it works for nonunique indexes in subqueries.
The EXTRA column in MySQL EXPLAIN
The EXTRA column in a MySQL EXPLAIN output contains extra information that doesn’t fit into other columns. The most important values you might frequently run into are as follows:
EXTRA column value: Definition
Using index: Indicates that MySQL will use a covering index to avoid accessing the table.
Using where: The MySQL server will post-filter rows after the storage engine retrieves them.
Using temporary: MySQL will use a temporary table while sorting the query’s result.
Using filesort: MySQL will use an external sort to order the results, instead of reading the rows from the table in index order. MySQL has two filesort algorithms. Either type can be done in memory or on disk. EXPLAIN doesn’t tell you which type of filesort MySQL will use, and it doesn’t tell you whether the sort will be done in memory or on disk.
Range checked for each record (index map: N): This value means there’s no good index, and the indexes will be reevaluated for each row in a join. N is a bitmap of the indexes shown in possible_keys and is redundant.
Using index condition: Tables are read by accessing index tuples and testing them first to determine whether to read full table rows.
Backward index scan: MySQL uses a descending index to complete the query.
const row not found: The queried table was empty.
Distinct: MySQL is scouring the database for any distinct values that might appear in the column.
No tables used: The query has no FROM clause.
Using index for group-by: MySQL was able to use a certain index to optimize GROUP BY operations.
Hands-on example of how to use MySQL EXPLAIN
In this section, we will explore one way you can utilize MySQL EXPLAIN for query optimizations. To start, I created a database in PlanetScale and seeded it using the MySQL Employees Sample Database.
PlanetScale is a hosted MySQL database platform that makes it easy to spin up a database, connect to your application, and get running quickly. With PlanetScale, you can create branches to test schema changes before deploying to production. This development environment, paired with some of our other tools, like Insights for query monitoring, gives you a great way to test and debug queries, leading to better performance and a faster application.
Sign up for a PlanetScale account.

Confirm that the database is created and seeded
Now that we have our database, let’s run some queries.
First, we’ll want to confirm that our tables are in PlanetScale. We can do this by running SHOW TABLES; in the PlanetScale CLI or web UI. For this example, I will be utilizing our web UI.

Run the initial query
In this example, we will pair MySQL EXPLAIN with a multi-column index. A multi-column index stores values for multiple columns in a single index, allowing the database engine to more quickly and efficiently execute queries that filter on that set of columns together.
Queries that are great candidates for performance optimization often use multiple conditions in the WHERE filtering clause. An example of this kind of query is asking the database to find a person by both their first and last name: SELECT * FROM employees WHERE last_name = 'Puppo' AND first_name = 'Kendra';


Okay, so we know that this result isn’t ideal because it’s scanning 299,202 rows to complete the request, as shown under rows in the screenshot above. How do we go about optimizing it? We have a few different routes we can take, but only one is ideal for cost and performance.
Optimization approach 1: Create two individual indexes
For our first approach, let's create two individual indexes — one on the last_name column and another on the first_name column.
This may seem like an ideal route at first, but there's a problem.
If you create two separate indexes in this way, MySQL knows how to find all employees named Puppo. It also knows how to find all employees named Kendra. However, it doesn't know how to find people named Kendra Puppo.
Some other things to keep in mind:
MySQL has choices available when dealing with multiple disjointed indexes and a query asking for more than one filtering condition.
MySQL supports Index Merge optimizations to use multiple indexes jointly when running a query. Even so, assuming that separate indexes will not combine well is a good rule of thumb when building indexes: MySQL may decide not to use multiple indexes, and even if it does, in many scenarios they won’t serve the purpose as well as a dedicated index.
Optimization approach 2: Use a multi-column index
Because of the issues with the first approach, we know we need to find a way to use indexes that consider many columns in this second approach. We can do this with a multi-column index.
You can imagine this as a phone book placed inside another. First, you look up the last name Puppo, leading you to a second catalog of all the people named Puppo, organized alphabetically by first name, which you can use to find Kendra quickly.
In MySQL, to create a multi-column index for last names and first names in the employees table, execute the following:
CREATE INDEX fullnames ON employees(last_name, first_name);


Now that we have successfully created an index, we will issue the SELECT query to find rows with the first name matching Kendra and the last name matching Puppo. The result is a single row with an employee named Kendra Puppo.
Now, use the EXPLAIN query to check whether the index was used:

These results show that the index was used, and only one row was accessed to fulfill this request. This is much better than the 299,202 rows we needed to access before the index.
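To experiment with the before-and-after effect without a MySQL server, the following sketch reproduces the idea with Python's built-in sqlite3 module. SQLite is only a stand-in here: its EXPLAIN QUERY PLAN output looks different from MySQL's EXPLAIN, and the three sample rows are hypothetical rather than taken from the Employees Sample Database.

```python
import sqlite3

# SQLite stands in for MySQL so the example runs anywhere; its
# EXPLAIN QUERY PLAN output is formatted differently from MySQL's EXPLAIN.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees (first_name, last_name) VALUES (?, ?)",
    [("Kendra", "Puppo"), ("Kendra", "Smith"), ("Alex", "Puppo")],
)

query = "SELECT * FROM employees WHERE last_name = 'Puppo' AND first_name = 'Kendra'"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)  # no usable index yet, so the table is scanned
conn.execute("CREATE INDEX fullnames ON employees(last_name, first_name)")
plan_after = plan(query)   # the planner can now search the composite index

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the table, while the second names the fullnames index, mirroring the change seen in the MySQL EXPLAIN output.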
Conclusion
The EXPLAIN statement in MySQL can be used to obtain information about query execution. It is valuable when designing schemas or indexes and ensuring that our database can use the features provided by MySQL to the greatest extent possible.
In PlanetScale, our Insights feature + EXPLAIN statement in MySQL can be of massive assistance when you need to optimize the performance of your queries.]]></content>
        <summary><![CDATA[Learn how to read the output in MySQL EXPLAIN plans so you can utilize them to improve query performance.]]></summary>
      </entry>
    
      <entry>
        <title>Connection pooling in Vitess</title>
        <link href="https://planetscale.com/blog/connection-pooling" />
        <id>https://planetscale.com/blog/connection-pooling</id>
        <published>2023-03-27T09:00:00.000Z</published>
        <updated>2023-03-27T09:00:00.000Z</updated>
        
        <author>
          <name>Harshit Gangal</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Connection pooling is a commonly used technique in modern applications to manage database connections efficiently. It involves creating a cache of database connections that the application can use as needed. Instead of creating a new connection for each request to the database, the application retrieves a connection from the pool. After the application finishes using the connection, it is returned to the pool to be reused later, rather than being closed outright.

Benefits of connection pooling
Using connection pooling in your application offers several advantages:
Performance
Connection pooling reduces the overhead of establishing new database connections. Connections are reused instead of being created and closed for each request. This is especially useful for applications that require frequent, small interactions with the database.

The diagram above illustrates a typical MySQL SSL connection establishment phase when an application connects to a database over a network. This initial handshake phase can add up to 50ms of overhead. However, by implementing connection pooling, applications can significantly reduce their response time per request by saving the 50ms overhead. This improvement in performance can greatly benefit the overall functionality of the applications.
Scalability
Connection pooling significantly improves an application’s ability to handle a large number of concurrent connections. By reusing existing connections, the overhead of connection establishment is removed, freeing up the CPU for other tasks. This increases the number of concurrent requests the application can handle.
Traffic shaping
Connection pooling helps manage database resources by limiting the number of active connections. When there is a load spike at the application layer, the connection pool will throttle some of the requests and make them wait before allocating a connection for them to use. This prevents the database from being overwhelmed, which can lead to degraded performance or crashes. The database can continue to serve at maximum capacity utilization.
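The reuse and throttling behavior described in this section can be sketched with a small bounded pool. This is a minimal illustration built on Python's standard queue module, not the implementation used by any particular driver or by Vitess; FakeConnection and ConnectionPool are hypothetical names.

```python
import queue


class FakeConnection:
    """Stand-in for a real database connection (hypothetical, for illustration)."""

    def __init__(self, conn_id):
        self.conn_id = conn_id


class ConnectionPool:
    """A bounded pool: at most `size` connections ever exist."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for conn_id in range(size):
            self._idle.put(FakeConnection(conn_id))

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse instead of closing it.
        self._idle.put(conn)


pool = ConnectionPool(size=2)
first = pool.acquire()
second = pool.acquire()
try:
    pool.acquire(timeout=0.01)  # pool exhausted: this caller has to wait
    throttled = False
except queue.Empty:
    throttled = True
pool.release(first)
reused = pool.acquire()  # gets the connection that was just released
```

The third acquire call times out because both connections are in use, which is the throttling effect; once a connection is released, the next caller reuses it rather than opening a new one.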
When connection pooling isn’t enough
Connection pooling at the application level is a useful tool, but it has limitations in solving all scalability problems. While it manages and reuses connections, it does not inherently scale the database to handle the increased load. When thousands of concurrent requests are made, the database’s resources, such as CPU, memory, and disk I/O, can be overwhelmed, resulting in performance degradation or database crashes.
As the scale of the application grows, its load increases, and it becomes necessary to deploy it on multiple servers. However, as the number of servers increases from a few to hundreds or thousands, this can potentially overload the database. Moreover, new applications connecting to the same database can also overwhelm it.

Inefficient use of database connections can also occur at the application-level connection pooling. The application servers may not always be equally loaded, and the application might not have correctly capped the database connections, resulting in a waste of connections. As a result, the application servers that require those connections may not be able to acquire them when the database connection limits are reached.
Connection pooling at scale
In 2010, YouTube encountered similar challenges, leading to the development of Vitess and its first component, Vttablet. Vttablet acted as a MySQL proxy and was primarily responsible for managing the connection pool. By allowing client applications to connect only to Vttablet, the need for a connection pool at the application level was eliminated. This meant that connections could be centrally managed in Vttablet, with the maximum number of allowed connections being configurable in Vttablet, rather than growing unbounded as the number of applications increased. This significantly reduced the strain on the database and improved scalability.

To handle concurrent requests at scale, the connection pool implementation in Vttablet was designed to be lockless, using atomic operations and non-blocking data structures with lock-free algorithms. This approach enables Vitess to efficiently manage large numbers of concurrent requests, further improving its scalability and performance.
Connection settings
MySQL provides a wide range of session-level system settings that can be adjusted for each connection. However, when using connection pooling, all connections in the pool share the same settings. Any modifications made to a connection will render it unusable for other requests, as it becomes "tainted". Therefore, either settings changes made to a connection must be restored to their original values or the connection should be closed after the operation is complete to ensure the stability of the connection pool. It is important to consider the potential impact of frequent modifications to connection settings, as these can degrade performance with increasing numbers of requests.
For a very long time, Vitess did not honor the modification of system settings on connections. If provided, they were ignored in order to preserve the benefits of connection pooling. However, when Vitess began supporting the MySQL protocol, different ORMs (object-relational mappers) for multiple languages could connect to Vitess using the language’s default MySQL connector. These ORMs issue SET statements to change the connection settings at the beginning of the connection, and they expect these connections to behave in a certain way.
To support these connection-level settings, Vitess had to deviate from its original connection pooling method. In Vitess release v7.0, it allowed system settings to be modified on the connection. This makes the connection reserved for the application session and it cannot be returned to the connection pool. As the connection is taken out of the pool and cannot be returned, this turns off the connection pooling benefits for that session. As these kinds of reserved connections are no longer part of the connection pool, they can grow without any upper limit on their numbers, eventually leading to MySQL running out of connections and making the database unavailable for application use.
To limit the impact of reserved connections on the total number of connections to MySQL, Vitess used a few techniques before the release of v15.0:
Limiting the impact of reserved connections: Technique 1
When an application issues a SET statement on a connection to modify the system settings, Vitess first validates the current system settings for those variables. If the desired connection settings are already identical to the MySQL settings, the SET statement is ignored by Vitess, and the connection is not reserved for the session.
For instance, let’s consider an application sending a query such as set unique_checks = 0. Vitess will then send a query select 0 from dual where @@unique_checks != 0 to MySQL. If the query returns a row, it means that the connection setting is being modified, the session will be marked to use a reserved connection, and the new setting will be applied to the connection. Otherwise, a reserved connection is not required and the SET statement can be ignored.
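The check described above can be sketched as two small helpers. The query text follows the example in this section; the function names are hypothetical, the value is interpolated without any quoting or escaping, and this is only an illustration of the idea, not Vitess's actual code.

```python
def build_setting_check_query(variable: str, value) -> str:
    """Build a query that returns a row only if the session value differs."""
    return f"select 0 from dual where @@{variable} != {value}"


def needs_reserved_connection(rows) -> bool:
    """A returned row means the SET statement would actually change the setting."""
    return len(rows) > 0


print(build_setting_check_query("unique_checks", 0))
```

If MySQL returns no rows for the generated query, the SET statement is a no-op and the session can keep using pooled connections.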
Limiting the impact of reserved connections: Technique 2
MySQL 8.0 provides the capability to modify in-memory system settings temporarily for a query’s duration using SQL comments through the SET_VAR hint. This query hint sets the session value of a system variable temporarily and does not "taint" the connection when the settings are applied, making it possible to reuse the connection.
Building on the previous example, once Vitess recognizes that the unique_checks setting is being altered, all subsequent queries within that session are rewritten. For example, the query insert into user (id, name) values (1, 'foo') will be rewritten as insert /*+ SET_VAR(unique_checks=0) */ into user (id, name) values (1, 'foo').
However, it’s essential to note that not all settings can be used with SET_VAR. For those that are not permitted, reserved connections must still be used.
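To make the rewrite concrete, here is a deliberately naive Python sketch that inserts the hint after the first keyword of a query. It works on plain strings and assumes the query starts with its statement keyword; a real implementation such as Vitess's would operate on a parsed query, and add_set_var_hint is a hypothetical name.

```python
def add_set_var_hint(query: str, settings: dict) -> str:
    """Insert SET_VAR optimizer hints after the first keyword of the query."""
    hints = " ".join(f"SET_VAR({name}={value})" for name, value in settings.items())
    keyword, _, rest = query.strip().partition(" ")
    return f"{keyword} /*+ {hints} */ {rest}"


print(add_set_var_hint(
    "insert into user (id, name) values (1, 'foo')",
    {"unique_checks": 0},
))
```

Because the hint only applies for the duration of the statement, the connection's session settings are left unchanged afterwards.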
By utilizing the techniques mentioned above, we have reduced the use of reserved connections to the extent possible, thus retaining the advantages of using a connection pool. However, due to the limited number of system settings that can be used with SET_VAR, there are still system settings that must be applied to the connection, causing the connection to be pulled out of the connection pool and leading to degraded database performance.
When the connection settings feature was launched, Vitess users were advised to use it sparingly. We recommended setting MySQL default settings to align with the ORM’s SET statements to minimize the possibility of using reserved connections and to avoid the issue of running out of connections. However, over time, Vitess users discovered that this approach is not always feasible, especially when multiple applications with different ORMs are running on a single Vitess cluster. Each ORM may set a different value for the same setting, making the MySQL default settings ineffective, resulting in a high number of reserved connections being used. Therefore, this issue needs to be addressed at the Vitess level.
Settings pool
In Vitess 15, a new connection pool, the "settings pool", was introduced. The settings pool can handle modified connections without compromising the benefits of connection pooling. Vitess now tracks and manages connections in which system settings have been modified. This process is transparent to the application but provides all the advantages of connection pooling while still allowing per-connection settings. When an application submits a query to Vitess for execution, Vitess can retrieve the correct connection from the connection pool, with or without settings applied, based on the settings specified by that application on that session, and then execute the query.
Currently, this feature is behind a flag and can be enabled using queryserver-enable-settings-pool in Vttablet.
At PlanetScale, we have started to roll out this feature and are already seeing improvements in query latency and load on Vttablet for customers who previously relied on reserved connections due to their application ORMs.
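The core idea of a settings pool, keeping idle connections grouped by the settings already applied to them, can be sketched as follows. This is a simplified, unbounded, single-threaded illustration under assumed names (SettingsAwarePool, dictionaries standing in for connections); it does not reflect how Vitess implements the feature.

```python
class SettingsAwarePool:
    """Keeps idle connections grouped by the session settings applied to them."""

    def __init__(self):
        self._idle = {}      # settings key -> list of idle connections
        self._next_id = 0

    @staticmethod
    def _key(settings):
        # A hashable, order-independent key for a dict of settings.
        return frozenset(settings.items())

    def acquire(self, settings):
        idle = self._idle.get(self._key(settings), [])
        if idle:
            return idle.pop()  # reuse a connection that already has these settings
        # Otherwise "open" a new connection and apply the settings once.
        self._next_id += 1
        return {"id": self._next_id, "settings": dict(settings)}

    def release(self, conn):
        self._idle.setdefault(self._key(conn["settings"]), []).append(conn)


pool = SettingsAwarePool()
modified = pool.acquire({"unique_checks": 0})
pool.release(modified)
again = pool.acquire({"unique_checks": 0})  # same settings: the connection is reused
plain = pool.acquire({})                    # no settings: a different connection
```

A session that changed unique_checks gets back a connection with that setting already applied, while sessions without modified settings never see it.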

]]></content>
        <summary><![CDATA[Connection pooling reduces the overhead of establishing new database connections. Learn how connection pooling works and how it is handled in Vitess.]]></summary>
      </entry>
    
      <entry>
        <title>How to Upgrade from MySQL 5.7 to 8.0</title>
        <link href="https://planetscale.com/blog/upgrading-to-mysql-8" />
        <id>https://planetscale.com/blog/upgrading-to-mysql-8</id>
        <published>2023-03-24T09:00:00.000Z</published>
        <updated>2023-03-24T09:00:00.000Z</updated>
        
        <author>
          <name>JD Lien</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Although MySQL 8 was released back in 2018, a significant share of MySQL servers out there are still running MySQL 5.x. MySQL 5 had a lengthy run from its release in 2005, and thus many organizations still have databases that were built on 5.x. But Oracle has been phasing out MySQL 5.7 support for various platforms over the past few years and end of life for MySQL 5.7 is slated for October 2023.
If you’re still running a database on MySQL 5.7, it’s time to seriously consider upgrading. You'll get several new features that give you performance improvements and security enhancements, so it is important that you do this soon, especially with the imminent end-of-life of MySQL 5.7, which means there will be no further security updates. Fortunately, this process is usually pretty straightforward, but there are several changes you may have to make. This article will cover many of the things that you should look out for when upgrading an existing database from MySQL 5.7 to 8 and walk you through the process of changing your database to be compatible with the new version.
Here's what we'll cover:
Before you upgrade
Character sets and collations
How to upgrade your database to the utf8mb4 character set and utf8mb4_0900_ai_ci collation
Obsolete data types
Authentication Changes
New reserved words
SQL mode changes
C-style operators
Server error codes
Upgrading MySQL versions with no downtime
Performing the no-downtime upgrade
Conclusion
Before you upgrade
Before you upgrade, you should make sure that you have a backup of your database. Furthermore, you should ensure that your backup works. Many seasoned IT pros have not-so-fond memories of restoring a database from a backup only to find that it was corrupted or not backing up what they thought it was. If you are using a cloud service like Amazon RDS, you can use the automated backup feature to create a snapshot of your database. If you are running your own database server, you can use the mysqldump command to create a backup of your database.
Character sets and collations
MySQL 8 has changed how character sets and collations work. The character set determines how characters are stored in the database, while the collation determines how characters are compared.
In previous versions of MySQL, latin1 and utf8 (with 3-byte characters) were commonly used. In MySQL 5.7, the default character set was latin1, and the default collation for utf8mb4 was utf8mb4_general_ci. In MySQL 8, however, the default character set is utf8mb4, and the default collation is utf8mb4_0900_ai_ci. utf8mb4 is a more robust version of utf8 that supports 4-byte characters. The 0900 in the collation name indicates that it is using the Unicode 9.0 standard. The ai indicates that it is using accent-insensitive collation, and the ci means that it is case-insensitive.
When upgrading to MySQL 8, it's a good idea to change your character set and collation to utf8mb4 and utf8mb4_0900_ai_ci, respectively. This will ensure that your database is compatible with the new version of MySQL, and will allow your database to support more characters, such as emojis.
If you need a refresher on charsets and collations, you can check out our free video on strings in MySQL, which covers this topic in more detail.
How to upgrade your database to the utf8mb4 character set and utf8mb4_0900_ai_ci collation
Step 1: Change the default character set and collation for the database
To change the default character set and collation for the database, you can use the ALTER DATABASE statement. For example, to change the default character set and collation for the my_database database to utf8mb4 and utf8mb4_0900_ai_ci respectively, you would use the following statement:
ALTER DATABASE my_database CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Step 2: Change the character set and collation for each table
To change the character set and collation for each table, you can use the ALTER TABLE statement. For example, to change the character set and collation for the my_table table to utf8mb4 and utf8mb4_0900_ai_ci respectively, you would use the following statement:
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
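With many tables, writing the Step 2 statement by hand for each one gets tedious. The following Ruby sketch generates one statement per table name. The conversion_statements helper and the table names are hypothetical, and the output is meant to be reviewed before it is run against a database.

```ruby
# Hypothetical helper: builds one ALTER TABLE statement per table name.
CHARSET = "utf8mb4"
COLLATION = "utf8mb4_0900_ai_ci"

def conversion_statements(tables, charset: CHARSET, collation: COLLATION)
  tables.map do |table|
    # Backtick-quote the table name, doubling any backticks inside it.
    quoted = "`#{table.gsub('`', '``')}`"
    "ALTER TABLE #{quoted} CONVERT TO CHARACTER SET #{charset} COLLATE #{collation};"
  end
end

# Example table names (assumptions for illustration only).
puts conversion_statements(%w[users orders])
```

Keep in mind that CONVERT TO rewrites the table's data, so it is worth trying on a copy of a large table first.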

Obsolete data types
While MySQL didn't remove any data types, there are a few that are no longer recommended for use.
Some of these are:
YEAR(2): This stored a two-digit year. It is recommended to store year values as YEAR(4), which uses four digits.
ENUM: You could create a field with a defined list of allowed values using an ENUM. While still available in MySQL 8, it is no longer recommended. Instead, it is ideal to store enumerated values in a lookup table with foreign keys.
TINYTEXT, MEDIUMTEXT, and LONGTEXT: While these text types are available in MySQL 8, it is recommended to use VARCHAR with a specified length, e.g., VARCHAR(255), or TEXT for long strings of text (e.g., longer than 255 characters) where you won't need to search for a specific substring.
NATIONAL (NCHAR and NVARCHAR): While these types are still available in MySQL 8, they imply the deprecated utf8mb3 character set and are no longer recommended for use. Instead, the recommended approach for specifying character sets and collations is to use the CHARACTER SET and COLLATE options in the column definition or table definition.
Authentication Changes
MySQL 8 has changed how authentication works. The most significant change is that the default authentication plugin is now caching_sha2_password instead of mysql_native_password. This means that if you are using the default authentication plugin, you will need to update your connection strings to use the new plugin.
Legacy accounts that use the old authentication plugin must be converted to the new one using the ALTER USER statement. It is also important to update any client applications that interact with the database to support the new authentication mechanism. Finally, thorough testing should be carried out to ensure that the database is functioning correctly with the new authentication plugin.
New reserved words
MySQL 8 has added a number of new reserved words. These are words that cannot be used as identifiers (e.g., table names or column names). If you are using any of these words as identifiers, you will need to change them to something else or ensure that you are quoting them. For the full list, see the MySQL documentation on new reserved words in MySQL 8. A few examples of these new reserved words are:
ACTIVE
ADMIN
ATTRIBUTE
COMPONENT
DEFINITION
DESCRIPTION
EMPTY
EXCLUDE
FINISH
GROUPS
INACTIVE
INITIAL
LEAD
LOCKED
MEMBER
NESTED
OFF
OLD
ORGANIZATION
OTHERS
OVER
PATH
PROCESS
RANDOM
RANK
RESOURCE
RETURNING
REUSE
ROLE
SKIP
SRID
STREAM
SYSTEM
TIES
URL
VISIBLE
ZONE
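If renaming is not practical, quoting is the other option mentioned above. As a minimal sketch, the Ruby snippet below checks identifiers against a small sample of the words listed above and wraps any match in backticks. The quote_if_reserved helper and the column names are hypothetical, and the word set is deliberately incomplete.

```ruby
require "set"

# A small sample taken from the list above; not the complete list.
NEW_KEYWORDS = Set.new(%w[EMPTY GROUPS LEAD OVER RANK SYSTEM]).freeze

# Returns the identifier wrapped in backticks when it collides with a keyword.
def quote_if_reserved(identifier)
  return identifier unless NEW_KEYWORDS.include?(identifier.upcase)

  "`#{identifier}`"
end

# Hypothetical column names used only for illustration.
%w[id rank groups title].each do |column|
  puts quote_if_reserved(column)
end
```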
SQL mode changes
MySQL 8 has changed the default SQL mode, which has to do with the behavior of the server when evaluating queries. If you are using the default SQL mode, you will need to update your SQL statements to be compatible with the new mode.
In MySQL 8, the default SQL mode is ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION. This is stricter than the default mode in older versions of MySQL. For instance, it is more specific about how GROUP BY statements are evaluated, and it will raise an error if you try to divide by zero while inserting or updating data.
Additionally, MySQL 8 has removed some SQL modes, including NO_AUTO_CREATE_USER (part of the MySQL 5.7 default) and the compatibility modes DB2, MAXDB, MSSQL, MYSQL323, MYSQL40, ORACLE, and POSTGRESQL, so you should remove those from your configuration and SQL statements if you are using them.
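One way to audit an existing configuration is to split the configured sql_mode value and flag the modes you need to drop. The Ruby sketch below does this for a hypothetical 5.7-era value. The flagged list contains only NO_AUTO_CREATE_USER, which MySQL 8 no longer accepts, and is meant to be extended to match your own audit. The audit_sql_mode helper is an illustration, not part of any MySQL tooling.

```ruby
# Modes to flag; NO_AUTO_CREATE_USER is not accepted by MySQL 8.
# Extend this list to match your own audit (assumption: you maintain it).
FLAGGED_MODES = %w[NO_AUTO_CREATE_USER].freeze

# Splits a comma-separated sql_mode value into flagged and kept modes.
def audit_sql_mode(sql_mode, flagged: FLAGGED_MODES)
  modes = sql_mode.split(",").map(&:strip).reject(&:empty?)
  flagged_found, kept = modes.partition { |mode| flagged.include?(mode.upcase) }
  { flagged: flagged_found, cleaned: kept.join(",") }
end

# Example value, as might appear in a 5.7 configuration file (hypothetical).
result = audit_sql_mode("ONLY_FULL_GROUP_BY,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION")
puts "Remove: #{result[:flagged].join(', ')}"
puts "sql_mode = #{result[:cleaned]}"
```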
C-style operators
MySQL 8 has deprecated the use of the C-style &&, ||, and ! operators. These operators are still available in MySQL 8, but they will be removed in a future version. It is recommended to update your SQL statements to use the standard AND, OR, and NOT operators instead.
Server error codes
MySQL 8 has changed the error codes for some server errors. If you are using these error codes in your application (for instance, to check for specific errors), you will need to update them to the new codes. For the full list of error codes, see the MySQL documentation on error codes.
Upgrading MySQL versions with no downtime
The upgrade process should be pretty straightforward for most installations. However, it is very likely that you'll need some downtime to complete the upgrade process and any required schema changes for future-proofing your database. If you're using RDS, they state that database engine upgrades require downtime, and the downtime duration depends on the size of your database instance. Even using Blue/Green deployments requires some downtime, though it is less than the traditional route.
Performing the no-downtime upgrade
Fortunately, there is a way to do zero downtime MySQL upgrades.
PlanetScale offers a free import tool that allows you to import a live, production database with no downtime or data loss. We support MySQL versions 5.7 up through 8.0, so if you have a 5.7 database, you can import it through this process, and we will automatically do the no-downtime upgrade to 8.0 for you. You may be wondering how you'll migrate platforms without downtime, but don't worry! This migration process also requires no downtime. After we copy your schema and data over, the PlanetScale database will essentially act as a replica to your production database, so we'll continue syncing any incoming data changes from your production database. For more information on the full process, read through our database import guide.
Once you're on PlanetScale, you won't have to worry about upgrading versions and dealing with downtime in the future. With our managed database service, your MySQL database will always be up to date.
Additionally, our product includes a no-downtime schema change workflow using database branching and deploy requests.

 Be sure to reference our MySQL compatibility documentation before beginning the import. There are some things you may need to change before the import that we don't support from 5.7, such as reserved keywords, but most 5.7 databases shouldn't cause any issues. Reach out to our Support team for additional guidance. 
We have platform-specific guides for migrations from Amazon RDS, DigitalOcean, GCP CloudSQL, and Azure:
Migrate from Amazon RDS
Migrate from DigitalOcean
Migrate from Cloud SQL
Migrate from Azure
We offer plans to fit everybody's needs. Check out our pricing page for more information.
If you have a large or complex database and would like to hear about custom options or import assistance, we'd love to hear from you. Fill out the form and we'll be in touch shortly.
Conclusion
To recap, for most installations, upgrading to MySQL 8 should be a relatively straightforward process. However, it is important to test your database thoroughly after the upgrade to ensure that it is functioning correctly. Remember to ensure that you have working backups of your database before you start the upgrade process in case something goes wrong. During the process, it is ideal if you can ensure that you are using the utf8mb4 character set and utf8mb4_0900_ai_ci collation. This will ensure that your database is future-proofed for the next few years. Additionally, it is important to ensure that you are using the new authentication plugin and that you are not using any of the deprecated data types, reserved words, or SQL modes.
If you have any questions about upgrading to MySQL 8 or if you need help with the upgrade process, please feel free to contact us.]]></content>
        <summary><![CDATA[Learn what you should look out for when upgrading an existing database from MySQL 5.7 to 8 and how to change your database to be compatible with the new version.]]></summary>
      </entry>
    
      <entry>
        <title>Zero downtime Rails migrations with the PlanetScale Rails gem</title>
        <link href="https://planetscale.com/blog/zero-downtime-rails-migrations-planetscale-rails-gem" />
        <id>https://planetscale.com/blog/zero-downtime-rails-migrations-planetscale-rails-gem</id>
        <published>2023-03-20T17:30:00.000Z</published>
        <updated>2023-03-20T17:30:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We've recently released the planetscale_rails gem. It contains a collection of Rake tasks that we have been using internally to manage the schema of our own Rails application.
The workflow we use to build the PlanetScale Rails application allows our engineers to ship changes without the fear of downtime. This leads to faster releases and an all-around more confident team. In this blog post, I'm going to go over the risks of running Rails schema migrations directly on your production database and teach you how to set up a workflow that mitigates those risks.
 If you haven't already, you can sign up for a PlanetScale account to get a performant, scalable, easy-to-use MySQL database. Follow our Rails quickstart to connect to your Rails app in just minutes, and then come back and follow the rest of this guide to set up the workflow for no downtime Rails schema migrations. 
The ultimate production schema migration workflow
Before digging in, let's go over the high-level workflow we use for our own Rails application. We run MySQL locally to make schema changes in development and use PlanetScale branches once we're ready to deploy to production.
Our team uses GitHub for code review, and we automatically deploy code changes when a pull request is merged into main. We separate our schema changes from our code changes, and for those, we use PlanetScale deploy requests.
Separating out schema changes from code changes may be a new concept for some, but it's an essential part of shipping changes without risk of downtime.
How PlanetScale deploy requests prevent downtime
PlanetScale databases introduce a git-like workflow for making schema changes. When you're just getting started, we provide a default main branch that you can run your initial migrations on. Once you're done, you promote that branch to production and connect it to your production application. Now, whenever you need to make schema changes, you can branch off of that production schema, make your schema changes on that dev branch, and open a deploy request to deploy to production with no downtime. The deploy request is how we make safe schema changes in PlanetScale databases. Think of it as a very advanced version of rails db:migrate. Under the hood, it uses Vitess's online schema migration tooling, which allows you to make schema changes without worrying about accidentally causing a production incident by locking a table.
The best part: if something does go wrong, you can also revert it without data loss. It sounds impossible, but you can read through our How schema reverts work blog post if you want the details on how it all works.
We love using rails db:migrate for development. But when it comes to production, we prefer the safety that deploy requests provide.
Deploying schema changes before code changes
Many Rails applications are set up to run their production schema migrations during their deployment process.
While this setup feels easy, "you just merge!", we do not recommend it.
When the code and schema deploys are coupled, we've found teams eventually run into problems. Migrations are slow on large tables, or worse, fail completely. Both of these cases block the flow of code getting into production.
Deploying code to production and migrating the database are two of the highest risk actions engineers do in their work. For these tasks, we prefer to focus and take care of each important step one at a time.
It's also impossible to atomically deploy both code and a schema change at the same time. If either is dependent on the other, users will experience errors or downtime. Having engineers focus on how to get both their code and schema changes out separately forces best practices to be implemented. And, fortunately, it doesn't take too much extra work to set up this safe, no downtime workflow.
The no downtime PlanetScale + Rails workflow
Here are the details of how an engineer on our team gets a schema change into production. Again, make sure you first have your PlanetScale database set up with a production branch with safe migrations enabled. If not, follow our Rails quickstart to get connected.
Follow the instructions in the pscale README to install the PlanetScale CLI.
Follow the instructions in the planetscale_rails README to install the planetscale_rails gem
Create a new PlanetScale branch via the CLI
pscale branch switch my-feature --database my-db-name --create

Notice that you're using switch here. This places a .pscale.yml file in your local directory. This is how Rails will know which branch to migrate.
Run Rails migrations on the new branch (using planetscale_rails)
bundle exec rails psdb:migrate

Open a new PlanetScale deploy request
pscale deploy-request create my-db-name my-feature

Open a pull request on GitHub
Now, just like a normal code change, the engineer will push up their migration file and schema.rb changes in a pull request to be reviewed. They'll include a link to the deploy request in their PR description.
Once both the deploy request and the pull request are reviewed, the PlanetScale deploy request is shipped first. Once it succeeds, the code gets merged in.
planetscale_rails rake tasks
In the above example, we used rake psdb:migrate. Here's the full list of tasks made available by the gem:
rake psdb:migrate                                 # Migrate the database for current environment
rake psdb:rollback                                # Rollback primary database for current environment
rake psdb:schema:load                             # Load the current schema into the database
rake psdb:setup_pscale                            # Setup a proxy to connect to PlanetScale

Migrate and rollback also work in multiple database Rails apps. For targeting a different database, you can add onto the command just like a normal db:migrate, rails psdb:migrate:primary, or rails psdb:migrate:rollback.
Managing schema_migrations table
You may be wondering what happens with the schema_migrations table when using PlanetScale deploy requests. The answer is, nothing changes! Deploy requests will keep it up to date just like db:migrate does for you. When branching, the schema_migrations table gets copied across branches.
To have this work, all you need to do is head to your PlanetScale dashboard, go to your database "Settings" page, and enable "Automatically copy migration data" for the database.
No downtime example migrations
Let's go through some common scenarios to see how we can accomplish these safely.
Adding a database column
To add a new column safely, you must always deploy the schema change before any code using the column is deployed to production.
Make the schema change via a Deploy Request
Merge in the code that uses the new column
Since we are using Deploy Requests, we do not need to worry about default values. PlanetScale's schema change tools will add the column and the new default value without any table locking.
Removing a database column
Removing a column is one case where we do want to deploy some code prior to running the migration.
Set the column as ignored in the model and deploy this change to production.
class Project < ActiveRecord::Base
  self.ignored_columns += %w(category)
end

This ensures your application is not using the column in production and removes the risk of any errors when you do run the migration.
After this change is deployed, you can also check Insights to be sure it's not being used in any queries before moving on.
Run the Deploy Request to remove the column
Once the Deploy Request is complete, you can now clean up the ignored_columns code and deploy it to production
Renaming a database column
Renaming a column may seem like a simple change, but doing so without any interruption to production takes some extra care.
We cannot rename a column directly without downtime. Instead, we must add a new column and then transfer all the data to it.
Each step in this is a production deployment.
Run a migration to add the new column
Update the application to start double writing to both the old and new column
How to do this will depend on the application. ActiveRecord callbacks can be a useful tool here.
Run a backfill script that updates the new column
# Example backfill script
Project.all.find_each do |project|
  # update takes a hash of attributes
  project.update(new_column: project.old_column)
  puts "updated #{project.id}"
end

Update the application to start reading from the new column. Remove the double writes
Remove the old column.
It is a bit of extra work, but taking these steps ensures a safe transition to the new column and no interruptions to production traffic.
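To make the double-write and backfill steps concrete, here is a minimal plain-Ruby sketch. The Project class below is a hypothetical stand-in rather than an ActiveRecord model, so it runs without a database. In a real Rails model, the mirroring would more likely live in an ActiveRecord callback such as before_save, as mentioned above.

```ruby
# Plain-Ruby stand-in for a model (hypothetical; not ActiveRecord).
class Project
  attr_reader :old_column
  attr_accessor :new_column

  def initialize(old_column = nil)
    # Simulates a row that was written before double writing was deployed.
    @old_column = old_column
  end

  # Double write: every write to the old column is mirrored to the new one.
  def old_column=(value)
    @old_column = value
    self.new_column = value
  end
end

# Backfill: copy the old value only where the new column is still empty.
def backfill(projects)
  projects.each { |project| project.new_column = project.old_column if project.new_column.nil? }
end

legacy = Project.new("design")   # written before the double-write deploy
fresh = Project.new
fresh.old_column = "research"    # written after the double-write deploy

backfill([legacy, fresh])
puts legacy.new_column
puts fresh.new_column
```

Skipping rows that already have a value keeps the backfill from overwriting anything the double write has stored in the meantime.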
Adding or removing indexes
When using Deploy Requests, there is no risk of table locking while adding or removing a database index. This scenario is quite simple.
Run the schema change via a Deploy Request
Merge in the code with the migration file and schema.rb changes
Removing an index can be risky if the application is actively using it. Be sure to check that it's not being used by any queries in production before removing it. One simple way to check is by running the following in your production database.
select * from sys.schema_unused_indexes;

This query will return any unused indexes since MySQL's last restart. If your application runs infrequent scheduled jobs, it may be possible for an index used by that job to show up as unused when running the query.
Handling data migrations
Occasionally, we also need to backfill or migrate data as part of our schema changes. We handle this by running the data migration in production after the schema has been changed.
For example, if we need to backfill a column, we'll run the following script in a production Rails console session.
# use find_each to limit data in memory
Project.all.find_each do |project|
  # update takes a hash of attributes
  project.update(new_column: project.old_column)
  # print progress to the screen
  puts "updated #{project.id}"
end

This is a simple data migration. There are cases where the migration is more complex, and we need to take more precautions. For those cases, we'll set up a Ruby class that manages the migration for us. This allows it to be tested and code reviewed before being run in production.
class MigrateProjectData
  def self.run
    # build an instance, then run the migration on it
    new.run
  end

  def run
    # migration logic here
  end
end

We then deploy this code and call it from a production console with: MigrateProjectData.run.
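As a rough illustration of what such a class might contain, the sketch below fills in the migration logic with plain Ruby hashes standing in for records. The records argument, the batch_size option, and the column names are assumptions made for the example. A real version would query the model instead.

```ruby
# Hypothetical migration class; `records` stands in for a model query.
class MigrateProjectData
  def self.run(records, batch_size: 2)
    new(records, batch_size: batch_size).run
  end

  def initialize(records, batch_size:)
    @records = records
    @batch_size = batch_size
  end

  # Processes records in batches and returns how many were changed.
  def run
    updated = 0
    @records.each_slice(@batch_size) do |batch|
      batch.each do |record|
        next unless record[:new_column].nil? # skip rows already migrated

        record[:new_column] = record[:old_column]
        updated += 1
      end
    end
    updated
  end
end

rows = [
  { old_column: "a", new_column: nil },
  { old_column: "b", new_column: "b" },
  { old_column: "c", new_column: nil }
]
puts MigrateProjectData.run(rows)
```

Because already-migrated rows are skipped, running the class a second time changes nothing, which makes it safer to retry after an interruption.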
Conclusion
Hopefully this guide has helped you level up your Rails deployment workflow. While the extra steps and thoughtfulness required may seem like a lot of additional work upfront, we promise it'll become second nature very quickly. In the end, it will likely save your team more time overall because everyone will feel more confident making schema changes, leading to faster release times. And, if you ever end up in the unfortunate scenario that you accidentally shipped a bad schema change, you'll be able to revert it with the click of a button using this workflow.]]></content>
        <summary><![CDATA[Learn about the Ruby on Rails workflow that protects your database and application from accidental downtime and data loss.]]></summary>
      </entry>
    
      <entry>
        <title>Preparing for MySQL 5.7 EOL</title>
        <link href="https://planetscale.com/blog/preparing-for-mysql-5-7-eol" />
        <id>https://planetscale.com/blog/preparing-for-mysql-5-7-eol</id>
        <published>2023-03-14T13:00:00.000Z</published>
        <updated>2023-03-14T13:00:00.000Z</updated>
        
        <author>
          <name>Savannah Longoria</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[What does the MySQL 5.7 EOL mean for your database? Learn the considerations for upgrading to 8.0 and how PlanetScale can help you upgrade with no downtime or data loss.]]></summary>
      </entry>
    
      <entry>
        <title>DevOps with PlanetScale</title>
        <link href="https://planetscale.com/blog/the-eight-phases-of-devops" />
        <id>https://planetscale.com/blog/the-eight-phases-of-devops</id>
        <published>2023-03-13T00:00:00.000Z</published>
        <updated>2023-03-13T00:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[PlanetScale was built with continuous integration and deployment in mind. This section of our documentation contains guides and tooling recommendations to use with PlanetScale to enhance your pipelines for a smoother database experience.
DevOps is typically broken down into eight distinct phases as an operational model. The phases operate in a continuous loop, with each phase providing value to the phase ahead of it as shown in the following diagram in the gray text:

This blog will act as a brief introduction to these phases, with a summary of how various PlanetScale features apply to each phase. Links to the relevant PlanetScale feature documentation will be provided, and additional resources such as practical tutorials with specific products and frameworks will be included as they are built.
These sections are to act as guides.
While DevOps presents the eight phases as an operational model, it's important to understand that these phases are to act as guidelines as opposed to a rigid workflow. As such, building a workflow that fits your business's needs is the important part and your workflow may differ drastically from what others have built.
In each section, you'll find specific recommendations on what features of PlanetScale can be used within that phase, but DevOps is not a "one size fits all" process. You are encouraged to modify the flow and use any features as you see fit, to make the product work for you.
Plan phase
The Plan phase is where the enhancements, changes, bug fixes, or new features are set up for developers to start building out. It is here that project managers and product owners review the feedback gathered from the previous development cycle and use it to determine what backlog items (or items that have yet to make it to the backlog) will provide the most value to end users, be they internal or external. If the team follows an agile development process, it's where sprints are planned and individual issues are assigned to various members of the team.
While PlanetScale doesn't have tools to directly assist with the planning process, there is one key way that our platform can be utilized in this phase, and it ultimately depends on the source control strategy used throughout the remainder of the process. As you'll learn throughout this series of guides, having a good method to automate processes utilizing source control as the backbone for the overall process is critical in determining how simple or difficult it will be to use DevOps within your team.
Database schema branching
If your process dictates that a branch be created for the active iteration before any code is submitted (the focus of the next phase), this is the perfect opportunity to create a new PlanetScale database development branch for the applications you will be working on.
A branch of a PlanetScale database is a completely isolated MySQL instance that has an exact copy of the schema of the upstream database. It allows developers to make changes to the schema of the branch without worrying about affecting the production database. Credentials required to connect to database branches are also unique to each branch. In other words, even branches within the same database environment have unique connection strings. Refer to the documentation on branching to learn more.
Data Branching®
PlanetScale also offers Data Branching® in some pricing tiers. With Data Branching®, PlanetScale will automatically restore the latest version of the backup of a production branch into the new branch being created. This can help create a sandboxed environment for developers to work with that exactly mimics your production environment without affecting production data.
Automate branching via the PlanetScale CLI
If your project management tools allow for automation, you can use either the PlanetScale CLI or our public API to automatically create a branch and credential set whenever a sprint or iteration is created. The pscale CLI is a cross-platform command line application that can be used to interact with PlanetScale databases, and the API is a set of HTTP endpoints that can be used to automate certain functions of a PlanetScale database. Refer to either the pscale documentation to learn more about the CLI, or the API documentation to learn more about how to use the PlanetScale API.
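As a minimal sketch of that automation, the Ruby snippet below builds the CLI invocations a project-management webhook might run when a sprint is created. It assumes the pscale branch create and pscale password create subcommands. The database, branch, and password names are hypothetical, and the commands are only printed rather than executed.

```ruby
require "shellwords"

# Builds the pscale invocations for a new sprint (all names are hypothetical).
def branch_commands(database:, branch:, password_name:)
  [
    ["pscale", "branch", "create", database, branch],
    ["pscale", "password", "create", database, branch, password_name]
  ]
end

commands = branch_commands(
  database: "my-db-name",
  branch: "sprint-42",
  password_name: "sprint-42-ci"
)

# Print the commands for review instead of running them.
commands.each { |argv| puts Shellwords.join(argv) }

# To actually run them, something like this could be used:
#   commands.each { |argv| system(*argv, exception: true) }
```

Building each command as an argument array, rather than one interpolated string, avoids shell-quoting problems if a branch name ever contains unusual characters.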
In this guide, we touched on the planning process and how project management software can utilize the pscale CLI to automatically create branches when a new iteration is kicked off. The next step in the DevOps cycle is the code phase, where developers build the functionality set forth by the planning phase.
Code phase
This is the phase developers are likely most fond of. The next iteration has been planned, tasks assigned, and the code is being written for improvements to come. As code is being written by developers on their workstations, changes will be frequently checked into a dedicated source control system that acts as the single source of truth for the codebase. While several different systems can be used, git is the most popular one. Many tools integrate with git, which will be important as we progress through the DevOps phases to follow.
Companies that follow good DevOps practices should have isolated environments outside of production so when new code is being developed, there won't be any impact on the users actively working with the application. This also includes the storage layer of the application, which is where the database lives. There are several ways in which PlanetScale can help developers.
Database Branching
The first way that PlanetScale tooling can help is with the schema branching flow at the heart of PlanetScale databases. As described in the Plan phase, a branch of a database in PlanetScale is a completely isolated database with a copy of the production branch. Any operations performed on a branch will not affect any other branch of that database.
Since branches are separate from one another, a dedicated database branch should be created for developers to build on that is completely isolated from the production database. This same branch can be used later in the DevOps process to simplify merging changes into the production branch using a feature called Deploy requests, which will be expanded on in the Release phase. If a branch wasn't created in the previous step, it should be created now before developers start writing code.
Added security with the pscale CLI
Local configurations can be simplified as well by using the PlanetScale CLI to create a tunneled connection to that branch using the pscale connect command. This could be done in place of sharing connection strings. Account administrators in PlanetScale have complete control over the users who can connect in this manner, reducing the security risk of those connection strings being obtained by a bad actor.
This tunneling process can be run at the same time the local development environment starts by scripting the creation of the tunnel before starting any other dev tools. Since the process uses access-based authentication over connection strings, developers who work at companies where secrets are regularly rotated will not be affected in situations where connection strings for development databases need to be regenerated.
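A dev start script along those lines could be sketched in Ruby as follows. It assumes pscale connect accepts a database, a branch, and a --port flag. The database name, branch name, and port are hypothetical, and the tunnel itself is only shown in a comment so the snippet runs without the CLI installed.

```ruby
require "shellwords"

# Hypothetical local settings for the tunnel.
TUNNEL = { database: "my-db-name", branch: "my-feature", port: 3309 }.freeze

# Builds the argument list for the tunnel command.
def connect_command(database:, branch:, port:)
  ["pscale", "connect", database, branch, "--port", port.to_s]
end

puts Shellwords.join(connect_command(**TUNNEL))

# A start script could launch the tunnel in the background before other tools:
#   pid = Process.spawn(*connect_command(**TUNNEL))
#   at_exit { Process.kill("TERM", pid) }
```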
In this guide, we discussed the Code phase and how database branching and the PlanetScale CLI can help developers work a bit easier. The next step in the DevOps cycle is the build phase, where the new code built in this phase is compiled or otherwise set up.
Build phase
Now that the code is written and tested locally, it's time to build it for a production-like environment. Generally, this is an automated process that is triggered by the source control management software used throughout the DevOps pipeline.
When a build is triggered, a subprocess is kicked off based on that event to use a dedicated build server or containers to ensure that the code compiles. The result of the build phase is the artifacts, or compiled/transpiled versions of the code. These artifacts should be the same files that are used throughout the remainder of the process, including the testing process and deployment to production.
There are three ways that the build process is typically triggered:
Manually
When commits are pushed
When pull requests are approved and closed
PlanetScale offers tooling and features to support many phases of the DevOps lifecycle, however since the Build phase is primarily focused on code compilation and generating artifacts for testing and deployment, there are no practical ways that our system can directly help during this process.
In this guide, we discussed the Build phase and how deploy requests can be used to merge changes to the test branch ahead of the Test phase, where the artifacts generated from this phase are run through a series of tests to ensure they are ready for production.
Test phase
By this point, new code has been written and compiled into artifacts that can be used during deployments, but it was likely only tested on the developers' local machines. These development environments are likely not exactly the same as production, so code that works locally will not necessarily work smoothly in production. During the test phase, the artifacts are run through a series of tests, preferably using automation, to ensure the quality and performance of the code, and to maintain interoperability with any other services or infrastructure as required.
There are different types of tests that can be run, but the type that PlanetScale can help with most is integration testing. Integration testing involves testing your software with the different modules or services that also support the application, one of these being the database.
Use a dedicated test branch of the database
Since database schema branches in PlanetScale are isolated copies of your database schema, a dedicated branch can and should be created specifically for this phase. Many organizations configure their environments to move linearly (dev → test → production) throughout their pipeline. This is where the PlanetScale flow deviates from convention a bit. Since development branches can only be merged with branches with safe migrations, and the dev branch (which is development) contains the version of our schema with the most recent changes, you'll need to create a branch from the dev database branch.
Seed with production data
If your tier includes Data Branching®, you can spin up yet another production-like environment by utilizing data from the most recent backup of the production branch database. This will ensure that your automated tests (as well as manual QA tests) simulate production as closely as possible instead of relying on pre-determined test data. This gives the tests a greater opportunity to break and be revisited, avoiding potential issues in production.
Seed data with the PlanetScale CLI
There are times when bringing production data over for testing is not practical. In that situation, you may consider seeding the test database branch with the required data, as well as generating the connection strings required by the test environment. Luckily, this functionality exists within the PlanetScale CLI and can be automated. The pscale branch subcommand is used to manage branches for a specific database. The pscale shell subcommand can accept a piped-in .sql script in exactly the same way you'd use the mysql terminal client. And finally, the pscale password subcommand can be used to generate credentials for a given database branch.
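The three subcommands above could be chained together along the following lines. This is a minimal sketch, not a definitive pipeline step: the database, branch, and password names are hypothetical, and the --from and --wait flags are assumptions that should be checked against pscale branch create --help for your CLI version.

```python
import subprocess
from pathlib import Path


def seed_steps(database: str, branch: str, parent: str, password_name: str) -> list[list[str]]:
    """Return the CLI invocations, in order, for creating and seeding a branch."""
    return [
        # Assumed flags: --from picks the parent branch, --wait blocks until ready.
        ["pscale", "branch", "create", database, branch, "--from", parent, "--wait"],
        # The seed script is piped to this command on stdin.
        ["pscale", "shell", database, branch],
        # Generates credentials for the test environment.
        ["pscale", "password", "create", database, branch, password_name],
    ]


def seed_branch(database: str, branch: str, parent: str,
                seed_file: Path, password_name: str) -> None:
    """Create the branch, pipe the .sql file into pscale shell, create a password."""
    create, shell, password = seed_steps(database, branch, parent, password_name)
    subprocess.run(create, check=True)
    with seed_file.open("rb") as sql:
        subprocess.run(shell, stdin=sql, check=True)
    subprocess.run(password, check=True)


# Hypothetical usage: seed_branch("my-database", "test", "dev", Path("seed.sql"), "ci-tests")
```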
Delete the test branch
Once integration tests have completed successfully and all are passing, it is not likely that your test branch needs to remain active. Since the test branch is an isolated MySQL instance, you are safe to drop the branch without retaining any of the data it contains. This can be done manually through the PlanetScale dashboard, or you may use either the PlanetScale CLI or API to automate the process of deleting your test branch.
If you decide to script the CLI in your pipeline, you can use pscale branch delete <DATABASE_NAME> <BRANCH_NAME> to discard the branch and the data it holds. The PlanetScale API offers the /organizations/<ORGANIZATION>/databases/<DATABASE_NAME>/branches/<BRANCH_NAME> endpoint for deleting branches if you want to use curl or other tools that can send HTTP requests.
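For the API route, a request could be prepared roughly as follows. This sketch only builds the request object; the base URL, the DELETE method, and the service-token format in the Authorization header are assumptions to confirm against the PlanetScale API reference, and all names are placeholders.

```python
import urllib.request
from urllib.parse import quote

# Assumption: the public API is served from this versioned base URL.
API_BASE = "https://api.planetscale.com/v1"


def delete_branch_request(organization: str, database: str, branch: str,
                          token_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a database branch."""
    url = (
        f"{API_BASE}/organizations/{quote(organization, safe='')}"
        f"/databases/{quote(database, safe='')}"
        f"/branches/{quote(branch, safe='')}"
    )
    # Assumption: service tokens are passed as "<token id>:<token>".
    headers = {"Authorization": f"{token_id}:{token}"}
    return urllib.request.Request(url, method="DELETE", headers=headers)


# Hypothetical usage:
# request = delete_branch_request("my-org", "my-database", "test", "id", "secret")
# urllib.request.urlopen(request)
```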
In this guide, we discussed the Test phase and how dedicated database branches can be used to assist with integration testing. Next up is the Release phase, where the new software is set up for a successful deployment into production.
Release phase
The Release phase is where the 'Ops' part of DevOps starts. In the release phase, the primary goal is to ensure that the infrastructure and environment are set up for a successful launch of the updated software. This can include spinning up or down servers, updating operating system configurations, or setting up any other necessary infrastructure to support the application. At this point, the updated code should have been thoroughly tested and confirmed to be working to the best of the team's ability.
Deploy requests in PlanetScale allow you to safely merge schema changes from a development branch into a production branch with safe migrations enabled. This allows for zero-downtime deployments of a new version of the database's schema. This is the phase where deploy requests will be utilized to prepare for a successful deployment.
Deploy requests
Deploy requests are used to merge schema changes from one branch to another, similar to how pull requests merge git branches. Deploy requests work by creating shadow tables that store the new version of the schema for that specific table and replicating data from the old version to the new one. This includes any writes that may occur during the process. When the data in both tables is synced up, you are presented with the option to cut over to the new version of the table, provided the auto-apply option wasn't enabled.
During this phase, a deploy request should be opened from the dev branch into the production branch of your database but not applied until the Deploy phase coming up next. This will allow PlanetScale to stage the changes that need to be applied to your production database branch without affecting the current production version of your application.
This means that as long as the deploy request is not "applied" to the target branch, PlanetScale will continuously keep the live table (old schema) and shadow table (new schema) in sync until your team is ready to deploy the new artifacts into production.
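The split between opening a deploy request in this phase and applying it in the next could be scripted as two separate steps. The sketch below only builds the CLI invocations; the database name, branch names, and deploy request number are hypothetical, and the --into flag is an assumption to verify with pscale deploy-request create --help.

```python
def open_deploy_request(database: str, branch: str, into: str) -> list[str]:
    """Release phase: stage the schema changes without applying them."""
    return ["pscale", "deploy-request", "create", database, branch, "--into", into]


def apply_deploy_request(database: str, number: int) -> list[str]:
    """Deploy phase: cut over to the new schema for an open deploy request."""
    return ["pscale", "deploy-request", "deploy", database, str(number)]


# Hypothetical usage with subprocess:
# subprocess.run(open_deploy_request("my-database", "dev", "main"), check=True)
# ... later, during the Deploy phase ...
# subprocess.run(apply_deploy_request("my-database", 42), check=True)
```

Keeping the two commands in separate pipeline stages mirrors the text above: the request stays open, and nothing is cut over until the team is ready.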
A note on blue/green deployments
If you are at the Scalar tier and above, PlanetScale databases support multiple production branches. Production branches automatically have an additional failover instance of your database ready behind the scenes to improve redundancy. While there are no tools directly within PlanetScale to assist with blue/green deployments for your database, multiple production branches can significantly reduce the administrative overhead of managing multiple MySQL environments.
In this guide, we discussed the Release phase and how your production database can be set up for a live cutover using deploy requests. Next up is the Deploy phase, where all of the work is deployed to your production environment.
Deploy phase
Everything in the previous phases has been building to this point. It's where all the hard work gets deployed to production for the world to use. The operations team will coordinate to copy release artifacts that have been built & tested to production servers. If your team follows a blue/green strategy, the load balancers will instead start redirecting traffic to the staging server, and the current production environment takes the responsibility for staging the next cycle of application updates.
Branching and deploy requests are the primary features that give PlanetScale databases their flexibility in a DevOps environment. During this phase, any open deploy requests should be closed and applied to the production branch. All of the schema changes that were configured during development should now be in production, along with the new code that requires the changes to those tables. If your organization has opted into the schema revert feature, this starts the 30-minute window where you have the option to revert the changes in case something goes catastrophically wrong.
Back out of changes with schema revert
Having a great deployment strategy is key to successfully implementing DevOps, but knowing how to properly back out of changes can be just as important. Many source control management systems can be automatically configured to retain a certain number of previous releases which can be used to roll back application code, but doing so for a database can be difficult without affecting the data it holds.
As stated previously in this series, deploy requests utilize shadow tables to synchronize data changes between tables with the old version of the schema and tables with the new version of the schema. If schema revert is enabled, we will continue to synchronize changes between the live and shadow tables for a period AFTER the deploy request is closed and changes applied. This enables you to quickly revert the changes made by a deploy request and instantly bring back the old version of your schema. Having this capability can significantly decrease the time to revert changes, as well as reduce the potential for your application to stay in a bad state long term. This alone can increase developer confidence when it comes to applying changes to the database.
In this guide, we discussed the Deploy phase and how any open deploy requests should be closed at this point, as well as how schema revert can help back out of bad changes quickly. The next step in the DevOps cycle is the Operate phase, where the Operations team maintains the infrastructure that powers the application.
Operate phase
Now that everything is deployed into production and confirmed working, the operations team's main focus is keeping everything online. Ideally, this is done with a system that will monitor application load to detect spikes in usage and automatically scale resources up to keep up with the traffic. This can be accomplished with platforms like EC2 in AWS, but also on-premise with Kubernetes.
All PlanetScale tiers eliminate the need to maintain your MySQL infrastructure by allowing us to do it for you. Additionally, any production branches automatically have failover replicas so even if something fails with one instance internally, a backup is always available to take over while any necessary maintenance is performed by our teams.

On top of reducing the necessity of maintaining a MySQL environment, PlanetScale offers additional features that can simplify the jobs of the operations teams.
Backup and restore
Any well-run operations team knows that backing up and restoring data is a critical task that must be taken seriously. A time will inevitably come when data is lost, whether that is due to bad code or mistakes during the deployment process. Having a way to retain snapshots of your data at specific points in time for recovery is critical, and this functionality is built into PlanetScale databases. All databases on our platform have a daily backup configured automatically, regardless of which tier you are on. Additional backups and retention periods can also be configured, with the only additional cost being the storage used by the backups.
One thing that can be overlooked is the fact that backups are pointless if the data within them doesn't restore properly. Since we support the concept of database branches, and those branches are isolated instances of MySQL, restoring a backup will create a dedicated branch for the data to reside. This can vastly simplify the process of performing test restores. If you can quickly configure new environments using Infrastructure as Code tools, you can easily spin up entire production-like environments to fully test your application, which can dramatically improve the confidence of the operations team.
Horizontal scaling
PlanetScale is built on Vitess, an open-source project that enables horizontal scaling for MySQL databases. Sharding, available on our Base plan, further reduces the load on individual nodes while increasing performance and resiliency.
Read-only regions
When creating databases or branches, you'll be presented with the option to select which region you'd like your database created in. After creation, you'll also have the option to create read-only regions. This adds a replica of your database in a specific geographical location to more quickly serve queries by users in that area. Traditionally this would require operations teams to set up additional data centers linked by VPN tunnels or private ISP networks to securely synchronize data, but this is all handled by PlanetScale without such complexity.
In this guide, we discussed the Operate phase and the features that PlanetScale offers to make the lives of Ops members easier. The last step in the DevOps cycle is the Monitor phase, where feedback and metrics are gathered for decision-making before the next iteration.
Monitor phase
The last phase of the DevOps cycle is to monitor the entire application. This can mean gathering feedback from customers who use the application, as well as monitoring the performance metrics that the application tracks. This feedback should be used in decision-making when the team inevitably comes together again to plan the next cycle.
One important metric of your application's performance is how quickly your queries are executed. Slow-running queries can bring an application to its knees.
PlanetScale Insights
PlanetScale offers Insights with every database that is hosted on our platform, which is a visual way to see how well your queries are performing. Performance data is automatically tracked in real-time and displayed on a graph so you can see periods of high usage. You can also see which queries are executed most frequently or are taking the longest to return data.
If your database is enrolled in the schema revert feature, the metrics gathered by Insights can help you make a data-driven decision on whether the schema you just deployed to production is experiencing issues and needs to be rolled back. While having your own logging and monitoring platform to analyze errors in your code is definitely a best practice, this acts as an additional layer of analytics and may help reduce downtime overall.
PlanetScale Connect
PlanetScale Connect is a feature provided to our databases that allows you to extract data from the database and safely load it into remote destinations for analytics or ETL purposes. Using Connect with our supported destinations can enable you to further process the data in any way your organization may need. This can help provide detail as to how users are using your application based on the data that's written to your database and assist in driving decisions in the next planning cycle.
We currently support loading data into Airbyte and Stitch destinations, with more planned for the future.
Datadog integration
If you are a Datadog customer and use their platform to centralize your analytical data, we offer an integration with the service. Our integration will gather similar data that is displayed in Insights and forward it to a PlanetScale dashboard that is automatically created when the integration setup is complete. Refer to the Datadog integration article for more details on how this can be configured for your PlanetScale database.
While the Monitor phase concludes the typical DevOps cycle, it loops back into the Plan phase so the next iteration can be set up. At this point, you should be well-equipped to make intelligent decisions on how to integrate PlanetScale into your existing pipelines, or understand how to get started with DevOps altogether!
Feel free to explore more of our documentation to further your understanding of the platform.
Real-world scenario
DevOps is very much a "choose your own adventure" set of guidelines, and that can make it confusing for teams to implement properly given the number of choices available: code language, tooling, process, and so on. The following section describes a fictitious team as they implement a new feature in their codebase. Throughout the section, we'll call out specific tools that are common in the industry to implement much of the process described in the above sections. As expected, the various features offered by PlanetScale will also be described as the story progresses.
This section is about a fictitious company that uses a PlanetScale database to back its application and utilizes many of the techniques discussed in the phase-specific articles in our documentation.
Background
The story follows Mechanica Logistics, a small warehousing and transportation company with a web application that their customers can use to place new shipping orders or track the status of existing orders. Since Mechanica is a small business, its tech team has a size to match. Jenny is their Architect and Lead Backend Developer. She primarily works with the other backend developer, Ricardo, on their API written in Go. Malik is the team’s designer and front-end developer, and he is responsible for maintaining the React web application used by customers. Finally, Ainsley is the company’s sole Systems Engineer, responsible for keeping the AWS infrastructure performing well.
Mechanica uses the following tools in its tech team:
Tool: Use case
Jira: Organize and assign work, and create development iterations.
GitHub: Source control management.
Slack: Team messaging and system notifications.
Jenkins: Builds, tests, and deploys the application updates.
Datadog: Provides a dashboard to monitor application and infrastructure performance.
PlanetScale: Hosts their MySQL databases.
Terraform: Automate AWS infrastructure management.
Atlas CLI: Perform schema migrations.
Infrastructure
Mechanica uses AWS as its primary cloud provider, with the exception of using PlanetScale for its MySQL database. Their React front end operates as a single-page application and is stored in a dedicated S3 bucket. A CloudFront instance is used in front of it to use a custom domain name, as well as cache the front end as close to end users as possible. The API is written in Go and is running on two Linux EC2 instances in production. A technique called “blue/green” is used with the API, so one instance is always live and the other is used as the staging server.
There are three environments active at any time. A development environment is used for building and testing new functionality by the developers. A test environment is used by Jenkins to run automated tests to ensure that everything is built according to spec. Finally, there is the production environment that's used by Mechanica customers. Although there are three separate environments, a single PlanetScale database is used, with a separate database branch configured for each environment. The production database branch also has safe migrations enabled. This prevents accidental changes to the schema by enforcing the use of the PlanetScale flow, requiring that schema changes be made using branching and deploy requests.
The request
One of Mechanica’s biggest partners, Empress Products, recently experienced large unexpected growth and their shipping orders likewise increased. Due to the increase in orders, the systems at Empress were struggling to continuously poll for order status using Mechanica’s API and needed another solution. The tech team at Empress submitted a request that Mechanica figure out a way to send them updates on order status whenever things change instead. Since Empress was one of their largest customers, the Mechanica team decided to prioritize the request and address it during the next development cycle.
Plan and Code
Early Monday morning, the team at Mechanica assembled as they do every two weeks to decide what needed to get done in this development cycle. Jada, the company project manager, was also present as usual to provide insight on the feedback they’ve gathered from Mechanica customers. Jada informed the team of the request from Empress. After some brainstorming among the technical team, they settled on building a system that used webhooks, a way to allow the systems at Mechanica to submit status updates to any HTTP endpoint at the point when an order changes, in near real-time. As the planning concluded around the new system, the team identified the following required changes:
Update the front end to allow customers to register webhooks.
Update the Customers table in the database to add columns for storing the webhook endpoints and signing keys for the webhooks system.
Create a new serverless function to process outgoing webhook messages, signing the messages and sending them to the customer endpoint.
Add a message queue to buffer messages between the API and the serverless function, reducing API load.
Identify anywhere in the current API that order statuses change to submit a message to the message queue.
As soon as the Sprint was created and confirmed, a Jira automation would use the PlanetScale API to create a fresh dev database branch for the team to begin working with. The most recent backup would also be specified to seed data into that branch, giving the team an isolated environment that mirrored production.
Each member of the team was assigned work relevant to their expertise. Malik built the necessary views required for the React application. This included views to create webhook endpoints, manage and delete existing endpoints, and generate signing keys as needed.
Jenny and Ricardo worked on building the backend components. The new serverless function would be written in Go and would be responsible for using the signing key to sign webhook messages and POST them to customer endpoints. The two were also able to identify where changes in the existing API code were needed to allow the API to dispatch messages into the message queue.
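To make the signing step concrete, here is a minimal sketch of how a webhook body might be signed and later verified by the receiving customer. The story does not say which scheme the team chose, so HMAC-SHA256 with a hex-encoded signature is an assumption here, and the function names and sample payload are hypothetical. The team's function is written in Go; Python is used only for illustration.

```python
import hashlib
import hmac
import json


def sign_payload(signing_key: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the exact bytes being sent."""
    return hmac.new(signing_key, body, hashlib.sha256).hexdigest()


def verify_signature(signing_key: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_payload(signing_key, body), signature)


# Hypothetical order-status message; the field names are made up for this sketch.
body = json.dumps({"order_id": 1234, "status": "shipped"}).encode("utf-8")
signature = sign_payload(b"customer-signing-key", body)
```

The signature would typically travel in a request header alongside the POST body, so the customer endpoint can reject messages that were altered or not sent by Mechanica.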
Ainsley takes the security of Mechanica databases very seriously and they do not give out connection strings, even to developers in case one of their systems gets infected. Due to this policy, Jenny and Ricardo proxy connections to the PlanetScale database using the pscale connect command of the CLI. When Jenny and Ricardo are working on the backend services, they simply run pscale connect to set up a tunnel to the database before they start their local development instance of the APIs. This allows the locally running instances of their APIs to connect to localhost, where the PlanetScale CLI will redirect the queries directly into the database without having to use connection strings.
The backend team also updated the schema definition file with the required schema changes and used the Atlas CLI to apply them to the dev branch of their database. This ensures that the state of the database is always consistent and reviewable by the team (since the definition is managed by source control) instead of having developers apply changes manually and make mistakes.
Ainsley worked to build out a Terraform definition that would be used to not only create the new infrastructure components in AWS but maintain them going forward so that they didn't have to manually tweak settings as things changed over time. Along with Jenny’s help, the two of them were able to quickly update the configuration file for the API to add credentials allowing the API to submit messages to the queue, as well as deploy the new serverless function into the development environment for some live testing by the developers.
Once everything was built and manually tested by the developers, it was time to open a pull request for the monorepo and review all of the changes as a team. Since the team had been working together for several years at this point, only minimal changes needed to be made before the pull request was closed and it could move into testing.
Build and Test
At the moment the pull request closed, GitHub used a webhook to notify the Jenkins server to build the newest version of the code. Jenkins then cloned the repo from GitHub at the specific commit where the PR was merged, compiled the API project and the new serverless function into their respective binaries, and uploaded the artifacts to a dedicated AWS S3 bucket for use throughout the pipeline.
Once the build stage of the pipeline was completed, it was time to move on to testing. Ainsley had previously spent weeks ensuring that the entire testing process was also automated by Jenkins. Since the team had taken a test-driven development approach to build the code, it had plenty of unit and integration tests built to ensure that the new code met the business requirements set during planning.
The process kicked off by running a Terraform command that would spin up the necessary infrastructure in AWS for testing. This would create an SQS queue in a dedicated AWS test account that could be used during integration testing to make sure the webhooks feature was built to spec. Next up would be building out the test database infrastructure.
Using the PlanetScale CLI, Jenkins would create a replica of the main production database branch by creating a new branch called test. This would automatically create an isolated MySQL environment where integration testing could be performed without affecting production. In the past, the team used to have a .sql script that would seed test data to their test branch for running this process, but more recently they’ve been using the Data Branching® feature set to restore the most recent backup of the main branch into test, creating an identical copy of their production database. To finalize the setup of the database, Jenkins would run the Atlas CLI to sync up the new table from the dev branch into test. Now the database looks exactly as it would once all of these changes make it into production.
Before running the test, the proper credentials needed to be generated and added to the project configuration. Jenkins would again use the PlanetScale CLI to generate a connection string and store it alongside the project. Next, Jenkins would use the cloned repository and run the go test command to execute all of the tests the team had written. This would not only be the unit tests that would validate business logic, but also the integration tests that would perform CRUD logic for storing and reading webhook configuration from the database, as well as simulating an order to check that the message gets processed as expected.
Once the tests have concluded and all have passed, Jenkins would use Terraform to tear down the test infrastructure in AWS and the PlanetScale CLI to delete the test branch since it is no longer required. Finally, Jenkins would once again use the PlanetScale CLI to open a Deploy request from dev into main, then notify the team using Slack so they could prepare for deploying the latest version of their application to production.
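The shape of this test stage can be sketched as a small runner that performs setup, runs the tests, and tears everything down. This is an illustration under stated assumptions, not Mechanica's actual Jenkins job: the database name is hypothetical, the Atlas and backup-restore steps are omitted, and the teardown runs in a finally block so that it also happens on failure, which differs slightly from the story, where teardown follows a passing run.

```python
import subprocess
from typing import Callable, Sequence

Command = Sequence[str]


def run_checked(command: Command) -> None:
    """Run one command and raise if it exits with a non-zero status."""
    subprocess.run(list(command), check=True)


def run_test_stage(setup: Sequence[Command], tests: Sequence[Command],
                   teardown: Sequence[Command],
                   run: Callable[[Command], None] = run_checked) -> None:
    """Run setup and tests in order; always run teardown afterwards."""
    try:
        for command in setup:
            run(command)
        for command in tests:
            run(command)
    finally:
        for command in teardown:
            run(command)


# Hypothetical commands for the stage described above.
SETUP = [["terraform", "apply", "-auto-approve"],
         ["pscale", "branch", "create", "mechanica", "test", "--from", "main"]]
TESTS = [["go", "test", "./..."]]
TEARDOWN = [["terraform", "destroy", "-auto-approve"],
            ["pscale", "branch", "delete", "mechanica", "test", "--force"]]
```

Calling run_test_stage(SETUP, TESTS, TEARDOWN) would execute the commands for real; passing a different run callable makes the ordering easy to dry-run.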
Release and Deploy
After Jenny, Ricardo, Malik, and Ainsley review the test results and confirm everything went smoothly, they approve the Deploy request so PlanetScale can start synchronizing the changes from the development environment into production. Since this process uses shadow tables to effectively stage changes without making them live, the actual process of going live happens quickly and painlessly.
At this point, the latest version of the code has been thoroughly tested and the schema changes have been staged for the database. Ainsley logs into Jenkins and approves the final phase of the pipeline to deploy all of the changes to production. This kicks off a process where Jenkins utilizes deploy agents installed on the production EC2 servers to download the latest artifacts from S3, replace the old binaries, and restart the service that keeps the API alive. The script also creates the necessary SQS queue in AWS using Terraform and uses the PlanetScale CLI to apply the schema changes from the deploy request, which effectively cuts over the application to use the new version of the schema. Finally, the load balancer is updated to reroute traffic to the newest version of the application. After a week and a half of hard work, the changes are now live and can be used by Mechanica customers.
Operate and Monitor
Although the code has already gone through a rigorous testing process, it's inevitable that certain issues can occur once the application hits production as there are certain variables that simply can't be accounted for in testing. Upon deployment, Ainsley starts to monitor the Datadog dashboard configured to store the logs forwarded from AWS as well as Insights data forwarded from PlanetScale. This was important since the window to revert schema changes is open for 30 minutes, allowing for quickly rolling back changes.
The dashboard includes metrics detailing the operating capacity of the EC2 servers, network traffic, application errors, and query performance metrics. Since moving to PlanetScale, the team hasn’t had much to worry about regarding the database infrastructure since that is completely managed for them. This has freed much of Ainsley’s time to focus on optimizing the performance of other infrastructure components, so nearly all issues have been ironed out.
As the new feature started to be utilized, Ainsley did notice that some queries weren’t performing as expected based on analytical data being forwarded to Datadog from PlanetScale. Ainsley opened the Insights tab of the database to validate the data in their dashboard and indeed noticed that the query for webhook configurations was performing a scan on the entire table instead of just the necessary rows. They decided to add a new issue to the Jira board to address it in the next cycle.
Although there was minor room to improve, the feedback from Empress Products on the new feature was overwhelmingly positive, and they wanted this same functionality built into many other areas of the application. Jada took the feedback and added yet another issue in Jira to address in the future.
Conclusion
Although this story is fictional, it demonstrates how DevOps and PlanetScale can help streamline team processes and ease the pain of deploying applications into production. After reading this, you should have a better understanding of how these practices can be used within your organization.]]></content>
        <summary><![CDATA[Learn how to use PlanetScale within DevOps pipelines.]]></summary>
      </entry>
    
      <entry>
        <title>Using MySQL with SQLAlchemy: Hands-on Examples</title>
        <link href="https://planetscale.com/blog/using-mysql-with-sql-alchemy-hands-on-examples" />
        <id>https://planetscale.com/blog/using-mysql-with-sql-alchemy-hands-on-examples</id>
        <published>2023-03-07T14:00:00.000Z</published>
        <updated>2023-03-07T14:00:00.000Z</updated>
        
        <author>
          <name>Anthony Herbert</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[SQLAlchemy is a popular Python library that gives you many tools to interact with SQL databases. With SQLAlchemy, you can do things like send raw queries to a database, programmatically construct SQL statements, and even map Python classes to database tables with the object-relational mapper (ORM). SQLAlchemy doesn't force you to use any particular features, so it has the flexibility to support many different approaches to working with databases. You can use SQLAlchemy for one-off scripts, web apps, desktop apps, and more. Anywhere that you'd use a SQL database along with Python, you can use SQLAlchemy.
This tutorial will cover setting up an engine object in Python SQLAlchemy 2.0, which is the first step to using SQLAlchemy. Then, it will cover two specific ways of interacting with a database: with raw SQL statements and by using the object-relational mapper.
This tutorial covers SQLAlchemy 2.0, which is the most recent version as of this writing. After running pip install sqlalchemy, you can run pip list to verify that your SQLAlchemy version is 2.0 or later.
Set up PlanetScale database
To demonstrate the examples in this SQLAlchemy tutorial, you'll need a MySQL-compatible database. If you don't have one already, you can get a database by signing up for a PlanetScale account.
Once you have an account and are on the dashboard, create a new database by doing the following:
Click the "Create" link at the bottom of the dashboard.
Give your database a name.
Select a region.
Click the "Create database" button.
Finally, you can click on the "Connect" button and select "General" in the dropdown to see your database credentials. You'll need these credentials to create an engine in SQLAlchemy.

 PlanetScale uses a branching workflow, similar to git, so you can branch off of your production database when you need to make schema changes. This workflow lets you easily test changes before merging them into your production schema (again, very similar to what we're used to when deploying code changes). For this tutorial, you can just use the default initial branch, main, for development. 
Set up the engine object
The first thing to do when using SQLAlchemy is to set up the engine object. The engine object is used by SQLAlchemy to manage connections to your database. Later, when you go to perform some action on the database, a connection will be requested from the engine and used to send the request.
Before creating the engine, you first need to know the credentials of your database along with the database driver you'll use to connect to the database. For MySQL, connection strings look like this:
mysql+<drivername>://<username>:<password>@<server>:<port>/<dbname>

Install the Python MySQL database driver
Since SQLAlchemy works with many different database types, you'll need an underlying library, called a database driver, to connect to your database and communicate with it. You don't have to use this driver directly, because as long as SQLAlchemy has the correct driver, it will automatically use it for everything. The Python MySQL Connector is used as the driver in this tutorial, but other good ones are PyMySQL and MySQLdb.
You'll need to install your driver with pip:
pip install mysql-connector-python

So let's say your username, password, hostname, and port are user1, pscale_pw_abc123, us-east.connect.psdb.cloud, and 3306, respectively. If you were using mysqlconnector as your driver to connect to a database named sqlalchemy, your connection string would look like the following:
mysql+mysqlconnector://user1:pscale_pw_abc123@us-east.connect.psdb.cloud:3306/sqlalchemy
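If your password contains characters such as @, / or :, they need to be URL-encoded before they go into the connection string, otherwise the URL will be parsed incorrectly. Here is a minimal sketch using only the standard library. The build_connection_string helper and the sample password are hypothetical, made up for illustration:

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, database,
                            driver="mysqlconnector"):
    """Assemble a MySQL connection URL (hypothetical helper, not part of SQLAlchemy)."""
    # URL-encode the credentials so characters like "@" or "/" are not
    # mistaken for URL delimiters.
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Made-up password containing characters that must be escaped.
connection_string = build_connection_string(
    "user1", "p@ss/word", "us-east.connect.psdb.cloud", 3306, "sqlalchemy"
)
print(connection_string)
```

SQLAlchemy also ships a URL.create() constructor that builds the URL from its parts, which avoids manual escaping altogether.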

Create SQLAlchemy engine object
Once you have your driver installed and your connection string ready to go, you can create an engine like this:
from sqlalchemy import create_engine

connection_string = "mysql+mysqlconnector://user1:pscale_pw_abc123@us-east.connect.psdb.cloud:3306/sqlalchemy"
engine = create_engine(connection_string, echo=True)

Typically, you don't need echo set to True, but it's here so you can see the SQL statements that SQLAlchemy sends to your database.
By default, SSL/TLS usage in mysql-connector-python is enabled, which is required to connect to PlanetScale. This means you do not need to pass it into create_engine() as a connection argument. See the Python connection arguments MySQL docs for more info and to see all of the possible arguments.
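If you run the code and get no errors, the engine was created successfully. Keep in mind that create_engine() doesn't actually connect to the database until a connection is first requested, so an error like "access denied" or "server not found" will only show up when you run your first statement. If you see one, you'll need to fix your connection string before proceeding.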
With the engine object working, you can then continue with SQLAlchemy in various ways. This article covers how to use it to send raw queries to your database and how to use it as an ORM.
Raw SQL statements in SQLAlchemy
Now that we have our engine object working, let's use it to send raw SQL statements to the database and receive the results in return.
Create a connection object
To start sending queries over, you'll need to create a connection object. Since the engine manages connections, you need to ask the engine for a connection before you can send statements over to the database.
We need to call engine.connect() to get a connection, and because the connection object it returns can be used as a context manager, we can use a with statement to work with the connection:
with engine.connect() as connection:
    # the statements in the following sections go inside this block

Now that you have the connection, you can execute any SQL statement that works on your database by using the text function from SQLAlchemy. Add from sqlalchemy import text to your Python file. You then pass the query as a string to the text function and pass the object it returns to connection.execute().
Create a table
To create a table, you can run the following code:
connection.execute(text("CREATE TABLE example (id INTEGER, name VARCHAR(20))"))

If you run that and get no errors, that means the table was created. If you try to run the code again, you'll get an error saying the table already exists.
You can also go back to your PlanetScale dashboard to confirm the table was added. Click on your database, click "Console", connect to your branch, and run the following:
SHOW TABLES;


Add data to a table
Next, let's insert some data into our new table.
connection.execute(text("INSERT INTO example (name) VALUES (:name)"), {"name": "Ashley"})
connection.execute(text("INSERT INTO example (name) VALUES (:name)"), [{"name": "Barry"}, {"name": "Christina"}])
connection.commit()

The first execute statement will create just one row because a single dictionary was passed. The second statement will create two rows since a list of two dictionaries was passed. Just make sure each key in the dictionary matches a placeholder name in the statement (the part after the colon).
Even though we aren't dealing with user input here, it's still a good idea to make a habit of passing values in as parameters instead of embedding them directly in an insert or select statement.
Unlike CREATE statements, INSERT statements happen in transactions, so we have to save them to the database by calling .commit() after we execute our insert statements.
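To see why bound parameters are worth the habit, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so that it runs without a database server). It accepts the same :name placeholder style used above. The value containing a quote is stored as plain data rather than being interpreted as part of the SQL:

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER, name VARCHAR(20))")

# A single dictionary inserts one row.
conn.execute("INSERT INTO example (name) VALUES (:name)", {"name": "Ashley"})
# A list of dictionaries inserts one row per dictionary.
conn.executemany(
    "INSERT INTO example (name) VALUES (:name)",
    [{"name": "Barry"}, {"name": "O'Brien"}],
)
conn.commit()  # the inserts are only saved once the transaction is committed

names = [row[0] for row in conn.execute("SELECT name FROM example ORDER BY name")]
print(names)
conn.close()
```

Had the name been pasted into the SQL string directly, the apostrophe in O'Brien would have ended the string literal early and broken the statement.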
Query data
Finally, now that we have some data in the database, let's go ahead and query that data so we can see it again. We can assign the result of connection.execute() to a variable called result, and if we loop over result.mappings(), we'll see that we get dictionaries for each row, where the key in each dictionary represents the column name. This makes it easy for us to retrieve the data and display it in a loop.
result = connection.execute(text("SELECT * FROM example WHERE name = :name"), dict(name="Ashley"))

for row in result.mappings():
    print("Author:", row["name"])

As you can see, you only need to know a few things to write raw queries using SQLAlchemy. If you want to use it as an ORM, you can do that as well.
Using SQLAlchemy ORM to write queries
The idea behind ORM (object-relational mapping) is to create a code representation of your database using classes and objects instead of writing raw SQL statements. The classes represent the tables in your database, and the objects of those classes represent rows. So the first step to using ORM is to define classes that map to your tables. Classes that represent tables in an ORM are called models.
Before we can do the mapping, we need something called the DeclarativeBase from SQLAlchemy. Even though our classes could inherit directly from DeclarativeBase, we will instead create our own Base class that inherits from it and has an empty body (just pass). This makes it easy to add additional settings to our Base class in the future since all of our models will inherit from this one.
from sqlalchemy.orm import DeclarativeBase

class Base(DeclarativeBase):
    pass

Create models
Now we can create our models. Let's first create an Author model which will map to an author table. The first step is to define the table name by setting the __tablename__ attribute.
Define the columns
Next, we need to define the columns. Starting in SQLAlchemy 2.0, we can use the Python typing system to define the columns for us. The format of each column is the name of the column, followed by a class called Mapped with the Python type that most closely matches the type you want in the database.
So for an ID column, we would have id: Mapped[int]. That attribute is then set equal to a mapped_column function call, where we can set additional properties on our column like primary key, max length, nullable, etc.
So let's create an Author model with two fields: id and name, which means we'll have a table with two columns. SQLAlchemy requires each model to have a primary key so it can internally keep track of each object, so let's make ID the primary key.
from sqlalchemy import String
from sqlalchemy.orm import Mapped, mapped_column

class Author(Base):
    __tablename__ = "author"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))

Handling relationships and foreign key constraints
PlanetScale now supports foreign key constraints. The information below predates that support, but the approach it describes still works.
Next, let's create a Post model, which will have a relationship to the Author model. We can create an author_id column inside of Post that holds the reference to the author who created the post. For most database systems, you'd pass ForeignKey to mapped_column to create an actual foreign key constraint in the database. But with PlanetScale, we don't recommend using foreign key constraints. However, we can still use SQLAlchemy to manage the relationship for us.
Since databases with foreign key constraints are very common, variations for those databases are included in the commented-out lines.
class Post(Base):
    __tablename__ = "post"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(30))
    #author_id: Mapped[int] = mapped_column(ForeignKey("author.id"))
    author_id: Mapped[int]

Having no foreign keys has a few advantages. We can have multiple versions of our database schema in the same way we have multiple versions of code through things like git branches. We can make schema changes to production databases without any downtime. And finally, it is easier to scale the database through sharding.
But even without a foreign key, we can still have a relationship between two tables. To get SQLAlchemy to manage the relationship for us, we can create a relationship attribute. Unlike the attributes for the columns, no column gets created in the database. Instead, the relationship only exists in our code while it's running.
So we can add a relationship attribute named posts to the Author model. Its type will be a list of Post objects, which we can pass to the Mapped class as the type.
posts: Mapped[list["Post"]] = relationship(primaryjoin='foreign(Post.author_id) == Author.id')

Since we're not using ForeignKey, we need to tell SQLAlchemy how to handle our relationship. We can do that with the primaryjoin argument to relationship. If we used a database with foreign keys, then the ForeignKey being passed to the mapped_column would be enough.
We can also create a Tag model in a similar way. This Tag model represents tags that each post could have.class Tag(Base):
    __tablename__ = "tag"

    id: Mapped[int] = mapped_column(primary_key=True)
    text: Mapped[str] = mapped_column(String(30))

Because one post can have many tags and one tag can belong to many posts, we need to create a many-to-many relationship. We can create a post_tag table to represent this relationship. Many-to-many relationships have the foreign key stored in a separate table called an association table. We can create that table directly in SQLAlchemy. You could also use a model for this, but it's better to use a table because you won't be working with this table directly. Instead, SQLAlchemy will automatically manage the data in this table for you by using the relationships you define.
You can create the table like this:
post_tag = Table(
    "post_tag",
    Base.metadata,
    #Column("post_id", ForeignKey("post.id"), primary_key=True),
    #Column("tag_id", ForeignKey("tag.id"), primary_key=True),
    Column("post_id", Integer, primary_key=True),
    Column("tag_id", Integer, primary_key=True)
)

Create tables with create_all()
Now that we have all of our models and the association table defined, you can create the tables in the database by calling create_all on the metadata of your Base class. The create_all call takes an engine object, so you can reuse the one we created earlier.
create_all goes through every table registered on Base.metadata, generates a CREATE TABLE statement for each one, and runs those statements against the database. You'll see them printed to your terminal when you run:
Base.metadata.create_all(engine)

With our tables created, we can go ahead and insert data into the tables and then query the tables.
Since we're working with the ORM, the way to create new rows is first by creating objects. So for example, to create a new Author, we can instantiate an author object.
For relationships, we can set one object to be related to another when we instantiate the related object. We want to use the relationship attribute instead of the _id field directly because SQLAlchemy will take care of the ID field for us. For many-to-many relationships, we append to the relationship attribute like it's a Python list. We only need to append children of the relationship.
To add them to the database, we need to first add them to the session. Finally, we need to call commit to save them to the database. When you run the script, you'll see the actual insert statements being printed.
with Session(engine) as session:
    author = Author(name="David")
    post = Post(title="Python Essentials", author=author)
    session.add(author)
    session.add(post)

    post2 = Post(title="SQL Secrets", author=author)
    post3 = Post(title="Advanced MySQL", author=author)
    session.add_all([post2, post3])

    tag1 = Tag(text="python")
    tag2 = Tag(text="sql")
    tag3 = Tag(text="mysql")
    session.add_all([tag1, tag2, tag3])

    post.tags.append(tag1)
    post2.tags.append(tag2)
    post3.tags.append(tag2)
    post3.tags.append(tag3)

    session.commit()

Query the data
Now that we have some data in the database, we can go ahead and query that data. First, we write our query statement.
Start by passing your model class to the select function. Then you have the option to call the where method on the resulting object. We can then pass all of that to session.scalar to run the query and get back a single object. With the result of scalar, we can print out the results to the terminal. We can also look at the values in the relationship. For each post, we can look at the tags as well.
If you want to leave out the where and get all of the posts, you will use scalars instead of scalar. Then we can loop over the object returned and print out all the titles.
# Run this while the session is still open, inside the with Session(engine) block
stmt = select(Author).where(Author.name == "David")
author = session.scalar(stmt)

for post in author.posts:
    print(post.title)
    for tag in post.tags:
        print(" ", tag.text)

for post in session.scalars(select(Post)):
    print(post.title)

Conclusion
With your new knowledge of SQLAlchemy, you should have a good starting point to continue using it in any Python project that uses a SQL database. As long as you can set up the engine object, you'll be able to decide whether you want to simply send raw SQL statements, construct SQL statements using the SQLAlchemy API, or map your Python classes and objects to your database tables and data.]]></content>
        <summary><![CDATA[Learn how to use Python SQLAlchemy with MySQL by working through an example of creating tables, inserting data, and querying data with both raw SQL and SQLAlchemy ORM.]]></summary>
      </entry>
    
      <entry>
        <title>Improvements to database branch pages</title>
        <link href="https://planetscale.com/blog/improvements-to-database-branch-pages" />
        <id>https://planetscale.com/blog/improvements-to-database-branch-pages</id>
        <published>2023-03-01T14:23:00.000Z</published>
        <updated>2023-03-01T14:23:00.000Z</updated>
        
        <author>
          <name>Jason Long</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Branching is a core part of the PlanetScale workflow, and we've just released some updates to improve the experience.
All of the information about a branch is now visible on a single page. Production and development branches each have their own unique improvements.
Your schema front and center
Navigating to the schema page for your branch no longer requires an extra click. Individual tables can be expanded and collapsed and you can easily search by table name. When you're looking at a production branch, you will also see the size of each table.
Regions are shown on the side along with a button for quickly opening a web console session.

Development branch diffs
Development branches now show a diff of what has changed relative to its parent. The changed tables are shown above the other tables, making it convenient to reference while opening a deploy request from the sidebar.

These are just the beginnings of some updates we have in development for making branches more configurable for different use cases. As always, we'd love to hear your feedback on Twitter or our GitHub discussion board.]]></content>
        <summary><![CDATA[Learn about some of the latest enhancements we made to the Branching page in the PlanetScale dashboard.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 16</title>
        <link href="https://planetscale.com/blog/announcing-vitess-16" />
        <id>https://planetscale.com/blog/announcing-vitess-16</id>
        <published>2023-02-28T15:50:00.000Z</published>
        <updated>2023-02-28T15:50:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[We are pleased to announce the general availability of Vitess 16.
Major themes in Vitess 16
Documentation improvements
In this release, the maintainer team has decided to put an emphasis on reviewing, editing, and rewriting the website documentation to ensure it's current with the code. With help from CNCF, we have also improved the search experience. We welcome feedback on the current iteration of the docs.
GA announcements
We are marking VDiff v2 as Generally Available or production-ready in v16. We now recommend that you use v2 rather than v1 going forward. Version 1 will be deprecated and eventually removed in future releases.
This new version of VDiff offers a much improved overall user experience, especially when migrating very large tables. You can read more about VDiff v2 in the Introducing VDiff V2 blog post.
VTOrc is now mandatory
VTOrc is a required component of Vitess starting from this release. You must run at least one instance of VTOrc in order for Vitess to automatically manage the backing MySQL clusters.
MySQL compatibility improvements
We have been making steady progress on adding query support for more MySQL constructs. In this release, we have added support for Views in Vitess. It is now possible to create views that access data across shards, and they will work as intended in Vitess. Note that this is considered an experimental feature. It will move to GA in a future release.
Other improvements
Support for native incremental backups and point-in-time recoveries has been added. It is now possible to take an incremental backup, starting with the last known (full or incremental) backup, up to either a specified (GTID) position or the current ("auto") position. Using these incremental backups, you can restore a backup up to a given point in time (GTID position) without relying on a binlog server. Note that this is only supported for the file-based builtin backup method, not for xtrabackup.
A new VEXPLAIN command has been introduced to help users gain more insight into query planning in Vitess. This gives users the ability to inspect the query plan produced by VTGate, all the queries executed on the MySQL instances, and the MySQL EXPLAIN output for the executed queries.
Try it out
We are very pleased with the great strides we have made with v16 and hope that you will be as well. We encourage all current users of Vitess and everyone who has been considering it to try this new release! We also look forward to your feedback, which can be provided via Vitess GitHub issues or Vitess Slack.]]></content>
        <summary><![CDATA[Vitess 16 is now generally available with updates to VDiff v2, VTOrc, MySQL compatibility, and more.]]></summary>
      </entry>
    
      <entry>
        <title>What are the disadvantages of database indexes?</title>
        <link href="https://planetscale.com/blog/what-are-the-disadvantages-of-database-indexes" />
        <id>https://planetscale.com/blog/what-are-the-disadvantages-of-database-indexes</id>
        <published>2023-02-17T14:45:00.000Z</published>
        <updated>2023-02-17T14:45:00.000Z</updated>
        
        <author>
          <name>JD Lien</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[If you've worked with databases for a while, you've probably learned that adding indexes can improve performance. This is especially true for large tables when you are querying with JOINs, GROUP BY, WHERE, or ORDER BY clauses.
An index basically works by storing a copy of part of the data in a different order, so that it can be accessed more quickly — kind of like adding a table of contents to a book.
For a more detailed explanation of how indexes work and how you can use them, check out this article: How do database indexes work? If you want to dive even further into indexes, we have 17 videos on indexing that cover everything from how indexes and B+Trees work to knowing where and when to add indexes.
Making good use of indexes can reduce query run time from seconds to milliseconds. The first time you get a performance boost like that, you might feel inclined to add indexes to every column of every table in your database just because you can. But this is not always a good idea, as there can be drawbacks to adding too many secondary indexes.
 You should always include a primary index on every table in your database. However, too many secondary indexes can begin to cause issues in some instances. This article covers issues that come with too many secondary indexes. 
Downsides of database indexes
Let's go over some of the possible downsides of using too many secondary indexes.
Additional storage
The first and perhaps most obvious drawback of adding indexes is that they take up additional storage space. The exact amount of space depends on the size of the table and the number of columns in the index, but it's usually a small percentage of the total size of the table. A basic index only needs to store the values of the indexed columns as well as a pointer to the row in the table. So for a column that contains integers, the index will only need to store the integer values. This space will increase if the column contains strings because the index will need to store the string values as well as the length of each string.
This is important to consider if you have large datasets, as adding multiple indexes to a table can quickly use a significant amount of additional storage space.
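To get a feel for the storage cost, you can measure a database before and after adding an index. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so that it runs anywhere). The users table, the idx_users_email index, and the database_bytes helper are made up for illustration:

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)
conn.commit()


def database_bytes(connection):
    """Total size of the database: number of pages times page size."""
    pages = connection.execute("PRAGMA page_count").fetchone()[0]
    page_size = connection.execute("PRAGMA page_size").fetchone()[0]
    return pages * page_size


before = database_bytes(conn)
# The index stores a sorted copy of every email plus a reference to its row.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.commit()
after = database_bytes(conn)

print(f"without index: {before} bytes, with index: {after} bytes")
conn.close()
```

In MySQL, the DATA_LENGTH and INDEX_LENGTH columns of information_schema.TABLES give a comparable view of how much space a table's data and indexes take up.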
Slower writes
When you add an index, it has to be updated whenever a row is inserted, updated, or deleted. This means that writes will be slower. Before you add an index, you should consider whether you will be doing a lot of writes to the table and whether or not you can afford to slow down the writes.
As an example, in one application I worked on, doing a bulk insert of about a million records only took around 10-15 seconds without any indexes. Unfortunately, the performance of certain frequently used queries was quite slow, taking a few seconds to run and causing a bad user experience. Adding several indexes for such queries improved the performance significantly, but the bulk insert now takes closer to two minutes. That is a significant difference in write performance, but in this particular case, it was an acceptable trade-off, as the bulk insert is done infrequently and can be done during off-peak hours when the application is not used heavily.
If something like this bulk insert was triggered by users who had to sit and wait for it, then it might be a different story, and I may have weighted the impact of the slower writes differently.
Finding and removing unused indexes
To keep your database efficient, it's important to find and remove any unused indexes. In MySQL, you can use the following query to find indexes that are not being used (replace your_database_name with the name of your database):
SELECT table_name, index_name, non_unique, seq_in_index, column_name, collation, cardinality, sub_part, packed, index_type, comment, index_comment
FROM information_schema.STATISTICS
WHERE table_schema = 'your_database_name'
AND index_name != 'PRIMARY'
AND (cardinality IS NULL OR cardinality = 0)
ORDER BY table_name, index_name, seq_in_index;

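This query checks the cardinality of each index, which is an estimate of the number of unique values in the index. A value of 0 or NULL usually means the table is empty or has no statistics yet, so treat the results as candidates to investigate rather than proof that an index is unused.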
If you find an unused index, your_index_name, in a table called your_table_name, you could remove it with the following query:
ALTER TABLE your_table_name DROP INDEX your_index_name;

Auditing all indexes in a database
If you have some indexes that are in use, but after reading this article, you think some of the trade-offs may not be worth it, you can audit each of these individually to see if you want to keep or remove them.
To get a list of all indexes for all tables in your database, run:
SELECT * FROM information_schema.statistics;

Now that you've identified all of your indexes, you can use MySQL invisible indexes to determine which ones you may wish to drop.
Using invisible indexes to test dropping an index
One way to test the outcome of dropping an index before actually dropping it is to utilize MySQL's invisible indexes.
(If you prefer video, you can watch our video on invisible indexes.)
With invisible indexes, you can keep the index intact but essentially hide the index from MySQL so that queries do not use the index. This gives you a way to quickly test the impact of removing an index without completely destroying it.
 You can use PlanetScale Insights to quickly see the performance (rows read, rows returned, total time, time per query, etc.) of any query in your database. This is a quick and easy way to test performance before and after making an index invisible. 
To make an index invisible, run the following query:
ALTER TABLE your_table_name ALTER INDEX your_index_name INVISIBLE;

You can now run any applicable queries to see how performance is impacted. If you realize you still need this index, you can make it visible again with:
ALTER TABLE your_table_name ALTER INDEX your_index_name VISIBLE;

 With PlanetScale, we don't allow direct DDL on production branches, unless they have safe migrations disabled (not recommended). So, you'll have to go through the deploy request process to test using invisible indexes. However, with our Revert feature, you can simply click the "Revert" button if you decide you want to undo an altered or dropped index and it will be reverted near instantaneously. 
When doing this, ensure that you test the performance of any affected queries before and after removing the index to make sure that you are not inadvertently making things worse.
Conclusion
Adding indexes can be a great way to improve performance, but it's important to be aware that they do come with a cost. Every index takes up additional storage, can slow down write operations, and can complicate the query optimizer's job, so they aren't always guaranteed to improve performance. Ultimately the decision to add indexes should be based on the specific needs of your application and the trade-offs you are willing to make. You should always measure the performance of your queries before and after adding indexes to see if they are actually improving performance, and if you don't seem to be seeing significant improvement for your desired use, then it may be better to leave the indexes out.]]></content>
        <summary><![CDATA[Learn about some of the possible downsides of using database indexes and how to remove unused database indexes in MySQL.]]></summary>
      </entry>
    
      <entry>
        <title>Faster MySQL with HTTP/3</title>
        <link href="https://planetscale.com/blog/faster-mysql-with-http3-video" />
        <id>https://planetscale.com/blog/faster-mysql-with-http3-video</id>
        <published>2023-02-16T13:00:00.000Z</published>
        <updated>2023-02-16T13:00:00.000Z</updated>
        
        <author>
          <name>Matt Robenolt</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[Join PlanetScale’s Lead Infrastructure Engineer for Edge connectivity for an explainer on our findings regarding HTTP/3 and the MySQL protocol.]]></summary>
      </entry>
    
      <entry>
        <title>Migrating from Postgres to MySQL</title>
        <link href="https://planetscale.com/blog/migrating-from-postgres-to-mysql" />
        <id>https://planetscale.com/blog/migrating-from-postgres-to-mysql</id>
        <published>2023-02-09T15:29:55.100Z</published>
        <updated>2023-02-09T15:29:55.100Z</updated>
        
        <author>
          <name>Adnan Kukic</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[PlanetScale now supports PostgreSQL.
Choosing a data store is one of the most important decisions you'll make when building software applications. The good news, and the bad news, is that there is an abundance of options. Depending on the type of application you're building, you may opt for a relational database like MySQL or Postgres, a non-relational database like MongoDB or CouchDB, a graph database like Neo4j, or one of the many other alternatives that each bring their own benefits and drawbacks.
It is a difficult choice and sometimes you realize that the database you initially went with no longer serves the needs of your application and you have to make the choice to migrate.
In this blog post, we'll take a look at how you can migrate your database schema from PostgreSQL to MySQL. MySQL and PostgreSQL (often just called Postgres) are both relational databases that have a lot of similarities, but they also have a fair number of differences that can make migration a challenge.
A detailed list of the differences between PostgreSQL and MySQL can be found below:
Metric | Postgres | MySQL
Licensing | Released under PostgreSQL license, a free open-source license similar to BSD or MIT licenses | Source code is available under the terms of the GNU General Public License
ACID | Yes | Yes
Triggers | Supports AFTER, BEFORE, and INSTEAD OF triggers for INSERT, UPDATE, and DELETE statements | Supports AFTER and BEFORE triggers for INSERT, UPDATE, and DELETE statements
Unsigned integer | No support | A column can be made not to accept negative values
Materialized views | Supported | Not supported
ANSI/ISO SQL compliance | Fully compliant | Mostly compliant
Drop temporary table | No TEMP or TEMPORARY keyword in DROP TABLE statement | Supports the TEMP or TEMPORARY keyword in the DROP TABLE statement, allowing you to remove just the temporary table
Table partitioning | Supports RANGE, LIST, and HASH | Supports RANGE, COLUMN, LIST, HASH, KEY, and either HASH or KEY for composite partitioning
These differences are important to keep in mind when making the decision to migrate from Postgres to MySQL and vice versa. With that said, let's dive into how we can migrate an existing Postgres database to MySQL.
An example migration from Postgres to MySQL
In this article, we'll take the approach of migrating from PostgreSQL to MySQL manually. While there are software tools, ORMs, and other approaches we can take that might abstract the migration from one database to another, and we will look at those approaches in subsequent tutorials, here we want to explore what you'll need to consider at a low level.
For our sample migration, we are going to compare an instance of PostgreSQL and what a migration to MySQL might look like. For PostgreSQL, we'll use a locally hosted server. For MySQL, rather than setting up a local database, we'll test using PlanetScale's hosted MySQL offering. If you don't already have a PlanetScale account, you can sign up now.
Differences between our Postgres and MySQL schema
In our PostgreSQL schema, we have three tables: a products table that holds information about our inventory, a customers table that holds information about our customers, and an orders table that keeps records of orders placed by our customers.
The products table in PostgreSQL looks like the following:
CREATE TABLE products
(
    id          SERIAL,
    name        VARCHAR,
    description VARCHAR,
    price       INTEGER
);

So the data could be represented as:
id (SERIAL) | name (VARCHAR) | description (VARCHAR) | price (INTEGER)
1 | Achieving PlanetScale | Achieving PlanetScale teaches you how to think bigger and more horizontally! | 50
2 | The Database that Could | Follow the adventures of the database that could accomplish anything it wanted | 100
VARCHAR in Postgres vs MySQL
Looking at the data types within this table, the migration to MySQL should be pretty straightforward, since all of them have a counterpart in MySQL. Still, there are a few things to consider. In MySQL, the varchar type requires a maximum length, whereas in Postgres it does not. We can either switch to the text type in MySQL or keep varchar and specify a maximum length. If we opt for varchar, we need to know the length of the longest existing value; otherwise, we'll run into issues when we actually migrate the data over.
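One way to find a safe maximum length before the migration is to measure the longest value in each varchar column of the exported data. The following is a minimal sketch, assuming the rows have already been exported from Postgres into Python dictionaries; the helper name max_lengths is hypothetical, and the sample rows are copied from the products table above.

```python
def max_lengths(rows, columns):
    """Return the longest value length (in characters) seen per column."""
    longest = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            # NULL values (None) do not count towards the length
            if value is not None:
                longest[column] = max(longest[column], len(value))
    return longest


# Sample rows copied from the products table
products = [
    {
        "name": "Achieving PlanetScale",
        "description": "Achieving PlanetScale teaches you how to think "
                       "bigger and more horizontally!",
    },
    {
        "name": "The Database that Could",
        "description": "Follow the adventures of the database that could "
                       "accomplish anything it wanted",
    },
]

lengths = max_lengths(products, ["name", "description"])
for column, length in lengths.items():
    print(f"{column}: longest value is {length} characters")
```

Since new rows may be longer than anything stored today, it is usually worth leaving some headroom above the measured length.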
SERIAL in Postgres vs MySQL
Additionally, the serial type means different things in the two systems. In Postgres, it creates a 4-byte integer column backed by a sequence. In MySQL, serial is an alias for bigint unsigned not null auto_increment unique. In the grand scheme of things, this won't affect how we store the data too much, but it is worth noting because, depending on how many records we have, a different data type may be more suitable.
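To make the "depending on how many records we have" point concrete, here is a small sketch that picks the smallest MySQL unsigned integer type able to hold an expected maximum id. The helper name smallest_unsigned_type is hypothetical; the limits are the documented maximums for MySQL's unsigned integer types.

```python
# Maximum values for MySQL's unsigned integer types, smallest type first
UNSIGNED_MAX = [
    ("TINYINT UNSIGNED", 2**8 - 1),
    ("SMALLINT UNSIGNED", 2**16 - 1),
    ("MEDIUMINT UNSIGNED", 2**24 - 1),
    ("INT UNSIGNED", 2**32 - 1),
    ("BIGINT UNSIGNED", 2**64 - 1),
]


def smallest_unsigned_type(expected_max_id):
    """Return the smallest unsigned integer type that can hold expected_max_id."""
    for type_name, max_value in UNSIGNED_MAX:
        if expected_max_id <= max_value:
            return type_name
    raise ValueError("value does not fit in any MySQL integer type")


print(smallest_unsigned_type(50_000))
print(smallest_unsigned_type(5_000_000_000))
```

For comparison, a Postgres serial column tops out at 2147483647, so any existing ids from such a column fit comfortably in a MySQL int.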
The customers table in PostgreSQL looks like the following:
CREATE TABLE customers
(
    id        SERIAL,
    full_name VARCHAR,
    address   VARCHAR,
    location  POINT
);

If we look at our sample data for this table, we'll get the following:
id (SERIAL) | full_name (VARCHAR) | address (VARCHAR) | location (POINT)
1 | Robert California | 123 Sunny St, AZ | (34.411275716904406,-111.6783709992531)
2 | Nick Claus | 785 North Pole, AL | (69.72578389209082,-153.14940161799086)
Now things get a little bit more interesting. While the first three columns are something we dealt with in the previous example, the fourth column represents a data type that is available in Postgres as well as in MySQL, but behaves quite differently in practice.
Spatial data in Postgres vs MySQL: POINT
In Postgres, working with spatial data is fairly straightforward. When we define the location column as a type of Point, we can insert spatial data into it by giving it the coordinates we want. So if we wanted to add a new record into our database, we could do the following:
INSERT INTO customers (id, full_name, address, location) VALUES (3, 'Michael West', '532 Alexander St, WA', POINT(47.490272897328325, -122.27293296965925));

And our data would be properly reflected in the database as:
id (SERIAL) | full_name (VARCHAR) | address (VARCHAR) | location (POINT)
1 | Robert California | 123 Sunny St, AZ | (34.411275716904406,-111.6783709992531)
2 | Nick Claus | 785 North Pole, AL | (69.72578389209082,-153.14940161799086)
3 | Michael West | 532 Alexander St, WA | (47.490272897328325,-122.27293296965925)
In MySQL, however, spatial data types behave differently, and depending on the version of MySQL you're using, you may have to use a different approach to storing spatial data. If we updated our MySQL schema to align with our customers schema in our Postgres database, our table might look something like this:
CREATE TABLE customers
(
    id        INT NOT NULL AUTO_INCREMENT,
    full_name TEXT,
    address   TEXT,
    location  POINT,
    PRIMARY KEY (id)
);

And to migrate the above data into our MySQL database, we could write a query such as:
INSERT INTO customers VALUES (1, 'Robert California', '123 Sunny St, AZ', POINT(34.411275716904406,-111.6783709992531));

This will add the first record for Robert California into our database. But if we go and run a SELECT statement on that data, our output will look like the following:
id (INT) | full_name (TEXT) | address (TEXT) | location (POINT)
1 | Robert California | 123 Sunny St, AZ | 0x00000000010100000027DFC4AEA43441403416326E6AEB5BC0
This is because MySQL stores spatial values in an internal binary format. To get the actual coordinates for our location, we have to use a spatial function like ST_AsText, which returns the value as readable text:
SELECT id, full_name, address, ST_AsText(location) FROM customers;

id (INT) | full_name (TEXT) | address (TEXT) | location (POINT)
1 | Robert California | 123 Sunny St, AZ | POINT(34.411275716904406 -111.6783709992531)
If we didn't want to work with spatial functions like this in MySQL, we could split the location into two columns, for example location_latitude and location_longitude, and store each coordinate as a decimal. The drawback is that we would have to do a little more work on the client to keep track of the faux spatial data, and we couldn't use any of the spatial functions provided by MySQL.
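To see what the hexadecimal value returned by the plain SELECT actually holds, we can unpack it by hand. This is a minimal sketch, assuming the layout MySQL documents for its internal geometry format: a 4-byte SRID followed by the well-known binary (WKB) form of the point. The variable names are our own.

```python
import struct

# The value returned by the plain SELECT above, without the 0x prefix
raw = bytes.fromhex("00000000010100000027DFC4AEA43441403416326E6AEB5BC0")

# Assumed layout: 4-byte SRID, 1-byte byte-order flag, 4-byte geometry type,
# then the X and Y coordinates as 8-byte doubles (all little-endian here)
srid, byte_order, geometry_type, x, y = struct.unpack("<IBIdd", raw)

print("SRID:", srid)
print("byte order flag:", byte_order)   # 1 means little-endian
print("geometry type:", geometry_type)  # 1 means POINT
print("coordinates:", x, y)
```

In practice, ST_AsText, ST_X, and ST_Y are the better tools for reading these values; the sketch only shows that the coordinates are stored intact.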
The orders table in PostgreSQL looks like the following:
CREATE TABLE orders
(
    id       UUID default gen_random_uuid(),
    customer INTEGER,
    products JSONB
);

And our data in the orders table looks like:
id (UUID) | customer (INTEGER) | products (JSONB)
10d4e52d-68fd-44f5-bf9b-a960cfb03de1 | 1 | [{"product": "Achieving PlanetScale", "price": 50}]
3de3f1a5-7a90-4273-a845-7129183edd47 | 2 | [{"product": "Achieving PlanetScale", "price": 50}, {"product": "The Database that Could", "price": 100}]
Handling the UUID Postgres type in MySQL
In this table, we have two column types that MySQL does not natively support, but that doesn't mean we can't have a successful migration. The text form of a UUID is a 36-character string, so we can store it as a varchar(36) in MySQL. MySQL also generates UUIDs via its UUID() function, so when it comes to adding new records, we can use it in place of Postgres's gen_random_uuid() function. Note that UUID() returns time-based (version 1) values, while gen_random_uuid() returns random (version 4) values.
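The 36-character figure is easy to check. The short sketch below uses Python's uuid module to generate a random (version 4) UUID, comparable to what gen_random_uuid() produces, and prints the size of its text form and of the underlying value.

```python
import uuid

# Generate a random (version 4) UUID
order_id = uuid.uuid4()
text_form = str(order_id)

print(text_form)
print(len(text_form))       # characters in the text form, hyphens included
print(len(order_id.bytes))  # bytes in the underlying 128-bit value
```

The text form is what a varchar(36) column stores. MySQL 8.0 also provides UUID_TO_BIN() and BIN_TO_UUID() if you would rather keep the 16-byte value in a binary(16) column.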
Handling the Postgres jsonb type in MySQL
The products column, on the other hand, is of type jsonb, or binary JSON. While MySQL at the moment does not have support for the jsonb data type, it does have support for json, so if we set the data type for the products column as json, we'll get a very similar experience. While jsonb may be a more efficient way to store JSON data in a SQL database, know that even with MySQL, you can still leverage the JSON data type and do things like index JSON columns, update and modify the JSON values, return subsets of the JSON data, and much more.
The above examples explore just a few scenarios that you may run into when deciding to migrate from Postgres to MySQL. At the end of the day, migration from one database to another is not impossible; you just have to understand the differences between the two and the benefits you'll gain when you do migrate. Once you do migrate, ensure that you are following that database's best practices to get the most out of the migration. Below are a couple of other migration factors to consider before making the switch.
Other migration factors going from Postgres to MySQL
Before migrating from Postgres to MySQL, it’s important to be aware of some differences between the two database systems that may pose an issue. We’ve listed some of these below.
Complications arising from certain data models
MySQL and Postgres both share support for many data types. This ranges from traditional SQL types like String, Boolean, Integer, and Timestamp to complex data structures such as JSON, XML, and TEXT. However, it's good to keep in mind there are some Postgres data types that MySQL does not support. You can see some more information about these in the table at the top of this article.
So, even though MySQL supports the various traditional SQL types required by a variety of applications to store and process different kinds of data, such as Date, Timestamp, Character, Long Text, Float and Decimal, and Blob, potential complications may arise when trying to migrate complex data structures that may not yet be supported in MySQL. Fortunately, these issues are being addressed in newer MySQL releases.
Differences in database and SQL capabilities
Certain operations are carried out differently in MySQL and Postgres databases. Also, some features may not be supported in one while supported in the other. Knowing these may help you avoid some common pitfalls.
DROP a temporary table
Though CREATE TEMPORARY TABLE is used to create a temporary table in both MySQL and Postgres databases, only MySQL has the TEMPORARY keyword in the DROP TABLE statement. This makes it possible to drop only the temporary table in MySQL without affecting the main table.
In Postgres, this omission requires you to be more careful with your naming convention because a temporary table may have the same name as a regular table. And since you can't specify TEMPORARY in your DROP TABLE statement, you might unintentionally delete important tables.
CASCADE when truncating a table
Postgres’ TRUNCATE TABLE statement supports the CASCADE keyword, as well as the RESTART IDENTITY and CONTINUE IDENTITY options, and it is transaction-safe. RESTART IDENTITY tells Postgres to automatically restart the sequences owned by columns of the truncated table, and transaction-safe means that the truncation will be safely rolled back if the surrounding transaction doesn’t commit.
In contrast, MySQL’s TRUNCATE TABLE supports neither CASCADE nor transactional rollback: the statement causes an implicit commit, so the data can’t be restored by rolling back once it’s deleted. These features may be added in future MySQL releases.
Stored procedures
MySQL requires that procedures are written in standard SQL syntax. This contrasts with Postgres, where procedures are based on functions and can be written in other languages, such as Ruby, Perl, SQL, Python, JavaScript, and others.
Postgres extensions
If you're using extensions in Postgres, it can make a migration even trickier, so make sure to audit each of these individually before migrating.
Case sensitivity and IF/IFNULL support
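Postgres table and column names are case sensitive if they are written in double quotes; unquoted names are folded to lowercase. In MySQL, column names aren't case sensitive, while the case sensitivity of table names depends on the operating system and the lower_case_table_names setting. So when doing a migration, make sure to keep this in mind.
Moreover, MySQL provides the IF() and IFNULL() functions for evaluating conditions, while Postgres doesn’t. In Postgres, CASE expressions and COALESCE cover the same ground, and both also work in MySQL.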
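Because CASE and COALESCE are standard SQL, queries written with them should carry over without changes. The sketch below runs such a query against SQLite, which is used here only as a convenient stand-in engine (an assumption of this example, not part of the migration); the table contents are made up for illustration.

```python
import sqlite3

# SQLite is only a stand-in engine for this sketch
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, full_name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Robert California", "123 Sunny St, AZ"), (2, "Nick Claus", None)],
)

rows = conn.execute(
    """
    SELECT
        full_name,
        -- portable replacement for IFNULL(address, 'unknown')
        COALESCE(address, 'unknown') AS address,
        -- portable replacement for IF(address IS NULL, 'no', 'yes')
        CASE WHEN address IS NULL THEN 'no' ELSE 'yes' END AS has_address
    FROM customers
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```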
Conclusion
Migrating from one SQL database to another, such as going from Postgres to MySQL, seems very straightforward on the surface, but the two database technologies are quite different in how they handle and store certain data types, the features they do and don't support, and the benefits they provide to your application.
While Postgres may have more features out of the box, MySQL is typically more performant for common use cases. Additionally, MySQL has some fantastic options, such as Vitess and PlanetScale, for scaling massive databases.
Understanding the differences between the two technologies and how they fit into your specific use case is the number one thing you should consider before choosing either, or migrating from one to the other. I hope that this article gave you some insight into what a migration from PostgreSQL to MySQL might look like.
Happy migrating! :)]]></content>
        <summary><![CDATA[Learn how to migrate from Postgres to MySQL, Postgres vs MySQL incompatibilities, and more.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing the PlanetScale API and OAuth applications</title>
        <link href="https://planetscale.com/blog/introducing-planetscale-api-and-oauth-applications" />
        <id>https://planetscale.com/blog/introducing-planetscale-api-and-oauth-applications</id>
        <published>2023-01-31T14:00:00.000Z</published>
        <updated>2023-01-31T14:00:00.000Z</updated>
        
        <author>
          <name>Frances Thai</name>
        </author>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today, we are releasing a new way to manage your PlanetScale databases programmatically: The PlanetScale API.
The API opens up new ways to interact with PlanetScale through automation and other developer tools, like CI/CD, infrastructure as code, deployment tools, and application platforms. Alongside the PlanetScale web app and CLI, the PlanetScale API allows you to closely integrate PlanetScale branches, deploy requests, passwords, and other features into your existing workflow.
In addition to the API, we are also launching OAuth applications in limited beta. OAuth applications alongside the PlanetScale API enable your users to interact with their PlanetScale databases from your application.
PlanetScale API
Through the PlanetScale API, users can do simple tasks like updating and creating databases, as well as more complex tasks like managing the lifecycle of a deploy request.
If it can communicate via HTTP, it can be integrated with the PlanetScale API. For example, you can:
Automatically create and delete database branches from CI/CD pipelines or data migration tooling
Programmatically build out new environments that connect to PlanetScale database branches for testing
Get information about a PlanetScale user, database, branch, organization, and deploy request
Check the status of deploy requests in the deploy queue
Automate creating and deleting database connection strings for internal users or tools
Create, update, approve, deploy, and delete deploy requests programmatically from tooling outside of PlanetScale
By making these actions accessible through the API, it will empower you to automate processes and create powerful workflows that will ultimately drive faster and richer developer experiences with PlanetScale.
 Be sure to read through to the end to see how Netlify and Resmo are already using the PlanetScale API and OAuth applications with their newest integrations. 
Using the PlanetScale API
Authorization
To get started with the PlanetScale API, you only need to create a service token and grant it access based on the endpoints you want to use in the API. The access a service token needs is described in each endpoint's documentation.
For further instructions on using the PlanetScale API with a service token, refer to our service tokens API documentation.
Requests and responses
Once you have a service token, you can fill out the request parameters in the API reference and copy and paste code directly from the documentation.
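As a rough illustration of what such a request can look like, here is a minimal sketch using only Python's standard library. The endpoint URL, the Authorization header format (token id and token joined by a colon), and the placeholder credentials are all assumptions made for this example; confirm them against the API reference before relying on them.

```python
import urllib.request

# Hypothetical placeholder credentials; replace with your own service token values
SERVICE_TOKEN_ID = "your-token-id"
SERVICE_TOKEN = "your-token"

# Assumed endpoint and header format; confirm both against the API reference
url = "https://api.planetscale.com/v1/organizations"
request = urllib.request.Request(
    url,
    headers={
        "Authorization": f"{SERVICE_TOKEN_ID}:{SERVICE_TOKEN}",
        "Accept": "application/json",
    },
    method="GET",
)

# The request is only built here, not sent
print(request.get_method(), request.full_url)

# To send it, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```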

OAuth applications
Take the PlanetScale API one step further with PlanetScale OAuth Applications. OAuth applications enable you to seamlessly integrate your platform with PlanetScale, and allow your users to give granular PlanetScale account access to your platform in return.
Enrolling in the limited beta
If you are interested in creating your own PlanetScale OAuth application, you can enroll on the waitlist through your PlanetScale organization's Settings > Beta features page. Once we've received your enrollment request, a PlanetScale team member will be in touch about your OAuth use case.
Refer to our OAuth documentation for further instructions on creating an OAuth application and completing our authorization flow.
PlanetScale API + OAuth application demo
We've created a Next.js-based demo called PlanetPets that uses PlanetScale OAuth and API to access users' organizations, databases, branches, and create new branches. The user's organizations are then presented as "gardens" where their databases are "trees." Within PlanetPets, users can water their "trees" to grow new branches.
This sample app shows you how to implement OAuth authentication with PlanetScale in a Next.js application. Set it up yourself using the code in the PlanetPets GitHub repo.

PlanetScale integration examples
Two fantastic community partners have already built integrations using the powerful combination of OAuth applications and the PlanetScale API. These integrations are available for use today.
Netlify

Netlify is launching a new PlanetScale integration into Netlify Labs. Netlify's new integration allows Netlify users to closely integrate PlanetScale branches, deploy requests, passwords, and other features into the Netlify workflow. Additional benefits for Netlify users include more easily connecting PlanetScale databases to Netlify sites, assigning database branches to different deploy contexts, and using the withPlanetScale function in Netlify Functions to seamlessly insert a connection into the database call. You can read more about the integration in the Netlify integration docs.
Resmo

Resmo uses an OAuth application and the PlanetScale API to connect to PlanetScale in a few clicks to bring asset visibility, continuous security, and compliance of PlanetScale databases to their users. Resmo collects directory assets like databases, organizations, and database branches from users' PlanetScale accounts through the API for users to query and set up custom security rules to automate security checks. You can read more about the integration in the Resmo integration docs.
Feedback
We can't wait to see what you'll build with the new PlanetScale API and OAuth applications! If you have feedback on your experience using the API, we would love to hear it. You can open up a new discussion topic in the PlanetScale discussion repo with your feedback.]]></content>
        <summary><![CDATA[Manage your databases programmatically with the PlanetScale API.]]></summary>
      </entry>
    
      <entry>
        <title>Common MySQL errors and how to fix them</title>
        <link href="https://planetscale.com/blog/common-mysql-errors-how-to-fix-them" />
        <id>https://planetscale.com/blog/common-mysql-errors-how-to-fix-them</id>
        <published>2023-01-27T14:00:35.694Z</published>
        <updated>2023-01-27T14:00:35.694Z</updated>
        
        <author>
          <name>Mary Gathoni</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[While MySQL error codes are useful, understanding what they mean can be difficult. This article looks at common MySQL error codes, non-coded errors, and how to fix them.
MySQL error code format
Each MySQL error includes:
An error number: A MySQL-specific number corresponding to a particular error.
An SQLSTATE value: A five-character string that indicates the error condition.
An error message: A textual description of the error.
MySQL error example
Here’s an example of what a MySQL error code looks like:
ERROR 1146 (42S02): Table 'test.no_such_table' doesn't exist

In the error above:
1146 is the error number.
42S02 is the SQLSTATE value.
Table 'test.no_such_table' doesn't exist is the error message.
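If you collect these errors from logs or scripts, the three parts can be pulled apart with a regular expression. This is a minimal sketch that assumes the line follows the format shown above; the function name parse_mysql_error is our own.

```python
import re

# Matches lines such as:
# ERROR 1146 (42S02): Table 'test.no_such_table' doesn't exist
ERROR_PATTERN = re.compile(r"^ERROR (\d+) \(([0-9A-Z]{5})\): (.*)$")


def parse_mysql_error(line):
    """Split a MySQL error line into (error number, SQLSTATE, message)."""
    match = ERROR_PATTERN.match(line.strip())
    if match is None:
        return None
    number, sqlstate, message = match.groups()
    return int(number), sqlstate, message


parsed = parse_mysql_error(
    "ERROR 1146 (42S02): Table 'test.no_such_table' doesn't exist"
)
print(parsed)
```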
Common MySQL error codes
Let’s explore some common error codes, what they mean, and how to resolve them.
Error 1040: Too many connections
Error 1040 occurs when MySQL reaches the maximum number of client connections, forcing you to close connections so the server can accept new connections.
By default, MySQL can handle up to 151 connections. If needed, you can change this by editing the value held by the max_connections variable. One approach to fixing this error is setting the max_connections value to a number corresponding to connection usage.
For instance, if you think you need around 200 connections, you can set it to 250.
SET GLOBAL max_connections = 250;

Note that the more you increase the number of connections, the more memory-intensive MySQL gets, increasing the chance of the server crashing.
 PlanetScale, a MySQL-compatible database, supports nearly unlimited connections. You can sign up for an account today to get increased connections, no downtime schema changes, an advanced query monitoring dashboard, and more. 
Error 1045: Access denied
Error 1045 occurs when MySQL rejects a connection attempt because the user could not be authenticated. Below is a list of some reasons why MySQL denies access and possible fixes.
The user doesn’t exist. Check if the user exists in the database, and if they don’t, create a new user.
The password is incorrect. To fix this, reset the MySQL password.
Connecting to the wrong host. Double check that the host you’re connecting to is correct.
Error 1064: Syntax error
SQL syntax issues are often to blame for this error, such as reserved keywords, missing data, or mistyped commands. The best way of identifying the syntax issue is by comparing the query with the error message to identify the specific point in the query that raised the error. To troubleshoot this error:
Proofread your code and correct mistyped commands.
If you have to use reserved keywords, place them in backticks like this: `INSERT`.
Replace obsolete commands with current ones.
Add missing data to the database.
Use an automated syntax checker, like EverSQL.
Error 1114: Table is full
Error 1114 occurs when you try to insert data into a table and there isn't enough disk space left to store it. Disk full issues can also occur when creating a backup of a large database on the same disk as the original.
To fix this error, check the partition in which the MySQL server is installed and ensure it’s less than 80% full.
Error 2006: MySQL server connection closed
Error 2006 occurs when the MySQL server has timed out and closed the connection. The wait_timeout value determines how long the server waits before closing a connection due to inactivity. To fix this, check the wait_timeout value (28800 seconds by default) and increase it if it’s too low.
Error 2008: Client ran out of memory
This error message means there’s not enough memory to store the entire query result. To solve this problem, check the specifics of the query. Do you need to return this many results from the database? If not, modify the query to return only the necessary rows.
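The sketch below illustrates two ways to shrink what the client has to hold: selecting only the needed column with a row limit, and reading the result in fixed-size batches. Batching is an additional technique not covered above. SQLite stands in for the database here, and the events table and its columns are hypothetical.

```python
import sqlite3

# SQLite stands in for the database here (an assumption for this sketch);
# the table and column names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [("x" * 100,) for _ in range(1000)],
)

# Select only the column that is needed and cap the number of rows
cursor = conn.execute("SELECT id FROM events ORDER BY id LIMIT 250")

# Process the result in fixed-size batches rather than loading it all at once
total = 0
while True:
    batch = cursor.fetchmany(100)
    if not batch:
        break
    total += len(batch)

print(total)
```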
Error 2013: Lost connection during query
Error 2013 occurs when the connection drops between the MySQL client and the database server, usually because the database took too long to respond.
To fix the error, first check that your internet connection is stable, since delayed results may be due to network connectivity issues. Also, try increasing the net_read_timeout value to give the query more time to complete.
Non-coded errors
In addition to coded errors, there are several common non-coded MySQL errors that you might encounter.
Packet too large
The maximum possible size of a packet transmitted to or from a MySQL 8.0 server or client is 1GB. The max_allowed_packet variable stores the allowable packet size.
The default max_allowed_packet value is 16MB for the client and 64MB for the server. To fix this error, increase the max_allowed_packet value for the client and the server.
For example, increase the max_allowed_packet for the client to 32MB:
mysql --max_allowed_packet=32M

Note that if you change the server's value in its configuration file, MySQL needs to restart for the change to take effect.
Can’t create/write file
You’ll get this error when MySQL can’t create the temporary file for the result in the temporary directory. This might be because there’s no space left in the /tmp folder, or because an incorrect configuration doesn’t allow MySQL to write to the /tmp folder.
To solve the space issue, try starting the MySQL server with the --tmpdir option and specifying a directory for the server to write to. For example, to specify C:/temp using an option file entry:
tmpdir=C:/temp

If the configuration is incorrect, make sure MySQL has permission to write to the directory specified by tmpdir.
Commands out of sync
The commands out of sync error occurs when you call client functions in the wrong order. For example, using mysql_use_result() and then trying to execute a new query before calling mysql_free_result() will raise this error.
To fix this error, check your functions and make sure you are calling them in the correct order.
Hostname is blocked
This error occurs when the MySQL server receives too many interrupted connection requests from a host. The server assumes something is wrong, like someone trying to break in, and blocks the hostname until you execute the mysqladmin flush-hosts command.
The number of interrupted connect requests allowed is determined by the max_connect_errors variable, 100 by default. Modify the value by starting the MySQL server like this:
mysqld_safe --max_connect_errors=10000

Aborted connections
This error occurs when clients attempt and fail to connect to the MySQL server, often due to the client using incorrect credentials or lacking access privileges.
To fix this error, start by checking the error logs and general logs at /var/log/mysql/ to determine the cause of the aborted connections.
Conclusion
Error handling can be exhausting and time-consuming, so it's important to understand how to fix the common MySQL errors. If you're looking for a straightforward, developer-friendly way to run MySQL, try PlanetScale. You can import your application's existing database with no downtime using our Import tool and be up and running in no time.]]></content>
        <summary><![CDATA[An overview of some common MySQL error codes you may run into, what they mean, and how to solve them.]]></summary>
      </entry>
    
      <entry>
        <title>MySQL scaling made easy</title>
        <link href="https://planetscale.com/blog/mysql-scaling-made-easy" />
        <id>https://planetscale.com/blog/mysql-scaling-made-easy</id>
        <published>2023-01-26T13:00:00.000Z</published>
        <updated>2023-01-26T13:00:00.000Z</updated>
        
        <author>
          <name>Jonah Berquist</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[Learn about sharding, connection pooling, and more from PlanetScale Technical Solutions Architect Jonah Berquist.]]></summary>
      </entry>
    
      <entry>
        <title>What is the N+1 Query Problem and How to Solve it?</title>
        <link href="https://planetscale.com/blog/what-is-n-1-query-problem-and-how-to-solve-it" />
        <id>https://planetscale.com/blog/what-is-n-1-query-problem-and-how-to-solve-it</id>
        <published>2023-01-18T08:01:46.798Z</published>
        <updated>2023-01-18T08:01:46.798Z</updated>
        
        <author>
          <name>JD Lien</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Have you ever been working on an app, staring at your screen waiting for it to load, wondering what on Earth is going on? There are a lot of reasons why you could be experiencing performance issues, but a classic cause of performance issues in database-driven applications is the dreaded N+1 query problem.
 If you're wondering if you have an N+1 problem, you can sign up for a PlanetScale account to access our Insights query monitoring dashboard. More information about identifying N+1s with Insights at the end of this article. 
What is the N+1 query problem?
The chief symptom of this problem is that there are many, many queries being performed. Typically, this happens when you structure your code so that you first do a query to get a list of records, then subsequently do another query for each of those records.
You might expect that many small queries would be fast and one large, complex query will be slow. This is rarely the case. In practice, the opposite is true. Each query has to be sent to the database, the database has to perform the query, then it sends the results back to your app. The more queries you perform, the more time it takes to get the results back, with each trip to the database server taking time and resources. In contrast, a single query, even if it's complex, can be optimized by the database server and only requires one trip to the database, which will usually be much faster than many small queries.
An example N+1 query
Let's look at an example. Applications typically query several related records from the same database tables. Let's take an example of grocery items and categories from my previous article on joins. In this example scenario, we have a PlanetScale database with an items table and a categories table. The items table contains a list of grocery store items with their corresponding categories in the categories table. The examples are in PHP, but the same principles apply to any language.
categories table:
id | name
1 | Produce
2 | Deli
3 | Dairy
items table:
id | name | category_id
1 | Apples | 1
2 | Cheese | 2
3 | Bread | NULL
Let's say we want our application to list all of the items, including the name of the category they belong to. One straightforward way we could do this is by first querying a list of categories, and then looping over each of the categories, querying for each category's items.
First query (grabbing the categories):
<?php
    $dbh = new Dbh();
    $conn = $dbh->connect();
    $sql = "SELECT * FROM categories;";
    $stmt = $conn->prepare($sql);
    $stmt->execute();

Second query (looping over each category and grabbing the items):
<?php
// Running total of item rows returned across all of the per-category queries
$rowCount = 0;
while ($row = $stmt->fetch()) {
    // Show category name
    echo $row['name'];

    // Now query for the items for this category
    $sql = "
        SELECT id, name FROM items
        WHERE category_id = :category_id
        ORDER BY name;
    ";

    $stmt2 = $conn->prepare($sql);
    $stmt2->bindParam(':category_id', $row['id']);
    $stmt2->execute();
    $rowCount += $stmt2->rowCount();

    while ($row2 = $stmt2->fetch()) {
        // Show item ID and name
        echo $row2['id'];
        echo $row2['name'];
    }
}

This approach has the benefits of having two simple queries and clear, procedural code. Unfortunately, this approach is flawed, and you should avoid this situation where you are executing many database queries in a loop.
What caused the N+1 query problem?
This type of query execution is often called "N+1 queries" because instead of doing the work in a single query, you are running one query to get the list of categories, then another query for each of the N categories. Hence the term "N+1 queries".
In the above example, our database contains about 800 items across 17 categories. It takes over 1 second to run the 18 simple queries involved in this! That's pretty slow. If you have more complex queries with a lot of data, it will take even longer.
For this simple example, it's possible to perform the exact same job about 10× faster by using only one query that uses a JOIN clause. We could refactor the above code to look something like this:
<?php
    $dbh = new Dbh();
    $conn = $dbh->connect();
    // Record the time before the query is executed
    $timeStart = microtime(true);

    $sql = "
        SELECT
            c.id AS category_id,
            c.name AS category_name,
            i.id AS item_id,
            i.name AS item_name
        FROM categories c
        LEFT JOIN items i ON c.id = i.category_id
        ORDER BY c.name, i.name;
    ";
    $stmt = $conn->prepare($sql);

    $stmt->execute();
    $rowCount = $stmt->rowCount();

    $lastCategoryId = null;

    while ($row = $stmt->fetch()) {
        // Render the heading for each category if this category is new
        if ($row['category_id'] != $lastCategoryId) {
            echo $row['category_name'];
        }

        // Display the row for each item
        if (!is_null($row['item_id'])) {
            echo $row['item_id'];
            echo $row['item_name'];
        }

        $lastCategoryId = $row['category_id'];
    }

With this update, we accomplish the same work with a single, slightly more complicated query. Running the demo again shows a significant performance difference: the page loads in about 0.16 seconds instead of 1.4 seconds.
In this simple example, with a database that isn't very large, the N+1 approach takes about 10 times longer!
Imagine you had thousands, or millions of records. The performance delta could be the difference between a reasonable load time and a page that takes so long to load that it causes a timeout on the server.
Creating data structures for more complicated queries
Sometimes you may have a more complicated operation in mind. Say you wanted to show the categories along with the count of items in each category. You could use an aggregate query (GROUP BY), as shown below:
SELECT c.id, c.name, count(i.id) AS item_count
FROM categories c
LEFT JOIN items i ON c.id = i.category_id
GROUP BY c.id, c.name
ORDER BY c.name;

But then how would we also get the list of items from a query like this where we are grouping?
While it's often most efficient to let the database server do a lot of the heavy lifting instead of our server-side code, for something like a simple count of items, it may not be necessary. If we actually just queried for the items, it's pretty easy to let the server-side code (PHP, in our example) do the count for us!
We can refactor this so that we do the job with a single query, then turn the result into a clean data structure.
<?php
$dbh = new Dbh();
$conn = $dbh->connect();
// Record the time before the query is executed
$timeStart = microtime(true);

$sql = "
    SELECT
        c.id AS category_id,
        c.name AS category_name,
        i.id AS item_id,
        i.name AS item_name
    FROM categories c
    -- Using a normal JOIN would not get the categories with 0 items
    LEFT JOIN items i ON c.id = i.category_id
    ORDER BY c.name, i.name;
";
$stmt = $conn->prepare($sql);

$stmt->execute();
$rowCount = $stmt->rowCount();

$lastCategoryId = null;
$lastCategoryName = null;

// Build a 2D array of categories with their items
$categories = [];
// A categoryItems array will become the value for each category
$categoryItems = [];

// Alternative approach: build a data structure with the data we want as a 2D array.
while ($row = $stmt->fetch()) {
    // When the category changes, store the items collected for the previous category
    if (!is_null($lastCategoryId) && $row['category_id'] != $lastCategoryId) {
        $categories[$lastCategoryName] = $categoryItems;
        // Reset the categoryItems array
        $categoryItems = array();
    }

    // Create an array of all the non-null items
    if (!is_null($row['item_id'])) $categoryItems[$row['item_id']] = $row['item_name'];

    $lastCategoryId = $row['category_id'];
    $lastCategoryName = $row['category_name'];
}
// Add the last category to the array with its items (skipped if the query returned no rows)
if (!is_null($lastCategoryName)) {
    $categories[$lastCategoryName] = $categoryItems;
}

Now that we have this $categories array with arrays of items within, we can use a nested loop to render the data however we see fit. When we want the count of items, we can simply call count($items).
<?php
foreach ($categories as $categoryName => $items) {
    echo $categoryName;
    // Show the count of items in the category
    echo count($items) . ' items';

    if (count($items)) {
        // Loop through all the items in the category and display them
        foreach($items as $itemId => $itemName) {
            echo $itemId;
            echo $itemName;
        }
    }
}

Using techniques like this, you can keep your page load times fast by being efficient with your use of the database. Instead of writing code that runs 1 query plus another for each record that query returns, it is well worth the effort to write 1 query that returns all the data you need.
Using this approach, you can also create data structures that are more useful for your application. For example, you may want to create a data structure that is keyed by the category ID, and then have the items as sub-arrays. This would allow you to easily access the items for a specific category by its ID.
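As a rough sketch of that ID-keyed structure, the following Ruby snippet groups flat JOIN rows by category ID. It is in Ruby rather than this article's PHP, the hard-coded rows stand in for a real result set, and `group_by_category` is a name made up for the example.

```ruby
# Hypothetical rows, shaped like the output of the LEFT JOIN query above.
EXAMPLE_ROWS = [
  { category_id: 1, category_name: "Books", item_id: 10,  item_name: "Atlas" },
  { category_id: 1, category_name: "Books", item_id: 11,  item_name: "Novel" },
  { category_id: 2, category_name: "Games", item_id: 20,  item_name: "Chess" },
  { category_id: 3, category_name: "Toys",  item_id: nil, item_name: nil }
].freeze

# Builds { category_id => { name: ..., items: { item_id => item_name } } }.
def group_by_category(rows)
  rows.each_with_object({}) do |row, categories|
    category = (categories[row[:category_id]] ||= { name: row[:category_name], items: {} })
    # A LEFT JOIN yields nil item columns for categories that have no items.
    category[:items][row[:item_id]] = row[:item_name] unless row[:item_id].nil?
  end
end

categories = group_by_category(EXAMPLE_ROWS)
puts categories[1][:items].size # 2
puts categories[3][:items].size # 0
```

Keying by ID instead of name also avoids collisions when two categories share a name.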
Identifying N+1 queries
If you have a more complex application, you may have a lot of N+1 queries and not know it. There are a few ways to identify these queries and fix them.
If you're working on a Laravel app, you can use Laravel Debug Bar. Laravel also lets you disable lazy loading, the usual source of N+1 queries, by adding the following line to the boot method of your AppServiceProvider:
Model::preventLazyLoading(!app()->isProduction());

This will cause the application to throw an exception whenever a relationship is lazy loaded outside of production, allowing you to detect and fix these issues.
PlanetScale Insights
PlanetScale also offers an analytics and monitoring solution called PlanetScale Insights. This is accessible from your PlanetScale dashboard and allows you to see the queries that are being run on your database. Using this, you can identify many types of issues with your queries, including N+1 queries and long-running queries. The screenshot below is from the demo database we've been using in this article.

The first query is our more complex but efficient JOIN query, which read 834 rows, returned 815 rows, and took a total of 14ms.
The two queries below that are the inefficient queries that caused the N+1 problem. Together, they took 42ms and read 13,889 rows to give us the same results as the more complex query.
Overall, this shows us right away that our N+1 queries:
Ran far too many times
Read far more rows than they returned
Were relatively slow
Now you know how to identify N+1 queries, how to fix them, and how to use PlanetScale Insights to monitor your queries and identify performance issues. Get out there and write some fast, lean code!]]></content>
        <summary><![CDATA[Learn what the N+1 queries problem is by working through an example N+1 query, updating it to a JOIN statement, and going over how to identify N+1 queries in the future.]]></summary>
      </entry>
    
      <entry>
        <title>Support’s notes from the field</title>
        <link href="https://planetscale.com/blog/supports-notes-from-the-field" />
        <id>https://planetscale.com/blog/supports-notes-from-the-field</id>
        <published>2023-01-11T14:45:00.000Z</published>
        <updated>2023-01-11T14:45:00.000Z</updated>
        
        <author>
          <name>Mike Stojan</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Support at PlanetScale
The Support team at PlanetScale is a multi-cultural, multi-national team distributed around the globe, enabling us to support our users in almost any time zone on this planet. Internally, we are organized into two separate groups: Our Customer Engineering team working closely with our Enterprise customers and our Cloud Support team supporting our Base plan customers.
All of us have a technical background. Some of us develop software and contribute to FOSS projects or have been building and maintaining infrastructure at scale for years. Others have worked on large support teams at companies such as GitHub, Google, and Microsoft.
Our combined experience in the industry adds up to a couple hundred years. When you approach us with your database-centric problems, chances are we will be able to help swiftly resolve the issue or at least put you on the right track to get there yourself. And if it's something totally new, well, kudos to you! We are about to learn something new, and we'll be learning together.
If you want to get in touch with us, the easiest way is to open a ticket in our support portal or to start a new conversation in our public discussions board on GitHub. For our Enterprise customers, we also offer phone-based escalations and joint Slack channels where we can collaborate.
For a good overview of the different support plans, support targets, and service-level agreements (SLAs), please review our official documentation and our support portal.
In the rest of this article, we'll take a look at the most common places we see folks hit a wall.
Common issues users run into on PlanetScale
SSL/TLS certificate errors with PlanetScale
SSL/TLS certificate verification errors come in various forms, but what they all have in common is that they prevent you from connecting to your PlanetScale database, so it is immediately noticeable and a hard blocker for some. Most of the time, however, these errors are straightforward to resolve.
To understand what leads to these errors, it is good to have a basic understanding of how SSL/TLS certificates work.
How SSL/TLS certificates work
It all starts with a certificate authority (CA). A CA issues, signs, and stores SSL/TLS certificates. They exist so that a client can connect to a server securely and that network traffic between the two entities gets encrypted. You can set up your own CA and issue your own certificates, but for your certificates to be automatically verified in modern browsers, companies such as ours need to use a certificate that was issued by a common third-party CA.
PlanetScale's SSL/TLS certificate was issued by Let's Encrypt, a nonprofit CA run by the Internet Security Research Group (ISRG), which has issued certificates for more than 300 million websites to date.
When connecting to PlanetScale, your computer validates our SSL/TLS certificate by looking up the issuer of our certificate and comparing it against the trusted root certificate authorities store on your computer. When it finds a match, it checks if our certificate is valid and signed by the CA. If that turns out to be OK, it lets you connect securely.
Most modern operating systems have a root CA store, and your software usually knows where it is:
On macOS, the store is in /etc/ssl/cert.pem
On Linux, it depends on the distribution, but it almost always is either in /etc/ssl or /etc/pki
On Windows, the root CA store is an internal database available via the CryptoAPI
Solving SSL/TLS errors
Most SSL/TLS errors that we see occur because of these trusted CA root certificates not being installed on the computer and, therefore, the software being unable to verify our certificate. On Linux, you can often solve this by installing the ca-certificates package. This package bundles the CA certificates from the Mozilla CA Certificate Program.
Most database drivers know where to find these locally installed CA certificates, but sometimes you may need to point the database driver to it. We have the most common paths for the root CA store listed in our documentation, along with a more thorough explanation of how SSL/TLS works.
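As a rough illustration of pointing a client at the root CA store while keeping verification on, the following Ruby sketch builds a TLS context with the standard `openssl` library. `verified_tls_context` and its `ca_file` parameter are names invented for this example; how you hand equivalent settings to your database driver depends on the driver.

```ruby
require "openssl"

# Builds a TLS context that verifies the server certificate.
# ca_file is optional: pass a path such as /etc/ssl/cert.pem to use a specific
# bundle, or omit it to use the system's default root CA locations.
def verified_tls_context(ca_file: nil)
  store = OpenSSL::X509::Store.new
  if ca_file
    store.add_file(ca_file)
  else
    store.set_default_paths
  end

  context = OpenSSL::SSL::SSLContext.new
  context.cert_store = store
  # Never switch this to VERIFY_NONE: that disables certificate verification.
  context.verify_mode = OpenSSL::SSL::VERIFY_PEER
  context
end

context = verified_tls_context
puts context.verify_mode == OpenSSL::SSL::VERIFY_PEER
```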
Other SSL/TLS-related errors can occur due to libssl being too old, or other libraries in use which have not been compiled with SSL support or that simply do not offer any SSL/TLS support at all. If you're interested to learn more about such a scenario, we had an interesting case with Debian Buster in our public discussions board a few weeks ago.
In any case, things that you should NOT be doing:
Disable SSL/TLS certificate verification
Issue and sign your own SSL certificate
Disabling certificate verification will make your connection susceptible to man-in-the-middle attacks. This is a cyberattack where an attacker secretly relays and possibly alters the communication between two parties. In order for your communication with our servers to be secure, you must not disable it!
We also sometimes see users creating their own CA and SSL certificates and then sending us their certificates as well as the matching private key. You will never ever need to do that when using PlanetScale! It's an easy trap to fall into if you don't know enough about SSL/TLS and, from experience, it can get overwhelming and confusing quickly.
Whatever the nature of your SSL/TLS issue is, we will be happy to help get to the bottom of it if you open a ticket with us or if you open an issue in our discussions board.
Integrating third-party platforms
Sometimes, you need to integrate third-party platforms such as Google's Data Studio or Retool, but when your preferred platform does not support the common SSL/TLS certificate authorities' root certificates, you can run into trouble establishing a connection to your PlanetScale database.
When you enable SSL/TLS in your preferred tool, you sometimes stumble upon fields such as CA Cert, Client Key, or Client Cert, which normally only exist for the user to add their organization's self-signed SSL/TLS certificate. Remember, that's the case where you act as your own certificate authority and issue your own SSL/TLS certificates, which is not needed in the context of PlanetScale.
If you leave these fields blank and hope for the best, you normally encounter an error message such as this one:
Unable to get local issuer's certificate.
We can make use of these fields, though, to work around the issue. When there is a field where you can upload a CA Cert, you can upload Let's Encrypt's CA root certificate or, if it's a text field, paste it in. It is a regular text file after all.

The root certificate for Let's Encrypt can be downloaded from their website. It is called ISRG Root X1, and you will need to download it in .pem format.
For your convenience, below you will find Let's Encrypt's current root certificate:
-----BEGIN CERTIFICATE-----
MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
-----END CERTIFICATE-----

Please note that if we ever change our SSL provider or update aspects of our SSL/TLS certificate as part of its regular renewal, you would need to update the certificate you have uploaded as well. It's not something that happens often and we do not have plans to change it anytime soon, but if you see a third-party platform having trouble connecting after a few months, make sure to check if our SSL/TLS certificate has changed and update when necessary.
If you think, after having read through all that SSL/TLS-related information, there's got to be an easier way, then I have got something for you!
PlanetScale supports data integration engines such as Airbyte and Stitch. And for measuring usage and performance metrics, we also have a Datadog integration.
If you're having trouble connecting your preferred third-party tool, let us know either by creating a ticket or opening an issue in our discussions board, and we'll look into it together.
Data imports
When ramping up on PlanetScale, you will eventually get to the point where you want to import your data, and sometimes, this turns out to be more complex than initially thought. Some features, such as stored procedures, are not supported; the complete MySQL/Vitess compatibility list is in the documentation. Hardware can also be a limiting factor.
So that there is no misunderstanding: We take great pride in our ability to scale along with your business, and we will do our absolute best so that you will never have to worry about your database again, but sometimes all it takes to push a database cluster to its limits is a simple but effective:
cat backup.sql | pscale shell

It is easy to quickly overwhelm your database with heavy IO operations if you're not careful. A huge influx in concurrent writes to your database will lead to a degraded service, and while our database clusters heal themselves automatically, it's obviously not a great first impression.
This is why we have built an importer.
Our importer uses MySQL's proven, reliable replication system under the hood. If you go through the import process, your PlanetScale database registers as a replica, making the data copying process trivial and giving you full control over when to make the cut-over.
You can point your application to your PlanetScale database once it's been registered as a replica. We will route any writes back to the primary until the point where you have elevated PlanetScale to become the primary itself. This gives you the ability to test performance and make any necessary updates before you fully switch over to us, and it is often overlooked.
One important caveat is that schema changes do not get replicated between databases in either direction. Please make sure not to execute any DDL (Data Definition Language) operations such as CREATE, DROP, ALTER or TRUNCATE while the import is ongoing.
Sometimes, we also see users having issues with their MySQL server's configuration. You will need to be able to change your MySQL server's gtid_mode, its binlog_format, as well as its expiration times (either expire_logs_days or binlog_expire_logs_seconds or both), otherwise you will not be able to use our importer.
If you cannot use our importer, there are other tools you could use, such as mysqldump or MySQL Workbench. Another great option is aquarapid/go-mydumper, which is based on the excellent gomydumper utility and was adapted to work with Vitess by my colleague Jacques.
Whatever you end up choosing, if the tool gives you control over concurrency, a good ground rule is to start with a low concurrency rate and to be conservative with increases. Otherwise, depending on the size of your backup, you may end up in the situation described earlier in this section.
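To make the idea of a conservative concurrency setting concrete, here is a minimal Ruby sketch of a bounded worker pool. `import_in_chunks` is a hypothetical helper, and the block you pass in stands in for whatever actually writes a chunk to the database; none of this is part of a PlanetScale tool.

```ruby
# Runs the given block once per chunk, using at most `concurrency` threads.
# Start low (for example 2) and raise it only while the database stays healthy.
def import_in_chunks(chunks, concurrency: 2, &write_chunk)
  queue = Queue.new
  chunks.each { |chunk| queue << chunk }
  queue.close # a closed, drained queue makes pop return nil, ending each worker

  workers = Array.new(concurrency) do
    Thread.new do
      while (chunk = queue.pop)
        write_chunk.call(chunk)
      end
    end
  end
  workers.each(&:join)
end

written = []
lock = Mutex.new
import_in_chunks([[1, 2], [3, 4], [5]], concurrency: 2) do |chunk|
  # A real import would execute an INSERT for the chunk here.
  lock.synchronize { written.concat(chunk) }
end
puts written.sort.inspect # [1, 2, 3, 4, 5]
```

Because the pool size is a single parameter, raising concurrency is a one-line change you can make gradually.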
Last but not least, we sometimes also see our users wanting to start over with the import and then running into the following error returned by our connection test:
Error checking server configuration: found existing Vitess state on the external database

This happens because to be able to import your database, we create a temporary database named _vt on your current primary. We clean it up automatically when finishing up the import, but if you decide to start over, we cannot automatically detect that. In that case, you will need to drop the _vt database from the current primary first before you can reattempt the import.
Long story short, we know that importing large amounts of data can be daunting and it can be especially frustrating if there's a break somewhere in the process. If you're planning to import terabytes of data, chances are we are already working closely with you. In any other case, if you're running into trouble with your import, please open a ticket with us or open an issue in our discussions board, and we're happy to get involved.
Frequently hit timeouts
One last common issue you may run into is our configured hard timeouts, mainly our 20-second transaction timeout and our 900-second query timeout. These are deliberately set timeouts that exist for performance reasons and to encourage good application design. We are looking into ways to lift or at least extend these, but for the time being, these need to be considered hard timeouts.
When you hit our 20-second transaction timeout, you will usually see the following error message:
vttablet: rpc error: code = Aborted desc = transaction <transaction>: in use: in use: for tx killer rollback (CallerID: planetscale-admin)

When you hit the 900-second query timeout instead, you will see an error message similar to this one:
target: example-db.-.primary: vttablet: rpc error: code = Canceled desc = (errno 2013) due to context deadline exceeded, elapsed time: 15m0.002989349s, killing query ID 65535 (CallerID: <id>)

These timeouts can be reached with complex transactions or queries, but most of the time, it's rather the user's application keeping the transaction open while handling other tasks such as data manipulation or sorting instead of closing the transaction first. Loops such as while <expression> or until <expression>, or for loops are particularly susceptible to that.
There is a workaround to lift our configured timeouts and that is to change the workload mode from OLTP (Online transactional processing) to OLAP (Online analytical processing).
We generally recommend against using it, though, as it can cause rather drastic side effects such as a workload consuming all available resources or blocking other important, short-lived queries or transactions from completing, or overloading a database up to a point where it goes into an unrecoverable state and where manual intervention is needed. It can also block planned failovers or critical updates and will make it easier to hit other intentional limits or timeouts dictated by MySQL.
Again, we do not recommend changing your database's workload mode. There is almost always a better solution.
However, if you still want to try this out, you can switch to OLAP by issuing a set workload='olap'; on a per-session basis, meaning you would have to directly execute it before running the affected transaction. The workload cannot be changed globally, and it will reset to OLTP after you have closed the session.
The best long-term solution is still to optimize your database's schema and your application's transactions and control structures so that its workloads fit into the 20-second time window. For simple workloads, consider using optimistic locking instead of transactions, and for more complex workloads, consider adopting Sagas. For large ETL workloads, we support data integration engines such as Airbyte and Stitch, with which you can offload these processes to other platforms that are more specialized in this field.
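For readers unfamiliar with optimistic locking, the following Ruby sketch shows the core idea: read a row along with its version, then write only if the version is unchanged. The `Accounts` class is an in-memory stand-in invented for this example, and the `version` column is an assumption about your schema.

```ruby
# In-memory stand-in for a table that has a `version` column.
class Accounts
  Row = Struct.new(:balance, :version)

  def initialize
    @rows = {}
  end

  def insert(id, balance)
    @rows[id] = Row.new(balance, 0)
  end

  # Returns a snapshot, like a plain SELECT outside any transaction.
  def find(id)
    @rows.fetch(id).dup
  end

  # Mirrors: UPDATE accounts SET balance = ?, version = version + 1
  #          WHERE id = ? AND version = ?
  # Returns the number of affected rows (0 means someone else wrote first).
  def update_if_version(id, balance, expected_version)
    row = @rows.fetch(id)
    return 0 unless row.version == expected_version

    row.balance = balance
    row.version += 1
    1
  end
end

# Retries the read-modify-write cycle instead of holding a transaction open.
def deposit(accounts, id, amount, max_attempts: 3)
  max_attempts.times do
    row = accounts.find(id)
    return true if accounts.update_if_version(id, row.balance + amount, row.version) == 1
  end
  false
end

accounts = Accounts.new
accounts.insert(1, 100)
deposit(accounts, 1, 50)
puts accounts.find(1).balance # 150
```

No lock is held between the read and the write, so slow application work in between cannot run into the transaction timeout.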
To help you with optimizing your queries and transactions, PlanetScale provides you with additional tools such as Insights. And, if you need a hand with any of this, please open a ticket with us or open an issue in our discussions board.]]></content>
        <summary><![CDATA[A quick glimpse of Support at PlanetScale and the issues we see most often.]]></summary>
      </entry>
    
      <entry>
        <title>Solving N+1’s with Rails `exists?` queries</title>
        <link href="https://planetscale.com/blog/rails-n1-exists" />
        <id>https://planetscale.com/blog/rails-n1-exists</id>
        <published>2023-01-10T17:30:00.000Z</published>
        <updated>2023-01-10T17:30:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We recently had a performance issue in our Rails application where we had several N+1 .exists? queries on a single API endpoint. In the following query, we check to see if the "data_imports" feature is enabled for a user:user.beta_feature.where(name: "data_imports").enabled.exists?

Output:BetaFeature Exists? (0.6ms)  SELECT 1 AS one FROM `beta_feature` WHERE `beta_feature`.`name` = 'data_imports' AND `beta_feature`.`target_type` = 'User' AND `beta_feature`.`target_id` = 1 AND `beta_feature`.`enabled_at` IS NOT NULL LIMIT 1

This pattern initially worked well, but became an issue as we started to add more beta features. Each new beta feature added an additional query, and it started to impact the speed of the API endpoint.
Generally, in Rails applications, you can solve N+1's by using includes to preload the data. This, however, doesn't work with exists? queries. Rails will still execute the query.
Solving the N+1
Let's look at how we went about solving the N+1 problem for our Rails exists? query. On our user model, we originally had this method for checking beta features:
def beta_feature_enabled?(name)
  beta_features.where(name: name).enabled.exists?
end

This will always execute a query whether or not beta_features is already loaded.
One way to avoid the query is by preloading all the records and then checking them in memory rather than executing a query.
# New method, allows us to preload beta_features
def beta_feature_enabled?(name)
  if beta_features.loaded?
    beta_features.any? { |f| f.name == name.to_s && f.enabled? }
  else
    beta_features.where(name: name.to_s).enabled.exists?
  end
end

Now, in our controllers, if we preload beta_features using includes, it will already be loaded. Any calls to beta_feature_enabled? won't execute additional queries.
# Executes 2 queries
@users = User.all.includes(:beta_features)

You can also use this technique for reducing queries when loading a single record.
@user = User.find(params[:id])
@user.beta_features.load # preload all beta features for the user
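To see why the loaded? check removes the extra queries, here is a plain-Ruby sketch that runs without Rails. `FakeAssociation` and `Feature` are hypothetical stand-ins that only count queries; they imitate the shape of the pattern, not ActiveRecord's actual implementation.

```ruby
Feature = Struct.new(:name, :enabled)

# Hypothetical stand-in for an association that counts the queries it issues.
class FakeAssociation
  attr_reader :queries

  def initialize(records)
    @records = records
    @loaded = false
    @queries = 0
  end

  def loaded?
    @loaded
  end

  # Stands in for loading every record with one query.
  def load
    @queries += 1
    @loaded = true
    self
  end

  # In-memory check; issues no query.
  def any?(&block)
    @records.any?(&block)
  end

  # Stands in for .where(name: ...).enabled.exists?, which always queries.
  def enabled_exists?(name)
    @queries += 1
    @records.any? { |feature| feature.name == name && feature.enabled }
  end
end

def feature_enabled?(association, name)
  if association.loaded?
    association.any? { |feature| feature.name == name.to_s && feature.enabled }
  else
    association.enabled_exists?(name.to_s)
  end
end

flags = [Feature.new("data_imports", true), Feature.new("insights", false)]

lazy = FakeAssociation.new(flags)
%i[data_imports insights dark_mode].each { |name| feature_enabled?(lazy, name) }
puts lazy.queries # 3

preloaded = FakeAssociation.new(flags).load
%i[data_imports insights dark_mode].each { |name| feature_enabled?(preloaded, name) }
puts preloaded.queries # 1
```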

Preloading with a scope
With the above solution, we're loading every beta_feature for every user. For our use case, this is what we want.
However, this can load unnecessary records if your application only checks a few of them.
If that's you, here's a solution. You can set up a new association that only loads the records you need:
has_many :beta_features, as: :target, dependent: :destroy_async
PRELOADED_FLAGS = %w[dark_mode insights data_imports]
has_many :preloaded_beta_features, -> { where(name: PRELOADED_FLAGS) }, as: :target, class_name: "BetaFeature"

You can now replace beta_features with preloaded_beta_features to load in only the records you need.]]></content>
        <summary><![CDATA[Learn how to solve your Rails application’s N+1’s caused by `exists?` queries.]]></summary>
      </entry>
    
      <entry>
        <title>Faster MySQL with HTTP/3</title>
        <link href="https://planetscale.com/blog/faster-mysql-with-http3" />
        <id>https://planetscale.com/blog/faster-mysql-with-http3</id>
        <published>2023-01-04T14:45:00.000Z</published>
        <updated>2023-01-04T14:45:00.000Z</updated>
        
        <author>
          <name>Matt Robenolt</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Over here at PlanetScale, we offer you a MySQL database. As a part of this offering, it is critical that we offer you a MySQL protocol-compatible interface to access. This enables using mysql-client as well as any MySQL-compatible driver for your favorite language.
But what if we weren’t constrained by this? Could we provide an alternative interface and API?
Most of what I will be discussing is not publicly documented and is entirely experimental.
The background
As a part of some of our infrastructure initiatives, we demanded new APIs and connectivity features for our database. To support features that weren’t available over the MySQL protocol, we decided to start bolting on a publicly accessible HTTP API. This API is not documented for public consumption just yet (it will be, I promise), but it is gRPC compatible.
This HTTP interface led to the development of our Serverless driver for JavaScript and PlanetScale Connect.
In serverless compute contexts, your code is fundamentally not able to open up a TCP socket and speak the MySQL binary protocol to us. The platforms require communication through HTTP(S), so this ended up being a nice fit.
Having this API now opens the door to my question:
Can HTTP be faster than the MySQL protocol?
Our new APIs aren’t just gRPC. Specifically, on our end, we use connect-go, which is gRPC-compatible and gives us a bunch of other features. One of these features is the ability to potentially use HTTP/3 as a transport. HTTP/3, to me, started making things very interesting. If you’re not familiar with HTTP/3, I suggest taking a detour and doing a bit of research, then come back. But the gist is that HTTP/3 is built on top of UDP rather than TCP, using a different transport called QUIC.
My theory was that our new API would start to yield tangible benefits in most scenarios when compared to a traditional MySQL client. The results were pretty surprising, even to me!
The setup
The experiments here are confined to Go. I developed a proof of concept database/sql-compatible driver that speaks to our HTTP API using protobuf for the encoding and Snappy for the compression. I then compared this with the standard MySQL driver.
For HTTP/2, we use the stdlib Go HTTP client, and for HTTP/3, we use the experimental quic-go library.
Our PlanetScale database is provisioned in us-west, which is AWS region us-west-2.
I tested in a bunch of scenarios, but the two that we’re going to highlight here are:
High latency, low bandwidth, geographically far from my database. This is from my personal laptop, which is in Reno, NV.
Low latency, high bandwidth, geographically close. This is from an EC2 instance in AWS us-west-2.
We chose these environments to attempt to prove when, where, and if using HTTP, and even HTTP/3, becomes beneficial over the MySQL protocol.
Running the tests
From each environment, I run six different tests with three different clients:
MySQL binary protocol client
an HTTP client speaking HTTP/2 (psdb)
an HTTP client speaking HTTP/3 (psdb + h3)
In the following graphs, psdb refers to the PlanetScale database.
We chose these because we wanted to see, fundamentally, if HTTP can compare to MySQL and if HTTP/3 yields any tangible benefits on top of HTTP/2. We’re ignoring HTTP/1.1 since it’s going to be objectively worse than both HTTP/2 and HTTP/3.
Each client runs the following tests:
Connect + SELECT 1. This is attempting to test a "cold start". It establishes a connection to us and runs a SELECT 1, using a new connection each run serially.
Parallel SELECT 1. This test simply warms up a connection pool ahead of time, then runs SELECT 1 in parallel.
Medium SELECT. This test reads 250 rows from a table with 2 columns, with a SELECT * FROM medium. The total result size is approximately 50 KB.
Medium INSERT. This test is doing the inverse and writing the same dataset in a bulk INSERT INTO medium (...) VALUES (...).
Large SELECT. This test reads from a much larger table, 10,000 rows with 11 columns. Total result size is approximately 27.5 MB.
Large INSERT. Similarly, this is the inverse of the select, but instead, inserting 2000 rows each time.
These tests were chosen to test a decent spread of results without trying to actually benchmark PlanetScale and the underlying mysqld processes we use.
The results
The raw data and all of the graphs are contained in this Google Sheet.
Connect + SELECT 1
 
This test result genuinely surprised me in both scenarios. Results significantly favor HTTP over MySQL, with marginal improvements between HTTP/3 and HTTP/2.
A few highlights:
From my laptop, I expected a major improvement and got one: the min went from 162ms to 35-ish ms over HTTP, and the max stays steady for HTTP while jumping up quite a bit for MySQL.
I suspect the biggest win here comes down to TLS. Both HTTP/2 and HTTP/3 in this case are using TLS 1.3, which completes a full handshake in one round trip and also supports 0-RTT resumption. In theory, MySQL clients could also support TLS 1.3, but TLS support in clients is typically not great, and in this case the connection negotiated TLS 1.2, which costs an extra full round trip when establishing a new connection. An HTTP/2 client could also end up on TLS 1.2, though support for TLS 1.3 is much more plentiful there. HTTP/3 requires TLS 1.3.
What surprised me is that I expected this to show up only on higher latency networks and over longer geographic distances, but it was also better on our EC2 instance in us-west-2: from 11ms down to 3-4ms.
Overall, it’s very clear that HTTP, both HTTP/2 and HTTP/3, is substantially better for a cold start.
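To make the round-trip argument concrete, here is a rough back-of-the-envelope sketch in Python. The round-trip counts are the textbook values for full handshakes, and the 40 ms RTT is an assumed figure rather than a measurement from these tests. The sketch also ignores the MySQL protocol's own greeting and authentication exchange, session resumption, and server processing time, so treat it as an illustration of the trend only.

```python
# Rough model of cold-start latency: the number of round trips needed before
# the first query result arrives, multiplied by the round-trip time (RTT).
# The counts are textbook values for full handshakes, not measurements.

HANDSHAKE_ROUND_TRIPS = {
    "TCP + TLS 1.2": 1 + 2,  # TCP handshake, then a 2-RTT TLS 1.2 handshake
    "TCP + TLS 1.3": 1 + 1,  # TCP handshake, then a 1-RTT TLS 1.3 handshake
    "QUIC (HTTP/3)": 1,      # transport and TLS 1.3 handshakes are combined
}

QUERY_ROUND_TRIPS = 1  # one request/response for the query itself


def cold_start_ms(transport: str, rtt_ms: float) -> float:
    """Estimate time to first result on a brand-new connection."""
    return (HANDSHAKE_ROUND_TRIPS[transport] + QUERY_ROUND_TRIPS) * rtt_ms


if __name__ == "__main__":
    assumed_rtt_ms = 40.0  # hypothetical RTT, chosen only for illustration
    for transport in HANDSHAKE_ROUND_TRIPS:
        print(f"{transport}: ~{cold_start_ms(transport, assumed_rtt_ms):.0f} ms")
```

Each round trip saved is worth one full RTT, which is why the gap grows with distance but still shows up on a low latency network.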
Parallel SELECT 1

While these results all look relatively similar, to me that’s a good thing. We can see some improvements with higher latency, but the differences over low latency aren’t statistically significant. What this does show is that we aren’t adding any measurable overhead to the connection in the fastest scenarios. We’d expect the weight of the protocol itself to overpower the actual data being transferred.
Medium SELECT
 
At this test size, the tail-end latency starts to improve while the average stays relatively consistent across the board.
My hypothesis is that the dataset isn’t large enough for compression to start to make an impact, and the reliability of the transport protocol and network are what make up the upper percentiles.
On the low latency network, we started to bottleneck on the underlying mysqld, which is also a good result since, again, it indicates that there’s no tangible overhead in using HTTP.
Medium INSERT
 
Unlike the SELECT case, this is an opportunity for our HTTP API to use client-side compression, which we cannot do with the MySQL protocol. The effect is most dramatic in the high latency network case, since we are uploading a decent amount of data per query.
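As a small illustration of why compressing the request body helps bulk writes, the sketch below builds a bulk INSERT statement and compresses it. The compression scheme the HTTP API actually uses isn't stated here, so gzip is an assumption, and the `id` and `email` columns are hypothetical.

```python
import gzip


def build_bulk_insert(row_count: int) -> bytes:
    """Build a bulk INSERT statement for a hypothetical two-column table."""
    rows = ", ".join(
        f"({i}, 'customer-{i}@example.com')" for i in range(row_count)
    )
    return f"INSERT INTO medium (id, email) VALUES {rows}".encode("utf-8")


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(gzip.compress(payload)) / len(payload)


if __name__ == "__main__":
    payload = build_bulk_insert(250)
    compressed = gzip.compress(payload)
    print(f"raw: {len(payload)} bytes, gzip: {len(compressed)} bytes")
    print(f"ratio: {compression_ratio(payload):.2f}")
```

SQL text is highly repetitive, so it tends to compress well. Fewer bytes on the wire matters most when bandwidth is low and every extra packet costs a long round trip.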
Large SELECT
 
Large INSERT
 
On the extremely large queries, both INSERT and SELECT have some interesting characteristics. As predicted, these excel over high latency networks. On top of HTTP/2 bringing some minor improvements, HTTP/3 starts to pull a big lead. I suspect this is because, with this size of a payload, we’re more likely to run into packet loss and other warts of TCP, which QUIC smooths over.
I also suspect these are a bit skewed here with bottlenecking on mysqld and the underlying disks. This might be worth revisiting again with a more capable backend so we’re not as limited.
The results that stand out, oddly, are that on a low latency network, the HTTP/3 variants are measurably slower than HTTP/2. I haven’t dug into this, but my hypothesis is that this is a performance limitation in the underlying quic-go implementation due to it being a bit more immature and less battle-tested. At these larger payloads, we might be beginning to stress the underlying QUIC implementation as well as the underlying mysqld and hardware. All around, I think this test is pushing limits elsewhere and isn’t fully testing our protocol. But I still think these are valid conclusions that both show what we can work on to improve and see how the protocol handles the stresses.
Summary
These results are rather interesting and prove a few things:
An HTTP API is actually really good. In most tests, any version of HTTP was superior to the binary MySQL protocol. The higher the latency and the less reliable your network, the greater the benefits. In best-case scenarios, the new APIs aren’t measurably slower, which is about the best we can ask for. In the larger payloads, though, HTTP still stands out as a winner due to the ability to compress the data over the wire.
Cold starting is where the improvements really shine without a doubt, which is super critical for anything that isn’t backed by a long-running process, such as serverless, runtimes like PHP, and periodic jobs. This is a bit amplified since the HTTP API multiplexes many traditional MySQL connections over a single HTTP connection, reducing the need to open many connections and maintain a connection pool.
HTTP/3 is even more interesting. While the benefits of using even HTTP/2 are the biggest tangible improvements, I think there are some additional benefits HTTP/3 yields that weren’t fully tested. HTTP/3 being over UDP solves a few issues with TCP when it comes to unreliable networks, which I didn’t explicitly test for. HTTP/3 support is also rather rare and immature, so I think fundamentally, there’s a lot of improvement that can be squeezed out of the libraries being used here. Overall, I think HTTP/3 is an objective improvement in the scenarios where HTTP/2 does worst, and performs similarly in the best case. Comparing both HTTP/2 and HTTP/3 to MySQL, though, it’s pretty clear that both have the potential to be very competitive.
As of right now, we support HTTP/3! If you use the in-product PlanetScale web console and a modern browser, you will connect over HTTP/3 and not even be aware of it. Unfortunately, HTTP/3 lacks a lot of adoption outside of web browsers due to being radically different, but we hope that we may inspire and help drive adoption where possible. We’ll continue to figure out how we may best leverage HTTP/3 transparently wherever we can.
While this is just one experiment focusing on latency and comparing the protocols, there are other benefits that aren’t discussed here that come with an HTTP API that I will be talking about when we start to publicly document these in the coming months.]]></content>
        <summary><![CDATA[In this article we explore how our HTTP/3 API compares to the latency of a traditional MySQL client.]]></summary>
      </entry>
    
      <entry>
        <title>What is a query planner?</title>
        <link href="https://planetscale.com/blog/what-is-a-query-planner" />
        <id>https://planetscale.com/blog/what-is-a-query-planner</id>
        <published>2022-12-15T14:45:00.000Z</published>
        <updated>2022-12-15T14:45:00.000Z</updated>
        
        <author>
          <name>Andres Taylor</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Anyone that has worked with large databases can testify how slow queries can get. This is often due to the necessary indexes not being there, or something in the query that stops the database system from using the index. Choosing the right indexes to use, and the right order to fetch data in, proves to be the difference between a 10ms and 5s query.
Choosing the indexes and join order is called query planning. The output of this process is a query plan that tells the database system how to answer a query from a user. For simple queries with a single table, it’s often trivial to find the optimal query plan. But for large queries with lots of tables and lots of indexes, the available options can quickly run into the thousands and even millions of alternatives. Most of these alternatives are really slow, so the planner's job is to find the best possible query plan among all possibilities.
How query planning works
Most people are more familiar with compilers than with query planners, so I thought I should compare the work of a query planner with the work of a compiler.
A compiler is a program that takes source code written in a programming language and translates it into machine code that can be executed by a computer's processor. A query planner does something similar. The input is code written in SQL (or some other database query language), and the output is a query plan that describes which indexes will be used, and in which order to access tables.
The typical phases of a compiler/planner are: lexing and parsing, semantic analysis, optimization, and code generation. Let’s look at each of these individually to understand the similarities and the differences between a compiler and a query planner.
Lexing and parsing
In the first phase, lexing and parsing, the source code is analyzed and divided into a sequence of tokens, which are basic units such as keywords, operators, and identifiers. The sequence of tokens generated by the lexical analyzer is analyzed and checked for correctness according to the rules of the programming language. This phase typically involves building a syntax tree, which is a hierarchical representation of the structure of the source code. The output of this step is an abstract syntax tree (AST). There is no interesting difference between a compiler and a planner here.
As an example, let’s look at the following query:
SELECT name, avg(salary) FROM employees JOIN salary_info ON id = empid

The AST would look something like this:

It’s the same query, but instead of a string, it’s now this tree data structure. All the unnecessary parts have been stripped away: the planner doesn’t care if the user wrote “SELECT” or “select”, or how much whitespace the query contains.
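The lexing half of this phase can be sketched in a few lines of Python. This is a toy tokenizer for a tiny subset of SQL, not how any real planner is implemented, and the token names and keyword list are assumptions made for the example. It stops at tokens and does not build the tree.

```python
import re

# Token patterns, tried in order. A toy grammar, far smaller than real SQL.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("SYMBOL", r"[(),=*.]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)
KEYWORDS = {"SELECT", "FROM", "JOIN", "ON", "WHERE", "AND", "OR"}


def tokenize(sql: str) -> list[tuple[str, str]]:
    """Split SQL text into (kind, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace carries no meaning for the planner
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind, text = "KEYWORD", text.upper()  # normalize keyword case
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    query = "SELECT name, avg(salary) FROM employees JOIN salary_info ON id = empid"
    for token in tokenize(query):
        print(token)
```

Because keywords are normalized and whitespace is dropped, `select  name` and `SELECT name` produce the same token stream, which is the sense in which the later phases never see those differences.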
Semantic analysis
The semantic analysis phase of compilation is where the compiler checks for semantic errors in the input source code. Semantic errors are errors that are not detected during the lexical analysis or syntax analysis phases, but which can only be detected by analyzing the meaning of the source code.
During semantic analysis, the compiler performs a variety of checks to ensure that the source code is semantically correct. For example, the compiler may check for type mismatches, in which a value of one type is used in a context where a value of a different type is expected. The compiler may also check for undefined variables or other entities, such as functions or classes, and may perform additional checks and transformations on the syntax tree generated during syntax analysis.
A query planner does almost exactly the same thing here. Instead of searching for classes and methods, it would bind to tables and columns, but the idea is the same.
After semantic analysis, the data structures representing the query will be enriched with information such as which table a column comes from, what types the columns and expressions in the query have, etc.
Optimization
During the optimization phase, the compiler takes all the information gathered during parsing and semantic analysis and iteratively rewrites its input into a more optimal form. This is often done using an intermediate representation. Instead of staying in a shape that is close to the input language, the intermediate representation is custom made to make optimizations easier and faster to do.
In this step, the query planner uses a variety of algorithms and techniques to determine the most efficient way to execute the query, considering factors such as the available indexes, the data distribution, and the overall structure of the database. This may involve selecting the most efficient algorithms for operations such as joins and sorting, and choosing the most appropriate indexes to use. It usually also does some of the optimizations that a compiler would perform, such as constant folding. These types of optimizations are about rewriting the input into an equivalent form that is easier for the planner to optimize.
An example of this is how the Vitess planner massages predicates into a shape that can be solved using an index. Given a predicate such as:
WHERE (id = 5 AND name = 'Toto') OR (id = 5 AND name = 'Mumin')

The OR in the middle here makes it hard for the planner to use an index on id to find the correct row. The optimizer will rewrite the predicate into something that is easier to optimize but still means the same thing:
WHERE id = 5 AND (name = 'Toto' OR name = 'Mumin')
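The rewrite above amounts to factoring out the conditions that every branch of the OR has in common. Here is a minimal Python sketch of that idea. It is not the Vitess implementation: it treats conditions as opaque strings and only handles an OR of ANDs.

```python
def factor_common_conjuncts(disjuncts):
    """Rewrite (A AND B) OR (A AND C) as A AND (B OR C).

    `disjuncts` is a list of sets. Each set holds the conditions that are
    ANDed together inside one branch of the OR. Returns a pair:
    (conditions common to every branch, what remains of each branch).
    """
    common = set.intersection(*map(set, disjuncts))
    remainders = [set(branch) - common for branch in disjuncts]
    return common, remainders


def render(common, remainders):
    """Render the factored predicate back into SQL-like text."""
    branches = []
    for rest in remainders:
        text = " AND ".join(sorted(rest))
        branches.append(f"({text})" if len(rest) > 1 else text)
    parts = sorted(common)
    # If any branch became empty, the OR is always true once `common` holds,
    # so only the common conditions need to be kept.
    if all(branches):
        parts.append("(" + " OR ".join(branches) + ")")
    return " AND ".join(parts)


if __name__ == "__main__":
    predicate = [
        {"id = 5", "name = 'Toto'"},
        {"id = 5", "name = 'Mumin'"},
    ]
    print(render(*factor_common_conjuncts(predicate)))
```

Once `id = 5` sits at the top level of the predicate, an index on id can narrow the search before the remaining OR is evaluated.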

Let us pause here and talk about why the order of table access is so important. Say we want to join three tables: A with B, and B with C. We could start by joining A with B, and take the output of that and join it with C. Or we can start from the other side — join B with C and then join that result with A. The intermediate state needed is where the big difference comes in. If AxB is very large, joining that with C will be very slow, compared to if we start with BxC that happens to be pretty small. It’s a path finding problem.
Here is a diagram of the tables used in the TPC-H query #8. The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. It’s a well known dataset used to test the strength of database systems, in particular the query planner.

Source: https://www.sqlite.org/queryplanner-ng.html
All the tables need to be visited, and the connections between tables have different costs. The planner can start at any (*) node. What is the path that touches all tables with the least cost? This is why the join order is so important.
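To see how much the join order can matter, here is a small Python sketch that brute-forces every left-deep join order for the three-table A, B, C example and scores each one by the size of its intermediate results. The row counts and selectivities are invented for illustration, and real planners use far richer cost models and avoid enumerating every permutation.

```python
from itertools import permutations

# Hypothetical row counts and join selectivities, invented for illustration.
ROWS = {"A": 1_000_000, "B": 10_000, "C": 100}
SELECTIVITY = {
    frozenset({"A", "B"}): 0.001,   # fraction of A x B pairs that match
    frozenset({"B", "C"}): 0.0001,  # fraction of B x C pairs that match
}


def plan_cost(order):
    """Sum the sizes of every intermediate result for a left-deep join order."""
    joined = [order[0]]
    size = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        size *= ROWS[table]
        for other in joined:
            # Tables with no join predicate between them form a cross product.
            size *= SELECTIVITY.get(frozenset({table, other}), 1.0)
        joined.append(table)
        cost += size
    return cost


def best_join_order(tables):
    """Try every ordering and keep the cheapest one."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(ROWS):
        print(" -> ".join(order), f"{plan_cost(order):,.0f}")
    print("best:", " -> ".join(best_join_order(ROWS)))
```

With these made-up numbers, starting from the small B and C join keeps the intermediate result tiny, while starting with A and B materializes millions of rows first. Exhaustive search grows factorially with the number of tables, which is why planners rely on heuristics and pruning for larger queries.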
Optimization in Vitess query planner
In Vitess, our query plans are partly executed on the SQL proxy layer, called VTGate, and partly on the individual shards. Probably the most important optimization we perform is to push down as much work as possible to MySQL. If we can perform a join or a filter in MySQL, that is always going to be faster than fetching all the individual rows and performing the same operation on the VTGate side. So, during query planning, we are searching for the query plan that has the least number of network calls.
When planning aggregations, our strategy is to do as much aggregation as possible in MySQL, and then aggregate the aggregates. The planner rewrites the aggregation that the user asked for into smaller aggregations and sends those to MySQL. The results of these queries are then used as inputs and summarized into the final aggregation result. You can read more about grouping and aggregations in an earlier blog post.
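The "aggregate the aggregates" idea can be sketched with an average. This is a simplified Python illustration rather than Vitess code: it assumes each shard returns a SUM and a COUNT, and the per-shard numbers are hypothetical.

```python
def merge_avg(partials):
    """Combine per-shard (sum, count) pairs into one global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


def naive_avg_of_avgs(partials):
    """Average the per-shard averages. Wrong when shards differ in size."""
    averages = [s / c for s, c in partials if c]
    return sum(averages) / len(averages) if averages else None


if __name__ == "__main__":
    # Each shard would answer something like:
    #   SELECT SUM(salary), COUNT(salary) FROM salary_info
    shard_results = [(300_000, 4), (90_000, 1)]  # hypothetical (sum, count)
    print("merged:", merge_avg(shard_results))
    print("naive: ", naive_avg_of_avgs(shard_results))
```

The two functions disagree because the shards hold different numbers of rows, so an average cannot simply be averaged again. Splitting it into parts that can be summed, and combining those at the proxy layer, is what keeps the final result correct.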
Code generation
In the code generation phase, the compiler generates machine code based on the input source code and the analysis performed in previous phases. This machine code can then be executed by the computer's processor.
The query planner generates a plan that specifies the exact steps that the database engine should take to execute the query. This plan may include operations such as index scans, join algorithms, and sorting algorithms, as well as other details such as the order in which the operations should be performed.
The importance of a good query planner
Query planners are an essential component of database management systems, and the work that query planner developers do plays a crucial role for database systems. The field of query planning is an active area of research, with new algorithms and techniques being developed all the time. A good query planner can have a direct impact on the performance and efficiency of databases, which can have real-world benefits for the organizations and users that rely on those databases.]]></content>
        <summary><![CDATA[Learn how query planning works and why query planners are important.]]></summary>
      </entry>
    
      <entry>
        <title>Temporal workflows at scale: Part 2 — Sharding in production</title>
        <link href="https://planetscale.com/blog/temporal-workflows-at-scale-sharding-in-production" />
        <id>https://planetscale.com/blog/temporal-workflows-at-scale-sharding-in-production</id>
        <published>2022-12-14T00:03:57.138Z</published>
        <updated>2022-12-14T00:03:57.138Z</updated>
        
        <author>
          <name>Savannah Longoria</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[In the previous post of this two-part series, we familiarized ourselves with Temporal, a deterministic workflow processing framework that lets you run stateful workflows in a more fault-tolerant way.
One of the most significant decisions Temporal users make when running Temporal in production is determining the appropriate persistence layer. People often assume that they can only scale Temporal by using a NoSQL option like Cassandra. This post explains why PlanetScale doesn't require you to trade operational simplicity for scalability, so you can take full advantage of PlanetScale's scalability improvements over a single MySQL instance.
Sharding MySQL with PlanetScale

We built PlanetScale on top of Vitess so that we could harness its ability to scale massively — through horizontal sharding. Vitess provides an abstraction over MySQL and retains the benefits of a relational database by enabling missing functionalities.
To visualize how PlanetScale delivers resiliency, scalability, and performance improvements over a single MySQL instance, let’s examine what’s happening behind the scenes with Vitess.

Vitess is a database technology that can create the illusion of a single MySQL database when, in reality, the database may be composed of multiple shards, each backed by their own MySQL instance. In the case of sharding, the VTGate layer transparently routes queries to the necessary shards.
VTGate is a stateless entry point to your cluster that knows all application logic coded into vSchema.
VSchema is an abstraction layer that presents a unified view of the underlying keyspaces and shards. It contains the information about the sharding key for a sharded table.
Vitess adds a sidecar process that scales alongside your cluster and allows for connection requests to be queued up. This sidecar process is also known as VTTablet and keeps the underlying MySQL processes safe from a memory management standpoint, allowing you to keep adding workers as needed to scale the application.
A shard is a subset of a keyspace. A keyspace is a logical database. If you’re using sharding, a keyspace maps to multiple MySQL instances; if you’re not using sharding, a keyspace maps directly to a single MySQL database in a single MySQL instance. In either case, a keyspace appears as a single database from the application's viewpoint.
All sharded keyspaces in PlanetScale have a vSchema and are sharded by keyspace ID ranges. The keyspace ID is a concept internal to Vitess, and your application does not need to know anything about it. To set up sharding in PlanetScale, we work with you to configure the vSchema with a Primary Vindex. Mapping to a keyspace ID, then to a shard, gives us the flexibility to reshard the data with minimal disruption because the keyspace ID of each row remains unchanged through the process. Changing the number of shards happens transparently to the application and user, so no application changes are required if you want to add more shards.
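Here is a small Python sketch of the two-step mapping described above: sharding key to keyspace ID, then keyspace ID to shard. It follows the Vitess convention of naming shards by hexadecimal key ranges such as -80 and 80-, but the hash is a stand-in (BLAKE2b from the standard library), so the IDs it produces will not match what a real xxhash Vindex computes.

```python
import hashlib


def keyspace_id(sharding_key: int) -> bytes:
    """Map a sharding key to an 8-byte keyspace ID.

    Stand-in hash: a real xxhash Vindex would produce different bytes.
    """
    raw = sharding_key.to_bytes(8, "big")
    return hashlib.blake2b(raw, digest_size=8).digest()


def shard_for(ksid: bytes, shards: list[str]) -> str:
    """Find the shard whose key range contains the keyspace ID.

    Shard names look like "start-end" in hex. An empty start or end means
    the range is open on that side, so "-80" covers everything below 0x80.
    """
    for name in shards:
        start_hex, end_hex = name.split("-")
        start = bytes.fromhex(start_hex)
        end = bytes.fromhex(end_hex)
        if ksid >= start and (not end or ksid < end):
            return name
    raise ValueError("shard ranges do not cover the keyspace")


if __name__ == "__main__":
    two_shards = ["-80", "80-"]
    four_shards = ["-40", "40-80", "80-c0", "c0-"]
    for key in (1, 2, 3):
        ksid = keyspace_id(key)
        print(key, ksid.hex(), shard_for(ksid, two_shards), shard_for(ksid, four_shards))
```

Going from two shards to four changes only the range lookup. The keyspace ID computed for each row stays the same, which is the property that lets resharding happen without the application noticing.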
If you’re interested in reading more about sharding in PlanetScale and Vitess, our team recently published content demonstrating our ability to shard predictably and handle large query volumes in the One Million QPS blog. We also released content related to Connection Pooling and our Global Routing Infrastructure.
Sharding in Temporal
Similar to PlanetScale, Temporal leverages horizontal partitioning by calculating a hash for each identifier and allocating it to a specific shard based on the hash value. Shards are represented as a number from 1 to N. However, to achieve consistency requirements with other persistence layers, Temporal serializes all updates belonging to the same shard, so all updates are sequential. As a result, the latency of a database operation limits the maximum theoretical throughput of a single shard.
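The following Python sketch illustrates both points: hashing an identifier to a shard number between 1 and N, and the throughput ceiling that serialized updates impose. The hash function, the way the key is composed, and the latency and shard-count figures are all assumptions for the example, not Temporal's actual implementation or measured values.

```python
import hashlib


def history_shard(namespace_id: str, workflow_id: str, num_history_shards: int) -> int:
    """Assign a workflow to a shard numbered from 1 to N.

    Stand-in hash and key layout, chosen only for illustration.
    """
    key = f"{namespace_id}_{workflow_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_history_shards + 1


def max_updates_per_second(db_latency_ms: float, shard_count: int = 1) -> float:
    """Upper bound when each update on a shard waits for the previous one."""
    return shard_count * (1000.0 / db_latency_ms)


if __name__ == "__main__":
    print(history_shard("ns-1", "order-42", 512))
    # With an assumed 5 ms per database operation:
    print(max_updates_per_second(5.0))       # ceiling for a single shard
    print(max_updates_per_second(5.0, 512))  # ceiling across 512 shards
```

Since updates within a shard cannot overlap, the only ways to raise the ceiling are lower database latency or more shards.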
Tuning the History Shard Count (numHistoryShards) in Temporal is a critical and required configuration of a Temporal cluster. The configured value assigned in this step directly impacts the system's throughput, latency, and resource utilization. Getting this configuration right for your setup and performance targets is essential for production, and the value is immutable after the initial cluster deployment. You must set this value high enough to scale with this Cluster's worst-case peak load.
The schema in Temporal is generally well designed, and because the tables don't rely on each other, configuring the vSchema (in PlanetScale) for Temporal databases is a fairly simple process. For most Temporal tables, you will find either shard_id or range_hash defined within the Primary Key. This maps directly to the Primary Vindex we use as our Sharding Key when we create our VSchema.
Behind the scenes
Temporal is a very write-intensive application; it's easy to accumulate several terabytes of bin logs during application updates and upserts. In addition to this, some tables grow faster and have more traffic than others. This section will cover how we defined the vSchema and routing rules for one of our customers sharding Temporal today. One thing to note is that we approach sharding differently for each use case, and this customer example isn't how we shard all our customers today.
In the graph below, you can see that there are two keyspaces for their Temporal production workloads. One of those keyspaces is sharded, and the other is unsharded. The storage size across these two keyspaces varies, but the larger tables are on the sharded keyspace.

For this specific customer, we defined a vSchema to an unsharded keyspace with existing tables since they already had their Temporal workload connected to PlanetScale. The vSchema for the unsharded keyspace looks something like this since we only need to define the table name:
{
  "tables": {
    "buffered_events": {},
    "cluster_membership": {},
    "cluster_metadata": {},
    "cluster_metadata_info": {},
    "namespace_metadata": {},
    "namespaces": {},
    "queue": {},
    "queue_metadata": {},
    "request_cancel_info_maps": {},
    "schema_update_history": {},
    "schema_version": {},
    "signal_info_maps": {},
    "signals_requested_sets": {},
    "timer_info_maps": {}
  }
}

To move the rest of the tables to a new sharded keyspace, we applied a different vSchema. Since this keyspace needed to be sharded, we first had to specify sharded=true and include the Predefined Vindexes we wanted to use with no tables. In the vSchema below, you can see that we defined xxhash as the Vindex or specific sharding function.
{
  "sharded": true,
  "vindexes": {
    "xxhash": {
      "type": "xxhash"
    }

After that, we started the MoveTables process for all tables using the Vindex syntax. Once the MoveTables process completed, the routing rules stayed in place. Then we manually switched traffic from the unsharded source keyspace to the new sharded keyspace using the --SwitchTraffic command. In the rest of the vSchema for the sharded keyspace below, you’ll see that the ColumnVindex or Sharding key is defined from the shard_id or range_hash columns in the Temporal Schema. For this reason, we also specified the column name within the ColumnVindex and hashing function.
  },
  "tables": {
    "activity_info_maps": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "current_executions": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "executions": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "history_node": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "history_tree": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "replication_tasks": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "replication_tasks_dlq": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "shards": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "task_queues": {
      "columnVindexes": [
        {
          "column": "range_hash",
          "name": "xxhash"
        }
      ]
    },
    "tasks": {
      "columnVindexes": [
        {
          "column": "range_hash",
          "name": "xxhash"
        }
      ]
    },
    "timer_tasks": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "transfer_tasks": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    },
    "visibility_tasks": {
      "columnVindexes": [
        {
          "column": "shard_id",
          "name": "xxhash"
        }
      ]
    }
  }
}

Conclusion
Our PlanetScale Infrastructure team has also recently implemented Temporal workflows internally to automate manual human tasks for Vitess releases. In addition, we have customers using it in production in sharded environments.
Over the most recent holiday selling period (Black Friday and Cyber Monday), one of our customers sustained record numbers of QPS and IOPS with no interruptions. In the graphs below, you can see daily breakdowns of peak load traffic for their Temporal databases in production.
QPS for Temporal for the week: 
QPS for Temporal BF (Thursday): 
QPS for Temporal BF (Friday): 
QPS for Temporal Cyber-Monday (Monday): 
QPS for Temporal Cyber-Monday (Tuesday): 
Unlike NoSQL persistence options for Temporal, sharding a database in PlanetScale doesn’t create more potential failure points or add more complexity than it’s worth. You reduce the impact of node failures and increase write and read throughput. Writes and reads are distributed across many machines, making millions of QPS possible. The days of compromising operational simplicity for scalability when choosing a data persistence layer for Temporal are behind us.]]></content>
        <summary><![CDATA[Learn how PlanetScale simplifies the process of running Temporal in production by looking at how our customer runs heavy production workloads.]]></summary>
      </entry>
    
      <entry>
        <title>Rails’ safety mechanisms</title>
        <link href="https://planetscale.com/blog/rails-safety-mechanisms" />
        <id>https://planetscale.com/blog/rails-safety-mechanisms</id>
        <published>2022-12-12T00:43:00.000Z</published>
        <updated>2022-12-12T00:43:00.000Z</updated>
        
        <author>
          <name>Jason Charnes</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Companies build products with Ruby on Rails because of its focus on efficiency. The framework comes with a set of defaults that allows you to focus on building, not configuring.
Defaults can have limits. We can often find ourselves with paper cuts if we’re not careful. Fear not! Rails has conventions for this, too. Let’s look at some tools the framework gives us to protect ourselves.
Mass assignment protection in Rails
One of the first things Rails developers learn is handling form parameters. The params object contains all the form parameters for us to use in controllers and views. But whenever Rails generates a scaffold it uses something called strong parameters.
What are strong parameters, exactly? When creating or updating an Active Record model we can pass an optional hash of attributes.
User.create(name: "Jason Charnes", location: "Memphis")

This pairs beautifully with the Rails form helpers:
<%= form_with model: @user do |form| %>
  <%= form.text_field :name %>
  <%= form.text_field :location %>
  <%= form.submit %>
<% end %>

The parameters sent from the form to the controller look like this:
{user: {name: "Jason Charnes", location: "Memphis"}}

We can pass the user hash directly:
class UsersController < ApplicationController
  def create
    User.create!(params[:user])
  end
end

🚨 This is mass assignment; it’s dangerous.
While creating the record, we’re mass assigning the attributes. This is simple, clean, and appears to work great, but there’s a problem.
The User model also has an admin boolean that defaults to false. What if our end-user is sneaky and adds this to the form?
<input type="hidden" name="user[admin]" value="true" />

Now the params look like this:
{user: {name: "Jason Charnes", location: "Memphis", admin: "true"}}

We’re passing the entire list of parameters to Active Record, including admin.
class UsersController < ApplicationController
  def create
    @user = User.create!(params[:user])
    @user.admin? #=> true
  end
end

Yikes, we gave away the keys to the kingdom. This is the role strong parameters play.
Strong parameters
Instead of mass assignment (passing the raw params directly to Active Record), we use strong parameters to signal which attributes are allowed.
class UsersController < ApplicationController
  def create
    User.create!(user_params)
  end

  private

  def user_params
    params.require(:user).permit(:name, :location)
  end
end

Here, we define a user_params method that is responsible for allowing specific parameters through.
We start by telling the params object that we require the parameters to have a key of :user. If the params don’t have this key, an error is raised and the request is halted.
From there we provide a list of permitted attributes to the permit method.
Now, if our end-user tries to pass admin as a parameter, Rails will exclude it from the list of params passed to Active Record.
Rails will log unpermitted parameters by default. This is useful for debugging in development and looking for bad actors in production. If you want to take things further, you can tell Rails to raise an error if unpermitted parameters are passed:
config.action_controller.action_on_unpermitted_parameters = :raise

Strong parameters are a controller-level feature. Is there a way to enforce this in the model? Rails doesn’t provide this, but it used to.
Before the introduction of strong parameters in Rails 4, mass assignment protection happened at the model level. (Does anyone remember attr_protected and attr_accessible?!)
The strong parameters pattern is more flexible. You likely want to permit different attributes depending on the context. A user shouldn’t set their admin status. But an existing admin may be able to set it.
While it’s easier to define it in the model, it’s simpler to let the controller do it.
N+1 prevention in Rails
N+1 queries are unfortunately easy to perform in Active Record. ✨
If we access an association we haven’t loaded, Rails will make the database lookup on our behalf. Rails has our back here. It comes at a cost, though. (Unless you view N+1s as a feature.)
Pretend we’re rendering a list of orders:
class OrdersController < ApplicationController
  def index
    @orders = Order.all
  end
end

<% @orders.each do |order| %>
  <tr>
    <td><%= order.id %></td>
    <td><%= order.customer.name %></td>
    <td><%= order.created_at %></td>
  </tr>
<% end %>

We’re rendering the customer’s name on each order. An order belongs to a customer.
class Order < ApplicationRecord
  belongs_to :customer
end

We didn’t ask the database for customers, though. Just orders. The logs show the single database query for orders. ✅
SELECT "orders".* FROM "orders"

And a customers table query for each order we rendered. ❌
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 2], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 3], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 4], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 5], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 7], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 8], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 9], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 10], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 11], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 12], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 14], ["LIMIT", 1]]
SELECT "customers".* FROM "customers" WHERE "customers"."id" = $1 LIMIT $2  [["id", 15], ["LIMIT", 1]]

Rails knows customers aren’t loaded, so it lazy loads (does a database lookup for) each customer on the fly.
The query returns 15 orders, which means 15 individual customer queries. This is what I mean by N+1: one query loads the N order records, and then N more queries look up the customer for each order.
This is a problem when working with large datasets.
Each request to the customers table isn’t necessarily a bottleneck — they’re pretty quick. But they add up. And sometimes it’s not just one association per record. It’s many associations per record.
The solution to N+1s
We fix this by preloading associations.class OrdersController < ApplicationController
  def index
    @orders = Order.includes(:customer).all
  end
end

By adding the .includes method with the association we want to preload, Rails ensures that every corresponding customer is loaded.
Instead of 15 extra SQL queries, we only have 1 extra query:SELECT "orders".* FROM "orders"

SELECT "customers".* FROM "customers" WHERE "customers"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)  [["id", 1], ["id", 2], ["id", 3], ["id", 4], ["id", 5], ["id", 6], ["id", 7], ["id", 8], ["id", 9], ["id", 10], ["id", 11], ["id", 12], ["id", 13], ["id", 14], ["id", 15]]
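The difference in query counts can be reproduced without a database. The following is a plain-Ruby simulation, not Active Record: FakeDatabase and both helper methods are hypothetical stand-ins that only count customer lookups (the single orders query is left out).

```ruby
# A stand-in for the database that counts how many queries it receives.
class FakeDatabase
  attr_reader :query_count

  def initialize(customers)
    @customers = customers # { id => name }
    @query_count = 0
  end

  # One query per customer, like the lazy-loaded association.
  def find_customer(id)
    @query_count += 1
    @customers.fetch(id)
  end

  # One query for many customers, like the WHERE id IN (...) query.
  def find_customers(ids)
    @query_count += 1
    @customers.slice(*ids)
  end
end

CUSTOMERS = (1..15).to_h { |id| [id, "Customer #{id}"] }
ORDER_CUSTOMER_IDS = (1..15).to_a # one order per customer

# N lookups: each order triggers its own customer query.
def names_with_lazy_loading(db, customer_ids)
  customer_ids.map { |id| db.find_customer(id) }
end

# 1 lookup: all customers are fetched up front, then matched in memory.
def names_with_preloading(db, customer_ids)
  customers = db.find_customers(customer_ids.uniq)
  customer_ids.map { |id| customers.fetch(id) }
end
```

Running both helpers against fresh FakeDatabase instances gives the same names, but the lazy version issues one customer query per order while the preloading version issues one in total.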

Over time it’s easier to know when you’re introducing an N+1 query, but it’s still easy to miss them. Let’s look at how we can catch N+1s before shipping to production.
Before Rails 6.1
The Bullet gem is the tool for discovering N+1s.
Bullet keeps an eye out for N+1 queries in your application. From Rails logs to notifying you in Slack, it has many ways to warn you of N+1 queries.
Remember, it’s still up to you to listen and fix the N+1s. 😅
Rails 6.1+
If you don’t want to fool with another dependency or know that you’ll just ignore Bullet warning you of N+1s, there’s a more invasive option.
Rails 6.1 introduced a mechanism for tracking down N+1s: strict_loading.class Order < ApplicationRecord
  belongs_to :customer, strict_loading: true
end

Adding this option to the customer association results in an ActiveRecord::StrictLoadingViolationError raised if Rails detects you’re lazy loading the association.
When we encounter this error, it’s clear to us what happened:`Order` is marked for strict_loading. The Customer association named `:customer` cannot be lazily loaded.

Adding this option to each association is tedious! Luckily, there’s a way to enable this functionality globally.
Applying strict loading in development
If you only want an error raised in development, not production, you can enable the option in config/environments/development.rb:config.active_record.strict_loading_by_default = true

I like Aaron Francis’ idea of enforcing strict loading in development only. In our companion article on Laravel’s safety mechanisms, he suggests:
Lazy loading relationships does not affect the correctness of your application, merely the performance of it. Ideally, all the relations you need are eager loaded, but if not, it simply falls through and lazy loads the required relationships.
For that reason, we recommend disallowing lazy loading in every environment except production. Hopefully, all lazy loads will be caught in local development or testing, but in the rare case that a lazy load makes its way into production, your app will continue to work just fine, if a bit slower.
Applying strict loading regardless of the environment
If performance is critical and it’s important to raise this error in all environments, including production, apply the option in config/application.rb:config.active_record.strict_loading_by_default = true

Asynchronous association destruction
Active Record provides a mechanism for deleting associated records when a parent record is deleted.
The quickest, simplest way to delete dependent data is by providing the dependent: :delete_all option on the association:class Order < ApplicationRecord
  has_many :invoices, dependent: :delete_all
end

Deleting this order will execute a separate, single SQL query to delete all the associated invoices.
Sometimes, though, the dependent records have “on delete” callbacks that need to run. Because delete_all uses a SQL query, it’s not going to instantiate the dependent records and run the delete callbacks.
To ensure the callbacks are run, we’d change the dependent option to destroy:class Order < ApplicationRecord
  has_many :invoices, dependent: :destroy
end

Deleting an order now instantiates each dependent, associated record and calls #destroy on it.
But what happens if there are tens, hundreds, or even thousands of associated records? It means tens, hundreds, or even thousands of object instantiations and SQL calls.
This long-running functionality feels like something that could run in a background job. As it turns out, Rails agrees.
Changing the dependent: :destroy to dependent: :destroy_async will enqueue a background job to destroy the dependent, associated records.
It’s worth noting this only works for associations that do not have a foreign key constraint. For more options for destroying dependent relations, take a look at the Miss Hannigan gem.
We have another post on deleting data at scale with Rails if you want to dig in deeper.
Fail loudly, fail proudly
Active Record methods fail loudly or silently.
You can often tell if an Active Record method fails loudly or silently by its name. Most (not all) methods that end with a bang (!) will raise an error. What’s the difference?
Loud failures
Loud failures raise an error and halt code execution.
A good example of this is Active Record’s .find method which locates the database record by ID. Trying to find a record that doesn’t exist raises an ActiveRecord::RecordNotFound error.class OrdersController < ApplicationController
  def update
    @order = Order.find(params[:non_existent_id])

    @order.update(order_params)
  end
end

We never get to the update call because the lookup raises an error and halts the request.
Silent failures
Instead of using .find, replace it with .find_by, which takes a list of attributes to look up instead of an ID.class OrdersController < ApplicationController
  def update
    @order = Order.find_by(id: params[:non_existent_id])

    @order.update(order_params)
  end
end

In this case, .find_by returns nil. This means we’ll reach the update call, but it will be called on nil, which raises the ever-present “undefined method x for nil.” 🥶
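Plain Ruby has the same loud/silent split, which makes for a quick analogy. This sketch uses a Hash rather than Active Record: the ORDERS constant and the helper names are hypothetical, while Hash#fetch (raises KeyError) and Hash#[] (returns nil) are standard Ruby.

```ruby
ORDERS = { 1 => "Order #1" }.freeze

# Loud: Hash#fetch raises KeyError when the key is missing.
def loud_lookup(id)
  ORDERS.fetch(id)
end

# Silent: Hash#[] returns nil when the key is missing.
def silent_lookup(id)
  ORDERS[id]
end

# With a silent lookup, the caller has to guard against nil itself.
def describe_order(id)
  order = silent_lookup(id)
  return "no such order" if order.nil?

  order.upcase
end
```

The silent version is only safe when every caller remembers the nil check; the loud version fails at the lookup, where the problem actually is.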
Comparing the two
In this example, we create an order and notification. Once we’re done creating records, we send a notification email.order = Order.create(order_params)

notification = Notification.create(notifiable: order)

NotificationEmail.deliver_later(notification)

What happens if creating the notification fails validation?
We get an instance of Notification back that isn’t in the database. The notification is passed to the NotificationEmail and sent in the background.
This block of code doesn’t fail. But when the job runs later, in the background, it fails. The notification can’t be retrieved from the database because it was never saved to the database. 😅
Silently failing
If we acknowledge it’s okay for a notification to fail, we can continue using the silent failure, create, and add a boundary:notification = Notification.create(notifiable: order)

if notification.valid?
  NotificationEmail.deliver_later(notification)
end

Failing loudly!
But oftentimes I don’t expect code to fail. If it is failing, it’s a signal of a larger problem that should be addressed.
In our example, the background job failing wasn’t the root issue. It was a side effect. If we don’t expect creating a notification to ever fail, failing loudly might be a good option.
In this example, failing loudly is done by using create!:notification = Notification.create!(notifiable: order)
NotificationEmail.deliver_later(notification)

This juicy code block raises an error if validation fails and makes it easier to track.
If the job is failing because the notification wasn’t created, we have to figure out what enqueued the job and why it failed.
If an error is raised because the Notification failed to create, the job is never enqueued and we immediately know where to look to fix the problem. The error itself signals what went wrong.
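The two behaviours can be compared side by side in a small plain-Ruby sketch. TinyNotification and RecordInvalid are hypothetical stand-ins written for illustration, not Active Record classes; they only mimic the shape of create and create!.

```ruby
class RecordInvalid < StandardError; end

# Hypothetical stand-in for a model with a single validation.
class TinyNotification
  attr_reader :notifiable

  def initialize(notifiable:)
    @notifiable = notifiable
    @persisted = false
  end

  def valid?
    !notifiable.nil?
  end

  def persisted?
    @persisted
  end

  # "Saves" only when the record is valid.
  def save
    @persisted = valid?
  end

  # Silent: always returns an instance, saved or not.
  def self.create(**attrs)
    new(**attrs).tap(&:save)
  end

  # Loud: raises when validation fails, so nothing after it runs.
  def self.create!(**attrs)
    record = create(**attrs)
    raise RecordInvalid, "Validation failed" unless record.persisted?

    record
  end
end
```

With create, an invalid record quietly flows on to whatever comes next; with create!, execution stops at the line that failed.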
Credential protection in Rails
I once got a bill from AWS for $20,000+. Want to guess what happened?
A private git repo where I had hard-coded AWS credentials was compromised. That day I learned about the importance of keeping credentials out of your code.
We often store credentials as environment variables. Whenever we add a new credential, we update all the places our application runs or its secrets live: production, CI, the team’s shared password manager, etc.
What if I told you Rails has a built-in mechanism for securely storing credentials?
Rails credentials
Instead of keeping environment variables in sync across multiple developers and platforms, we can store our credentials securely in our application.
To get started, run the Rails CLI:bin/rails credentials:edit --environment=development

The first time this command is run it creates two files:
config/credentials/development.yml.enc
config/credentials/development.key
The first file is an encrypted YAML file. When we ran our command, the file was temporarily decrypted and opened in our editor for us to edit.
While the file is decrypted, we can update it to store our credentials in a standard YAML key/value structure:stripe:
  secret_key: SK1234
  publishable_key: PK1234

Once we close the file, it’s re-encrypted. The file can be decrypted only with the key it was encrypted with, which is stored in the second file created. Rails adds the decryption keys to .gitignore, keeping them from accidentally being committed to git.
Once our secrets are saved, we can access them in the app using the following:Rails.application.credentials.stripe.secret_key
Rails.application.credentials.stripe.publishable_key

 🚨 Keep the key that Rails generated safe! If you lose it, you’ll lose access to your credentials. If you work on a team, keep it somewhere safe, like a password manager. 
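To see why losing the key means losing the credentials, here is a rough illustration of encrypting YAML with a key, using only Ruby’s standard library. It is a sketch of the general idea, not Rails’ implementation: the helper names, the AES-128-GCM choice, and the in-memory “box” format are all assumptions made for this example.

```ruby
require "openssl"
require "yaml"

CREDENTIALS_CIPHER = "aes-128-gcm"

# Generates a random 128-bit key (the role played by the .key file).
def generate_key
  OpenSSL::Random.random_bytes(16)
end

# Encrypts YAML text and returns everything needed to decrypt it later.
def encrypt_credentials(yaml_text, key)
  cipher = OpenSSL::Cipher.new(CREDENTIALS_CIPHER)
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  cipher.auth_data = ""
  data = cipher.update(yaml_text) + cipher.final
  { iv: iv, tag: cipher.auth_tag, data: data }
end

# Decrypts and parses the YAML; raises if the key does not match.
def decrypt_credentials(box, key)
  cipher = OpenSSL::Cipher.new(CREDENTIALS_CIPHER)
  cipher.decrypt
  cipher.key = key
  cipher.iv = box[:iv]
  cipher.auth_tag = box[:tag]
  cipher.auth_data = ""
  YAML.safe_load(cipher.update(box[:data]) + cipher.final)
end
```

Decrypting with the original key returns the nested hash from the YAML. Decrypting with any other key raises OpenSSL::Cipher::CipherError, so the encrypted file alone is useless without its key.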
Multiple environments
In the example above, I used the development environment. Rails is going to automatically defer to the development set of credentials in development.
I create a credentials file for each environment: development, staging, and production. This makes it easy to keep development and production credentials separate.
Development emails in Rails
What’s more fun than using real email addresses in development? Accidentally sending emails to those real email addresses in development. 😎
(While we’re here… please don’t send emails from Active Record callbacks.)
While tools exist to make this experience better (Letter Opener or Mailtrap), Rails provides a few mechanisms to help.
Turn off email delivery
This is the least exciting yet most effective mechanism for preventing unwanted emails from being sent. To turn off email delivery, add the following setting to config/environments/development.rb:config.action_mailer.perform_deliveries = false

This option may be suitable if you have email previews and a solid test suite.
Change the delivery method
Maybe you want to have a record of the email being sent for debugging. Action Mailer can save “sent” emails as files instead of delivering them.
To do this, add the following setting to config/environments/development.rb:config.action_mailer.delivery_method = :file

Any emails “sent” from the application in development will be saved to tmp/mails. This saves the raw output, which might be enough for your use case.
For me, though, I typically want to see the email in its final, table-loaded, CSS-less, 1990s HTML email form.
Email interceptors
The final option we’ll look at is intercepting all emails sent in development and rerouting them to your email address. This only works if you have an active SMTP server or email provider configured in development mode.
We do this by defining an email interceptor:class DevelopmentEmailInterceptor
  def self.delivering_email(message)
    message.to = ["jason@example.com"]
  end
end

The email interceptor implements the .delivering_email method. Inside the method, we’ll change the message’s recipient to our email address.
To wire up the interceptor, we add the following option to config/environments/development.rb:config.action_mailer.interceptors = ["DevelopmentEmailInterceptor"]

Now, any email sent from development will be rerouted to our email, no matter who it was initially addressed to.
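One drawback of rerouting is that you can no longer tell who the email was meant for. A possible variation, sketched here in plain Ruby, records the original recipients in the subject before overwriting them. FakeMessage is a hypothetical stand-in for the real mail object, and SubjectTaggingInterceptor is an illustrative name, not something Rails provides.

```ruby
# Hypothetical stand-in for a mail message; the real object has many more fields.
FakeMessage = Struct.new(:to, :subject)

class SubjectTaggingInterceptor
  DEVELOPER_ADDRESS = "jason@example.com"

  def self.delivering_email(message)
    # Remember who the email was originally addressed to.
    original = Array(message.to).join(", ")
    message.subject = "[was to: #{original}] #{message.subject}"
    # Reroute the email to the developer.
    message.to = [DEVELOPER_ADDRESS]
  end
end

message = FakeMessage.new(["customer@example.com"], "Your order shipped")
SubjectTaggingInterceptor.delivering_email(message)
```

After the call, the message is addressed only to the developer, and its subject still shows the intended recipient.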
Stay safe
These tools give us more confidence in building our applications.
Having Rails raise an error every time you forget to include an association may feel like a minor annoyance. But it’s less annoying than having to revisit code a few months later to fix a performance issue a preload would have avoided.
Having to add the boilerplate that strong parameters require may feel boring. But give me boring over the excitement of a bad actor creating a security incident for my customers.
Go enjoy the vast magic of Rails, my friends.]]></content>
        <summary><![CDATA[A comprehensive overview of Rails’ many safety features that can help you prevent painful mistakes.]]></summary>
      </entry>
    
      <entry>
        <title>Building a multi-region Rails application with PlanetScale</title>
        <link href="https://planetscale.com/blog/rails-multi-region-database" />
        <id>https://planetscale.com/blog/rails-multi-region-database</id>
        <published>2022-12-08T17:30:00.000Z</published>
        <updated>2022-12-08T17:30:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[You've put all this effort into making your Rails application as fast as possible. Each and every query is optimized. Views are cached. N+1 queries are fixed.
The last remaining problem is the speed of light between your server and your users.
Just check out this chart. Here is the additional network latency for a single request to an app deployed to US East. Works great if you're located near the server, but not so great as you get farther away.
Location | Latency to US East application
N. California | 52ms
Paris | 83ms
Frankfurt | 92ms
Singapore | 214ms
Cape Town | 231ms
Latency values from https://www.cloudping.co
Wouldn't it be wonderful if we could just deploy our Rails application everywhere that our users are? Getting our application servers distributed around the globe is actually quite easy these days with providers like Fly.io, Heroku, and Render. Cool people call this deploying to "the edge".
Even if we do this, we still have one major problem. The database. It will need to be multi-region too.
If our application is running in Singapore, but our database is in US East, we'll be paying that ~200ms penalty per database query.
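The penalty compounds quickly because it applies to every query in a request. As a back-of-the-envelope sketch, assuming a hypothetical request that runs 10 queries one after another and using the 214ms Singapore figure from the table:

```ruby
# Extra latency per request when every query crosses regions.
# Assumes queries run sequentially, one round trip each.
def added_latency_ms(queries_per_request, round_trip_ms)
  queries_per_request * round_trip_ms
end

added_latency_ms(10, 214) # => 2140
```

That is over two seconds of added latency on a single request before the application does any work of its own.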
Multi-region databases with Rails
To deploy our application globally, we must also co-locate our data with our application.
This means we need to do two things:
Set up database replicas in the same regions as our application
Teach our Rails application to read from the nearest replica
Our end goal will be having a Rails application that sends all of its reads to the nearest database replica. Any writes will still be directed at the primary.
Set up database replicas
If you haven't already, sign up for a PlanetScale account. Spin up a new database, and follow our Rails quickstart to get connected. Once you have your database connected to your Rails application, it's time to configure it to support multi-region.
With PlanetScale, we can set up read-only replicas all over the globe. To do this, navigate to your database's main branch, click "Add region" toward the bottom to create a replica, choose the region you want to add, and grab the credentials to connect.
PlanetScale will set up a replica in your chosen region and automatically keep it in sync as data is written to your primary region.

Read-only database connection
Now that your read-only region is configured on PlanetScale, you need to set up a new read-only connection to your replica in your application.
To do this, modify your database.yml to include both a primary and read-only replica connection.default: &default
  adapter: trilogy
  encoding: utf8mb4
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  username: root
  password:
  socket: /tmp/mysql.sock

development:
  primary:
    <<: *default
    database: multi_region_rails_development
  primary_replica:
    <<: *default
    database: multi_region_rails_development
    replica: true

test:
  primary:
    <<: *default
    database: multi_region_rails_test
  primary_replica:
    <<: *default
    database: multi_region_rails_test
    replica: true

Add the following to your application_record.rb:# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  primary_abstract_class

  connects_to database: { writing: :primary, reading: :primary_replica }
end

Once Rails is aware of your replica connection, you'll be able to manually query it by wrapping any queries in a block using connected_to(role: :reading).ActiveRecord::Base.connected_to(role: :reading) do
  books = Book.where(author: "Taylor")
  # all code in this block will be connected to the replica
end

Automatic connection switching
Manually wrapping every read query would be tedious. Rails has a better way. Automatic connection switching enables Rails to swap between your primary and replica connections as needed. All writes will be directed to the primary. Reads will hit the replica.
This is what we need for our application to work well automatically when deployed to different regions.
To set this up, run:bin/rails g active_record:multi_db

And then uncomment the following lines in the generated config/initializers/multi_db.rb:Rails.application.configure do
  config.active_record.database_selector = { delay: 2.seconds }
  config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
  config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
end

Notice this line: config.active_record.database_selector = { delay: 2.seconds }. It's the key detail that will enable your application to handle reading its own writes.
Replication lag and reading your own writes
The majority of web requests to most Rails applications are GET requests. These requests read data from your database.
POST/PUT/PATCH and DELETE requests update data in your application. When using multiple database connections, one common pitfall is replication lag.
When using database replicas, there will always be a small delay between when data is written to the primary and when it is available on the replicas. This is known as replication lag. It can vary based on how busy the primary database is.
Replication lag becomes a problem for your application when a user writes to the database and then immediately tries to read that same data from the replica. It's possible the data is not there yet and the user will be served an error rather than the data they are expecting.
To solve this, Rails has middleware that will automatically set a cookie for 2 seconds after each write. While this cookie is present Rails will direct all reads to the primary rather than the replica.
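The decision the middleware makes can be sketched in a few lines of plain Ruby. This is a simplified model of the idea, not Rails' actual resolver code: role_for, its parameters, and DELAY_SECONDS are hypothetical names chosen for this example.

```ruby
DELAY_SECONDS = 2

# Returns :writing (primary) or :reading (replica) for a request.
def role_for(request_writes:, last_write_at:, now:)
  # Writes always go to the primary.
  return :writing if request_writes
  # A user who has never written can safely read from the replica.
  return :reading if last_write_at.nil?

  # Shortly after a write, keep reading from the primary so users see their own changes.
  (now - last_write_at) < DELAY_SECONDS ? :writing : :reading
end

role_for(request_writes: false, last_write_at: Time.at(100), now: Time.at(101)) # => :writing
role_for(request_writes: false, last_write_at: Time.at(100), now: Time.at(103)) # => :reading
```

The delay is a trade-off: a longer window makes stale reads less likely, but sends more traffic to the primary.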
Connecting to the nearest database replica
Now that our application can connect to our replica, we need it to selectively connect to the closest one to take advantage of the low latency.
To do this, we need to tell our application which set of credentials to use based on where our Rails application is deployed.
In this example, we have our connection details stored in Rails credentials.<%
  # Our application has a region environment variable.
  # We check this variable and connect to the closest DB region.
  region = ENV["APP_REGION"]

  # When in Frankfurt, we use our Frankfurt region.
  # When in São Paulo, => São Paulo region.
  region_replica_mapping = {
      "fra" => Rails.application.credentials.planetscale_fra,
      "gra" => Rails.application.credentials.planetscale_gra
  }

  # If no specific region exists, we’ll connect to the primary.
  db_replica_creds = region_replica_mapping[region] || Rails.application.credentials.planetscale
%>

production:
  primary:
    <<: *default
    username: <%= Rails.application.credentials.planetscale&.fetch(:username) %>
    password: <%= Rails.application.credentials.planetscale&.fetch(:password) %>
    database: <%= Rails.application.credentials.planetscale&.fetch(:database) %>
    host: <%= Rails.application.credentials.planetscale&.fetch(:host) %>
    ssl_mode: <%= Trilogy::SSL_VERIFY_IDENTITY %>
  primary_replica:
    <<: *default
    username: <%= db_replica_creds.fetch(:username) %>
    password: <%= db_replica_creds.fetch(:password) %>
    database: <%= db_replica_creds.fetch(:database) %>
    host: <%= db_replica_creds.fetch(:host) %>
    ssl_mode: <%= Trilogy::SSL_VERIFY_IDENTITY %>
    replica: true
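The fallback in the template above is the important part: a region without its own replica should still get working credentials. Here is the same lookup as a standalone plain-Ruby sketch. The hostnames and constant names are made up for illustration, and Hash#fetch with a default takes the place of the `||` in the template.

```ruby
# Hypothetical credentials; real values would come from Rails credentials.
PRIMARY_CREDS = { host: "primary.example.com" }.freeze
REPLICA_CREDS = {
  "fra" => { host: "fra.example.com" },
  "gra" => { host: "gra.example.com" }
}.freeze

# Picks the replica for the app's region, falling back to the primary.
def replica_creds_for(region)
  REPLICA_CREDS.fetch(region, PRIMARY_CREDS)
end

replica_creds_for("fra") # => { host: "fra.example.com" }
replica_creds_for("sin") # => { host: "primary.example.com" }
```

An unknown or missing region reads from the primary, which is slower for distant users but never breaks the connection setup.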

Once this is in place, we can now have our globally deployed app read data from our globally deployed database. This will result in much faster GET requests for anyone in that region. Any writes will still go to the primary.]]></content>
        <summary><![CDATA[Learn how to configure your database in a multi-region Rails application to decrease latency across the globe.]]></summary>
      </entry>
    
      <entry>
        <title>Secure your connection string with AWS KMS</title>
        <link href="https://planetscale.com/blog/secure-your-connection-string-with-aws-kms" />
        <id>https://planetscale.com/blog/secure-your-connection-string-with-aws-kms</id>
        <published>2022-12-07T15:00:00.000Z</published>
        <updated>2022-12-07T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Overview
Most developers are fully aware that it is bad practice to store sensitive information in code. If your codebase is accessed by an unauthorized user, that individual now has the means to use whatever that info was trying to protect. For instance, if you were to store your PlanetScale connection string in the codebase, anyone with access to the code can now directly access your database and wreak all sorts of havoc.
One simple way to avoid this is to use environment variables so that sensitive information can be stored in the system that runs the code, as opposed to the code itself. If you are building backend services in AWS, there is actually a way to secure those environment variables even further by encrypting them using a managed key in the AWS KMS service. This article explains what KMS is and how you can use it within a Lambda function.
To follow along with this tutorial, you’ll need the following:
A PlanetScale account.
The PlanetScale CLI is installed and configured, or a pre-existing database and connection string.
Go, as well as a basic understanding of the language (although the code samples are heavily commented in case you aren’t familiar with Go).
An AWS account.
Please note that we will be creating resources in AWS which cost real money. Resources may be covered under the free tier for AWS users, so check your AWS plan.
As this is a security-focused topic, it's also recommended that you understand the AWS Shared Responsibility Model.
What is KMS
AWS Key Management Service (KMS) is a service provided by AWS that allows you to manage cryptographic keys from a single location. It lets you easily generate symmetric or asymmetric keys, or upload keys you’ve generated yourself, that can be used to encrypt and decrypt data.
Within the AWS security model, KMS works with AWS Identity and Access Management (IAM) policies to let you dictate not only what information is secured using those keys, but also which AWS services can access the keys to decrypt that information. It’s also designed in such a way that AWS employees have no access to those keys, so not even the individuals working within AWS can decrypt your sensitive information.
Tutorial overview
It is recommended to encrypt your environment variables (such as your PlanetScale connection strings) with a system like KMS to ensure maximum security. Let’s take a look at how to do this using a Lambda function. Here is an overview of what we’ll look at in the rest of this article:
You’ll start by creating a Lambda function that reads some data from a PlanetScale database.
Then you’ll create a managed key in KMS.
Next, the environment variable that stores the connection string will be encrypted, which will cause the function to error.
Finally, you’ll update the code to decrypt the connection string, and test reading data from the database again.
The database used for this project will contain a single table called recipes that has the following structure:+-------------------------+--------------+------+-----+---------+----------------+
| Field                   | Type         | Null | Key | Default | Extra          |
+-------------------------+--------------+------+-----+---------+----------------+
| id                      | int          | NO   | PRI | NULL    | auto_increment |
| name                    | varchar(100) | YES  |     | NULL    |                |
| est_time_to_make_in_min | int          | YES  |     | NULL    |                |
| description             | varchar(100) | YES  |     | NULL    |                |
+-------------------------+--------------+------+-----+---------+----------------+

If you wish to follow the same structure, the following SQL commands can be used to create the table in your database and seed some data. These can be run using the PlanetScale CLI, or in the Dashboard using the Console tab.CREATE TABLE recipes (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(100),
  est_time_to_make_in_min INT,
  description VARCHAR(100)
);

INSERT INTO recipes (name, est_time_to_make_in_min, description) VALUES
	('Goatsnake Pizza', 60, 'Lots of deliciousness'),
	('Clutch Pizza', 45, 'This is a pretty awesome pizza too'),
	('Pepperoni Pizza', 30, 'Spicy stuff!');

Create the Lambda function
Start by creating a new folder on your computer to hold the Go project. Open a terminal in that directory and run the following command to initialize the project, replacing $PROJECT_NAME with an arbitrary name for the project:go mod init $PROJECT_NAME

Create a file named main.go and add the following code. The code is commented so you’ll understand what's going on:package main

import (
	"database/sql"
	"encoding/json"
	"log"
	"os"
	"github.com/aws/aws-lambda-go/lambda"
	_ "github.com/go-sql-driver/mysql"
)

// The Recipe model will hold the data for a record pulled from the database.
type Recipe struct {
	Id            int
	Name          string
	EstTimeToMake int
	Description   string
}

// Sets up the connection to the PlanetScale database.
func GetDatabase() (*sql.DB, error) {
	db, err := sql.Open("mysql", os.Getenv("DSN"))
	return db, err
}

// The function that will be called when the Lambda function is invoked.
func handler() {
	// Get a database connection
	db, err := GetDatabase()
	if err != nil {
		panic(err)
	}

	// Run the query to fetch all recipes
	query := "SELECT * FROM recipes;"
	res, err := db.Query(query)
	if err != nil {
		panic(err)
	}

	// Loop through the results, placing them in an array of the Recipe struct
	var recipes []Recipe
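	for res.Next() {
		var r Recipe
		// Scan copies the columns of the current row into the struct fields.
		if err := res.Scan(&r.Id, &r.Name, &r.EstTimeToMake, &r.Description); err != nil {
			panic(err)
		}
		recipes = append(recipes, r)
	}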

	// Convert the results to a JSON string and log out that string.
	jbytes, err := json.Marshal(recipes)
	if err != nil {
		panic(err)
	}
	log.Printf("Returned recipes: %v", string(jbytes))
}

// The main entry point for the Lambda service.
func main() {
	lambda.Start(handler)
}

Now head back to the terminal and run the following command to install any missing dependencies:go mod tidy

Next, you’ll need to build the function and zip up the binary so it can be uploaded to the Lambda service. You’ll need to set the environment variables for GOARCH and GOOS so the compiler creates the appropriate binary for a Lambda environment. Run one of the following commands (depending on your operating system) to create a binary in a dist folder:# Mac/Linux
GOARCH=amd64 GOOS=linux go build -o dist/main .

# Windows (PowerShell)
$Env:GOARCH="amd64"; $Env:GOOS="linux"; go build -o dist/main .

The last step before uploading the function to AWS is to add the binary that was created into a zip file since this is what the Lambda service expects. On a Mac, you can right-click the main file and compress it to create that file.

Set up and test the Lambda function in AWS
The next step is to configure a Lambda function in AWS and upload the zipped folder from the previous section. In AWS, use the global search to find “Lambda”, and select it from the list of results.

Click on "Create function" to start the wizard to create a Lambda function.

Give the function a name and change the "Runtime" to Go 1.x. Leave the rest of the defaults and click "Create function" at the bottom of the form.

Once the function has been created, you’ll need to upload the zip file created in the previous section. From the "Code" source section, select "Upload from" > ".zip file".

In the next modal, click the "Upload" button and select the zip file from your computer. Click "Save" once you’ve selected it.

Next, you’ll need to change the default handler from hello to main, which is the name of the binary that was built for Lambda. Under Runtime settings, click "Edit".

Change the Handler field to “main” and click "Save".

Next, select the "Configuration" tab > "Environment variables" > "Edit".

Create an entry named “DSN” and paste in the connection string for your PlanetScale database. You can find this in your PlanetScale dashboard by clicking "Connect", clicking the "Connect with" dropdown, and selecting "Go". Once you have it, paste it in and click "Save".

Finally, let's test the function and see if we get data back from the database. Select the "Test" tab, then click the "Test" button.

The view should update and display an alert box called Execution result. If you followed all of the previous steps correctly, the box should be green. Expand it and you should see the records from the database under Log output.

Now let's see how to encrypt our connection string with a KMS key. Before moving on from Lambda, you’ll need to grab the execution role for this Lambda. You can find that in the "Configuration" tab under "Permissions". Take note of it as you’ll need it in the next step.

Create a customer managed key in KMS
Start in the AWS console and use the global search to find “key management service”. Select it from the list of available services.

If you do not see a button to create a key immediately, select "Customer managed keys" from the left navigation first. Click "Create key".

As mentioned earlier, AWS lets you create symmetric and asymmetric keys. Both options can be used to encrypt and decrypt data, but asymmetric keys are useful if you need to download the public key so that data can be encrypted or signatures verified outside of AWS. Since we’re only working within AWS, leave "Symmetric" selected and click "Next".

In the next view under Alias, give the key a display name for your reference and click "Next".

Now you need to configure the key administrators, which can be an IAM user, group, or role. Key administrators are users that are allowed to make changes to the key from the AWS console or APIs. For this tutorial, select your own IAM user account. Scroll down and click "Next".

The next view will let you select IAM users, groups, or roles that are allowed to access your key in KMS. Type the name of the execution role for your Lambda function from the previous section and select it from the list. Click "Next" once you’ve selected it.

Finally, scroll to the bottom and click "Finish".

Encrypt the connection string in Lambda
Head back to your Lambda function, select the "Configuration" tab > "Environment variables" > "Edit".

Now expand the Encryption configuration section. Check the "Enable helpers for encryption in transit" box and you’ll notice that an Encrypt button is now present next to the DSN environment variable.

When you click "Encrypt", a modal will appear where you can select your KMS key created in the previous section. If you expand Decrypt secrets snippet, you’ll also be shown the code you can use to pull in the encrypted value and decrypt it for use in your code. We’ll be adding this into the Lambda function. Select your KMS key and click "Encrypt".

The value for the DSN environment variable should have updated to an encrypted value. Click "Save".

Now if you try to test the code again, it should fail since the code doesn't know what to do with the encrypted connection string. Notice how the error is specifically around how the MySQL driver can’t figure out how to connect to the PlanetScale database.

To fix this, open main.go again on your computer and update the first half of the file (up through GetDatabase()) to look like the following. The imports will be updated, the init() function will be added, and the GetDatabase() function will be updated to reflect the DSN variable which holds the decrypted connection string.
package main

import (
	"database/sql"
	"encoding/json"
	"log"
	"os"
	"encoding/base64"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/kms"
	_ "github.com/go-sql-driver/mysql"
)

// Set up variables to be used with the encrypted connection string.
var functionName string = os.Getenv("AWS_LAMBDA_FUNCTION_NAME")
var encrypted string = os.Getenv("DSN")
var DSN string

// The init function will run first, decrypting the DSN into the above variable.
func init() {
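	// session.New is deprecated; session.Must panics if the session cannot be created.
	kmsClient := kms.New(session.Must(session.NewSession()))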
	decodedBytes, err := base64.StdEncoding.DecodeString(encrypted)
	if err != nil {
		panic(err)
	}

	input := &kms.DecryptInput{
		CiphertextBlob: decodedBytes,
		EncryptionContext: aws.StringMap(map[string]string{
			"LambdaFunctionName": functionName,
		}),
	}

	response, err := kmsClient.Decrypt(input)
	if err != nil {
		panic(err)
	}
	DSN = string(response.Plaintext[:])
}

// The Recipe model will hold the data for a record pulled from the database.
type Recipe struct {
	Id            int
	Name          string
	EstTimeToMake int
	Description   string
}

// Sets up the connection to the PlanetScale database.
func GetDatabase() (*sql.DB, error) {
	db, err := sql.Open("mysql", DSN) // ← Update the second parameter here
	return db, err
}

// remainder of the code...

Now follow the process from the previous section to build the project, zip it up, and upload it into AWS. Once you do so, test the function again in AWS and it should return data successfully.

Conclusion
If you’ve followed along, you should have a good understanding of how KMS can be used to encrypt sensitive info within an application built on AWS. This is a much more secure way to store connection strings so that even if your AWS account is compromised, unauthorized users would not be able to access your PlanetScale database. While the examples here used Go, the same principles apply to any application, regardless of the language.]]></content>
        <summary><![CDATA[Learn how to encrypt your connection strings so that not even AWS can access them.]]></summary>
      </entry>
    
      <entry>
        <title>All of the tech PlanetScale Vitess replaces</title>
        <link href="https://planetscale.com/blog/all-the-tech-planetscale-replaces" />
        <id>https://planetscale.com/blog/all-the-tech-planetscale-replaces</id>
        <published>2022-11-30T13:00:00.000Z</published>
        <updated>2022-11-30T13:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[On the surface, you might look at PlanetScale's Vitess product as a drop-in replacement for your database, and you wouldn't be wrong. Our primary focus is providing a simple and cost-effective way to create a performant MySQL configuration so you can focus on writing code instead of also wearing the DBA hat. But our service goes well beyond a traditional MySQL server. In this article, we'll take a look at all the individual software and infrastructure components that PlanetScale can replace.
MySQL with Vitess
Let's start with the most obvious one: MySQL. If you've ever had to manage a MySQL environment, you probably know it's not a trivial task. If you were to set up a performant and reliable MySQL environment in a data center, you'd have to consider far more than just a server running the service. For starters, you'd have to know how to set up a MySQL cluster with multiple physical or virtual servers to address inevitable outages. You'd also need to understand the right kind of storage to hold the databases so they aren't limited by the IOPs of the underlying disks. Considering the servers also need to talk to the outside world, as well as each other, you can't skimp on networking gear either.
Crafting and maintaining a database environment for a production service is no joke. PlanetScale's Vitess clusters provide an environment that automatically configures the compute, storage, and networking, as well as the necessary components to handle failovers in case a node goes offline within our environment. On top of the orchestration provided by Vitess, the environment automatically load balances connections to provide a single endpoint for applications to connect. The same component that routes queries within the environment also manages connection pooling, eliminating the need for tools like ProxySQL.

Online schema changes
One of the most challenging things to manage in the development cycle is syncing the schema between two databases. A common scenario where this might need to occur is when working on separate environments within the same application. Given the example of a development and production environment running side by side, developers will often make changes to the database schema in a development environment to coincide with the new features they are building. When the time comes to go live, there needs to be a way to apply the changes from the development database schema to the production schema without bringing the entire application down.
You might explore tools like gh-ost or pt-online-schema-change to address this need, but PlanetScale handles this automatically with our Branching and Deploy request workflow. Every database can have one or more branches, which would be used in place of the development environment in the scenario above where separate databases are used. Developers would make any necessary schema changes to that branch during the development phase.

When they want to release the changes, a deploy request would be created that would let anyone on the team review the changes that will be applied to the main production database, very similar to the way pull requests work in a git environment. PlanetScale also runs the changes through an automated analysis system to inform developers whether or not the schema changes may be destructive or affect the data in any way.

Once the changes are approved, the schema changes will be applied to the production branch of the database with zero downtime. Using the branching system, PlanetScale also has built-in change tracking for your database schema, providing a method to audit and track what changes are made to the database and by whom.
The branching feature also provides a method to quickly rollback schema changes. In a situation where changes are deployed, but your application starts experiencing unforeseen issues as a result, you'll have a 30-minute window to quickly revert the changes without affecting the data that was written to the database while the changes were active. This can act as an emergency backout to quickly get you back up and running while further debugging needs to happen.
Monitoring and metrics
Managing the infrastructure is an excellent first step towards a performant MySQL environment, but poorly written queries and missing or misconfigured indexes can still degrade any well-architected setup. Typically, you'd need to install third-party tools like SolarWinds Database Performance Monitor (formerly VividCortex) to keep an eye on how well the queries are performing and report any potential issues so you can decide how to optimize queries.
PlanetScale has this functionality built-in for all databases on our platform with the Insights feature, regardless of the tier you are on. Insights provides a way to analyze slow-running queries so you can make intelligent decisions on how to optimize those queries or apply the right indexes. Execution data is retained for 7 days and is used to track how many times a query was run, how long it took, and how many rows were both read and returned. All of this data is then provided in a concise, filterable, and sortable table with an accompanying chart that lets you select the specific timeframe you want to analyze.

Backup and restore
Backups are critical to the operation of any database in case data gets mistakenly deleted or updated. While many platforms require you to manually configure backups, database backups in PlanetScale are automatically configured for every database. When you create a database, each branch of that database will be backed up once every 12 hours for the Base plan, and backups are retained for 2 days. These are simply the defaults though; you are welcome to create any additional backup schedules you desire and are only charged for the cost of the data stored. The data is stored right in our system so you won't need to configure external storage like S3 to retain your backups.
When you need to restore, you'll be presented with a list of the available restore points for that database, which can be restored into a dedicated branch, giving you the option to review the data and test your application with it before fully moving to the restored version of your database.

Conclusion
Reiterating what was stated earlier, building a database system (regardless of the engine) that is performant, scalable, and feature-rich goes well beyond installing the software on a computer. You'd need a great deal of other tooling on top of the database itself, which also means additional resources to manage and maintain said tooling. PlanetScale aims to provide the simplest way to spin up a complex MySQL environment, complete with bells and whistles, without you having to worry about the overhead that comes along with it.]]></content>
        <summary><![CDATA[PlanetScale Vitess is more than just a drop-in replacement for MySQL. Learn about everything PlanetScale can do for you.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale and HIPAA</title>
        <link href="https://planetscale.com/blog/planetscale-and-hipaa" />
        <id>https://planetscale.com/blog/planetscale-and-hipaa</id>
        <published>2022-11-18T17:03:57.138Z</published>
        <updated>2022-11-18T17:03:57.138Z</updated>
        
        <author>
          <name>Sam Kottler</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[PlanetScale is committed to supporting the needs of our customers in every industry. We are happy to announce that customers on an Enterprise plan can now request Business Associate Agreements (BAAs) with PlanetScale.
What is HIPAA
The Health Insurance Portability and Accountability Act (HIPAA) is a federal law that, among other things, sets privacy and security standards to protect sensitive patient health information. Under HIPAA, the use and disclosure of Protected Health Information (PHI) is governed by the HIPAA Security Rule, Privacy Rule, and Breach Notification Rule.
There is no formal certification recognized by the Department of Health and Human Services for HIPAA. PlanetScale has worked to ensure that our security and privacy measures meet HIPAA requirements.
Business Associate Agreements with PlanetScale
PlanetScale can now enter into Business Associate Agreements (BAAs) with customers on an Enterprise plan.
Refer to the PlanetScale Security and compliance documentation for additional information about how PlanetScale can help support customers with requirements around BAAs. To start a conversation about the PlanetScale Enterprise plan, reach out to the PlanetScale Sales team.]]></content>
        <summary><![CDATA[PlanetScale can now enter into Business Associate Agreements (BAA) with customers on an Enterprise plan.]]></summary>
      </entry>
    
      <entry>
        <title>One million connections</title>
        <link href="https://planetscale.com/blog/one-million-connections" />
        <id>https://planetscale.com/blog/one-million-connections</id>
        <published>2022-11-01T00:43:00.000Z</published>
        <updated>2022-11-01T00:43:00.000Z</updated>
        
        <author>
          <name>Liz van Dijk</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[A powerful feature of serverless computing architecture is the ability to design and horizontally scale individual components of your stack, allowing more computing resources to be allocated on the fly to be used only when you need them. Databases are an essential part of the serverless stack, but people using serverless functions are finding an all too common problem with databases: connection limits. Users are forced to use proxies such as pgbouncer to handle even small workloads. With MySQL, the dreaded “Too many connections” error rings all too familiar for developers who are seeing a sudden surge in users.
A standalone database relies heavily on its ability to compartmentalize memory use to provide the strong isolation guarantees we expect, so it needs to allocate certain memory buffers on a per-connection basis. The more connections we create, the less memory we have available for the overall buffer pool, and so MySQL comes with a max_connections variable built in that acts as a “last resort” safety measure. This setting stops new connections from being established after the configured point, and it’s critical to avoid situations like an unexpected Denial of Service attack causing a memory-related outage on the database level. While it may seem harmless to raise this variable at first (you may not be approaching the instance memory limits quite yet), making MySQL live outside its means (i.e. overcommitting memory) opens the door to dangerous crashes and potential downtime, so this is not recommended.
Application connection pools
Now, establishing and cleaning up connections also takes time and computing resources, so many development frameworks offer built-in functionality like connection pools. Connection pools allow a batch of connections to be established up front and let the application queue up its database requests on its own side. While that works really well as both a performance optimization and safety feature for the database side, application-side connection pools become a similarly challenging area when trying to scale a serverless stack.
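To make the idea concrete, here is a minimal sketch of an application-side pool in Go. It is not how any particular framework implements pooling: Conn, Pool, and RunQueries are hypothetical names, and the "connection" is a placeholder struct rather than a real MySQL connection. The point is only that a fixed set of connections is created up front and callers block until one is free.

```go
package main

import (
	"fmt"
	"sync"
)

// Conn is a hypothetical stand-in for a real database connection.
type Conn struct{ ID int }

// Pool hands out a fixed set of connections created up front.
type Pool struct{ idle chan *Conn }

// NewPool opens size connections immediately and parks them in the pool.
func NewPool(size int) *Pool {
	p := &Pool{idle: make(chan *Conn, size)}
	for i := 0; i < size; i++ {
		p.idle <- &Conn{ID: i}
	}
	return p
}

// Acquire blocks until a connection is free, which is how requests queue up.
func (p *Pool) Acquire() *Conn { return <-p.idle }

// Release returns a connection so a waiting caller can use it.
func (p *Pool) Release(c *Conn) { p.idle <- c }

// RunQueries pushes n fake queries through the pool and reports the highest
// number of connections that were checked out at the same moment.
func RunQueries(p *Pool, n int) int {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		inUse    int
		maxInUse int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := p.Acquire()
			mu.Lock()
			inUse++
			if inUse > maxInUse {
				maxInUse = inUse
			}
			mu.Unlock()
			// A real application would run its query on c here.
			mu.Lock()
			inUse--
			mu.Unlock()
			p.Release(c)
		}()
	}
	wg.Wait()
	return maxInUse
}

func main() {
	pool := NewPool(3)
	peak := RunQueries(pool, 50)
	fmt.Println("peak connections in use:", peak, "of", cap(pool.idle))
}
```

However many requests arrive, the database never sees more than the pool size in concurrent connections. The catch in a serverless stack is that every function instance brings its own pool, so the total still grows with the number of instances.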
PlanetScale connection pooling
To both safeguard and optimize connection management for MySQL, Vitess and PlanetScale offer connection pooling on the VTTablet level. This scales alongside your cluster, and also allows for connection requests to be queued up there when a sudden application scale-up starts sending queries from a very large number of horizontally spawned processes. This keeps the underlying MySQL processes safe from a memory management standpoint, and allows you to keep adding workers as needed to scale the application.
In addition to that, PlanetScale’s Global Routing Infrastructure provides another horizontally scalable layer of connectivity, which we put to the test recently to help us prepare for the broader rollout of our serverless driver.

One million MySQL connections
This combination of Vitess connection pooling and the PlanetScale Global Routing Infrastructure enables us to maintain nearly limitless connections. We decided to put this to the test by running one million active connections on a PlanetScale database. Keeping with our “One million” blog theme, the target itself should be considered no more than an arbitrary number. After all, our architecture is designed to keep scaling horizontally beyond that point. However, it’s high enough to serve most of our users’ needs and illustrates the capabilities of the architecture.
To isolate our connection layer, we devised a test environment that makes use of AWS Lambda, using a fan-out pattern to run a simple Go executable that uses the go-sql-driver to establish a number of parallel connections.
Interesting fact we ran into here: by default, the Lambda runtime environment has a hard open_files limit of 1024 and a function concurrency limit of 1000. As such, we configured our test to run a total of exactly 1000 “worker functions”, each of which established exactly 1000 connections, so we could stay within the Lambda runtime limits.
Each worker loop executes the following:
Opens a new connection to MySQL
Once established, sends a simple query to verify we can talk to the underlying database.
Waits for the other loops to finish creating their connections.
Once we reach the desired concurrency, all workers are instructed to wait an extra few minutes with an open connection before closing out, so we can easily observe the stable parallelism in monitoring. We were able to scale up to maintain a total of one million open connections in under two minutes.
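The worker loop described above can be sketched in Go roughly as follows. This is not the code we ran for the test: fakeConn and RunWorkers are hypothetical names, the connection is simulated so the example runs without a database, and the numbers are scaled down from 1000 × 1000 to 10 × 100.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fakeConn is a hypothetical stand-in for a MySQL connection.
type fakeConn struct{}

// Ping simulates the simple query each loop sends to verify connectivity.
func (fakeConn) Ping() error { return nil }

// RunWorkers starts workers*connsPerWorker loops. Each loop opens a
// connection, verifies it, then holds it open until every loop has connected.
// It returns how many connections were open at that point.
func RunWorkers(workers, connsPerWorker int) int64 {
	total := workers * connsPerWorker
	var connected, finished sync.WaitGroup
	connected.Add(total)
	finished.Add(total)
	release := make(chan struct{})
	var open int64

	for i := 0; i < total; i++ {
		go func() {
			defer finished.Done()
			conn := fakeConn{} // open a new connection
			if err := conn.Ping(); err == nil { // verify with a simple query
				atomic.AddInt64(&open, 1)
			}
			connected.Done()
			<-release // keep the connection open until told to close
		}()
	}

	connected.Wait() // every loop has finished connecting
	held := atomic.LoadInt64(&open)
	close(release) // let all loops close out
	finished.Wait()
	return held
}

func main() {
	fmt.Println("open connections:", RunWorkers(10, 100))
}
```

In the real test, the outer level of the fan-out was Lambda invocations rather than goroutines, and the hold period lasted a few minutes so the plateau was visible in monitoring.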

The PlanetScale Global Routing Infrastructure is ready for your serverless function workloads. Sign up to try it out yourself, or reach out to talk to us if you’d like to learn how to make our scalability work for your application.]]></content>
        <summary><![CDATA[Learn how to use PlanetScale to safely include your database in your serverless functions without hitting connection limits in MySQL.]]></summary>
      </entry>
    
      <entry>
        <title>MySQL Integers: INT BIGINT and more</title>
        <link href="https://planetscale.com/blog/mysql-data-types-integers" />
        <id>https://planetscale.com/blog/mysql-data-types-integers</id>
        <published>2022-10-31T15:00:00.000Z</published>
        <updated>2022-10-31T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[MySQL has a number of integer types, and while INT may seem like the right choice for most scenarios, it’s worth understanding what options you have so you can make the right choice when designing your database. In this article, we’ll take a look at the various integer types and take a deeper dive into how they are stored in MySQL.
An overview of the MySQL INT type
An integer is simply a whole number. It can be positive, negative, or even zero. In MySQL, there are actually several different data types you can use to store integers, each with its own range of numbers. The standard INT type can store 4,294,967,296 distinct values, including 0, and MySQL permits negative numbers by default unless otherwise specified. Defining an INT column looks like this in a CREATE TABLE statement:
CREATE TABLE my_table (
	my_integer_col INT
);

Since MySQL defaults to allowing both negative and positive numbers, my_integer_col would be able to store whole numbers from -2,147,483,648 to 2,147,483,647. Without deeper computer science knowledge, these VERY specific numbers may appear strange. It all has to do with how binary works and how the data is stored by the database engine.
MySQL INT types and the binary system
Most of the modern world uses a base-10 number system, which means there are 10 possible values (0-9) available for a single position in a given number. Once a position reaches the maximum allowed value, the number will roll over and add another position to indicate that the value has increased.
Base-10 (decimal)
8
9
10 *
11
12
* Position roll-over
This probably feels like common sense. After all, you likely started learning this system from an early age. There are, however, different number systems. Binary is one of those number systems, and it is the one most commonly used by computers.
Binary is a base-2 number system. There are only two possible values available to a given position: 0 or 1. Regardless of the available values, the positions will still roll over once they have reached the maximum allowed value. This can make binary numbers look incredibly foreign. Notice in the table below how the positions roll over every other number for binary, as opposed to every 10 numbers in decimal.
Base-2 (binary) | Base-10 (decimal)
0 | 0
1 | 1
10 * | 2
11 | 3
100 * | 4
101 | 5
110 * | 6
111 | 7
1000 * | 8
1001 | 9
1010 * | 10 *
* Position roll-over
So what’s all this got to do with integers in MySQL? The INT data type is a signed, 32-bit value. The “positions” are referred to as bits. This means there are 32 positions available for a 1 or 0 to be placed, with the left-most bit being used to represent if the number is positive or negative. Here is what an integer would look like to MySQL under the hood:
32-bit binary:     00000000000000000000010001110101
Decimal:           1141
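If you want to check the conversion yourself, a few lines of Go will do it. This is only an illustration of the binary representation, not of MySQL's on-disk storage format, and ToBinary32 and FromBinary32 are names made up for this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// ToBinary32 renders n as a zero-padded 32-bit binary string.
// Negative numbers show up with the left-most bit set to 1.
func ToBinary32(n int32) string {
	return fmt.Sprintf("%032b", uint32(n))
}

// FromBinary32 parses a string of up to 32 binary digits back into an int32.
func FromBinary32(s string) (int32, error) {
	u, err := strconv.ParseUint(s, 2, 32)
	return int32(u), err
}

func main() {
	bits := ToBinary32(1141)
	fmt.Println(bits) // 00000000000000000000010001110101

	n, err := FromBinary32(bits)
	fmt.Println(n, err) // 1141 <nil>
}
```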

Different integer data types
There are more integer types than just INT, four more to be exact. The table below shows the available integer types, as well as their approximate ranges.
Type | Length (in bits) | Minimum Value (signed) | Maximum Value (signed)
BIGINT | 64 | -2^63 | 2^63 - 1
INT | 32 | -2,147,483,648 | 2,147,483,647
MEDIUMINT | 24 | -8,388,608 | 8,388,607
SMALLINT | 16 | -32,768 | 32,767
TINYINT | 8 | -128 | 127
Notice how the TINYINT range is -128 to 127. This is because 0 takes up one of the 128 non-negative values, leaving 127 as the largest positive number.
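Every row in the table follows from the bit length alone: a signed type with n bits ranges from -2^(n-1) to 2^(n-1) - 1. A short Go sketch (SignedRange is a name chosen for this example) reproduces the numbers:

```go
package main

import "fmt"

// SignedRange returns the minimum and maximum values of a signed integer
// stored in the given number of bits (1 to 64).
func SignedRange(bits uint) (lo, hi int64) {
	// Shift an all-ones value right so that only bits-1 value bits remain.
	hi = int64(^uint64(0) >> (65 - bits))
	lo = -hi - 1
	return lo, hi
}

func main() {
	types := []struct {
		name string
		bits uint
	}{
		{"TINYINT", 8}, {"SMALLINT", 16}, {"MEDIUMINT", 24}, {"INT", 32}, {"BIGINT", 64},
	}
	for _, t := range types {
		lo, hi := SignedRange(t.bits)
		fmt.Printf("%-9s %d to %d\n", t.name, lo, hi)
	}
}
```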
Signed vs unsigned integers
In the section about binary, I noted that the leftmost bit is used to determine if the value is positive or negative. MySQL actually allows you to modify this behavior and include that bit in the stored value. This permits you to store much larger numbers in a given column, with the tradeoff that no negative numbers can be stored. Using the UNSIGNED keyword when creating a column, you can tell MySQL that the values should all be positive:
CREATE TABLE my_table (
	my_integer_col INT UNSIGNED
);

Below is the same table from the previous section, but updated to show the maximum value available to each integer type. The minimum value is omitted as it will always be 0:
Type | Maximum Value (unsigned)
BIGINT | 2^64 - 1
INT | 4,294,967,295
MEDIUMINT | 16,777,215
SMALLINT | 65,535
TINYINT | 255
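With the sign bit freed up, an n-bit unsigned type tops out at 2^n - 1, which is always twice the signed maximum plus one. Here is a small Go sketch of that relationship, with UnsignedMax as a hypothetical helper name:

```go
package main

import "fmt"

// UnsignedMax returns the largest value of an unsigned integer stored in
// the given number of bits (1 to 64).
func UnsignedMax(bits uint) uint64 {
	// Keep only the lowest `bits` bits of an all-ones value.
	return ^uint64(0) >> (64 - bits)
}

func main() {
	for _, bits := range []uint{8, 16, 24, 32, 64} {
		max := UnsignedMax(bits)
		signedMax := max >> 1 // dropping one bit gives the signed maximum
		fmt.Printf("%2d bits: unsigned max %d, signed max %d\n", bits, max, signedMax)
	}
}
```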
A note about integer width
Earlier in this article, I showed how you would create an INT column using the following syntax:
CREATE TABLE my_table (
	my_integer_col INT
);

There may be times you come across an integer column defined with a width, such as INT(5), with 5 being the width of the column. If you are familiar with other types such as VARCHAR, you may assume that this will change the allowed range or number of characters that MySQL will let you store in that column, but this is not the case with integers. The width only affects how values are displayed: when used with the ZEROFILL keyword, MySQL will automatically left-pad the value with zeroes up to the defined width.
Here are some examples on how MySQL both stores and returns values based on the column definition:
Column definition | Stored value | Returned value
INT | 123 | 123
INT(5) | 123 | 123
INT(5) ZEROFILL | 123 | 00123
INT(5) ZEROFILL | 123456 | 123456
As a best practice, I’d suggest avoiding using integer columns in this manner since it only affects the displayed value once the data is returned, and that kind of logic is best left up to the application using MySQL.
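Doing the padding in the application is usually a one-liner. As a sketch, here is how the same display behavior could look in Go, with ZeroFill being a name made up for this example:

```go
package main

import "fmt"

// ZeroFill left-pads n with zeroes up to width characters. Values that are
// already wider than width are returned unchanged, never truncated.
func ZeroFill(n, width int) string {
	return fmt.Sprintf("%0*d", width, n)
}

func main() {
	fmt.Println(ZeroFill(123, 5))    // 00123
	fmt.Println(ZeroFill(123456, 5)) // 123456
}
```

The column can then stay a plain INT, and each consumer of the data decides how it wants the number displayed.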
When to use the different MySQL integer types
With integer types, it mostly comes down to the following two questions:
Do you need to store negative numbers?
If your answer is yes, then you'd want to use a signed version of any of the integer types described in this article. Otherwise, opt for the UNSIGNED variant, as you automatically get to use higher numbers.
How large are the numbers you need to store?
Since the integer types all have a hard cap on the maximum value you can store, this will help determine which you should use when defining your schema. It’s also worth noting that larger integer types consume more disk space, so keep this in mind when considering which type to use.
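Those two questions can be turned into a small decision helper. The Go sketch below is just one way to encode them: SmallestType is a hypothetical function, it assumes lo is less than or equal to hi, and because it takes int64 arguments it cannot express values in the upper half of the BIGINT UNSIGNED range.

```go
package main

import "fmt"

// intType pairs a MySQL integer type name with its length in bits.
type intType struct {
	name string
	bits uint
}

// Ordered from smallest to largest so the first match is the narrowest.
var intTypes = []intType{
	{"TINYINT", 8}, {"SMALLINT", 16}, {"MEDIUMINT", 24}, {"INT", 32}, {"BIGINT", 64},
}

// SmallestType returns the narrowest column type that can hold every value
// between lo and hi (inclusive). It assumes lo <= hi.
func SmallestType(lo, hi int64) string {
	for _, t := range intTypes {
		if lo >= 0 {
			// No negative numbers needed: use the UNSIGNED variant.
			unsignedMax := ^uint64(0) >> (64 - t.bits)
			if uint64(hi) <= unsignedMax {
				return t.name + " UNSIGNED"
			}
			continue
		}
		signedMax := int64(^uint64(0) >> (65 - t.bits))
		if lo >= -signedMax-1 && hi <= signedMax {
			return t.name
		}
	}
	return "BIGINT" // every int64 fits in a signed BIGINT
}

func main() {
	fmt.Println(SmallestType(0, 200))        // TINYINT UNSIGNED
	fmt.Println(SmallestType(-1, 200))       // SMALLINT
	fmt.Println(SmallestType(0, 5000000000)) // BIGINT UNSIGNED
}
```

In practice you would also leave headroom for growth rather than sizing a column to today's largest value.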
Further learning
If you'd like to learn more about data types in MySQL, we have an article on the JSON data type and one on the VARCHAR data type that you may find useful.
We also have short videos on the following data types:
Integers
Decimals
Strings
Binary Strings
Long Strings
Enums
Dates
JSON]]></content>
        <summary><![CDATA[Gain a deeper understanding of the MySQL integer types by exploring the different options (INT BIGINT MEDIUMINT etc) and how they are stored.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 15</title>
        <link href="https://planetscale.com/blog/announcing-vitess-15" />
        <id>https://planetscale.com/blog/announcing-vitess-15</id>
        <published>2022-10-26T15:50:00.000Z</published>
        <updated>2022-10-26T15:50:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Vitess 15 is now generally available, with a number of new enhancements designed to make Vitess easier to use, more resilient, and easier to scale!
VTOrc release
VTOrc, a Vitess-native cluster monitoring and recovery component, is now GA. VTOrc monitors and repairs Vitess clusters, eliminating paging and manual intervention while automating recovery. This makes Vitess fully self-healing and resilient to MySQL server failures. It also replaces the third-party integration with Orchestrator that users have traditionally relied on to recover from MySQL server failures. Users deploying VTOrc benefit by reducing both the operational burden of running Vitess and the amount of downtime they experience.
VTAdmin release
VTAdmin, the next generation of cluster management APIs and UIs for Vitess, is also now GA. Unlike the previous UI, which was available for use only on a per-cluster basis, VTAdmin provides a single control plane to manage multiple Vitess clusters (e.g., development versus production). This makes it much easier for users to monitor and administer their Vitess deployments and significantly reduces the amount of time they spend doing so. VTAdmin is built on a modern UI technology stack that will allow us to maintain a much richer Web UI as new features and functions are added.
Old UI:


New UI:


VEP-4 progress
In addition to these major streams of work, we have made tremendous progress on VEP-4, aka The Flag Situation, reorganizing our code so that Vitess binaries and their flags are clearly aligned in help text. This is an immediate win for usability and positions us well to move on to a viper implementation, which will facilitate additional improvements including standardization of flag syntax and runtime configuration reloads. Furthermore, we are aligning with industry standards regarding the use of flags, ensuring a seamless experience for users migrating from or integrating with other platforms.
VDiff v2 update
We are also pleased to announce that VDiff v2, used to check that data migrations have been conducted successfully, is now feature complete. While previous versions were time and memory-intensive and required that users start over if they failed, VDiff v2 distributes work so that memory issues are all but eliminated. It is also resumable, allowing users to start where they left off if there is an interruption for any reason. Error reporting improvements help ensure that users know what needs to be fixed before resuming the process. VDiff v2 greatly enhances usability, and we expect it to be GA in the next release.
MySQL compatibility and performance
We continue to make improvements in the areas of MySQL compatibility and performance. For instance, we now produce more efficient query plans for subqueries and derived tables. We have also improved our benchmarking infrastructure, arewefastyet, to make it easier to add new benchmarks.
Try it out
We are very pleased with the great strides we have made with v15 and hope that you will be as well. We encourage all current users of Vitess and everyone who has been considering it to try this new release! Additionally, we have released a new version of the Operator, v2.8.0, that works with Vitess v15, and we invite you to read its release notes as well. We look forward to your feedback, which can be provided via GitHub or Slack.]]></content>
        <summary><![CDATA[Vitess 15 is now generally available with updates to VTOrc, VTAdmin, MySQL compatibility, and more.]]></summary>
      </entry>
    
      <entry>
        <title>What is Vitess: resiliency, scalability, and performance</title>
        <link href="https://planetscale.com/blog/what-is-vitess" />
        <id>https://planetscale.com/blog/what-is-vitess</id>
        <published>2022-10-21T15:00:00.000Z</published>
        <updated>2022-10-21T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Overview
In today’s fast-paced development landscape, building software that is fast, scalable, and that resists outages can make or break the success of any application. Containerized systems with orchestration layers like Kubernetes enable this robustness at the software level. Implementing these protections into your database, however, can often require an entire team of developers and database administrators dedicated to managing multiple database replicas with custom sharding logic to protect from outages and enable high availability.
PlanetScale is able to offer these features by using Vitess to power all of the databases on our platform. In this article, I’ll explain what Vitess is, how it works, and why you should care.
What is Vitess?
Vitess is an open-source, database clustering system for MySQL. At its core, it is a collection of systems that work together to enable MySQL to be more resilient, scalable, and performant. It was originally built by the team at YouTube in 2010 to address the increasing database scaling demands required by the platform. Today, it continues to scale massive companies like GitHub and Slack. The project is very actively maintained, with contributions from PlanetScale, Google, GitHub, Slack, Square, Stripe, and several more data-heavy companies.
Resiliency. Scalability. Performance. Read any modern database platform’s whitepapers and you’ll likely notice a lot of the same buzzwords, but let’s break down how Vitess ACTUALLY delivers these benefits.
Get a crash course in setting up, deploying, and managing Vitess in our Vitess course. 
How Vitess delivers resiliency
At the heart of it, MySQL is an application just like any other. It is definitely more specialized than most others, but still has some of the same attributes. One of the best ways to increase the resiliency of any application is to add more instances of it. This way, if one goes down, the others can pick up the slack.
Vitess does this by running multiple instances of MySQL (on one or more servers) and using a lightweight proxy, known as VTGate, to intelligently route queries to the proper MySQL instance. Vitess can also automatically detect when a MySQL instance goes offline and determine the best candidate to take its place as the primary MySQL process to serve queries for a given table.
Scalability with Vitess
Vitess allows you to scale massive MySQL databases via horizontal sharding with minimal application changes. It can split tables up across multiple MySQL instances to balance the load across multiple servers. When VTGate receives a query, it automatically determines which MySQL instances hold the requested row or set of rows, adjusts the query to fetch the rows from those instances simultaneously, and returns the data just as if you were querying a single database. All of this is completely transparent to the developer — and perhaps more importantly, the user!
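To make the routing idea concrete, here is a toy sketch in Python. It is not how Vitess is implemented: the hash-modulo scheme, the shard_for and fetch_users helpers, and the in-memory dictionaries standing in for MySQL instances are all assumptions made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Three in-memory "shards" stand in for separate MySQL instances (hypothetical).
SHARDS = [dict() for _ in range(3)]


def shard_for(user_id):
    """Map a sharding key to a shard index with a stable hash (toy scheme)."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % len(SHARDS)


def insert_user(user_id, name):
    """Write the row to whichever shard owns its key."""
    shard = SHARDS[shard_for(user_id)]
    shard[user_id] = name


def fetch_users(user_ids):
    """Scatter the lookup to the owning shards in parallel, then merge."""
    # Group the requested keys by the shard that owns them.
    by_shard = {}
    for uid in user_ids:
        by_shard.setdefault(shard_for(uid), []).append(uid)

    def query(item):
        shard_index, ids = item
        shard = SHARDS[shard_index]
        return {uid: shard[uid] for uid in ids if uid in shard}

    merged = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(query, by_shard.items()):
            merged.update(partial)
    return merged


for uid, name in [(1, "Ada"), (2, "Grace"), (3, "Linus"), (4, "Margaret")]:
    insert_user(uid, name)

# The caller sees one logical table, regardless of where each row lives.
print(fetch_users([1, 2, 3, 4]))
```

The caller never names a shard: it asks for rows by key and gets back a single merged result, which is the property the paragraph above describes.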
Improved performance with Vitess
The points made in the previous two sections alone would massively increase the performance of MySQL simply by balancing the load across multiple servers, but Vitess has a few other enhancements built in to squeeze out as much performance as possible. One of those enhancements is the way that Vitess manages connections between the different subsystems.
The various Vitess components are written in Go and internally communicate with one another over gRPC. With the concurrency features built into the Go language, Vitess is able to easily handle thousands of clients simultaneously. Every client (GUI, application, etc.) that connects to a Vitess instance establishes a lightweight connection to the VTGate instead of MySQL directly. VTGate understands the MySQL protocol and performs the intelligent query routing mentioned earlier based on the current Vitess infrastructure. To avoid creating too many connections, each instance of MySQL has an associated process called the VTTablet, to which VTGate sends the query.
Vitess takes the lightweight connections established by each client to VTGate and maps them to a smaller pool of MySQL connections managed by VTTablet. This process in turn helps to avoid overloading the individual MySQL processes, resulting in lower resource utilization since only VTTablet needs to connect to the underlying MySQL process.
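As a rough illustration of that mapping, the sketch below lets 50 client threads share a pool of three backend connections. The BackendConnection and ConnectionPool classes are hypothetical stand-ins written for this example; they are not Vitess APIs, and VTTablet's real pooling is considerably more involved.

```python
import queue
import threading


class BackendConnection:
    """Stands in for one real MySQL connection (hypothetical)."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def execute(self, sql):
        return f"conn {self.conn_id} ran: {sql}"


class ConnectionPool:
    """A fixed-size pool: callers borrow a connection and hand it back."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(BackendConnection(i))

    def run(self, sql):
        conn = self._idle.get()  # blocks while every connection is busy
        try:
            return conn.execute(sql)
        finally:
            self._idle.put(conn)  # always return the connection to the pool


pool = ConnectionPool(size=3)
results = []
lock = threading.Lock()


def client(n):
    """One lightweight client session issuing a single query."""
    out = pool.run(f"SELECT {n}")
    with lock:
        results.append(out)


threads = [threading.Thread(target=client, args=(n,)) for n in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Collect the distinct backend connection ids that did the work.
used = {r.split()[1] for r in results}
print(f"{len(results)} queries served by {len(used)} backend connection(s)")
```

However many clients connect, the database only ever sees the three pooled connections.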
Vitess made easy with PlanetScale
PlanetScale prides itself on being the only MySQL-compatible database that both scales and increases developer velocity, and Vitess is at the very center of it. Every single database created through PlanetScale spins up all of this infrastructure, with all the aforementioned benefits, in mere seconds for you to start building on. The end result is that developers who build on our platform get a MySQL database that truly has the capabilities to resist outages and scale to any size, without having to worry about managing the underlying infrastructure.]]></content>
        <summary><![CDATA[Learn what Vitess is, how it works, and how it can improve your database’s resilience, scalability, and performance.]]></summary>
      </entry>
    
      <entry>
        <title>Laravel’s safety mechanisms</title>
        <link href="https://planetscale.com/blog/laravels-safety-mechanisms" />
        <id>https://planetscale.com/blog/laravels-safety-mechanisms</id>
        <published>2022-10-19T00:03:57.138Z</published>
        <updated>2022-10-19T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Laravel is a mature PHP web application framework with built-in support for almost everything modern applications need. But we’re not going to cover all of those features here! Instead, we’ll look at a topic that doesn’t get talked about nearly enough: Laravel’s many safety features that can help prevent painful mistakes.
We’ll take a look at the following safety mechanisms:
N+1 prevention
Partially hydrated model protection
Attribute typos and renamed columns
Mass assignment protection
Model strictness
Polymorphic mapping enforcement
Preventing stray HTTP requests
Long-running event monitoring
Each of these protections is configurable, and we’ll recommend how and when to configure them.
N+1 prevention
Many ORMs, Eloquent included, offer a “feature” that allows you to lazy load a model’s relationship. Lazy loading is convenient because you don’t have to think upfront about which relationships to select from the database, but it often leads to a performance nightmare known as the “N+1 problem.”
The N+1 problem is one of the most common problems people run into when using an ORM, and it’s often a reason people cite for avoiding ORMs altogether. That’s a bit of an overcorrection, as we can simply disable lazy loading altogether!
Imagine a naive listing of blog posts. We’ll show each post’s title and its author’s name.
$posts = Post::all();

foreach($posts as $post) {
    // `author` is lazy loaded.
    echo $post->title . ' - ' . $post->author->name;
}

This is an example of the N+1 problem! The first line selects all of the blog posts. Then, for every single post, we run another query to get the post’s author.
SELECT * FROM posts;
SELECT * FROM users WHERE id = 1;
SELECT * FROM users WHERE id = 2;
SELECT * FROM users WHERE id = 3;
SELECT * FROM users WHERE id = 4;
SELECT * FROM users WHERE id = 5;

The “N+1” notation comes from the fact that an additional query is run for each of the n-many records returned by the first query. One initial query plus n-many more. N+1.
Even though each individual query is probably quite fast, in aggregate, you can see a huge performance penalty. And because each individual query is fast, this isn’t something that would show up in your slow query log!
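If you want to see the query count for yourself outside of Laravel, here is a small language-agnostic sketch using Python and an in-memory SQLite database. The table layout, the run helper, and the two listing functions are assumptions made for this example; they only mimic what lazy and eager loading do under the hood.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"Author {i}") for i in range(1, 6)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, f"Post {i}", i) for i in range(1, 6)])

query_log = []


def run(sql, params=()):
    """Execute a query and record it, so we can count round trips."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


def lazy_listing():
    """One query for the posts, then one more per post for its author."""
    lines = []
    for title, user_id in run("SELECT title, user_id FROM posts"):
        rows = run("SELECT name FROM users WHERE id = ?", (user_id,))
        lines.append(f"{title} - {rows[0][0]}")
    return lines


def eager_listing():
    """Two queries in total: all posts, then all of their authors at once."""
    posts = run("SELECT title, user_id FROM posts")
    ids = sorted({user_id for _, user_id in posts})
    marks = ",".join("?" * len(ids))
    authors = dict(run(f"SELECT id, name FROM users WHERE id IN ({marks})", ids))
    return [f"{title} - {authors[user_id]}" for title, user_id in posts]


query_log.clear()
lazy = lazy_listing()
lazy_count = len(query_log)

query_log.clear()
eager = eager_listing()
eager_count = len(query_log)

print(f"lazy: {lazy_count} queries, eager: {eager_count} queries")
```

With five posts, the lazy version issues six queries and the eager version two, and the gap grows by one query for every extra post.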
With Laravel, you can use the preventLazyLoading method on the Model class to disable lazy loading altogether. Problem solved! Truly, it is that simple.
You can add the method in your AppServiceProvider:
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    Model::preventLazyLoading();
}

Every attempt to lazy load a relationship will now throw a LazyLoadingViolationException. Instead of lazy loading, you’ll need to explicitly eager load your relationships.
// Eager load the `author` relationship.
$posts = Post::with('author')->get();

foreach($posts as $post) {
    // `author` is already loaded.
    echo $post->title . ' - ' . $post->author->name;
}

Lazy loading relationships does not affect the correctness of your application, merely its performance. Ideally, all the relations you need are eager loaded, but if not, Laravel simply falls through and lazy loads the required relationships.
For that reason, we recommend disallowing lazy loading in every environment except production. Hopefully, all lazy loads will be caught in local development or testing, but in the rare case that a lazy load makes its way into production, your app will continue to work just fine, if a bit slower.
To prevent lazy loading in non-production environments, you can add this to your AppServiceProvider:
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    // Prevent lazy loading, but only when the app is not in production.
    Model::preventLazyLoading(!$this->app->isProduction());
}

If you want to log errant lazy loading in production, you can register your own lazy load violation handler using the static handleLazyLoadingViolationUsing method on the Model class.
In the example below, we will disallow lazy loading in every environment, but in production, we log the violation rather than throwing an exception. This ensures that our application continues to work as intended, but we can go back and fix our lazy load mistakes.
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    // Prevent lazy loading always.
    Model::preventLazyLoading();

    // But in production, log the violation instead of throwing an exception.
    if ($this->app->isProduction()) {
        Model::handleLazyLoadingViolationUsing(function ($model, $relation) {
            $class = get_class($model);

            info("Attempted to lazy load [{$relation}] on model [{$class}].");
        });
    }
}

Partially hydrated model protection
In almost every book about SQL, one of the performance recommendations that you’ll see is to “select only the columns that you need.” It’s good advice! You only want the database to fetch and return the data that you’re actually going to use because everything else is simply discarded.
Until recently, this has been a tricky (and sometimes dangerous!) recommendation to follow in Laravel.
Laravel’s Eloquent models are an implementation of the active record pattern, where each instance of a model is backed by a row in the database.
To retrieve the user with an ID of 1, you can use Eloquent’s User::find() method, which runs the following SQL query:
SELECT * FROM users WHERE id = 1;

Your model will be fully hydrated, meaning that every column from the database will be present in the in-memory model representation:
$user = User::find(1);
// -> SELECT * FROM users WHERE id = 1;

// Fully hydrated model, every column is present as an attribute.

// App\User {#5522
//   id: 1,
//   name: "Aaron",
//   email: "aaron@example.com",
//   is_admin: 0,
//   is_blocked: 0,
//   created_at: "1989-02-14 08:43:00",
//   updated_at: "2022-10-19 12:45:12",
// }

Selecting all of the columns, in this case, is probably fine! But if your users table is extremely wide, has LONGTEXT or BLOB columns, or you’re selecting hundreds or thousands of rows, you probably want to limit the columns to just the ones you plan on using. (Watch our schema videos to learn more about the LONGTEXT and BLOB columns and why you should avoid selecting them if you don't need them.)
You can control which columns are selected using the select method, which leads to a partially hydrated model. The in-memory model contains a subset of attributes from the row in the database.
$user = User::select('id', 'name')->find(1);
// -> SELECT id, name FROM users WHERE id = 1;

// Partially hydrated model, only some attributes are present.
// App\User {
//   id: 1,
//   name: "Aaron",
// }

Here’s where things get dangerous.
If you access an attribute that was not selected from the database, Laravel simply returns null. Your code will think an attribute is null, but really it just wasn’t selected from the database. It might not be null at all!
In the following example, a model is partially hydrated with only id and name, then the is_blocked attribute is accessed further down. Because is_blocked was never selected from the database, the attribute’s value will always be null, treating every blocked user as if they aren’t blocked.
// Partially hydrate a model.
$user = User::select('id', 'name')->find(1);

// is_blocked was not selected! It will always be `null`.
if ($user->is_blocked) {
    throw new \Illuminate\Auth\Access\AuthorizationException;
}

This exact example probably (probably) wouldn’t happen, but when data retrieval and usage are spread across multiple files, something like this will happen. There is no warning anywhere that a model is partially hydrated, and as requirements evolve, you may end up accessing attributes that were never loaded.
With extreme care and 100% test coverage, you might be able to prevent this from ever happening, but it’s still a loaded gun pointed straight at your foot. For that reason, we’ve recommended never modifying the SELECT statement that populates an Eloquent model.
Until now!
The release of Laravel 9.35.0 brings us a new safety feature to prevent this from happening.
In 9.35.0 you can call Model::preventAccessingMissingAttributes() to prevent accessing attributes that were not loaded from the database. Instead of returning null, an exception will be thrown, and everything will grind to a halt. This is a very good thing.
You can enable this new behavior by adding this to your AppServiceProvider:
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    Model::preventAccessingMissingAttributes();
}

Notice that we enabled this protection across the board, regardless of environment! You could enable this protection only in local development, but the most important place for it to be enabled is production.
Unlike N+1 protection, preventing access to missing attributes is not a performance issue; it’s an application correctness issue. Enabling it prevents your application from behaving in unexpected and incorrect ways.
Accessing attributes that weren’t selected could lead to all sorts of catastrophic behavior:
Data loss
Overwriting data
Treating free users as paid
Treating paid users as free
Sending factually incorrect emails
Sending the same email dozens of times
The list goes on and on.
While throwing exceptions in production is inconvenient, it’s much worse to have silent failures that could lead to data corruption. Better to face the exceptions and fix them.
Attribute typos and renamed columns
This is a continuation of the previous section and another plea to turn on Model::preventAccessingMissingAttributes() in your production environments.
We just spent a long time looking at how preventAccessingMissingAttributes() protects you from partially hydrated models, but there are two other scenarios where this method can protect you!
The first is typos.
Continuing with the is_blocked scenario from above, if you accidentally misspell “blocked,” Laravel will just return null instead of letting you know about your mistake.
// Fully hydrated model.
$user = User::find(1);

// Oops! Spelled "blocked" wrong. Everyone gets through!
if ($user->is_blokced) {
    throw new \Illuminate\Auth\Access\AuthorizationException;
}

This particular example would likely be caught in testing, but why risk it?
The second scenario is renamed columns. If your column started out named blocked and then later you decide it makes more sense for it to be named is_blocked, you’d need to make sure to go back through your code and update every reference to blocked. And if you miss one? It just becomes null.
// Fully hydrated model.
$user = User::find(1);

// Oops! Used the old name. Everyone gets through!
if ($user->blocked) {
    throw new \Illuminate\Auth\Access\AuthorizationException;
}

Turning on Model::preventAccessingMissingAttributes() would turn this silent failure into an explicit one.
Mass assignment protection
Mass assignment is a vulnerability that allows users to set attributes that they shouldn’t be allowed to set.
For example, if you have an is_admin property, you don’t want users to be able to arbitrarily upgrade themselves to an admin! Laravel prevents this by default, requiring you to explicitly allow attributes to be mass assigned.
In this example, the only attributes that can be mass assigned are name and email.
class User extends Model
{
    protected $fillable = [
        'name',
        'email',
    ];
}

It doesn’t matter how many attributes you pass in when creating or saving the model. Only name and email will get saved:
// It doesn’t matter what the user passed in, only `name`
// and `email` are updated. `is_admin` is discarded.
User::find(1)->update([
    'name' => 'Aaron',
    'email' => 'aaron@example.com',
    'is_admin' => true
]);

Many Laravel developers opt to turn off mass assignment protection altogether and rely on request validation to exclude attributes. That’s totally reasonable! You just need to ensure you never pass $request->all() into your model persistence methods.
You can add this to your AppServiceProvider to turn off mass assignment protection altogether.
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    // No mass assignment protection at all.
    Model::unguard();
}

Remember: you’re taking a risk when you unguard your models! Be sure to never blindly pass in all of the request data.
// Only update `name` and `email`.
User::find(1)->update($request->only(['name', 'email']));

If you decide to keep mass assignment protection on, there is one other method that you’ll find helpful: the Model::preventSilentlyDiscardingAttributes() method.
In the case where your fillable attributes are only name and email, and you try to update birthday, then birthday will be silently discarded with no warning.
// We’re trying to update `birthday`, but it won’t persist!
User::find(1)->update([
    'name' => 'Aaron',
    'email' => 'aaron@example.com',
    'birthday' => '1989-02-14'
]);

The birthday attribute gets thrown away because it’s not fillable. This is mass assignment protection in action, and it’s what we want! It’s just a little bit confusing because it’s silent instead of explicit.
Laravel now provides a way to make that silent error explicit:
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    // Warn us when we try to set an unfillable property.
    Model::preventSilentlyDiscardingAttributes();
}

Instead of silently discarding the attributes, a MassAssignmentException will be thrown, and you’ll immediately know what’s happening.
This protection is very similar to the preventAccessingMissingAttributes protection. It is primarily about application correctness rather than application performance. If you expect data to be saved and it is not, that’s an exceptional condition that should never be silently ignored, regardless of environment.
For that reason, we recommend keeping this protection on in all environments!
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    // Warn us when we try to set an unfillable property,
    // in every environment!
    Model::preventSilentlyDiscardingAttributes();
}

Model strictness
Laravel 9.35.0 provides a helper method called Model::shouldBeStrict() that controls the three Eloquent “strictness” settings:
Model::preventLazyLoading()
Model::preventSilentlyDiscardingAttributes()
Model::preventAccessingMissingAttributes()
The idea here is that you could put the shouldBeStrict() call in your AppServiceProvider and turn all three settings on or off with one method call. Let’s quickly recap our recommendations for each setting:
preventLazyLoading: Primarily for application performance. Off for production, on locally. (Unless you’re logging violations in production.)
preventSilentlyDiscardingAttributes: Primarily for application correctness. On everywhere.
preventAccessingMissingAttributes: Primarily for application correctness. On everywhere.
Considering this, if you’re planning on logging lazy loading violations in production, you could configure your AppServiceProvider like this:
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    // Everything strict, all the time.
    Model::shouldBeStrict();

    // In production, merely log lazy loading violations.
    if ($this->app->isProduction()) {
        Model::handleLazyLoadingViolationUsing(function ($model, $relation) {
            $class = get_class($model);

            info("Attempted to lazy load [{$relation}] on model [{$class}].");
        });
    }
}

If you're not planning on logging lazy load violations (which is a reasonable decision!), then you would configure your settings this way:
use Illuminate\Database\Eloquent\Model;

public function boot()
{
    // As these are concerned with application correctness,
    // leave them enabled all the time.
    Model::preventAccessingMissingAttributes();
    Model::preventSilentlyDiscardingAttributes();

    // Since this is a performance concern only, don’t halt
    // production for violations.
    Model::preventLazyLoading(!$this->app->isProduction());
}

Polymorphic mapping enforcement
A polymorphic relationship is a special type of relationship that allows many types of parent models to share a single type of child model.
For example, a blog post and a user may both have images, and instead of creating a separate image model for each, you can create a polymorphic relationship. This lets you have a single Image model that serves both the Post and User models. In this example, the Image is the polymorphic relationship.
In the images table, you’ll see two columns that Laravel uses to locate the parent model: an imageable_type and an imageable_id column.
The imageable_type column stores the model type in the form of the fully qualified class name (FQCN), and the imageable_id is the model's primary key.
mysql> select * from images;
+----+--------------+----------------+------------------------------+
| id | imageable_id | imageable_type | url                          |
+----+--------------+----------------+------------------------------+
|  1 |            1 | App\Post       | https://example.com/1001.jpg |
|  2 |            2 | App\Post       | https://example.com/1002.jpg |
|  3 |            3 | App\Post       | https://example.com/1003.jpg |
|  4 |        22001 | App\User       | https://example.com/1004.jpg |
|  5 |        22000 | App\User       | https://example.com/1005.jpg |
|  6 |        22002 | App\User       | https://example.com/1006.jpg |
|  7 |            4 | App\Post       | https://example.com/1007.jpg |
|  8 |            5 | App\Post       | https://example.com/1008.jpg |
|  9 |        22003 | App\User       | https://example.com/1009.jpg |
| 10 |        22004 | App\User       | https://example.com/1010.jpg |
+----+--------------+----------------+------------------------------+

This is Laravel’s default behavior, but it’s not a good practice to store FQCNs in your database. Tying the data in your database to the particular class name is very brittle and can lead to unforeseen breakages if you ever refactor your classes.
To prevent this, Laravel gives us a way to control what values end up in the database with the Relation::morphMap method. Using this method, you can give every morphed class a unique key that never changes, even if the class name does change:
use Illuminate\Database\Eloquent\Relations\Relation;

public function boot()
{
    Relation::morphMap([
        'user' => \App\User::class,
        'post' => \App\Post::class,
    ]);
}

Now we’ve broken the association between our class name and the data stored in the database. Instead of seeing App\User in the database, we’ll see user. A good start!
We’re still exposed to one potential problem, though: this mapping is not required. We could create a new Comment model and forget to add it to the morphMap, and Laravel will default to the FQCN, leaving us with a bit of a mess.
mysql> select * from images;
+----+--------------+----------------+------------------------------+
| id | imageable_id | imageable_type | url                          |
+----+--------------+----------------+------------------------------+
|  1 |            1 | post           | https://example.com/1001.jpg |
|  2 |            2 | post           | https://example.com/1002.jpg |
| .. |          ... | ....           |  . . . . . . . . . . . . . . |
| 10 |        22004 | user           | https://example.com/1010.jpg |
| 11 |           10 | App\Comment    | https://example.com/1011.jpg |
| 12 |           11 | App\Comment    | https://example.com/1012.jpg |
| 13 |           12 | App\Comment    | https://example.com/1013.jpg |
+----+--------------+----------------+------------------------------+

Some of our imageable_type values are correctly decoupled, but because we forgot to map the App\Comment model to a key, the FQCN still ends up in the database!
Laravel has our back (again) by providing us a method to enforce that every morphed model is mapped. You can change your morphMap call to an enforceMorphMap call, and the fall-through-to-FQCN behavior is disabled.
use Illuminate\Database\Eloquent\Relations\Relation;

public function boot()
{
    // Enforce a morph map instead of making it optional.
    Relation::enforceMorphMap([
        'user' => \App\User::class,
        'post' => \App\Post::class,
    ]);
}

Now, if you try to use a new morph that you haven’t mapped, you’ll be greeted with a ClassMorphViolationException, which you can fix before the bad data makes it to the database.
The most pernicious failures are the silent ones; it’s always better to have explicit failures!
Preventing stray HTTP requests
While testing your application, it’s common to fake outgoing requests to third parties so you can control the various testing scenarios and not spam your providers.
Laravel has offered us a way to do that for a long time by calling Http::fake(), which fakes all outgoing HTTP requests. Most often, though, you want to fake a specific request and provide a response:
use Illuminate\Support\Facades\Http;

// Fake GitHub requests only.
Http::fake([
    'github.com/*' => Http::response(['user_id' => '1234'], 200)
]);

In this scenario, outgoing HTTP requests to any other domain will not be faked and will be sent out as regular HTTP requests. You may not notice this until you realize that specific tests are slow or you start hitting rate limits.
Laravel 9.12.0 introduced the preventStrayRequests method to protect you from making errant requests.
use Illuminate\Support\Facades\Http;

// Don’t let any requests go out.
Http::preventStrayRequests();

// Fake GitHub requests only.
Http::fake([
    'github.com/*' => Http::response(['user_id' => '1234'], 200)
]);

// Not faked, so an exception is thrown.
Http::get('https://planetscale.com');

This is another good protection to always enable. If your tests need to reach external services, you should explicitly allow that. If you have a base test class, I recommend putting the call in the setUp method of that base class:
protected function setUp(): void
{
    parent::setUp();

    Http::preventStrayRequests();
}

In any tests where you need to allow non-mocked requests to go out, you can re-enable that by calling Http::allowStrayRequests() in that particular test.
Long-running event monitoring
These last few methods aren’t about preventing discrete, incorrect behaviors but rather monitoring the entire application. These methods can be helpful if you don’t have an application performance monitoring tool.
Long database queries
Laravel 9.18.0 introduced the DB::whenQueryingForLongerThan() method, which allows you to run a callback when cumulative runtime across all of your queries exceeds a certain threshold.
use Illuminate\Database\Connection;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

public function boot()
{
    // Log a warning if we spend more than a total of 2000ms querying.
    DB::whenQueryingForLongerThan(2000, function (Connection $connection) {
        Log::warning("Database queries exceeded 2 seconds on {$connection->getName()}");
    });
}

If you want to run a callback when a single query takes a long time, you can do that with a DB::listen callback.
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

public function boot()
{
    // Log a warning if we spend more than 1000ms on a single query.
    DB::listen(function ($query) {
        if ($query->time > 1000) {
            Log::warning("An individual database query exceeded 1 second.", [
                'sql' => $query->sql
            ]);
        }
    });
}

Again, these are helpful methods if you do not have an APM tool or a query monitoring tool like PlanetScale’s Query Insights.
Request and command lifecycle
Similar to long-running query monitoring, you can monitor when your request or command lifecycle takes longer than a certain threshold. Both of these methods are available beginning with Laravel 9.31.0.
use Illuminate\Contracts\Http\Kernel as HttpKernel;
use Illuminate\Contracts\Console\Kernel as ConsoleKernel;
use Illuminate\Support\Facades\Log;

public function boot()
{
    if ($this->app->runningInConsole()) {
        // Log slow commands.
        $this->app[ConsoleKernel::class]->whenCommandLifecycleIsLongerThan(
            5000,
            function ($startedAt, $input, $status) {
                Log::warning("A command took longer than 5 seconds.");
            }
        );
    } else {
        // Log slow requests.
        $this->app[HttpKernel::class]->whenRequestLifecycleIsLongerThan(
            5000,
            function ($startedAt, $request, $response) {
                Log::warning("A request took longer than 5 seconds.");
            }
        );
    }
}

Make the implicit explicit
Many of these Laravel safety features take implicit behaviors and turn them into explicit exceptions. In the early days of a project, it’s easy to keep all of the implicit behaviors in your head, but as time goes on, it’s easy to forget one or two of them and end up in a situation where your application is not behaving as you’d expect.
You have enough things to worry about. Take some off your plate by enabling these protections!]]></content>
        <summary><![CDATA[A comprehensive overview of Laravel’s many safety features that can help you prevent painful mistakes.]]></summary>
      </entry>
    
      <entry>
        <title>Optimizing queries in arewefastyet</title>
        <link href="https://planetscale.com/blog/arewefastyet-query-optimization-with-insights" />
        <id>https://planetscale.com/blog/arewefastyet-query-optimization-with-insights</id>
        <published>2022-10-11T08:00:00.000Z</published>
        <updated>2022-10-11T08:00:00.000Z</updated>
        
        <author>
          <name>Florent Poinsard</name>
        </author>
        
        <author>
          <name>Harshit Gangal</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Arewefastyet is an automatic benchmarking tool for Vitess. It runs automated micro and macro benchmarks daily to monitor the performance of Vitess.
Arewefastyet uses PlanetScale for its database, which comes with an in-dashboard query monitoring tool: Insights. This tool provides a complete overview of your database's performance, including details all the way down to the individual query level.
Using Insights, we were able to detect and improve the performance of two slow queries, which resulted in an 85% decrease in query latency.

Detecting the slow queries
Insights includes an overview graph that lets you see several important metrics, such as query latency and number of rows read, at a glance. As you can see in the graph above, our p95 query latency was hovering around 40 ms. We wanted to see if we could improve this, so the first step was to narrow down the latency to the most problematic queries.
With Insights, this is simple. You can view a list of all queries that ran over a selected time period and sort them based on time per query.
This table immediately surfaced two queries that were running slowly. Let's examine them now.

The benchmark comparison query
The first query that showed up is used to return the previous successful run of the micro-benchmarks to compare against the latest run. This comparison is visible on the Microbenchmarks page.
select
    e.uuid,
    e.git_ref
  from
    execution as e
  where
    e.source = :source
    and e.`status` = :status
    and e.type = :type
    and e.git_ref != :git_ref
  order by
    e.started_at desc
  limit
    :row_count

The OLTP and TPCC benchmark query
This next query is used to return the most recent runs of the OLTP and TPCC benchmarks over the last X days. The result we get from this query allows us to display the result (qps, tps, latency, etc.) for each benchmark. We then display this information on the CRON analytics page.
SELECT info.macrobenchmark_id, e.git_ref, e.source, e.finished_at, IFNULL(e.uuid, ''),
  results.tps, results.latency, results.errors, results.reconnects, results.time, results.threads,
  results.total_qps, results.reads_qps, results.writes_qps, results.other_qps
FROM execution AS e, macrobenchmark AS info, macrobenchmark_results AS results
WHERE e.uuid = info.exec_uuid AND e.status = 'finished'
  AND e.finished_at BETWEEN DATE(NOW()) - INTERVAL ? DAY AND DATE(NOW() + INTERVAL 1 DAY)
  AND e.source = ? AND info.vtgate_planner_version = ?
  AND info.macrobenchmark_id = results.macrobenchmark_id AND info.type = ?

Improving the slow queries
Now that we had identified our two problematic queries, it was time to improve them. This step was straightforward. We noticed in both cases that these queries were missing an index.
We started with the OLTP and TPCC benchmark query. We opened a new PlanetScale deploy request to add an index to finished_at and status. Once merged, the deploy request is reflected in the graph below by the purple line with the number 6, meaning deploy request 6.
To address the benchmark comparison query, we added an index on started_at. This update is reflected in deploy request number 7 on the graph.
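To make the effect of such an index concrete, here is a minimal sketch in Python. It makes several assumptions: SQLite (via the standard library) stands in for MySQL so the example runs anywhere, the table is a trimmed-down, hypothetical version of execution, the index name is invented, and a single composite index is only one reading of "an index to finished_at and status".

```python
import sqlite3

# SQLite stands in for MySQL; the table is a hypothetical, trimmed-down
# version of the `execution` table from the queries above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE execution (uuid TEXT, git_ref TEXT, source TEXT, "
    "status TEXT, finished_at TEXT)"
)

QUERY = (
    "SELECT uuid, git_ref FROM execution "
    "WHERE status = 'finished' AND finished_at BETWEEN ? AND ?"
)
PARAMS = ("2022-10-01", "2022-10-08")


def query_plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)

# Hypothetical index name; a composite index is an assumption, not
# necessarily what the deploy request contained.
conn.execute(
    "CREATE INDEX idx_execution_status_finished_at "
    "ON execution (status, finished_at)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, the planner has to scan the whole table. Afterwards, it can search the index on status and finished_at directly. In MySQL, the equivalent check would be an EXPLAIN on the real query.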

As you can see, these two small additions resulted in an 85% decrease in query latency. With Insights, we were able to detect, debug, and massively improve our queries in just minutes.]]></content>
        <summary><![CDATA[Learn how we detected and optimized two slow queries in arewefastyet using PlanetScale Insights.]]></summary>
      </entry>
    
      <entry>
        <title>Introduction to MySQL joins</title>
        <link href="https://planetscale.com/blog/introduction-to-mysql-joins" />
        <id>https://planetscale.com/blog/introduction-to-mysql-joins</id>
        <published>2022-10-07T08:01:46.798Z</published>
        <updated>2022-10-07T08:01:46.798Z</updated>
        
        <author>
          <name>JD Lien</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Relational databases, such as MySQL, give you the ability to organize data into separate tables, but link the tables together to form relationships when necessary.
MySQL joins give you the ability to link data together in a MySQL database. A join is a way to get columns from more than one table into a single set of results. This is usually much more efficient than trying to perform multiple queries and combining them later.
This article looks at the different types of joins that can be performed in MySQL and goes over the different options you have to combine data from multiple tables: inner joins, left and right joins, and full outer joins.
A base example of MySQL joins
To further our understanding of joins, we’ll create a simple database of grocery items, each item having a category. Categories are stored in the categories table and items are stored in a separate items table.
CREATE TABLE categories (
  id int PRIMARY KEY AUTO_INCREMENT,
  name varchar(250) NOT NULL
);

Example categories table populated with data:
+----+---------+
| id | name    |
+----+---------+
|  1 | Produce |
|  2 | Deli    |
+----+---------+

CREATE TABLE items (
  id int PRIMARY KEY AUTO_INCREMENT,
  name varchar(512),
  category_id int NULL
);

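Example items table populated with data:
+----+--------+-------------+
| id | name   | category_id |
+----+--------+-------------+
|  1 | Apples |           1 |
|  2 | Cheese |           2 |
+----+--------+-------------+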
We’ll build on this example throughout the tutorial to explore the different types of joins and how to use them.
Inner joins
Now that we have items and categories stored, we may want to display the items along with the category name instead of just the category_id, since “Deli” is more meaningful to a human than “2”.
To do this, we can use an INNER JOIN, which selects matching records between tables. This is the default behavior for JOIN, so INNER JOIN is the same as JOIN.
A common mistake with inner joins
If you do a join without an ON clause, you will do what is sometimes called a CROSS JOIN, which will show each row in the left table once for every row in the right table. This is not usually what we want — and it produces a lot more results.
Let’s look at an example of a cross join in MySQL:
-- Don’t do this unless you know what you are doing:
SELECT * FROM items
JOIN categories; -- No ON columns specified!

We end up with way more results than we bargained for! With lots of data, this would be a big mess.
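+----+--------+-------------+----+---------+
| id | name   | category_id | id | name    |
+----+--------+-------------+----+---------+
|  1 | Apples |           1 |  1 | Produce |
|  1 | Apples |           1 |  2 | Deli    |
|  2 | Cheese |           2 |  1 | Produce |
|  2 | Cheese |           2 |  2 | Deli    |
+----+--------+-------------+----+---------+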
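The size of a cross join is easy to predict: every row on the left is paired with every row on the right, so the result has (rows in items) × (rows in categories) rows. The sketch below checks that with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for MySQL, and the column types are simplified.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    INSERT INTO categories (id, name) VALUES (1, 'Produce'), (2, 'Deli');
    INSERT INTO items (id, name, category_id) VALUES (1, 'Apples', 1), (2, 'Cheese', 2);
""")

item_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
category_count = conn.execute("SELECT COUNT(*) FROM categories").fetchone()[0]

# A JOIN with no ON clause pairs every item with every category.
cross_rows = conn.execute("SELECT * FROM items JOIN categories").fetchall()

print(item_count, "items x", category_count, "categories =", len(cross_rows), "rows")
```

With 2 items and 2 categories that is only 4 rows, but 10,000 items and 500 categories would already produce 5,000,000.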
Specifying the columns to JOIN categories ON
To get the results we want, we must say which columns are related.
In other words, we have to say that the primary key (id) of categories relates to the foreign key (category_id) of items. This is what it looks like in MySQL:
SELECT * FROM items
-- JOIN is the same as INNER JOIN
JOIN categories ON items.category_id = categories.id;

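+----+--------+-------------+----+---------+
| id | name   | category_id | id | name    |
+----+--------+-------------+----+---------+
|  1 | Apples |           1 |  1 | Produce |
|  2 | Cheese |           2 |  2 | Deli    |
+----+--------+-------------+----+---------+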
Now, we have conveniently returned the category name along with each item!
Giving columns unique names
You may notice in the table above that there are now two name fields since both tables had their own name column. To make the query more usable, we can use MySQL aliases to output the columns AS something else.
For this example, we’ll use c for categories and i for items.
SELECT * FROM items AS i -- we now refer to items as i
JOIN categories AS c -- we now refer to categories as c
    ON i.category_id = c.id;

Using AS is optional, so we will often see it left out.
It is also a good idea to specify all the columns we want to return instead of requesting all of them with *, especially when using tables with many columns, as this can make queries run faster. In this example, let’s omit the id column of the categories table, since it duplicates category_id.
Because the same column names are selected more than once, we should also specify which tables these columns come from. We can use the i and c aliases for that.
SELECT
    i.id,
    i.name,
    i.category_id,
    c.name AS category_name -- now refer to categories.name AS category_name
FROM items i
JOIN categories c ON i.category_id = c.id;

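+----+--------+-------------+---------------+
| id | name   | category_id | category_name |
+----+--------+-------------+---------------+
|  1 | Apples |           1 | Produce       |
|  2 | Cheese |           2 | Deli          |
+----+--------+-------------+---------------+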
Now a useful result is returned that we can display to a user!
To recap, we use MySQL inner joins to combine the data from two tables by a relationship. There is a left table — the first table specified after FROM (in this case, items), and a right table, specified after the JOIN (categories) in our example.
The inner join can be represented by this Venn diagram showing that the only data returned is the data where items and categories are related.

Left and right joins in MySQL
Let’s say we add more data to our tables:
an item without a category
a new category (but no items that use it yet).
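categories
+----+---------+
| id | name    |
+----+---------+
|  1 | Produce |
|  2 | Deli    |
|  3 | Dairy   |
+----+---------+
items
+----+--------+-------------+
| id | name   | category_id |
+----+--------+-------------+
|  1 | Apples |           1 |
|  2 | Cheese |           2 |
|  3 | Bread  |        NULL |
+----+--------+-------------+
If we do an INNER JOIN on this data, it looks like this in MySQL:
SELECT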
    i.id,
    i.name,
    i.category_id,
    c.name AS category_name
FROM items i
JOIN categories c ON i.category_id = c.id;

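+----+--------+-------------+---------------+
| id | name   | category_id | category_name |
+----+--------+-------------+---------------+
|  1 | Apples |           1 | Produce       |
|  2 | Cheese |           2 | Deli          |
+----+--------+-------------+---------------+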
Notice anything missing?
Bread isn’t there! Why not?
When we do an inner join on i.category_id = c.id, we are telling MySQL to return only the records with a category. Since "Bread" has a category_id that is NULL, it doesn’t match anything and therefore isn’t returned.
Similarly, since no items have our new “Dairy” category, this will not be present in the results either.
Often, you will still want to return all the items, even those that don’t have a matching foreign key in the table it is joined to. To achieve this, we can use LEFT JOIN to ensure all the item records in the first (left) table are returned. RIGHT JOIN works almost exactly the same way, except it returns all the records in the right table — in this case, categories.
If we do a LEFT JOIN on this data, we will get the following:
SELECT
    i.id,
    i.name,
    i.category_id,
    c.name AS category_name
FROM items i
LEFT JOIN categories c ON i.category_id = c.id;

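+----+--------+-------------+---------------+
| id | name   | category_id | category_name |
+----+--------+-------------+---------------+
|  1 | Apples |           1 | Produce       |
|  2 | Cheese |           2 | Deli          |
|  3 | Bread  |        NULL | NULL          |
+----+--------+-------------+---------------+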
Now we have all the items thanks to using a LEFT JOIN instead of INNER JOIN!
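To see the difference side by side, the following sketch runs both joins against the same data using Python's built-in sqlite3 module. SQLite is an assumption made so the example is runnable without a MySQL server. The join semantics shown are the same, and Python prints SQL NULL as None.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    INSERT INTO categories (id, name) VALUES (1, 'Produce'), (2, 'Deli'), (3, 'Dairy');
    INSERT INTO items (id, name, category_id)
        VALUES (1, 'Apples', 1), (2, 'Cheese', 2), (3, 'Bread', NULL);
""")

SELECT_LIST = "SELECT i.id, i.name, i.category_id, c.name AS category_name FROM items i "

# INNER JOIN: only items whose category_id matches a category.
inner_rows = conn.execute(
    SELECT_LIST + "JOIN categories c ON i.category_id = c.id ORDER BY i.id"
).fetchall()

# LEFT JOIN: every item, with NULL (None) where no category matches.
left_rows = conn.execute(
    SELECT_LIST + "LEFT JOIN categories c ON i.category_id = c.id ORDER BY i.id"
).fetchall()

print("inner:", inner_rows)
print("left: ", left_rows)
```

The only difference between the two result sets is the Bread row, which the LEFT JOIN keeps and fills with NULLs on the categories side.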
Similarly, we can use a RIGHT JOIN to return all the categories (but not necessarily all the items).
SELECT
    i.id,
    i.name,
    i.category_id,
    c.name AS category_name
FROM items i
RIGHT JOIN categories c ON i.category_id = c.id;

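+------+--------+-------------+---------------+
| id   | name   | category_id | category_name |
+------+--------+-------------+---------------+
|    1 | Apples |           1 | Produce       |
|    2 | Cheese |           2 | Deli          |
| NULL | NULL   |        NULL | Dairy         |
+------+--------+-------------+---------------+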
Left and right joins can be represented by these Venn diagrams.
 
Full outer joins
If we want to show all the items and all the categories, we must do a special join that is sometimes called a FULL OUTER JOIN, although this type of join is not supported in MySQL. We can, however, simulate this by doing both a LEFT JOIN and RIGHT JOIN, and combining them with a UNION.
To accomplish this, we have to add a WHERE clause that only includes the records with a NULL item id from the second part of the query. Otherwise, those items with categories will all show twice.
SELECT
    i.id,
    i.name,
    i.category_id,
    c.name AS category_name
FROM items i
LEFT JOIN categories c ON i.category_id = c.id
UNION ALL
SELECT
    i.id,
    i.name,
    i.category_id,
    c.name AS category_name
FROM items i
RIGHT JOIN categories c ON i.category_id = c.id
-- This prevents duplicate items from showing
-- as we only want categories with no items.
WHERE i.id IS NULL;

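+------+--------+-------------+---------------+
| id   | name   | category_id | category_name |
+------+--------+-------------+---------------+
|    1 | Apples |           1 | Produce       |
|    2 | Cheese |           2 | Deli          |
|    3 | Bread  |        NULL | NULL          |
| NULL | NULL   |        NULL | Dairy         |
+------+--------+-------------+---------------+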
This type of OUTER JOIN is represented by this diagram. 
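The role of the WHERE clause in the second half of the UNION ALL is easiest to see by running the query with and without it. The sketch below does that with Python's built-in sqlite3 module. Two assumptions: SQLite stands in for MySQL, and because older SQLite versions lack RIGHT JOIN, the second half is written as the equivalent LEFT JOIN with the tables swapped.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    INSERT INTO categories (id, name) VALUES (1, 'Produce'), (2, 'Deli'), (3, 'Dairy');
    INSERT INTO items (id, name, category_id)
        VALUES (1, 'Apples', 1), (2, 'Cheese', 2), (3, 'Bread', NULL);
""")

COLUMNS = "i.id, i.name, i.category_id, c.name AS category_name"

# First half: all items, matched to categories where possible.
ALL_ITEMS = f"SELECT {COLUMNS} FROM items i LEFT JOIN categories c ON i.category_id = c.id"

# Second half: all categories. `categories LEFT JOIN items` is equivalent to
# `items RIGHT JOIN categories`; it is written this way for older SQLite.
ALL_CATEGORIES = f"SELECT {COLUMNS} FROM categories c LEFT JOIN items i ON i.category_id = c.id"

# Without the filter, matched rows appear once per half.
naive_rows = conn.execute(f"{ALL_ITEMS} UNION ALL {ALL_CATEGORIES}").fetchall()

# With the filter, the second half contributes only categories with no items.
full_rows = conn.execute(
    f"{ALL_ITEMS} UNION ALL {ALL_CATEGORIES} WHERE i.id IS NULL"
).fetchall()

print(len(naive_rows), "rows without the filter")
print(len(full_rows), "rows with the filter")
```

Without the filter, Apples and Cheese each appear twice. With it, every item and every category appears exactly once.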
Showing only unrelated data (WHERE keys are NULL)
It is sometimes helpful to query for only the records that aren’t related. We may want to find only the items that aren’t categorized — perhaps so we can find them to clean them up.
To do this, we can add an additional WHERE clause to a LEFT or RIGHT JOIN.
SELECT
    i.id,
    i.name,
    i.category_id,
    c.name AS category_name
FROM items i
LEFT JOIN categories c ON i.category_id = c.id
WHERE c.id IS NULL;

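+----+-------+-------------+---------------+
| id | name  | category_id | category_name |
+----+-------+-------------+---------------+
|  3 | Bread |        NULL | NULL          |
+----+-------+-------------+---------------+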
This JOIN is represented here. 
To show only the categories without items, we can use a similar RIGHT JOIN with a WHERE clause that only shows records with a NULL item id.
SELECT
    i.id,
    i.name,
    i.category_id,
    c.name AS category_name
FROM items i
RIGHT JOIN categories c ON i.category_id = c.id
WHERE i.id IS NULL;

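+------+------+-------------+---------------+
| id   | name | category_id | category_name |
+------+------+-------------+---------------+
| NULL | NULL |        NULL | Dairy         |
+------+------+-------------+---------------+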
This JOIN is represented here.

Full outer joins with only unrelated data
Finally, if we want to show both the unrelated items and categories, we can use the OUTER JOIN type of query, but look for either the items or categories keys being NULL.
To make this query work, it helps to enclose the bulk of it in parentheses and apply the WHERE clause to the outer query.
SELECT * FROM (
SELECT i.id,
          i.name,
          i.category_id,
          c.name AS category_name
   FROM items i
         LEFT JOIN categories c ON i.category_id = c.id
   UNION ALL
   SELECT i.id,
          i.name,
          i.category_id,
          c.name AS category_name
   FROM items i
         RIGHT JOIN categories c ON i.category_id = c.id
   WHERE i.id IS NULL
) AS all_items_all_categories
WHERE id IS NULL OR category_id IS NULL;

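+------+-------+-------------+---------------+
| id   | name  | category_id | category_name |
+------+-------+-------------+---------------+
|    3 | Bread |        NULL | NULL          |
| NULL | NULL  |        NULL | Dairy         |
+------+-------+-------------+---------------+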
This JOIN of unrelated items is represented here. 
Summary
You should now understand how to use MySQL joins to combine data from multiple tables and how each type of join differs. To summarize:
INNER JOIN or JOIN returns only records with matching keys in both tables.
LEFT JOIN returns all records from the first table, along with any matching records from the second table.
RIGHT JOIN returns all records from the second table, along with any matching records from the first table.
FULL OUTER JOIN returns all records from both tables, even if they don’t have a match in the other table.
WHERE can filter results of a join to only show records with NULL keys.
UNION can combine results of two queries into one result set.
With a good understanding of joins, you are on your way to doing powerful and efficient queries in MySQL. To learn even more about joins, you can check out our overview of MySQL joins video as well as our video on indexing joins in MySQL.]]></content>
        <summary><![CDATA[Learn how and when to use inner joins, outer joins, left joins, and right joins in MySQL.]]></summary>
      </entry>
    
      <entry>
        <title>Indexing JSON in MySQL</title>
        <link href="https://planetscale.com/blog/indexing-json-in-mysql" />
        <id>https://planetscale.com/blog/indexing-json-in-mysql</id>
        <published>2022-10-04T00:03:57.138Z</published>
        <updated>2022-10-04T00:03:57.138Z</updated>
        
        <author>
          <name>Aaron Francis</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[MySQL gave us the JSON data type back in mid-2015 with the release of MySQL 5.7.8. Since then, it has been used as a way to escape rigid column definitions and store JSON documents of all shapes and sizes: audit logs, configuration settings, 3rd party payloads, user-defined fields, and more.
Although MySQL gives us functions for reading and writing JSON data, you’ll quickly discover something that is conspicuously missing: the ability to directly index your JSON columns.
In other databases, the best way to directly index a JSON column is usually through a type of index known as a Generalized Inverted Index, or GIN for short. Since MySQL doesn’t offer GIN indexes, we’re unable to directly index an entire stored JSON document. All is not lost though, because MySQL does give us a way to indirectly index parts of our stored JSON documents.
Depending on the version of MySQL that you're using, you have two options for indexing JSON. In MySQL 5.7 you would have to create an intermediate generated column, but starting in MySQL 8.0.13, you can create a functional index directly.
Let’s start with an example table used for logging various actions taken in an application.
CREATE TABLE `activity_log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `properties` json NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`id`)
)

Into that table we’ll insert JSON documents that have this shape:
{
  "uuid": "e7af5df8-f477-4b9b-b074-ad72fe17f502",
  "request": {
    "email": "little.bobby@tables.com",
    "firstName": "Little",
    "formType": "vehicle-inquiry",
    "lastName": "Bobby",
    "message": "Hello, can you tell me what the specs are for this vehicle?",
    "postcode": "75016",
    "townCity": "Dallas"
  }
}

In our example, we’ll be indexing the email key inside the request object. This will allow our (fictional) users to quickly find forms submitted by specific people.
Let’s take a look at our first option for indexing: generated columns.
Indexing JSON via a generated column
A generated column can be thought of as a calculated, computed, or derived column. It is a column whose value is the result of an expression, rather than direct data input. The expression can contain literal values, built-in functions, or references to other columns. The result of the expression must be scalar and deterministic.
Since we’re trying to index the request.email field in the properties column, our generated column will use the JSON unquoting extraction operator to pluck the value out.
To verify that we’ve formed our expression correctly, we’ll first run a SELECT statement and inspect the results.
mysql> SELECT properties->>"$.request.email" FROM activity_log;
+--------------------------------+
| properties->>"$.request.email" |
+--------------------------------+
| little.bobby@tables.com        |
+--------------------------------+

The ->> operator is a shorthand, unquoting extraction operator, making it equivalent to JSON_UNQUOTE(JSON_EXTRACT(column, path)). We could have written the previous SELECT statement using the longhand and gotten the same result.
mysql> SELECT JSON_UNQUOTE(JSON_EXTRACT(properties, "$.request.email"))
    ->   FROM activity_log;
+-----------------------------------------------------------+
| JSON_UNQUOTE(JSON_EXTRACT(properties, "$.request.email")) |
+-----------------------------------------------------------+
| little.bobby@tables.com                                   |
+-----------------------------------------------------------+

Which method you choose is a matter of personal preference!
Now that we’ve confirmed our expression is valid and accurate, let’s use it to create a generated column.
ALTER TABLE activity_log ADD COLUMN email VARCHAR(255)
  GENERATED ALWAYS as (properties->>"$.request.email");

The first part of the ALTER statement should look very familiar: we’re adding a column named email and defining it as a VARCHAR(255). In the latter half of the statement, we declare that the column is generated and that it should always be equal to the result of the expression properties->>"$.request.email".
We can confirm our column has been added by selecting it as we would any other column.
mysql> SELECT id, email FROM activity_log;
+----+-------------------------+
| id | email                   |
+----+-------------------------+
|  1 | little.bobby@tables.com |
+----+-------------------------+

You’ll see that MySQL is now maintaining this column for us. If we were to update the JSON value, the generated column value would change as well.
Now that we have our generated column in place, we can add an index to it like we would any other column.
ALTER TABLE activity_log ADD INDEX email (email) USING BTREE;

That’s it! You’ve now indexed the request.email key in your JSON properties column. Let’s verify that MySQL would use the index to speed up queries that are filtering on email.
mysql> EXPLAIN SELECT * FROM activity_log WHERE email = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: activity_log
   partitions: NULL
         type: ref
possible_keys: email
          key: email
      key_len: 768
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL

MySQL reports that it plans to use the email index to satisfy this query.
Generated column indexes and the optimizer
MySQL's optimizer is a powerful and mysterious entity. When we give MySQL a command, we’re telling it what we want, not how to get it. Oftentimes, MySQL will take our query and rewrite it slightly, which is a good thing! Tens of thousands of hours across decades have gone into making the optimizer effective and efficient.
When it comes to indexes on generated columns, the optimizer can "see through" different access patterns to ensure the underlying index is being used.
We defined an index on email, which is a generated column based on the expression properties->>"$.request.email". We’ve already proven that the index is used when we query against the email column. What’s more interesting is that the optimizer is smart enough to help us out if we forget to query against the named email column!
In the following query, we don’t access the generated column by name, but instead use the shorthand JSON extraction operator. (Some rows omitted from the EXPLAIN statement for brevity.)
mysql> EXPLAIN SELECT * FROM activity_log
    ->   WHERE properties->>"$.request.email" = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
possible_keys: email
          key: email
      key_len: 768
        [...]: [...]

Even though we didn’t explicitly address the column by name, the optimizer understands that there is an index on a generated column based on that expression and opts to use the index. Thanks optimizer!
We can confirm this is the case for the longhand as well.
mysql> EXPLAIN SELECT * from activity_log WHERE
    ->   JSON_UNQUOTE(
    ->     JSON_EXTRACT(properties, "$.request.email")
    ->   ) = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
possible_keys: email
          key: email
      key_len: 768
        [...]: [...]

Again, the optimizer "reads through" our expression and uses the email index.
Not convinced? Let’s take a peek at what the optimizer is doing by running a SHOW WARNINGS after our previous EXPLAIN statement to see the rewritten query.
mysql> SHOW WARNINGS;
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `activity_log`.`id` AS `id`,`activity_log`.`properties` AS `properties`,`activity_log`.`created_at` AS `created_at`,`activity_log`.`email` AS `email` from `activity_log` where (`activity_log`.`email` = 'little.bobby@tables.com')

If you look closely, you’ll see that the optimizer has rewritten our query and changed the equality comparison to reference the indexed column. This is especially useful if you're unable to control the access pattern because the query is being issued from a 3rd party package in your codebase, or you're unable to change this part of your code for some other reason.
If the underlying expression doesn’t match very closely then the optimizer will not be able to use the index, so be sure to take care when creating your generated column. The MySQL documentation explains the optimizer's use of generated column indexes in further detail.
Functional indexes
Beginning with MySQL 8.0.13, you're able to skip the intermediate step of creating a generated column and create what is called a "functional index." The MySQL documentation calls these functional key parts.
A functional index is an index on an expression rather than a column. Sounds a lot like a generated column, doesn’t it? There’s a reason it sounds similar, and that’s because a functional index is implemented using a hidden generated column! We no longer have to create the generated column, but a generated column is still being created.
There are a few gotchas with functional indexes though, especially when it comes to using them for JSON.
It would be nice to create our JSON index like this:
ALTER TABLE activity_log
  ADD INDEX email ((properties->>"$.request.email")) USING BTREE;

But if you do try that, you get a nasty error:
Query 1 ERROR: Cannot create a functional index on an expression that returns a BLOB or TEXT. Please consider using CAST.

So what’s going on here? In our earlier examples, we were the ones in charge of creating the generated column and we declared it as a VARCHAR(255), which is easily indexable by MySQL.
However, when we use a functional index, MySQL is going to create that column for us based on the data type that it infers. JSON_UNQUOTE returns a LONGTEXT value, which is not able to be indexed.
Fortunately, the error message points us in the right direction: we need to cast our value to a type that is not LONGTEXT. Casting the value to CHAR tells MySQL to infer a VARCHAR data type.
ALTER TABLE activity_log
  ADD INDEX email ((CAST(properties->>"$.request.email" as CHAR(255)))) USING BTREE;

Now that we’ve added the index, we’ll see if it works by running an EXPLAIN.
mysql> EXPLAIN SELECT * FROM activity_log
    ->   WHERE properties->>"$.request.email" = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: activity_log
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

Unfortunately, our index isn’t being considered at all, so we’re not out of the woods yet.
Unless otherwise specified, casting a value to a string sets the collation to utf8mb4_0900_ai_ci. The JSON extraction functions, on the other hand, return a string with a utf8mb4_bin collation. Therein lies our problem! Because the collation is mismatched between the query's expression and the stored index, our new functional index isn’t being used.
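Why does a collation mismatch matter so much? Two collations can disagree about whether the same pair of strings is equal, so an index built under one cannot safely answer a comparison made under the other. The Python sketch below illustrates the idea only. The fold_ai_ci helper is a hypothetical, rough approximation of an accent- and case-insensitive collation, not MySQL's actual utf8mb4_0900_ai_ci rules.

```python
import unicodedata


def equal_bin(a, b):
    """Binary-style comparison: equal only if the code points match exactly."""
    return a == b


def fold_ai_ci(text):
    """Rough stand-in for an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


def equal_ai_ci(a, b):
    """Compare two strings after folding away case and accents."""
    return fold_ai_ci(a) == fold_ai_ci(b)


stored = "little.bobby@tables.com"
probe = "Little.Bobby@Tables.com"

print(equal_bin(stored, probe))    # False
print(equal_ai_ci(stored, probe))  # True
```

The same two strings are different under the binary rule and equal under the insensitive one, which is why the expression in the query and the expression in the index have to agree on collation.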
The final step is to explicitly set the collation of the cast to utf8mb4_bin.
ALTER TABLE activity_log
  ADD INDEX email ((
    CAST(properties->>"$.request.email" as CHAR(255)) COLLATE utf8mb4_bin
  )) USING BTREE;

Rerunning the previous EXPLAIN, we can see that we’re finally in a position to use the functional index.
mysql> EXPLAIN SELECT * FROM activity_log
    ->   WHERE properties->>"$.request.email" = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
possible_keys: email
          key: email
      key_len: 1023
        [...]: [...]

Clearly functional indexes come with a few pitfalls, some of which are explicit and easy to debug, and some that require a little bit more digging into the documentation.
Remember that functional indexes use hidden generated columns under the hood. If you prefer to take control of the generated column yourself (even in MySQL 8.0.13 and later) that’s a perfectly reasonable approach!
While direct JSON indexing may not be available in MySQL, indirect indexing of specific keys can cover a majority of use cases.
Don’t just stop with JSON, either! You can use generated columns and functional indexes across all types of common, hard to index patterns.
Go forth and index with confidence.]]></content>
        <summary><![CDATA[Learn how to index JSON in MySQL with generated columns and functional indexes.]]></summary>
      </entry>
    
      <entry>
        <title>MySQL data types: VARCHAR and CHAR</title>
        <link href="https://planetscale.com/blog/mysql-data-types-varchar-and-char" />
        <id>https://planetscale.com/blog/mysql-data-types-varchar-and-char</id>
        <published>2022-09-30T15:00:00.000Z</published>
        <updated>2022-09-30T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Overview
Ever find yourself building a database only to start questioning what data types you should use for a specific column? In this entry of the MySQL data types series, we’ll explore the various ways you can save strings and text to a database to help demystify the options you have as a developer, starting with VARCHAR and CHAR.
VARCHAR vs CHAR
VARCHAR is probably the most widely used data type for strings. It stores a string of variable length up to a maximum of 65,535 characters. When creating a VARCHAR field, you’ll be able to specify the maximum number of characters the field will accept using the VARCHAR(n) format, where n is the maximum number of characters to be stored. Because it is variable length, it will only allocate enough disk space to store the contents of the string, not the full declared length of the column.
VARCHAR also allocates a little bit of extra space with each value stored. Depending on the space required to store the data, 1 or 2 bytes of overhead will be allocated. If values require no more than 255 bytes, a 1 byte prefix will be added, otherwise a 2 byte prefix will be used. The exact space required to store a value depends on the character set used (more on that in a bit).
CHAR is another method to store strings, but it has a maximum length of 255 and is fixed length. As with VARCHAR, you may optionally set the maximum number of characters in a CHAR field with the CHAR(n) format. If not specified, n defaults to 1. The values stored in a CHAR column are right-padded with empty spaces, so it will always store n characters regardless of the string being saved. In certain circumstances, this can actually increase performance of the database.
Factoring in the charset
While most programming languages use characters from the English language, humans across the world write and read using different types of characters. This can be something simple like Ñ in Spanish, or something very different like データベース in Japanese. To address this, MySQL has different character sets (or charsets) to address the symbols used in different languages. Character sets affect the way text is stored in the database, but also affect the amount of storage space allocated when saving data.
For example, when using the default charset of utf8mb4, MySQL will allow for up to 4 bytes per character stored. Factoring this in, along with a maximum row size of 65,535 bytes across ALL columns, you'd realistically only be able to create a VARCHAR column with a maximum length of 16,383 characters due to the storage requirements for each character.
Visualizing the differences
When saving data to a CHAR field, one side effect is that any trailing spaces in the string are effectively lost when the value is saved. In fact, MySQL will not even return trailing spaces when you query data from a CHAR column because it has to assume that the extra spaces are just padding.
To demonstrate this, let’s create a table with two columns, one VARCHAR(20) and one CHAR(20). We’ll then insert some data with five spaces at the end of it to see how it’s stored.
CREATE TABLE strings(
	id INT PRIMARY KEY AUTO_INCREMENT,
	variable VARCHAR(20),
	fixed CHAR(20)
);
INSERT INTO strings (variable, fixed) VALUES ("Drifter     ", "Drifter     ");

Now if I run the following SELECT statement, it appears that the returned data is the same.
SELECT * FROM strings;


However, if I use the CHAR_LENGTH function to calculate the number of characters used in each field, you’ll notice the data stored in the VARCHAR field (represented by varchar_data_length) is 12 characters long, which includes the 5 extra space characters at the end, whereas the CHAR field only shows 7. This is because MySQL stores the whitespace at the end of the VARCHAR value, but it assumes the extra space at the end of the CHAR value is the padding that was appended based on the data type.
SELECT CHAR_LENGTH(variable) AS varchar_data_length, CHAR_LENGTH(fixed) AS char_data_length FROM strings;
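The behavior can be mimicked in a few lines of Python. This is only a model of the pad-on-write, strip-on-read behavior described above, with hypothetical helper names, and not how MySQL is implemented internally.

```python
CHAR_WIDTH = 20  # matches the CHAR(20) column in the example table


def store_char(value, width=CHAR_WIDTH):
    """Right-pad with spaces to the fixed width, as a CHAR column does on write."""
    return value.ljust(width)


def read_char(stored):
    """Drop trailing spaces on read, since they cannot be told apart from padding."""
    return stored.rstrip(" ")


original = "Drifter" + " " * 5  # "Drifter" followed by five spaces

varchar_value = original                      # VARCHAR keeps the value as given
char_value = read_char(store_char(original))  # CHAR pads, then strips on read

print(len(varchar_value))  # 12
print(len(char_value))     # 7
```

Once the value has been padded, the five original spaces are indistinguishable from the padding, so they are stripped along with it.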


As stated earlier, VARCHAR values have an extra overhead when the data is written to disk as well. This means that if you are storing the string “Spider” which is 6 characters in length, and you are storing it in both a VARCHAR(6) and a CHAR(6) column, the VARCHAR value will use up to 25 bytes of disk space (budgeting the utf8mb4 maximum of 4 bytes per character, plus 1 byte of overhead), whereas the CHAR value will use up to 24 bytes.
However, if you store “Eido” in those same columns, the same arithmetic gives only 17 bytes for the VARCHAR (4 characters at 4 bytes each, plus 1 byte of overhead), while the CHAR still needs 24 bytes. Since the CHAR data type is fixed-length, the value is right-padded with 2 empty spaces, for a total of 6 characters.
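+----------+-------------------------+-----------------------+----------------------+--------------------+
| Value    | VARCHAR(6) Stored value | VARCHAR(6) Space used | CHAR(6) Stored value | CHAR(6) Space used |
+----------+-------------------------+-----------------------+----------------------+--------------------+
| "Spider" | "Spider"                | 25 bytes              | "Spider"             | 24 bytes           |
| "Eido"   | "Eido"                  | 17 bytes              | "Eido  "             | 24 bytes           |
| "Eido  " | "Eido  "                | 25 bytes              | "Eido  "             | 24 bytes           |
+----------+-------------------------+-----------------------+----------------------+--------------------+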
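The figures in the table can be reproduced with a little arithmetic. The Python sketch below assumes, as the table does, a budget of 4 bytes per character for utf8mb4. The function names are made up for illustration, and the prefix rule is a simplified reading of the 1-or-2-byte overhead described earlier. For comparison, it also prints the size of each value as actually encoded in UTF-8, where plain ASCII characters take 1 byte each.

```python
MAX_BYTES_PER_CHAR = 4  # largest encoding of a single utf8mb4 character


def length_prefix(max_chars):
    """VARCHAR length overhead: 1 byte if the column fits in 255 bytes, else 2."""
    return 1 if max_chars * MAX_BYTES_PER_CHAR <= 255 else 2


def varchar_budget(value, max_chars):
    """Upper bound for a VARCHAR value: 4 bytes per character plus the prefix."""
    return len(value) * MAX_BYTES_PER_CHAR + length_prefix(max_chars)


def char_budget(max_chars):
    """Upper bound for a CHAR value: the full width at 4 bytes per character."""
    return max_chars * MAX_BYTES_PER_CHAR


def varchar_encoded(value, max_chars):
    """Size of this particular value once encoded as UTF-8, plus the prefix."""
    return len(value.encode("utf-8")) + length_prefix(max_chars)


for value in ("Spider", "Eido", "Eido  "):
    print(
        repr(value),
        varchar_budget(value, 6),   # VARCHAR(6), 4-bytes-per-character budget
        char_budget(6),             # CHAR(6), 4-bytes-per-character budget
        varchar_encoded(value, 6),  # VARCHAR(6), actual UTF-8 bytes + prefix
    )

# Longest VARCHAR that fits in a 65,535-byte row at 4 bytes per character.
print(65535 // MAX_BYTES_PER_CHAR)
```

The first two numbers on each line match the table. The third shows how much smaller the encoded value is when the text is plain ASCII.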
When to use VARCHAR vs CHAR
Now that you understand the differences between VARCHAR and CHAR, here are a few tips on deciding which data type fits your application best:
Use VARCHAR if:
You need to store a string with more than 255 characters.
You find yourself in a rare scenario where you do need to preserve trailing spaces.
Use CHAR if:
You are at or below 255 characters, and you always know the length of the string.
A fixed-length serial number would be a good example of when CHAR is useful.
Further learning
If you'd like to learn more about data types in MySQL, we have an article on the INT data type and one on the JSON data type that you may find useful.
We also have short videos on the following data types:
Integers
Decimals
Strings
Binary Strings
Long Strings
Enums
Dates
JSON]]></content>
        <summary><![CDATA[In this entry of the series we explore using VARCHAR and CHAR data types in your database and give some pointers on which type is best to use and when.]]></summary>
      </entry>
    
      <entry>
        <title>Debugging database errors with Insights</title>
        <link href="https://planetscale.com/blog/debugging-database-errors-with-insights" />
        <id>https://planetscale.com/blog/debugging-database-errors-with-insights</id>
        <published>2022-09-27T00:03:57.138Z</published>
        <updated>2022-09-27T00:03:57.138Z</updated>
        
        <author>
          <name>Rafer Hazen</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[As much as we want to avoid them, all applications eventually encounter errors interacting with their database. What’s important is that you have the ability to easily view and understand these errors so you can quickly solve the underlying issue.
To help with that, we’ve introduced a new error tracking feature to Insights that will help you get to the bottom of things quickly. To show this new feature off in action, let’s walk through an example of how the PlanetScale team used this internally to troubleshoot an issue.
Investigating an error
After releasing Insights error tracking for PlanetScale staff, we noticed occasional upticks in the new "Query errors" graph on our main production database.

In the errors tab, we saw a fair number of AlreadyExists errors on the database_branch_password table. These errors weren’t alarmingly frequent (a few dozen to a few hundred per day), but we wanted to dig in to ensure that we weren’t causing our users frustration with failed requests.

Our first step is to click on the error to see a list of recent occurrences.

This page shows the full error message and lists individual occurrences of the errors with the timestamp, normalized SQL query, and any associated tags. This page has a few interesting things to tell us:
The error is coming from the DatabaseBranchPasswords#create action
Occurrences of these errors come in small batches with nearly identical timestamps
The actor tag for each batch of errors is always the same
The index with duplicate entries is defined as follows in our Rails schema file:
t.index ["database_branch_id", "display_name"], name: "idx_branch_id_display_name", unique: true

Given the error message and the index definition, we can infer that our application is attempting to insert multiple rows with the same values for the database_branch_id and display_name columns, and MySQL is rejecting the insert. To debug this from the application side, the tags show us that the create action of DatabaseBranchPasswordsController is the place to start. This action ultimately ends with a call to the create method of the DatabaseBranchPassword ActiveRecord model. In DatabaseBranchPassword we have the following uniqueness validation, which should return errors on the model if the uniqueness constraint in question is violated:class DatabaseBranchPassword < ApplicationRecord
  ...
  validates :display_name, uniqueness: { scope: [:database_branch_id] }
  ...
end

Next, we verified in a development environment that the uniqueness validation seemed to be working correctly: when we try to create two identical DatabaseBranchPassword rows, we get an ActiveRecord error showing that the name has already been taken. So, with the validation working as expected, what could be going on here?
This is where another piece of information we learned from the errors page comes into play: the queries that attempt to insert duplicate rows came in at nearly the same time, leading us to suspect that a race condition could be involved. Could it be possible that two attempts to create a row with the same values both pass the Rails uniqueness validation and get sent to the database? Turns out: yes!
The uniqueness validation queries the database to see if the display_name has been taken, and if it hasn’t, ActiveRecord attempts to persist the row. If two (or more) requests to our password create API are initiated at nearly the same time, it’s possible that two separate application threads could both query the database at the same time, before either thread had created the record, and then proceed as if there was no issue. The database, as the final arbiter of uniqueness, would then only allow one of these queries to succeed and the other would receive the error we see in Insights.
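The interleaving described above can be replayed deterministically. The following JavaScript sketch is a language-neutral model, not our Rails code: UniqueIndex is a hypothetical in-memory stand-in for the unique index, and the key 'branch-1:deploy-key' is made up for illustration.

```javascript
// Hypothetical in-memory stand-in for the table's unique index.
class UniqueIndex {
  constructor() {
    this.keys = new Set()
  }
  exists(key) {
    return this.keys.has(key)
  }
  insert(key) {
    if (this.keys.has(key)) {
      throw new Error('AlreadyExists: duplicate entry ' + key)
    }
    this.keys.add(key)
  }
}

const index = new UniqueIndex()
const key = 'branch-1:deploy-key'

// Step 1: both requests run the application-level uniqueness check
// before either one has written anything, so both checks pass.
const requestAPassed = !index.exists(key)
const requestBPassed = !index.exists(key)

// Step 2: both requests proceed to insert; the index lets only one win.
const outcomes = []
for (const [name, passed] of [['A', requestAPassed], ['B', requestBPassed]]) {
  if (!passed) {
    outcomes.push(name + ': validation failed')
    continue
  }
  try {
    index.insert(key)
    outcomes.push(name + ': inserted')
  } catch (err) {
    outcomes.push(name + ': ' + err.message)
  }
}

console.log(outcomes)
```

Request B never sees a validation error because its check ran before request A's write; the failure only surfaces at the database, which is the error that shows up in Insights.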
Now that we know this error occurs when multiple nearly-simultaneous API requests are issued, our next step was to determine who or what was issuing these requests. Because we tag all queries from authenticated requests with information about the actor, it was easy to look them up and determine that an internal tool was issuing multiple password create requests in parallel. Our solution was simply to modify the script to avoid that behavior. Because an interactive user is unlikely to issue password creates quickly enough to trigger this behavior, we were content to call this issue solved.
Tags
In addition to including tags in Insights errors, we’ve also improved how tags work for Insights in general. Most notably, you can now search both queries and errors based on tags. Searching queries by tag has one caveat: to associate tags, a query pattern must have had at least one query that took more than 1 second, read more than 10k rows, or resulted in an error. We find that most of the queries we want to search for have met these conditions.
Queries can be filtered by tag with the following syntax: tag:tag_name:tag_value.

To show all queries that have a particular tag key present, regardless of the value, use tag:tag_name.
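If you build these search strings programmatically (for example, to generate links for a runbook), a tiny helper is enough. This is a sketch only: tagFilter is a hypothetical function, not part of any PlanetScale tooling, and the actor value 'deploy-bot' is made up.

```javascript
// Hypothetical helper that builds a search string in the syntax above.
// With a value it matches one tag value; without, any value for that tag.
function tagFilter(name, value) {
  return value === undefined ? `tag:${name}` : `tag:${name}:${value}`
}

console.log(tagFilter('actor', 'deploy-bot'))
console.log(tagFilter('actor'))
```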

Try it out now!
Insights errors and improved tag searching functionality are available for all plans now. Try it out and let us know what you think!
For more information, check out the Query Insights documentation.]]></content>
        <summary><![CDATA[Learn about the new PlanetScale Insights database errors feature.]]></summary>
      </entry>
    
      <entry>
        <title>The MySQL JSON data type</title>
        <link href="https://planetscale.com/blog/the-mysql-json-data-type" />
        <id>https://planetscale.com/blog/the-mysql-json-data-type</id>
        <published>2022-09-23T15:00:00.000Z</published>
        <updated>2022-09-23T15:00:00.000Z</updated>
        
        <author>
          <name>Mike Stojan</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Overview
JavaScript Object Notation (JSON) is a lightweight, text-based data format, similar to YAML or XML, that simplifies data exchange. It was invented by Douglas Crockford in the early 2000s and became increasingly popular with the rise of document-based (also called NoSQL) databases.
JSON supports strings, numbers, booleans, objects, and arrays as well as null values. A simple JSON example containing key-value pairs, an object "bandMembers" and an array "songs" would look like this:{
  "artist": "Starlord Band",
  "bandMembers": {
    "vocals": "Steve Szczepkowski",
    "guitar": "Yohann Boudreault",
    "bass": "Yannick T.",
    "drums": "Vince T."
  },
  "bandMembersCount": 4,
  "album": "Space Rider",
  "releaseDate": "2021-10-25",
  "songs": [
    "Zero to Hero",
    "Space Riders with No Names",
    "Ghost",
    "Bit of Good (Bit of Bad)",
    "Watch me shine",
    "We’re Here",
    "The Darkness inside",
    "No Guts No Glory",
    "All for One",
    "Solar Skies"
  ],
  "songsCount": 10
}

MySQL introduced rudimentary support for the JSON data type in version 5.7.8 in mid-2015 and has been adding improvements and new features ever since. Seven years later, MySQL supports numerous SQL functions for working with JSON documents, provides automatic content validation, allows partial in-place updates, and uses a binary storage format for increased performance.
When to use JSON
Relational databases follow a predetermined structure and put emphasis on data cohesion and integrity. To achieve this, data types, formats, and sizes are all rigorously enforced by means of a schema.
The JSON data type is a bit of an anti-pattern to the rigorous nature of such a schema. It allows you to break out of it and gain flexibility when you need it. And it proves useful as long as you're aware of the trade-offs described in the next section.
Some examples of when it may be beneficial to store data as a JSON document are:
Log output written by an application or a server
A REST API response you want to store
Storing configuration data
A set of entities with variable attributes
You can also use JSON documents in your relational database design to break up complex relations spanning across multiple tables. This process is called denormalization, which is another relational database anti-pattern. In certain cases, however, it can result in performance improvements depending on your use case and application design.
Caveats
The flexibility provided by the JSON data type comes with a few caveats you will need to be aware of.
Most notably, you will need to factor in that JSON documents often require more storage capacity. In MySQL, their storage footprint is similar to the LONGBLOB or LONGTEXT data type. There is an overhead, though, due to the binary encoding and the added metadata and dictionaries which exist to speed up database reads. A good rule of thumb is that a string stored in JSON uses roughly 4 to 10 bytes of additional storage compared to a LONGBLOB or LONGTEXT column.
If you want to optimize your database schema towards storage efficiency, it is best to go with MySQL's more traditional data types (CHAR, VARCHAR, INT, and the like), as they are all more storage-efficient than JSON can likely ever be.
Another caveat to be aware of is the performance impact. Similar to other binary formats, JSON documents cannot be indexed directly. This, and the variable amount of data you can store in a JSON document, means that querying a JSON column often uses more buffer space and returns larger result sets, leading to more data exchange.
While JSON documents cannot be indexed directly in MySQL, they can be indexed indirectly. Learn how in Indexing JSON in MySQL.
While a JSON document stored in MySQL can be up to 1 GB, in theory, it is recommended to keep JSON documents to only a few MB in size. On PlanetScale, we support JSON documents up to 67 MB.
JSON functions
MySQL comes with a robust set of JSON functions enabling you to create, update, read, or validate your JSON documents. PlanetScale supports all of the JSON functions except JSON_TABLE.
Common operations
Let’s walk through a few examples together.
First, we create a table with an INTEGER and a JSON column.CREATE TABLE songs (id int AUTO_INCREMENT PRIMARY KEY NOT NULL, songs JSON);

An empty table needs data, so let’s use JSON_ARRAY to add some. Naming the songs column explicitly lets MySQL generate the id for us.
INSERT INTO songs (songs) VALUES (JSON_ARRAY('Zero to Hero', 'Space Riders with No Names', 'Ghost', 'Bit of Good (Bit of Bad)', 'Watch me shine', 'We\'re Here', 'The Darkness inside', 'No Guts No Glory', 'All for One', 'Solar Skies'));

How do we know this is an array? Well, we can get its type by using JSON_TYPE.SELECT JSON_TYPE(songs) FROM songs;
+------------------+
| json_type(songs) |
+------------------+
| ARRAY            |
+------------------+

If we want to extract an item from the array, we can do so with JSON_EXTRACT. JSON paths are zero-based, so in the example below, '$[3]' extracts the fourth element from the array.
blog-mysql-json/main> SELECT JSON_EXTRACT(songs, '$[3]') FROM songs;
+-----------------------------+
| json_extract(songs, '$[3]') |
+-----------------------------+
| "Bit of Good (Bit of Bad)"  |
+-----------------------------+

We can also use ->, which is the operator equivalent for JSON_EXTRACT.
blog-mysql-json/main> SELECT songs->'$[3]' FROM songs;
+----------------------------+
| songs -> '$[3]'            |
+----------------------------+
| "Bit of Good (Bit of Bad)" |
+----------------------------+

If we need the unquoted result, we can use ->>, which is short for JSON_UNQUOTE(JSON_EXTRACT()).
blog-mysql-json/main> SELECT songs->>'$[3]' FROM songs;
+--------------------------+
| songs ->> '$[3]'         |
+--------------------------+
| Bit of Good (Bit of Bad) |
+--------------------------+

If we need to add data to the JSON array, we can use JSON_ARRAY_APPEND or JSON_ARRAY_INSERT to update it.UPDATE songs SET songs = JSON_ARRAY_APPEND(songs, '$', "One last song");
UPDATE songs SET songs = JSON_ARRAY_INSERT(songs, '$[0]', "First song");
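
If you think in application code rather than SQL, the operations in this section map closely onto plain array handling. The JavaScript sketch below is an analogy only, using the first three songs from our array; it does not talk to MySQL.

```javascript
const songs = ['Zero to Hero', 'Space Riders with No Names', 'Ghost']

// JSON paths are zero-based, like JavaScript arrays: '$[0]' is the first element.
const first = songs[0]

// songs->'$[0]' returns a JSON value, so a string keeps its double quotes.
const asJson = JSON.stringify(first)

// songs->>'$[0]' unquotes the result and returns the bare string.
const unquoted = first

// JSON_ARRAY_APPEND(songs, '$', ...) adds a value to the end of the array.
const appended = [...songs, 'One last song']

// JSON_ARRAY_INSERT(songs, '$[0]', ...) inserts at that index and
// shifts the existing values to the right.
const inserted = ['First song', ...appended]

console.log(asJson, unquoted)
console.log(inserted)
```

The quoted versus unquoted distinction matters most when you compare the result to a string in your application: only the unquoted form matches the raw value.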

For more information on how to use all the different JSON functions, please see MySQL's documentation for the JSON data type and the JSON Function reference.
Further learning
If you'd like to learn more about data types in MySQL, we have an article on the INT data type and one on the VARCHAR data type that you may find useful.
We also have short videos on the following data types:
Integers
Decimals
Strings
Binary Strings
Long Strings
Enums
Dates
JSON]]></content>
        <summary><![CDATA[Learn what the MySQL JSON data type is, when to use it, and some caveats to using JSON documents in relational databases.]]></summary>
      </entry>
    
      <entry>
        <title>Using the PlanetScale serverless driver with AWS Lambda functions</title>
        <link href="https://planetscale.com/blog/using-the-planetscale-serverless-driver-with-aws-lambda-functions" />
        <id>https://planetscale.com/blog/using-the-planetscale-serverless-driver-with-aws-lambda-functions</id>
        <published>2022-09-21T14:00:00.000Z</published>
        <updated>2022-09-21T14:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Overview
We recently released the PlanetScale serverless driver for JavaScript to allow developers to connect to their databases over HTTP rather than over a direct MySQL connection, which some cloud providers block. This guide will walk you through the most common use cases of the driver while building a serverless API on AWS using a Lambda function and API Gateway.
To follow along, you’ll need:
A PlanetScale account.
An AWS account.
Node.js installed.
VS Code and the VS Code REST Client plugin installed.
Please note that building on top of AWS costs real money. Some of the costs may be covered on the AWS free tier.
Set up the database on PlanetScale
Start in PlanetScale by creating a new database. I’ll name mine travel_api.

Now let’s add some data. Click on "Branches" > "main" to access the main branch.

Now click on "Console" to access the web console of the main branch.

Run the following two SQL snippets to create a table and add a few records to it.CREATE TABLE hotels(
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL,
  address VARCHAR(50) NOT NULL,
  stars FLOAT(2) UNSIGNED
);
INSERT INTO hotels (name, address, stars) VALUES
  ('Hotel California', '1967 Can Never Leave Ln, San Francisco CA, 94016', 7.6),
  ('The Galt House', '140 N Fourth St, Louisville, KY 40202', 8.0);

The serverless driver is currently in beta and needs to be enabled on the database level. To do this, click on the "Settings" tab > "Beta features", and click "Enroll" next to the PlanetScale serverless driver for JavaScript line. Once this feature is enabled, every new password you create will use a different hostname, one that points to endpoints that support accessing your database over HTTP.

Now head back to the "Overview" tab and click "Connect".

From the Connect modal, select "@planetscale/database" from the dropdown. Note the text in the .env tab as we’ll need to configure these as environment variables in AWS.

Set up the Lambda function
Start by creating an empty folder on your computer and opening VS Code. Open the integrated terminal and run the following command to initialize the project & install the necessary packages:npm init -y
npm install @planetscale/database node-fetch

Open the package.json file and add a new entry to the file named “type” and give it a value of “module”, directly below the “main” entry. JSON does not allow comments, so the file should contain only the entries shown here.
{
  "name": "serverless-driver-aws-demo",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@planetscale/database": "^1.3.0",
    "node-fetch": "^3.2.10"
  }
}

Create a file called index.js and add the following code to it.import { Client } from '@planetscale/database'
import fetch from 'node-fetch'

const db = new Client({
  fetch,
  host: process.env.DATABASE_HOST,
  username: process.env.DATABASE_USERNAME,
  password: process.env.DATABASE_PASSWORD
})

export async function handler(event) {
  const conn = db.connection()
  const results = await conn.execute('SELECT * FROM hotels')
  console.log(results)
}

Now we need to get the code into an AWS Lambda function. Log into the AWS console, search for “Lambda”, and select it from the list.

Click "Create function".

Give the function a name and make sure "Node.js 16.x" is selected under Runtime.

Once the function has been created, we need to upload a zipped version of the code we wrote. Zip up the contents of the folder, then in AWS, select "Upload from" > ".zip file".

Click the "Upload" button from the modal, select the zipped folder you created, and click "Save".

Next, select "Configuration" > "Environment variables", and click "Edit" in the main section of the window to add environment variables.

Click "Add environment variable" three times to get three entries and populate the fields using the environment variables gathered from the Connect modal in PlanetScale. Click "Save" once you’ve added them.

Now head back to the "Code" tab and click "Test".

A modal will appear called Configure test event. Populate the "Event name" field with any arbitrary string (I’ll use “Test”), scroll to the bottom, and click "Save".

Now click "Test" again and it will run the function. You should see the output of the results object in a tab of the editor.

Build an API with API Gateway
Now that you’ve seen how to use the serverless driver for JavaScript in the code, let’s explore the other common query types by re-building the function to support API Gateway, and mapping some of the HTTP methods to those queries like so:
| HTTP Method Name | Query Type |
| --- | --- |
| get | SELECT |
| post | INSERT |
| put | UPDATE |
| delete | DELETE |
In the following code sample, we’ve pulled out the logic to run the SELECT statement from the previous section into the get() function. We’re also using a switch statement on event.requestContext.http.method to map the request to a different function depending on that HTTP method. Finally, we also added a method to handle a post request so we can add data to the database.
Update index.js to match the following code, zip up the contents once again, and upload them into Lambda using the process defined earlier:import { Client } from '@planetscale/database'
import fetch from 'node-fetch'

const db = new Client({
  fetch,
  host: process.env.DATABASE_HOST,
  username: process.env.DATABASE_USERNAME,
  password: process.env.DATABASE_PASSWORD
})

export async function handler(event) {
  const conn = db.connection()

  switch (event.requestContext.http.method) {
    case 'GET':
      return await get(conn, event)
    case 'POST':
      return await post(conn, event)
    default:
      return {
        statusCode: 404
      }
  }
}
async function get(conn, event) {
  const results = await conn.execute('SELECT * FROM hotels')

  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(results.rows)
  }
}

async function post(conn, event) {
  const { name, address, stars } = JSON.parse(event.body)

  const res = await conn.execute('INSERT INTO hotels (name, address, stars) VALUES (:name, :address, :stars)', {
    name,
    address,
    stars
  })

  if (res.error) {
    return {
      statusCode: 500,
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(res.error)
    }
  }

  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      id: Number(res.insertId)
    })
  }
}
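
The post function above trusts the request body as-is. If you want to reject malformed input before it reaches the database, a small validation step could look like the sketch below. parseHotel is a hypothetical helper that is not part of the driver or of AWS; the length limits mirror the VARCHAR(50) columns we created earlier, and the check on isBase64Encoded assumes the HTTP API event format used in this guide.

```javascript
// Hypothetical helper: turns the raw API Gateway body into validated fields.
function parseHotel(event) {
  if (!event.body) {
    return { ok: false, error: 'Request body is required' }
  }

  // HTTP API events flag base64-encoded bodies with isBase64Encoded.
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, 'base64').toString('utf8')
    : event.body

  let data
  try {
    data = JSON.parse(raw)
  } catch {
    return { ok: false, error: 'Body must be valid JSON' }
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return { ok: false, error: 'Body must be a JSON object' }
  }

  const { name, address, stars } = data
  // Both text columns are VARCHAR(50) NOT NULL in the hotels table.
  for (const [field, value] of [['name', name], ['address', address]]) {
    if (typeof value !== 'string' || value.length === 0 || [...value].length > 50) {
      return { ok: false, error: field + ' must be a string of 1 to 50 characters' }
    }
  }
  // stars is nullable and unsigned.
  if (stars !== undefined && stars !== null && (typeof stars !== 'number' || stars < 0)) {
    return { ok: false, error: 'stars must be a non-negative number' }
  }

  return { ok: true, value: { name, address, stars: stars ?? null } }
}

console.log(parseHotel({ body: '{"name":"Orka Sunlife Resort","address":"Ölüdeniz Cd.","stars":4.2}' }))
console.log(parseHotel({ body: '{not json' }))
```

In the handler you would call parseHotel(event) first and return a 400 status code with the error message when ok is false, so that only well-formed rows reach the INSERT statement.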

Now head into the AWS console and find “API Gateway” using the global search.

Click on "Create API" to start the process of building a new instance of API Gateway for the Lambda function we created.

To create an HTTP API, click the "Build" button in that section.

Click on "Add integration". Then select Lambda as the integration type, and select the Lambda you created in the previous section. Give your API a name as well and click "Next".

Under Configure routes, change the Resource path to be /hotels and click "Next".

Nothing needs to be changed in the Define stages step, so click "Next".

Finally, click "Create" to complete the process.

Now grab the Invoke URL from the API you just created. We’ll use this to build some simple tests within VS Code.

Back in VS Code, create a new file in the root of your directory called tests.http and populate it with the following. Make sure to replace <YOUR_INVOKE_URL> with what you pulled from API Gateway.@hostname = <YOUR_INVOKE_URL>

### Fetch hotels
get {{hostname}}/hotels

### Create hotel
post {{hostname}}/hotels
Content-Type: application/json

{
  "name": "Orka Sunlife Resort",
  "address": "Güzgülü Mevkii, Ölüdeniz Cd.",
  "stars": 4.2
}

The VS Code REST Client plugin should recognize this file and display a small link with "Send Request" above each defined request method.

Click the "Send Request" link above the get method and you should receive an array of hotels in a second window pane that will be created automatically.

Now test the post method by clicking "Send Request" above that one. You should receive an id field to reflect the ID of the inserted record in PlanetScale.

Optionally you can also check the database in PlanetScale using the console to run the following script:SELECT * FROM hotels;

This should display the newly created hotel along with the original two added earlier.

Now let’s get the put and delete methods working. Update the handler function in the code to reflect the following. Note that the switch statement has been updated to handle those methods.export async function handler(event) {
  const conn = db.connection()

  switch (event.requestContext.http.method) {
    case 'GET':
      return await get(conn, event)
    case 'POST':
      return await post(conn, event)
    case 'PUT':
      return await put(conn, event)
    case 'DELETE':
      return await del(conn, event)
    default:
      return {
        statusCode: 404
      }
  }
}

At the end of the file, add the put and del JavaScript methods (we have to use del since delete is a keyword in the JavaScript language). Zip and re-upload the code into AWS after this has been done.async function put(conn, event) {
  const { id } = event.pathParameters
  const { name, address, stars } = JSON.parse(event.body)

  const res = await conn.execute('UPDATE hotels SET name=:name, address=:address, stars=:stars WHERE id=:id', {
    name,
    address,
    stars,
    id
  })

  if (res.error) {
    return {
      statusCode: 500,
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(res.error)
    }
  }

  return {
    statusCode: 200
  }
}

async function del(conn, event) {
  const { id } = event.pathParameters

  const res = await conn.execute('DELETE FROM hotels WHERE id=:id', {
    id
  })

  if (res.error) {
    return {
      statusCode: 500,
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(res.error)
    }
  }

  return {
    statusCode: 200
  }
}
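
Path parameters arrive as strings, so the id used by put and del is whatever text appears in the URL. A minimal sketch of checking it before running the query follows; parseId is a hypothetical helper, and the upper bound matches the INT UNSIGNED id column we defined earlier.

```javascript
// Largest value that fits in a MySQL INT UNSIGNED column.
const MAX_UNSIGNED_INT = 4294967295

// Hypothetical helper: returns a positive integer id, or null when the
// path parameter is missing or is not a plain decimal number.
function parseId(event) {
  const raw = event.pathParameters && event.pathParameters.id
  if (typeof raw !== 'string' || !/^[1-9]\d*$/.test(raw)) {
    return null
  }
  const id = Number(raw)
  return id <= MAX_UNSIGNED_INT ? id : null
}

console.log(parseId({ pathParameters: { id: '3' } }))
console.log(parseId({ pathParameters: { id: 'abc' } }))
```

When parseId returns null, the handler can respond with a 400 status code instead of sending a query that can never match a row.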

Since typically put and delete methods are used on individual records, they are often accompanied by a record ID in the URL. We need to add an API route in API Gateway to handle the URL pattern /hotels/{id}. Navigate to your API in API Gateway again, select "Routes" from the left nav, and click "Create".

In the route field, add "/hotels/{id}" and click "Create".

Select the new route from the list and click "Attach integration".

Select your Lambda function from the list and click "Attach integration" again.

Now head back to the tests.http file in VS Code and add the following two requests to the file. Notice the JSON under the put request has each field modified just a bit. An ID of 3 is also at the end of the URL, which is how the Lambda code identifies which record it should update.### Update hotel
put {{hostname}}/hotels/3
Content-Type: application/json

{
  "name": "Orka Sunlife Resort Aqua",
  "address": "Güzgülü Mevkii, Ölüdeniz Cd. Turkey",
  "stars": 4.3
}

### Delete hotel
delete {{hostname}}/hotels/3

Run the put request and it simply returns an OK status, but if you run the get request again, you’ll see that the third entry in the array reflects the updated values we sent in.

Finally, run the delete request. Again, it returns an OK status. Run the get again and that third record is removed.

For more information on how to use the PlanetScale serverless driver for JavaScript, refer to our documentation portal where we have a detailed overview of when you should use it, as well as an example built with Node and Express that you can run directly on your workstation.]]></content>
        <summary><![CDATA[Learn how to use the PlanetScale serverless driver by creating a serverless API in AWS with JavaScript.]]></summary>
      </entry>
    
      <entry>
        <title>Declarative MySQL schemas with Atlas CLI</title>
        <link href="https://planetscale.com/blog/declarative-mysql-schemas-with-atlas-cli" />
        <id>https://planetscale.com/blog/declarative-mysql-schemas-with-atlas-cli</id>
        <published>2022-09-16T14:00:00.000Z</published>
        <updated>2022-09-16T14:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Overview
One of the best things the DevOps movement has ushered in is the concept of Infrastructure as Code. IaC lets you define your infrastructure in specially formatted files, and allows you to use automation tools to create or modify your infrastructure based on those files. But did you know that you can also manage your database schemas with a similar approach?
Atlas CLI is a command line tool that helps manage the structure of your database by keeping a representation of the schema in a file. It can be used by itself to manage your schema changes, or as part of a CI/CD pipeline to automate the process of updating your schema based on the definition file. In this article, we’ll cover the basics of using Atlas CLI to generate a schema definition file, as well as updating the schema of a PlanetScale database using the tool.
To follow along, you should have the following:
A PlanetScale account.
The PlanetScale CLI installed and configured.
The Atlas CLI installed and configured.
Set up the database
Start by creating a new database in PlanetScale using the CLI.pscale database create hotels_db

Now create a password to use to connect to the new database.pscale password create hotels_db main <YOUR_PASSWORD_NAME>

Giving your password a name lets you identify the credential set in the PlanetScale dashboard.
Take note of the USERNAME, ACCESS HOST URL, and PASSWORD values as you’ll need them in the following section. Next, you’ll need to enter into a shell session with the database to create a table. Run the following command to enter the shell:pscale shell hotels_db main

Run the following SQL script to create a table called hotels:CREATE TABLE hotels(
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL,
  address VARCHAR(50) NOT NULL,
  stars FLOAT(2) UNSIGNED
);

Generate the schema definition file
Atlas makes it easy to apply a "Database as Code" approach to an existing database by generating a file representing the schema of that database. Before you can do so, you’ll need to craft a connection string so the CLI can properly connect to the PlanetScale database created in the previous section. Use the following format to create your own connection string:"mysql://<USERNAME>:<PASSWORD>@<ACCESS HOST URL>/hotels_db?tls=true"
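
If you assemble the connection string in a script, it is worth percent-encoding the credentials so that characters such as @ or / cannot break the URL. The sketch below is an illustration under assumptions: atlasUrl is a hypothetical helper, all credential values are placeholders, and it assumes the URL is parsed with standard percent-decoding.

```javascript
// Hypothetical helper: builds the connection string in the format above.
function atlasUrl({ username, password, host, database }) {
  const user = encodeURIComponent(username)
  const pass = encodeURIComponent(password)
  return `mysql://${user}:${pass}@${host}/${database}?tls=true`
}

// Placeholder values; substitute the ones from `pscale password create`.
console.log(
  atlasUrl({
    username: 'example_user',
    password: 'p@ss/word',
    host: 'example.host.invalid',
    database: 'hotels_db'
  })
)
```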

Going forward, this article will use <CONNECTION_STRING> as a reference to the connection string above. To generate a schema file based on the database above, run the following command:atlas schema inspect -u <CONNECTION_STRING> > schema.hcl

You should now have a file named schema.hcl in the working directory. If you inspect it, it should look like the following. Note how the outer table node contains a reference to the hotels_db schema, as well as a definition for each column created in the previous section.table "hotels" {
  schema = schema.hotels_db
  column "id" {
    null           = false
    type           = int
    unsigned       = true
    auto_increment = true
  }
  column "name" {
    null = false
    type = varchar(50)
  }
  column "address" {
    null = false
    type = varchar(50)
  }
  column "stars" {
    null     = true
    type     = float
    unsigned = true
  }
  primary_key {
    columns = [column.id]
  }
}
schema "hotels_db" {
  charset = "utf8mb4"
  collate = "utf8mb4_0900_ai_ci"
}

Modify the schema
Modifying the schema simply involves making a change to the schema definition file and applying it with the atlas schema apply command. Let’s add a description column to the hotels table by adding the following snippet between the stars column and the primary_key node:column "description" {
  null = false
  type = varchar(100)
}

Run the apply command using the connection string and a reference to the schema.hcl file.atlas schema apply -u <CONNECTION_STRING> -f schema.hcl

Atlas will show you the changes it is about to make to the database upon applying the updated schema. Hit enter on your keyboard to confirm the changes.-- Planned Changes:
-- Modify "hotels" table
ALTER TABLE `hotels_db`.`hotels` ADD COLUMN `description` varchar(100) NOT NULL
Use the arrow keys to navigate: ↓ ↑ → ←
? Are you sure?:
  ▸ Apply
    Abort

Once changes have been applied, you can inspect the table by using the pscale shell, as described above, and running the following DESCRIBE command:DESCRIBE hotels;

Notice how the table contains the description column now. That column was added by Atlas when the schema was applied.

Closing remarks
Atlas can be an incredible utility to add to your DevOps toolkit. It helps you manage your database as code instead of managing your schema manually with SQL commands. Keeping your database schema under version control provides accountability (by configuring Atlas to apply changes on git operations) as well as a historical reference for how your database structure changes over time. One thing to note is that when using Atlas with PlanetScale, you’ll need to make sure you don’t turn on safe migrations, as that will prohibit you from running DDL on production.]]></content>
        <summary><![CDATA[Learn how to use Atlas CLI with PlanetScale to define your database as code.]]></summary>
      </entry>
    
      <entry>
        <title>Build a multi-stage pipeline with PlanetScale and AWS</title>
        <link href="https://planetscale.com/blog/build-a-multi-stage-pipeline-with-planetscale-and-aws" />
        <id>https://planetscale.com/blog/build-a-multi-stage-pipeline-with-planetscale-and-aws</id>
        <published>2022-09-13T15:00:00.000Z</published>
        <updated>2022-09-13T15:00:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[One of the foundational features of PlanetScale is the concept of branching — creating an isolated version of your production database to test schema changes before applying them to a production instance. Through the power of our CLI and some scripting, you can also automate the process of merging those changes and integrate it into your CI/CD pipelines.
In this article, we’ll step through creating a multi-environment pipeline powered by PlanetScale and AWS, triggered by a common developer flow in GitHub. We’ll start with a simple API written in Go, and build the pipeline around deploying it to two separate container services in Lightsail, driven by AWS CodeBuild.
Prerequisites
If you want to follow along with this article, you’ll need to have the following set up ahead of time:
A PlanetScale account.
A GitHub account.
A Docker Hub account.
Docker Desktop installed and running.
An AWS account.
The AWS CLI installed and configured on your computer.
Please note that we will be creating resources in AWS which cost real money. Resources may be covered under the free tier for AWS users, so check your AWS plan.
The overall setup
There are quite a few moving pieces to this demo. Before I dive into exactly how to pull this off, I want to take a moment to describe what each service is used for and generally how they contribute to the overall demo.
GitHub and the sample application
I’m using a small API written in Go, as it’s my preferred type of project and language, but I’m also wrapping it up in Docker as a container that will be deployed to AWS Lightsail. The repository is hosted on GitHub and I’ll be using commits and pull requests to trigger the automation required to deploy new versions of the project.
AWS and the services used
AWS Lightsail will be used to run the containers for each environment. I’m choosing Lightsail over other container services since it has the most straightforward setup on AWS. I’ll also be using an Elastic Container Registry (ECR) to store images of the container. Lightsail will pull the most recent versions of an image tagged qa and prod for each environment when a deployment is triggered.
To handle the process of building and deploying the code, I’ll be using a combination of AWS CodeBuild and some bash scripting within the CodeBuild projects. CodeBuild will contain the definitions of how to build the image and trigger a deployment to the respective Lightsail environment. It will also be used to monitor the qa and main branches on the GitHub repository and execute a build when new commits are added to them. Here is a rundown of the CodeBuild steps used in each project:
QA Deployment Process
A developer pushes new commits to GitHub in the qa branch of the repository.
CodeBuild will detect the updated code in GitHub and start the build in the QA version of the project.
CodeBuild will build the container image, deploy it to ECR tagged as qa, and trigger a deployment to Lightsail.
Production deployment process
The above process is generally the same for the production environment, with the important addition of one step:
When a PR is merged into the main branch in GitHub, CodeBuild will start the deployment to production.
CodeBuild will first use the PlanetScale CLI to create a deploy request in PlanetScale to sync up any schema changes between the dev branch and the main branch of the database.
Once the schema is synced up, CodeBuild will build the container image, deploy it to ECR tagged as prod, and trigger a deployment to the production instance of the service in Lightsail.
Now grab a cup of your favorite beverage, open a blank document (we’ll be recording A LOT of details), and let’s get to it!
Throughout this article, there will be various checkpoints like this one. It means you should be adding something to that document to track throughout this guide.
Fork and review the code
Since GitHub will be used as the start of this entire process, you can start by forking the sample repository in GitHub at https://github.com/planetscale/go-bookings-api. Click the “Fork” button in the upper right.

In the Create a new fork page, click “Create fork”, accepting all the defaults.

The code is fairly straightforward and should be simple enough to understand, even if you are not proficient in Go.
data: Contains functions for reading and writing data to PlanetScale, broken down by entity.
routes: Contains functions that handle incoming HTTP requests, broken down by path.
scripts: Stores various scripts that are used throughout this guide.

Before moving on, let’s create a qa branch since we’ll be pulling it for the QA environment in AWS. From the main view of the repository, click on where it says "1 branch."

Now click on "New branch" to open the Create a branch modal.

Finally, name the branch qa then click "Create branch".

Create the database in PlanetScale
In this section, we’ll do the following:
Create a new database in PlanetScale.
Promote the main branch to production.
Create a dev branch that will be used for development & QA testing.
Add the schema to dev and merge it into main.
Seed the dev branch with a little bit of data.
To start this process, let’s create a database in PlanetScale. I’ll be using bookings_api as the name of my database. From the dashboard, click on “New database” > “Create new database”.

In the modal, name the database and click “Create database”.

Once the main branch is finished initializing, click on the “Branches” tab and select the main branch.

Click “Promote a branch to production” and confirm in the modal that appears.

Once the branch has been promoted, click the “Connect” button from the Overview to grab the connection string for later.

Select “Go” from the Connect with dropdown. If your password shows as a bunch of asterisks, click "New password" to generate a new set of credentials. Copy the DSN variable in the .env tab, and paste it in your document for later.

Make sure to add the connection string for main to your tracking document. Also add your org name, which can be found right next to the PlanetScale logo in the upper left of the screen.
Now head back to Branches and click “New branch”.

Name the new branch dev and click “Create branch”.

Once the branch has initialized, click “Connect” here as well to grab the connection string from the dev branch.

Make sure to add the dev connection string to your tracking document.
Now select the "Console" tab and run the following script to create a table.
CREATE TABLE hotels(
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL,
  address VARCHAR(50) NOT NULL,
  stars FLOAT(2) UNSIGNED
);

Now add some data to the table with this script.
INSERT INTO hotels (name, address, stars) VALUES
  ('Hotel California', '1967 Can Never Leave Ln, San Francisco CA, 94016', 3.8),
  ('The Galt House', '140 N Fourth St, Louisville, KY 40202', 4);

Next, let’s merge our dev branch into main. From the Overview tab of the dev branch, click on "Create deploy request".

Once PlanetScale has finished validating the changes, click "Deploy changes" and your schema changes will be applied to the main branch.

To validate that the changes have been successfully deployed, we can view the schema of the hotels table from the console. Since this database is new, we need to enable the functionality to use the console on the main branch, which is disabled by default. To do this, head to Settings and check the option for "Allow web console access to production branches". Click "Save database settings".

Now go to "Branches" > "main" > "Console", and run the following command.
DESCRIBE hotels;

You should see the columns that were created in the dev branch, even though you are connected to main.

Now that our database and branches are set up, we can move into configuring the necessary AWS services.
Set up AWS services
In this section, we’ll configure a number of services in AWS:
Elastic Container Registry (ECR) to store the Docker images for the environments.
We’ll also manually upload the starting images to ECR.
Two Lightsail container services will be configured, one for QA and one for Production.
Some AWS regions are not supported by Lightsail. To keep things consistent, we’ll be using the us-east-1 region throughout this guide. Before proceeding, make sure you’ve switched to us-east-1 using the region switcher in AWS.

Create the Elastic Container Registry
In the AWS global search, enter "Elastic Container Registry" and select that option from the results.

If you don’t have any registries created, click on "Get Started". Otherwise, click "Create repository".

In the Create repository form, set the Visibility settings to "Private" and give the repository a name. I’ll be using bookings-api for the repository name. Scroll to the bottom and click "Create repository."

You should be redirected to your list of repositories. Grab the URI from the list and note it in that document.

Make sure to add the Repository URI to your tracking document.
Open a terminal on your computer. Run the following command to authenticate with your new ECR repository, replacing <REPOSITORY_URI>. You should receive a message stating Login Succeeded.
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <REPOSITORY_URI>

The above command requires the AWS CLI. If you do not have it installed, follow the directions provided on AWS’s guide to installing the CLI.
Pushing to the ECR repository
Before we can deploy containers to Lightsail, we need to get an image of our container into ECR. If you haven’t done so yet, clone the forked repository to your computer. Then open a terminal in the project folder. Run the following commands to build and push the image to ECR, replacing the <REPOSITORY_URI> variable with the URI pulled from ECR.
# Build the image
docker build --platform=linux/amd64 -t bookings-api .

# Tag the 'qa' image to be pushed to ECR
docker tag bookings-api:latest <REPOSITORY_URI>:qa

# Push the 'qa' image to ECR
docker push <REPOSITORY_URI>:qa

# Tag the 'prod' image to be pushed to ECR
docker tag bookings-api:latest <REPOSITORY_URI>:prod

# Push the 'prod' image to ECR
docker push <REPOSITORY_URI>:prod

Now head back to the AWS console and open the repository you created earlier. If all was successful, you should see images tagged as prod and qa in the list.

Create the Lightsail instances
Use the AWS global search to find "Lightsail", and open it from the list of results. You’ll be redirected to a completely different UI from the standard AWS console, which is to be expected.

In the Lightsail dashboard, select the "Containers" tab, then "Create container service".

Make sure that the Container service location is set to Virginia, all zones (us-east-1). If not, click "Change AWS region" and select it from the list of available regions.
Under Choose the power, select the "Nano (Na)" option to keep the cost as low as possible.

Skip the Set up your first deployment section for now. We need to configure access to our ECR instance before we can do that. Scroll down to Identify your service and give your service a name. I’ll name mine bookings-api-service-qa. Click "Create container service" to finish the setup.

Once the container service has been created, you’ll be dropped into the dashboard for the container service. Select the Images tab, scroll to the bottom, and click "Add repository" under Amazon ECR private repositories.

Select the bookings-api repository from the list and click "Add".

If the "Add" button is disabled, it is likely because there is a pending operation for your container service. Check the Status field in the header to make sure nothing is currently being done with the container service.
Lightsail will start provisioning the proper security permissions to permit itself to access the private ECR instance. This may take a few minutes.
Next, select the Deployments tab and click "Create your first deployment".

Complete the form like so:
Container name: bookings-api-qa
Image: <REPOSITORY_URI>:qa
Environment variables:
LISTEN: 0.0.0.0:80
DSN: <PLANETSCALE_DEV_BRANCH_CONNECTION_STRING>
Open ports:
80: HTTP

In the Public endpoint section, select "bookings-api-qa". Finally, click "Save and deploy".

You can monitor the deployment status from the following page.

Once the status has changed to Running, you can click the URL next to Public domain to validate that things are working properly. You should receive a simple text response that says “welcome”.

Add /hotels to the end of the URL and we should see the data from the PlanetScale database (specifically the dev branch).

Now you’ll need to essentially repeat these same steps (create a container service, give it access to the ECR repository, and set up the deployment) for the production environment. When creating the container service, set the name to bookings-api-service-prod.

Make sure to permit access to your ECR instance before configuring the deployment. The main difference will be in the deployment itself: name the container bookings-api-prod, use the prod image tag, and set the DSN environment variable to the connection string from the main branch of the PlanetScale database.

Generate a Docker Hub Token
Now let’s take a detour and talk Docker Hub. By default, Docker Hub rate-limits anonymous image pulls by IP address. Because CodeBuild runs in a shared environment, there’s a good chance the address our build runs from has already hit that limit. To get around this, we’ll generate a token for our user account and use it to authenticate during the build process in AWS.
Log into Docker Hub and navigate to "Account Settings".

Now select "Security" from the left nav, then "New Access Token".

In the modal, give the access token a description and set the permissions to "Public Repo Read-only". This limits what the token can actually be used for with your account. Click "Generate" to get the token.

Take note of this token, as it can’t be retrieved again (although you can easily delete it and create another). Once you are done, you can exit Docker Hub as we won’t need to come back.

Make sure to add the Docker Hub token and your username to your tracking document.
Build the pipeline in AWS
Now that the resources to host the API are configured in AWS, we can start building the pipeline that will handle both deploying new versions of the code into each environment and promoting schema updates to the database in PlanetScale. Here is a list of tasks we will accomplish in this section:
Create a CodeBuild project for QA which will build and deploy the container to ECR and Lightsail.
Modify the IAM role for QA to give it the necessary permissions to deploy to ECR and Lightsail.
Create a CodeBuild project for Production which will do the same as above, as well as merge database changes in PlanetScale.
Modify the IAM role for Production to give it the necessary permissions.
QA
We’ll need to perform most of the following steps for both the QA and the Production builds, but we can start with the QA environment. Start by using the AWS search to find ‘CodeBuild’, and select it from the list of results.

Click "Create project" to start building the QA project.

We’ll step through the Create build project form section by section as there are a number of things to set up here. Under Project configuration, name the project bookings-api-qa.

In the Source section, select "GitHub" from the Source provider dropdown. If you’ve already connected your GitHub account, you’ll be able to select from a list of repositories you own, otherwise, you can connect using the "Connect to GitHub" button. This will step you through connecting AWS CodeBuild to your GitHub account.

Once you’ve connected, you’ll get a few more options in this section. Select "Repository in my GitHub account" and use the search under GitHub repository to find the forked version of the code we’ve been using throughout this guide. Set Source version to “qa” since this is the branch we want to build in this project.

In the Primary source webhook events, check the box labeled "Rebuild every time a code change is pushed to this repository". This will allow AWS to configure GitHub to notify AWS when a change is made to the QA branch and to trigger a build on it.

Under Event type, select "PUSH" from the list, which will set up the webhook in GitHub to only send a message when commits are pushed to the repository.

Expand the Start a build under these conditions toggle and add “refs/heads/qa” to the HEAD_REF field. This will tell CodeBuild to only execute this build if a commit was pushed to the qa branch.
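
One detail worth knowing about this filter: CodeBuild treats webhook filter patterns as regular expressions, so an unanchored pattern can match more branches than intended. Here is a minimal sketch in Go that illustrates the difference between the plain value and an anchored one. The helper name refMatches and the qa-hotfix branch are made up for illustration, and Go's regular expression syntax may differ in small ways from the engine CodeBuild uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// refMatches reports whether a Git ref matches a filter pattern,
// treating the pattern as a regular expression (hypothetical helper).
func refMatches(pattern, ref string) bool {
	return regexp.MustCompile(pattern).MatchString(ref)
}

func main() {
	// "refs/heads/qa-hotfix" is an invented branch used only to show the overlap.
	refs := []string{"refs/heads/qa", "refs/heads/qa-hotfix", "refs/heads/main"}
	for _, pattern := range []string{"refs/heads/qa", "^refs/heads/qa$"} {
		for _, ref := range refs {
			fmt.Printf("%-18s vs %-22s -> %v\n", pattern, ref, refMatches(pattern, ref))
		}
	}
}
```

If your repository only ever has the qa and main branches, the plain value works fine. Anchoring the pattern with ^ and $ is a cheap safeguard if more branches show up later.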

The Environment section has a number of items that need to be configured. Here is what each of these should be:
Environment image: Managed image — Uses a provided AWS container image to build the code.
Runtime(s): Standard — The default Standard runtime.
Operating system: Amazon Linux 2 — Since the code should target Linux to be built for Lightsail.
Image: aws/codebuild/amazonlinux2-x86_64-standard:4.0 — The latest Amazon Linux image.
Image version: Always use the latest — Self-explanatory.
Environment type: Linux — The standard Linux environment.
Privileged: Checked — We’re building a docker container so this needs to be checked.
Service role: New service role — Let CodeBuild create a role with the basic permissions for us.

Stay in the Environment section and toggle the "Additional configuration" item to get more options for configuring the environment. Most of these options can remain as is, but we need to add a number of environment variables here so that when a build is triggered, AWS has the necessary info to build and deploy our container image. Find the Environment variables section, and add the following variables. Click "Add environment variable" to add more to the list.
DOCKER_HUB_TOKEN — The token you retrieved from Docker Hub.
DOCKER_HUB_USER — Your Docker Hub username.
REPOSITORY_URI — The ECR repository URI.
PS_CONN_STR — The PlanetScale connection string of the dev branch.
We will be storing these variables as plain text for the purpose of this article. In a real production environment, sensitive credentials should be stored in a more secure system like AWS Secrets Manager.

Now onto the Buildspec section. This is where we need to define the steps required to build the image. Since we want to handle the QA and Production build steps a bit differently (specifically when it comes to updating the schema in the PlanetScale database), we need to select "Insert build commands" so we can provide build steps that are not stored with the repository.
Once you’ve selected that, click on "Switch to editor" to get a larger text box and paste in the code below. Click "Update buildspec" when done.
version: 0.2
phases:
  build:
    commands:
      # Setup environment
      - docker login -u $DOCKER_HUB_USER -p $DOCKER_HUB_TOKEN
      # Build the project
      - docker build --platform=linux/amd64 -t bookings-api .
      - aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REPOSITORY_URI
      - docker tag bookings-api:latest $REPOSITORY_URI:qa
      - docker push $REPOSITORY_URI:qa
      # Deploy
      - |
        aws lightsail create-container-service-deployment \
          --region us-east-1 \
          --service-name bookings-api-service-qa \
          --containers "{\"bookings-api-qa\":{\"image\":\"$REPOSITORY_URI:qa\",\"environment\":{\"LISTEN\":\"0.0.0.0:80\", \"DSN\":\"$PS_CONN_STR\"},\"ports\":{\"80\":\"HTTP\"}}}" \
          --public-endpoint '{"containerName":"bookings-api-qa","containerPort":80,"healthCheck":{"path":"/"}}'
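
The escaped quotes in the --containers value are easy to get wrong by hand. If you ever move the deploy step out of inline shell, one option is to generate that JSON with a small program instead. The sketch below is not part of the sample repository. The container type and containersJSON function are hypothetical names, and the image and DSN values in main are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// container mirrors the per-container object passed to --containers
// in the buildspec above (hypothetical type for this sketch).
type container struct {
	Image       string            `json:"image"`
	Environment map[string]string `json:"environment"`
	Ports       map[string]string `json:"ports"`
}

// containersJSON builds the JSON value for the --containers flag,
// letting encoding/json handle quoting and escaping.
func containersJSON(name, image, dsn string) (string, error) {
	payload := map[string]container{
		name: {
			Image:       image,
			Environment: map[string]string{"LISTEN": "0.0.0.0:80", "DSN": dsn},
			Ports:       map[string]string{"80": "HTTP"},
		},
	}
	out, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Placeholder values; in CodeBuild these would come from the environment.
	out, err := containersJSON("bookings-api-qa", "<REPOSITORY_URI>:qa", "<PS_CONN_STR>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The benefit is that a connection string containing quotes or backslashes is escaped correctly without any manual work.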


The rest of the settings can be left as they are. Scroll to the bottom and click "Create build project".
Updating IAM permissions for QA
Although we’ve been allowing AWS to manage permissions for us up to this point, there is a bit of manual configuration that needs to be done before we can build and deploy the container to AWS. Here are the permissions that need to be added to the role for each CodeBuild project:
Permit CodeBuild to push images to and pull images from the ECR repository.
Permit CodeBuild to create a service deployment in Lightsail.
From within the CodeBuild project, select the Build details tab.

Scroll to the Environment section and click the link under Service role. You’ll be redirected to the Role definition in IAM.

Under Permissions policies, click "Add permissions" > "Create inline policy".

In Create policy view, start with the Service section. Search for “Lightsail” and select it from the results.

Under Actions, search for “CreateContainerServiceDeployment” and select it from the list of results.

Now to limit the boundaries of the policy to ONLY the QA version of our container service, you’ll need to get the ARN of the container service. The only way to do this at the moment is using the AWS CLI. Open a terminal on your computer and run the following command to get a list of the container services and their ARNs:
aws lightsail get-container-services --query 'containerServices[*].[containerServiceName,arn]' --region us-east-1

Your output should show the name of the container service followed by its ARN in a JSON structure.
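
If you'd rather not pick the ARN out by eye, the output of that command is simple to parse. With the --query expression used above, it is a JSON list of [name, ARN] pairs. Here is a rough Go sketch that looks up one service by name. The arnFor function is a hypothetical helper, and the ARNs in main are made-up placeholders, not real resources.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// arnFor finds the ARN for serviceName in the CLI output, which the
// --query expression shapes as a list of [name, arn] pairs.
func arnFor(output []byte, serviceName string) (string, error) {
	var pairs [][]string
	if err := json.Unmarshal(output, &pairs); err != nil {
		return "", err
	}
	for _, p := range pairs {
		if len(p) == 2 && p[0] == serviceName {
			return p[1], nil
		}
	}
	return "", fmt.Errorf("container service %q not found", serviceName)
}

func main() {
	// Made-up sample output; real ARNs will contain your own account details.
	sample := []byte(`[
		["bookings-api-service-qa", "arn:aws:lightsail:us-east-1:111122223333:ContainerService/example-qa"],
		["bookings-api-service-prod", "arn:aws:lightsail:us-east-1:111122223333:ContainerService/example-prod"]
	]`)
	arn, err := arnFor(sample, "bookings-api-service-qa")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(arn)
}
```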

Make sure to note the ARNs of both container services in your tracking document.
Back in IAM, under the Resources section, make sure "Specific" is checked and click "Add ARN". A modal should appear with a field to paste in the ARN we grabbed from the terminal.

Paste the qa ARN in and the other fields should automatically populate. Click "Add" to apply the changes.

You can collapse the Lightsail section and click "Add additional permissions" to get a blank form to add more permissions. Select Elastic Container Registry from the list of services. Under Actions, select the following:
List: DescribeImages, ListImages
Read: BatchCheckLayerAvailability, BatchGetImage, DescribeRepositories, GetAuthorizationToken, GetDownloadUrlForLayer
Tagging: TagResource
Write: CompleteLayerUpload, InitiateLayerUpload, PutImage, UploadLayerPart

The Resources section is a bit more straightforward for ECR. Simply click "Add ARN" and populate the region and repository name.

Once you are done, click "Review policy" at the bottom of the page. Give your policy a name and finish by clicking "Create policy".

Now that everything is set up, let’s run the build. From the CodeBuild project, click on "Start build".

Provided everything is set up properly, you should receive a Succeeded status after the build completes. If not, check the logs below to determine if anything is not set up properly.

Production
Now that QA is set up and ready to go, we need to set up the production pipeline. Since we will also be creating and approving a Deploy Request in PlanetScale, we need to create a service token in PlanetScale first which will allow the CodeBuild project to access our database using the PlanetScale CLI.
In PlanetScale at the root of your organization, click the Settings tab, then "Service tokens". Click "New service token" to open the modal to create a service token.

Give your token a name and click "Create service token". The name is for your reference and does not affect the token in any way.
Your token will be displayed this one time, so make sure to note it down before moving on. Click "Edit token permissions".
Now the next page will show the ID of that token. Note that down as you’ll need it to be set in CodeBuild. Click "Add database access" next.

Make sure to add the PlanetScale service token and service token ID to your tracking document.
Now select your database from the dropdown and check the following options:
create_deploy_request
read_deploy_request
approve_deploy_request
Click "Save permissions" once you are done.

Now we can head back to AWS to configure CodeBuild. Most of the steps from the previous section will be carried over with a few minor tweaks. Start by creating a new CodeBuild project named bookings-api-prod.

Under Source, use all the same settings from QA but set the Source version to main to use the main branch from GitHub.

Check the box under Primary source webhook events to "Rebuild every time a code change is pushed".

Set the Event type to "PULL_REQUEST_MERGED".

Pull Requests have the branch data in the BASE_REF field, so expand "Start a build under these conditions" and set BASE_REF to “refs/heads/main”.

Use all the same settings for Environment that were used in QA.

Expand the additional options and find Environment variables. Add the same variables you did for the QA pipeline with the following changes:
PS_CONN_STR — The PlanetScale connection string for the main branch.
PS_TOKEN_ID — The PlanetScale service token ID.
PS_TOKEN — The PlanetScale service token.
PS_ORG — Your PlanetScale org name.

Under Buildspec, select "Insert build commands", then expand the editor to paste the following.
version: 0.2
phases:
  build:
    commands:
      # Setup environment
      - docker login -u $DOCKER_HUB_USER -p $DOCKER_HUB_TOKEN
      - curl -LO https://github.com/planetscale/cli/releases/download/v0.112.0/pscale_0.112.0_linux_amd64.deb
      - dpkg -i ./pscale_0.112.0_linux_amd64.deb
      - pscale --version
      # Build the project
      - docker build --platform=linux/amd64 -t bookings-api .
      - aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REPOSITORY_URI
      - docker tag bookings-api:latest $REPOSITORY_URI:prod
      - docker push $REPOSITORY_URI:prod
      # Deploy PlanetScale schema changes
      - |
        DR_NUM=$(pscale deploy-request create bookings_api dev --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG --format json | jq '.number' )
        DR_STATE=$(pscale deploy-request show bookings_api $DR_NUM --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG --format json | jq -r '.deployment.state')
        while [ "$DR_STATE" = "pending" ];
        do
          sleep 5
          DR_STATE=$(pscale deploy-request show bookings_api $DR_NUM --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG --format json | jq -r '.deployment.state')
          echo "State: $DR_STATE"
        done
        if [ "$DR_STATE" = "no_changes" ]; then
          pscale deploy-request close bookings_api $DR_NUM --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG
        else
          pscale deploy-request deploy bookings_api $DR_NUM --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG
        fi
      # Deploy
      - |
        aws lightsail create-container-service-deployment \
          --region us-east-1 \
          --service-name bookings-api-service-prod \
          --containers "{\"bookings-api-prod\":{\"image\":\"$REPOSITORY_URI:prod\",\"environment\":{\"LISTEN\":\"0.0.0.0:80\", \"DSN\":\"$PS_CONN_STR\"},\"ports\":{\"80\":\"HTTP\"}}}" \
          --public-endpoint '{"containerName":"bookings-api-prod","containerPort":80,"healthCheck":{"path":"/"}}'

Before we move on, let’s take a moment and examine the script directly under # Deploy PlanetScale schema changes. I’ve added a commented version to explain exactly what each line is doing:
# This line creates a deploy request from the dev branch and outputs JSON.
# The JSON is piped to `jq`, which extracts the deploy request number into the DR_NUM variable.
DR_NUM=$(pscale deploy-request create bookings_api dev --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG --format json | jq '.number' )

# This line grabs the Deploy Request and stores the state in DR_STATE
DR_STATE=$(pscale deploy-request show bookings_api $DR_NUM --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG --format json | jq -r '.deployment.state')

# This loop will wait until PlanetScale has finished checking to see if changes can be applied before moving forward.
while [ "$DR_STATE" = "pending" ];
do
  sleep 5
  DR_STATE=$(pscale deploy-request show bookings_api $DR_NUM --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG --format json | jq -r '.deployment.state')
  echo "State: $DR_STATE"
done

# Once the state has been updated, we’re going to check the state to decide how to proceed.
if [ "$DR_STATE" = "no_changes" ]; then
  # If the state is "no_changes", close the request without applying changes.
  pscale deploy-request close bookings_api $DR_NUM --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG
else
  # If it's anything else, attempt to deploy (merge) the changes into the `main` branch.
  pscale deploy-request deploy bookings_api $DR_NUM --service-token $PS_TOKEN --service-token-id $PS_TOKEN_ID --org $PS_ORG
fi
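
One thing the loop above doesn't have is an upper bound: if the deploy request never leaves the pending state, the build keeps polling until CodeBuild times it out. Here is a small Go sketch of the same poll-then-decide logic with a cap on attempts. The names waitForState and nextAction are hypothetical, the state lookup is passed in as a function so the logic can be tried without calling pscale, and "ready" is just a stand-in for any state other than pending or no_changes.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForState polls getState until it returns something other than
// "pending", giving up after maxAttempts polls.
func waitForState(getState func() (string, error), maxAttempts int, interval time.Duration) (string, error) {
	for i := 0; i < maxAttempts; i++ {
		state, err := getState()
		if err != nil {
			return "", err
		}
		if state != "pending" {
			return state, nil
		}
		time.Sleep(interval)
	}
	return "", errors.New("deploy request still pending after max attempts")
}

// nextAction mirrors the if/else in the script: close the deploy
// request when there is nothing to apply, otherwise deploy it.
func nextAction(state string) string {
	if state == "no_changes" {
		return "close"
	}
	return "deploy"
}

func main() {
	// Fake state source for demonstration: pending twice, then "ready".
	states := []string{"pending", "pending", "ready"}
	i := 0
	fake := func() (string, error) {
		s := states[i]
		i++
		return s, nil
	}
	state, err := waitForState(fake, 10, time.Millisecond)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("state:", state, "action:", nextAction(state))
}
```

In the shell version, the same idea would be a counter inside the while loop that exits with a non-zero status once it passes a limit you're comfortable with.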

Scroll to the bottom and click "Create build project".
Updating Prod IAM permissions
Now we need to update the permissions for the role that was created for this project just like we did for QA. Select the Build details tab, find the Environment section, and click the link under Service role.

Click "Add permissions" > "Create inline policy".

Select "Lightsail" as the service, check "CreateContainerServiceDeployment" under Actions, and set the ARN of the production container service for Lightsail. Click "Add additional permissions" to add the ECR entry.

Select "Elastic Container Registry" as the service, add the same list of actions (see below), and set the ARN just as we did in the QA pipeline. Click Review policy once you are finished.
List: DescribeImages, ListImages
Read: BatchCheckLayerAvailability, BatchGetImage, DescribeRepositories, GetAuthorizationToken, GetDownloadUrlForLayer
Tagging: TagResource
Write: CompleteLayerUpload, InitiateLayerUpload, PutImage, UploadLayerPart

Give your policy a name and click "Create policy".

Now head back to CodeBuild and run the new project that was created just to make sure it deploys successfully. You can also monitor the dashboard in PlanetScale to see the deploy request being created and then closed due to no changes needing to be applied to the database.
Testing the entire flow
Now that everything has been configured and we’ve tested everything manually, it’s time to see this entire thing in action! In this section, we will:
Add a new column to the dev branch in our PlanetScale database.
Add a new field to the model in the API.
Push the code to the qa branch in GitHub, triggering a deployment to QA in AWS.
Create and merge a PR to the main branch in GitHub, triggering a deployment to Production in AWS. This will also handle merging the dev branch of our PlanetScale database into main.
First, log in to PlanetScale, navigate to the dev branch of your database, and open the "Console" tab. Run the following commands individually in the console:
ALTER TABLE hotels ADD description VARCHAR(400);
DESCRIBE hotels;

You should see the new column that was added.

Now run the following script to populate the new description field for the first hotel.
UPDATE hotels
	SET description = 'Welcome to the Hotel California, such a lovely place (such a lovely place)'
	WHERE id = 1;

Here is the script being run, as well as SELECT statements both before and after the UPDATE statement above.

Now we need to make a change to the code. Make sure you're on the qa branch. Open data/hotels.go and update the Hotel type to have a Description field. Make sure the type is *string so it can handle NULL values since we only added a description to one hotel.
type Hotel struct {
	Id          int64
	Name        string
	Address     string
	Stars       float32
	Description *string // Add Description field
}
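
To see why the pointer matters, here is a quick standalone sketch of how a nil *string behaves when the struct is turned into JSON. It assumes the default encoding/json behavior with no struct tags, which may not match exactly how the sample API serializes its responses. The toJSON helper is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hotel matches the struct above, with the nullable Description field.
type Hotel struct {
	Id          int64
	Name        string
	Address     string
	Stars       float32
	Description *string
}

// toJSON serializes a hotel using encoding/json defaults (hypothetical helper).
func toJSON(h Hotel) string {
	out, err := json.Marshal(h)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	desc := "such a lovely place"
	// A hotel with a description: the pointer is dereferenced into a string.
	fmt.Println(toJSON(Hotel{Id: 1, Name: "Hotel California", Stars: 4, Description: &desc}))
	// A hotel whose description is NULL in the database: the nil pointer becomes null.
	fmt.Println(toJSON(Hotel{Id: 2, Name: "The Galt House", Stars: 4}))
}
```

With a plain string field, rows.Scan would return an error when it hits the NULL description on the second hotel, which is why the pointer type is used here.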

Scroll down a bit to the FetchHotels function and update the line with rows.Scan to add a reference to the new Description field.
func FetchHotels() ([]Hotel, error) {
	conn, err := GetDbConnection()
	if err != nil {
		return nil, errors.Wrap(err, "(FetchHotels) GetConnection")
	}
	query := "SELECT * FROM hotels"
	rows, err := conn.Query(query)
	if err != nil {
		return nil, errors.Wrap(err, "(FetchHotels) db.Query")
	}
	hotels := []Hotel{}
	for rows.Next() {
		var hotel Hotel
		// Add `&hotel.Description` to the end of the following line, within the parens
		err = rows.Scan(&hotel.Id, &hotel.Name, &hotel.Address, &hotel.Stars, &hotel.Description)
		if err != nil {
			return nil, errors.Wrap(err, "(FetchHotels) rows.Scan")
		}
		hotels = append(hotels, hotel)
	}
	return hotels, nil
}

As an example, here is what the diff looks like in VSCode after the changes were made.

Now commit the code and push it to the repository. Check CodeBuild, where a build for QA should be in progress, triggered by the commit.

Once the build is finished, check in with the QA container service in Lightsail. Provided the status is Running, you can use the Public domain URL to test the changes. Since we’ve updated the FetchHotels function, add /hotels to the end of the URL to see the list of hotels with the new Description field added.

You should see the same list of hotels, with the first one having the description we added earlier in this section.

Head back into GitHub and create a pull request, comparing qa and main. By default, GitHub will try to create a Pull Request comparing your repository with the upstream PlanetScale version, so make sure to set the base repository to your version.

Give your pull request a name and click "Create pull request".

Go ahead and merge the pull request.

Head back into CodeBuild, and you’ll notice that the bookings-qa-prod project has a new build. Note that the Source version for the build is pr/5, referring to Pull Request #5, the number of the PR I created in my repo.

Over in the PlanetScale dashboard, you can also see that a Deploy Request was created and deployed from the CodeBuild project automatically.

Conclusion
That was a lot of ground to cover in a single article, but a pipeline built from the ground up has many moving parts, and it often takes quite a bit of configuration to get them all talking properly.
The goal was to create a realistic example of how branching in PlanetScale can help speed up development by automating the process of testing and merging changes between two instances of a database.
]]></content>
        <summary><![CDATA[Learn how to build an automated DevOps pipeline with AWS Lightsail, CodeBuild, and PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>TAOBench: Running social media workloads on PlanetScale</title>
        <link href="https://planetscale.com/blog/taobench-running-social-media-workloads-on-planetscale" />
        <id>https://planetscale.com/blog/taobench-running-social-media-workloads-on-planetscale</id>
        <published>2022-09-08T16:36:00.000Z</published>
        <updated>2022-09-08T16:36:00.000Z</updated>
        
        <author>
          <name>Liz van Dijk</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[As PlanetScale expands its customer base, our Tech Solutions team often gets to collaborate with companies trying to evaluate whether PlanetScale fits their existing development stack’s requirements. Understanding the impact of database performance and how it relates to an application’s overall performance profile can be a complicated exercise, so being able to reference standardized benchmark workloads can be helpful to build a deeper understanding of what to expect.
Last week, we published a first taste of the work we’ve been doing in this area: a short introduction to how PlanetScale and Vitess can be used to linearly scale a workload like TPC-C to one million queries per second. Focusing purely on the number itself is meaningless: in theory, the sky is the limit, and there are many Vitess users out there with workloads that scale to many times that size. The real value of this exercise lies in showcasing how workloads like TPC-C are able to scale linearly and predictably, without compromising on relational and transactional requirements.
The TPC-C benchmark has had a very long life and has remained remarkably relevant to this day, but there are scenarios it doesn’t cover. Audrey Cheng and her team at the University of California, Berkeley identified a real gap when it comes to available synthetic benchmarks for a more recent but highly pervasive workload type: social media networks. In collaboration with engineers at Meta, she has published two white papers. The first focuses on analyzing the challenges of Meta’s overall workload composition. The second, presented this week, summarizes their proposal for an open source benchmark called TAOBench, which applies these principles and is compatible with many modern databases today.
Since Vitess powers some of the largest relational workloads on the internet today, their efforts provided us with the perfect opportunity to collaborate on a deeper dive into this particular workload type. While Cheng and her team are presenting the release of their second white paper at the Very Large Data Bases conference this week, PlanetScale has added TAOBench to our arsenal of standardized benchmarks as well, and will be publishing more in-depth articles about our results over the coming weeks.
For now, let’s summarize the benchmark itself and PlanetScale’s published results.
The workload and dataset composition
The TAOBench team spent significant time analyzing a representative slice of Meta’s workload and summarized its key characteristics into two main scenarios:
Workload A (short for Application) focuses specifically on a transactional subset of the queries
Workload O (short for Overall) encompasses a more generalized profile of the TAO workload
In practice, the behavior of these workloads may come across as somewhat synthetic and reductionist, but taking some time to read the white papers will clarify how the two formulas end up very closely resembling the real world behavior observed at Meta.


Objects and edges
The schema for TAOBench is straightforward: it consists of 2 tables, one called objects and one called edges, concepts that loosely translate to the social graph of entities (think “users”, “posts”, “pictures”, etc.) and to the various types of relationships they have with each other (think “likes”, “shares”, “friendships”, etc.).
In simple relational database terms: The edges table can be viewed as a “many-to-many” relationship table that links rows in objects to other rows in objects.
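As a rough illustration of that two-table layout, here is a minimal sketch using Python's built-in sqlite3 module. The column names (id, type, data, src, dst) and the sample rows are assumptions made for this example, not the actual TAOBench schema.

```python
import sqlite3

# Illustrative only: column names are assumptions, not the real TAOBench schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (
        id   INTEGER PRIMARY KEY,
        type TEXT NOT NULL,          -- e.g. 'user', 'post', 'picture'
        data TEXT
    );
    CREATE TABLE edges (
        src  INTEGER NOT NULL REFERENCES objects(id),
        dst  INTEGER NOT NULL REFERENCES objects(id),
        type TEXT NOT NULL,          -- e.g. 'like', 'share', 'friendship'
        PRIMARY KEY (src, dst, type)
    );
""")

conn.executemany(
    "INSERT INTO objects (id, type, data) VALUES (?, ?, ?)",
    [(1, "user", "alice"), (2, "user", "bob"), (3, "post", "hello world")],
)
conn.executemany(
    "INSERT INTO edges (src, dst, type) VALUES (?, ?, ?)",
    [(1, 3, "like"), (2, 3, "like"), (1, 2, "friendship")],
)

def likers(post_id):
    """Return the names of users with a 'like' edge pointing at post_id."""
    rows = conn.execute(
        """
        SELECT o.data
        FROM edges e JOIN objects o ON o.id = e.src
        WHERE e.dst = ? AND e.type = 'like'
        ORDER BY o.id
        """,
        (post_id,),
    )
    return [name for (name,) in rows]

print(likers(3))
```

Every edge row links one object to another, so a popular post accumulates many edges that all point at the same objects row.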

Data composition matters
The statistical distribution of data in both tables is dependent on the chosen workload type (A or O), so data should be reloaded when switching between them. Focusing the workload around these two simplified concepts allows the benchmark to simulate typical “hot row” scenarios that can be particularly challenging for relational databases to handle. Think of what happens when something goes viral: a thundering herd of users comes through to interact with a specific piece of content posted somewhere. On the database level, beyond a sudden surge in connections, this can also translate into various types of locks centered around the backing rows for that piece, which can have rippling effects that ultimately translate to slower content access times for the users on the platform.
The TAOBench test timeline
The TAOBench workload executes in three distinct phases:
During the “load” phase, it performs bulk inserts, populating the objects and edges tables according to the chosen workload scenario.
The “bulk reads” phase (which is initiated at the start of any real benchmark run) performs very aggressive range scans across the entire dataset to serve as general “warmup” to whichever caching mechanisms may be in place, and also aggregates the necessary statistical information to feed into the experiments themselves. This phase is not measured, but can be extremely punishing to the underlying infrastructure.
The “experiments” phase accepts a set of predefined concurrency levels and runtime operation targets to help scale the chosen workload to various sizes of infrastructure.
PlanetScale’s results and initial conclusions
The results published in Cheng’s white paper were independently measured by the TAOBench team against PlanetScale infrastructure. Since the benchmark code has been made publicly available, PlanetScale has been able to verify them internally. Revisiting our earlier considerations about the “One million queries per second with MySQL” blog post, the numbers themselves are by no means approaching the limits of what can be accomplished on our infrastructure. Rather, they represent what is achieved by imposing an explicit resource limit to the cluster.
The limit Cheng’s team set for their tests was 48 CPU cores. Since the test was performed against PlanetScale’s multi-tenant serverless database offering where certain infrastructural resources are shared across multiple tenants, we underprovisioned the “query path” of the cluster itself to use a maximum of 44 CPU cores out of the requested 48 maximum. The other 4 cores would be used for multi-tenant aspects of the infrastructure, such as edge load balancers. Stay tuned for a more in-depth blog post of how the resources were allocated to the various Vitess components we provisioned and for an exploration of how the various OS-level metrics looked.
Our key takeaway from the initial results as published is the sustained stability of PlanetScale clusters under even the most extreme resource pressure. As is to be expected in an artificially constrained environment, TAOBench’s “experiments” phase uses gradually increasing concurrency pressure to bring the target database to its knees, and once 44 cores are all running at 100%, throughput (measured in requests per second) is expected to hit a ceiling while average response times increase.
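The ceiling-plus-rising-latency shape can be reproduced with a small idealized model based on Little's Law (concurrency = throughput × response time). This is only a sketch: the 44 cores come from the article, but the 1 ms of CPU time per request is a made-up figure, and the model ignores contention, locking, and every other real-world effect.

```python
CORES = 44            # query-path cores, from the article
SERVICE_TIME_MS = 1   # assumed CPU time per request (hypothetical)

def model(concurrency):
    """Return (throughput in req/s, average response time in ms) for an idealized server."""
    busy_cores = min(concurrency, CORES)                  # at most CORES requests run at once
    throughput_rps = busy_cores * 1000 / SERVICE_TIME_MS
    # Little's Law rearranged: response time = concurrency / throughput
    avg_response_ms = concurrency * SERVICE_TIME_MS / busy_cores
    return throughput_rps, avg_response_ms

for clients in (22, 44, 88, 176):
    rps, ms = model(clients)
    print(f"{clients:>4} clients: {rps:>8.0f} req/s, {ms:.1f} ms average")
```

Once concurrency exceeds the core count in this model, throughput stays flat and each doubling of clients doubles the average response time.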
Most systems have some stretch, even while running at what looks like 100% CPU. With ever increasing workload pressure, though, every piece of software eventually starts experiencing failures, by way of thrashing, congestive collapse or other effects.
Distributed database systems are not magically protected from these failure scenarios. If anything, increased infrastructural complexity and the potential for competition amongst different types of resources generally translates to many more interesting ways things can break down. Observing how software behaves in these types of failure scenarios can reveal a lot about what might be expected in those situations that are impossible to plan for. Finding the balance between resource efficiency and graceful failure handling requires equal parts of software maturity and ongoing infrastructural engineering excellence.
That is why enterprise organizations are increasingly choosing to have their database workloads managed by PlanetScale. Don’t hesitate to contact us if you’d like to talk about how we can apply these principles to your use case.]]></content>
        <summary><![CDATA[Learn how we used TAOBench with PlanetScale to benchmark social media workloads]]></summary>
      </entry>
    
      <entry>
        <title>Gated Deployments: addressing the complexity of schema deployments at scale</title>
        <link href="https://planetscale.com/blog/gated-deployments-addressing-the-complexity-of-schema-deployments-at-scale" />
        <id>https://planetscale.com/blog/gated-deployments-addressing-the-complexity-of-schema-deployments-at-scale</id>
        <published>2022-09-06T21:50:00.000Z</published>
        <updated>2022-09-06T21:50:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today we introduce Gated Schema Deployments, which help us think of schema changes as more of atomic operations, solve the complexity of making changes across a sharded database, and answer one of the most common requests from our users. Let’s first understand what the deployment complexity is.
PlanetScale already offers non-blocking schema changes. You make some schema changes in your development branch, you submit a deployment request, it gets approved, and each change is then applied, non-intrusively, to your production database. Some deployments take mere seconds and some take hours, based on the size of the affected tables and based on the production workload.
The problem begins with multi-dimensional deployments. With these, you will either have multiple schema changes in the same deployment, or have a single change deployed over a multi-sharded database, or both.
The multi-change dimension
A schema deployment may contain multiple changes, and those often relate to each other. For example, a new table is added, along with a new column in an existing table that refers to the PRIMARY KEY column of the new table. Or a new type of data (e.g., a location) is added that applies to many tables, so multiple tables are modified.
The common practice for deploying multiple changes is to run them one at a time. A CREATE TABLE is a simple and lightweight enough operation, but an ALTER TABLE change is frequently heavyweight, and running several of those in parallel can hog your database. But running one migration at a time also creates a bit of discrepancy. Some of your changes may have been applied, some not. And should anything go wrong during the deployment, you’re left with a half-baked change.
With Gated Deployments, PlanetScale applies all changes as closely as possible, seconds apart from each other. In order to do that, it may run some migrations in parallel, but without exhausting database resources. Consider two ALTER TABLE changes over two large tables. The bulk work is copying over the existing table data, which is done sequentially per table. But tailing the changelog and applying the ongoing changes can be done in parallel.
We run as much of the bulk work as possible upfront, sequentially, and then run the more lightweight work in parallel. We thus keep multiple schema changes in progress at once. Once all changes are in good shape, we complete the migrations as closely together as possible. While not strictly atomic, the deployment can be considered more atomic: up until the final stage, no change is reflected in production. In fact, the deployment may be canceled at any point up until its completion time.
The multi-shard dimension
As your database scales, you want to address a new, multi-sharded dimension. By design, a multi-sharded database acts as though it were a monolith, and yet the shards are independent of each other. This has important benefits such as resource allocation or minimization of the blast radius when anything goes wrong. But it also raises new challenges: how do you keep the schema in sync across shards?
Different shards work under different workloads, and may run a schema migration to completion minutes to possibly hours apart. A multi-sharded database where different shards have different schemas can be either inconsistent in performance, or outright inconsistent in design.
Our gated deployments minimize that gap period, by tracking the progress of a schema deployment across all shards and holding off the final switch to the new schema until all shards are ready. The switch then takes place almost simultaneously (though not atomically) on all shards.
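The gating idea can be sketched as a simple readiness check. The class and method names below are hypothetical and exist only for illustration; they are not part of the Vitess or PlanetScale APIs, and the shard and table names are sample values.

```python
from dataclasses import dataclass, field

@dataclass
class GatedDeployment:
    """Toy gate: cut over only once every (shard, table) migration is ready."""
    shards: list
    tables: list
    ready: set = field(default_factory=set)
    cut_over: bool = False

    def mark_ready(self, shard, table):
        # Called when a shard's copy of a table has caught up with live writes.
        self.ready.add((shard, table))

    def all_ready(self):
        return all((s, t) in self.ready for s in self.shards for t in self.tables)

    def try_cutover(self):
        # The gate: nothing switches until every shard is ready for every change.
        if self.all_ready() and not self.cut_over:
            self.cut_over = True
        return self.cut_over

deployment = GatedDeployment(shards=["-80", "80-"], tables=["users", "orders"])
deployment.mark_ready("-80", "users")
deployment.mark_ready("-80", "orders")
deployment.mark_ready("80-", "users")
first_attempt = deployment.try_cutover()   # one shard is still copying "orders"
deployment.mark_ready("80-", "orders")
second_attempt = deployment.try_cutover()  # every shard is ready, so the switch happens
```

Until the last shard reports ready, the faster shards would keep tailing changes without switching schemas.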
Solving the most requested schema revert feature: controlled gated deployments
PlanetScale offers a powerful flow: the ability to revert a deployment. A way to revert to the previous schema, but without losing any data accumulated in the interim. When a deployment completes, PlanetScale offers a 30 minute window in which it’s possible to revert, if needed.
The most common questions around our schema revert feature revolve around that time limit: “Why 30 minutes?”, “What happens if the deployment completes at 2:00am over the weekend, and I can’t access my laptop in time?”, “Can we have better control over the timings?”
Responding to these questions, Gated Deployments now allow users to choose the deployment completion time at their discretion. By default, deployments auto-complete when ready, and this is great for most cases, and clears up the deployment queue. However, if the user so chooses, they may uncheck the "Auto-apply" box.

The deployment now stages all changes and runs all long-running tasks. When all changes are ready, the deployment waits for the user to click the "Apply changes" button. With no input from the user, the deployment will just keep on running in the background, always keeping up to date with data changes.

For example, a deployment with three ALTER statements over large tables may take a day to run. It may be 2:00am on the weekend when it finally completes the hard work of copying the dataset. But it won’t apply the changes: the deployment will just keep on syncing and responding to ongoing changes like any INSERT, DELETE or UPDATE on the relevant tables. Come Monday morning, when the developer is at their desk and fully prepared to begin their work week, they may click the “Apply changes” button. The deployment then completes, and the 30 minute window for schema reverts starts ticking, all while the developer is in control of the situation.
Try it out
This release of Gated Deployments brings us another step closer to our goal of a more modern and cohesive development flow, where schema changes happen alongside application development, not in isolation.]]></content>
        <summary><![CDATA[We just introduced a new feature, Gated Deployments, that gives you more control over when your schema changes deploy.]]></summary>
      </entry>
    
      <entry>
        <title>One million queries per second with MySQL</title>
        <link href="https://planetscale.com/blog/one-million-queries-per-second-with-mysql" />
        <id>https://planetscale.com/blog/one-million-queries-per-second-with-mysql</id>
        <published>2022-09-01T18:43:00.000Z</published>
        <updated>2022-09-01T18:43:00.000Z</updated>
        
        <author>
          <name>Jonah Berquist</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Knowing your database can scale provides great peace of mind. We built PlanetScale on top of Vitess so that we could harness its ability to massively scale. One of the core strengths in our ability to scale is horizontal sharding. To demonstrate the power of horizontal sharding, we decided to run some benchmarking.
We set up a PlanetScale database and started running some benchmarks with a common TPC-C sysbench workload. We weren’t aiming for a rigorous academic benchmark here, but we wanted to use a well-known and realistic workload. We will have more benchmark posts coming and have partnered with an academic institution that will be releasing its work soon.
For this post, there are two goals. The first is to demonstrate PlanetScale’s ability to handle large query volumes. For this, we set a goal of a million queries per second. In Vitess terms, this is not a large cluster. There are many Vitess clusters running at much higher query volumes, but we think it’s a good baseline. The second is demonstrating predictable scalability through horizontal scaling. Increasing throughput capacity is a matter of adding more machines.
Scaling up by adding shards
We started with an unsharded database, then created a vschema and began sharding. Because we like powers of 2, we started with 2 shards and began doubling our shard count for subsequent runs. For each level of sharding, we ran sysbench several times, with increasing numbers of threads. With each iteration, we found there was a point at which additional threads no longer resulted in additional throughput. Instead, query latency increased as we reached our throughput limits.
In the graphs below, which were run against a 16 shard database, you can see the increase in the number of sysbench threads reflected in the number of connections. As the number of threads increases, so does the throughput in queries per second.


Hitting the limits
However, we begin to see diminishing returns as we saturate the resources of each shard. This is noticeable above, where the QPS increase between 1024 and 2048 threads was greater than the increase between 2048 and 4096 threads. Similarly, in metrics from vtgate shown below, we see an increase in latency as we max out our throughput. This is particularly evident in our p99 latency.


At this point, we know we need additional shards to get more throughput.
Adding more shards
In the data below, you can see the approximate doubling of queries per second as we double the number of shards. With 16 shards we maxed out around 420k QPS. With 32 shards we got up to 840k QPS. While we could continue doubling the number of shards indefinitely, we had set for ourselves a target of one million queries per second.

Achieving one million queries per second
It’s important to note that, while we like powers of 2, this isn’t a limitation, and we can use other shard counts. Since we had just over 800k QPS with 32 shards, we calculated that 40 shards would satisfy our 1M QPS requirement. When we spun this database up and ran our parallel sysbench clients against it, these were the results: over one million queries per second sustained over our 5 minute run.
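The shard estimate is straightforward arithmetic. The sketch below uses the roughly 840k QPS peak measured at 32 shards and assumes throughput keeps scaling linearly with shard count, which is the assumption the estimate rests on. The function name is made up for this example.

```python
import math

def shards_needed(target_qps, measured_qps, measured_shards):
    """Estimate shards for target_qps, assuming throughput scales linearly with shard count."""
    per_shard_qps = measured_qps / measured_shards
    return math.ceil(target_qps / per_shard_qps)

per_shard = 840_000 / 32                          # QPS per shard observed at 32 shards
minimum = shards_needed(1_000_000, 840_000, 32)   # smallest shard count that reaches 1M
projected_at_40 = 40 * per_shard                  # projected ceiling with 40 shards

print(per_shard, minimum, projected_at_40)
```

Under that linear assumption the minimum works out to 39 shards, so 40 shards leaves a small amount of headroom above the one million target.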

If you’d like to experience this level of database power, get in touch with our sales team. We ran this benchmark against a single-tenant environment, with levels of resources that we reserve for our enterprise customers. We also made a few non-standard configuration tweaks, including raising some query and transaction timeouts to accommodate this sysbench workload.
This is the first in a series of PlanetScale benchmark posts. Stay tuned for more.]]></content>
        <summary><![CDATA[Discover how PlanetScale handles one million queries per second (QPS) with horizontal sharding in MySQL]]></summary>
      </entry>
    
      <entry>
        <title>Zero downtime Laravel migrations</title>
        <link href="https://planetscale.com/blog/zero-downtime-laravel-migrations" />
        <id>https://planetscale.com/blog/zero-downtime-laravel-migrations</id>
        <published>2022-08-29T14:00:35.694Z</published>
        <updated>2022-08-29T14:00:35.694Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[PlanetScale allows you to branch your database in the same way you branch your code. Throughout this article, we discuss both database branches and application code branches. For clarity, we’ll refer to PlanetScale database branches as "PlanetScale branches" and branches for your application code as "code branches".
The problem with running migrations at deployment
In many Laravel workflows, your deployment script includes php artisan migrate, which runs your new migrations on your production database every time you deploy. As an example, let’s look at the default quick deploy sequence that Forge runs when you push to production:
Navigate into the site’s directory
Run git pull
Run composer install
Run php artisan migrate
Making schema changes, such as ALTER, CREATE, etc., directly on your production database is known as Direct DDL (Data Definition Language). Direct DDL can be dangerous, as it can lead to locking in your tables, which may leave your tables completely inaccessible, even for reads. Direct DDL is also not rate-limited or isolated and does not have a rollback strategy that doesn’t include more locks, or worse, data loss.
Running php artisan migrate on your production database at deployment can be dangerous, as this can lock your database, preventing reads and writes.
To give a little more context, let’s briefly look at how locking in MySQL works.
Locking in MySQL
For MySQL to execute a transaction, such as an ALTER TABLE statement, it sometimes has to lock the table to guarantee Isolation.
For example, if you deploy a schema change that increases the size of a varchar column, a lock may be temporarily placed on that entire table so that the transaction can be completed. This means that nobody will be able to access the table (read or write) while the operation is occurring.
There are different types of locks and a lot of different scenarios that affect when and what type of lock is used.
You can find a full list of operations that cause locking in the MySQL docs.
So if you want to avoid downtime or "maintenance mode" due to locking, what do you do instead?
How PlanetScale enables non-blocking schema changes
Online schema change tools allow you to avoid locking. Instead of applying changes directly to a table, we follow this process:
Create a copy of the table (known as a shadow table)
Apply the schema changes
Get the data in sync between both tables
Swap the tables atomically
Drop the old table
PlanetScale handles all of this for you with our branching workflow.
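To make the five steps concrete, here is a toy walk-through using Python's built-in sqlite3 module. It is a sketch of the general shadow-table technique, not PlanetScale's implementation: SQLite stands in for MySQL (and does not enforce VARCHAR lengths), the table names are invented, and a real tool copies rows in batches while replaying writes that arrive during the copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, description VARCHAR(100));
    INSERT INTO posts VALUES (1, 'first'), (2, 'second');
""")

# Steps 1 and 2: create the shadow table with the schema change already applied.
conn.execute(
    "CREATE TABLE _posts_shadow (id INTEGER PRIMARY KEY, description VARCHAR(300))"
)

# Step 3: get the data in sync (a single bulk copy in this sketch).
conn.execute("INSERT INTO _posts_shadow SELECT id, description FROM posts")
conn.commit()

# Step 4: swap the tables. Two renames in one transaction stand in for the
# atomic swap; MySQL's RENAME TABLE can rename both tables in a single statement.
with conn:
    conn.execute("BEGIN")
    conn.execute("ALTER TABLE posts RENAME TO _posts_old")
    conn.execute("ALTER TABLE _posts_shadow RENAME TO posts")

# Step 5: drop the old table.
conn.execute("DROP TABLE _posts_old")

rows = conn.execute("SELECT id, description FROM posts ORDER BY id").fetchall()
tables = [
    name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

After the swap, the application keeps querying posts under the same name and sees the same rows, now stored in the table with the wider column.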
PlanetScale workflow

Whenever you need to make a schema change, you’ll:
Create a PlanetScale development branch (an isolated copy of your database) off of your production schema.
Introduce the changes on the PlanetScale development branch.
When you're finished making schema changes, create a deploy request.
Your team can review and approve it.
Click "Deploy", and your schema changes will be added to the deployment queue.
This is where the online schema change magic happens.
Suppose you add a migration to your Laravel app that runs this SQL to increase the size of a description column:
ALTER TABLE posts MODIFY COLUMN description VARCHAR(300);

When PlanetScale applies that migration via deploy requests, we copy the existing posts table to a new shadow table, update the description column, make sure both are in sync, and initiate the cutover where we swap the two tables.
This way, the original table never has to get locked.
This may not seem like a huge deal if you don’t have a lot of traffic, and the chances of someone trying to access a table during a schema change are small. However, as your application grows and migrations take longer to run, you may need to eventually solve this.
Fortunately, deploying schema changes with PlanetScale doesn’t require much extra effort, and, most importantly, will be a lot less stressful for you in the long run knowing that you will never have to deal with blocking schema changes.
When to run migrations
Now that you know why you shouldn’t run php artisan migrate during deployment, the next natural question is:
When do I run my migrations?
The short answer is: it depends. Let’s look at two examples:
Example 1: You're adding a field to an input form in your application code, which also requires adding a column to one of your tables. In this case, you have to make sure your schema has been updated in production before that application code goes live.
Example 2: You're getting rid of an existing column on one of your tables. In this case, you need to make sure you stop allowing writes to it from the application code before the schema goes live.
As you can see, the type of schema change you're making affects whether you run migrations before or after your application code ships.
To simplify this, the next section includes a blueprint for each scenario. You can follow these steps for each case as they come up for your application. You’ll notice that the first few steps are always the same, with variation in the last few steps.
A note on Laravel migrations
Just to recap, you can still use Laravel migrations to modify your schema, but you should only run them on your application's dev environment. Your dev environment will be connected to your PlanetScale development branch, so the migrations will run on your PlanetScale development database and can be safely merged into production when ready.
Do not run them on your production server. Your production server is connected to your main production PlanetScale database, so PlanetScale is already handling it for you when you deploy your PS dev branch to production.
It’s also worth mentioning that if you do try to run migrations on production, they will fail: to protect your production environment, PlanetScale does not support direct DDL on production branches unless you disable safe migrations. We ultimately leave that decision up to you, but turning off safe migrations means you run the risk of locking tables, which can lead to downtime.
Overall workflow
The following section covers the schema change blueprint that was discussed above. We cover how to add a column/table, drop a column/table, and change a column/table name.
Add a column or table
In this scenario, you want to make sure your schema change is live in production before you start writing to it from your application code.
To add a new column or a new table:
Create a development code branch off of your Laravel application.
Create the Laravel migrations in your application to modify the schema.
Create a PlanetScale development branch.
Connect the code dev branch of your Laravel application to your PlanetScale dev branch.
Run the Laravel migrations to make the schema change on the PlanetScale dev branch.
Deploy your PlanetScale schema change deploy request. This is where the non-blocking schema change workflow happens that was discussed earlier. Once the deployment is complete, your production database will have the new schema.
Once the schema is live, deploy the code to write to the new column.
Drop a column or table
In this scenario, you want the schema change to go live after you update your application code to ensure that your application is no longer using the column or table that you're dropping.
To drop a column or table:
Create a development code branch off of your Laravel application.
Create the Laravel migrations in your application to modify the schema.
Create a PlanetScale development branch.
Connect the code dev branch of your Laravel application to your PlanetScale dev branch.
Run the Laravel migrations to make the schema change on the PlanetScale dev branch.
Deploy the code updates so that you're no longer writing to the column or table.
Once the code is live, deploy your PlanetScale deploy request to drop the column or table.
Change a column name or table name
Changing the name of a column or table is a little more tricky and requires a multi-step process. To avoid downtime, you don’t want to change the name directly, but rather clone the column and rename it there.
Let’s look at the process in the context of changing a table name:
Create a development code branch off of your Laravel application.
Create the Laravel migrations in your application to modify the schema.
Create a PlanetScale development branch.
Connect the code dev branch of your Laravel application to your PlanetScale dev branch.
Run the Laravel migrations to make the schema change on the PlanetScale dev branch.
Deploy the PlanetScale migration that adds a new table with the new name to your production database.
Once that’s live, deploy a code update to begin writing to both the new table and the old table. Continue reading from the old table, as the existing data won’t be copied over yet.
Run a script to copy over the existing data from the old table to the new table.
The tables should now be in sync.
You can now deploy a code update to also read from the new table. At this point, you should not be using the old table at all anymore, making it safe to drop.
Once you confirm you're no longer using the old table, deploy your PlanetScale deploy request to drop the table.
You can use PlanetScale Insights, our in-dashboard query monitoring tool, to help investigate if a table is no longer in use.
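The dual-write and backfill steps above can be sketched as follows, again using Python's built-in sqlite3 module as a stand-in for MySQL. The table names (articles as the old name, posts as the new one) and the helper functions are hypothetical, chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);   -- old name
    INSERT INTO articles VALUES (1, 'written before the rename started');
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);      -- new name
""")

def create_post(post_id, title):
    """Dual-write phase: every new row goes to both the old and the new table."""
    with conn:
        conn.execute("INSERT INTO articles (id, title) VALUES (?, ?)", (post_id, title))
        conn.execute("INSERT INTO posts (id, title) VALUES (?, ?)", (post_id, title))

def backfill():
    """Copy rows that predate dual writes, skipping ids already in the new table."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO posts SELECT id, title FROM articles")

def in_sync():
    """Compare both tables row by row."""
    old = conn.execute("SELECT id, title FROM articles ORDER BY id").fetchall()
    new = conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    return old == new

create_post(2, "written during dual writes")
before = in_sync()   # the new table is still missing the pre-existing row
backfill()
after = in_sync()    # both tables now hold the same rows
```

Only once a check like in_sync passes would it be safe to move reads to the new table and, later, drop the old one.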
Bonus: Revert schema changes in Laravel
Another cool benefit that comes from this online schema change method is the ability to instantly revert a schema change. If you deploy a bad schema change, you have 30 minutes to undo it by clicking a revert button in our dashboard.
How do schema reverts work
We mentioned earlier that instead of directly applying schema changes, we make a copy of the table (shadow table) and apply them to that. Once the tables are in sync, we swap them, making the shadow table the new production table.
After we swap the original table and the shadow table, instead of just dropping the original table, we actually keep it around for 30 minutes. During those 30 minutes, we continue syncing the two tables. Any changes to the production table data are copied back to the original table.
You may have guessed what comes next. With this original table still hanging out, you have the ability to swap them back again, thus undoing the schema change! You can revert a schema change with just a click of a button without losing the data that was written in the meantime.¹
You can learn more about this full process in our How schema reverts work blog post.
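As a rough illustration of the revert window described above, here is a minimal plain-Ruby sketch. It assumes the schema change added a nickname column, and it models the two tables as arrays of hashes. The class RevertWindowSim and its methods are hypothetical and do not reflect PlanetScale's actual implementation.

```ruby
# Hypothetical model of the revert window; arrays of hashes stand in for tables.
class RevertWindowSim
  attr_reader :production, :retained

  def initialize(original_rows)
    # The original table is kept around after the swap.
    @retained = original_rows.map(&:dup)
    # The shadow table (now production) carries the assumed new column.
    @production = original_rows.map { |row| row.merge(nickname: nil) }
  end

  # Writes land on production and are copied back without the new column.
  def insert(row)
    @production << row
    @retained << row.reject { |column, _value| column == :nickname }
  end

  # Reverting swaps the two tables back again.
  def revert!
    @production, @retained = @retained, @production
  end
end

sim = RevertWindowSim.new([{ id: 1, name: "Ada" }])
sim.insert({ id: 2, name: "Grace", nickname: "Amazing Grace" })
sim.revert!
puts sim.production.inspect
```

After the revert, the row written in the meantime is still present, minus the column that the reverted change had added.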
¹ There are some scenarios where a revert may not work. In fact, the ALTER TABLE example we used earlier where we increase the varchar size is one of these scenarios. If any data was written to the table that was larger than the original varchar size, it won’t fit once you revert. In those situations, we will attempt to revert, but if the integrity of your data would be affected we will not proceed.]]></content>
        <summary><![CDATA[Learn how to run no-downtime, non-blocking schema migrations in your production Laravel app with PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>Run SQL script files on a PlanetScale database</title>
        <link href="https://planetscale.com/blog/run-sql-script-files-on-a-planetscale-database" />
        <id>https://planetscale.com/blog/run-sql-script-files-on-a-planetscale-database</id>
        <published>2022-08-25T15:56:00.000Z</published>
        <updated>2022-08-25T15:56:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[If you’ve ever had a large number of commands you need to run against a MySQL database, having to manually type them into the client of your choice can be a bit of a pain. Luckily, using the PlanetScale CLI, you can easily batch commands to your PlanetScale database using script files on your local dev computer!
In this guide, I’ll show you how to create an empty database and populate it with data using a SQL script file. Before you follow along, please make sure you have the following:
A PlanetScale account.
The PlanetScale CLI installed and configured.
You’ll also need to have a script available to run if you don’t have one yet. I’ll be using the following script, which is a snippet from the go-bookings-api sample repository:
CREATE TABLE hotels(
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL,
  address VARCHAR(50) NOT NULL,
  stars FLOAT(2) UNSIGNED
);
INSERT INTO hotels (name, address, stars) VALUES
  ('Hotel California', '1967 Can Never Leave Ln, San Francisco CA, 94016', 7.6),
  ('The Galt House', '140 N Fourth St, Louisville, KY 40202', 8.0);

Save the above SQL to a new file on your system called create_db_script.sql. Open a terminal in the same directory where you saved the file. Start by running the pscale database create command followed by a database name to create a new database:
pscale database create travel_api


Since creating a database on PlanetScale creates the main branch by default, we can use this branch along with the pscale shell command to pipe in the commands from that script file saved earlier. You won’t receive any output if the script ran successfully.
pscale shell travel_api main < ./create_db_script.sql


Now we can use the shell again to run commands manually on the database. You’ll notice the prompt changes to show database_name/branch_name> instead of your default terminal prompt.
pscale shell travel_api main


Run the SHOW TABLES command to show that the hotels table was created.
SHOW TABLES;

You should see this output:
+----------------------+
| Tables_in_travel_api |
+----------------------+
| hotels               |
+----------------------+

Now run a SELECT statement on hotels to see the data that was populated.
SELECT * FROM hotels;


While this was a relatively simple example, imagine a scenario where you need to create and populate an entire schema using just commands. Doing it in this manner can be much simpler than manually entering all these commands in!]]></content>
        <summary><![CDATA[Learn how to run commands in batch against a PlanetScale database using the PlanetScale CLI.]]></summary>
      </entry>
    
      <entry>
        <title>How product design works at PlanetScale</title>
        <link href="https://planetscale.com/blog/how-product-design-works-at-planetscale" />
        <id>https://planetscale.com/blog/how-product-design-works-at-planetscale</id>
        <published>2022-08-22T14:23:00.000Z</published>
        <updated>2022-08-22T14:23:00.000Z</updated>
        
        <author>
          <name>Jason Long</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Our product design process is more lightweight and collaborative than many companies. We don’t have a rigid set of rules we follow. We don’t have product managers. We do just enough design exploration work to feel confident in the direction. Perhaps most interestingly, we don’t do design “handoffs” — we code right alongside the engineers on our teams.
Keeping planning light
Our company roadmap is determined by our leadership team working in tandem with engineering and product design. This is a mix of work that furthers the company’s vision, exposes more of the power of Vitess, and addresses customer feedback. These projects are prioritized and product design begins exploring possible solutions. For larger or less-defined features, sketches on paper or iPad are usually the most efficient way to start focusing in on the preferred direction.

Low-fidelity sketches of deploy requests and schema reverts
Starting with a prototype
Once a direction starts to become clear, we move into high-fidelity mockups in Figma. Sometimes it’s helpful to experiment with a specific element in Codepen. For more intricate user flows, we’ve found it useful to build prototypes in Figma to share with team members. With Figma’s prototyping features, it’s easy to create a realistic UX complete with long-running processes, state changes, and UI transitions. Designers can even observe others while they navigate through a prototype, which often reveals friction points that need smoothing out.

Getting to code quickly
We don’t spend time mocking up every possible UI state. If everyone involved is feeling comfortable with the core parts of the design, we move into code and continue refining the UI there. From here on out, the process is a close collaboration with the engineering team. Our product designers are able to write HTML/JSX and CSS and we can help build out basic components in our Next.js/TypeScript application. We try to address all of the structural and styling pieces, leaving the engineers with something as close to fill-in-the-blanks as possible.
return (
  <>
    {true /* TODO: if anything in deploy queue */ && (
      <div className='rounded-b border-t px-3 py-2'>
        <p>
          {true /* TODO: if queue length = 1 */ && <>There is a deployment queued to deploy</>}
          {false /* TODO: if queue length > 1 */ && (
            <>There are {/* TODO: queue length */} deployments queued to deploy</>
          )}
          ({/* TODO: loop over queue, comma-separate links */}
          <Link href={/* TODO: DR link */}>
            <a>#{/* TODO: DR number */}</a>
          </Link>)
        </p>
      </div>
    )}
  </>
)

Annotating TODOs in a React component
We will often kick off feature development by adding the necessary feature flags and checks to the API and front-end. These flags allow us to enable new features for specific people and teams. Later, the entire company and early-access customers can be included before shipping to everyone.
Our employees have a high level of trust with each other and the autonomy to decide how best to approach a problem, implement a solution, and ship it. Because our product designers can code, we can avoid the standard handoff process. In our experience, this results in less friction between teams and a better product for our customers.]]></content>
        <summary><![CDATA[Learn about the lightweight and highly collaborative process our product design team follows to ship quickly at PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing the PlanetScale serverless driver for JavaScript</title>
        <link href="https://planetscale.com/blog/introducing-the-planetscale-serverless-driver-for-javascript" />
        <id>https://planetscale.com/blog/introducing-the-planetscale-serverless-driver-for-javascript</id>
        <published>2022-08-18T14:31:00.000Z</published>
        <updated>2022-08-18T14:31:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        <author>
          <name>Matt Robenolt</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today we are introducing the PlanetScale serverless driver for JavaScript, a Fetch API-compatible database driver.
This new driver and infrastructure update brings you:
The ability to store and query data in PlanetScale from environments such as Cloudflare Workers, Vercel Edge Functions, and Netlify Edge Functions.
Infrastructure improvements that enable global routing and improve connection reliability and performance.
We put together a demo application that demonstrates how to implement this with Cloudflare Workers, Vercel Edge Functions, and Netlify Edge Functions.
Until today, you could not use PlanetScale in these environments because they require external connections to be made over HTTP and not other networking protocols. Connections with other MySQL drivers speak the MySQL binary protocol over a raw TCP socket. Our new driver uses secure HTTP, which allows you to use PlanetScale in these constrained environments. The driver works in any environment that uses the Fetch API.
In addition to Cloudflare Workers, Vercel Edge Functions, and Netlify Edge Functions, other serverless environments from these and other providers like AWS typically need databases to support hundreds, thousands, or even tens of thousands of database connections. PlanetScale has always handled these high concurrent connections effortlessly, but our underlying infrastructure change provides a much faster connection path.
The next generation of PlanetScale infrastructure
Backing our new driver is a new HTTP API and global routing infrastructure to support it. Note: The HTTP API is currently not documented, but we plan on documenting it when we’re ready to officially release it.
Outside of restrictive environments, an HTTP interface for our database connectivity gives us other benefits in serverless environments, such as a modern TLS stack for faster connections with TLS 1.3, connection multiplexing with HTTP/2, protocol compression with gzip, brotli, or snappy, all of which lead to a better experience for serverless environments.
Our new infrastructure and APIs also enable global routing. Similar to a CDN, global routing reduces latency drastically in situations where a client is connecting from a geographically distant location, which is common within serverless and edge compute environments. A client connects to the closest geographic edge in our network, then backhauls over long-held connection pools over our internal network to reach the actual destination. Within the US, connections from US West to a database in US East see latency drop by 100ms or more in some cases, and the savings grow as the distance increases.
How to use the PlanetScale serverless driver for JavaScript
First, install the package in your environment:
npm install @planetscale/database

You can see the driver's code and documentation here.
The driver will first have you connect with a PlanetScale-provided host, username, and password. In your database Overview page, click the Connect button and select “@planetscale/database” from the “Connect with” dropdown. You can copy and paste the host, user, and password into your code. When deploying your code, we recommend creating environment variables in the serverless platform of your choice to store these variables.
import { connect } from '@planetscale/database'

const config = {
  host: '<host>',
  username: '<user>',
  password: '<password>'
}

Then, once your connection configuration is set, you will connect to and execute a SQL command on PlanetScale.
const conn = connect(config)
const results = await conn.execute('SHOW TABLES')
console.log(results)

The driver also handles your SQL sanitization to help prevent security issues like SQL injection. For example, this is useful in queries like the following with a parameter:
conn.execute('SELECT * FROM users WHERE email=?', ['foo@example.com'])

You can read more about the driver and its features in the PlanetScale serverless driver for JavaScript documentation.
Want to try it out?
You can check out the example application code from GitHub and run the application to try out these features. In the app, you can choose to have the data pulled from a PlanetScale database using Cloudflare Workers, Vercel Edge Functions, or Netlify Edge Functions.

We have separated how each of these functions works. You can see the Cloudflare Workers, Vercel Edge Functions, and Netlify Edge Functions examples in their own subdirectory.
Try it out yourself
Ready to try out the driver in your serverless and edge compute platform of choice? Get started in the PlanetScale documentation.
Tweet at us @planetscale or post in our GitHub Discussion group to share your experience with the new driver.]]></content>
        <summary><![CDATA[You can now use PlanetScale in HTTP-only environments like Cloudflare Workers, Vercel Edge Functions, and Netlify Edge Functions.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing FastPage: Faster offset pagination for Rails apps</title>
        <link href="https://planetscale.com/blog/fastpage-faster-offset-pagination-for-rails-apps" />
        <id>https://planetscale.com/blog/fastpage-faster-offset-pagination-for-rails-apps</id>
        <published>2022-08-16T14:00:00.000Z</published>
        <updated>2022-08-16T14:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We’d like to introduce FastPage, a new gem for ActiveRecord that applies the MySQL “deferred join” optimization to your offset/limit queries.
Here is a slow pagination query in Rails:
Post.all.order(created_at: :desc).limit(25).offset(100)
# Post Load (1228.7ms)  SELECT `posts`.* FROM `posts` ORDER BY `posts`.`created_at` DESC LIMIT 25 OFFSET 100

We add .fast_page to the query and now it’s 2.7× faster!
Post.all.order(created_at: :desc).limit(25).offset(100).fast_page
# Post Pluck (456.9ms)  SELECT `posts`.`id` FROM `posts` ORDER BY `posts`.`created_at` DESC LIMIT 25 OFFSET 100
# Post Load (0.4ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` IN (1271528, 1271527, 1271526, 1271525, 1271524, 1271523, 1271522, 1271521, 1271520, 1271519, 1271518, 1271517, 1271516, 1271515, 1271514, 1271512, 1271513, 1271511, 1271510, 1271509, 1271508, 1271507, 1271506, 1271505, 1271504) ORDER BY `posts`.`created_at` DESC
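
The 2.7× figure follows from the timings in the two log excerpts above. As a quick check, the arithmetic can be reproduced like this (the variable names are just for illustration):

```ruby
# Timings copied from the two log excerpts above, in milliseconds.
baseline_ms  = 1228.7        # single SELECT with LIMIT/OFFSET
fast_page_ms = 456.9 + 0.4   # id-only query plus the follow-up load by id

speedup = baseline_ms / fast_page_ms
puts format("%.1fx faster", speedup)   # prints "2.7x faster"
```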

Benchmark
We wanted to see just how much faster using the deferred join could be. We took a table with about 1 million records in it and benchmarked the standard ActiveRecord offset/limit query vs the query with FastPage.
Here is the query:
AuditLogEvent.page(num).per(100).where(owner: org).order(created_at: :desc)

Both owner and created_at are indexed.

As you can see in the chart above, it’s significantly faster the further into the table we paginate.
How this works
The most common form of pagination is implemented using LIMIT and OFFSET.
In this example, each page returns 50 blog posts. For the first page, we grab the first 50 posts. On the 2nd page we grab 100 posts and throw away the first 50. As the OFFSET increases, each additional page becomes more expensive for the database to serve.
-- Page 1
SELECT * FROM posts ORDER BY created_at DESC LIMIT 50;
-- Page 2
SELECT * FROM posts ORDER BY created_at DESC LIMIT 50 OFFSET 50;
-- Page 3
SELECT * FROM posts ORDER BY created_at DESC LIMIT 50 OFFSET 100;

This method of pagination works well until you have a large number of records. The later pages become very expensive to serve. Because of this, applications will often have to limit the maximum number of pages they allow users to view or switch to cursor-based pagination.
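To make the cost concrete, here is a toy model in plain Ruby. It is a sketch, not how MySQL executes the query: an in-memory array stands in for the ordered posts, and the hypothetical offset_page helper counts how many rows it has to walk past before it can return a page.

```ruby
# Toy model: the "database" walks an ordered list and counts the rows it touches.
POSTS = (1..10_000).map { |id| { id: id, title: "Post #{id}" } }.freeze

def offset_page(rows, limit:, offset:)
  examined = 0
  page = []
  rows.each do |row|
    break if page.size == limit
    examined += 1
    page << row if examined > offset   # rows before the offset are read, then discarded
  end
  [page, examined]
end

[1, 10, 100].each do |page_number|
  _page, examined = offset_page(POSTS, limit: 50, offset: (page_number - 1) * 50)
  puts "page #{page_number}: examined #{examined} rows to return 50"
end
```

In this model, page 100 examines 5,000 rows to return 50 of them, which is the pattern that makes deep pages slow.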
Deferred join technique
High Performance MySQL recommends using a “deferred join” to increase the efficiency of LIMIT/OFFSET pagination for large tables.
SELECT * FROM posts
INNER JOIN (SELECT id FROM posts ORDER BY created_at DESC LIMIT 50 OFFSET 10000)
AS lim USING (id);

Notice that we first select the ID of all the rows we want to show, then the data for those rows. This technique works “because it lets the server examine as little data as possible in an index without accessing rows.”
The FastPage gem makes it easy to apply this optimization to any ActiveRecord::Relation using offset/limit.
To learn more on how this works, check out this blog post: Efficient Pagination Using Deferred Joins.
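The idea can also be sketched outside the database. The following plain-Ruby model is an illustration under simplifying assumptions, not the FastPage implementation: a sorted array of [created_at, id] pairs plays the role of the index, a hash plays the role of the table, and the names ROWS, CREATED_AT_INDEX, and deferred_join_page are hypothetical.

```ruby
# Toy model of a deferred join: page through a narrow index first,
# then fetch full rows only for the ids that made the cut.
ROWS = (1..10_000).to_h { |id| [id, { id: id, created_at: id, body: "x" * 200 }] }
CREATED_AT_INDEX = ROWS.values.map { |row| [row[:created_at], row[:id]] }.sort.reverse.freeze

def deferred_join_page(limit:, offset:)
  # Step 1: skip and limit using only the small index entries.
  ids = CREATED_AT_INDEX.drop(offset).first(limit).map { |_created_at, id| id }
  # Step 2: read the wide rows for just those ids.
  full_rows_read = 0
  page = ids.map do |id|
    full_rows_read += 1
    ROWS.fetch(id)
  end
  [page, full_rows_read]
end

page, full_rows_read = deferred_join_page(limit: 50, offset: 5_000)
puts "full rows read: #{full_rows_read}, first id on page: #{page.first[:id]}"
```

The offset is still paid for, but only against the narrow index entries. The wide rows are read just 50 times regardless of how deep the page is.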
When should I use this?
fast_page works best on pagination queries that include an ORDER BY. It becomes more effective as the page number increases. You should test it on your application’s data to see how it improves your query times.
Because fast_page runs 2 queries instead of 1, it is very likely a bit slower for early pages. The benefits begin as the user gets into deeper pages. It’s worth testing to see at which page your application gets faster from using fast_page and only applying it to your queries from that point on.
posts = Post.all.page(params[:page]).per(25)
# Use fast_page after page 5, where it improves query performance.
# params[:page] arrives as a string, so convert it before comparing.
posts = posts.fast_page if params[:page].to_i > 5

Thank you ❤️
This gem was inspired by Hammerstone’s fast-paginate for Laravel and @aarondfrancis’s excellent blog post: Efficient Pagination Using Deferred Joins. We were so impressed with the results, we had to bring this to Rails as well.]]></content>
        <summary><![CDATA[Introducing FastPage, a new gem for ActiveRecord that speeds up deep pagination queries.]]></summary>
      </entry>
    
      <entry>
        <title>How to kill Sidekiq jobs in Ruby on Rails</title>
        <link href="https://planetscale.com/blog/how-to-kill-sidekiq-jobs-in-ruby-on-rails" />
        <id>https://planetscale.com/blog/how-to-kill-sidekiq-jobs-in-ruby-on-rails</id>
        <published>2022-08-15T14:00:00.000Z</published>
        <updated>2022-08-15T14:00:00.000Z</updated>
        
        <author>
          <name>Elom Gomez</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We’ve all run into situations where we’d like to disable or cancel an errant Sidekiq job in production. The current solutions involve dumping the deploy queue for the specific job or deploying a hot fix to production to disable the job. Initially, these solutions tend to solve your problem, but as your app grows and your job queue gets complicated, you’ll want a solution that is more robust and almost immediate.
In this post, we will learn how we built a Sidekiq Client Middleware to kill Sidekiq jobs in production without relying on deploys.
Sidekiq Client Middleware
We currently make use of the amazing flipper gem at PlanetScale to feature flag new features being built. One of the major pros of Flipper is that it allows us to enable/disable different code paths in production without deploys. Based on this, we decided to use its feature flagging capabilities to help us disable our jobs in production. We created a middleware, lib/sidekiq_middleware/sidekiq_jobs_flipper.rb, which you can find below:
module SidekiqMiddleware
  class SidekiqJobsFlipper
    def call(worker_class, job, queue, redis_pool)
      # return false/nil to stop the job from going to redis
      klass = worker_class.to_s
      if Flipper.enabled?("disable_#{klass.underscore.to_sym}")
        return false
      end

      yield
    end
  end
end

The Sidekiq client middleware runs when pushing a job to Redis. The following snippet from the middleware allows us to short circuit any jobs that have been feature flagged to be disabled:
if Flipper.enabled?("disable_#{klass.underscore.to_sym}")
  return false
end
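
To see the short circuit in isolation, here is a minimal plain-Ruby sketch that runs without Sidekiq, Flipper, or Redis. Everything in it is a stand-in: FLAGS replaces Flipper, REDIS_QUEUE replaces Redis, enqueue mimics the client push, and the underscore helper is a simplified version of ActiveSupport's String#underscore.

```ruby
# Stand-ins for Flipper and the Sidekiq client; none of this is the real API.
FLAGS = { "disable_invoice_job" => true }
REDIS_QUEUE = []

class JobsFlipperSketch
  def call(worker_class, _job, _queue, _redis_pool)
    flag = "disable_#{underscore(worker_class.to_s)}"
    return false if FLAGS[flag]   # a falsy return stops the push
    yield
  end

  private

  # Simplified version of ActiveSupport's String#underscore.
  def underscore(name)
    name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
end

# Mimics the client: the job is only enqueued if the middleware yields.
def enqueue(worker_class, args)
  job = { "class" => worker_class, "args" => args }
  JobsFlipperSketch.new.call(worker_class, job, "default", nil) do
    REDIS_QUEUE << job
    job
  end
end

invoice_result = enqueue("InvoiceJob", [42])   # flagged off, never reaches the queue
receipt_result = enqueue("ReceiptJob", [7])    # no flag set, enqueued as usual
puts REDIS_QUEUE.map { |job| job["class"] }.inspect   # prints ["ReceiptJob"]
```

Because the flag is checked at push time, enabling it stops new jobs from being enqueued. In this sketch it has no effect on jobs that are already in the queue.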

Example usage
If you want to disable jobs in production in your own application, you can use this middleware.
Create a new middleware file and paste in the middleware code from the previous section.
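Add the middleware to your Sidekiq configuration in config/initializers/sidekiq.rb. Register it in both the configure_client and configure_server blocks, since the Sidekiq server can also push jobs:
Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add(SidekiqMiddleware::SidekiqJobsFlipper)
  end
end
Sidekiq.configure_server do |config|
  config.client_middleware do |chain|
    chain.add(SidekiqMiddleware::SidekiqJobsFlipper)
  end
end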

To disable a job in production, you would .underscore the job class name and prefix it with disable_. Let’s take the InvoiceJob below as an example:
class InvoiceJob
  include Sidekiq::Worker

  def perform(...)
    ...
  end
end

In this case, the class name InvoiceJob becomes disable_invoice_job.
To disable InvoiceJob, run the following command in the console:
Flipper.enable("disable_invoice_job")
]]></content>
        <summary><![CDATA[Learn how PlanetScale uses a custom middleware to kill our Sidekiq jobs in production without relying on deploys.]]></summary>
      </entry>
    
      <entry>
        <title>Database DevOps</title>
        <link href="https://planetscale.com/blog/database-devops" />
        <id>https://planetscale.com/blog/database-devops</id>
        <published>2022-08-08T17:03:57.138Z</published>
        <updated>2022-08-08T17:03:57.138Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[One of the earliest realizations we made at PlanetScale is that the world of databases had been practically untouched by DevOps practices. The benefits of DevOps are numerous, including faster product delivery and continuous integration, so why wouldn’t we want this for our database?

When you see a description of DevOps, it is usually accompanied by the cycle shown in the above graphic. However, it mostly only applies to stateless workloads.
Building out a DevOps toolchain allows you to write code, build, test, deploy, and monitor with tools like GitHub, CircleCI, Kubernetes, and Datadog. Over the last decade, these tools and many like them have become ubiquitous within the space, but databases are nowhere to be seen.
Until PlanetScale, the database has existed outside of the DevOps cycle. It has existed as an amorphous blob of state that engineers and operators have to tip toe around. Nobody has thought to ask, why can’t the database be a nimble and additive part of the software development process?
Continuous Deployment
A critical part of DevOps is being able to continually deploy code. This not only enables your business to move quickly and iterate, it means your local development isn’t far behind production. Instead of shipping stamped versions of software, you are always deploying. Continually deploying code is largely a solved problem. However, any features that have more depth than UI changes will likely require schema changes at the database level.
With the current state of legacy databases, engineers are left to manually apply database operations, such as schema changes, out of band of the typical CI/CD process. In more complex or higher scale environments, engineers may even need to open a ticket for a DBA team to apply schema changes in service of avoiding downtime or data loss. This can add weeks to a process that should be fast.
In contrast to the DevOps cycle above, the schema change process looks like this:

How PlanetScale enables DevOps
We set about to build something totally new: A database that can be natively part of the DevOps process. Database changes should be deployed and contain the semantics we’ve become used to in code deployments: automation, safeguards, and rollbacks.

PlanetScale development branches
At the beginning of the feature development process, a developer will normally create a git branch to work against. With PlanetScale, you can also create a database branch to pair with your git branch and act as your development environment. PlanetScale branches are essentially isolated copies of your database that developers can create to test schema changes before deploying to production. This ensures that developers are building against a production-like environment without risk. Another benefit of database branches is that they are cloud-hosted, meaning engineers can share the environment for collaboration or previews.
Next in the process is CI, and PlanetScale also has a role to play here. Our CLI lets you automate the task of creating branches that can serve as an isolated testing environment. Schema changes can be applied and tested on real data without risk to production.
Deploy requests
When you are ready to deploy, you can use PlanetScale Deploy Requests to safely put your schema changes into production. Deploy requests allow you to comment on suggested changes, as well as require approval and sign off. Once everything is good to go, your schema changes will roll into production online and without locking. This can be automated as part of your CD process without any manual steps. If your deploy causes errors, you can instantly roll back your database deploy without data loss, and the old version of the schema will be reinstated.
Query monitoring
The final critical part of the DevOps cycle is monitoring. PlanetScale Insights gives you the ability to monitor queries in real time. You also get an interactive graph with an overlay showing when deploys have happened contrasted against your performance data. This allows you to gain insight and continually improve your application’s performance.
There we have it, the principles of the DevOps lifecycle applied to an ACID compliant relational database.
]]></content>
        <summary><![CDATA[Learn how PlanetScale enables databases to seamlessly fit into the DevOps lifecycle.]]></summary>
      </entry>
    
      <entry>
        <title>How PlanetScale prevents MySQL downtime</title>
        <link href="https://planetscale.com/blog/how-planetscale-prevents-mysql-downtime" />
        <id>https://planetscale.com/blog/how-planetscale-prevents-mysql-downtime</id>
        <published>2022-08-02T14:01:00.000Z</published>
        <updated>2022-08-02T14:01:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[The cost of downtime can range from loss of business to severe reputation damage. Database downtime feels inevitable but is often preventable.
The causes of database issues that lead to downtime can be categorized in the following ways:
Human error
System immaturity
Application issues
In this post, we will explore how PlanetScale can mitigate all three.
Downtime due to human error
We’ve all done it, dropped the wrong table or index and caused queries to dramatically slow or fail. This can lead to entire site outages, as well as cascading failure that impacts other systems. With standard MySQL, if you drop a table that you later find out was still in use, you are in a situation where you now have to restore from backup. Selecting the right backup, restoring, and bringing your site back online can take hours, which of course leads to high levels of stress for you and your team.
To help prevent this type of outage, PlanetScale warns you if the table to be dropped was recently queried. This will help you avoid the mistake of dropping a table that is in use.

If you do happen to deploy a schema that has issues, such as a sub-optimal index that causes query performance degradation or a dropped column that causes errors, we also let you roll back the schema deployment without any loss of data.
Downtime due to system immaturity
Building a highly available database is hard. It takes decades for databases to come to maturity. People often bet on solutions that have traded scalability for approachability. On the surface, this feels like the right trade off when you are just starting a project, but can very quickly backfire when user demand for your services increases.
We have built PlanetScale on top of Vitess, the database clustering system for horizontal scaling of MySQL, which was built by the database team at YouTube to power YouTube.com. Since Vitess was open sourced, it has been adopted by GitHub, Slack, Etsy, Roblox, and many more. PlanetScale are also the maintainers of Vitess.
Being used and contributed to by some of the largest sites on the internet means that Vitess has been pressure tested at scale.
The reason we chose to build PlanetScale on top of Vitess starts with our fundamental belief in the importance of developer experience. Developer experience starts with approachability but is only maintained with reliability and scalability. Building on a database platform that supports billions of users ensures that our users will not be forced to trust an immature system. We know Vitess scales and will work for the people that trust us with one of the most critical pieces of their application.
Downtime due to application issues
Bugs happen. You can’t deploy perfect software all the time. A common reason for database outages is bad application deploys causing spikes of excessive database load. This can be caused by poorly performing (slow) queries or too many queries at once.
PlanetScale Insights is a next generation monitoring solution that helps you discover bad queries in real time. When we receive a query from your application, we send it through a data pipeline that logs the query and its performance metrics. This allows you to gain an aggregated view of the queries your application sends. Finally, you can use query comments to tag and identify the source of queries.


Try it out
At PlanetScale, we’re committed to delivering a high-performance scalable database that doesn’t require you to give up developer experience. Vitess allows us to build on a proven mature solution. And we will continue to ship new in-dashboard tools and alerts to help you identify issues caused by application code or human error before they make it to production.
You can sign up for an account or contact us today to get started.]]></content>
        <summary><![CDATA[Learn how PlanetScale protects against downtime due to human error, system immaturity, and app issues.]]></summary>
      </entry>
    
      <entry>
        <title>Ruby on Rails: 3 tips for deleting data at scale</title>
        <link href="https://planetscale.com/blog/ruby-on-rails-3-tips-for-deleting-data-at-scale" />
        <id>https://planetscale.com/blog/ruby-on-rails-3-tips-for-deleting-data-at-scale</id>
        <published>2022-08-01T14:00:00.000Z</published>
        <updated>2022-08-01T14:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We’ve seen that as Rails applications grow, there are a few common issues that teams run into with deleting data.
In this post, you’ll learn a few strategies you can use to mitigate the risks of cleaning up data on a high scale Rails application.
How Rails deletes associated data
Rails applications at scale generally run into issues when deleting many records at once. This happens most commonly in models with many associations.
The standard way to delete associated data in Rails is to let ActiveRecord handle it via dependent: :destroy. In the following example, when the parent model (author) is deleted, all data in the dependent models will get deleted by ActiveRecord as well.
class Author < ApplicationRecord
  has_many :books, dependent: :destroy
end

class Book < ApplicationRecord
  belongs_to :author
end

The database schema looks like this:
ActiveRecord::Schema[7.1].define(version: 2022_06_06_171750) do
  create_table "authors", force: true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "books", force: true do |t|
    t.string   "name"
    t.text     "description"
    t.bigint   "author_id", null: false
    t.datetime "created_at"
    t.datetime "updated_at"
    t.index    ["author_id"], name: "index_books_on_author_id"
  end
end

There is an indexed foreign key, but no foreign key constraint. ActiveRecord is responsible for deleting the data.
Now that we’ve covered the typical way to delete associated data, let’s look at some tips to improve this.
Tip #1: Use ActiveRecord’s destroy_async
As of Rails 6.1, ActiveRecord supports dependent: :destroy_async. It works similarly to dependent: :destroy, except that it runs the deletion in a background job rather than within the request.
This protects you from triggering a large number of deletes within a single transaction. As a Rails application grows, it can be very easy to unintentionally delete a parent record and trigger a cascade of thousands of deletions. Having all of this happen within a request can lead to timeouts and added strain on your database.
Replace any usage of Foreign Key ON DELETE CASCADE
Foreign key constraints are used in databases to manage referential integrity between tables. Specifically, developers will use ON DELETE CASCADE to delete all associated records when the parent record is deleted. This is an option some Rails applications will use rather than the standard dependent: :destroy.
This works well when the child data is limited. It becomes a problem when deleting a large number of child records. A simple delete can suddenly turn into a massive operation deleting thousands of records across multiple tables. This results in the user's DELETE request taking several seconds to respond or timing out. In the database, this can lead to excessive locking, increased replication lag, and other issues that affect other parts of the application.
We recommend replacing any usage of ON DELETE CASCADE foreign key constraints with dependent: :destroy_async for safer deletes at scale.
Look out for failing validations
One issue to look out for with destroy_async is the risk of validations failing in a child model when deleting data. Since it’s happening asynchronously, the user will be unaware of any errors and the job will end up in your error queue. If any child records have validations on delete, we recommend running them from the parent model. This will stop the deletion from occurring and alert the user of the issue. This is an important area to add test coverage to protect from any regressions.
Tip #2: Understanding delete vs destroy
We have two primary methods for deleting data, delete and destroy, as well as their related delete_all and destroy_all on ActiveRecord relations.
destroy — Deletes the record while also triggering the model's callbacks
delete — Skips the callbacks and deletes the record directly from the database
If you have callbacks set up, then you’ll generally want to use destroy so that they are called. It’s important, though, to be aware of all the activity that could be caused by those callbacks, especially when destroying a large number of records. For example, a cron job for cleaning up old data would be better suited to using delete_all to skip callbacks.
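To make the difference concrete, here is a minimal plain-Ruby sketch that mimics the behavior described above. It does not use ActiveRecord: ToyRecord, STORE, AUDIT_LOG, and the before_destroy method are hypothetical names invented for illustration only.

```ruby
# Toy model (not ActiveRecord): destroy runs a callback before removing
# the row, while delete removes the row directly.
class ToyRecord
  STORE = {}      # stands in for the database table
  AUDIT_LOG = []  # side effect produced by the callback

  attr_reader :id

  def initialize(id)
    @id = id
    STORE[id] = self
  end

  # Hypothetical callback, similar in spirit to a before_destroy hook
  def before_destroy
    AUDIT_LOG << "record #{id} destroyed"
  end

  # destroy: run callbacks, then remove the row
  def destroy
    before_destroy
    STORE.delete(id)
  end

  # delete: remove the row, skipping callbacks
  def delete
    STORE.delete(id)
  end
end

ToyRecord.new(1).destroy
ToyRecord.new(2).delete

puts ToyRecord::STORE.size         # 0, both rows are gone
puts ToyRecord::AUDIT_LOG.inspect  # only record 1 triggered the callback
```

In a real application the callback might enqueue jobs, touch other tables, or call external services, which is why skipping it matters for bulk cleanup.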
Tip #3: Safely mass deleting old data
When there is no use for data anymore, it’s a common practice to archive or delete it.
For large busy tables, deleting a large number of records at once can lock the table and have unintended consequences to the rest of the application. The safe way is to continuously run deletes in small batches.
Here is an example Sidekiq job that can be scheduled by a cron to run once per hour:
# frozen_string_literal: true

class DeleteOldDataJob < BaseJob
  # We only want 1 instance of this job running at a time
  sidekiq_options unique_for: 1.hour, unique_until: :start, queue: :background, retry: false

  def perform(limit: 500)
    # Deletes up to `limit` records (500 by default)
    deleted = Model.where("created_at < ?", 3.months.ago).limit(limit).delete_all

    # If more records to delete, requeue itself and run again
    if deleted == limit
      self.class.perform_async
    end
  end
end

This example is making use of Sidekiq's unique jobs. This protects us from having several of these jobs running concurrently (which could result in deadlocks). If you are using a job system without uniqueness, an alternative is setting up a queue with a concurrency of 1 and running the cleanup job there.
How to test it
This job is a good place to add test coverage to ensure you're deleting the correct data. Here’s an example pattern you can use.
# frozen_string_literal: true

require "test_helper"

class DeleteOldDataJobTest < ActiveJob::TestCase
  test "deletes data over 3 months old" do
    expired = create(:data, minute: 3.months.ago - 1.hour)
    retained = create(:data, minute: 3.months.ago + 1.hour)

    DeleteOldDataJob.new.perform

    assert Data.where(id: expired.id).empty?
    assert Data.where(id: retained.id).exists?
  end

  test "requeues if more to delete" do
    create(:data, minute: 3.months.ago - 1.hour)
    create(:data, minute: 3.months.ago - 1.hour)

    assert_enqueued_sidekiq_jobs(1, only: DeleteOldDataJob) do
      DeleteOldDataJob.new.perform(limit: 1)
    end
  end
end

Rails and PlanetScale
If you do make a mistake while deleting data or making schema changes, PlanetScale offers some solutions. From the dashboard, you have the option to instantly revert a bad schema change without losing any data. We also throw a warning in the dashboard if you're dropping a table that was recently queried, so you can hopefully catch any mistakes before they happen.
To get started with Rails and PlanetScale, check out the Rails quickstart.]]></content>
        <summary><![CDATA[Learn how to delete data in large Rails apps with destroy_async, delete, destroy, and scheduled cron jobs.]]></summary>
      </entry>
    
      <entry>
        <title>The Slotted Counter Pattern</title>
        <link href="https://planetscale.com/blog/the-slotted-counter-pattern" />
        <id>https://planetscale.com/blog/the-slotted-counter-pattern</id>
        <published>2022-07-28T16:34:56.745Z</published>
        <updated>2022-07-28T16:34:56.745Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[It is a common database pattern to increment an INT column when an event happens, such as a download or page view.
You can go far with this pattern until bursts of these types of events happen in parallel and you experience contention on a single row. When multiple transactions are trying to update the counter, you are essentially forcing these transactions to run serially, which is bad for concurrency and can cause deadlocks. You can also see dramatic increases in query time when bursts like this occur.
You can check if you are experiencing contention by running the following:
SHOW ENGINE INNODB STATUS\G

In the output, you’ll see some information about granting a lock:
---TRANSACTION 79853106, ACTIVE 5 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1128, 1 row lock(s)
MySQL thread id 24, OS thread handle 6281670656, query id 107 localhost root updating
UPDATE slotted_counters SET count = count + 1 WHERE id = 1
------- TRX HAS BEEN WAITING 5 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2 page no 4 n bits 184 index PRIMARY of table `github`.`downloads` trx id 79853106 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 000004c27630; asc     v0;;
 2: len 7; hex 020000017d0ce9; asc     }  ;;
 3: len 4; hex 8000007b; asc    {;;
 4: len 4; hex 800001c8; asc     ;;
 5: len 4; hex 80000019; asc     ;;
 6: len 4; hex 8230df9b; asc  0  ;;

You can see that this transaction has been waiting a significant amount of time to acquire a lock to increment the counter on this single row. It is clashing with other competing transactions.
MySQL is the main database for GitHub.com, and back in the day, when a number of PlanetScale folks worked there, we had to do this kind of counting differently. We decided on using a separate table with a schema similar to this:
CREATE TABLE `slotted_counters` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `record_type` int(11) NOT NULL,
  `record_id` int(11) NOT NULL,
  `slot` int(11) NOT NULL DEFAULT '0',
  `count` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `records_and_slots` (`record_type`,`record_id`,`slot`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

record_type — The type of counter (allows us to keep the table generic)
record_id — Identifies whatever we are counting, it could map to a repository id for example
slot — The slot we are going to increment
count — The count for each slot
A typical increment query would look like:
INSERT INTO slotted_counters(record_type, record_id, slot, count)
VALUES (123, 456, FLOOR(RAND() * 100), 1)
ON DUPLICATE KEY UPDATE count = count + 1;

The idea here is that instead of incrementing a single row for a counter, we are now picking a slot and incrementing the count in that slot. This means instead of hammering a single row, we are spreading the updates across 100 rows and reducing the potential for contention.
Once we have run the above INSERT a few times, we can see the counter rows:
mysql> select * from slotted_counters;

+----+-------------+-----------+------+-------+
| id | record_type | record_id | slot | count |
+----+-------------+-----------+------+-------+
|  1 | 123         |       456 |    2 |    21 |
|  2 | 123         |       456 |   52 |    99 |
|  3 | 123         |       456 |   55 |   321 |
|  4 | 123         |       456 |    0 |   442 |
|  7 | 123         |       456 |   48 |    69 |
|  8 | 123         |       456 |   20 |   661 |
|  9 | 123         |       456 |   56 |    62 |
| 10 | 123         |       456 |   18 |   371 |
| 11 | 123         |       456 |   22 |   127 |
| 12 | 123         |       456 |   58 |    33 |
| 13 | 123         |       456 |   23 |   322 |
+----+-------------+-----------+------+-------+
11 rows in set (0.00 sec)

Getting the count for record_id 456 is as simple as this SELECT query:
SELECT SUM(count) as count FROM slotted_counters
WHERE (record_type = 123 AND record_id = 456);

Now we can have requests executing counter increments in parallel without causing contention or hurting concurrency.
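The pattern can also be sketched without a database. The following in-memory Ruby model is only an illustration: the SlottedCounter class and its method names are hypothetical, the hash key stands in for the unique (record_type, record_id, slot) index, and a slot count of 100 is assumed.

```ruby
# In-memory sketch of the slotted counter pattern (no database involved).
class SlottedCounter
  SLOTS = 100 # assumed number of slots per counter

  def initialize
    @rows = Hash.new(0) # key: [record_type, record_id, slot], value: count
  end

  # Mirrors INSERT ... ON DUPLICATE KEY UPDATE count = count + 1
  def increment(record_type, record_id, slot: rand(SLOTS))
    @rows[[record_type, record_id, slot]] += 1
  end

  # Mirrors SELECT SUM(count) ... WHERE record_type = ? AND record_id = ?
  def total(record_type, record_id)
    @rows.sum do |(type, id, _slot), count|
      type == record_type && id == record_id ? count : 0
    end
  end

  # Number of distinct slot rows in use for one counter
  def slots_used(record_type, record_id)
    @rows.keys.count { |type, id, _slot| type == record_type && id == record_id }
  end
end

counter = SlottedCounter.new
1_000.times { counter.increment(123, 456) }
counter.increment(123, 789)

puts counter.total(123, 456)       # 1000
puts counter.slots_used(123, 456)  # at most 100
```

Each increment touches one of up to 100 keys instead of a single hot key, which is the same reason the SQL version reduces row lock contention.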
There are a few different ways you can implement this pattern, but it comes down to the architecture of your app. One way would be to query the slotted_counters table to roll up the data and update a column stored with the rest of the data.]]></content>
        <summary><![CDATA[Handle MySQL increment counter bursts with the Slotted Counter Pattern]]></summary>
      </entry>
    
      <entry>
        <title>Behind the scenes: How we built Password Roles</title>
        <link href="https://planetscale.com/blog/behind-the-scenes-how-we-built-password-roles" />
        <id>https://planetscale.com/blog/behind-the-scenes-how-we-built-password-roles</id>
        <published>2022-07-27T14:00:00.000Z</published>
        <updated>2022-07-27T14:00:00.000Z</updated>
        
        <author>
          <name>Phani Raju</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[We recently released a new feature that allows you to use granular roles for your database passwords. When you generate a new password, you have the option to select from the following roles: read-only, write-only, read/write, and admin.
We implemented password roles using Vitess Access Control Lists and VTTablet, but we ran into a couple of roadblocks on the way. In this post, we look at some of the issues we faced while implementing this, and how we were able to get around them.
How Vitess handles user authorization
Vitess supports authorization via a static configuration file supplied to its vttablet.
You can see an example of what this file looks like below:
{
  "table_groups": [
    {
      "name": "planetscale user groups",
      "table_names_or_prefixes": ["%"],
      "readers": ["planetscale-reader", "planetscale-writer", "planetscale-admin"],
      "writers": ["planetscale-writer", "planetscale-writer-only", "planetscale-admin"],
      "admins": ["planetscale-admin"]
    }
  ]
}

Dissecting the ACL configuration file
table_names_or_prefixes — This is the list of tables that this policy applies to. % means all tables.
readers — This set of users can read data and schema from tables and views in the database.
writers — This set of users can write data to tables in the database.
admins — This set of users can read, update data, and alter schema in the database.
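As a rough illustration of how these fields combine, the sketch below evaluates the example configuration in plain Ruby. It is not vttablet's implementation. The allowed? helper is hypothetical, and it assumes that an entry ending in % is a prefix match while any other entry is an exact table name.

```ruby
require "json"

# Sketch only: mirrors the semantics described above, not vttablet's code.
ACL_JSON = <<~JSON
  {
    "table_groups": [
      {
        "name": "planetscale user groups",
        "table_names_or_prefixes": ["%"],
        "readers": ["planetscale-reader", "planetscale-writer", "planetscale-admin"],
        "writers": ["planetscale-writer", "planetscale-writer-only", "planetscale-admin"],
        "admins": ["planetscale-admin"]
      }
    ]
  }
JSON

# Hypothetical helper: is `user` listed under `role` ("readers", "writers",
# or "admins") in any group whose patterns cover `table`?
def allowed?(config, user:, role:, table:)
  config.fetch("table_groups").any? do |group|
    covers = group.fetch("table_names_or_prefixes").any? do |pattern|
      if pattern.end_with?("%")
        table.start_with?(pattern.chomp("%")) # assumed prefix semantics
      else
        table == pattern
      end
    end
    covers && group.fetch(role, []).include?(user)
  end
end

config = JSON.parse(ACL_JSON)

puts allowed?(config, user: "planetscale-reader", role: "readers", table: "books")      # true
puts allowed?(config, user: "planetscale-reader", role: "writers", table: "books")      # false
puts allowed?(config, user: "planetscale-writer-only", role: "readers", table: "books") # false
```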
Although this configuration is static, Vitess allows you to customize this list over the runtime of a vttablet by reloading this file from disk at a specific interval using the --table-acl-config-reload-interval argument.
This static file approach is great for customers who:
Have a pre-defined set of users that can access the database. Managing the various access groups in the above file is easier if the entire set of users is well-known.
Have a small set of ACL configuration files that are custom to each of their Vitess clusters.
Can plan for a maintenance schedule when updating this file on the pod/shared volume for their Vitess deployments.
Why doesn’t this work for PlanetScale?
None of the qualifiers above apply to PlanetScale customers. As a design principle, we try to avoid keeping authentication and authorization state on the actual vttablet pods themselves.
We can run into all kinds of issues depending on a refresh interval for a file on the vttablet pod. Let’s look at some of them.
Our user story of “Customers can create passwords with roles and they work immediately” prevents the dependence on a timed refresh loop on the vttablets. Since we don’t know when the file will be refreshed next, we can’t guarantee that your credentials will work immediately.
The operator, responsible for managing the vttablet, will need to write to a file on the pod and might not be able to update all vttablet pods at once, leading to a race condition where a given credential might be admin on one pod and reader on another.
If a pod goes down and needs to be restarted, we don’t have an external ACL store to figure out what the ACL state for each database should be before bringing it up again.
We’d need to maintain a separate set of state for each customer database that cannot be common to all PlanetScale databases.
With these issues in mind, we were able to come up with a solution that gave our customers a seamless experience they have come to expect from PlanetScale.
How we use the static ACL file to implement password roles
For every password created by a user, we store the following bits of information in the credential database:
Display name
Role
password_sha1_hash
password_sha2_hash
The Role property determines which of the three vttablet ACL roles (readers, writers, or admins) you’ll get mapped to.
If you create a write-only password and connect to your PlanetScale database, the query hits the user query frontend, which is a service responsible for all user-facing functionality for PlanetScale databases.

As shown in the diagram above, this approach lets us solve all of the issues we discussed in the previous section.
Having a dynamic user credential store allows us to create/delete user mappings to roles instantly, without the need for a refresh interval.
We have a predefined set of user names which describe the access grant for each of the roles we support, e.g. planetscale-reader can only read data and schema. By mapping all PlanetScale users’ roles to the username from the Vitess ACL configuration, we can do an “on the fly” rewrite of the security principal so that connections to the database get the right access levels.
Since all authentication and authorization data is stored on an external data store, all pods that we create for a database will have the same ACL state.
Since the base ACL configuration is the same across all PlanetScale databases, debugging and fixing any issues with ACL enforcement is simplified.
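The principal rewrite described above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the actual service: the credential store is a plain Hash, the credential names are made up, and the role-to-username mapping is inferred from the example ACL file.

```ruby
# Sketch of the "on the fly" principal rewrite. Only the four ACL usernames
# come from the example config; everything else here is hypothetical.
ROLE_TO_ACL_USER = {
  "read-only"  => "planetscale-reader",
  "write-only" => "planetscale-writer-only",
  "read/write" => "planetscale-writer",
  "admin"      => "planetscale-admin"
}.freeze

# Stand-in for the external credential database
CREDENTIALS = {
  "pscale_user_abc" => { display_name: "reporting", role: "read-only" },
  "pscale_user_xyz" => { display_name: "ingest",    role: "write-only" }
}

# Returns the ACL username a connection should run as, or raises if the
# credential is unknown (for example, a deleted password).
def acl_principal_for(username)
  credential = CREDENTIALS.fetch(username) do
    raise ArgumentError, "unknown credential: #{username}"
  end
  ROLE_TO_ACL_USER.fetch(credential[:role])
end

puts acl_principal_for("pscale_user_abc") # planetscale-reader

# A newly created password is usable on the next lookup, with no reload interval
CREDENTIALS["pscale_user_new"] = { display_name: "deploys", role: "admin" }
puts acl_principal_for("pscale_user_new") # planetscale-admin
```

Because the static ACL file only ever names the fixed set of principals, it can stay identical on every pod while the per-customer state lives in the external store.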
Wrap up
If you’d like to learn more about password roles, please check out our Password roles documentation. You can sign up for a PlanetScale account and try it out today.
If you have any questions, make sure to find us on Twitter.]]></content>
        <summary><![CDATA[Learn how we leveraged Vitess ACLs and VTTablet to build our password roles functionality]]></summary>
      </entry>
    
      <entry>
        <title>Safely dropping MySQL tables</title>
        <link href="https://planetscale.com/blog/safely-dropping-mysql-tables" />
        <id>https://planetscale.com/blog/safely-dropping-mysql-tables</id>
        <published>2022-07-25T14:00:00.000Z</published>
        <updated>2023-10-26T14:00:00.000Z</updated>
        
        <author>
          <name>David Graham</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Dropping or removing an unused table from a database schema can be challenging. Even after triple checking that all of your apps have migrated away from querying the table, there may still be that one rogue script that accesses it.
Running the dreaded DROP TABLE statement can cause a host of unintentional problems if that table is still being used elsewhere. It completely erases the table definition, partitions, and the data in that table. Before you drop a table, you should double (or triple!) check when a table was last queried.
When was the table you want to drop last accessed?
You can manually check when a table was last accessed, but it's a bit complicated. The following query will show you the last time a table was written to, but not the last time it was read:
SELECT update_time FROM information_schema.tables WHERE table_name='tablename'

To see when a table was last accessed in general, you can use the audit plugin for MySQL Enterprise Edition, which lets you see which users ran which queries per connection. The audit record itself has some customization options, though they are limited.
To see when a table was last modified, you can also check when the given <tablename>.ibd file was updated. On MySQL 5.7 and earlier, you can also check the .frm file for DDL changes, which can give you the last known modification (MySQL 8.0 no longer uses .frm files).
So, you do have some options to find the last time a table was modified, but the solutions aren't very straightforward. Doing this each time you want to drop a table could drastically slow your team's path to production.
Using PlanetScale to safely drop MySQL tables
At PlanetScale, our mission is to create the most scalable, developer-friendly database platform. Dropping tables is never fun, but we wanted to make the process as stress-free as possible. To accomplish this, we built an in-dashboard feature that checks if tables are truly unused during deploy requests and warns you if the table to be dropped was recently queried.

Identifying table usage with Insights
On top of warning you, we also want to help you find when and where the table is being queried. If you run into this warning, you can use Insights, our in-dashboard query monitoring tool, to help identify where the table is being queried.
With Insights, you can narrow down your analysis to individual query performance. We also surface SQL comments on queries, so you can tag your queries with additional information to track down where they came from.
Instrumenting queries with comment tags can help you identify which application is still using the table. Once you remove the query from any remaining applications, you can confidently drop the table.
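As a rough sketch of what tagging could look like, the Ruby helper below appends key='value' pairs as a trailing SQL comment. The tag_query helper, the tag names, and the table name are hypothetical, and the sqlcommenter-style format is an assumption rather than a documented requirement.

```ruby
# Hypothetical helper that appends key='value' tags as a trailing SQL comment.
def tag_query(sql, tags)
  pairs = tags.sort_by { |key, _value| key.to_s }.map do |key, value|
    # Strip characters that could close the comment or the quoted value early
    safe = value.to_s.gsub("*/", "").gsub("'", "")
    "#{key}='#{safe}'"
  end
  "#{sql} /*#{pairs.join(',')}*/"
end

puts tag_query(
  "SELECT * FROM legacy_orders WHERE id = 1",
  { app: "billing-worker", job: "nightly_export" }
)
# SELECT * FROM legacy_orders WHERE id = 1 /*app='billing-worker',job='nightly_export'*/
```

With tags like these on every query, a remaining read against the table to be dropped points directly at the application and job that issued it.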

Queries against individual tables can always be found by going to your Insights page and using the table:<name> query syntax in the filter input box, as shown below. This reveals how many dependencies there are on the table before attempting to drop it.

Try it out
Hopefully this addition will make cleaning up unused tables a little less stressful. For more information about how to use Insights, check out our documentation.
We love hearing from you! If you have any questions or feedback, don’t hesitate to contact us.]]></content>
        <summary><![CDATA[Learn how to safely drop tables in MySQL by checking if the table is still in use and how PlanetScale makes this process much easier.]]></summary>
      </entry>
    
      <entry>
        <title>Temporal Workflows at scale with PlanetScale: Part 1</title>
        <link href="https://planetscale.com/blog/temporal-workflows-at-scale-with-planetscale-part-1" />
        <id>https://planetscale.com/blog/temporal-workflows-at-scale-with-planetscale-part-1</id>
        <published>2022-07-22T14:45:00.000Z</published>
        <updated>2022-07-22T14:45:00.000Z</updated>
        
        <author>
          <name>Savannah Longoria</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[As more services shift towards the cloud, many organizations seek microservices to streamline development, improve reliability, and accelerate feature delivery. Developing and deploying microservices can have a hidden complexity. If you are migrating away from a monolithic service that uses transactions to keep data safe and consistent, how do you translate that to a distributed microservices world?
In this blog, the first of a two-part series, we will introduce you to Temporal and PlanetScale DB and demonstrate the advantages of using these two powerful technologies to manage your workflows reliably and with less effort.
What is Temporal?
As an organization and application scale, monoliths tend to become groups of microservices, and developers have to start thinking about how these separate systems work together. If you're a software developer who began your career in the past seven years, you’ve probably already seen a lot of distributed systems concepts through your daily work. Temporal is an open source, distributed, and scalable workflow orchestration engine capable of concurrently running millions of workflows.
Temporal takes care of many distributed systems patterns for you. It allows you to code at a new, higher level of abstraction, where you don’t have to concern yourself with reimplementing these patterns. With Temporal, you get reliability, fault tolerance, and scalability out of the box, and you can focus on just coding your business logic.
Let’s break this down a little by looking at some definitions:
Workflows — Workflows hold state and describe which activities or tasks should be carried out.
Activities — Activities are tasks that might fail. For example, calling a service. They’re automatically retried, and execution is distributed via task queues to a pool of workers.
Essentially with Temporal, failure handling is taken off the hands of application developers and handled by the engine. It provides the illusion of infallible, reliable function executions and will tell our code when to run. Registering workflow implementations and the activity implementations this workflow needs to run is critical.
In the figure below, you will find an illustration of a Temporal Cluster, which consists of four independently scalable services. (Source: Temporal.io)

These independently scalable services include:
Frontend gateway (rate limiting, routing, authorizing)
History subsystem to maintain data (mutable state and timers)
Matching subsystem to host task queues for dispatching
Worker service to handle the internal background workflows
Durability
Temporal captures the progress of a workflow execution (or workflow steps) in a log called the history. In case of a crash, Temporal rehydrates the workflow; that is, Temporal restarts the workflow execution, deduplicates the invocation of all activities that have already been executed, and catches up to where it previously left off. It does all this without requiring anything special for this to happen in the application code, meaning failure handling is entirely outsourced to Temporal.
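The replay idea can be illustrated with a toy model in plain Ruby. This is not the Temporal SDK: ToyWorkflowRun, the activity names, and the simulated crash are all hypothetical, and the history is an in-memory array rather than a persistent data store.

```ruby
# Toy model of history-based replay: a restarted workflow skips
# activities whose results are already recorded in the history.
class ToyWorkflowRun
  attr_reader :history

  def initialize(history = [])
    @history = history # append-only list of [activity_name, result]
    @position = 0      # index of the next activity call in this execution
  end

  # Runs the block only if this step has no recorded result yet.
  def activity(name)
    if @position < @history.length
      recorded_name, result = @history[@position]
      raise "non-deterministic workflow" unless recorded_name == name
    else
      result = yield
      @history << [name, result]
    end
    @position += 1
    result
  end
end

calls = Hash.new(0) # how many times each activity body really executed

workflow = lambda do |run, crash_after_charge: false|
  run.activity("reserve") { calls["reserve"] += 1; "reserved" }
  run.activity("charge")  { calls["charge"] += 1; "charged" }
  raise "worker crashed" if crash_after_charge
  run.activity("email")   { calls["email"] += 1; "sent" }
end

first = ToyWorkflowRun.new
begin
  workflow.call(first, crash_after_charge: true)
rescue RuntimeError
  # The worker died, but the recorded history survives
end

# A new worker replays the same workflow code against the saved history
workflow.call(ToyWorkflowRun.new(first.history))

puts calls.inspect # each activity body ran exactly once
```

The application code contains no recovery logic. The second execution reaches the point of the crash by reading recorded results and only then runs new work.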
“Long-running” Workflow examples
To clearly illustrate what it’s like to program with Temporal, it’s worth discussing long-running workflows and some business use cases relevant to developers and infrastructure teams. Long-running isn’t really about some arbitrary cutoff in time — it can be short or infinitely long. A workflow might be something you already have implemented in a single service or application. Below are two examples I found helpful from Temporal’s blog:
Box uses Temporal for orchestrating file update operations. Although this can take hours for large transfers, most of these feel instantaneous to users. Ideally, we want one solution to scale from the smallest to largest use cases with no more visible latency than necessary. Box uses Temporal more for transactional and reliability guarantees around microservice orchestration, and the words "Long Running" were never even mentioned.
Checkr uses Temporal for coordinating background checks. This is a multi-staged process with a vast range in processing times, ranging from pinging a database search API to dispatching a court researcher to a courthouse, followed by analyzing each record and potentially escalating to manual QA. The process could take days, and Temporal solves this by persisting event histories as a source of truth, solving for both observability and reliability in one fell swoop.
In practice, this means you can write infinitely long-running Workflows. For example, you could use this for various e-commerce cases such as:
Coordinating actions like loyalty rewards
Subscription Charges
Setting up reminder emails over the entire lifetime of your relationship with the customer.
Where does PlanetScale fit in?
A Temporal cluster is a Temporal Server paired with a persistence layer (i.e., the data access layer). All the workflow data—task queues, execution state, activity logs—are stored in a persistent Data Store. Temporal offers two storage options:
A SQL option (namely, MySQL and PostgreSQL)
A No-SQL option (namely Cassandra)
If you choose SQL, you get operational simplicity at the cost of scalability. If you decide on No-SQL, you get scalability at the cost of operational complexity. If you choose PlanetScale, you get both: operational simplicity and scalability.
The database stores the following types of data:
Tasks to be dispatched
The state of Workflow Executions
The mutable state of Workflow Executions
Event History, which provides an append-only log of Workflow Execution History Events
Namespace metadata for each Namespace in the Cluster
Visibility data, which enables operations like "show all running Workflow Executions"
At its core, PlanetScale is MySQL with Vitess as middleware.
Vitess was built in 2010 to solve scaling issues at YouTube. Since then, the open source project has continued to grow and now helps several companies like Slack and Square handle their massive data scaling needs.
This means that we are built for heavily distributed applications experiencing a high load. A PlanetScale database completes the simplest Temporal deployment diagram.

PlanetScale horizontally scales by combining an arbitrary number of MySQL instances and by horizontally partitioning your data over these clusters according to a customizable partitioning strategy. Since Temporal also uses horizontal partitioning (more information can be found here), Temporal maps effortlessly onto PlanetScale and can take full advantage of PlanetScale’s scalability improvements over a single MySQL instance.
Getting started with Temporal
There are four ways to install and run a Temporal Cluster quickly:
Docker: Using Docker Compose makes it easy to develop your Temporal Application locally.
Render: Our temporalio/docker-compose experience has been translated to Render's Blueprint format for an alternative cloud connection.
Helm charts: Deploying a Cluster to Kubernetes is an easy way to test the system and develop Temporal Applications.
Gitpod: One-click deployments are available for Go and TypeScript.
Temporal does not recommend using any of these methods in a full (production) environment, so we’ll only use these for development. To use PlanetScale with Temporal, you can use docker-compose and manually create temporal and temporal_visibility tables in PlanetScale.
In the next blog, we will walk you through setting up your docker-compose files to run in PlanetScale using this example.]]></content>
        <summary><![CDATA[Learn how to create a more reliable workflow with Temporal and PlanetScale]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Teams: An easier way to manage database administrator access</title>
        <link href="https://planetscale.com/blog/announcing-teams-an-easier-way-to-manage-database-administrator-access" />
        <id>https://planetscale.com/blog/announcing-teams-an-easier-way-to-manage-database-administrator-access</id>
        <published>2022-07-20T15:00:00.000Z</published>
        <updated>2022-07-20T15:00:00.000Z</updated>
        
        <author>
          <name>Iheanyi Ekechukwu</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[When it comes to something as important as database access, we believe you should have all of the tools necessary to control who and what can access your data.
Today, we're announcing Teams, an easier way to manage database administrator access within your PlanetScale organization.

What are Teams?
With Teams, you can create groups within your organization, add members, and assign them administrator access to one or more databases. You still have the option to add administrators to a database directly in the database settings page, but if you want to manage admin access all in one place, Teams is the way to go.

Teams is available on all plans.
Directory Sync
As part of this release, we're also introducing Directory Sync, which allows you to manage your PlanetScale organization's members and teams through your SSO Directory. If you have SSO through your plan, or have purchased it as an add-on, this feature is immediately available to you.

With Directory Sync and Teams, you can combine other access tools, such as Indent, enabling users to request ephemeral access to specific databases and directories.
Try it yourself
You can start creating teams today in the PlanetScale dashboard. Simply go to your organization's settings page and click "Teams" in the left-side navigation. For Directory Sync, you can configure this within the "Authentication" section of the organization settings if you have SSO enabled. Contact us if you're interested in adding SSO to your organization.
For more information, check out our Teams and Directory Sync documentation. If you have any questions, don't hesitate to contact us.]]></content>
        <summary><![CDATA[Learn how you can manage database access with Teams and Directory Sync]]></summary>
      </entry>
    
      <entry>
        <title>We now display PlanetScale system status directly in your dashboard</title>
        <link href="https://planetscale.com/blog/we-now-display-planetscale-system-status-directly-in-your-dashboard" />
        <id>https://planetscale.com/blog/we-now-display-planetscale-system-status-directly-in-your-dashboard</id>
        <published>2022-07-19T14:45:00.000Z</published>
        <updated>2022-07-19T14:45:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[For most of our customers, their database is the most critical piece of their infrastructure. On the rare occasion an issue comes up, the more information, the better. We recently shipped an improvement to the database page to make it easy to find out if there are any issues with PlanetScale itself. Any time there’s an update to our status page, you’ll now also see it in the UI as well.

How we built it
The PlanetScale dashboard is a Next.js app backed by a Ruby on Rails API. We felt this would be the perfect opportunity to make use of Vercel's new edge functions. This would allow us to proxy our status page API and cache it for super fast ~30ms response times.
If you use StatusPage, you could adapt this code to implement the same component in your own applications.
The Edge function
export const config = {
  runtime: 'experimental-edge'
}

export default async () => {
  const res = await fetch('https://www.planetscalestatus.com/api/v2/incidents/unresolved.json')
  const json = await res.json()
  // Most recent unresolved incident, if there is one
  const latest = json?.incidents?.[0]

  // Only build a URL when an incident exists; otherwise return an empty object
  const incident = latest
    ? { ...latest, url: `https://www.planetscalestatus.com/incidents/${latest.id}` }
    : {}

  return new Response(JSON.stringify(incident), {
    status: 200,
    headers: {
      'content-type': 'application/json',
      'cache-control': 's-maxage=1, stale-while-revalidate'
    }
  })
}

Take note of the cache-control header. This is an important detail that instructs Vercel to serve our users with the response from their cache while updating the cache in the background.
This ensures users always get a super fast response, and the data is up-to-date as well. It works perfectly for this use case.
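To make that caching behavior concrete, here is a minimal Python sketch of how a cache might treat s-maxage=1, stale-while-revalidate. It is a toy model and not Vercel's implementation: the SWRCache class, the injected clock, and the inline "background" refresh are all assumptions made for illustration.

```python
from typing import Callable, Optional


class SWRCache:
    """Toy model of `s-maxage=N, stale-while-revalidate` (hypothetical, not Vercel's code)."""

    def __init__(self, fetch: Callable[[], str], max_age: float, clock: Callable[[], float]):
        self.fetch = fetch          # calls the origin, e.g. the status page API
        self.max_age = max_age      # seconds a response counts as fresh
        self.clock = clock          # injected so the example is deterministic
        self.value: Optional[str] = None
        self.stored_at = 0.0

    def get(self) -> str:
        now = self.clock()
        if self.value is None:
            # Cold cache: the first caller has to wait for the origin.
            self.value, self.stored_at = self.fetch(), now
            return self.value
        if now - self.stored_at <= self.max_age:
            return self.value       # fresh: served straight from cache
        # Stale: answer with the old value, then refresh.
        # A real CDN refreshes in the background; here the refresh runs
        # inline after the stale value has been captured.
        stale = self.value
        self.value, self.stored_at = self.fetch(), now
        return stale


# Simulated origin that returns a new payload on every call.
calls = []

def origin() -> str:
    calls.append(1)
    return f"response-{len(calls)}"

now = [0.0]
cache = SWRCache(origin, max_age=1, clock=lambda: now[0])

first = cache.get()      # cold: fetches response-1
now[0] = 0.5
second = cache.get()     # fresh: still response-1, no origin call
now[0] = 5.0
third = cache.get()      # stale: returns response-1, refreshes to response-2
fourth = cache.get()     # fresh again: response-2
```

The third call is the interesting one: the caller gets the cached answer without waiting, and the next caller sees the refreshed data.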

The React component
In our UI, we hit the edge function to check for any statuses, and then display the most recent one, if available.
import React from 'react'

import useSWR from 'swr'

import { SWR_OPTIONS, fetchSWR } from '@/utils/api'

import Icon from './Icon'

interface Incident {
  id: string
  url: string
  name: string
}

const IncidentStatus: React.FC = () => {
  const { data } = useSWR<Incident>(['/api/incidents', SWR_OPTIONS], fetchSWR)

  if (!data?.id) {
    return null
  }

  return (
    <div className='flex items-center gap-x-sm text-sm'>
      <Icon name='alert-circle' className='text-red' />
      <span className='font-medium'>{data.name}</span>
      &middot;
      <a href={data.url} className='flex items-center' target='_blank' rel='noreferrer'>
        View status
        <Icon name='external-link' />
      </a>
    </div>
  )
}

export default IncidentStatus

Let’s connect
We hope this addition is helpful to your workflow. At PlanetScale, we’re users of our own product, so we’re constantly trying to figure out new ways to improve developer experience, both for you and ourselves.
If you have any feedback or questions, we’d love to hear from you. You can contact us or find us on Twitter.]]></content>
        <summary><![CDATA[Learn about how we built the new in-app system status using Vercel edge functions and StatusPage]]></summary>
      </entry>
    
      <entry>
        <title>How do Database Indexes Work?</title>
        <link href="https://planetscale.com/blog/how-do-database-indexes-work" />
        <id>https://planetscale.com/blog/how-do-database-indexes-work</id>
        <published>2022-07-14T15:18:36.310Z</published>
        <updated>2022-07-14T15:18:36.310Z</updated>
        
        <author>
          <name>Justin Gage</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[If you’ve queried a database, you’ve probably used an index, even if you didn’t know it at the time. Database indexes help speed up read queries by creating ancillary data structures that make your scans faster. This post will walk through how database indexes work, with a particular focus on MySQL, everyone’s (well, many people’s) favorite homegrown organic database.
After you read through how database indexes work, make sure you check out our video on how to speed things up using composite indexes with hashed functions.
Indexes speed up your read queries
Indexes are basically a way to speed up your read queries, particularly ones with filters (WHERE) on them. They’re data structures that exist in your database engine, outside of whatever table they work with, that point to data you’re trying to query.
To avoid the all-too-common librarian metaphor, imagine – perhaps a far out scenario – you’ve got all of your users in a table in your MySQL database (running on PlanetScale, of course). You’re building some functionality into your social app that allows users to search and filter for other users, which means there will be a query running in production [1] that runs through your entire users table. Even worse, your app is quite successful, so there are hundreds of thousands of these “users.” Imagine that!
Unfortunately, SELECT-ing from that users table isn’t very performant right now. The filters you’re applying based on inputs in the UI – user location, account type, most recent activity, and other columns in your database – require scans of the entire users table, which is O(N): every row has to be examined to find all the matches. Your query is taking 5-6 seconds to run, which would be fine as a data analyst doing internal work, but isn’t up to snuff for a smooth user experience.
To remedy this, you create a database index on the “most recent activity” column in your users table with CREATE INDEX. Behind the scenes, MySQL goes and creates a new pseudo-table in the database with two columns: a value for most recent activity, and a pointer to the record in the users table. The trick here, though, is that the table gets ordered and stored as a B-tree (a balanced search tree), with ordered values for that most recent activity column. As a result, a lookup takes O(log N) steps, and your query only takes a second or less to run.
This is the basic gist of indexes. If you know you’re going to run a specific query repeatedly and worry about read performance, creating an index (or several) can help speed that query up.
That’s the simple version though; there’s a lot more going on under the hood, and too many indexes can even degrade that performance you sought out to improve.
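The difference between a full scan and an indexed lookup can be sketched in a few lines of Python. This is an illustration only: the users list and its values are made up, and a sorted list searched with bisect stands in for the tree structure a real database maintains.

```python
import bisect

# Hypothetical users table: list position is the row id,
# each row is (name, most_recent_activity as a day number).
users = [
    ("ana", 17), ("bo", 3), ("cy", 42), ("dee", 8),
    ("eli", 29), ("fay", 3), ("gus", 35), ("hal", 21),
]

def full_scan(min_day):
    """No index: examine every row and keep the ones that match."""
    examined = 0
    matches = []
    for row_id, (_, day) in enumerate(users):
        examined += 1
        if day >= min_day:
            matches.append(row_id)
    return matches, examined

# "CREATE INDEX": a sorted list of (indexed value, pointer to row).
index = sorted((day, row_id) for row_id, (_, day) in enumerate(users))

def index_lookup(min_day):
    """With the index: binary search for the first qualifying entry, then read forward."""
    start = bisect.bisect_left(index, (min_day, -1))
    return [row_id for _, row_id in index[start:]]

scan_rows, examined = full_scan(30)
indexed_rows = index_lookup(30)
```

Both functions find the same rows, but the scan touches all eight rows while the indexed lookup jumps straight to the first matching entry.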
How database indexes work under the hood
An index is not magic – it’s a database structure that contains pointers to specific database records. Without an index, data in a database usually gets stored as a heap, basically a pile of unordered [2], unsorted rows. In fact, this is actually a setting you can toggle in Microsoft SQL Server and Azure SQL Database.
In practice, data is rarely stored completely unsorted. Instead, you’ll usually use some sort of primary key – which in MySQL can be identical to an index – that could be something like an auto-incrementing integer. But data can, of course, only be sorted by one column, which limits the “binary” efficiency of sorting (with unique values) to a query that filters on that one, ordered column. An index is basically a way of letting your table be sorted by multiple columns, which lets you get binary search efficiency on multiple filter columns.
When you create an index on a column, you’re creating a new table with two columns: one for the column you indexed on, and one that contains a pointer to where the record in question is stored. Though the index will be of an identical length to the original table, the width will likely be a good degree shorter, and as such will take fewer disk blocks to store and traverse. Pointers in MySQL are pretty small, usually fewer than 5 bytes. The afore-linked now “legendary” Stack Overflow post runs through the math around the number of blocks required for storage, for those interested in a deeper look.
Even if you haven’t created any yourself, there are likely already several indexes in whatever database you’re using right now. You can see the indexes that exist on a particular table with:
SHOW INDEX FROM table_name FROM db_name;

If you run an EXPLAIN statement on your query, you should also see information about which indexes the query plans to use. Here’s the table of possible EXPLAIN output values from the MySQL docs; note the possible_keys and key values, both relating to index selection.
Column | JSON name | Definition
id | select_id | The SELECT identifier
select_type | None | The SELECT type
table | table_name | The table for the output row
partitions | partitions | The matching partitions
type | access_type | The join type
possible_keys | possible_keys | The possible indexes to choose
key | key | The index actually chosen
key_len | key_length | The length of the chosen key
ref | ref | The columns compared to the index
rows | rows | Estimate of rows to be examined
filtered | filtered | Percentage of rows filtered by table condition
Extra | None | Additional information
This way of using EXPLAIN can also be useful for debugging when creating an index, i.e. verifying that a new one is working as intended.
While an index always stores pointers to the records it covers, the database doesn’t always need to follow those pointers. An index-only scan (a covering index in MySQL) refers to when every value your query needs is contained in the index itself, and thus doesn’t require a table lookup. In PostgreSQL, not all index types support index-only scans.
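Here is a small, runnable way to see a covering index in a query plan. It uses SQLite through Python's standard library as a stand-in, so the plan text differs from what MySQL's EXPLAIN prints; the table and index names are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, last_active INTEGER)")
conn.execute("CREATE INDEX users_last_active ON users (last_active)")
conn.executemany(
    "INSERT INTO users (name, last_active) VALUES (?, ?)",
    [("ana", 17), ("bo", 3), ("cy", 42)],
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (17,)).fetchall()
    return " ".join(row[-1] for row in rows)

# Only the indexed column is requested, so the index alone can answer it.
covered = plan("SELECT last_active FROM users WHERE last_active = ?")

# `name` is not in the index, so each match needs a lookup in the table.
not_covered = plan("SELECT name FROM users WHERE last_active = ?")

print(covered)
print(not_covered)
```

The first plan reports a covering index, while the second uses the same index but still has to visit the table for the name column.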
One thing to remember about indexes is that while they’re beneficial for read queries (including JOINs and aggregates), they can negatively impact your database performance as a whole if you’re not careful. Indexes take up a lot of space, even if that space is smaller than the initial table, and that space is precious; you’ll need to keep an eye on how close you’re inching towards your filesystem’s limit. They also make writes (INSERT, and UPDATE or DELETE on indexed columns) take longer, because every affected index has to be updated too, and force the query engine to consider more options before choosing how to execute a query.
Different types of database indexes
We’ve been mostly referencing database indexes that are unique, but that turns out to be only one of several types. Keep in mind that any of these indexes can have more than one column (besides the pointer) in them. MySQL actually supports up to 16 columns in an index. But we digress.
Keys
You may see “key” used interchangeably with “index.” In MySQL, KEY is a synonym for INDEX, and without the UNIQUE keyword it creates a non-unique index. Because values can repeat, a lookup may have to read several matching entries rather than stopping at one, but it will still be far more efficient than a linear search. Many columns in your tables will likely fit this bill, e.g. something like “first name” or “location.”
MySQL and Postgres have a pretty major difference between them in how they handle primary key storage. In MySQL, primary keys are stored with their data, which means they effectively take up no extra storage space. In Postgres, primary keys are treated like other indexes and stored in separate data structures.
Unique indexes
A unique index conforms to the traditional unique definition – no two identical non-NULL values – with the caveat that the values are sorted. When data fits those two definitions – unique and sorted – you can use binary search and get that sweet, sweet O(log N). Note that a primary key is a specific type of unique index, where there can only be one per table and the value cannot be NULL.
As an aside, UNIQUE indexes can be used as a way to enforce uniqueness on a particular table column. Since indexes are updated on every insert, trying to insert a non-unique value into a column with a unique index on it will throw an error. The Postgres docs call this out explicitly.
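A quick sketch of that enforcement, again using SQLite from Python's standard library as a stand-in for MySQL (the users table and email column are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX users_email ON users (email)")

conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

try:
    # Same value again: the unique index rejects the insert.
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# NULLs are not considered equal to each other, so several are allowed.
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES (NULL)")
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The duplicate email raises an error, while the two NULL rows are accepted, which matches the "no two identical non-NULL values" definition above.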
Text indexes
Using the FULLTEXT qualifier will create a full-text index in MySQL; other relational databases offer similar features under their own syntax. In MySQL, it can only be applied to text-type columns (CHAR, VARCHAR, or TEXT). Where things get interesting is using these indexes: MySQL provides a lot of functionality out of the box that starts to resemble what you’d expect from a modern text parsing / NLP library.
The main text searching keyword set in MySQL is MATCH()... AGAINST(). You can choose from a few different search methods, including:
Natural language search – no special operators, uses the built-in stopwords that you can customize. This is the default search type.
Boolean search – uses a special query language that’s roughly analogous to RegEx (but very different).
Query expansion search – runs a regular natural language search, then another one using words from the returned rows, hence the “expansion.”
Each of these methods has tradeoffs; none is one-size-fits-all.
There are other types of indexes in MySQL, like prefix indexes or descending indexes. For more complete info, check out the MySQL docs on indexes.
Creating indexes in MySQL
MySQL uses pretty standard syntax for creating indexes:
CREATE [type] INDEX index_name ON table_name (column_name)

Though the command can be simple, the amount of customization available is pretty impressive. Here’s the full roster of available options from the MySQL docs:
CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
    [index_type]
    ON tbl_name (key_part,...)
    [index_option]
    [algorithm_option | lock_option] ...

key_part: {col_name [(length)] | (expr)} [ASC | DESC]

index_option: {
    KEY_BLOCK_SIZE [=] value
  | index_type
  | WITH PARSER parser_name
  | COMMENT 'string'
  | {VISIBLE | INVISIBLE}
  | ENGINE_ATTRIBUTE [=] 'string'
  | SECONDARY_ENGINE_ATTRIBUTE [=] 'string'
}

index_type:
    USING {BTREE | HASH}

algorithm_option:
    ALGORITHM [=] {DEFAULT | INPLACE | COPY}

lock_option:
    LOCK [=] {DEFAULT | NONE | SHARED | EXCLUSIVE}

For example, if we wanted to create an index on the “most recent activity” column in our users table, we’d use the following options. Keep in mind that this column contains non-unique values, so we’ll skip the UNIQUE keyword when creating the index.
CREATE INDEX users_most_recent_activity
ON users (most_recent_activity DESC)
COMMENT 'for querying by most recent activity';

This is on the simpler side. You can customize any of the following:
Type of index – as discussed above. You can select non-unique (no keyword), unique, fulltext, etc.
Index type – stored as a B-tree or hash.
Misc. options – key block size, special parsers, comments, etc.
Algorithms and locks used
To go deeper, check out our videos on indexes in MySQL:
B+ trees
Primary keys
Secondary keys
Primary key data types
Where to add indexes
Index selectivity
Prefix indexes
Composite indexes
Covering indexes
Functional indexes
Indexing JSON columns
Indexing for wildcard searches
Fulltext indexes
Invisible indexes
Duplicate indexes
Footnotes
1 — Large read queries in production aren’t always common, but you can imagine that this comes up a lot in analytics.
2 — Well, technically ordered in the order in which you inserted them into the table.]]></content>
        <summary><![CDATA[Learn how database indexes work under the hood and how they can be used to speed up queries]]></summary>
      </entry>
    
      <entry>
        <title>Getting started with the PlanetScale CLI</title>
        <link href="https://planetscale.com/blog/getting-started-with-the-planetscale-cli" />
        <id>https://planetscale.com/blog/getting-started-with-the-planetscale-cli</id>
        <published>2022-07-12T14:58:00.000Z</published>
        <updated>2022-07-12T14:58:00.000Z</updated>
        
        <author>
          <name>Brian Morrison II</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Most modern platforms offer an excellent UI designed to get developers up and running quickly, and PlanetScale is no exception. However, a great command-line interface (also known as a CLI) can help developers in ways such as speeding up their tasks with the platform or automating tasks using scripts and DevOps tools.
In this article, we take a look at how you can quickly get up and running with the CLI, as well as perform some of the common tasks within PlanetScale like:
viewing databases
creating databases
running SQL
and deploying schema changes
Let’s jump in!
Connecting to PlanetScale
Before you can use the CLI, make sure you have installed it per the guide on our documentation portal. You can verify that you have the CLI installed by running the following command in the terminal:
pscale --version

This should list the version of the PlanetScale CLI you have installed (v0.107.0 at the time of this writing).

Let’s start by connecting to the PlanetScale service. In your terminal, run the following command and you should receive a confirmation code. A browser window should open as well, displaying the same code:
pscale login

If you don’t already have a PlanetScale account, you can also create one straight from the CLI.


Confirm that the codes match, then click the Confirm code button in your browser. Your terminal will display the message “Successfully logged in.” if done correctly.
Working with Databases
Let’s cover some of the common commands you’ll use when working with databases.
List your databases
To print a list of your databases, run:
pscale database list

As you can see, I have a single database created in my account:

For the remainder of this walk-through, we’re going to go through some real-world examples using an internal blogging platform we built with Next.js and Prisma. One of the common entities across most (if not all) blogging platforms is the Post, so let’s use this as a base to create a database and table. Here is the Post data model as defined with an ORM called Prisma:
model Post {
  id          Int          @id @default(autoincrement())
  title       String       @db.VarChar(255)
  content     String       @db.Text
  contentHtml String       @db.Text
  hidden      Boolean      @default(false)
  createdAt   DateTime     @default(now())
  updatedAt   DateTime     @updatedAt
  author      User         @relation(fields: [authorId], references: [id], onDelete: Cascade)
  authorId    String
  likedBy     LikedPosts[]
  comments    Comment[]
  @@index([authorId])
  @@fulltext([title, content])
}

Create a database
To create a database, you can run the following command, where <DATABASE_NAME> is the name of the database you want to create:
pscale database create <DATABASE_NAME>

In this article, we’ll create and work with a database called cli-db.

MySQL shell
Now, we need to drop into a MySQL shell within the database to create a table. To do this, run the following command:
pscale shell cli-db

Your terminal prompt should change to indicate you are now connected to and running commands in the context of the database we just created.

Since this is a new database, we don’t have any tables created yet. Let’s run the following command to create a table that mirrors the Post model.
CREATE TABLE `Post` (
  `id` int NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `content` text NOT NULL,
  `contentHtml` text NOT NULL,
  `hidden` tinyint(1) NOT NULL DEFAULT '0',
  `createdAt` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  `updatedAt` datetime(3) NOT NULL,
  `authorId` varchar(191) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `Post_authorId_idx` (`authorId`),
  FULLTEXT KEY `Post_title_content_idx` (`title`,`content`)
);

Show tables
When you hit enter, you shouldn’t get any output. You can check that the table exists with:
SHOW TABLES;


Working with Branches
On top of managing your databases and tables, you can also manage your branches with the PlanetScale CLI.
List all branches
To demonstrate this, start by listing your existing branches on the database we created with:
pscale branch list cli-db


Promote a branch
In this example, there is only one branch (main), but it’s currently not flagged as a production branch, so let’s promote it to production using:
pscale branch promote cli-db main


Now that we have main set as our production branch, let’s create a branch off of main called dev. Run the following command to create that branch:
pscale branch create cli-db dev

You should get a message stating the branch was created successfully.

You can also check the dashboard to verify the branch exists.

Now that you have another branch to work on, let’s modify the schema of the dev branch and merge it into the main branch. Drop into a shell again with:
pscale shell cli-db

Since you have multiple branches, the CLI will ask which branch you want to enter. Select dev and hit enter.

Add a new column called tag with the following SQL command in the shell.
ALTER TABLE Post ADD tag varchar(255);

You can use the DESCRIBE command to view how the table looks now.
DESCRIBE Post;


As you can see, tag is now added to the schema in dev.
Create a deploy request
Now let’s merge the changes in the dev branch into main by creating a new deploy request using the CLI.
pscale deploy-request create cli-db dev

This creates the deploy request for the cli-db database, and we’re stating we want to merge the dev branch into the production branch.

List all deploy requests
You can show all active deploy requests with:
pscale deploy-request list cli-db


You can also see the deploy requests in the dashboard using the Deploy requests tab.

Merge a deploy request
To finish this off, let’s merge this deploy request into the main branch. In your terminal, run the following where 1 is the deploy request number shown from the previous step:
pscale deploy-request deploy cli-db 1



Now you can check the schema of your main branch using the MySQL shell from the CLI. Enter into the shell with:
pscale shell cli-db

Select the main branch from the terminal. Describe the Post table again to verify that our changes are now active.
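Since automation was one of the reasons for reaching for the CLI, here is a rough Python sketch of scripting the branch and deploy request steps from this walkthrough. It only uses the pscale commands shown above; the helper names are made up, the ALTER TABLE step from the interactive shell is left out, and the deploy request number is passed in because the walkthrough reads it from the list output.

```python
import subprocess

def schema_change_commands(database, branch, deploy_request_number):
    """Build the pscale invocations from this walkthrough, in order."""
    return [
        ["pscale", "branch", "create", database, branch],
        ["pscale", "deploy-request", "create", database, branch],
        ["pscale", "deploy-request", "list", database],
        ["pscale", "deploy-request", "deploy", database, str(deploy_request_number)],
    ]

def run_all(commands):
    """Run each command, stopping at the first failure.

    Requires the pscale CLI to be installed and logged in.
    """
    for argv in commands:
        subprocess.run(argv, check=True)

commands = schema_change_commands("cli-db", "dev", 1)
for argv in commands:
    print(" ".join(argv))
```

Printing the commands first makes for an easy dry run; calling run_all(commands) would execute them against your account.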

To learn more about the PlanetScale CLI, you can use the CLI Reference page in our docs which lists all of the available commands and how to use them. You can also use pscale --help to list available commands for further help within your terminal.]]></content>
        <summary><![CDATA[Learn how to quickly get up and running with the PlanetScale CLI.]]></summary>
      </entry>
    
      <entry>
        <title>Consensus algorithms at scale: Part 8 - Closing thoughts</title>
        <link href="https://planetscale.com/blog/consensus-algorithms-at-scale-part-8" />
        <id>https://planetscale.com/blog/consensus-algorithms-at-scale-part-8</id>
        <published>2022-07-07T15:02:00.000Z</published>
        <updated>2022-07-07T15:02:00.000Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[If you’re still catching up, you can find links to each article in the series at the bottom of this article.
We started off this series by challenging the premise that algorithms like Paxos and Raft are foundational to consensus systems. Such a premise would imply that any other algorithm would just be a variation of the original ones. These algorithms are foundational from a historical perspective, but they are not conceptually foundational.
We also showed that these algorithms are too rigid. I feel that they would struggle to adapt to the growing complexities of cloud deployments. FlexPaxos was the first advancement that highlighted that the majority quorum is just a special case of intersecting quorums. And intersecting quorums would allow you to configure systems with more flexibility.
Reconceptualizing consensus systems
In this series, we have attempted to reconceptualize the other parts of consensus systems in the following manner:
Pluggable durability
A consensus system can be designed in such a way that it assumes nothing about the durability rules. These can be specified with a plugin, and the system should be able to fulfill these requirements without breaking integrity. Of course, the requirements have to be reasonable. We covered some examples in part 3.
A system that supports pluggable durability allows you to deploy additional nodes to the system without majorly affecting its performance characteristics. For example, if you had specified the durability requirement as cross-zone, deploying additional nodes to a zone keeps the system behaving mostly the same way.
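As a rough illustration of what "pluggable" could mean, the sketch below models a durability rule as a plain function that the rest of the system calls without knowing its internals. The Node type, the policy signature, and both policies are assumptions for this example; this is not the Vitess plugin API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Node:
    name: str
    zone: str

# A durability policy answers one question: given the leader and the
# followers that acknowledged a request, is the request durable?
DurabilityPolicy = Callable[[Node, List[Node]], bool]

def cross_zone(leader: Node, ackers: List[Node]) -> bool:
    """Durable once at least one follower outside the leader's zone has acked."""
    return any(node.zone != leader.zone for node in ackers)

def majority_of(cluster_size: int) -> DurabilityPolicy:
    """Classic majority quorum, expressed as just another policy."""
    def policy(leader: Node, ackers: List[Node]) -> bool:
        # The leader counts as one vote towards the majority.
        return len(ackers) + 1 > cluster_size // 2
    return policy

leader = Node("a1", "zone-a")
same_zone_ack = [Node("a2", "zone-a")]
other_zone_ack = [Node("b1", "zone-b")]

# Adding more zone-a nodes never changes what cross_zone requires:
# a request still needs one ack from another zone.
not_durable = cross_zone(leader, same_zone_ack + [Node("a3", "zone-a")])
durable = cross_zone(leader, other_zone_ack)
```

Under the cross-zone policy, extra nodes in the leader's zone change nothing about when a request becomes durable, whereas a majority policy would need more acks as the cluster grows.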
Revocation and Establishment of leadership
We have reconceptualized a leadership change as a two-step process: revocation and establishment. Intersecting quorums are only one way to achieve this goal. We have shown situations where you could achieve a leadership change by directly asking the previous leader to step down. Following this, all we have to do is perform the necessary steps to establish the new leadership. This approach does not require knowledge of intersecting quorums.
We have also shown that multiple methods can be used to change leadership, and that such methods are interoperable. For example, you could use the direct leadership demotion for planned changes, but fall back to intersecting quorums if there are failures in the system.
Handling races
There are two contrasting approaches to handling races: lock-based and lock-free. The implementations and trade-offs are very different between the two. In general, a lock-free approach (like what Paxos uses) has elegance from the fact that it does not have a time component. However, lock-based approaches offer so many other flexibilities that they win out in real-life scenarios. With lock-based approaches, you can:
Perform graceful leadership changes by requesting the current leader to step down.
Add or remove nodes more easily (a topic I didn’t cover in this series).
Perform consistent reads by redirecting the read to the current leader.
Implement anti-flapping rules.
Due to all these advantages, most large scale systems implement a lock-based approach.
Completing and propagating requests
We studied the corner cases of propagating requests, and suggested versioning of decisions as a way to avoid confusion when there are multiple partial failures. The proposal numbers in Paxos and the term numbers in Raft are just one way to version the decisions.
We also showed that many of these failure modes can be completely avoided using anti-flapping rules.
The Vitess implementation
In Vitess, we make full use of the above options and flexibilities. For example, durability rules are a plugin for vtorc. The current plugin API is already more powerful than other existing implementations. You can specify cross-zone or cross-region durability without having to carefully balance all the nodes in the right location.
Additionally, Vitess has a graceful failover mechanism that gets used during software deployment. This automation comes built-in as part of the Vitess Operator.
Vitess allows you to direct reads to the current leader for consistent reads.
There are still a few corner cases that may require human intervention. We intend to enhance vtorc to also remedy those situations. This will put Vitess on full auto-pilot.
In closing
There are still a few topics that could be worth covering:
Failure detection
Consistent reads
Adding and removing nodes
Strictly speaking, these are outside the scope of consensus algorithms, but they need to be addressed for real-life deployments. I can cover these later with some independent posts.
It is possible that consensus could be generalized using a different set of rules. But I personally find the approach presented in this series to be the easiest to reason about.
Feel free to reach out to me on Twitter @ssougou if you have comments or questions.
Read the full Consensus Algorithms series
Consensus Algorithms at Scale: Part 1 — Introduction
Consensus Algorithms at Scale: Part 2 — Rules of consensus
Consensus Algorithms at Scale: Part 3 — Use cases
Consensus Algorithms at Scale: Part 4 — Establishment and revocation
Consensus Algorithms at Scale: Part 5 — Handling races
Consensus Algorithms at Scale: Part 6 — Completing requests
Consensus Algorithms at Scale: Part 7 — Propagating requests
You just read: Consensus Algorithms at Scale: Part 8 — Closing thoughts]]></content>
        <summary><![CDATA[In the final installment of the consensus algorithm series we pull everything together with some final thoughts.]]></summary>
      </entry>
    
      <entry>
        <title>Deploy requests now alert on potential unwanted changes</title>
        <link href="https://planetscale.com/blog/deploy-requests-now-alert-on-potential-unwanted-changes" />
        <id>https://planetscale.com/blog/deploy-requests-now-alert-on-potential-unwanted-changes</id>
        <published>2022-07-06T15:00:00.000Z</published>
        <updated>2022-07-06T15:00:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We’d like to share a little detail we just shipped to the Deploy Request UI. We now show you if your schema change could result in the loss of any data due to a dropped column or table.

We’re on a mission to make deploy requests the safest and least stressful way to make schema changes.
Earlier this year, we released an in-dashboard feature that lets you revert a production schema change with one click, all while keeping any data that was written in the meantime.
Now, with this small addition to the UI, we hope to further protect developers from mistakes by surfacing warnings straight on the deploy request page.
Inspired by a rename
The inspiration for this feature primarily came from developers’ experiences renaming columns through PlanetScale. It was a common question that came up: how do I safely rename a column?
A rename is an unsafe operation when it comes to zero downtime deployments.
For a rename to work successfully, you'd have to deploy your schema change and code changes at exactly the same time. This is highly risky, if not impossible to get right.
In cases where you do want to do a rename, here’s the safe way:
Create the new column
Update your application to double write to both the old and new column
Run a script to backfill old data
Update your app to read from the new column
Remove double writes
You can now safely drop the old column
Each step is a deployment of your application. This is certainly a bit of extra work, but it’s the safest way to complete a rename without risk to production or having to take your application down.
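The steps above can be sketched end to end with SQLite from Python's standard library standing in for your production database. The users table and the fullname and display_name columns are hypothetical, and in a real rollout each step would ship as its own deployment rather than run in one script.

```python
import sqlite3

# SQLite stands in for the production database; all names are made up.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
db.execute("INSERT INTO users (fullname) VALUES ('Ada'), ('Grace')")

# Step 1: create the new column (additive, so old code keeps working).
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Step 2: the application double writes to both columns.
def create_user(name):
    db.execute(
        "INSERT INTO users (fullname, display_name) VALUES (?, ?)", (name, name)
    )

create_user("Edsger")

# Step 3: backfill rows written before double writes began.
db.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# Step 4: the application reads from the new column.
def list_names():
    return [row[0] for row in db.execute("SELECT display_name FROM users ORDER BY id")]

# Before steps 5 and 6 (stop double writes, drop `fullname`), confirm
# that no row disagrees between the old and new column.
mismatches = db.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NOT fullname"
).fetchone()[0]
```

Once the mismatch count is zero and reads have moved over, removing the double writes and dropping the old column no longer puts any data at risk.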
Try it
You can give this a try today. Drop a table or column in a deploy request, and you’ll see you get alerted about the risky change.

If you have any feedback, we’d love to hear it. You can find us on Twitter or our GitHub discussion board.]]></content>
        <summary><![CDATA[We’ve updated our Deploy Request UI to alert when a schema change could produce unintended changes]]></summary>
      </entry>
    
      <entry>
        <title>Consensus algorithms at scale: Part 7 - Propagating requests</title>
        <link href="https://planetscale.com/blog/consensus-algorithms-at-scale-part-7" />
        <id>https://planetscale.com/blog/consensus-algorithms-at-scale-part-7</id>
        <published>2022-07-01T15:00:00.000Z</published>
        <updated>2022-07-01T15:00:00.000Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[If you’re still catching up, you can find links to each article in the series at the bottom of this article.
We have saved the most difficult part for last. This is where we put it all together. Let us start with a restatement of the requirements for propagation of requests during a leadership change:
Propagate previously completed requests to satisfy the new leader’s durability requirements.
Recap of parts 1-6
We have redefined the problem of consensus with the primary goal of solving durability in a distributed system, and are approaching the problem top-down.
We have shown a way to make durability an abstract requirement instead of the more rigid approach of using majority quorums.
We have defined a high level set of rules that can satisfy the properties of a consensus system while honoring arbitrary (but meaningful) durability requirements.
We have shown that conceptualizing leadership change as revocation and establishment leads to more implementation options that existing systems don’t utilize.
We have also shown that there exist two fundamentally different approaches to handling race conditions, and covered their trade-offs.
In the previous post, we covered how requests are completed as a precursor to analyzing propagation.
You can find links to each article in the series at the bottom of this article.
The simple case
For lock-based systems, and for planned changes, we have the opportunity to request the current leader to demote itself. In this situation, the current leader could ensure that its requests have reached all the necessary followers before demoting itself. Once this is done, the elector performs the leadership change and the system can resume.
We will now look at how propagation should work if there are failures.
Discovering completed requests
If a system has encountered failures, then the elector must indirectly revoke the previous leadership by requesting the followers to stop accepting any more requests from that leader. If enough followers are reached such that the previous leader cannot meet the durability criteria for any more requests, we know that the revocation is successful.
This method, apart from guaranteeing that no further requests will be completed by that leader, also allows us to discover all requests that were previously completed.
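To make the revocation argument concrete, here is a minimal sketch. It assumes, purely for illustration, a durability rule of "a fixed number of follower acknowledgements"; the series allows arbitrary rules, and all names below are hypothetical.

```python
def revocation_succeeded(followers, reached, acks_required):
    """Return True if the old leader can no longer make any request durable.

    followers: set of all follower ids of the previous leader
    reached: set of followers that agreed to stop accepting its requests
    acks_required: number of follower acks the old leader needs (assumed rule)
    """
    still_accepting = followers - reached
    return len(still_accepting) < acks_required


def discover_requests(reached_logs):
    """Union of the requests seen on the reached followers.

    A durable request was acked by at least `acks_required` followers. If
    fewer than that many followers are unreached, at least one of the ackers
    is among the reached followers, so the request shows up here.
    """
    discovered = set()
    for log in reached_logs.values():
        discovered.update(log)
    return discovered
```

With four followers and two required acks, reaching three followers is enough to revoke, while reaching only two is not.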
All we have to do is propagate those requests to satisfy the new leader’s criteria. But this is not as simple as it sounds.
There are many failure cases that make this problem extremely difficult:
There may be a request that is incomplete. In this situation, the elector may or may not discover this request.
An elector that discovers a tentative request may not be able to determine if that request has become durable.
Propagation of a request can fail before completion.
An elector that does not discover an incomplete request could elect a new leader that accepts a new request, which may fail before completion.
A subsequent elector may discover such multiple incomplete requests.
Another elector may discover only one of the incomplete requests, may propagate it as tentative, and fail before marking it as complete.
A final elector can discover this durable request, and a newer conflicting incomplete request, and may not have enough information to know which one to honor.
Ground rules
To address the above failure modes, let us first look at what we can and cannot do:
An elector must be able to reach a sufficient number of followers to revoke the previous leadership. If this is not possible, the elector is blocked.
An elector need not (and may not be able to) reach all the followers of a leader.
Some more inferences:
An elector is guaranteed to find all requests that have become durable.
If a request was incomplete, an elector may not find it. If not found, it is free to move forward without that request. When that request is later discovered, it must be canceled.
If an elector discovers an incomplete request, it may not have sufficient information to know if that request was actually durable or complete. Therefore, it has to assume that it might have completed, and attempt to propagate it.
If an elector discovers an incomplete request and can determine with certainty that it was incomplete, it can choose either option: act as if it was discovered, or not discovered.
Let us now discuss some options.
Versioning the decisions
It is safe to propagate the latest discovered decision. A decision to propagate a previous decision is a new decision.
We can use the following approach:
Every request has a time-based version.
A leader will create its request using a newer version than any previous request.
An elector that chooses to propagate an incomplete request will do so under a new version.
An elector that discovers multiple conflicting requests must choose to propagate the latest version.
Completed requests do not need versioning.
The above approach solves two difficult corner cases:
If we discover two conflicting requests, it means that the latest request was created because the previous elector did not discover the old one. This essentially means that the old one definitely did not complete. So, it is safe to honor the new elector’s decision.
If we propagate an existing request, it is also under a new version. It will therefore need to satisfy durability requirements under the new version without conflating itself with the old version.
Paxos uses proposal numbers to version its decisions, and Raft uses leadership term numbers.
But you can use other methods for versioning. For example, one could assign timestamps for the requests instead of using leadership terms or proposal numbers.
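The versioning rules can be sketched as follows. This is an illustration only: `Request` and `choose_request` are hypothetical names, and the version is assumed to be any monotonically increasing value such as a term, a proposal number, or a timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    payload: str
    version: int  # assumed: monotonically increasing (term, proposal number, or timestamp)


def choose_request(discovered, new_version):
    """Pick the request an elector should propagate, re-issued under a new version.

    Returns None if nothing was discovered.
    """
    if not discovered:
        return None
    latest = max(discovered, key=lambda r: r.version)
    if new_version <= latest.version:
        raise ValueError("the elector's version must be newer than any discovered version")
    # Propagating an old decision is itself a new decision, so it gets a new version.
    return Request(payload=latest.payload, version=new_version)
```

Given conflicting requests at versions 3 and 5, an elector operating at version 6 propagates the payload of version 5, recorded under version 6.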
Anti-flapping rules
Most large-scale systems have anti-flapping rules that prevent another leadership change immediately after one has been performed. This is because such an occurrence is usually due to a deeper underlying problem, and performing another leadership change will likely not fix it. In most cases, it would aggravate the underlying problem.
In one system that I knew of, the payload of the request was so big that it was causing the transmission to time out. This resulted in a failure being detected and caused a leadership change. However, the new leader was also incapable of completing the request due to the same underlying problem. The problem was ultimately remedied by increasing the timeout.
Serendipitously, anti-flapping rules also mitigate the failure modes described above. Versioning of in-flight requests is less important for such systems.
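A minimal sketch of what an anti-flapping rule might look like is shown below. This is not how any particular tool implements it; the class name and the fixed minimum interval are assumptions made for illustration.

```python
class AntiFlappingGuard:
    """Blocks a leadership change that follows too closely after the previous one."""

    def __init__(self, min_interval_seconds):
        self.min_interval_seconds = min_interval_seconds
        self.last_change_at = None

    def allow_change(self, now):
        """Return True and record the change if enough time has passed."""
        if self.last_change_at is not None and now - self.last_change_at < self.min_interval_seconds:
            return False
        self.last_change_at = now
        return True
```

A blocked change is a signal for a human to look at the underlying problem rather than for the system to keep electing new leaders.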
MySQL and Vitess
The MySQL binlogs contain metadata about all transactions. They carry two pieces of relevant information:
A Global Transaction ID (GTID), which includes the identity of the leader that created the transaction.
A timestamp.
This metadata is faithfully propagated to all replicas. This information is sufficient to resolve most ambiguities if conflicting transactions are found due to failures.
However, the faithful propagation of the transaction metadata breaks the versioning rule that the decision of a new elector must be recorded under a new timestamp.
The Orchestrator, which is the most popular leadership management system for MySQL, has built-in anti-flapping rules. These rules mitigate the above failure modes. This is the reason why organizations have been able to avoid split-brain scenarios while running MySQL at a massive scale.
In Vitess, we use VTorc, which is a customized version of the Orchestrator, and we inherit the same safeties. But we also intend to tighten some of these corner cases to minimize the need for humans to intervene if complex failures ever happen to occur.
Stay tuned for part 8 of the series, where we will pull everything together and conclude the series with some final thoughts.
Read the full Consensus Algorithms series
Consensus Algorithms at Scale: Part 1 — Introduction
Consensus Algorithms at Scale: Part 2 — Rules of consensus
Consensus Algorithms at Scale: Part 3 — Use cases
Consensus Algorithms at Scale: Part 4 — Establishment and revocation
Consensus Algorithms at Scale: Part 5 — Handling races
Consensus Algorithms at Scale: Part 6 — Completing requests
You just read: Consensus Algorithms at Scale: Part 7 — Propagating requests
Next up: Consensus Algorithms at Scale: Part 8 — Closing thoughts]]></content>
        <summary><![CDATA[In part 7 of the Consensus algorithm series we combine everything we’ve worked at to cover propagating requests]]></summary>
      </entry>
    
      <entry>
        <title>Identifying slow Rails queries with sqlcommenter</title>
        <link href="https://planetscale.com/blog/identifying-slow-rails-queries-with-sqlcommenter" />
        <id>https://planetscale.com/blog/identifying-slow-rails-queries-with-sqlcommenter</id>
        <published>2022-06-29T15:23:56.168Z</published>
        <updated>2022-06-29T15:23:56.168Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        <author>
          <name>Iheanyi Ekechukwu</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[In a large Rails application, it can be tricky to track down where a slow query is coming from in the app.
Solutions to this problem have been steadily improving. First, we had the Marginalia gem, which adds comments to all your queries. This allows you to see which controller or job a query came from by reading your logs.
As of Rails 7, Marginalia has become a native feature of Rails, so the gem is no longer needed. While this is a great improvement, we can still take our Rails query comments one step further.
Rails + sqlcommenter
Sqlcommenter is a query comment format created by Google and widely adopted by many tools and languages. It’s more easily read by machines than the default format currently used by Rails.
Default Rails:
SELECT * FROM `users` ORDER BY `users`.`id` DESC LIMIT 1
/*application:Api,controller:users,action:show*/

sqlcommenter:
SELECT * FROM `users` ORDER BY `users`.`id` DESC LIMIT 1
/*application='Api',controller='users',action='show'*/

It’s a small change that makes our query comments machine-readable and more valuable in logging and performance monitoring tools.
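To show why the key='value' form is easier for tools to consume, here is a small parsing sketch. It is an illustration, not the official sqlcommenter parser: the function name is hypothetical, and it does not handle the value encoding that the full specification describes.

```python
import re

# A trailing /* ... */ comment at the end of the query.
_COMMENT = re.compile(r"/\*(.*?)\*/\s*$", re.DOTALL)
# key='value' pairs inside that comment.
_PAIR = re.compile(r"([\w-]+)='((?:[^'\\]|\\.)*)'")


def parse_sqlcommenter(query):
    """Return the tags from a trailing sqlcommenter-style comment as a dict."""
    match = _COMMENT.search(query)
    if match is None:
        return {}
    return dict(_PAIR.findall(match.group(1)))


query = (
    "SELECT * FROM `users` ORDER BY `users`.`id` DESC LIMIT 1\n"
    "/*application='Api',controller='users',action='show'*/"
)
tags = parse_sqlcommenter(query)
```

Here `tags` maps `application`, `controller`, and `action` to their values, which a logging or monitoring tool can then index directly.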
Enabling in Rails 7
To make it easier to use sqlcommenter with Rails, we’ve created a gem for Rails 7 that enables sqlcommenter.
To try it out, add the following to your Gemfile:
gem "activerecord-sql_commenter", require: "active_record/sql_commenter"

Then, in your Rails config/application.rb file, enable query log tagging:
# config/application.rb
config.active_record.query_log_tags_enabled = true
config.active_record.query_log_tags = [ :application, :controller, :action, :job ]
config.active_record.cache_query_log_tags = true

You can learn about each config option in the Rails Query Logs documentation here.
Testing it out
Once set up, you can open up a Rails console and run a query to test it out.
$ rails console
[1] pry(main)> User.first

You should see your application name in sqlcommenter format.
User Load (0.6ms)  SELECT `user`.* FROM `user` ORDER BY `user`.`id` ASC LIMIT 1 /*application='ApiBb'*/

Using annotate
If you need even more detail for a specific query, Rails 7 also added the annotate method, which lets you add a comment to a query.
For example, the following query will add source='user_metrics_runner' as a comment:
[3] pry(main)> User.where(name: "iheanyi").annotate("source='user_metrics_runner'")
  User Load (0.5ms)  SELECT `user`.* FROM `user` WHERE `user`.`name` = 'iheanyi'
/* source='user_metrics_runner' */

This is useful in situations where the default query log tags aren’t enough.
Using with PlanetScale Query Insights
PlanetScale Query Insights, our built-in query debugging and analysis tool, is compatible with sqlcommenter. Any query that takes over 1 second to run will get recorded and tagged with the values you’ve set in your sql comments.
For example, here is a slow query from our own application:
SELECT schema_snapshot.* FROM schema_snapshot WHERE schema_snapshot.ready = true AND created_at > :created_at AND schema_snapshot.deleted_at IS NULL ORDER BY schema_snapshot.id ASC LIMIT 10000
/*application='ApiBb',job='ScheduleSnapshotJob'*/

Using Insights and tags on the slow query, we were able to find exactly where this query was coming from. This enabled us to quickly find and fix the issue in our application.

You can try out Query Insights today by signing up for a PlanetScale account and navigating to "Insights" in the dashboard.
If you're using a Rails application, be sure to check out the Rails + PlanetScale quickstart and our Rails sqlcommenter gem. This powerful combination, Rails + sqlcommenter + Insights, can greatly improve your query debugging experience.
Learn more
Rails Query Logs
PlanetScale Query Insights
Sqlcommenter
activerecord-sql_commenter
Rails PlanetScale quickstart]]></content>
        <summary><![CDATA[Learn how to use sqlcommenter with Rails]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 14</title>
        <link href="https://planetscale.com/blog/announcing-vitess-14" />
        <id>https://planetscale.com/blog/announcing-vitess-14</id>
        <published>2022-06-28T15:50:00.000Z</published>
        <updated>2022-06-28T15:50:00.000Z</updated>
        
        <author>
          <name>Vitess Engineering Team</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[We are pleased to announce the general availability of Vitess 14.
In this new release, major improvements have been made in several areas of Vitess, including usability and reliability:
Online DDL is now GA
Gen4 planner is the new default planner
VTAdmin and VTOrc are officially in beta with Vitess 14
Usability
Command-line syntax deprecation
This release marks the beginning of Vitess standardizing its command-line and flags syntax. Some former syntaxes have been deprecated and will break in the next release. For details, as well as migration instructions, please refer to the release notes.
VtctldServer and client
The new gRPC API for vtctld cluster management, VtctldServer, is ready for use. We are targeting Vitess 15 to begin deprecating the old interface, so users should begin transitioning now. Refer to the grpc-vtctld documentation for how to enable the new service.
Vitess 14 also provides a new vtctld client (vtctldclient) to correspond to the new gRPC server interface. After enabling the new service, users may begin using the new client for executing cluster management commands. Please refer to the client documentation for the list of available commands and their options. Both vtctldclient and the legacy vtctlclient provide shim mechanisms to use each other's CLI syntaxes to ease the transition, which is described in the transition documentation. Just as with the legacy service, we are targeting Vitess 15 to begin deprecating vtctlclient, so users should begin transitioning now.
VTAdmin
Vitess 14 includes the beta release of VTAdmin, the next generation of cluster management API and UIs for Vitess. VTAdmin provides a single control plane to manage multiple Vitess clusters and will replace the legacy VTCtld Web UI. We are targeting Vitess 15 for general availability, so we encourage users to try out VTAdmin and provide feedback in this release cycle. A guide on how to configure and run VTAdmin is available on the website.
Note that the new grpc-vtctld service is required for VTAdmin to make RPCs to the clusters you want to manage, so you must run your vtctld components with that service enabled.
Those interested in the details can read the original architecture RFC and join the #feat-vtadmin channel in the Vitess Slack.
GA announcements
Online DDL
Vitess-native and gh-ost-based online DDL functionality is now GA. pt-osc is still considered experimental, mainly because there has not been sufficient adoption or feedback from the community.
Online DDL has many other improvements in this release. Please refer to the release notes for details.
Query planner
The Vitess team started working on a new query planner two years ago for several reasons. This query planner, called Gen4, is the default in Vitess 14. It replaces the older query planner called V3. Please be sure to read the related section of the release notes if you want to learn more or switch back to V3. The new planner has enabled us to add support for many more queries. Some examples of new query support include UPDATE/INSERT from SELECT and cross-shard aggregation queries.
Reliability
VTOrc
VTOrc remains experimental in Vitess 14. In this release, the work to make VTOrc a first-class component of Vitess is taken a step further:
VTOrc now integrates cleanly with VTCtld and running cluster operations from VTCtld does not cause VTOrc to take unnecessary actions
Federation has been addressed in this release. It is now possible to run multiple instances of VTOrc watching the same set of keyspaces without them interfering with each other
The durability policy configuration has been refactored. Instead of being provided as command-line configuration, it is now stored in the topology server. Both VTOrc and VTCtld will read it from there and honor the provided durability policies.
Emergency Reparent Shard's capabilities have been augmented to now allow for more than one failure based on the durability policies set for the keyspace.
You can follow the progress of VTOrc by watching the original RFC and the durability RFC.
Performance
Our benchmarking system, arewefastyet, benchmarked this new version of Vitess. The comparison between v14.0.0 and v13.0.0 is available on the Vitess Benchmark page. We can observe a performance improvement of about 10%. This improvement mainly comes from the removal of internal SAVEPOINT query execution.
Please download Vitess 14 and try it out! Issues can be reported via GitHub.]]></content>
        <summary><![CDATA[Learn about what was just released in Vitess 14]]></summary>
      </entry>
    
      <entry>
        <title>Grouping and aggregations on Vitess</title>
        <link href="https://planetscale.com/blog/grouping-and-aggregations-on-vitess" />
        <id>https://planetscale.com/blog/grouping-and-aggregations-on-vitess</id>
        <published>2022-06-24T14:55:30.336Z</published>
        <updated>2022-06-24T14:55:30.336Z</updated>
        
        <author>
          <name>Andres Taylor</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[I love my job. One of the best feelings is when I find an interesting paper and use it to solve a real problem. It feels like I found a cheat code. Instead of having to do a lot of hard thinking, I can just stand on the shoulders of really big people and take a shortcut. Here, I want to share a recent project that I could solve using a public paper.
Sharding databases
Vitess is a database proxy that creates an illusion of a single database, when in reality the query is sent to multiple MySQL instances. This is called sharding.
Vitess is not just a dumb proxy layer though — it can also run some of the operations instead of sending them on. We want to delegate as much work as possible to MySQL — it is much faster than Vitess at doing all the database operations, including grouping and aggregation. When possible, we want work to be done there. While planning a query, the planner tries pushing as much work down to MySQL as possible. Sometimes it’s not possible to push work into any single MySQL instance because no single instance has all the data necessary.
In these cases, we can actually perform most of the normal database operations at the proxy level (called VTGate in Vitess lingo) — we can do joins, filter out rows, sort data, and much more. We can also do grouping and aggregation in VTGate. We have a module, called 'evalengine', that is built to exactly mimic the logic of MySQL expressions. So when needed, we can do almost anything on the VTGate, we just need the base data from MySQL.
Think globally, act locally
Back to aggregations across shards. Let’s say you have a user table that is too large to fit into a single database, so you have sharded it. Now a Vitess user asks for the number of users in the whole logical, sharded database. We could fetch all the rows and just count them, but that would be slow and inefficient. So instead we break aggregation into local and global aggregation. The local part is what we can send down to MySQL, and the global aggregation is aggregating the aggregates. So, if the user asked for SELECT count(*) FROM user, Vitess will send down a count(*) to each shard, and then do a SUM on all incoming answers.
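The split between local and global aggregation can be sketched in a few lines. The shards here are plain in-memory lists standing in for MySQL instances, and the function names are hypothetical.

```python
def local_count(shard_rows):
    """Local aggregation: what each shard computes for SELECT count(*)."""
    return len(shard_rows)


def global_count(shards):
    """Global aggregation: sum the per-shard counts at the proxy."""
    return sum(local_count(rows) for rows in shards)


# Three hypothetical shards of the `user` table.
shards = [
    [{"id": 1}, {"id": 4}],
    [{"id": 2}],
    [{"id": 3}, {"id": 5}, {"id": 6}],
]
total = global_count(shards)
```

Only one number per shard crosses the network, instead of every row.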
This is something that Vitess has been able to do for a long time. But if you had joins or subqueries or anything else other than a simple SELECT ... FROM ... GROUP BY, with a single table, most of the time you were out of luck.
During one of our paper reading sessions, we looked at the paper Orthogonal Optimization of Subqueries and Aggregation, by Cesar A. Galindo-Legaria and Milind M. Joshi from Microsoft. It talks about how it’s sometimes preferable to do aggregation before performing joins. In some cases, this can reduce how much work the join operator has to do and so lower the total cost of the plan. The paper spends some time on what needs to be done to push aggregation under a join.
To us, this was exactly what we were looking for. We had to do the join at the VTGate — no going around that fact. But by using the algorithm described in this paper, we were able to break the aggregation into smaller pieces (local aggregates) that could be pushed down under the join to the MySQL layer.
An example would be helpful here
So, what is the secret sauce? How do you push down count(*) under a join? I’ll use a very simple database as an example.
The database has two tables: order and order_line.
order
column | type
id | int
office | varchar

order_line
column | type
id | int
order_id | int
amount | float
Each order comes from a single office, and each order can have one or more corresponding order_line rows. The database is sharded, and when sharding one has to choose a sharding key. This is the column value that will be used to decide which shard the row should live in.
order is sharded by id, and order_line is sharded by its own id. If order_line were sharded by order_id, the join could be pushed down to MySQL, since we would know that the corresponding rows existed in the same shard. Unfortunately, it isn’t, so we will have to do joins between these two tables at the VTGate level.
Let’s use this example query. It creates a report with how much has been sold per office:
SELECT order.office, sum(order_line.amount)
FROM order JOIN
    order_line ON order.id = order_line.order_id
GROUP BY order.office

Route is the operator that sends a query to one or more shards. The order and the order_line join cannot be merged into a single route, so we have to do the joining and some of the aggregation at the VTGate level. The routing planner has decided that the best plan is to first query the order table and then for each row in this table, we’ll issue a query against the order_line table. So after planning how to send the queries, we have this plan:

This is a VTGate execution plan for the query above. Everything under a Route is going to be sent to the underlying MySQL as a single query. Everything above the route is evaluated at the VTGate level. The plan so far says that we’ll have to send a scatter query to the orders keyspace, hitting all shards.
The join is a nested loop join, which means that we’ll execute the query on the left-hand side (LHS) of the Join, and using that result we’ll issue queries on the right-hand side (RHS), one query per row. Now it’s time to do the aggregation planning.
We’ll take the example query and go over it from back to front. While doing this, we’ll figure out what we should send to the LHS of the join.
The original query was grouping on order.office — we can keep that column in the LHS grouping.

Since we are doing a join on order.id, we need to add that column to the select list and to the grouping. Otherwise, this column would not be available to the join.

The SUM aggregation can’t be sent to the LHS — we’ll send that to the RHS.

Since we are grouping on the left side, we need to keep track of how many rows were included in each group. It’ll make more sense when we later use these numbers to produce the final result. So the execution plan so far looks like:

Show me the results
To make it easier to follow, I’ll show what each operator will produce, and how we go about merging the separate results into the result the user asked for.
The query on the LHS route will produce results that look something like this:
order.office | order.id | count(*)
1 | 1 | 2
2 | 2 | 3
Ignore the fact that we have multiple rows per order.id. Not really important, it’s just so we can have a more interesting result to work with.
From these two rows, VTGate will issue two queries against the RHS, only changing the __order_id argument between the two.
The two results will be:
For order.id = 1,
sum(order_line.amount)
5
3
For order.id = 2,
sum(order_line.amount)
10
7
So finally, the join will produce the joined results:
order.office | count(*) | sum(order_line.amount)
1 | 2 | 5
1 | 2 | 3
2 | 3 | 10
2 | 3 | 7
It’s not returning order.id, since we only needed it for the join. This is still not the result we want. The user did not ask for count(*), and the grouping looks wrong. We can’t return multiple rows with the same order.office value.
The next step is to combine the count(*) from the LHS, and the sum(order_line.amount) from the RHS. We simply multiply them together. This is what the Project operator will take care of; it allows the use of the evalengine mentioned above to evaluate arithmetic operations at the vtgate level.

The results coming out from the Project operator will look something like this:
order.office | sum(order_line.amount)*count(*)
1 | 10
1 | 6
2 | 30
2 | 21
Finally, we just have to do a bit of grouping and sum the sums.
order.office | sum(sum(order_line.amount)*count(*))
1 | 16
2 | 51
This is the result that the user asked for.
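The arithmetic above can be checked with a short sketch that compares a plain join-then-aggregate with the pushed-down version. The rows are hypothetical and chosen to reproduce the numbers in this example, including the repeated order ids. One simplification: the right-hand side is summed to a single value per order here, whereas the walkthrough above shows it returning several partial sums.

```python
from collections import defaultdict

# Hypothetical rows chosen to reproduce the example numbers.
# orders: (id, office); order_lines: (order_id, amount)
orders = [(1, 1), (1, 1), (2, 2), (2, 2), (2, 2)]
order_lines = [(1, 5), (1, 3), (2, 10), (2, 7)]


def direct(orders, order_lines):
    """Reference answer: join first, then group by office and sum."""
    totals = defaultdict(int)
    for order_id, office in orders:
        for line_order_id, amount in order_lines:
            if order_id == line_order_id:
                totals[office] += amount
    return dict(totals)


def pushed_down(orders, order_lines):
    """Aggregate on each side first, then join, multiply, and sum."""
    # LHS: GROUP BY office, id with count(*)
    lhs = defaultdict(int)
    for order_id, office in orders:
        lhs[(office, order_id)] += 1
    # RHS: sum(amount) per order_id
    rhs = defaultdict(int)
    for order_id, amount in order_lines:
        rhs[order_id] += amount
    # Join + Project (count * sum) + final grouping on office
    totals = defaultdict(int)
    for (office, order_id), count in lhs.items():
        if order_id in rhs:
            totals[office] += count * rhs[order_id]
    return dict(totals)
```

Both functions return 16 for office 1 and 51 for office 2, but the pushed-down version joins two grouped rows on the left instead of five raw ones.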
The final plan ended up being:

Parting words
This experience is one I’ve had many times in the past. Someone out there has done a ton of work on something closely related to what we are doing, and all we have to do is adapt the algorithm to our circumstances. For the type of work that we are doing, trying to keep up to date with academia just makes sense.
More often than not, we are not even actively looking for a solution when we stumble across it while reading papers. If I remember correctly, I suggested this paper because I was looking for a way to rewrite subqueries to other operations, and came across the splitting of aggregations across joins. If you are curious, review vitessio/vitess #9643.]]></content>
        <summary><![CDATA[Vitess is a database proxy that creates an illusion of a single database when in reality the query is sent to multiple MySQL instances. ]]></summary>
      </entry>
    
      <entry>
        <title>Consensus algorithms at scale: Part 6 - Completing requests</title>
        <link href="https://planetscale.com/blog/consensus-algorithms-at-scale-part-6" />
        <id>https://planetscale.com/blog/consensus-algorithms-at-scale-part-6</id>
        <published>2022-06-21T17:06:12.714Z</published>
        <updated>2022-06-21T17:06:12.714Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[If you’re still catching up, you can find links to each article in the series at the bottom of this article.
The completion of requests is a relatively straightforward topic. However, there are a few concepts to cover before we jump into the more complex part of propagating requests during a leadership change.
Recap of parts 1-5
We have redefined the problem of consensus with the primary goal of solving durability in a distributed system, and are approaching the problem top-down.
We have shown a way to make durability an abstract requirement instead of the more rigid approach of using majority quorums.
We have defined a high level set of rules that can satisfy the properties of a consensus system while honoring arbitrary (but meaningful) durability requirements.
We have shown that conceptualizing leadership change as revocation and establishment leads to more implementation options that existing systems don’t utilize.
We have also shown that there exist two fundamentally different approaches to handling race conditions, and covered their trade-offs.
You can find links to all previous blog posts of the series at the bottom of this article.
Consensus system requirements
The primary requirement of a consensus system is that it must not forget a request it has acknowledged as accepted. To fulfill this requirement, the leader must transmit the request to enough nodes such that the durability requirements are satisfied. On the other hand, there must exist a mechanism to cancel the request if the operation fails before the durability requirements are met.
Two-phase protocol
The above requirements can be met by introducing a two-phase protocol. The leader first transmits the payload of the request as tentative to all the nodes. A tentative request is one that can later be completed or canceled.
A follower that is responsible for a leader’s durability should acknowledge receipt of tentative requests. Once the leader receives the necessary acknowledgements from its followers, the request has become durable and cannot be canceled. The leader can then issue messages to complete the tentative request.
We can view a request as having three stages:
Incomplete: The request is in-flight and has not met the durability requirements. Such a request is marked as tentative among the followers. It can later be completed or canceled.
Durable: The request is in-flight, but has met the durability requirements. This is an implicit, but important stage. We can trust that a durable request will never be canceled.
Complete: The request is complete. Followers can mark the request as complete and perform any post-completion materializations as needed.
An optimization: Once a request becomes durable, the leader is free to transmit that request as complete, in a single step, to followers that have not yet received the tentative message.
A leader that fails to make a request durable can keep retrying, but it must not attempt to cancel the request. This is because an elector could be performing a leadership change and may attempt to propagate the failed request. We will analyze this process in the next post.
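The three stages can be modeled on the leader side as a small state machine. This is a sketch under an assumed durability rule (a fixed number of follower acknowledgements); the class and method names are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    INCOMPLETE = "incomplete"
    DURABLE = "durable"
    COMPLETE = "complete"


class LeaderRequest:
    """Tracks one request on the leader through the three stages."""

    def __init__(self, payload, acks_required):
        self.payload = payload
        self.acks_required = acks_required  # assumed durability rule: a fixed ack count
        self.acks = set()
        self.stage = Stage.INCOMPLETE

    def record_ack(self, follower_id):
        """Record a follower's acknowledgement of the tentative request."""
        self.acks.add(follower_id)
        if self.stage is Stage.INCOMPLETE and len(self.acks) >= self.acks_required:
            self.stage = Stage.DURABLE

    def mark_complete(self):
        """Completion messages may only be issued once the request is durable."""
        if self.stage is Stage.INCOMPLETE:
            raise RuntimeError("cannot complete a request that is not yet durable")
        self.stage = Stage.COMPLETE
```

Note that there is deliberately no cancel method on the leader: a leader that cannot reach durability keeps retrying and leaves cancellation to the elector.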
Completion and cancellation
Completion and cancellation are mutually exclusive: A request that was completed will never be canceled, and a request that was canceled will never be completed.
When a follower receives a message to complete a request, it can perform the necessary steps to materialize the effects of the request. For example, if the request was meant to change the value of a variable, this change can now be applied.
If a cancellation message is received, the follower can delete the request, as if it never took place.
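On the follower side, the same rules might look like the sketch below. The `Follower` class and its key/value request shape are assumptions for illustration; the point is that effects are materialized only on completion and that completion and cancellation exclude each other.

```python
class Follower:
    """Holds tentative requests and applies them only on completion."""

    def __init__(self):
        self.tentative = {}  # request id -> (key, value)
        self.state = {}      # materialized variables
        self.finished = {}   # request id -> "completed" or "canceled"

    def receive_tentative(self, request_id, key, value):
        self.tentative[request_id] = (key, value)

    def complete(self, request_id):
        if self.finished.get(request_id) == "canceled":
            raise RuntimeError("a canceled request can never be completed")
        if request_id in self.tentative:
            key, value = self.tentative.pop(request_id)
            self.state[key] = value  # materialize the effect of the request
        self.finished[request_id] = "completed"

    def cancel(self, request_id):
        if self.finished.get(request_id) == "completed":
            raise RuntimeError("a completed request can never be canceled")
        self.tentative.pop(request_id, None)  # as if it never took place
        self.finished[request_id] = "canceled"
```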
Timing the response
A leader could respond to the client with a success message as soon as the request has become durable. However, it has the option of delaying the acknowledgement until it has also sent the completion message to the followers. Waiting until completion costs two round-trips and is therefore slower than an early response. On the other hand, it improves the performance of quorum reads. This trade-off may be necessary for systems that choose to implement reads using the quorum method. This is a bigger topic that would need a separate post.
For systems that use lock-based failovers, reads can be sent to the current leader instead of performing quorum reads. This allows for the leader to respond as soon as it has received the necessary acknowledgements.
Completion of requests in MySQL
The MySQL semi-sync protocol does not support this two-phase method of completing requests. When a replica receives a request, it immediately applies it. This behavior introduces some corner cases that require mitigation. I have covered some of those in an older post: Distributed durability in MySQL.
There is another MySQL behavior that is problematic: A primary that is restarted after a crash completes all in-flight requests without verifying that they received the necessary acks. This could lead to “split-brain” scenarios.
If time permits, I will make a separate post to cover more details, and on how to handle these scenarios.
Conclusion
In the next post, we will look at request propagation, which will tie all of this together.
Read the full Consensus Algorithms series
Consensus Algorithms at Scale: Part 1 — Introduction
Consensus Algorithms at Scale: Part 2 — Rules of consensus
Consensus Algorithms at Scale: Part 3 — Use cases
Consensus Algorithms at Scale: Part 4 — Establishment and revocation
Consensus Algorithms at Scale: Part 5 — Handling races
You just read: Consensus Algorithms at Scale: Part 6 — Completing requests
Next up: Consensus Algorithms at Scale: Part 7 — Propagating requests
Consensus Algorithms at Scale: Part 8 — Closing thoughts]]></content>
        <summary><![CDATA[In part 6 of the Consensus algorithms series we look at how to handle request completions]]></summary>
      </entry>
    
      <entry>
        <title>Introducing PlanetScale Insights: Advanced query monitoring</title>
        <link href="https://planetscale.com/blog/introducing-planetscale-insights-advanced-query-monitoring" />
        <id>https://planetscale.com/blog/introducing-planetscale-insights-advanced-query-monitoring</id>
        <published>2022-05-26T14:01:00.000Z</published>
        <updated>2022-05-26T14:01:00.000Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[If you’ve ever experienced the frustration of debugging slow or costly queries, we’ve got a treat for you. With PlanetScale Insights, you can now analyze and debug individual query performance during specified time frames without leaving your PlanetScale dashboard.
What is PlanetScale Insights
Insights is the next generation of our in-dashboard query statistics tool. This update brings an interactive graph that maps your PlanetScale database query performance to selected time frames for quick debugging. We’re also introducing a new section that shows important metrics for all active queries running against the database over the past 24 hours.


Insights time-based graph
With the new Insights graph, you can monitor the following metrics at a glance:
Rows read
Rows written
Query latency
Queries per second
You can hover over a point on the graph to see the metrics in 10-minute increments. This will surface the total rows read, rows written, latency, or queries per second during the selected period.
Additionally, deploy requests will show up on the graph, allowing you to quickly see the impact of the latest schema changes.
Queries during the last 24 hours
We’ve also added a new section that shows all queries that have run against your database over the past 24 hours.
If you have a lot of active queries, you can zoom in on a shorter time frame by dragging your cursor on the graph to select the period you want to analyze.
The following metrics are available for each query:
Number of times the query has run
Total time the query has run
Time per query
Rows returned
Rows read
Rows affected
We’re especially excited to surface the rows read per query! One component of PlanetScale billing includes total rows read per month, so Insights is a great resource to help you monitor how your queries affect rows read.
You can also click on a query to drill down even further. On the individual query page, you’ll find slow instances of the query, if applicable. We’ve also included a new UI for generating an EXPLAIN plan for each individual query, with the option to open it up in the web console as well.

Pricing and availability
Insights is included with all databases at no additional cost, with 7 days of query data retention.
To access Insights, select your database in the PlanetScale dashboard and click the "Insights" tab.
Insights in action
To help drive the power of Insights home, let’s see it in action.
In this example, we’ll walk through a practical use case using our very own database for an internal blogging platform.
Say we deployed a schema change to the production database and later noticed the posts page is loading slower than usual. Instead of spending hours searching for the offending query, we come to the Insights page to begin debugging.
We first look at the graph to identify any anomalies, such as any spikes in rows read or rows written. If this performance impact coincided with a deployment, the deploy request would show up on the graph next to the spike. This provides direct insight into what schema change caused the performance issues.

Based on the graph, we can see that there was a large spike right at the time we merged deploy request #289. This information gives us more confidence that this performance impact is indeed a database issue.
Next, we can drill down further by looking at the "Queries during the last 24 hours" table. We’ll sort by "Rows read" to quickly identify our largest queries and attempt to find the one that matches the spike on the graph.

Up at the top, we see one query that stands out. At 1,043,729 rows read, it’s reading many more rows than we expect. Clicking on the query brings up the detailed view, which includes a list of all instances of the query running slowly in the past 24 hours.
Taking a closer look, we notice that this query has been running slowly since the previous deployment that we saw on the graph. We can open that deploy request to see what changed.

We have now identified the problematic query and linked it back to a schema change from our deploy request. With this information, we can create a new development branch where we can attempt to fix the issue. With Insights, we can see performance metrics for any branch, including development branches. As we’re developing the fix, we can come back to the Insights graph to confirm that our changes worked.
Finally, after deploying it to production, we come back to the Insights page to confirm production performance is back to normal.
Try it yourself
You can give Insights a spin today: select your database in the PlanetScale dashboard and click the “Insights” tab on its overview page.
For more information, check out our Insights documentation. We love hearing from you! If you have any questions or feedback, don’t hesitate to contact us.]]></content>
        <summary><![CDATA[Insights gives you a faster way to debug and monitor your PlanetScale database queries]]></summary>
      </entry>
    
      <entry>
        <title>Extract, load, and transform your data with PlanetScale Connect</title>
        <link href="https://planetscale.com/blog/extract-load-and-transform-your-data-with-planetscale-connect" />
        <id>https://planetscale.com/blog/extract-load-and-transform-your-data-with-planetscale-connect</id>
        <published>2022-05-25T15:00:00.692Z</published>
        <updated>2022-05-25T15:00:00.692Z</updated>
        
        <author>
          <name>James Q Quick</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[There are many reasons you may need to move and/or transform your application data: to improve database performance, consolidate data, provide access to other teams in your organization to safely query data, and other reasons specific to your use case. This may seem straightforward to set up yourself, but this can be tedious and difficult to get right without impacting your production database. With PlanetScale Connect (now in Beta), you can easily perform ELT (Extract, Load, Transform) actions on your data to fulfill your application needs.
What is PlanetScale Connect?
With PlanetScale Connect, you can integrate with existing ELT platforms to extract data from your PlanetScale database and safely load it into other destinations for analysis, transformation, and more. For the initial release of this feature, we will support Airbyte Open Source as the ELT tool of choice, with plans to expand on this in the future.
Within Airbyte, you’ll be able to select your PlanetScale database as a source. Then, you’ll choose from hundreds of connectors (full list of Airbyte connectors), including Google BigQuery, AWS Redshift, Snowflake, and more. During this configuration, you can perform transformations on your data before loading it into its final destination. This gives you complete control to migrate your data, transform it, and upload it to a new data source with just a few clicks and configurations.
Benefits of ELT pipelines
For additional context into our PlanetScale Connect launch, let’s examine some key benefits of implementing an established ELT pipeline.
Context
Offloading your application data to a more suitable data store improves how you maintain and query historical data. For example, your production application may only need readily available data from the previous two months. This means you can offload older data to a different data store that can be queried against without impacting the performance of your main application.
Consolidation
Oftentimes, not every single piece of data that gets stored in a database is needed forever. In these cases, ELT provides a prime opportunity to get rid of unnecessary data during the transformation phase before it is loaded into the new data source.
Data enrichment
In addition to consolidating data, you may also find yourself in need of enriching data as part of the transformation process. For example, you may grab additional data from internal and/or external APIs to add additional context and detail to your existing data.
Productivity
After creating an ELT pipeline that generates the desired outcome, there is no more manual intervention necessary for the process to continue. Your team can continue to work on the highest priority items while ensuring your data pipeline continues to run.
Accuracy
By leveraging an ELT pipeline, you can guarantee your data is always consistent and accurate. This provides the flexibility for upstream application schemas to change while maintaining a consistent format for downstream applications.
How It Works
For PlanetScale Connect to function as a source for an ELT platform, it needs to address three key issues.
Schema discovery
ELT sources should support discovering the schema across all keyspaces in a PlanetScale database and return that in the myriad of formats the ELT tools expect (specially-formatted JSON documents in most cases).
Initial data dump
ELT sources should be able to efficiently return a full data dump of a PlanetScale database. This is incredibly important considering the negative impact an inefficient solution would have on a production database.
Incremental data synchronization
ELT sources should be able to handle “incremental sync”, where the source maintains a cursor that describes where and when the data was last synced. The cursor is then used to query only data that has changed or been added since the previous sync.
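To make the cursor idea concrete, here is a minimal Ruby sketch. It is not how PlanetScale Connect or Airbyte is implemented: `Row`, `IncrementalSource`, and the integer `position` field are hypothetical stand-ins for a table and whatever change position a real source would track.

```ruby
# Hypothetical in-memory stand-in for a source table; each row carries a
# monotonically increasing position (for example, a change-log offset).
Row = Struct.new(:position, :data)

class IncrementalSource
  def initialize(rows)
    @rows = rows
  end

  # Returns the rows changed after `cursor` plus the new cursor to persist.
  # A nil cursor means "no previous sync", which yields a full dump.
  def sync(cursor)
    changed = @rows.select { |row| cursor.nil? || row.position > cursor }
                   .sort_by(&:position)
    new_cursor = changed.empty? ? cursor : changed.last.position
    [changed, new_cursor]
  end
end
```

The caller stores the returned cursor between runs and passes it back on the next sync, so a run with no new rows returns an empty batch and leaves the cursor unchanged.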
Want to see it in action?
If you’d like to try out PlanetScale Connect or just want to learn more, refer to the PlanetScale Connect docs. In the meantime, if you have any feedback on the feature, please let us know.]]></content>
        <summary><![CDATA[Use PlanetScale Connect to easily perform ELT (Extract, Load, Transform) actions on your data to fulfill your application needs.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing PlanetScale Portals: Read-only regions</title>
        <link href="https://planetscale.com/blog/introducing-planetscale-portals-read-only-regions" />
        <id>https://planetscale.com/blog/introducing-planetscale-portals-read-only-regions</id>
        <published>2022-05-24T15:00:10.621Z</published>
        <updated>2022-05-24T15:00:10.621Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[If you deploy your applications in regions around the globe and need to keep database read latency low, you will want to physically have a database nearby. With PlanetScale Portals, you can now create read-only regions to support your globally distributed applications and better serve your users worldwide.
Put your data where your users and applications are
Portals allow you to read data from regions closest to where you globally deploy your applications. Whether your application is deployed near Northern Virginia, Frankfurt, São Paulo, or any of our other PlanetScale regions, Portals can now provide lower read latency for your applications!
Today, each database in PlanetScale reads and writes from a single region. But with Portals, you can add as many distributed read-only regions as you want.
Without PlanetScale, it can be a hassle to set up a globally deployed database on your own. You might have to do capacity planning, deal with complicated pricing structures, and worry about your replication strategy or performance. Instead, with a couple of clicks, allow PlanetScale to operate your additional read-only regions for you.
An example of the power of Portals
To make this more concrete, let’s look at the read latency between different regions:
For an application deployed to Frankfurt, talking to a Northern Virginia database can add roughly 90ms per query. By adding a read-only region in Frankfurt, we can reduce that to ~3ms per query. (Data based on select * from books limit 10.)
Connecting to your new read-only regions
You connect to your new read-only regions with a connection string, just like connecting to any other PlanetScale database branch or other MySQL databases. This connection string is specific to both your read-only region and production branch.
Let’s break down what this might look like in a Ruby on Rails application deployed in both São Paulo and Frankfurt.
Code example
While the following example uses Ruby on Rails, depending on your application framework and how you deploy your application, there are often similar solutions for other technology stacks. For example, read the Fly.io and PlanetScale guide on using Fly's Global Application Platform alongside PlanetScale’s read-only regions to deploy database regions close to your applications.
In a Rails application, you can set up a connection to your read-only region.
First, modify your database.yml to include both a primary and read-only region connection.
default: &default
  adapter: trilogy
  encoding: utf8mb4
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  username: root
  password:
  socket: /tmp/mysql.sock

development:
  primary:
    <<: *default
    database: multi_region_rails_development
  primary_replica:
    <<: *default
    database: multi_region_rails_development
    replica: true

test:
  primary:
    <<: *default
    database: multi_region_rails_test
  primary_replica:
    <<: *default
    database: multi_region_rails_test
    replica: true

This will allow you to send queries to your read-only region or take advantage of Rails "automatic role switching" to route queries for you.
ActiveRecord::Base.connected_to(role: :reading) do
  books = Book.where(author: "Taylor")
  # all code in this block will be connected to the read-only region
end

You can set up your production application to connect to your nearest PlanetScale region for reads. This will result in your app having low-latency reads.
In this example, we have our connection details stored in Rails credentials.
<%
  # Our application has a region environment variable.
  # We check this variable and connect to the closest DB region.
  region = ENV["APP_REGION"]

  # When in Frankfurt, we use our Frankfurt region.
  # When in São Paulo, we use our São Paulo region.
  region_replica_mapping = {
      "fra" => Rails.application.credentials.planetscale_fra,
      "gra" => Rails.application.credentials.planetscale_gra
  }

  # If no specific region exists, we’ll connect to the primary.
  db_replica_creds = region_replica_mapping[region] || Rails.application.credentials.planetscale
%>

production:
  primary:
    <<: *default
    username: <%= Rails.application.credentials.planetscale&.fetch(:username) %>
    password: <%= Rails.application.credentials.planetscale&.fetch(:password) %>
    database: <%= Rails.application.credentials.planetscale&.fetch(:database) %>
    host: <%= Rails.application.credentials.planetscale&.fetch(:host) %>
    ssl_mode: verify_identity
  primary_replica:
    <<: *default
    username: <%= db_replica_creds.fetch(:username) %>
    password: <%= db_replica_creds.fetch(:password) %>
    database: <%= db_replica_creds.fetch(:database) %>
    host: <%= db_replica_creds.fetch(:host) %>
    ssl_mode: <%= Trilogy::SSL_VERIFY_IDENTITY %>
    replica: true

Once this is in place, we can now have our globally deployed app read data from our globally deployed database. This will result in much faster GET requests for anyone in that region. Any writes will still go to the primary.
Automatic role switching and reading your own writes
We can take this one step further by having all our read queries hit the read-only region without specifying it in our code. We can also tell Rails to read from our primary if the user recently wrote to the database. This protects our users from ever reading stale data due to replication lag.
To do this, we need to set reading/writing roles for our models:
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  primary_abstract_class

  connects_to database: { writing: :primary, reading: :primary_replica }
end

Then we can enable automatic role switching by adding the following to our production config.
# config/environments/production.rb
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session

This tells Rails to send all reads to our read-only region and writes to our primary. After each write, it will set a cookie that will send all reads to the primary for 2 seconds, allowing users to read their own writes.
(Also, thank you to PlanetScale software engineer, Mike Coutermarsh, for help with the Ruby on Rails code in this section.)
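For readers who want a mental model of the read-your-own-writes behavior described in this section, the following plain-Ruby sketch mimics the routing decision. It is a simplified illustration only, not Rails' actual implementation: `role_for`, `DELAY`, and the explicit `now` argument are assumptions made for the example.

```ruby
# Simplified model of the routing decision; this is not Rails' actual code.
DELAY = 2 # seconds, mirroring `delay: 2.seconds` in the production config

# request_is_write: true for requests that modify data.
# last_write_at:    Time of the session's most recent write, or nil if none.
# now:              current Time, passed in so the logic is easy to test.
def role_for(request_is_write, last_write_at, now)
  return :writing if request_is_write
  # A recent write pins reads to the primary to avoid stale replica data.
  return :writing if last_write_at && (now - last_write_at) < DELAY

  :reading
end
```

Passing `now` explicitly keeps the function deterministic; in a real application the timestamp of the last write would come from the session or cookie that Rails maintains.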
Pricing
Any database on a Base or Enterprise plan can create read-only database regions. The pricing for Portals is based on storage costs and row reads.
Storage costs
Your storage costs will increase linearly with the number of read-only regions you purchase. For example, if your production branch is 10GB, each read-only region added will increase your total storage cost by 10GB. Portals’ storage costs are prorated by month. If you added a read-only region to your 10GB branch on the 15th, you’ll get billed for 5GB of usage.
Adding new read-only regions will always be billed as standalone storage and will not count toward your included storage.
Row reads
Queries issued to your read-only region will contribute to your total billable row reads per month. To make it easier to track the cost, your invoice details will show a new line for rows read from any read-only region.
Try it out today
PlanetScale Portals is available in beta today. You can create a new read-only region in any PlanetScale database on a Base or Enterprise plan.
Sign up or log into your PlanetScale account and go to your database’s production branch page to add a region. Read more in the Portals docs.
If you have feedback, tweet at us @planetscale or post in our GitHub Discussion group.]]></content>
        <summary><![CDATA[Put your data where your users and applications are.]]></summary>
      </entry>
    
      <entry>
        <title>The operational relational schema paradigm</title>
        <link href="https://planetscale.com/blog/the-operational-relational-schema-paradigm" />
        <id>https://planetscale.com/blog/the-operational-relational-schema-paradigm</id>
        <published>2022-05-09T17:48:00.000Z</published>
        <updated>2022-05-09T17:48:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[The relational model has been in existence for over forty years, a rare feat in the software development world. Relational databases commonly serve as backends for small, medium, to the largest apps and products in the world. And while relational databases have optimized well for speed, concurrency, latency, and overall read/write performance, they have not adapted as much for metadata changes at scale. Specifically, many organizations are struggling to keep development velocity, agility, and confidence when deploying schema changes.
Thirty years ago, developers would plan a schema change months ahead. One would only deliver a handful of changes a year. Developers would work with the database administrators to approve schema changes and plan the transition into the new model. Companies would take the system down for maintenance, sometimes for hours or days, to apply those changes.
The current landscape
Today, those maintenance windows are unacceptable to most organizations. Users expect services to be highly available and operational around the clock. On the other hand, today’s developers are used to accelerated deployment flows and want to continuously deploy schema changes, also known as schema migrations, sometimes multiple times per day.
But relational databases have not stepped up to meet developers’ needs. Schema changes pose an operational barrier to continuous deployment and remain alien to developers’ flows. As a result, developers regard relational databases unlike other production systems and evolve patterns to try and minimize schema migrations or avoid them altogether by modifying their code in suboptimal manners. Schema deployments for large tables frequently remain a manual endeavor and are considered risky operations.
We believe this to be the result of negligence; that relational databases can and should meet modern development practices for schema deployments, thus allowing for more automation, control, velocity, and, as a result, confidence in the process.
The suggested paradigm
We believe the following core tenets to be essential to schema migrations.
Non-blocking
Some relational databases, for some types of schema migrations, place a write lock on the migrated table, effectively rendering it inaccessible to the app. This in turn commonly manifests as an outage. An ALTER TABLE migration on a large table can take hours or even days. These blocking migrations are unacceptable to modern development flows and modern apps, and databases must offer non-blocking migrations that allow full access to the migrated table throughout the operation.
Lightweight
Even when available, non-blocking schema changes are typically aggressive in resource consumption and will attempt to use as much disk I/O, memory, and CPU as they can to run to completion. This competes with resources needed by the apps and often leads to degraded app performance. Schema changes should be able to yield to the app’s needs.
Asynchronous
Atomic or transactional migrations are appreciated, but they imply that a connection is held active for the duration of the migration, which can last hours or days. Deployment tools, or even scripts, should not be required to hold on to those connections for such long periods. The behavior upon connection loss is normally not what the developer wants. Databases should be able to receive a schema change request and move to run it asynchronously.
Scheduled
Migrations may conflict with each other, either due to running on the same tables or simply because of the excessive resource consumption incurred. Databases should provide a mechanism for scheduling migrations. The database should be able to determine which migrations are safe to run concurrently and which are not.
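As one possible illustration of such scheduling, the Ruby sketch below groups migrations into batches that do not touch the same tables. It is deliberately narrow: the `Migration` struct and `schedule` function are hypothetical, and the sketch only models table conflicts, not the resource-consumption conflicts mentioned above.

```ruby
# Hypothetical migration descriptor: an id plus the tables it touches.
Migration = Struct.new(:id, :tables)

# Greedily groups migrations into batches whose table sets do not overlap.
# Migrations in the same batch are treated as safe to run concurrently;
# batches themselves are meant to run one after another.
def schedule(migrations)
  batches = []
  migrations.each do |migration|
    batch = batches.find do |candidates|
      candidates.none? { |other| (other.tables & migration.tables).any? }
    end
    if batch
      batch << migration
    else
      batches << [migration]
    end
  end
  batches
end
```

For example, migrations on `users` and `orders` can share a batch, while a later migration that touches `users` and `posts` is pushed to the next batch.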
Interruptible
Even if lightweight, a migration still has an impact and footprint, most notably in disk space and disk I/O. It is sometimes necessary to stop that impact. It should be possible to interrupt a running migration at no immediate cost. A rollback that takes several hours, or a lengthy flushing of pages, is an example of undesired cost at a time when resources are needed the most.
Trackable
The database should be able to provide an estimate of a long migration’s progress or ETA.
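A simple way to produce such an estimate is linear extrapolation from the work done so far. The Ruby sketch below assumes the migration copies rows at a roughly constant rate, which is an assumption of this example and not a claim about any particular database; `migration_progress` and its parameters are hypothetical names.

```ruby
# Linear ETA estimate: assumes rows are copied at a roughly constant rate.
def migration_progress(rows_copied, rows_total, elapsed_seconds)
  fraction = rows_total.zero? ? 1.0 : rows_copied.to_f / rows_total
  # With no progress yet there is nothing to extrapolate from.
  eta = fraction.zero? ? nil : elapsed_seconds * (1 - fraction) / fraction
  { percent: (fraction * 100).round(1), eta_seconds: eta&.round }
end
```

If 250 of 1,000 rows were copied in 60 seconds, the remaining 750 rows are estimated to take three times as long, or 180 seconds.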
Failure agnostic
A database should be able to resume a migration interrupted due to database failure. As an example, it should be possible for an operator to reboot the database server without compromising a days-long migration. The same is true for unexpected failures. Operations teams should not postpone maintenance work due to developers’ deployments, and developers should not withhold deployments due to planned operations.
If a database offers a multi-node design, i.e., a cluster of nodes, then migrations should be agnostic to cross-node failovers and should not be bound to the specific node where they started.
Revertible
Schema migrations should be treated as first-class deployments. As such, the database system should be able to undeploy a migration, thus restoring the pre-migration schema. Developers should have the confidence that if a schema deployment goes wrong, they can revert it and go back to a known good state.
Redeployable
Much like code deployments, schema deployments should be idempotent. The developer or the deployment system should be able to submit the same migration request twice (or more) in a row, and the database should resolve the excessive requests to ensure the migration runs once, as the developer would expect.
Databases should potentially support declarative schema deployments, where a developer submits a desired state rather than an imperative command. Declarative schema deployments are idempotent by nature.
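To illustrate the idempotency requirement, here is a minimal Ruby sketch of a queue that accepts the same migration request more than once but schedules it only once. The `MigrationQueue` class and the caller-supplied `request_id` are assumptions made for the example; a real system could identify duplicates differently.

```ruby
# Hypothetical queue that deduplicates repeated migration requests.
class MigrationQueue
  attr_reader :runs

  def initialize
    @accepted = {} # request_id => statement
    @runs = []     # statements that were actually scheduled
  end

  # Schedules the statement only the first time a request_id is seen.
  # Returns true if this call scheduled work, false for a repeat.
  def submit(request_id, statement)
    return false if @accepted.key?(request_id)

    @accepted[request_id] = statement
    @runs << statement
    true
  end
end
```

A deployment tool that retries after a timeout can then resubmit with the same `request_id` without risking a second run of the same change.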
The resulting flow
With these principles in place, developers have the confidence that their schema migrations will not put substantial load on production servers. That their deployment tools will not have to block for hours while running the change. That the database will gracefully schedule their migration while other deployments are in place. They can track the progress of the migration at any time and interrupt it if the need arises, at no additional cost.
Developers are free from operational considerations. They do not need to be concerned about planned maintenance or unplanned failovers.
They can feel confident in their deployments knowing they can redeploy their change, again and again, or revert it altogether and go back to the last known state in case of trouble.
These all suggest a relaxed development flow that gives developers ownership of their schema changes and the confidence to deploy with velocity.]]></content>
        <summary><![CDATA[An exploration of the current landscape of schema change methodology and what the future should look like.]]></summary>
      </entry>
    
      <entry>
        <title>Consensus algorithms at scale: Part 5 - Handling races</title>
        <link href="https://planetscale.com/blog/consensus-algorithms-at-scale-part-5" />
        <id>https://planetscale.com/blog/consensus-algorithms-at-scale-part-5</id>
        <published>2022-04-28T15:49:00.000Z</published>
        <updated>2022-04-28T15:49:00.000Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[If you’re still catching up, you can find links to each article in the series at the bottom of this article.
In this post, we will look at how to make revocation and establishment of leadership work if there are race conditions. We will also cover forward progress requirements.
Recap of parts 1-4
Durability is the main reason we want to use a consensus system.
Since durability is use-case dependent, we treated it as an abstract requirement: the consensus algorithms assume nothing about the specific durability rules.
We started off with the original properties of a consensus system as defined by Paxos and modified them to make them usable in practical scenarios. Instead of converging on a value, we changed the system to accept a series of requests.
We narrowed our scope down to single leader systems.
We came up with a new set of rules that are agnostic of durability. The essential claim is that a system that follows these rules will be able to satisfy the requirements of a consensus system. Specifically, we excluded some requirements like majority quorum that have previously been used as core building blocks in consensus algorithms.
We looked at a number of practical scenarios where it is difficult to make a simplistic majority quorum approach work well. A flexible consensus system would accommodate those use cases more comfortably.
We conceptualized the leadership change process into two distinct concerns: Revoke and Establish. This opens up more implementation options that traditional algorithms did not accommodate.
Two approaches to resolving race conditions
If we had only one agent that performed leadership changes, there would be no reason to worry about races. But this is not practical because a network partition could isolate that agent and prevent it from performing the necessary actions.
Introducing more than one agent to perform leadership changes requires us to handle race conditions. There are two approaches to resolving races: either the first agent wins, or the last one wins. The determination of who is first or last can vary depending on the approach used. We will drill down on this as we analyze each option.
An approach that makes the first agent win essentially prevents later agents from succeeding. This is equivalent to obtaining a lock. We will therefore call this approach lock-based. The other one will be called the lock-free approach.
We will analyze these approaches separately. As we will see below, this difference is quite fundamental, and it is surprising that it has not been called out explicitly for consensus algorithms.
The elector
Let us make a quick detour to introduce a new term.
At YouTube, each shard had fifteen eligible primaries. Having all of them scan for failures and perform active failovers was not practical. Instead, we had one agent in each region that scanned for failures and also performed leadership elections.
In other words, it is not necessary for a candidate to elect itself as leader. A separate agent can perform all the necessary steps to promote a candidate as leader. We will call this the elector. In many situations, like in the case of YouTube, this separation may be necessary. Unsurprisingly, Vitess has also inherited this separation: VTorc is a Vitess component that acts as the elector.
It is beneficial to conceptualize the role of an elector as being distinct from that of a candidate. This does not preclude a candidate from taking on such a role. This terminology will be useful for the sections below.
Lock-based approach
Obtaining a lock simplifies the problem of changing leadership. It guarantees that the elector that succeeds at getting the lock is the only one capable of making changes to the system.
However, the lock-based approach introduces a problem with forward progress. For example, an elector may successfully obtain a lock and then crash or become partitioned out of the rest of the system. This will prevent all other electors from ever being able to repair this situation. To resolve this, lock-based systems must introduce a time component: any elector that obtains a lock must complete its task within a certain period of time, after which the lock is automatically released.
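The time component can be pictured as a lease: a lock that expires on its own. The following Ruby sketch is a single-process model only, with a hypothetical `LeaseLock` class and an explicit `now` argument standing in for a clock. It ignores the distributed aspects (multiple nodes, clock drift) that a real system has to address.

```ruby
# Hypothetical single-process model of a lock with an expiry (a lease).
class LeaseLock
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @holder = nil
    @expires_at = nil
  end

  # Grants the lock if it is free or the previous lease has expired.
  # `now` is passed in so the behavior is easy to test deterministically.
  def acquire(elector, now)
    if @holder.nil? || now >= @expires_at
      @holder = elector
      @expires_at = now + @ttl
      true
    else
      # The current holder still owns the lock; everyone else is refused.
      @holder == elector
    end
  end
end
```

If the elector holding the lease crashes, other electors are refused only until the lease runs out, after which the next caller takes over.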
A lock-based approach generally converges faster than approaches that are lock-free. This is because the first node that attempts a leadership change is likely to have made the most progress towards completing the task. Under most circumstances, giving the first elector the chance to succeed will complete the leadership change with the least disruption.
The act of obtaining a lock requires the participation of multiple nodes. This ensures that forward progress is possible if some nodes fail or if there is a network partition. Coincidentally, this problem shares some properties of distributed consensus. For this reason, the existing nodes that are participating in the quorum could be reused to obtain the lock. In fact, this is exactly what Raft does, but it is implicit.
Locking as a separate concern
One may argue that a system like Raft does not obtain a lock even though it makes the first elector win. This is because the act of obtaining a lock is shadowed by other actions it takes. If you subtract out the other actions (revoke, establish, and propagate) in the code that performs an election, it will be evident that what is left is the act of obtaining a distributed lock.
An algorithm has the option of implementing the locking function as an explicit and separate step. Choosing to do this gives us more flexibility because we can fine-tune this step without conflicting with the concerns of revocation or establishment.
How to obtain a lock
The primary requirement of using the existing nodes to obtain a lock is that the set of nodes each elector reaches has to intersect with those of the others. A majority-based system automatically ensures this intersection. This allows for algorithms like Raft to piggy-back the locking as a side-effect of performing revocation and establishment.
For systems that do not use the majority approach, the fact that revocation has to complement establishment ensures that the electors will have to reach an intersecting set of nodes. So, this property also allows us to perform locking as a side-effect of revocation and establishment.
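The intersection requirement is easy to check by brute force on a small cluster. The sketch below assumes a hypothetical five-node cluster and a helper named `all_quorums_intersect?`; it only illustrates why a majority guarantees overlap while smaller sets do not.

```ruby
# Returns true when every pair of node sets of the given size shares a node.
def all_quorums_intersect?(nodes, quorum_size)
  nodes.combination(quorum_size).to_a.combination(2).all? do |a, b|
    !(a & b).empty?
  end
end

nodes = %w[n1 n2 n3 n4 n5]
all_quorums_intersect?(nodes, 3)  # any two majorities of five share a node
all_quorums_intersect?(nodes, 2)  # fails: n1,n2 and n3,n4 are disjoint
```

Two electors that each reach three of five nodes must meet at some node, and that node can refuse the second elector.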
But this is not our only option. Any mechanism that ensures that only one elector can act will work equally well. Here are some alternatives:
Use a simple majority for the purpose of obtaining a lock, while using a more sophisticated approach for revocation and establishment.
In Vitess, we use an external system like etcd to obtain such a lock. The decision to rely on another consensus system to implement our own may seem odd. But the difference is that Vitess itself can complete a massive number of requests per second. Its usage of etcd is only for changing leadership, which is on the order of once per day or week.
Humans could decide to manually authorize an elector to perform a leadership change, essentially giving it a “lock”.
Proposal numbers
If you are using a lock-based approach, there is no need to use proposal numbers or leadership terms for the purpose of electing a leader. But we may still need to assign such a number to facilitate the propagation of requests. We will discuss this in a subsequent blog.
Graceful leadership changes
Once a lock is obtained, the elector is guaranteed that no one else will be changing the system while it holds the lock. A big advantage of this situation is that the current leader can be discovered, which allows us to use the graceful method of changing leadership described in the previous blog.
The clock
In lock-based systems, we have to rely on accurate clocks. The system also has to make sure that sufficient tolerances are built into timeouts to account for normal clock skews, which are typically in the milliseconds. In general, it is advisable to use “many seconds” of granularity to sequence events.
Consistent reads
Relying on locks lets us exploit the time component to give leaders a lease, thereby guaranteeing no leader change during the lease period. This lets the users perform efficient, consistent reads by accessing the leader without the need for a quorum read. The existing leader could continue to renew the lease as it completes more requests, which leads to prolonged stability.
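As a rough illustration, a leader could guard local reads with a check like the one below. This is a sketch under stated assumptions, not the Vitess implementation: the `LeaderLease` class, the 10-second lease, and the 2-second skew margin are all hypothetical.

```ruby
# Hypothetical leader-side lease check; times are plain numbers in seconds.
class LeaderLease
  def initialize(duration, skew_margin)
    @duration = duration
    @skew_margin = skew_margin
    @valid_until = 0
  end

  # Called each time the lease is successfully renewed.
  def renew(now)
    @valid_until = now + @duration
  end

  # Serve a local read only while comfortably inside the lease, leaving
  # room for clock skew between this node and the rest of the system.
  def can_serve_local_read?(now)
    now < @valid_until - @skew_margin
  end
end

lease = LeaderLease.new(10, 2)
lease.renew(100)
lease.can_serve_local_read?(105)  # inside the lease
lease.can_serve_local_read?(109)  # too close to expiry to trust a local read
```

The margin is what absorbs clock skew: the leader stops trusting its lease slightly before anyone else could consider it expired.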
Lock-free approach
In the case of a lock-free approach, the newest elector must win over an older one. This requires the algorithm to assign a time-based order to the electors that are racing with each other. In Paxos, these are referred to as proposal numbers. In Raft, these are known as term numbers. To facilitate reasoning, we can view these numbers as timestamps.
How does this work?
The core of the lock-free approach is that the followers that accept an establishment or revocation request must remember the timestamp of the elector that issued the request and should reject requests with older timestamps.
There are two possible ways a lock-free approach would converge:
The elector with the older timestamp completes its election before the one with the newer timestamp. Following this, the one with the newer timestamp will end up revoking that leadership and establishing its own.
The elector with the newer timestamp completes first. Then the one with the older timestamp will fail at its attempt, and the leadership established by the newer timestamp will prevail.
To cover the above scenarios, every elector must assume that there may be another elector with an older timestamp attempting a leadership change. It must therefore attempt to revoke leadership from all potential candidates, not just the current known leader. The completion of this process ensures that all possible leaderships (present and future) with an older timestamp are invalidated. This addresses the case where an old elector is slow at performing its actions. This also adds safety against clock skews: a new leader with an incorrect older timestamp will just fail at completing a leadership change, as if it was an older leader.
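The follower-side rule can be sketched in a few lines. This is a simplified illustration, not Paxos or Raft: the `Follower` class and its `revoke` and `establish` methods are hypothetical names, and timestamps are plain integers.

```ruby
# Hypothetical follower that remembers the newest elector timestamp it has seen.
class Follower
  attr_reader :leader

  def initialize
    @highest_seen = 0
    @leader = nil
  end

  # Revocation: stop honoring any leadership older than this timestamp.
  def revoke(timestamp)
    return false if timestamp < @highest_seen
    @highest_seen = timestamp
    @leader = nil
    true
  end

  # Establishment: accept the candidate unless a newer elector got here first.
  def establish(timestamp, candidate)
    return false if timestamp < @highest_seen
    @highest_seen = timestamp
    @leader = candidate
    true
  end
end

follower = Follower.new
follower.revoke(2)                # the newer elector arrives first
follower.establish(1, "old")      # rejected: timestamp 1 is older than 2
follower.establish(2, "new")      # accepted
```

Whichever order the two electors arrive in, the follower ends up with the leader chosen by the newer timestamp.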
Pros and Cons
The main advantage of a lock-free approach is that it naturally supports forward progress. If an existing elector fails, a different elector can initiate a new round without knowledge of the state or age of an older elector. For this reason, there is no need to depend on timeouts.
The disadvantage of a lock-free approach is that there is no certainty of a stable leader. This is because a leadership can end between the time you discover the leader and the time you send it a request.
Consistent Reads
The absence of a stable leader makes consistent reads complicated. We essentially have to resort to quorum reads.
Recommendations
The elegance of a lock-free approach may seem tempting, but the lack of a stable leader complicates everything else. Having to reach a quorum for consistent reads is a major drawback for scaling systems.
Weighing these options, a lock-based system should be preferred for large scale consensus systems. Having a stable leader simplifies many other operational parts of the system.
In Vitess, the current leader for each cluster is published through its topology, and a large number of workflows rely on this information to perform various tasks. Any operation that does not want the leader to change just has to obtain a lock before doing its work.
Read the full Consensus Algorithms series
Consensus Algorithms at Scale: Part 1 — Introduction
Consensus Algorithms at Scale: Part 2 — Rules of consensus
Consensus Algorithms at Scale: Part 3 — Use cases
Consensus Algorithms at Scale: Part 4 — Establishment and revocation
You just read: Consensus Algorithms at Scale: Part 5 — Handling races
Next up: Consensus Algorithms at Scale: Part 6 — Completing requests
Consensus Algorithms at Scale: Part 7 — Propagating requests
Consensus Algorithms at Scale: Part 8 — Closing thoughts]]></content>
        <summary><![CDATA[In part 5 of our consensus algorithms series we discuss how we handle race conditions and forward progress requirements.]]></summary>
      </entry>
    
      <entry>
        <title>Consensus algorithms at scale: Part 4 - Establishment and revocation</title>
        <link href="https://planetscale.com/blog/consensus-algorithms-at-scale-part-4" />
        <id>https://planetscale.com/blog/consensus-algorithms-at-scale-part-4</id>
        <published>2022-04-06T15:19:00.000Z</published>
        <updated>2022-04-06T15:19:00.000Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[If you’re still catching up, you can find links to each article in the series at the bottom of this article.
The Leader election process is the less frequently used part of the consensus process. However, it is the more complex one. We will therefore drill into this part first.
Recap of parts 1-3
Durability is the main reason we want to use a consensus system.
Since Durability is use-case dependent, we made it an abstract requirement that the consensus algorithms assume nothing about the durability requirements.
We started off with the original properties of a consensus system as defined by Paxos and modified them to make them usable in practical scenarios. Instead of converging on a value, we changed the system to accept a series of requests.
We narrowed our scope down to single leader systems.
We came up with a new set of rules that are agnostic of durability. The essential claim is that a system that follows these rules will be able to satisfy the requirements of a consensus system. Specifically, we excluded some requirements like majority quorum that have previously been used as core building blocks in consensus algorithms.
We looked at a number of practical scenarios where it is difficult to make a simplistic majority quorum approach work well. A flexible consensus system would accommodate those use cases more comfortably.
Consensus Algorithms at Scale - Part 1
Consensus Algorithms at Scale - Part 2
Consensus Algorithms at Scale - Part 3
Conflating too many concerns
Traditional algorithms like Paxos and Raft try to do too many things at once. The cleverness of those approaches is commendable. However, such implementations are too rigid, and you cannot make modifications to specific parts of the algorithm without breaking something else.
What we are going to do now is separate those concerns, and talk about how to address them individually. We can still choose to conflate them, but it should be a conscious decision.
An important revelation is that all leader-based consensus algorithms perform the following actions when electing a new leader:
Revoke a previously existing leadership
Establish a new leader
An additional constraint is that a revoke must precede the establishment step. Otherwise, we will end up with more than one leader.
Majority-based consensus algorithms satisfy this constraint atomically: When a leader successfully recruits all the necessary followers, it automatically achieves the goal of revoking the previous leadership.
Because the revoke was implicitly achieved, it was never called out as a separate concern. More importantly, it was never called out as a concern that could be separated.
In other words, it is not necessary to perform the two operations as part of the same action. This separation becomes more important for consensus systems that are not majority-based.
To limit complexity, this section will start by focusing on establishment and revocation of leadership. Once we have analyzed these two actions, we will layer in the rest of the concerns, which are forward progress, race handling, and propagation of requests.
Even though the two actions can be performed separately, there exists a strong relationship between them: Leadership is established when all the parameters are in place for a leader to successfully complete requests. Any change that invalidates this condition is a revocation.
Proposal Numbers
In traditional consensus algorithms, the establishment of leadership is achieved by requesting followers to accept a specific proposal number. If a candidate manages to perform this action on the majority of the nodes, then the leadership is deemed as established.
To revoke such a leadership, the new candidate pushes a different proposal number to those followers, which implicitly revokes the previous leader’s ability to propagate requests to those nodes. When the majority of the followers are reached, the revocation of the previous leadership and the establishment of the new one are simultaneously achieved.
Without Proposal Numbers
The usage of proposal numbers is only one of many methods of establishing and revoking leadership. For example, in MySQL, the replication mechanism could also be used to achieve the same objectives: Pointing a semi-sync replica at a primary is an act of leadership establishment. Requesting such a replica to stop replicating or to replicate from a different source would achieve the objective of revocation.
Knowing the current leader
Depending on how we handle races, the current leader may not be known. If so, the revocation must be performed against all potential leaders. In other words, the election process must reach enough nodes to be sure that no existing leader can complete their requests. This will become clearer in the next blog, where we will cover race conditions.
Direct Leader Demotion
Now that we have identified revocation as a possible separate action, we can look at more than one way to revoke an existing leadership.
If the current leader is known, requesting that leader to step down also results in a valid revocation. This method is generally more graceful because the leader has the opportunity to complete in-flight requests and also inform clients of an imminent change in leadership.
Demoting the existing leader is meaningful only for planned changes, like a software rollout. If a leader becomes unreachable due to a crash or a network partition, we have to fall back to requesting the followers to stop accepting requests from the current leader to achieve revocation.
In Vitess, we have two operations that can perform a leadership change: PlannedReparentShard (PRS) and EmergencyReparentShard (ERS). For software rollouts, we use PRS to demote the current primary to a replica before performing the update. But we use ERS if we detect that the primary database is down or not reachable.
If a PRS is issued, the low-level vttablet component of Vitess goes into a lameduck mode where it allows in-flight transactions to complete, but rejects any new ones. At the same time, the front-end proxies (vtgate) begin to buffer such new transactions. Once PRS completes, all buffered transactions are sent to the new primary, and the system resumes without serving any errors to the application.
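The buffering idea can be illustrated with a toy proxy. This is only a sketch of the general technique and says nothing about how vtgate is actually implemented; the `Primary` and `BufferingProxy` classes and their methods are hypothetical.

```ruby
# Hypothetical primary that simply records the transactions it applies.
class Primary
  attr_reader :applied

  def initialize
    @applied = []
  end

  def apply(txn)
    @applied << txn
  end
end

# Hypothetical proxy that queues new transactions during a planned change.
class BufferingProxy
  def initialize(primary)
    @primary = primary
    @buffering = false
    @buffer = []
  end

  def begin_reparent
    @buffering = true
  end

  # While a planned change is in progress, queue instead of failing.
  def execute(txn)
    if @buffering
      @buffer << txn
      :buffered
    else
      @primary.apply(txn)
      :applied
    end
  end

  # Point at the new primary and replay everything that was held back.
  def finish_reparent(new_primary)
    @primary = new_primary
    @buffering = false
    @buffer.each { |txn| @primary.apply(txn) }
    @buffer.clear
  end
end
```

From the application's point of view, a transaction issued during the change takes a little longer but never fails.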
Why use two approaches?
A typical cluster could be completing thousands of requests per second. In contrast, a software rollout is likely a daily event. In further contrast, a node failure may happen once a month or even less frequently.
It is important that we optimize for the common case. This means that we want leadership changes to be graceful during software rollout. Ideally, the application should see no errors during this time. The approach of demoting the current leader gives us this opportunity.
Interchangeability
Can we assume that two different algorithms are interchangeable? The answer is yes. Let us assume that a leadership is established by satisfying conditions A and B. One algorithm achieves revocation by making condition A false, and the other by making condition B false. In both cases, it is a successful revocation.
Once revocation is complete, both algorithms have to make conditions A and B true for the new leader, which will allow for subsequent rounds to use any method of revocation.
Other approaches
We can think of innumerable other ways to establish and revoke leadership, and they would all be valid as long as the revocation and establishment conditions are accurately satisfied. As an extreme example, cutting the network cable that connects a leader to its followers is also a valid way to revoke an existing leadership.
I know of one incident at Google where we had to dispatch a human to physically shut down a machine where a leader had gone rogue.
In the next blog post, we will discuss possible options for handling races and ensuring forward progress. At that time, we will re-evaluate these approaches.
Read the full Consensus Algorithms series
Consensus Algorithms at Scale: Part 1 — Introduction
Consensus Algorithms at Scale: Part 2 — Rules of consensus
Consensus Algorithms at Scale: Part 3 — Use cases
You just read: Consensus Algorithms at Scale: Part 4 — Establishment and revocation
Next up: Consensus Algorithms at Scale: Part 5 — Handling races
Consensus Algorithms at Scale: Part 6 — Completing requests
Consensus Algorithms at Scale: Part 7 — Propagating requests
Consensus Algorithms at Scale: Part 8 — Closing thoughts]]></content>
        <summary><![CDATA[In part 4 of the consensus algorithms series we look at how algorithm leaders are established and revoked.]]></summary>
      </entry>
    
      <entry>
        <title>Generics can make your Go code slower</title>
        <link href="https://planetscale.com/blog/generics-can-make-your-go-code-slower" />
        <id>https://planetscale.com/blog/generics-can-make-your-go-code-slower</id>
        <published>2022-03-30T00:00:00.000Z</published>
        <updated>2022-03-30T00:00:00.000Z</updated>
        
        <author>
          <name>Vicent Marti</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Go 1.18 is here, and with it, the first release of the long-awaited implementation of Generics is finally ready for production usage. Generics are a frequently requested feature that has been highly contentious throughout the Go community.]]></content>
        <summary><![CDATA[Go 1.18 is here, and with it, the first release of the long-awaited implementation of Generics is finally ready for production usage. Generics are a frequently requested feature that has been highly contentious throughout the Go community.]]></summary>
      </entry>
    
      <entry>
        <title>Why we chose NanoIDs for PlanetScale’s API</title>
        <link href="https://planetscale.com/blog/why-we-chose-nanoids-for-planetscales-api" />
        <id>https://planetscale.com/blog/why-we-chose-nanoids-for-planetscales-api</id>
        <published>2022-03-29T17:30:00.000Z</published>
        <updated>2022-03-29T17:30:00.000Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[When we were first building PlanetScale’s API, we needed to figure out what type of identifier we’d be using. We knew that we wanted to avoid using integer IDs so that we wouldn’t reveal the count of records in all our tables.
The common solution to this problem is using a UUID (Universally Unique Identifier) instead. UUIDs are great because it’s nearly impossible to generate a duplicate and they obscure your internal IDs. They have one problem though. They take up a lot of space in a URL: api.planetscale.com/v1/deploy-requests/7cb776c5-8c12-4b1a-84aa-9941b815d873.
Try double clicking on that ID to select and copy it. You can’t. The browser interprets it as 5 different words.
It may seem minor, but to build a product that developers love to use, we need to care about details like these.
Nano ID
We decided that we wanted our IDs to be:
Shorter than a UUID
Easy to select with double clicking
Low chance of collisions
Easy to generate in multiple programming languages (we use Ruby and Go on our backend)
This led us to NanoID, which accomplishes exactly that.
Here are some examples:
izkpm55j334u
z2n60bhrj7e8
qoucu12dag1x
These are much more user-friendly in a URL: api.planetscale.com/v1/deploy-requests/izkpm55j334u
ID length and collisions
An ID collision is when the same ID is generated twice. If this happens rarely, it’s not a big deal. The application can detect a collision, auto-generate a new ID, and move on. If this is happening often though, it can be a huge problem.
The longer and more complex the ID, the less likely it is to happen. Determining the complexity needed for the ID depends on the application. In our case, we used the NanoID collision tool and decided to use 12 character long IDs with the alphabet of 0123456789abcdefghijklmnopqrstuvwxyz.
This gives us a 1% probability of a collision in the next ~35 years if we are generating 1,000 IDs per hour.
If we ever need to increase this, the change would be as simple as increasing the length in our ID generator and updating our database schema to accept the new size.
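The figure above can be sanity-checked with the standard birthday-bound approximation. This is not the NanoID collision tool's own code, just a back-of-the-envelope sketch; the helper name `ids_until_collision_probability` is hypothetical.

```ruby
# Birthday-bound approximation: how many random IDs can be generated before
# the chance of at least one collision reaches `probability`.
def ids_until_collision_probability(alphabet_size, length, probability)
  space = alphabet_size**length
  Math.sqrt(2.0 * space * Math.log(1.0 / (1.0 - probability)))
end

ids = ids_until_collision_probability(36, 12, 0.01)
ids_per_year = 1_000 * 24 * 365
years = ids / ids_per_year  # roughly 35 years at 1,000 IDs per hour
```

Rerunning the same calculation with a different length shows how much headroom a longer ID would buy.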

Generating NanoIDs in Rails
Our API is a Ruby on Rails application. For all public-facing models, we have added a public_id column to our database. We still use standard auto-incrementing BigInts for our primary key. The public_id is only used as an external identifier.
Example schema
We add the public_id column as well as a unique constraint to protect from duplicates.
CREATE TABLE `user` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `public_id` varchar(12) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `updated_at` datetime(6) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_public_id` (`public_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Auto generating IDs
We built a concern that could be shared across all our models to autogenerate IDs for us. In Rails, a concern is a module that can be included in multiple models to reduce duplication. Whenever our application creates a new record, this code runs and generates the ID for us.
# app/models/user.rb
class User < ApplicationRecord
  # For each model with a public id, we include the generator
  include PublicIdGenerator
end

Here is the generator that creates the ID and handles retries in the small chance of a duplicate.
# app/models/concerns/public_id_generator.rb

require "nanoid"

module PublicIdGenerator
  extend ActiveSupport::Concern

  included do
    before_create :set_public_id
  end

  PUBLIC_ID_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
  PUBLIC_ID_LENGTH = 12
  MAX_RETRY = 1000

  PUBLIC_ID_REGEX = /[#{PUBLIC_ID_ALPHABET}]{#{PUBLIC_ID_LENGTH}}\z/

  class_methods do
    def generate_nanoid(alphabet: PUBLIC_ID_ALPHABET, size: PUBLIC_ID_LENGTH)
      Nanoid.generate(size: size, alphabet: alphabet)
    end
  end

  # Generates a random string for us as the public ID.
  def set_public_id
    return if public_id.present?
    MAX_RETRY.times do
      self.public_id = generate_public_id
      return unless self.class.where(public_id: public_id).exists?
    end
    raise "Failed to generate a unique public id after #{MAX_RETRY} attempts"
  end

  def generate_public_id
    self.class.generate_nanoid(alphabet: PUBLIC_ID_ALPHABET)
  end
end

Generating NanoIDs in Go
NanoID generators are available in many languages. At PlanetScale, we also have a backend service in Go that needs to generate public IDs as well.
Here is how we do it in Go:
// Package publicid provides public ID values in the same format as
// PlanetScale’s Rails application.
package publicid

import (
	"strings"

	nanoid "github.com/matoous/go-nanoid/v2"
	"github.com/pkg/errors"
)

// Fixed nanoid parameters used in the Rails application.
const (
	alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
	length   = 12
)

// New generates a unique public ID.
func New() (string, error) { return nanoid.Generate(alphabet, length) }

// Must is the same as New, but panics on error.
func Must() string { return nanoid.MustGenerate(alphabet, length) }

// Validate checks if a given field name’s public ID value is valid according to
// the constraints defined by package publicid.
func Validate(fieldName, id string) error {
	if id == "" {
		return errors.Errorf("%s cannot be blank", fieldName)
	}

	if len(id) != length {
		return errors.Errorf("%s should be %d characters long", fieldName, length)
	}

	if strings.Trim(id, alphabet) != "" {
		return errors.Errorf("%s has invalid characters", fieldName)
	}

	return nil
}

Wrap up
Creating a great developer experience is one of our big priorities at PlanetScale. These seemingly small details, like being able to quickly copy an ID, all add up. NanoIDs were able to solve our application requirements without degrading developer experience.
Resources
NanoID
NanoID Collision Calculator
Rails Concerns]]></content>
        <summary><![CDATA[Learn why PlanetScale used NanoID to generate obscure and URL friendly identifiers.]]></summary>
      </entry>
    
      <entry>
        <title>Revert a migration without losing data</title>
        <link href="https://planetscale.com/blog/revert-a-migration-without-losing-data" />
        <id>https://planetscale.com/blog/revert-a-migration-without-losing-data</id>
        <published>2022-03-24T12:01:46.798Z</published>
        <updated>2022-03-24T12:01:46.798Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Bad migrations happen every day and data loss can be devastating. PlanetScale’s new schema revert feature gives you the power to revert changes after a schema migration with no downtime and zero data loss.
What is it?
Have you ever made a schema change, like changing a column’s data type or dropping a table, that broke your production application causing unplanned downtime, or even worse, a full-scale outage? And once it was live, you immediately wished you could go back in time, without losing any data that was added during the broken period?
Your wish has come true. PlanetScale’s new schema revert feature allows you to revert a migration with zero data loss.
This feature enables something never possible before: The ability to revert your database’s schema changes in less than a minute with the press of a button with no downtime or data loss. Previously, if you dropped the wrong index, running a new migration might take you a few hours to fix. Even worse, if you dropped the wrong column or table and had to restore from a backup, it would take days or weeks to roll out. And what happens to your data and your application while it is being fixed?
Schema reverts open the door to a new level of velocity and power for your engineering team. Along with existing PlanetScale features such as Database Branching and Deploy Requests, you can make changes (and fix changes) in your database faster and safer than before. It not only saves you time, but it makes schema changes less scary because you can treat your database like you treat your code. Unless you built custom in-house tooling for your database, these workflows have not been possible before.
How does it work?
After you have completed a deploy request, you will be able to revert to a clean state of the database before the schema changes. This includes any data that may have been removed in the deploy request. Any database enrolled in the limited beta will see a “Revert changes” button available on the relevant deploy request page for 30 minutes after the changes have been deployed. Once the “Revert changes” button is selected, it is deployed immediately.
For example, if you remove a column and its associated data in a deploy request, and then revert it, your column and its associated data will appear again in seconds.
This is possible because of VReplication in Vitess, the database clustering and management system that powers PlanetScale databases alongside MySQL. Vitess' VReplication also powers features like Database Imports. VReplication uses a lossless sync in the background between valid states of the database. It copies data from the source to the destination table in a consistent fashion. VReplication's implementation is unique because it allows us to go down to the MySQL transaction level, ensuring no data is lost and that your database schema returns to its previous state before the schema change. All in just seconds.
For a more in-depth look at how this works, take a look at the Behind the scenes: how schema reverts work blog post.
Want to see it in action?
Sign up for the limited beta to test it out yourself! Sign up or log into PlanetScale and opt into the limited beta in your database’s Settings tab. In the Beta features section, select “Enroll this database in the schema revert limited beta.” Once you create a database branch with changes to your schema, deploy the deploy request, and then revert!
For more information, check out our Deploy requests – Revert a schema change documentation.]]></content>
        <summary><![CDATA[Learn how PlanetScale lets you revert changes to your database after a migration with no downtime and zero data loss.]]></summary>
      </entry>
    
      <entry>
        <title>Behind the scenes: How schema reverts work</title>
        <link href="https://planetscale.com/blog/behind-the-scenes-how-schema-reverts-work" />
        <id>https://planetscale.com/blog/behind-the-scenes-how-schema-reverts-work</id>
        <published>2022-03-24T12:00:00.000Z</published>
        <updated>2022-03-24T12:00:00.000Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Today, we released a new feature that allows you to instantly revert a recently deployed schema change without losing data that was written to the original schema during the time between deploying and reverting.
This is much more than a simple rollback.
Imagine you just deployed a schema change to your production database that drops a column on a table. You refresh your production application in the browser only to discover, to your horror, that your application is down.
Instead of scrambling to roll back to a previous state before the deployment, possibly losing data in the process, this lets you “undo” that schema change with the click of a button. If any data was written to your database while that bad schema change was live, we automatically retain that data even after you revert.
While this may sound like science fiction, it’s completely real and required a huge amount of underlying work to accomplish. This article will shed some light on what’s happening under the hood and how we built this feature.
The example at the end will walk you through exactly what happens in the time between deploying and reverting.
How do online schema change tools work?
To understand how migration reverts work, we must first look into how online schema change tools work.
All online schema change techniques follow a common high-level pattern:
Instead of altering your production table, they create an empty shadow table with the same schema as the production table but no data.
They implement the schema change on the shadow table. This is a cheap operation, given the table has no data.
They copy data from the original table, track incoming changes, and apply them to the shadow table.
Finally, when the tables are in sync, the tools cut over. They move aside your original table and replace it with the shadow table.
In short, online schema change tools copy the production table without data, apply the schema change to the copy, sync the data, and swap the tables.
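The four steps can be sketched with plain Ruby hashes standing in for tables. This is a toy model of the general pattern, not how any real tool or VReplication works; the function name `online_schema_change`, the row layout, and the `title` column being dropped are all assumptions for illustration.

```ruby
# Tables are hashes keyed by primary key; each row is a hash of columns.
# `changes_during_copy` is a list of [id, row] writes that arrive mid-copy.
def online_schema_change(original, changes_during_copy, dropped_column)
  # 1. Create an empty shadow table.
  shadow = {}

  # 2. The schema change, expressed as a transformation applied to each row.
  alter = ->(row) { row.reject { |column, _| column == dropped_column } }

  # 3a. Backfill: copy existing rows into the shadow table.
  original.each { |id, row| shadow[id] = alter.call(row) }

  # 3b. Apply writes that reached the original table during the copy.
  changes_during_copy.each do |id, row|
    original[id] = row
    shadow[id] = alter.call(row)
  end

  # 4. Cut over: the shadow table takes the place of the original.
  shadow
end

users = { 1 => { name: "Ada", title: "Dr" }, 2 => { name: "Bo", title: "Mx" } }
writes = [[2, { name: "Bob", title: "Mx" }], [3, { name: "Cy", title: "Mr" }]]
new_users = online_schema_change(users, writes, :title)
```

The update to row 2 and the new row 3 both arrive after the backfill has started, yet both end up in the shadow table before the cut-over.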
At PlanetScale, we leverage the power of Vitess' VReplication internals to run online schema changes. VReplication is a core component of Vitess, our underlying database system, that works behind the scenes to accomplish online schema changes, migration reverts, data imports with no downtime, and more.
Now that you know how online schema changes work in general, let’s look at how VReplication uniquely implements this, and in doing so, paves the way for migration reverts.
VReplication schema changes
Some online schema change tool techniques differ in implementation. VReplication, in particular, has some key design differences:
It tracks the progress of both backfill (existing data) and ongoing changes (new incoming data) rather than just backfill.
It maps every single transaction (a single, complete operation on the database) to the database position at the time of that change using MySQL GTID (Global Transaction Identifier). This allows us to track these existing and incoming changes between the two tables with exact precision based on the time we started the new transaction.
It switches back and forth between copy state and ongoing change tracking based on these position markers.
It couples copy state and its progress transactionally. Likewise, it couples changelog events and their progress transactionally.
Unlike any other schema change solution, Vitess does not terminate upon migration completion. This point is important when it comes to reverting schema changes.
In summary, VReplication has a transactionally accurate journal of the state of migration. At any time, it knows exactly which rows have been copied and which changelog events have been processed.
This is unique in the world of online schema change tools, and this unique feature is what allows us to instantly revert schema changes without losing data that was written to the original schema.
How VReplication allows us to revert schema changes
Let’s revisit the online schema change flow in the context of VReplication and drive this process home with an example.
Suppose in your deploy request you issue the following statement:
ALTER TABLE users DROP COLUMN title;

Here’s what happens behind the scenes right after you deploy:
We first make a copy of the users table that’s in production without any of the data. We essentially only copy the schema, which still includes the title column. This is called a shadow table.

We apply the ALTER TABLE users DROP COLUMN title statement to the shadow table, dropping the title column.

We begin copying the data from the production table to the shadow table.

Data is continuously copied, including new data, with the goal to get these two tables in sync.
As you can imagine, syncing data between these two tables is a huge task, especially because data is still being added to the production table during this process. So, if one row is copied over to the shadow table, you may think the work is done. But what if a new write comes in, changing that user’s name a few seconds later? That update goes to the current production table, so the production table and shadow table are again out of sync.
This is where VReplication shines. VReplication solves this problem by alternating between copying batches of existing rows and applying batches of incoming changes. As mentioned earlier, VReplication tracks these existing and incoming changes between the two tables with exact precision based on the time we started the new transaction.
When we begin copying a set of rows, we run START TRANSACTION WITH CONSISTENT SNAPSHOT, which takes that snapshot and essentially freezes time while we copy the rows over. This is done using the GTID, Global Transaction ID, which captures the existing state down to the transaction level.
Once we’re done copying the first set of existing rows, we switch to copying incoming data. We only care about data that satisfies two conditions: it came in after the GTID point that we just used and it contains changes to the data we’ve already written to the shadow table.


There are most likely new INSERTs in the binlog, but we don’t need to copy those over right now because we’ll encounter them eventually when we’re back in copy mode.
Once we’re finished with the set of incoming changes from the binlog, we capture a new GTID and switch back to copying from the production database.

In the seconds it took to do that, more incoming traffic has arrived. We apply any changes to the shadow table and then continue copying data. This way, we know that we’ve consumed all events up until that GTID capture.

Once the data is copied over, we issue a hard stop on the process, known as the cut-over period.
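To make the alternation between copy mode and binlog catch-up more concrete, here is a minimal sketch of the idea as a toy in-memory model. It is not Vitess code: the createSource and migrate functions, the event shape, and the use of an array index as a stand-in for a GTID position are all assumptions made for illustration, and it only handles inserts and updates.

```javascript
// Toy model of alternating between copying rows and applying binlog events.
// Everything here (createSource, migrate, the event shape) is hypothetical.

function createSource(rows) {
  const table = new Map(rows.map((row) => [row.id, row]));
  const binlog = []; // ordered events; index + 1 stands in for a GTID position
  return {
    table,
    binlog,
    // Every insert or update lands in the table and is recorded in the binlog.
    write(row) {
      table.set(row.id, row);
      binlog.push({ pos: binlog.length + 1, id: row.id, row });
    },
    position() {
      return binlog.length;
    },
  };
}

// The schema change being applied: drop the title column.
function dropTitle({ title, ...rest }) {
  return rest;
}

function migrate(source, transform, batchSize, trafficPerRound = []) {
  const shadow = new Map();
  let lastCopiedId = -Infinity; // rows with id <= this are already in the shadow table
  let appliedPos = source.position(); // binlog position consumed so far
  let round = 0;
  let batch;

  do {
    // Copy phase: take the next batch of rows in primary key order.
    batch = [...source.table.values()]
      .filter((row) => row.id > lastCopiedId)
      .sort((a, b) => a.id - b.id)
      .slice(0, batchSize);
    for (const row of batch) {
      shadow.set(row.id, transform(row));
      lastCopiedId = row.id;
    }

    // Simulated application traffic that arrived while the copy was running.
    for (const row of trafficPerRound[round] || []) source.write(row);
    round += 1;

    // Catch-up phase: apply events newer than appliedPos, but only for rows
    // that were already copied. Other rows are picked up by a later copy phase.
    for (const event of source.binlog) {
      if (event.pos > appliedPos && event.id <= lastCopiedId) {
        shadow.set(event.id, transform(event.row));
      }
    }
    appliedPos = source.position();
  } while (batch.length > 0 || round < trafficPerRound.length);

  return shadow;
}

const source = createSource([
  { id: 1, name: 'Ann', title: 'Dr' },
  { id: 2, name: 'Bob', title: 'Mr' },
  { id: 3, name: 'Cat', title: 'Ms' },
  { id: 4, name: 'Dan', title: 'Mr' },
  { id: 5, name: 'Eve', title: 'Ms' },
]);

const shadow = migrate(source, dropTitle, 2, [
  // Arrives after the first batch: an update to a copied row and a new row.
  [{ id: 1, name: 'Annie', title: 'Dr' }, { id: 6, name: 'Savannah', title: 'Ms' }],
  // Arrives after the second batch: an update to a row that is not copied yet.
  [{ id: 5, name: 'Evelyn', title: 'Ms' }],
]);

console.log([...shadow.values()]);
```

In this model, the update to row 1 is applied from the binlog because that row was already copied, while the insert of row 6 and the update to row 5 are skipped during catch-up and picked up later by the copy phase.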

Cut-over period
The final and most critical step in the migration is the cut-over, where the original table is swapped away and replaced by the shadow table.
The cut-over is the single step where a write lock is explicitly imposed on the table. Until the swap is complete, no writes can take place. It’s the “freeze point”, where both tables are in perfect sync. However, since downtime is unacceptable during this process, writes will still be allowed from the application’s perspective. We’ll hold all new writes that come in and apply them once the swap has taken place.
The Vitess migration flow marks the database position at that freeze point. It then swaps the two tables: the shadow table replaces the original table, and the original table replaces the shadow table. Again, at this point, the tables are in sync.
The migration process completes and lets traffic access the new table. However, the story does not end here. We keep both the old table as well as the VReplication state. In fact, we use them right away.
Preparing for revert
Shortly after migration completion, PlanetScale prepares an open-ended revert. The revert process tracks ongoing changes to the table and applies them to a shadow table. That should sound familiar. Indeed, we already have a shadow table in place. It is already populated with data, and we know that it was in full sync with what we now call the new table at cut-over time.
So in the previous example, once the deployment is complete, you decide you need to revert it. Here’s what happens next:
You click “Revert changes”.
In the time between the tables being swapped and you clicking “Revert”, we already prepared for the revert process in the background. Remember how we swapped the production table with the shadow table? That old production table is now your shadow table.
The important part to recognize is that this shadow table is already complete with data and the previous schema that you want to revert to. So we don’t need to go through that same lengthy data copy process again to swap these tables! We only need to track new changes, which this shadow table has been doing in the background, whether or not you eventually click revert.
Since the swap, we’ve continued syncing the shadow table with the production table.

So once you click revert, all we need to do is swap them again! It goes through the exact same cut-over process, and the shadow table becomes the production table again and vice versa.

You now have your original schema, your users table with the title column, and your application should work again. With this process, you retained any new data that was added during that period, which would have been a huge hassle in traditional rollback and restore methods.
One more thing to note is that in step 10 of the diagram, Savannah doesn’t have a title. This is because that entry was added after the tables were swapped, so the title column didn’t exist in production. This is expected and something you can clean up after the revert, if necessary.
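The deploy, write, and revert sequence can also be sketched as a toy model. This is only an illustration of the idea, not how Vitess stores or swaps tables: the makeTable, project, write, and swapTables names and the in-memory table shape are all assumptions.

```javascript
// Toy model of cut-over and revert. Table shapes and function names are
// hypothetical; this is not how Vitess stores or swaps tables.

function makeTable(columns, rows = []) {
  return { columns, rows: new Map(rows.map((row) => [row.id, row])) };
}

// Keep only the columns a table has; columns missing from the input become null.
function project(row, columns) {
  return Object.fromEntries(columns.map((col) => [col, row[col] ?? null]));
}

const db = {
  // Original table, still with the title column.
  production: makeTable(['id', 'name', 'title'], [
    { id: 1, name: 'Ann', title: 'Dr' },
    { id: 2, name: 'Bob', title: 'Mr' },
  ]),
  // Shadow table with the schema change applied and the data in sync.
  shadow: makeTable(['id', 'name'], [
    { id: 1, name: 'Ann' },
    { id: 2, name: 'Bob' },
  ]),
};

// Writes go to production and are then replayed onto the shadow table.
// Columns that only exist on the shadow side keep their previous values.
function write(row) {
  const stored = project(row, db.production.columns);
  db.production.rows.set(row.id, stored);
  const previous = db.shadow.rows.get(row.id) || {};
  db.shadow.rows.set(row.id, project({ ...previous, ...stored }, db.shadow.columns));
}

// Cut-over and revert are the same operation: swap the two tables.
function swapTables() {
  [db.production, db.shadow] = [db.shadow, db.production];
}

swapTables(); // deploy: the table without title is now production
write({ id: 3, name: 'Savannah' }); // new row written while title does not exist
write({ id: 1, name: 'Annie' }); // update to a row that existed before the deploy
swapTables(); // revert: the table with title is production again

console.log([...db.production.rows.values()]);
```

After the revert in this sketch, the rows written during the deployment are still present. Savannah has a null title because that column did not exist when she was added, and Annie keeps her original title.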
We explained this process in terms of issuing an ALTER statement, but there are even more nuances we had to consider to revert CREATEs and DROPs. Stay tuned for a future blog post on those topics, and a sneak peek at another feature that this underlying process brings to the table.
Wrap up
Hopefully, this article has shed some light on how our schema revert feature works.
Our goal at PlanetScale is to continuously improve the developer experience by providing a scalable and easy-to-use database solution. Keep in mind that all of this is done in seconds, behind the scenes, with just a click of a button.
If you’d like to see this in action, you can enroll your database in the limited beta today from your database Settings page. For more information, see the “Revert a schema change” section of our Deploy requests documentation.]]></content>
        <summary><![CDATA[Learn how we used VReplication to allow for migration reverts with data retention.]]></summary>
      </entry>
    
      <entry>
        <title>How to Prevent SQL Injection Attacks in Node.js</title>
        <link href="https://planetscale.com/blog/how-to-prevent-sql-injection-attacks-in-node-js" />
        <id>https://planetscale.com/blog/how-to-prevent-sql-injection-attacks-in-node-js</id>
        <published>2022-03-03T17:27:00.000Z</published>
        <updated>2022-03-03T17:27:00.000Z</updated>
        
        <author>
          <name>James Q Quick</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Although the tooling around databases has come a long way, it is still your responsibility to protect them against attacks. In this article, you’ll learn to prevent SQL injection attacks in Node.js using the mysql2 npm package.
What is a SQL injection attack
A SQL injection attack happens when a user injects malicious bits of SQL into your database queries. Most commonly, this happens when allowing a user to pass input to a database query without validation which can alter the original intended query. By injecting their own SQL, the user can cause harm by:
reading sensitive data
modifying sensitive data
deleting sensitive data
As you can probably imagine, these types of attacks can have negative impacts on your applications and your business. In fact, you’ve probably heard of some major companies being involved in data breaches in the past couple of years. This can lead to loss of customers, revenue, application uptime, and more.
Examples of SQL injection attacks
You now have a general understanding of what SQL injection attacks are, but I think it would be good to see a few specific examples.
Let’s explore a developer-related scenario where, hypothetically, you build an application that stores code repositories. Just like GitHub, these user-created repositories can be either public or private. Furthermore, a user has the ability to search public repositories by tag. For simplicity, let’s assume that each repository only has one tag.
In your application logic, you would use the user’s search term to generate the SQL query. In your Node.js code, you might be tempted to use ES6 template literal strings to interpolate that value directly into your query string like so:
const query = `SELECT * FROM Repository WHERE TAG = '${userQuery}' AND public = 1`

For a search of "javascript", your SQL query string might look like this:
SELECT * FROM Repository WHERE TAG = 'javascript' AND public = 1;

In this case, you are attempting to select all public repositories that have “javascript” as their tag. Assuming reasonable user input, this works fine, but what if the user were to search for something like javascript';--? Now, things start to become dangerous.
The -- sequence starts a comment in SQL, which means everything after it is ignored. (Strictly speaking, MySQL only treats -- as a comment when it is followed by a space or control character, so a working attack string would include one.) So, the unvalidated query would look like this:
SELECT * FROM Repository WHERE TAG = 'javascript';--' AND public = 1;

Since the part after the "--" would be ignored, the query that gets executed looks more like this:
SELECT * FROM Repository WHERE TAG = 'javascript';

As you can see, this removed the additional clause in the query which previously prevented private repositories from being included. You can imagine this being a significant problem for intellectual property.
One other type of SQL injection attack to be aware of is one that can add a secondary statement to the query. Let’s stay with the same example, but say the user searches for javascript'; DROP TABLE Repository;--. Then, the query would become:
SELECT * FROM Repository WHERE TAG = 'javascript'; DROP TABLE Repository;--' AND public = 1;

In this example, the original query is terminated with the ; but then followed by a second query that would drop the entire Repository table. NO GOOD!
For notes on a few other examples of SQL injection attacks, check the W3Schools SQL Injection page.
Configuring the mysql2 client in Node.js
For a quick reference, let’s take a look at how to set up the mysql2 client in Node.js. You’ll first want to install the package:
npm install mysql2

For an in-depth tutorial on creating an API with Node.js, mysql2, and PlanetScale, check out Create a Harry Potter API with Node.js, Express, and MySQL
Once you have this package installed, you can initialize the client.
import mysql from 'mysql2/promise'
const connection = await mysql.createConnection(process.env.DATABASE_URL)

This sample code uses environment variables for the database connection string. You’ll need to also install the dotenv package for testing this locally. It also uses the promises-based version of the library so that you can use modern async/await syntax.
From there, you can make queries like so:
    const query = "SELECT * FROM Repository WHERE TAG = 'javascript' AND public = 1";
    const [rows] = await connection.query(query);

If you’re working with Express.js, you could then define an endpoint that accepts user input as userQuery, queries the database, and returns the repositories in JSON format.
import express from 'express';
import mysql from 'mysql2/promise';

const connection = await mysql.createConnection(process.env.DATABASE_URL);
const app = express();

app.get('/repositories/:userQuery', async (req, res) => {
  const { userQuery } = req.params;
  // Vulnerable as written: user input is interpolated directly into the SQL string.
  const query = `SELECT * FROM Repository WHERE TAG = '${userQuery}' AND public = 1`;
  const [rows] = await connection.query(query);
  res.json(rows);
});

app.listen(3001, () => {
  console.log('App is running');
});

Preventing SQL injection attacks
There are a few common ways to prevent SQL injection attacks:
Don’t allow multiple statements
Use placeholders instead of variable interpolation
Validate user input
Allowlist user input
Don’t allow multiple statements if you can avoid it
Conveniently, number 1 is handled by the mysql2 client (and many other database clients). It prevents multiple statements from being executed by default. So, even if the user submits an input that attempts to terminate a query and run a second one, the second one won’t run. This is the default configuration, but you can override that if you choose.
Although this configuration property is available, it is typically not recommended to allow multiple statements unless absolutely necessary.
const connection = await mysql.createConnection({
  uri: process.env.DATABASE_URL,
  multipleStatements: true
})

To emphasize the need for more levels of protection, refer to the example above where injecting a comment (e.g., javascript';--) into the SQL allowed the user to read from private repositories. Since that was done using only one statement, setting multipleStatements: false still wouldn’t be enough.
Use placeholders
Therefore, you should never accept raw input from a user and insert it directly into your query string. Instead, you should use placeholders (also known as parameterized queries), which look like this (notice the ? as the placeholder):
const query = 'SELECT * FROM Repository WHERE TAG = ? AND public = 1'
const [rows] = await connection.query(query, [userQuery])

By using placeholders, the malicious SQL will be escaped and treated as a raw string, not as actual SQL code. The resulting query would look like this:
SELECT * FROM Repository WHERE TAG = 'javascript\';--' AND public = 1;

Thanks to using placeholders, the malicious SQL is not run and instead, is treated as a search query as intended.
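To see why escaping neutralizes the injected quote, here is a simplified sketch of what placeholder substitution does conceptually. This is not the mysql2 implementation: escapeValue and formatQuery are hypothetical helpers written for illustration, and they only handle strings and finite numbers. In real code, keep passing values to the driver and let it do the escaping.

```javascript
// Simplified illustration of placeholder substitution. This is NOT the mysql2
// implementation; escapeValue and formatQuery are hypothetical helpers.
const ESCAPES = {
  '\0': '\\0',
  '\b': '\\b',
  '\t': '\\t',
  '\n': '\\n',
  '\r': '\\r',
  '\x1a': '\\Z',
  '"': '\\"',
  "'": "\\'",
  '\\': '\\\\',
};

// Numbers are emitted as-is; everything else is treated as a string,
// with special characters backslash-escaped and the result wrapped in quotes.
function escapeValue(value) {
  if (typeof value === 'number' && Number.isFinite(value)) return String(value);
  const escaped = String(value).replace(/[\0\b\t\n\r\x1a"'\\]/g, (ch) => ESCAPES[ch]);
  return `'${escaped}'`;
}

// Replace each ? with the next escaped value, in order.
function formatQuery(sql, values) {
  let index = 0;
  return sql.replace(/\?/g, () => escapeValue(values[index++]));
}

console.log(
  formatQuery('SELECT * FROM Repository WHERE TAG = ? AND public = 1', ["javascript';--"])
);
```

Because the quote in the user input is escaped, it can no longer close the string literal, so the rest of the input stays inside the value being compared against TAG.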
Input validation
In addition to using placeholders, you can add logic in your applications to prevent invalid user input. Let’s stick with the example of querying public repositories by tag. For demo purposes, you can assume that you should not have a tag that includes special characters or numbers. In other words, tags should only use capital and lowercase letters (A-Z, a-z).
This means you can add logic to your application to validate that user input matches the correct formatting (no numbers and no special characters). To do this, you can create a regex pattern to match the user input. If it doesn’t match, return an error.
  app.get('/repositories/:userQuery', async (req, res) => {

    const {userQuery} = req.params;
    const onlyLettersPattern = /^[A-Za-z]+$/;

    if(!userQuery.match(onlyLettersPattern)){
      return res.status(400).json({ err: "No special characters and no numbers, please!"})
    }

    ...
  });

Now the code doesn’t even get to the SQL part unless a valid input is passed. You can apply this method with any sort of validation that is relevant to your data. For example, if you allow the user to query by an id property which should be a number, you can throw an error if the input isn’t a valid number.
  app.get('/repositories/:id', async (req, res) => {
    const {id} = req.params;

    if(isNaN(Number(id))) {
      return res.status(400).json({ err: "Numbers only, please!"})
    }
...
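If you want to reuse these checks across routes, one option is to pull them into small helper functions. The sketch below is an assumption about how you might structure that: isValidTag and parseId are hypothetical names, not part of Express or mysql2. The id check is also a little stricter than isNaN(Number(id)), which accepts inputs such as "1e3", "0x10", and " 42 ".

```javascript
// Hypothetical helpers; the names are not part of Express or mysql2.
const ONLY_LETTERS = /^[A-Za-z]+$/;
const ONLY_DIGITS = /^[0-9]+$/;

// True only for non-empty strings made up of the letters A-Z and a-z.
function isValidTag(input) {
  return typeof input === 'string' && ONLY_LETTERS.test(input);
}

// Returns the id as a safe integer, or null if the input is not a plain
// non-negative integer written with digits only.
function parseId(input) {
  if (typeof input !== 'string' || !ONLY_DIGITS.test(input)) return null;
  const id = Number(input);
  return Number.isSafeInteger(id) ? id : null;
}

console.log(isValidTag('javascript'), isValidTag("javascript';--"));
console.log(parseId('42'), parseId('1e3'));
```

In a route handler, you could return a 400 response whenever isValidTag returns false or parseId returns null, and only build the query otherwise.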

Allowlisting
One last option you have is to use allowlisting, a specific type of input validation. Allowlisting is useful if you know every possible valid user input. From there, you can easily reject anything else.
For example, let’s say for your repository tags, there are only three valid tags: “javascript”, “html”, and “css”. If that’s the case, then you can check whether or not the user input is "allowlisted" by comparing it against known valid inputs.
  app.get('/repositories/:userQuery', async (req, res) => {

    const {userQuery} = req.params;
    const validTags = ["javascript", "html", "css"];

    if(!validTags.includes(userQuery)){
      return res.status(400).json({err: "Valid tags only, please!"});
    }

    ...
  });

Yes, this example is a bit simplified with just three valid tags, but this works at scale as well. A more realistic scenario might be that you store all known tags in their own table in your database. Then, to validate the user input, you can check against all the tag records in your database.
Wrap up
Hopefully, this helped give you a good overview of what SQL injection attacks are and how to prevent them. They can be detrimental to your application and business, so it’s important to plan ahead when accepting user input for your database queries to prevent any negative side effects.]]></content>
        <summary><![CDATA[Don’t let SQL injection attacks hurt your business.]]></summary>
      </entry>
    
      <entry>
        <title>Database schema design 101 for relational databases</title>
        <link href="https://planetscale.com/blog/schema-design-101-relational-databases" />
        <id>https://planetscale.com/blog/schema-design-101-relational-databases</id>
        <published>2022-03-02T02:14:00.000Z</published>
        <updated>2022-03-02T02:14:00.000Z</updated>
        
        <author>
          <name>Camila Ramos</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Getting started with a relational database can seem like a daunting task. Whether you’re coming from a NoSQL database or you’ve never used a database before, I’m going to talk you through designing a relational database and am hoping to answer the following questions:
What is a relational database?
How are relationships made in the database?
What are the steps to take to ensure an efficient database?
What is a relational database?
A relational database is one way to store data related to each other in a pre-defined way. By pre-defined, we mean that at the time of the creation of the database, you can identify the relationships that exist between different entities or groups of data. Relational databases are great for storing structured data that should model the relationship between real-life entities.
The anatomy of a relational database:
Tables: Data representing an entity organized into columns and rows.
Properties: Attributes that you want to store about an entity.
Relationships: The relationships between tables.
Indexes: Useful for connecting tables and making quick look-ups.
A relational database is made up of two or more tables with a variable number of rows and columns. Tables are unique to the entity they represent. Each column represents one specific property associated with each row in the table, and the rows are the actual records stored in that table. To illustrate the magic of a relational database, we’ll be designing a database for a retailer that wants to manage their products, customers, orders, and employees.
Design a database for a new retailer in town. This retailer really cares about customer relationships and wants to reward customers who meet a spending goal and send these top customers a gift on the one-year anniversary of their first purchase. This retailer needs a way to organize products by price and category to make smart recommendations to their customers based on their age. This retailer also wants to track the best-performing employees to reward those with the highest sales with a raise at the end of the year.
Designing the database schema
The schema is the structure that we define for our data. The schema defines the tables, relationships between tables, fields, and indexes.
The schema will also have a significant impact on the performance of our database. By dedicating time to the schema design, we will save ourselves a headache in the future. One tool that will help us design our schema is an ERD, entity-relationship diagram. We’ll use Lucidchart to build out our ERD, and you can sign up for free. This diagram will allow us to visualize our entities and their relationships.
Here are the major to-dos when designing our schema we will cover in this post:
Understand business needs
Identify entities
Identify properties/fields on those entities
Define relationships between tables
Step 1: Understand business needs
The first step in designing a relational database schema is to understand the needs of the business. This will help us determine what type of information we should be storing. For example, if we are working with a retailer that wants to offer an anniversary gift for clients on their first anniversary, we would have to store the date a customer joins.
A recap of the requirements for our customers:
Store customer spending to-date
Store customer anniversary date of first purchase
Store customer’s age
Store employee sales total in dollar amount
Store products and include a category and price property
Step 2: Define entities, aka tables

Once that is clear, the next step is defining the entities we want to store data about. These entities will also be our tables. Following the retailer example, our entities should be:
customers
products
orders
employees
This could extend to add more entities like stores if there are multiple storefront locations, manufacturers, etc., depending on the needs of the business. For this blog post, we’ll just be working with the four entities we defined above to meet the needs of our fictitious client. We can represent an entity in our ERD with a rectangle and the table/entity name at the top.
Step 3: Define properties, aka fields

Once we’ve identified our entities, we should define what fields we want to store about these entities. One important thing to keep in mind is that each table, or entity, should have one unique, identifying property. This unique value is known as the primary key, and this helps us differentiate records from each other. For example, if we have two customers with the same name or same birthdate, we would have to spend some time figuring out which customer is the one we intend to work with.
Two common ways to come up with a primary key:
Programmatically generate a unique value
Assign an integer that automatically increases with each new entry
All of these are straightforward and were taken directly from the specs that the business gave us. For example, the business wants to know which customer made the purchase, which employee made the sale, and which products were in the order. In the Orders table, you will notice that we reference a customerID, employeeID, and productID to meet those needs.
Step 4: Define relationships

Once we’ve defined our entities and their properties, we can think about how these tables relate to each other. The cornerstone of relational databases is that tables are often related. A parent table will have a unique primary key column, and a child table will have its own primary key and then a parent_id column that references the parent table. We have already inadvertently done this when we defined the properties in the preceding step. For example, the customers table has a customerID, which is the primary key. In the Orders table, we set an orderID as the primary key and reference the customerID to denote which customer made the order. Similarly, we also have a column referencing the Employees table, employeeID, to denote which employee made the sale.
When a primary key appears in another table, that field is called a foreign key in that table. The relationship between primary keys and foreign keys creates the relationship between tables.
You’ve done it
We’ve covered the main steps to take when designing your database schema: understand the business needs, define entities, define properties, and define relationships. Designing your database schema can be scary because with traditional relational databases, some schema changes can bring your whole application down and cause you to lose data. With PlanetScale’s branching feature, you can branch your schema like your code. Test your schema changes in an isolated environment, and once you are happy with your new schema, you can merge your changed branch into your main production branch without experiencing any downtime or data loss. Sign up for a PlanetScale account to get started.]]></content>
        <summary><![CDATA[This database schema design guide walks you through the basics of creating and designing schemas for relational databases.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 13</title>
        <link href="https://planetscale.com/blog/announcing-vitess-13" />
        <id>https://planetscale.com/blog/announcing-vitess-13</id>
        <published>2022-02-22T19:10:00.000Z</published>
        <updated>2022-02-22T19:10:00.000Z</updated>
        
        <author>
          <name>Florent Poinsard</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[The Vitess maintainers are pleased to announce the general availability of Vitess 13.
This article originally appeared on the Vitess blog.
Major themes
In this release, the Vitess maintainers have made significant progress in several areas, including query serving and cluster management.
Compatibility
This release comes with major compatibility improvements. We added support for a large number of character sets and improved our evaluation engine to perform more evaluations at the VTGate level. Gen4 planner is no longer experimental and we have used it to add support for a number of previously unsupported complex queries.
Cluster management
VTOrc is now more tightly integrated with other components in Vitess. It has been enhanced with errant GTID detection during emergency failovers. User-initiated emergency failovers are now more robust and should almost always succeed.
Multi-column Vindexes
While we have had some support for multi-column vindexes since Vitess 5.0, this release brings better support and performance improvements. We can now route to a subset of shards when a partial column list is provided in a WHERE clause instead of scattering to all shards.
Website docs
Previously, we had only one version of the documentation on the website. Docs for older releases were archived as PDFs, which made it difficult for users to find information relevant to the specific release they might be running. We now have versioned docs on the website for this release and the past two releases. Eventually, we will have documentation for all supported releases.
Please download Vitess 13 and try it out! Issues can be reported via GitHub.]]></content>
        <summary><![CDATA[Learn about the Vitess 13 release.]]></summary>
      </entry>
    
      <entry>
        <title>How we made PlanetScale’s background jobs self-healing</title>
        <link href="https://planetscale.com/blog/how-we-made-planetscale-background-jobs-self-healing-with-sidekiq" />
        <id>https://planetscale.com/blog/how-we-made-planetscale-background-jobs-self-healing-with-sidekiq</id>
        <published>2022-02-17T15:05:34.027Z</published>
        <updated>2022-02-17T15:05:34.027Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[When building PlanetScale, we knew that we would be using background jobs extensively for tasks like creating databases, branching, and deploying schema changes.
Early on, we decided we had two hard requirements for this system.
If we lose all data in the queues at any time, we can recover without any loss in functionality.
If a single job fails, it will be automatically re-run.
These requirements came about from our past experience with background job systems. We knew that it was "not if, but when" that we’d hit failure scenarios and need to recover from them.
Our background jobs perform critical actions for our users, such as rolling out schema changes to their production databases. This is an action that cannot fail, and if it does, we need to recover quickly.
Our stack
The PlanetScale UI (Next.js app) is backed by a Ruby on Rails API. All of our Rails background jobs are run on Sidekiq.
Much of what you’ll read here is Sidekiq specific (with code samples) but can be applied to almost any background queueing system.
Scheduler jobs running on a cron
For each job we have, we set up another job whose responsibility is to schedule it to run.
This is the core design decision that allows us to be self-healing. If for any reason we lost a job, or even the entire queue, the scheduling jobs will re-queue anything that was missed.

We’ve put this decision to the test a couple of times already this year. We were able to dump our queues entirely without impacting our user experience.
Storing state in the database
When using a background job system, a common pattern is to let jobs only be queued by a user action.
For example: In our application, a user can create a new database. This action enqueues a background job to do all the setup.
What if something goes wrong with that job? What if Redis failed and we lost everything in our queues? These were scenarios that worried us.
The solution is storing the state in our PlanetScale database. When creating a database for a user, we also create a record in our databases table immediately. This record starts with a state set to pending.
This allows us to have a scheduled job that runs once a minute and checks if any databases are in a pending state. If they are, that triggers the creation job to get enqueued again:
class ScheduleDatabaseJobs < BaseJob
  sidekiq_options queue: "background"

  def perform
    Database.pending.find_each do |database|
      DatabaseCreationJob.perform_async(database.id)
    end
  end
end

As long as the scheduler job is running, we could dump our entire queue and still recover.
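Since this pattern is not tied to Sidekiq, here is a small, language-agnostic toy model of it. Nothing below is Sidekiq or Rails code: the in-memory databases array, the queue, and the scheduleDatabaseJobs and runWorker functions are assumptions that stand in for the database table, Redis, the scheduler job, and the worker.

```javascript
// Toy model of the scheduler pattern; not Sidekiq code.
// The databases array stands in for the databases table, and queue for Redis.
const databases = [
  { id: 1, state: 'pending' },
  { id: 2, state: 'ready' },
  { id: 3, state: 'pending' },
];
let queue = [];

// Scheduler job: enqueue a creation job for every pending record.
function scheduleDatabaseJobs() {
  for (const db of databases) {
    if (db.state === 'pending') queue.push({ job: 'create', databaseId: db.id });
  }
}

// Worker: drain the queue. Jobs exit early when there is nothing left to do,
// so duplicates enqueued by repeated scheduler runs are harmless.
function runWorker() {
  while (queue.length > 0) {
    const { databaseId } = queue.shift();
    const db = databases.find((d) => d.id === databaseId);
    if (!db || db.state !== 'pending') continue;
    db.state = 'ready';
  }
}

scheduleDatabaseJobs();
queue = []; // simulate losing everything in the queue
scheduleDatabaseJobs(); // the next scheduler run re-enqueues the lost work
scheduleDatabaseJobs(); // a duplicate run before the worker gets to it
runWorker();

console.log(databases.map((d) => d.state));
```

The state lives in the records rather than in the queue, which is why the queue can be emptied at any point without losing work in this model.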
Disabling schedules with feature flags
We added the ability to stop our scheduled jobs from running at any time. This has been useful during incidents when we wanted control over a specific job type.
To do this, we added middleware that checks a feature flag for each job type. If the flag is enabled, we skip running the job:
# lib/sidekiq_middleware/scheduled_jobs_flipper.rb
module SidekiqMiddleware
  class ScheduledJobsFlipper
    def call(worker_class, job, queue, redis_pool)
      # return false/nil to stop the job from going to redis
      klass = worker_class.to_s
      if BaseJob::SCHEDULED_JOBS.key?(klass) && Flipper.enabled?("disable_#{klass.underscore.to_sym}")
        return false
      end

      yield
    end
  end
end
# initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.client_middleware do |chain|
    chain.add(SidekiqMiddleware::ScheduledJobsFlipper)
  end
end
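
The same gating idea can be sketched without Sidekiq or Flipper. In this plain-Ruby example, DISABLED_FLAGS, FlagGate, and push are hypothetical names standing in for the flag store, the middleware, and the code that pushes a job to the queue.

```ruby
# Hypothetical in-memory flag store standing in for Flipper.
DISABLED_FLAGS = { "disable_cleanup_job" => true }

# Hypothetical stand-in for the queue.
PUSHED = []

# Middleware with the same shape as above: skip the block when the flag is on.
class FlagGate
  def call(job_name)
    return false if DISABLED_FLAGS["disable_#{job_name}"]

    yield
  end
end

# Hypothetical pusher: the job only reaches the queue if the middleware yields.
def push(job_name, queue, middleware)
  middleware.call(job_name) { queue << job_name }
end

gate = FlagGate.new
push("cleanup_job", PUSHED, gate)  # flag is on, so nothing is enqueued
push("backup_job", PUSHED, gate)   # no flag set, so the job is enqueued
```

Returning early without yielding is what stops the job, so turning a flag on takes effect on the next push with no deploy.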

Bulk scheduling jobs
Scheduling jobs one-by-one works well when you only have a few thousand. Once our app grew, we noticed we were spending a lot of time sending individual Redis requests. Each request is very fast, but requests add up when run thousands of times.
To improve this, we started bulk scheduling jobs:
# Job is set to run every 5 minutes
class ScheduleBackupJobs < BaseJob
  sidekiq_options queue: "background"

  def perform
    # Schedule backup jobs in batches of 1,000
    BackupPolicy.needs_to_run.in_batches do |backup_policies|
      # perform_bulk expects an array of argument arrays, e.g. [[1], [2]]
      BackupJob.perform_bulk(backup_policies.pluck(:id).zip)
    end
  end
end
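
Sidekiq's perform_bulk takes an array of argument arrays, one inner array per job. The plain-Ruby sketch below shows that payload shape and how batching cuts the number of pushes. The id range and the constant names are made up for illustration.

```ruby
ids = (1..2500).to_a

# One inner array per job, matching the "array of argument arrays" shape.
BULK_ARGS = ids.zip                      # [[1], [2], [3], ...]

# One-by-one scheduling costs one push per job.
SINGLE_PUSHES = ids.size

# Batches of 1,000 cost one push per batch.
BATCH_PUSHES = BULK_ARGS.each_slice(1_000).count
```

With 2,500 jobs, that is 2,500 pushes one-by-one versus 3 pushes in batches of 1,000.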

Adding jitter to job scheduling
We don’t want certain types of jobs all running at once because they may overwhelm an external API. In those cases, we spread them out over a time period using perform_with_jitter. Below is a custom method we added to our job classes to handle this:
# This will run sometime in the next 30 minutes
CleanUpJob.perform_with_jitter(id, max_wait: 30.minutes)
# app/jobs/application_job.rb
MAX_WAIT = 1.minute
def self.perform_with_jitter(*args, **options)
  max_wait = options[:max_wait] || MAX_WAIT
  min_wait = options[:min_wait] || 0.seconds
  random_wait = rand(min_wait...max_wait)
  set(wait: random_wait).perform_later(*args)
end
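
To see what jitter does to a burst of jobs, here is a small plain-Ruby sketch that works in whole seconds and does not depend on Rails. The jitter_delay helper and the DELAYS constant are hypothetical and only mirror the idea of the method above.

```ruby
# Pick a random delay in seconds within [min_wait, max_wait).
def jitter_delay(max_wait:, min_wait: 0, rng: Random.new)
  rng.rand(min_wait...max_wait)
end

# Spread 1,000 hypothetical jobs across a 30-minute window (1,800 seconds).
DELAYS = Array.new(1_000) { jitter_delay(max_wait: 1_800) }
```

Each job gets its own delay inside the window, so the external API sees a steady trickle instead of 1,000 calls at once.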

Handling uniqueness
With the way we schedule jobs, it’s possible for us to get multiple copies of the same job in our queues. We handle this in a few ways.
1. Exit quickly
We store state in our database and quickly exit a job if it no longer needs to be run:
def perform(id)
  user = User.find(id)
  return unless user.pending?
  # ...
end
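
The early exit is what makes a duplicate job harmless. Here is a plain-Ruby sketch of that behavior, where Account, ACCOUNTS, and WORK_LOG are hypothetical stand-ins for a model, its table, and the real work.

```ruby
# Hypothetical record with a state column.
Account = Struct.new(:id, :state)

ACCOUNTS = { 7 => Account.new(7, "pending") }
WORK_LOG = []

# The job checks state first, so a duplicate run becomes a no-op.
def perform(id)
  account = ACCOUNTS.fetch(id)
  return unless account.state == "pending"

  WORK_LOG << id            # stands in for the real work
  account.state = "ready"   # record that the work is done
end

perform(7)
perform(7)  # duplicate job: exits early because state is no longer pending
```

The second call returns before doing anything, so the work is recorded once even though the job ran twice.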

2. Use database locks
We use database locks to avoid race conditions, such as when multiple jobs update the same data at once:
backup.with_lock do
  backup.restore_from_backup!
end

3. Use Sidekiq unique jobs
Sidekiq Enterprise includes the ability to have unique jobs. This will stop a duplicate job from being enqueued while the uniqueness lock is held:
class CheckDeploymentStatusJob < BaseJob
  sidekiq_options queue: "urgent", retry: 5, unique_for: 1.minute, unique_until: :start
  #...
end

Learn more
Sidekiq Background Jobs
Database locks in Rails]]></content>
        <summary><![CDATA[How to build self-healing background jobs into your application with background queueing systems like Sidekiq.]]></summary>
      </entry>
    
      <entry>
        <title>Build a Laravel application with a MySQL database</title>
        <link href="https://planetscale.com/blog/build-a-laravel-application-with-a-mysql-database" />
        <id>https://planetscale.com/blog/build-a-laravel-application-with-a-mysql-database</id>
        <published>2022-02-15T16:10:00.000Z</published>
        <updated>2022-02-15T16:10:00.000Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[In this tutorial, you’ll learn how to build a mood tracker application with Laravel 9, connect it to a PlanetScale MySQL database, make database schema changes, and deploy your database branch to production. You’ll use PlanetScale for the database, which gives you a fully managed MySQL-compatible database, unlimited scalability, a Git-like development workflow, zero downtime schema changes, and more.
This blog post is over a year old and may be out of date.

Here are some of the highlights of what you’ll create:
A production-ready MySQL database in PlanetScale
2 database branches: main for production and dev for development
2 tables, moods and entries, to store your daily entries and mood options
Database migrations and a seeder to run on your development database
Non-blocking schema changes deployed from development to production with PlanetScale branching and deploy requests
If you just want to get up and running with Laravel and PlanetScale, check out our Laravel quickstart instead. You can also find the final code for this application on GitHub if you'd prefer to just poke around.
Let’s get started!
Prerequisites
To run this tutorial, you will need the following:
Beginner Laravel knowledge — This tutorial uses Laravel v9.0.
PHP — This tutorial uses v8.1
Composer
npm
A PlanetScale account
Laravel installation
There are several ways to install a new Laravel 9 application.
This tutorial will use the Composer installation method, but feel free to choose whichever method you prefer.
To install with Composer, make sure you have PHP and Composer installed on your machine.composer create-project laravel/laravel laravel-mood-tracker

Once installed, enter into your folder, open it in your code editor, and start the application:cd laravel-mood-tracker
php artisan serve

You can view your application at http://localhost:8000/.
PlanetScale setup
Before you dive into the code, let’s get your database set up. Head over to PlanetScale to sign up for an account if you haven’t already.
Create a database
Next you’ll be prompted to create a new database. Give it a name, select the region closest to you or your application, and click "Create database".
While you're here, go to the "Settings" tab on your database, and click the checkbox next to "Automatically copy migration data", select "Laravel" from the dropdown, and click "Save database changes". This isn’t required, but it allows PlanetScale to track your Laravel migrations so that new database branches are automatically created with the latest migrations.
Working with branches
One of PlanetScale’s powerful features is the ability to create branches, similar to the Git model.
Your database comes with one default branch, main. This branch is designated as a development branch to start, with the goal to move it to production after you make any necessary schema changes.
To demonstrate how this works, let’s create a new branch, dev, for development, and then promote the main branch to production. The main branch is empty at this point, but once you're done with all your changes on the dev branch, you can roll them up to main.
To create a new branch:
Select your database from the main page
Click "New branch"
Name it dev or whatever you want
Click "Create branch"

Next, promote the main branch to production. A production branch with safe migrations enabled protects you from making direct schema changes (meaning less chance of mistakes!) and includes an additional replica to improve availability.
Promote main to production:
Go back to the database overview page and click the Branches tab
Click the main branch
Click the "Promote a branch to production" button
Make sure main is selected and click "Promote branch"
That’s it! You now have a live database with two branches hosted on PlanetScale.
Your database is currently empty, so let’s get it connected to your Laravel app so you can start writing data to it.
Connect your database
You can connect your development database to your Laravel application in one of two ways: using the PlanetScale proxy or with a username and password. This tutorial will show you how to connect with a username and password, but you can use either option.
Connect with username and password
Back in your PlanetScale dashboard, click on the "Branches" tab of your database and select dev.
Click the "Connect" button
Click "New password"
Select "Laravel" from the dropdown that’s currently set to "General"
Copy the full set of credentials. Make sure you copy it, as you won’t be able to see the password again when you leave the page. You can always generate a new one if you do forget to store it.
Configure your Laravel 9 app
Next, let’s connect your Laravel application. Open up your .env file in your code editor, find the database connection section, and paste in the credentials you copied from the PlanetScale dashboard.
It should look something like this:DB_CONNECTION=mysql
DB_HOST=xxxxxxxxxx.us-east-3.psdb.cloud
DB_PORT=3306
DB_DATABASE=laravel-mood-tracker
DB_USERNAME=xxxxxxxxxxx
DB_PASSWORD=pscale_pw_xxxxxx-xx-xxxxxxxxxxxxxxxxxxxxxxxx
MYSQL_ATTR_SSL_CA=/etc/ssl/cert.pem

The value for MYSQL_ATTR_SSL_CA will depend on your system. You can find more information on our Secure Connections page.
Finally, run your Laravel application with:php artisan serve

Your PlanetScale development branch is now connected to your Laravel application! Let’s add a schema and some data to see it in action.
Models and migrations and seeders, oh my!
For this mood tracker application, you're going to need two models: Mood and Entry.
In addition, each model will have a corresponding controller, migration, and/or seeder and factory file. Here’s an overview of what each file is used for and how they interact:
Models — To use Laravel's ORM, Eloquent, each table has a Model that lets Eloquent know how to interact with that table. It holds information about table relationships, what attributes can be modified, and more.
Controllers — Each model also has a corresponding controller. These controllers hold the logic for handling requests in your application. For example, to view all entries in your mood tracker, your application will use the specified controller class to figure out how to grab the data.
Migrations — Your migrations not only define the database schema that your application uses, but also act as version control. If someone new joins your team, all they have to do to get the current version of your schema (as well as all history) is run the existing migrations. Any time you need to modify the schema, you’ll create a migration to do so.
Seeders — Your seeder files allow you to run your development database with some initial data, which is beneficial for testing your application. You can create your seeders with specific data, auto-generated data, or a combination. Either way, it’s helpful to know that the entire team working on the application can easily use the same sample data.
Factories — The factory files work with your models and seeder files to define the seed data. You can’t just use any data when you seed your database. It must match the database schema requirements such as type, length, uniqueness, etc. With the factory, you can use these definitions from your Model class to control what data is created and how much.
You can actually create skeletons for all of these files with a single command! Run the following to create these files:php artisan make:model Mood -mcrs
php artisan make:model Entry -mfcr
php artisan make:controller HomeController

The moods table will be pretty static, so you don’t need a factory file for that, just for entries. Likewise, you won’t need a seeder file for the entries table, as you’ll call the factory straight from the main seeder file. You're also making a standalone HomeController file to generate the homepage. You’ll see a total of 9 new files structured as follows:
├── app
│   ├── Http
│   │   └── Controllers
│   │       ├── MoodController.php
│   │       ├── EntryController.php
│   │       └── HomeController.php
│   └── Models
│       ├── Mood.php
│       └── Entry.php
└── database
    ├── factories
    │   └── EntryFactory.php
    ├── migrations
    │   ├── xxxx_xx_xx_xxxxxx_create_moods_table.php
    │   └── xxxx_xx_xx_xxxxxx_create_entries_table.php
    └── seeders
        └── MoodSeeder.php

Let’s set up these files now so you can get a better sense of what the data will look like.
Create migrations
First, delete all of the existing migration files in database/migrations except the two files for moods and entries. There’s also a sneaky one hanging out in vendor/laravel/sanctum/database/migrations that you can delete as well.
Open up the migration file for the moods table under database/migrations/xxxx_xx_xx_xxxxxx_create_moods_table.php and replace it with the following:<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class CreateMoodsTable extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        Schema::create('moods', function (Blueprint $table) {
            $table->id();
            $table->string('name');
            $table->string('color');
        });
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        Schema::dropIfExists('moods');
    }
}

Once run, this will create the moods table with columns id, name, and color, as described below:
id (UNSIGNED BIGINT) — Auto-increments to identify the mood
name (VARCHAR) — Name of the mood
color (VARCHAR) — Color used to represent the mood
Next, open the migration file for entries at database/migrations/xxxx_xx_xx_xxxxxx_create_entries_table.php and replace it with:<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class CreateEntriesTable extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        Schema::create('entries', function (Blueprint $table) {
            $table->id();
            $table->date('date');
            $table->text('notes');
            $table->foreignId('mood_id');
        });
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        Schema::dropIfExists('entries');
    }
}

Here are the columns for the entries table:
id (UNSIGNED BIGINT) — Auto-increments to identify the entry
date (DATE) — The date of the entry
notes (TEXT) — Any notes that go along with the entry
mood_id (UNSIGNED BIGINT) — The corresponding mood for this entry
While normally you could add the constrained() method to the foreign key (mood_id) to enforce referential integrity, we've purposely left it out here. PlanetScale previously did not support foreign key constraints enforced at the database level because we believe they aren't worth the trade-offs in performance and scalability.
PlanetScale now supports foreign key constraints, so you can use the constrained() method if you would prefer to.
Run migrations
Now it’s time to run the migrations. In your terminal in the Laravel project directory, run the following:php artisan migrate

Since you're connected to your PlanetScale database, these migrations are now live on your dev branch (or whatever you configured in your .env file)!
To confirm this, go to your PlanetScale dashboard, click on your database, click "Branches", select the dev branch, click "Schema", and click "Refresh schema". You should see three tables: entries, migrations, and moods.

You can also view your tables in the PlanetScale MySQL console by clicking "Console" and running:SHOW tables;
DESCRIBE entries;
DESCRIBE moods;

Set up factories and seeders
Let’s add some data to your database. Open up database/seeders/MoodSeeder.php and replace it with the following:<?php

namespace Database\Seeders;

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class MoodSeeder extends Seeder
{
  /**
   * Run the database seeds.
   *
   * @return void
   */
  public function run()
  {
    DB::table('moods')->insert([
      'name' => 'Happy',
      'color' => '#FEC8DF',
    ]);

    DB::table('moods')->insert([
      'name' => 'Sad',
      'color' => '#75CFE0',
    ]);

    DB::table('moods')->insert([
      'name' => 'Angry',
      'color' => '#F5C691',
    ]);

    DB::table('moods')->insert([
      'name' => 'Productive',
      'color' => '#C5E8B4',
    ]);

    DB::table('moods')->insert([
      'name' => 'Normal',
      'color' => '#FFEFC9',
    ]);

    DB::table('moods')->insert([
      'name' => 'Calm',
      'color' => '#BBA1D5',
    ]);
  }
}

As mentioned before, the moods table will be pretty static for now, so you can just explicitly create the data in the seeder since there isn’t much to it.
For the entries seed data, you’ll want to generate several records with some random values instead of hard-coded data like in moods. This is where factories come into play. Open up database/factories/EntryFactory.php and replace it with:<?php

namespace Database\Factories;

use Illuminate\Database\Eloquent\Factories\Factory;
use App\Models\Mood;

class EntryFactory extends Factory
{
    /**
     * Define the model's default state.
     *
     * @return array
     */
    public function definition()
    {
        return [
            'notes' => $this->faker->realText($maxNbChars = 300),
            'mood_id' => Mood::inRandomOrder()->value('id'),
        ];
    }
}

The text for each entry is being generated using Faker PHP. A random id from the Mood model is assigned for mood_id. The date entry is a little more complicated because it needs to be unique and in a specific format. You can’t use the Faker library for this because it can only generate a unique DATETIME value, not DATE. You’ll create the random date values in the next step.
Finally, modify your main database/seeders/DatabaseSeeder.php file as follows:<?php

namespace Database\Seeders;

use Illuminate\Database\Seeder;
use App\Models\Entry;
use Illuminate\Support\Carbon;

class DatabaseSeeder extends Seeder
{
  /**
   * Seed the application's database.
   *
   * @return void
   */
  public function run()
  {
    $this->call(MoodSeeder::class);

    // create an array of random unique dates in the format Y-m-d
    $randomDates = [];
    while (count($randomDates) < 15) {
      $date = Carbon::today()->subDays(rand(0, 31))->format('Y-m-d');
      if (!in_array($date, $randomDates))
        array_push($randomDates, $date);
    }

    foreach($randomDates as $date) {
      Entry::factory()->create([
        'date' => $date
      ]);
    }
  }
}

This first runs the MoodSeeder.php file that you filled out earlier. Next, you're creating an array of 15 random, unique dates using the Carbon library. Finally, you loop through that array, call the database/factories/EntryFactory.php file that you created in the previous step, and add the random date to each entries record. The EntryFactory uses the create() method to create new database records based on the Entry model.
Set up models
The final step before seeding is to set up the models. First, open up app/Models/Mood.php and replace it with:<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Support\Facades\Cache;

class Mood extends Model
{
    public $timestamps = FALSE;
    protected $fillable = ['name', 'color'];

    // Get the entries for a specific mood.
    public function entries()
    {
        return $this->hasMany(Entry::class);
    }

    // Clear moods cache upon modifying a mood entry
    protected static function boot()
    {
        parent::boot();

        static::saving(function() {
            Cache::forget('moods');
        });
    }
}

Here, you're first specifying what attributes can be modified in fillable. Forgetting to set this is a common mistake that can be difficult to debug as a beginner, so any time you add a column that you may write to, make sure to update it here!
You're also setting timestamps to FALSE so that Eloquent doesn’t try to maintain created_at and updated_at columns, which the moods table doesn’t have.
Models also allow you to define Eloquent relationships. In the Mood model, you're defining the one-to-many relationship between entries and moods. Each mood can have several entries, but each entry will only have one mood that corresponds to it. This is reflected in the entries() function using the hasMany() method. Each Mood has many entries.
The boot() method is used to clear the cache upon saving a new mood. You’ll see where this comes into play when you update your controllers.
Next, open app/Models/Entry.php and replace it with the following:<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Factories\HasFactory;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Support\Facades\Cache;

class Entry extends Model
{
    use HasFactory;

    public $timestamps = FALSE;
    protected $fillable = ['date', 'notes', 'mood_id'];

    public function mood()
    {
        return $this->belongsTo(Mood::class);
    }

    // Clear entries cache upon modifying an entry
    protected static function boot()
    {
        parent::boot();

        static::saving(function() {
            Cache::forget('entries');
        });
    }
}

This is similar to the Mood model. You're also creating the inverse relationship using belongsTo(). Each Entry belongs to exactly one Mood.
Defining these relationships now will allow you to use Eloquent, Laravel's ORM, to easily work with your data.
Seed your database
Finally, it’s time to seed your database! In the terminal in your project folder, run the following:php artisan db:seed

This will run the main database/seeders/DatabaseSeeder.php file.
To view your seeded data and confirm that it worked, head back to your PlanetScale dashboard, select the database, click "Branches", select the dev branch, and click "Console". Run the following queries:SELECT * FROM moods;

You should see the data for the moods table you created.SELECT * FROM entries;

For the entries table, you have: 15 records with random, unique dates from roughly the past month, random text under notes, and a randomly selected mood_id that matches one of the moods in the moods table.
Now that your development branch is loaded up with some mock data, let’s set up the resource controllers so you can create and modify the data.
Add controllers
While the database is set up and ready to go, the application doesn’t actually do anything yet. Let’s fix that!
EntryController
First, open app/Http/Controllers/EntryController.php and replace it with:<?php

namespace App\Http\Controllers;

use App\Models\Entry;
use App\Models\Mood;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Cache;

class EntryController extends Controller
{
    /**
     * Display a listing of the resource.
     *
     * @return \Illuminate\Http\Response
     */
    public function index()
    {
        // Store all entries in cache for 1 hour (3600 seconds)
        $entries = Cache::remember('entries', 3600, function() {
            return Entry::orderBy('date', 'ASC')->get();
        });

        return view('entries.index')
            ->with('entries', $entries);
    }

    /**
     * Show the form for creating a new resource.
     *
     * @return \Illuminate\Http\Response
     */
    public function create()
    {
        $moods = Cache::remember('moods', 3600, function() {
            return Mood::all();
        });

        return view('entries.create')
            ->with('moods', $moods);
    }

    /**
     * Store a newly created resource in storage.
     *
     * @param  \Illuminate\Http\Request  $request
     * @return \Illuminate\Http\Response
     */
    public function store(Request $request)
    {

        // Date must be in format Y-m-d. Must also not already exist in entries table date column
        // Selected mood_id must exist in the moods table under column id
        $request->validate([
            'date' => 'required|date_format:Y-m-d|unique:entries,date',
            'notes' => 'string|nullable',
            'mood_id' => 'required|exists:moods,id',
        ]);

        Entry::create($request->all());

        return redirect()->route('entries.index')
            ->with('success', 'Entry created.');

    }

    /**
     * Display the specified resource.
     *
     * @param  \App\Models\Entry  $entry
     * @return \Illuminate\Http\Response
     */
    public function show(Entry $entry)
    {
        return view('entries.show')
        ->with('entry', $entry);
    }

    /**
     * Show the form for editing the specified resource.
     *
     * @param  \App\Models\Entry  $entry
     * @return \Illuminate\Http\Response
     */
    public function edit(Entry $entry)
    {
        $moods = Cache::remember('moods', 3600, function() {
            return Mood::all();
        });

        return view('entries.edit', compact('entry', 'moods'));
    }

    /**
     * Update the specified resource in storage.
     *
     * @param  \Illuminate\Http\Request  $request
     * @param  \App\Models\Entry  $entry
     * @return \Illuminate\Http\Response
     */
    public function update(Request $request, Entry $entry)
    {
        // Date must be in format Y-m-d. Must also not already exist in entries table date column
        // Selected mood_id must exist in the moods table under column id
        $request->validate([
            'date' => 'required|date_format:Y-m-d|unique:entries,date,'. $entry->id,
            'notes' => 'string|nullable',
            'mood_id' => 'required|exists:moods,id',
        ]);

        $updatedEntry = $request->all();
        $entry->update($updatedEntry);

        return redirect()->route('entries.show', [$entry->id])
            ->with('success', 'Entry updated.');
    }

    /**
     * Remove the specified resource from storage.
     *
     * @param  \App\Models\Entry  $entry
     * @return \Illuminate\Http\Response
     */
    public function destroy(Entry $entry)
    {
        $entry->delete();

        return redirect()->route('entries.index')
            ->with('success', 'Entry deleted.');
    }
}

The EntryController has the following methods:
index() — Display all entries
create() — Display the form to create a new entry
store() — Validate new entry input and save it to the database
show() — Display a single entry
edit() — Display the form to update an entry
update() — Validate updated entry input and update in the database
destroy() — Delete an entry
Let’s go over a few notable details of this controller that you’ll also see in the MoodController.
Caching
The index() method grabs ALL entries from the database. While this isn’t a huge number for this sample application, it could potentially turn into a huge performance and cost hit as the data grows.
There are a few ways to improve the performance, but one quick solution is to cache the data.$entries = Cache::remember('entries', 3600, function() {
    return Entry::orderBy('date', 'ASC')->get();
});

Laravel makes caching easy using the Cache::remember() method. This will check the cache to see if the data already exists there, and if not, it will pull from the database and store it in the cache for 3600 seconds as entries.
For a more in-depth primer on Laravel caching, check out Introduction to Laravel caching.
Views
With every method, you’ll see a return statement at the end that either returns a view or a redirect along with some data.return view('entries.index')
    ->with('entries', $entries);

In the above example, after the method executes, the user will be shown the entries index view found at resources/views/entries/index.blade.php (you’ll create this soon). The data for $entries will also be passed to the view.
Form validation
The store() and update() methods both require some kind of form validation before storing the entries to the database. You should never trust user input, so backend form validation is essential for, well, validating that the user's input is correct.$request->validate([
    'date' => 'required|date_format:Y-m-d|unique:entries,date,'. $entry->id,
    'notes' => 'string|nullable',
    'mood_id' => 'required|exists:moods,id',
]);

Laravel makes even the most complex form validation a breeze. The above code snippet is from the update() method.
Let’s examine the date validation. The first two, required and date_format, are pretty straightforward.
The next one, unique:table,column, is a little more complex. You don’t want repeated dates in this application, so you must check that the date is unique when validating. However, if you're updating an existing entry, your application will compare the user's updated input to the existing entry. If the user is only updating the text, then the date will be the same, so it will fail validation. To get around this, you can pass in the current id and it will check that all dates are unique except for the date on the specified entry.
HomeController
Next, set up the HomeController, which will be used to grab the data for the homepage. Open up app/Http/Controllers/HomeController.php and paste in the following:
<?php

namespace App\Http\Controllers;

use App\Models\Entry;
use App\Models\Mood;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Cache;

class HomeController extends Controller
{
    /**
     * Display a listing of the resource.
     *
     * @return \Illuminate\Http\Response
     */
    public function index()
    {
        $entries = Cache::remember('entries', 3600, function() {
            return Entry::orderBy('date', 'ASC')->get();
        });

        $moods = Cache::remember('moods', 3600, function() {
            return Mood::all();
        });

        return view('home', compact('entries', 'moods'));
    }
}

MoodController
Finally, open up app/Http/Controllers/MoodController.php and paste in the following:<?php

namespace App\Http\Controllers;

use App\Models\Mood;
use Illuminate\Http\Request;

class MoodController extends Controller
{
    /**
     * Display a listing of the resource.
     *
     * @return \Illuminate\Http\Response
     */
    public function index()
    {
        $moods = Mood::all();

        return view('moods.index')
            ->with('moods', $moods);
    }

    /**
     * Show the form for creating a new resource.
     *
     * @return \Illuminate\Http\Response
     */
    public function create()
    {
        return view('moods.create');
    }

    /**
     * Store a newly created resource in storage.
     *
     * @param  \Illuminate\Http\Request  $request
     * @return \Illuminate\Http\Response
     */
    public function store(Request $request)
    {
      $this->validate($request, [
        'name' => 'required|string',
        'color' => 'required|string',
      ]);

      Mood::create($request->all());

      return redirect()->route('moods.index')
        ->with('success', 'Mood created.');
    }

    /**
     * Display the specified resource.
     *
     * @param  \App\Models\Mood  $mood
     * @return \Illuminate\Http\Response
     */
    public function show(Mood $mood)
    {
        return view('moods.show')
            ->with('mood', $mood);
    }

    /**
     * Show the form for editing the specified resource.
     *
     * @param  \App\Models\Mood  $mood
     * @return \Illuminate\Http\Response
     */
    public function edit(Mood $mood)
    {
      return view('moods.edit')->with('mood', $mood);
    }

    /**
     * Update the specified resource in storage.
     *
     * @param  \Illuminate\Http\Request  $request
     * @param  \App\Models\Mood  $mood
     * @return \Illuminate\Http\Response
     */
    public function update(Request $request, Mood $mood)
    {
        $this->validate($request, [
            'name' => 'required|string',
            'color' => 'required|string',
        ]);

        $updatedMood = $request->all();
        $mood->update($updatedMood);

        return redirect()->route('moods.show', [$mood->id])
            ->with('success', 'Mood updated.');
    }

    /**
     * Remove the specified resource from storage.
     *
     * @param  \App\Models\Mood  $mood
     * @return \Illuminate\Http\Response
     */
    public function destroy(Mood $mood)
    {
        $mood->delete();

        return redirect()->route('moods.index')
            ->with('success', 'Mood deleted.');
    }
}

This is very similar to the EntryController, so for the sake of brevity, I won’t expand on any of the details.
Set up routes
Now that you have controllers created, let’s set up the routes. Open up routes/web.php and replace it with:<?php

use Illuminate\Support\Facades\Route;
use App\Http\Controllers\MoodController;
use App\Http\Controllers\EntryController;
use App\Http\Controllers\HomeController;

Route::get('/', [HomeController::class, 'index']);
Route::resource('moods', MoodController::class);
Route::resource('entries', EntryController::class);

When you made your Entry and Mood controllers earlier, you specified them as resource controllers by using the -r flag. This means the controllers are pre-built to handle all CRUD (create, read, update, delete) operations.
You can then use the Route::resource() method to generate all of the routes needed to create, read, update, and delete in just a single line.
A nifty way to see all of the routes this will create is by running:php artisan route:list

You should see something like this:+--------+-----------+----------------------+-----------------+---------------------------------------------+-------------+
| Domain | Method    | URI                  | Name            | Action                                      | Middleware  |
+--------+-----------+----------------------+-----------------+---------------------------------------------+-------------+
|        | GET|HEAD  | /                    |                 | App\Http\Controllers\HomeController@index   | web         |
|        | GET|HEAD  | entries              | entries.index   | App\Http\Controllers\EntryController@index  | web         |
|        | POST      | entries              | entries.store   | App\Http\Controllers\EntryController@store  | web         |
|        | GET|HEAD  | entries/create       | entries.create  | App\Http\Controllers\EntryController@create | web         |
|        | GET|HEAD  | entries/{entry}      | entries.show    | App\Http\Controllers\EntryController@show   | web         |
|        | PUT|PATCH | entries/{entry}      | entries.update  | App\Http\Controllers\EntryController@update | web         |
|        | DELETE    | entries/{entry}      | entries.destroy | App\Http\Controllers\EntryController@destroy| web         |
|        | GET|HEAD  | entries/{entry}/edit | entries.edit    | App\Http\Controllers\EntryController@edit   | web         |
|        | GET|HEAD  | moods                | moods.index     | App\Http\Controllers\MoodController@index   | web         |
|        | POST      | moods                | moods.store     | App\Http\Controllers\MoodController@store   | web         |
|        | GET|HEAD  | moods/create         | moods.create    | App\Http\Controllers\MoodController@create  | web         |
|        | GET|HEAD  | moods/{mood}         | moods.show      | App\Http\Controllers\MoodController@show    | web         |
|        | PUT|PATCH | moods/{mood}         | moods.update    | App\Http\Controllers\MoodController@update  | web         |
|        | DELETE    | moods/{mood}         | moods.destroy   | App\Http\Controllers\MoodController@destroy | web         |
|        | GET|HEAD  | moods/{mood}/edit    | moods.edit      | App\Http\Controllers\MoodController@edit    | web         |
+--------+-----------+----------------------+-----------------+---------------------------------------------+-------------+

At this point, you have a working database seeded with mock data and complete CRUD functionality. All you need to do now is create the application views so that your users can interact with the application.
Tailwind setup
The layouts for this app use Tailwind for styling, as well as some of the pre-built TailwindUI components, so you’ll need to add Tailwind as a dependency.
Here’s how you pull Tailwind into your app:
Run the following in the root of your Laravel project to install it:npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init

Open up webpack.mix.js and add tailwindcss as a PostCSS plugin:mix.js('resources/js/app.js', 'public/js').postCss('resources/css/app.css', 'public/css', [
  require('tailwindcss') // <---- new code
])

Open tailwind.config.js and update module.exports as follows:module.exports = {
  content: ['./resources/**/*.blade.php', './resources/**/*.js', './resources/**/*.vue'],
  theme: {
    extend: {}
  },
  plugins: []
}

Open resources/css/app.css and paste in:@tailwind base;
@tailwind components;
@tailwind utilities;

In a new terminal tab, start the build process with:npm run watch

Create views
I’m going to speed through this section as there are quite a few files and a lot of copying and pasting. You’ll begin to see your app come to life with every new view added, so feel free to run your application now and watch as the magic happens!php artisan serve

First, create the following folders and files in the resources/views directory:
home.blade.php
layout.blade.php
entries/index.blade.php
entries/create.blade.php
entries/edit.blade.php
entries/show.blade.php
moods/index.blade.php
moods/create.blade.php
moods/edit.blade.php
moods/show.blade.php
You can paste this in your terminal so you don’t have to make them manually:cd resources/views
mkdir entries && mkdir moods
touch home.blade.php
touch layout.blade.php
touch entries/index.blade.php
touch entries/create.blade.php
touch entries/edit.blade.php
touch entries/show.blade.php
touch moods/index.blade.php
touch moods/create.blade.php
touch moods/edit.blade.php
touch moods/show.blade.php

Reference the final GitHub repository if you want to confirm yours is laid out correctly.
layout.blade.php
Open up the resources/views/layout.blade.php file. This is going to be the main layout that will be reused across the rest of the views. Content from the other views will be injected where you see @yield('content').
Paste in the following:<!doctype html>
<html class="h-full bg-gray-100" lang="{{ str_replace('_', '-', app()->getLocale()) }}">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>@yield('title') - Mood Tracker</title>
    <link href="{{ asset('css/app.css') }}" rel="stylesheet" />
  </head>
  <body class="h-full">
    <div class="min-h-full">
      <nav class="bg-gray-800">
        <div class="mx-auto max-w-7xl px-4 sm:px-6 lg:px-8">
          <div class="flex h-16 items-center justify-between">
            <div class="flex items-center">
              <div class="flex-shrink-0">
                <a class="text-light text-2xl font-semibold" href="/">Year in Pixels</a>
              </div>
              <div class="hidden md:block">
                <div class="ml-10 flex items-baseline space-x-4">
                  <a
                    href="/moods"
                    class="hover:text-light rounded-md px-3 py-2 text-sm font-medium text-gray-300 hover:bg-gray-700"
                    >Moods</a
                  >

                  <a
                    href="/entries"
                    class="hover:text-light rounded-md px-3 py-2 text-sm font-medium text-gray-300 hover:bg-gray-700"
                    >Entries</a
                  >
                </div>
              </div>
            </div>
            <div class="absolute inset-y-0 right-0 flex items-center pr-2 sm:static sm:inset-auto sm:ml-6 sm:pr-0">
              <a
                href="/moods/create"
                class="bg-stone-500 hover:bg-stone-700 text-light mx-6 inline-flex justify-center rounded-md border border-transparent px-4 py-2 text-sm font-medium shadow-sm"
              >
                New mood
              </a>
              <a
                href="/entries/create"
                class="text-light inline-flex justify-center rounded-md border border-transparent bg-gray-600 px-4 py-2 text-sm font-medium shadow-sm hover:bg-gray-700"
              >
                New entry
              </a>
            </div>
          </div>
        </div>
      </nav>

      <main>
        <div class="container mx-auto px-8 py-16">@yield('content')</div>
      </main>
    </div>
  </body>
</html>

home.blade.php
Next, open up resources/views/home.blade.php and paste in the following:@section('title', 'Home')
@extends('layout')

@section('content')
  <div class="grid grid-cols-6">
    <div class="moods col-span-4 gap-2">
      <div class="grid grid-cols-2 lg:grid-cols-6 gap-2">
        @foreach($entries as $entry)
        <a href="/entries/{{$entry->id}}">
          <div class="w-16 h-16 justify-center items-center flex" style="background-color:{{ $entry->mood->color }}">
            <span class="text-black text-lg">
              {{ \Carbon\Carbon::parse($entry->date)->format('m/d') }}
            </span>
          </div>
        </a>
        @endforeach
      </div>
    </div>
    <div class="legend col-span-2 pl-0 md:pl-20">
      <span class="font-semibold">Legend</span>
      <ul>
      @foreach($moods as $mood)
        <li>
          <span class="w-3 h-3 inline-block" style="background-color:{{ $mood->color }}"></span>
          {{ $mood->name }}
        </li>
      @endforeach
      </ul>
    </div>
  </div>
@endsection

The HomeController renders this view as the homepage. If you look at that controller, you’ll see the moods and entries data are both retrieved from the database and passed to this view, which then loops through both to display them on the page.
entries/index.blade.php
This file, along with the next three files, will render the views for the following pages:
Landing page that shows all entries
Page to view a single entry
Page to edit an entry
Page to create a new entry
Open up entries/index.blade.php and paste in the following:@section('title', 'Entries')
@extends('layout')

@section('content')
<div class="flex flex-col">
  <div class="-my-2 overflow-x-auto sm:-mx-6 lg:-mx-8">
    <div class="py-2 align-middle inline-block min-w-full sm:px-6 lg:px-8">
      <div class="shadow overflow-hidden border-b border-gray-200 sm:rounded-lg">
        <table class="min-w-full divide-y divide-gray-200">
          <thead class="bg-gray-50">
            <tr>
              <th scope="col" class="px-6 py-3 text-left text-xs font-medium text-secondary uppercase tracking-wider">
                Date
              </th>
              <th scope="col" class="px-6 py-3 text-left text-xs font-medium text-secondary uppercase tracking-wider">
                Mood
              </th>
              <th scope="col" class="px-6 py-3 text-left text-xs font-medium text-secondary uppercase tracking-wider">
                Notes
              </th>
              <th scope="col" class="relative px-6 py-3">
                <span class="sr-only">Edit</span>
              </th>
              <th scope="col" class="relative px-6 py-3">
                <span class="sr-only">Delete</span>
              </th>
            </tr>
          </thead>
          <tbody class="bg-white divide-y divide-gray-200">
          @foreach($entries as $entry)
            <tr>
              <td class="px-6 py-4 whitespace-nowrap">
                <div class="flex items-center">
                  <div class="flex-shrink-0 h-10 w-10" style="background-color:{{ $entry->mood->color }}">
                  </div>
                  <div class="ml-4">
                    <div class="text-sm font-medium text-secondary">
                    {{ \Carbon\Carbon::parse($entry->date)->format('M d, Y') }}
                    </div>
                  </div>
                </div>
              </td>
              <td class="px-4 py-4 whitespace-nowrap">
                <div class="text-sm text-dark">
                {{ $entry->mood->name }}
                </div>
              </td>
              <td class="px-4 py-4 whitespace-normal">
                <p class="text-xs text-secondary">
                    {{ $entry->notes }}
                </p>
              </td>
              <td class="px-4 py-4 text-right text-sm font-medium">
                <a href="/entries/{{ $entry->id }}/edit" class="text-indigo-600 hover:text-indigo-900">Edit</a>
              </td>
              <td class="px-4 py-4 text-right text-sm font-medium">
                <form action="{{ route('entries.destroy', $entry->id) }}" method="POST">
                @csrf
                @method('DELETE')
                  <button type="submit" class="text-red-600 hover:text-red-900">Delete</button>
              </form>
              </td>
            </tr>
            @endforeach
          </tbody>
        </table>
      </div>
    </div>
  </div>
</div>
@endsection

This loops through all entries and displays them. It also includes the button to get to the individual edit page and a button for deletion.
entries/edit.blade.php
Open up entries/edit.blade.php and paste in:@section('title', 'Edit entry')
@extends('layout')

@section('content')

<div class="mt-10 sm:mt-0">
  <div class="md:grid md:grid-cols-3 md:gap-2">
    <div class="md:col-span-1">
      <div class="px-4 sm:px-0">
        <h3 class="text-lg font-medium text-dark">Edit your mood entry</h3>
        <p class="mt-1 text-sm text-gray-600">
          How are you feeling?
        </p>
      </div>
    </div>
    <div class="mt-5 md:mt-0 md:col-span-2">
      <form action="{{ route('entries.update', $entry->id) }}" method="POST">
      @csrf
      @method('PATCH')
        <div class="shadow overflow-hidden sm:rounded-md">
          <div class="px-4 py-5 bg-white sm:p-6">
            <div class="grid grid-cols-6 gap-6">
              <div class="col-span-6 sm:col-span-3">
                <label for="date" class="block text-sm font-medium text-gray-700">Date</label>

                <input type="text" name="date" id="date" value="{{ $entry->date }}" class="mt-1 focus:ring-indigo-500 focus:border-indigo-500 block w-full shadow-sm sm:text-sm border-gray-300 rounded-md py-2 px-3">
                <span class="text-gray-400 text-xs">Date must be in format YYYY-MM-DD</span>
              </div>

              <div class="col-span-6 sm:col-span-3">
                <label for="mood_id" class="block text-sm font-medium text-gray-700">Mood</label>
                <select id="mood_id" name="mood_id" class="mt-1 block w-full py-2 px-3 border border-gray-300 bg-white rounded-md shadow-sm focus:outline-none focus:ring-indigo-500 focus:border-indigo-500 sm:text-sm">
                    @foreach ($moods as $mood)
                      <option
                        value="{{ $mood->id }}"
                        {{ ( $mood->name == $entry->mood->name) ? 'selected' : '' }} >
                        {{ $mood->name }}
                      </option>
                    @endforeach
                </select>
              </div>
              <div class="col-span-6">
                <label for="notes" class="block text-sm font-medium text-gray-700">Notes</label>
                <textarea name="notes" id="notes" rows="6" class="mt-1 p-3 focus:ring-indigo-500 focus:border-indigo-500 block w-full shadow-sm sm:text-sm border-gray-300 rounded-md">{{ $entry->notes }}</textarea>
              </div>
            </div>
            @if ($errors->any())
            <div class="bg-red-100 border border-red-400 mt-8 text-red-700 px-4 py-3 rounded relative" role="alert">
              <strong class="font-semibold">Please fix the following issues with your input:</strong>
                <ul>
                  @foreach ($errors->all() as $error)
                    <li>{{ $error }}</li>
                  @endforeach
                </ul>
            </div>
            @endif
          </div>

          <div class="px-4 py-3 bg-gray-50 text-right sm:px-6">
            <button type="submit" class="inline-flex justify-center py-2 px-4 border border-transparent shadow-sm text-sm font-medium rounded-md text-light bg-indigo-600 hover:bg-indigo-700 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-indigo-500">
              Update
            </button>
          </div>
        </div>
      </form>
    </div>
  </div>
</div>
@error('title')
    <div class="alert alert-danger">{{ $message }}</div>
@enderror
@endsection

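HTML forms only support GET and POST, but the resource controller’s update route expects PUT or PATCH. The form therefore submits via POST, and @method('PATCH') adds a hidden _method field that tells Laravel to treat the request as a PATCH.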
You're also validating this input on the backend in EntryController.php, so if the input is invalid, the error messages are displayed here with {{ $error }}.
Another thing to note is this view needs to show the existing entry that’s being updated, so you're setting the value for each input with the entry data that’s passed in from the database.
entries/create.blade.php
Next, update the view used to create a new entry. Open up entries/create.blade.php and paste in the following:@section('title', 'New entry')
@extends('layout')

@section('content')

<div class="mt-10 sm:mt-0">
  <div class="md:grid md:grid-cols-3 md:gap-6">
    <div class="md:col-span-1">
      <div class="px-4 sm:px-0">
        <h3 class="text-lg font-medium text-dark">Create a new entry</h3>
        <p class="mt-1 text-sm text-gray-600">
          How are you feeling today?
        </p>
      </div>
    </div>
    <div class="mt-5 md:mt-0 md:col-span-2">
      <form action="{{ route('entries.store') }}" method="POST">
      @csrf
        <div class="shadow overflow-hidden sm:rounded-md">
          <div class="px-4 py-5 bg-white sm:p-6">
            <div class="grid grid-cols-6 gap-6">
              <div class="col-span-6 sm:col-span-3">
                <label for="date" class="block text-sm font-medium text-gray-700">Date</label>
                <input type="text" name="date" id="date" placeholder="2021-12-05" class="mt-1 focus:ring-indigo-500 focus:border-indigo-500 block w-full shadow-sm sm:text-sm border-gray-300 rounded-md py-2 px-3">
              </div>

              <div class="col-span-6 sm:col-span-3">
                <label for="mood_id" class="block text-sm font-medium text-gray-700">Mood</label>
                <select id="mood_id" name="mood_id" class="mt-1 block w-full py-2 px-3 border border-gray-300 bg-white rounded-md shadow-sm focus:outline-none focus:ring-indigo-500 focus:border-indigo-500 sm:text-sm">
                    @foreach ($moods as $mood)
                      <option value="{{ $mood->id }}">{{ $mood->name }}</option>
                    @endforeach
                </select>
              </div>
              <div class="col-span-6">
                <label for="notes" class="block text-sm font-medium text-gray-700">Notes</label>
                <textarea name="notes" id="notes" class="mt-1 px-3 py-3 focus:ring-indigo-500 focus:border-indigo-500 block w-full shadow-sm sm:text-sm border-gray-300 rounded-md"></textarea>
              </div>
            </div>
            @if ($errors->any())
            <div class="bg-red-100 border border-red-400 mt-8 text-red-700 px-4 py-3 rounded relative" role="alert">
              <strong class="font-semibold">Please fix the following issues with your input:</strong>
                <ul>
                  @foreach ($errors->all() as $error)
                    <li>{{ $error }}</li>
                  @endforeach
                </ul>
            </div>
            @endif
          </div>
          <div class="px-4 py-3 bg-gray-50 text-right sm:px-6">
            <button type="submit" class="inline-flex justify-center py-2 px-4 border border-transparent shadow-sm text-sm font-medium rounded-md text-light bg-indigo-600 hover:bg-indigo-700 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-indigo-500">
              Create
            </button>
          </div>
        </div>
      </form>
    </div>
  </div>
</div>
@endsection

This is similar to the edit view, but without any existing data being pulled in. You can even get crafty and consolidate the two views, but I personally prefer to keep them separate so that it’s easier to read.
entries/show.blade.php
Finally, create the view that’s used to display a single entry. Open up entries/show.blade.php and paste in:@section('title', 'Entry')
@extends('layout')

@section('content')

<div class="bg-white shadow overflow-hidden sm:rounded-lg">
  <div class="px-4 py-5 sm:px-6">
    <h3 class="text-lg font-medium text-dark">
      Entry
    </h3>
    <a href="/entries/{{ $entry->id }}/edit" class="text-gray-400 text-xs text-right">Edit</a>
  </div>
  <div class="border-t border-gray-200">
    <dl>
      <div class="bg-gray-50 px-4 py-5 sm:grid sm:grid-cols-3 sm:gap-4 sm:px-6">
        <dt class="text-sm font-medium text-secondary">
          Date
        </dt>
        <dd class="mt-1 text-sm text-dark sm:mt-0 sm:col-span-2">
            {{ \Carbon\Carbon::parse($entry->date)->format('M d, Y') }}
        </dd>
      </div>
      <div class="bg-white px-4 py-5 sm:grid sm:grid-cols-3 sm:gap-4 sm:px-6">
        <dt class="text-sm font-medium text-secondary">
            Mood
        </dt>
        <dd class="mt-1 text-sm text-dark sm:mt-0 sm:col-span-2">
            {{ $entry->mood->name }}
        </dd>
      </div>
      <div class="bg-gray-50 px-4 py-5 sm:grid sm:grid-cols-3 sm:gap-4 sm:px-6">
        <dt class="text-sm font-medium text-secondary">
            Notes
        </dt>
        <dd class="mt-1 text-sm text-dark sm:mt-0 sm:col-span-2">
            {{ $entry->notes }}
        </dd>
      </div>
    </dl>
  </div>
</div>
@endsection

Mood views
The views for creating, updating, and displaying the moods are almost identical to those for entries, so you can copy them in straight from the final repo. You can find the code for them in this section on the GitHub repository.
Add, update, and delete data
Your application is now complete and ready to play with! Let’s test it out.
Make sure you still have everything running:
Start the PHP server:php artisan serve

Run the build process for Tailwind:npm run watch

Navigate to http://localhost:8000 to view your app.
Add an entry
Click on the "New entry" button at the top right and fill out the form. Try to choose a date that’s already been taken or leave the required mood field blank and you’ll get a validation error, as expected. Once you submit a valid entry, you’ll be taken back to the entries index page where you’ll see the entry listed.
You can also click the "Edit" button to modify an entry, or the "Delete" button to get rid of one. This sample app doesn’t have a delete confirmation built in, so don’t click unless you're sure you want to delete it!
Deploy development database branch to production
PlanetScale offers branching capabilities, similar to the Git model. When you started working on this application, you created a development branch off of the main production branch. You’ve spent this whole time working in that development database branch, making schema changes as needed. But that production database branch is still empty.
So the next step is to merge this development branch into production. You do this by opening a PlanetScale deploy request (similar to a GitHub pull request). You can view your schema diff here and PlanetScale will check to make sure there are no merge conflicts.
Once everything is good, you can deploy the changes straight to production with zero downtime. It’s really that simple!
Let’s create a deploy request and merge this dev branch into production.
Create a deploy request
In your PlanetScale dashboard, select the database, click "Branches", and select the dev branch. On the Overview page, you’ll see the Deploy Request form. Make sure "Deploy to" is set to main. Write a comment to go with your deploy request, and then click "Create deploy request".

Once created, you’ll see a schema diff under "Schema changes" that shows you exactly what changes this deploy request will introduce if merged. This allows you and/or your team to carefully review schema changes before pushing them to production.
Deploy schema changes to production
Once the changes are approved, it’s time to merge the deploy request.
Click "Add changes to the deploy queue". As the changes are deploying, you’ll see the deployment progress for each table. One really cool feature to note here is that these schema changes are being updated with zero downtime or locking. The branching feature allows PlanetScale to offer non-blocking schema changes, so your production application will continue to work seamlessly as these changes are deployed. PlanetScale is handling it all in the background.
You can now go to your main branch, click "Schema", and you’ll see the schema you just created in development now live in production!
Recap
If you’ve reached the end, congratulations! You should have a working mood tracker application complete with all CRUD functionality and a production MySQL database.
Throughout the tutorial, you learned how to:
Create Laravel 9 controllers, models, migrations, factories, and seeders
Create Laravel forms
Validate input from Laravel forms
Work with Laravel Eloquent ORM
Connect your Laravel application to a MySQL database
Create PlanetScale deploy requests
Please let me know if you have any questions! You can find me on Twitter at @hollylawly. Thanks for reading!]]></content>
        <summary><![CDATA[Learn how to build a Laravel CRUD application, connect it to a MySQL database, and seed it with data.]]></summary>
      </entry>
    
      <entry>
        <title>How to seed a database with Prisma and Next.js</title>
        <link href="https://planetscale.com/blog/how-to-seed-a-database-with-prisma-and-next-js" />
        <id>https://planetscale.com/blog/how-to-seed-a-database-with-prisma-and-next-js</id>
        <published>2022-02-11T17:11:00.000Z</published>
        <updated>2022-02-11T17:11:00.000Z</updated>
        
        <author>
          <name>James Q Quick</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Next.js and Prisma are a popular combination for creating modern fullstack web applications. Next.js enables all of the power of React while adding support for Server Side Rendering (SSR), API routes, and more. Prisma is an ORM for JavaScript and TypeScript that allows developers to interact with SQL databases without having to write raw SQL statements. In this tutorial, we’ll see how to seed a PlanetScale database using Prisma in a Next.js project.
Check out the Prisma quickstart for more information on setting up Next.js and Prisma.
What is database seeding
Database seeding involves populating your database with an initial set of clean data. This is extremely useful in a couple of different use cases:
Initial project setup
Let’s say you join a new project. You start by cloning the source code. Then, you may need to create a new database to work with, which means no data to start with. This makes exploring the app tedious as you have to manually create users, new records, etc. Well, you can automate that by seeding your database.
Automated testing
Another useful scenario is automated testing. Each time you run a new round of tests (this may be triggered manually or as part of your CI/CD workflow), you can seed your database with a controlled data set. This way, you can be sure that each time your tests run, they are being run against the exact same set of data.
Branching in PlanetScale
PlanetScale comes with the unique feature of database branching (similar to Git branches), which allows you to apply your schema to a new database instance. After you create a new branch, you can use a seed script to populate your new database branch with starter data.
To learn more, refer to the official branching documentation.
Setup
To get started, clone our Next.js starter repository.git clone https://github.com/planetscale/nextjs-starter

This project is already configured with Prisma (we’ll look at the data models in a second). To work with this project and see the seeding take place, you’ll need to create a new database in PlanetScale and a new connection string.
The database
Create a PlanetScale database in the dashboard or by using the CLI. Then, create a connection string for your database by following the connection strings documentation.
Choose Prisma in the dropdown while creating your password to automatically generate the correct format needed for working with Prisma.
Copy the .env.example file as .env:cp .env.example .env

Then, update the DATABASE_URL property with the following format.mysql://<USERNAME>:<PLAIN_TEXT_PASSWORD>@<ACCESS_HOST_URL>/<DATABASE_NAME>?sslaccept=strict

In this starter code, we have two different models configured, Product and Category in the schema.prisma file.model Product {
    id          Int       @id @default(autoincrement())
    name        String
    description String
    price       Decimal
    image       String
    category    Category? @relation(fields: [category_id], references: [id])
    category_id Int
}

model Category {
    id          Int       @id @default(autoincrement())
    name        String
    description String
    products    Product[]
}

Before you can run a seed script, you’ll need to push this schema to your database.npx prisma db push

Create the seed script
The prisma directory is a convenient place to include a seed script since this is where the schema.prisma file referenced above is located. Inside of this directory of the starter code, you’ll see a seed.js file. Notice also the data.js file which exports sample data that you will use when the seed script is run.
Although the seed script is finished in the starter repository, let’s break down the steps of how you would create it yourself from scratch. First, you’ll need to import the PrismaClient and the categories and products data. Then, you’ll need to generate a new client by calling PrismaClient().const { PrismaClient } = require('@prisma/client')
const { categories, products } = require('./data.js')
const prisma = new PrismaClient()

You will need to use the CommonJS syntax for imports and exports in your JavaScript files. This is different from the ECMAScript modules syntax you're used to using inside of a Next.js project. This is because this file is being run on its own, outside of the running Next.js application.
After you’ve got your imports, create a load() function. This is where the actual database seeding will take place. Make sure to mark the function as async since you will use the await keyword inside of it. Also, don’t forget about error handling. Go ahead and add a try/catch/finally block inside of your function to handle errors and disconnect from your database after the seeding has completed.const load = async () => {
  try {
  } catch (e) {
    console.error(e)
    process.exit(1)
  } finally {
    await prisma.$disconnect()
  }
}
load()
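
One caveat with the skeleton above: process.exit() ends the Node process immediately, so when the catch block runs, the finally block (and the disconnect) is skipped. The sketch below shows one possible variation that sets process.exitCode instead. It uses a fake client and a hypothetical seedStep parameter so it can run without a database; neither is part of the starter repository.

```javascript
// Stand-in for PrismaClient so the sketch runs without a database (hypothetical).
const fakePrisma = {
  disconnected: false,
  async $disconnect() {
    this.disconnected = true
  }
}

// seedStep is a hypothetical placeholder for the deleteMany/createMany calls.
const load = async (seedStep) => {
  try {
    await seedStep()
  } catch (e) {
    console.error(e.message)
    // Mark the run as failed, but let the finally block execute.
    process.exitCode = 1
  } finally {
    await fakePrisma.$disconnect()
  }
}

// Usage: load(async () => { /* seeding calls go here */ })
```

With this shape, a failed seed still reports a non-zero exit code once the process finishes, and the client is disconnected on both the success and failure paths.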

With the load() function set up, you can start to add data to your database by passing the categories and products arrays to the appropriate createMany() function.await prisma.category.createMany({
  data: categories
})
console.log('Added category data')

await prisma.product.createMany({
  data: products
})
console.log('Added product data')

Your script should now be set up to add data, but one thing you’ll want to do first is delete any existing data. This way, you can verify that your database will be populated in exactly the same way each time it is seeded. Before the lines you just added for creating data, call deleteMany() for both tables.await prisma.category.deleteMany()
console.log('Deleted records in category table')

await prisma.product.deleteMany()
console.log('Deleted records in product table')

Lastly, the dummy data maintains a relationship between an individual product and its corresponding category through the category_id property, which is prepopulated in the product records. However, since the id properties of products and categories are auto-incremented, and deleting rows does not reset the counter, you’ll need to manually reset it to 1. This ensures that each category_id corresponds to the appropriate category record.
You can reset the auto-incremented values by calling the prisma.$queryRaw function and passing the appropriate SQL statement like so.await prisma.$queryRaw`ALTER TABLE Product AUTO_INCREMENT = 1`
console.log('reset product auto increment to 1')

await prisma.$queryRaw`ALTER TABLE Category AUTO_INCREMENT = 1`
console.log('reset category auto increment to 1')
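
To see why the reset matters, here is a small in-memory sketch of an auto-incrementing table. It is not Prisma or MySQL; createTable and its methods are hypothetical stand-ins that only mimic how the id counter behaves across deletes.

```javascript
// In-memory stand-in for a table with an auto-increment id (not Prisma or MySQL).
function createTable() {
  let nextId = 1
  let rows = []
  return {
    createMany(data) {
      for (const record of data) rows.push({ id: nextId++, ...record })
    },
    // Like a SQL DELETE: rows are removed, the counter is kept.
    deleteMany() {
      rows = []
    },
    // Plays the role of ALTER TABLE ... AUTO_INCREMENT = 1.
    resetAutoIncrement() {
      nextId = 1
    },
    ids() {
      return rows.map((row) => row.id)
    }
  }
}

const category = createTable()
const seedData = [{ name: 'Hats' }, { name: 'Socks' }]

category.createMany(seedData) // first seed: ids 1 and 2
category.deleteMany()
category.createMany(seedData) // reseed without a reset
console.log(category.ids()) // [ 3, 4 ], so category_id 1 and 2 match nothing

category.deleteMany()
category.resetAutoIncrement()
category.createMany(seedData) // reseed after a reset
console.log(category.ids()) // [ 1, 2 ]
```

Without the reset, the second seed produces categories whose ids no longer line up with the category_id values hardcoded in the product data.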

Here’s what the full file looks like.const { PrismaClient } = require('@prisma/client')
const { categories, products } = require('./data.js')
const prisma = new PrismaClient()

const load = async () => {
  try {
    await prisma.category.deleteMany()
    console.log('Deleted records in category table')

    await prisma.product.deleteMany()
    console.log('Deleted records in product table')

    await prisma.$queryRaw`ALTER TABLE Product AUTO_INCREMENT = 1`
    console.log('reset product auto increment to 1')

    await prisma.$queryRaw`ALTER TABLE Category AUTO_INCREMENT = 1`
    console.log('reset category auto increment to 1')

    await prisma.category.createMany({
      data: categories
    })
    console.log('Added category data')

    await prisma.product.createMany({
      data: products
    })
    console.log('Added product data')
  } catch (e) {
    console.error(e)
    process.exit(1)
  } finally {
    await prisma.$disconnect()
  }
}

load()
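
One caveat with this pattern: process.exit(1) ends the process immediately, so when an error is caught, the finally block doesn’t get a chance to disconnect. Below is a minimal sketch of one alternative, assuming you are free to restructure the script. The runSeed and makeStubClient names are hypothetical, and the stub client exists only so the flow can be exercised without a database.

```javascript
// Hypothetical refactor: the client is injected so a stub can stand in for Prisma.
async function runSeed(client, { categories, products }) {
  try {
    await client.category.deleteMany()
    await client.product.deleteMany()
    await client.$queryRaw`ALTER TABLE Product AUTO_INCREMENT = 1`
    await client.$queryRaw`ALTER TABLE Category AUTO_INCREMENT = 1`
    await client.category.createMany({ data: categories })
    await client.product.createMany({ data: products })
    return 0
  } catch (e) {
    console.error(e)
    return 1 // report failure without exiting, so `finally` still runs
  } finally {
    await client.$disconnect()
  }
}

// Stub client that records calls instead of talking to a database.
function makeStubClient(calls, { failOn } = {}) {
  const record = (name) => async () => {
    calls.push(name)
    if (name === failOn) throw new Error(`simulated failure in ${name}`)
  }
  return {
    category: {
      deleteMany: record('category.deleteMany'),
      createMany: record('category.createMany'),
    },
    product: {
      deleteMany: record('product.deleteMany'),
      createMany: record('product.createMany'),
    },
    $queryRaw: record('$queryRaw'),
    $disconnect: record('$disconnect'),
  }
}
```

In the real script, you could call runSeed(new PrismaClient(), require('./data.js')) and assign the resolved value to process.exitCode, which lets Node exit with a failure code after the disconnect has finished.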

Configure the seed command
There are a couple of different ways to configure your seed script to run.
Add a new script in the package.json
The first option is to define your own script inside of the package.json. Inside of the scripts section, add the following line.
"seed": "node prisma/seed.js"

This will enable you to run npm run seed to run your seed script. Go ahead and give it a try! You should see success log messages in your console.
Add a prisma.seed field in package.json
The second way to configure your seed script is to tap into the Prisma configuration in your package.json. For this to work, you can add the following lines at the top level of your package.json.
"prisma": {
  "seed": "node prisma/seed.js"
},

With that configuration added, you can now trigger your seed script by running npx prisma db seed. Give that a shot!
So far, this is a pretty similar result to what we had in the previous step. However, there is a bit more happening behind the scenes. Because the prisma.seed property is defined, Prisma will automatically run the seed command when either of the following commands is run: npx prisma migrate dev or npx prisma migrate reset.
Whether or not you want this to happen is totally up to you. Personally, I prefer to choose when the seeding should take place, so I would prefer the first option by configuring it in the scripts section.
Wrap up
Hopefully, this tutorial gave you a good understanding of how to automatically populate your PlanetScale database by configuring a seed script with Prisma. If you have any additional questions, let us know on Twitter.]]></content>
        <summary><![CDATA[Use Prisma and Next.js to automatically populate your database with data.]]></summary>
      </entry>
    
      <entry>
        <title>Defining the database maturity model</title>
        <link href="https://planetscale.com/blog/defining-the-database-maturity-model" />
        <id>https://planetscale.com/blog/defining-the-database-maturity-model</id>
        <published>2022-02-10T13:00:00.000Z</published>
        <updated>2022-02-10T13:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[]]></content>
        <summary><![CDATA[Database issues experts have faced at companies like GitHub, DigitalOcean, and Etsy and define the stages of database growth that you should be aiming towards at each stage of your company.]]></summary>
      </entry>
    
      <entry>
        <title>Introduction to Laravel caching</title>
        <link href="https://planetscale.com/blog/introduction-to-laravel-caching" />
        <id>https://planetscale.com/blog/introduction-to-laravel-caching</id>
        <published>2022-02-09T16:30:00.000Z</published>
        <updated>2022-02-09T16:30:00.000Z</updated>
        
        <author>
          <name>Holly Guevara</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[What is caching
A common pain point in applications is optimizing and reducing the number of trips you have to take to the database. Say you have an e-commerce admin dashboard. Maybe you have a page that displays all inventory — every product, associated category, vendors, and more. A single page like this may perform dozens of calls to your database before the page can even display any data. If you don’t think about how to handle this, your application can quickly become slow and costly.
One option to reduce the number of times you have to go to the database is through caching. Caching allows you to store specific data in application memory so that next time that query is hit, you already have the data on hand and won’t have to go back to the database for it. Keep in mind, this is different from browser caching, which is user-based. This article covers application caching, which happens at the application level and cannot be cleared by the user.
Laravel has robust built-in functionality that makes caching a breeze.
Let’s see it in action!
Set up a database
For this demonstration, you will use a PlanetScale MySQL database to get a practice database up and running quickly. I promise this setup will be fast and painless!
Create a PlanetScale account.
Create a new database either in the onboarding flow or by clicking "New database" > "Create new database".
Give your database a name and select the region closest to you.
Select a cluster size and storage size.
Enter your payment information, then click "Create database".

Once it’s finished initializing, you’ll land on the Overview page for your database.

Click on the "Branches" tab and select the main branch. This is a development branch that you can use to modify your schema.
PlanetScale has a database workflow similar to the Git branching model. While developing, you can:
Create new branches off of your main branch
Modify your schema as needed
Create a deploy request (similar to a pull request)
Merge the deploy request into main

Leave this page open, as you’ll need to reference it soon.
Set up Laravel app
Next, let’s set up the pre-built Laravel 9 application. This comes with a simple CRUD API that displays random bogus sentences (we’re going to think of them as robot quotes) along with the quote author's name. The data for both of these columns is auto-generated using Faker. There is currently no caching in the project, so you’ll use this starter app to build on throughout the article.
For this tutorial, you’ll use the default file-based cache driver, meaning the cached data will be stored in your application's file system. This is fine for this small application, but for a bigger production app, you may want to use a different driver. Fortunately, Laravel supports some popular ones, such as Redis and Memcached.
Before you begin, make sure you have PHP (this article is tested with v8.1) and Composer (at least v2.2) installed.
Clone the sample application:
git clone -b starter https://github.com/planetscale/laravel-caching

Install the dependencies:
composer install

Copy the .env.example file to .env:
cp .env.example .env

Next, you need to connect to your PlanetScale database. Open up the .env file and find the database section. It should look like this:
DB_CONNECTION=mysql
DB_HOST=<ACCESS HOST URL>
DB_PORT=3306
DB_DATABASE=<DATABASE_NAME>
DB_USERNAME=<USERNAME>
DB_PASSWORD=<PASSWORD>
MYSQL_ATTR_SSL_CA=/etc/ssl/cert.pem

For DB_DATABASE, you can use your PlanetScale database name directly if you have a single unsharded keyspace. If you have a sharded keyspace, you'll need to use @primary. This will automatically direct incoming queries to the correct keyspace/shard. For more information, see the Targeting the correct keyspace documentation.
Go back to your PlanetScale dashboard to the main branch page for your database.
Click "Connect" in the top right corner.
Click "Generate new password".
Select "Laravel" from the dropdown (it’s currently set to "General").
Copy this and replace the database section of your .env file shown above with this connection information. It’ll look something like this:
DB_CONNECTION=mysql
DB_HOST=xxxxxxxx.xx-xxxx-x.psdb.cloud
DB_PORT=3306
DB_DATABASE=xxxxxxxx
DB_USERNAME=xxxxxxxxxxxxx
DB_PASSWORD=pscale_pw_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYSQL_ATTR_SSL_CA=/etc/ssl/cert.pem

Make sure you save the password before leaving the page, as you won’t be able to see it again.
Your value for MYSQL_ATTR_SSL_CA may differ depending on your system.
Run the migrations and seeder:
php artisan migrate
php artisan db:seed

Start your application:
php artisan serve

You can now view your non-cached data from the PlanetScale database in the browser at http://localhost:8000/api/quotes.
Project structure overview
Before diving in, let’s explore the project and relevant files.
Quote controller
The sample application has a single controller, app/Http/Controllers/QuoteController.php, that has methods to display, update, create, and delete quotes. Since this is an API resource controller, you don’t need to include the usual create and edit methods that only return views.
You’ll take a closer look at this soon once you add caching, but right now, nothing is cached.
Quote model
There’s also a model, app/Models/Quote.php, where you can define how Eloquent interacts with the quotes table. Since you only have one table in this application, there are no relationships or interactions, so the model is pretty barebones right now:
class Quote extends Model
{
    use HasFactory;

    public $timestamps = FALSE;
    protected $fillable = [
        'text',
        'name'
    ];

}

You’ll revisit it soon, though, once you implement caching.
Quote migration
Next up is the initial quotes migration file, database/migrations/2022_01_215158_create_quotes_table.php. When you ran the migrations in the previous step, this file created the quotes table with the specified schema:
public function up()
{
    Schema::create('quotes', function (Blueprint $table) {
        $table->id();
        $table->text('text');
        $table->string('name');
    });
}

Quote factory and seeder
Finally, there’s a factory and seeder. The factory, database/factories/QuoteFactory.php, uses Faker to create mock sentence and author data. The seeder, database/seeders/DatabaseSeeder.php, then runs this factory 100 times to create 100 rows of this Faker-generated data in the quotes table.
public function definition()
{
    return [
        'text' => $this->faker->realText(100, 3),
        'name' => $this->faker->name()
    ];
}

When you ran php artisan migrate and php artisan db:seed in the previous steps, these are the files that were run.
Queries without caching
Before you add caching, it’s important to see how the application currently performs so you know the effect that caching has. Without a baseline for query performance, you can’t tell how much caching actually helps.
Let’s run some queries and see how long they take to complete.
Get all data
Open up the app/Http/Controllers/QuoteController.php file and go to the index() method. Replace it with:
public function index()
{
    $startTime = microtime(true); // start timer
    $quotes = Quote::all(); // run query
    $totalTime = microtime(true) - $startTime; // end timer

    return response()->json([
        'totalTime' => $totalTime,
        'quotes' => $quotes
    ]);
}

The PHP function, microtime(true), provides an easy way to track the time before and after the query. You can also use an API testing tool like Postman to see the time it takes to complete.
Let’s call the API endpoint to check how long it currently takes to pull all of this data from the database.
Open or refresh http://localhost:8000/api/quotes in your browser. You’ll now see a totalTime value that displays the total time in seconds that it took to execute this query.
The total time will fluctuate, but I’m personally getting anywhere between 0.9 seconds and 2.3 seconds!
Of course, you'd want to paginate or chunk your data in most cases, so hopefully, it wouldn’t take several seconds to grab in the first place. But caching can still greatly reduce the time it takes to get data from this endpoint after the initial hit.
Let’s add caching now.
Add caching to your Laravel app
Open up app/Http/Controllers/QuoteController.php, bring in the Cache facade at the top of the file, and replace the $quotes = Quote::all(); in index() with:
// ...
use Illuminate\Support\Facades\Cache;

// ...
public function index() {
    // ...
    $quotes = Cache::remember('allQuotes', 3600, function() {
        return Quote::all();
    });
    // ...
}
// ...

Now let’s hit that API endpoint again. Refresh the page at http://localhost:8000/api/quotes. If this is your first time running the call, you’ll have to refresh again for the caching to take effect.
Check out the new time I’m getting: 0.0006330013275146484 seconds!
Before caching, this exact same query took between 0.9 seconds and 2.3 seconds. Incredible, right?
Even though this seems to be a massive improvement on the surface, there are still some issues that you need to tackle. Let’s first dissect the Cache::remember() method and then go over some gotchas with this addition.
If at any time you need to clear the cache manually while testing, you can run the following in your terminal:
php artisan cache:clear

Caching with remember()
The Cache::remember() method first tries to retrieve a value from the cache. If that value doesn’t exist, it will go to the database to grab the value, and then store it in the cache for future lookups. You will specify the name of the value and how long it stores it, as shown below:
$quotes = Cache::remember('cache_item_name', $timeStoredInSeconds, function () {
    return DB::table('quotes')->get();
});

This method is super handy because it does several things at once: checks if the item exists in cache, grabs the data if not, and stores it in the cache once grabbed.
If you prefer just to grab the value from cache and do nothing if it doesn’t exist, use:
$value = Cache::get('key');

If you want to grab the value from cache and pull it from the database if it doesn’t exist, use:
$value = Cache::get('key', function () {
    return DB::table(...)->get();
});

This one is similar to remember(), except it doesn’t store it in the cache.
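
To make the difference between these lookup styles concrete, here is a small language-agnostic sketch written in JavaScript. It is not Laravel’s implementation: TinyCache, its methods, and the injectable clock are all hypothetical, and only the semantics described above are modeled.

```javascript
// Sketch only (not Laravel's implementation): a tiny in-memory cache with
// TTLs, mirroring the lookup styles described above.
class TinyCache {
  constructor(now = () => Date.now()) {
    this.items = new Map()
    this.now = now // injectable clock, in milliseconds
  }

  put(key, value, ttlSeconds) {
    this.items.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 })
  }

  // Returns the cached value; on a miss, returns the fallback's result
  // (or null) WITHOUT storing it.
  get(key, fallback = null) {
    const hit = this.items.get(key)
    if (hit && hit.expiresAt > this.now()) return hit.value
    this.items.delete(key)
    return typeof fallback === 'function' ? fallback() : fallback
  }

  // Returns the cached value; on a miss, runs the loader and stores its result.
  remember(key, ttlSeconds, loader) {
    const hit = this.items.get(key)
    if (hit && hit.expiresAt > this.now()) return hit.value
    const value = loader()
    this.put(key, value, ttlSeconds)
    return value
  }
}

let dbHits = 0
const loadQuotes = () => { dbHits += 1; return ['quote'] }
const cache = new TinyCache()
cache.get('allQuotes', loadQuotes)            // miss: loader runs, nothing stored
cache.remember('allQuotes', 3600, loadQuotes) // miss: loader runs, result stored
cache.remember('allQuotes', 3600, loadQuotes) // hit: loader skipped
console.log(dbHits) // 2
```

The loader runs twice rather than three times because only remember() writes its result back into the cache.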
Inconsistent data in the cache
So what are the problems that you need to deal with? Let’s see one of them in action.
Refresh the http://localhost:8000/api/quotes page in the browser one more time to make sure the cache hasn’t expired. Now, add a new record to the quotes table by pasting the following in your terminal:
curl -X POST -H 'Content-Type: application/json' -d '{
  "text": "If debugging is the process of removing software bugs, then programming must be the process of putting them in.",
  "name": "Edsger Dijkstra"
}' http://localhost:8000/api/quotes -i

You should get an HTTP/1.1 200 OK response along with the newly added record. Now go back to your Quotes page in the browser and refresh. The new data you added isn’t there! That’s because you just wrote this item to the database, but you're not actually going to the database to retrieve it. The cache has no idea it exists.
You can confirm it was added to the database by going back to your PlanetScale dashboard, selecting the database, clicking "Branches", and then clicking "Console".
Run the following command and you should see 101 records:
select * from quotes;


Scroll to the bottom and you’ll see the newly added quote. You can also query it directly by id:
select * from quotes where id=101;


Solving the write problem
If it’s important for your application to always show the most up-to-date data, one quick way to fix this is using the Quote model's booted() method.
Open up app/Models/Quote.php and replace it with:
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Factories\HasFactory;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Support\Facades\Cache;

class Quote extends Model
{
    use HasFactory;

    public $timestamps = FALSE;
    protected $fillable = [
        'text',
        'name'
    ];

    protected static function booted()
    {
        static::saving(function() {
            Cache::forget('allQuotes');
        });
    }
}

The booted() method allows you to perform some action when a model event fires. In this case, when saving() is called, you're instructing Eloquent to forget the cache item named allQuotes.
Let’s test it out. Refresh your quotes page in the browser one more time to make sure everything is cached, then add another quote with:
curl -X POST -H 'Content-Type: application/json' -d '{
  "text": "Sometimes it pays to stay in bed on Monday, rather than spending the rest of the week debugging Mondays code.",
  "name": "Dan Salomon"
}' http://localhost:8000/api/quotes -i

Now refresh again, and you’ll see the time to make this query has increased significantly, indicating that the database has been hit. You’ll also now see the new entry in the output! With this, you can be confident that the displayed quotes are always up-to-date.
The saving() method works on both new and updated records. However, if you delete something, the cache won’t be cleared. You can use the deleted() method to handle that.
First, try to delete something (replace the last number with the id of the item you want to delete):
curl -X DELETE http://localhost:8000/api/quotes/1 -i

Refresh the quotes page, and the item should still be there. Now, go back to your Quote model and add this in the booted() method underneath saving():
static::deleted(function() {
    Cache::forget('allQuotes');
});

Run that cURL command one more time, but with a different id, to trigger a delete event. Refresh again and those records should now be gone!
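
The stale-read problem and its fix can be reproduced in a few lines. The following is a language-agnostic sketch in JavaScript rather than Laravel code: createQuoteStore is a hypothetical helper in which an array stands in for the quotes table and a Map stands in for the cache.

```javascript
// Sketch only: none of these names come from Laravel.
function createQuoteStore({ invalidateOnWrite }) {
  const table = ['existing quote'] // stands in for the quotes table
  const cache = new Map()          // stands in for the cache store
  let dbReads = 0

  return {
    all() {
      if (!cache.has('allQuotes')) {
        dbReads += 1
        cache.set('allQuotes', [...table]) // snapshot, like a cached query result
      }
      return cache.get('allQuotes')
    },
    save(quote) {
      table.push(quote)
      if (invalidateOnWrite) cache.delete('allQuotes')
    },
    remove(quote) {
      const i = table.indexOf(quote)
      if (i !== -1) table.splice(i, 1)
      if (invalidateOnWrite) cache.delete('allQuotes')
    },
    dbReads: () => dbReads,
  }
}

const stale = createQuoteStore({ invalidateOnWrite: false })
stale.all()
stale.save('new quote')
console.log(stale.all().length) // 1: the cached list never saw the write

const fresh = createQuoteStore({ invalidateOnWrite: true })
fresh.all()
fresh.save('new quote')
console.log(fresh.all().length) // 2: the write cleared the cache
```

The trade-off is visible in dbReads(): every write forces the next read back to the database, which is the price of always showing current data.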
More Laravel caching strategies
As you’ve seen, when it comes to caching, you're going to have to think about a few things specific to your application before diving in.
Retrieving data
How long do you want to store the data in cache? The answer to this depends on the data. Is this data relatively static? Or does it change a lot? When a set of data is requested, how important is it that it’s always up to date? These questions will help you decide how long to cache the data and if you should cache it at all.
Storing/updating/deleting data
This tutorial covered caching retrieved data. But you can also store new or updated data in the cache as well. This is called write caching. Instead of going to the database every time you need to add or update something, you store it in the cache and eventually update it all at once.
Deciding the time at which you update actually leads to more questions. What if you wait too long and there’s a system failure? All of the cached data is gone. What if someone makes a request to see a product price, but that price has changed, and you’ve been storing the update in cache? The data between the cache and database is inconsistent, and the user won’t see the latest price, which will cause problems.
The solution to write caching will depend on your application's needs. Write-back and write-through caching are some options. Check out this excellent primer for more information.
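
As a rough illustration of the two options, here is a language-agnostic sketch in JavaScript. It is not tied to Laravel, all names are hypothetical, and Maps stand in for both the cache and the database.

```javascript
// Write-through: every write goes to the cache and the database together.
function createWriteThrough(db) {
  const cache = new Map()
  return {
    set(key, value) {
      cache.set(key, value)
      db.set(key, value)
    },
    get: (key) => cache.get(key),
  }
}

// Write-back: writes land in the cache first and reach the database later.
function createWriteBack(db) {
  const cache = new Map()
  const dirty = new Set() // keys written to the cache but not yet persisted
  return {
    set(key, value) {
      cache.set(key, value)
      dirty.add(key)
    },
    get: (key) => cache.get(key),
    // Persists all pending writes in one pass; anything still pending is
    // lost if the cache disappears before this runs.
    flush() {
      for (const key of dirty) db.set(key, cache.get(key))
      dirty.clear()
    },
  }
}

const db = new Map([['price', 10]])
const writeBack = createWriteBack(db)
writeBack.set('price', 12)
console.log(db.get('price')) // 10 until flush() runs
writeBack.flush()
console.log(db.get('price')) // 12
```

The window between set() and flush() is exactly where the failure and inconsistency scenarios described above can occur. Write-through closes that window at the cost of a database write on every update.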
Conclusion
As you’ve seen, caching can immensely speed up your application, but not without caveats. When implementing caching, it’s important to think about how often your data will be accessed and how important immediate data consistency (from the user's perspective) is in your application.
Hopefully, this guide has shown you how to get started with Laravel caching. Make sure you check out the Laravel Cache Documentation for more information. And if you're interested in learning more about PlanetScale’s next-level features like database branching and deploy requests, check out our documentation. You can find the final code complete with caching in this GitHub repo. Thanks for reading!]]></content>
        <summary><![CDATA[Learn how to speed up your Laravel applications with caching.]]></summary>
      </entry>
    
      <entry>
        <title>Using the PlanetScale CLI with GitHub Actions workflows</title>
        <link href="https://planetscale.com/blog/using-the-planetscale-cli-with-github-actions-workflows" />
        <id>https://planetscale.com/blog/using-the-planetscale-cli-with-github-actions-workflows</id>
        <published>2022-02-03T16:11:19.285Z</published>
        <updated>2022-02-03T16:11:19.285Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Branching your database schema can be helpful when needing to develop and test changes in an isolated environment separate from your production database. Branching in PlanetScale makes it a reality, but what if you want to integrate this workflow deeper into your existing development and testing workflows in GitHub?
In this blog post, you will learn how to increase your database productivity and set up and use PlanetScale CLI (also known as pscale) with GitHub Actions workflows.
First, what are GitHub Actions?
Just in case you don’t know, GitHub Actions “is a continuous integration and continuous delivery (CI/CD) platform that allows you to automate your build, test, and deployment pipeline.” Most importantly, GitHub Actions are closely tied to your GitHub code repository itself, making it easier to trigger based on events in your repository. Think of events such as opening a pull request, creating a branch, and other events that can trigger a workflow listed in the documentation.
GitHub Actions makes it possible to create, share, reuse, and fork workflows across teams. In GitHub Actions, a workflow “is a configurable automated process made up of one or more jobs.”
GitHub Actions workflows aim to help teams deliver software quickly and reliably by integrating different tools. In PlanetScale’s case, we want teams to increase database productivity. So, let’s see how you can implement workflow automation with your database and code repository!
Setting up the pre-built GitHub Actions workflows repository
If you are interested in getting a feel for how pscale can work with GitHub Actions, you can check out this repository filled with pscale workflow helper scripts built by Johannes Nicolai at PlanetScale. If you have any feedback or improvement suggestions, please add an issue or pull request to the repository.
These pre-built GitHub Actions workflows can handle all sorts of tasks, such as:
Create a database and create a table
Create a database branch and deploy request
Merge the latest open deploy request
Add and remove a column and an index in the schema
React on “magic words” in pull request comments like “/ps-create”, “/ps-attach” or “/ps-merge”
If you want to try it out, follow these steps below to set it up in GitHub.
Steps to try out the pre-built PlanetScale and GitHub Action workflows
First, click the green “Use this template” button in the repository and name the new repository whatever you want.
Once the repository is created, go to the "Actions" tab.
01 - Create Database
This workflow will create the database.
Click "01 - Create Database step" in the sidebar.
Select the “Run workflow” dropdown and click "Run."
As it runs, click on the new workflow in the main section, "01 - Create Database."
Click "Create database - click here."
It will eventually prompt you to click a link to authenticate with PlanetScale. You can log in to an existing account or set up a new PlanetScale account with this link.

02 - Add Operation Column & Index
This workflow will create a new database branch with a new column and index. It will also open a deploy request, which will be discussed soon.
Once the database creation step is complete, click the "Actions" tab and move on to "02 - Add Operation Column & Index."
You will again need to click twice into the workflow run to authenticate with PlanetScale. (Later in this blog post, we will teach you how to use service tokens so this is automated.)
After this workflow has run, you can look at the updated schema in the PlanetScale branch to see the column and index added to the schema. To see this, go to https://app.planetscale.com and go to your database, in the "Branches" tab click on the add-operation-column-and-index branch, and select the "Schema" tab. You can see that the workflow has added a column and index to the SQL schema.

03 - Merge latest Deploy Request
This workflow will merge the deploy request that was created in the previous workflow.
Deploy requests allow you to propose schema changes and get feedback from your team, similar to pull requests in GitHub. The order of when you merge a database deploy request and code pull requests often depends on the type of change.
To run this workflow, refer to the steps in the previous workflow, as they are very similar.
At the end of the workflow, you can see your schema change has been deployed in PlanetScale. You can also view the deployed change in PlanetScale by going to your database and the Deploy request tab.
(Optional) If you want to see a column and index removed, you can follow through with "04 - Del Operation Column & Index" following similar steps as above.

How to build your own GitHub Actions workflows with pscale CLI
Now, if you want to build your own GitHub Actions workflows with pscale to customize them to your development and operations workflows, there are a few key things you need to know about:
Workflow files, written in YAML, located in .github/workflows/
When an event triggers a workflow in GitHub Actions, this file describes the steps. In this example, it describes what scripts should be run and when.
Shell scripts, located in .pscale/cli-helper-scripts/
These Bash shell scripts work closely with pscale. They handle logic, different CLI commands, error handling, waiting for asynchronous operations, and are reusable across GitHub Actions workflows.
Prerequisites: You already have a PlanetScale account and one database created. If you followed the steps above, you could use the same database too. You can also delete the database from above and create a new one or use an existing database.
In the following steps, we will create a GitHub Action workflow that is triggered when you create (or push) a git branch in GitHub that starts with db/ and automatically creates a database branch in PlanetScale. This will allow you to have an isolated development and testing environment for your database changes.
Steps to create your own GitHub Action workflow with pscale CLI
Create a new GitHub repository or use an existing repository.
In your repository, create the workflow file .github/workflows/create-branch.yml. Add the following code to your file. You can see in the workflow file that it is triggered when a branch that starts with db/** is pushed to GitHub. You will see the create_database_branch job has steps that extract the branch name, validate the name, check out the GitHub repository, and then create the database branch.
name: Create database branch

env:
  pscale_base_directory: .pscale

on:
  push:
    branches:
      - 'db/**'

jobs:
  create_database_branch:
    name: Create database branch
    runs-on: ubuntu-latest
    steps:
      - name: Extract branch name
        shell: bash
        run: echo "branch=${GITHUB_REF#refs/heads/}" >> "$GITHUB_OUTPUT"
        id: extract_branch

      - name: Validate parameters
        id: validate_params
        uses: actions/github-script@v3
        env:
          BRANCH_NAME: ${{ github.event.inputs.branch }}
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const branch_name = process.env.BRANCH_NAME || "${{steps.extract_branch.outputs.branch}}";

            const regex = /[^\/]+$/;
            let clean_branch_name;

            if (branch_name.match(regex)) {
              clean_branch_name = branch_name.match(regex)[0];
            } else {
              clean_branch_name = branch_name;
            }

            if (! /^[a-zA-Z0-9_-]+$/.test(clean_branch_name)) {
              const error = `The branch name contains illegal characters: ${clean_branch_name}`;
              core.error(error);
              core.setFailed(error);
            }
            core.setOutput('branch_name', clean_branch_name);

      - name: Checkout
        uses: actions/checkout@v2

      - name: Create database branch — if asked, please click on displayed link to authenticate
        id: create-db-branch
        timeout-minutes: 3
        env:
          PLANETSCALE_SERVICE_TOKEN_ID: ${{secrets.PLANETSCALE_SERVICE_TOKEN_ID}}
          PLANETSCALE_SERVICE_TOKEN: ${{secrets.PLANETSCALE_SERVICE_TOKEN}}
          ORG_NAME: ${{secrets.ORG_NAME}}
          DB_NAME: ${{secrets.DB_NAME}}
          GITHUB_USER: ${{github.actor}}
          BRANCH_NAME: ${{ steps.validate_params.outputs.branch_name }}
        working-directory: ${{env.pscale_base_directory}}/cli-helper-scripts/
        run: |
          ./create-branch.sh "$BRANCH_NAME"

Create the shell script file .pscale/cli-helper-scripts/create-branch.sh. Add the following code to the file:
#!/bin/bash

. use-pscale-docker-image.sh
. wait-for-branch-readiness.sh

. authenticate-ps.sh

BRANCH_NAME="$1"

. ps-create-helper-functions.sh
create-db-branch "$DB_NAME" "$BRANCH_NAME" "$ORG_NAME"

The shell script uses some helper scripts that you will add in the next step. If you are working locally, make sure to git commit and git push after you create this file. You can also create it in the GitHub UI.
Add the following files, follow the links for a copy of the files:
.pscale/cli-helper-scripts/use-pscale-docker-image.sh
The workflow uses pscale as a Docker image. This shell script file sets up pscale for the following scripts. It will also make sure you are always using the latest version of the pscale CLI and does not need any package manager installed.
.pscale/cli-helper-scripts/wait-for-branch-readiness.sh
When first created, database branches in PlanetScale take a few seconds to be useable. This shell script makes sure it is ready to be used while the workflow is running.
.pscale/cli-helper-scripts/authenticate-ps.sh
This shell script authenticates with PlanetScale using your PLANETSCALE_SERVICE_TOKEN. If it is not set, it will ask you to authenticate manually as the workflow runs.
.pscale/cli-helper-scripts/ps-create-helper-functions.sh
This is the largest of the shell scripts. It contains functions you might want to use across multiple different scripts, such as creating a database branch, making a schema change, creating a deploy request, and more. For this workflow, we are only using the create-db-branch function, but you can copy the whole file for later use.
Important: If you are working locally, make sure to git commit and git push after you create these files. You can also create them in the GitHub UI.
The shell script needs “execute” file system permission set to run. You can update the permissions in the command line by running the following git commands:
git update-index --chmod=+x .pscale/cli-helper-scripts/create-branch.sh
git commit -m 'update file permissions'
git push

If you want to know more, see the GitHub Community Forum post on the permission denied error. The git update-index command only works if the files have been previously pushed. They will also need to be updated again if you commit any changes locally. If you plan on making a lot of changes to this file locally, you can run git config core.fileMode false so that git ignores executable-bit changes in your working tree and keeps the mode recorded in the index. Otherwise, you can go on to the next step.
To authenticate with PlanetScale, we will use GitHub Actions’ built-in secret store. In GitHub, go to the Settings tab in your code repository, followed by Secrets in the left navigation, then Actions. Select the “New repository secret” button and add each of the following as separate secrets:
PLANETSCALE_SERVICE_TOKEN: In PlanetScale, you will need to create a service token. Go to https://app.planetscale.com and log in. Go to the "Settings" tab for your organization, then "Service tokens", and select the "New service token" button. Give it a name and copy the token that’s returned.
PLANETSCALE_SERVICE_TOKEN_ID: Click on "Edit token permissions", which will bring you to the token overview page. Copy the value next to "ID." Make sure to click the "Add database access" button and select your database. Check both the “branch” and “deploy_request” checkboxes. You can decrease permissions later or delete the token after testing.

ORG_NAME: Your organization name in PlanetScale. You can find it in the PlanetScale web application URL: https://app.planetscale.com/<your-organization-name>/demo-db.
DB_NAME: Your database name in PlanetScale
You are ready to run the action! Push (or create) a branch in your GitHub repository that starts with db/. For example, I want to develop a new feature that will require database changes. I would name my branch db/new-feature, and my PlanetScale branch will be called new-feature. Go back to your repo and under "Actions" you’ll see the new workflow running!

Extra credit! (This is an optional step.) Once you have tried automatically creating database branches, what if you wanted to open a deploy request in PlanetScale when the branch is created, so it is ready for when you want to merge a database change?
If you look in ps-create-helper-functions.sh you can find this function:
function create-deploy-request {
    local DB_NAME=$1
    local BRANCH_NAME=$2
    local ORG_NAME=$3

    local raw_output=`pscale deploy-request create "$DB_NAME" "$BRANCH_NAME" --org "$ORG_NAME" --format json`
    if [ $? -ne 0 ]; then
        echo "Deploy request could not be created: $raw_output"
        exit 1
    fi
    local deploy_request_number=`echo $raw_output | jq -r '.number'`
    # if deploy request number is empty, then error
    if [ -z "$deploy_request_number" ]; then
        echo "Could not retrieve deploy request number: $raw_output"
        exit 1
    fi

    local deploy_request="https://app.planetscale.com/${ORG_NAME}/${DB_NAME}/deploy-requests/${deploy_request_number}"
    echo "Check out the deploy request created at $deploy_request"
    # if CI variable is set, export the deploy request URL
    if [ -n "$CI" ]; then
        echo "::set-output name=DEPLOY_REQUEST_URL::$deploy_request"
        echo "::set-output name=DEPLOY_REQUEST_NUMBER::$deploy_request_number"
        create-diff-for-ci "$DB_NAME" "$ORG_NAME" "$deploy_request_number" "$BRANCH_NAME"
    fi
}

This function creates a deploy request in PlanetScale and prints its URL. When the CI environment variable is set, it also exports the deploy request URL and deploy request number to the GitHub Action output and creates a diff for the deploy request.
You can add this to the end of your .pscale/cli-helper-scripts/create-branch.sh like this:
create-deploy-request "$DB_NAME" "$BRANCH_NAME" "$ORG_NAME"

If you haven’t promoted your main branch in PlanetScale to production, you need to do this before rerunning the workflow.
Then push (or create) a new db/** branch in GitHub to rerun the workflow.
What GitHub Action workflows would you like to see?
We want to hear from you! Your ideas might appear in a future blog post or example! Now that you have an idea of what it is like to automate workflows with GitHub Actions and the PlanetScale CLI, what are some workflows you would like to see built?
What manual steps do you do with your databases that you wish were automated while benefiting from branching, deploy requests, and non-blocking schema changes?
What workflows do you want to see based on triggers in GitHub issues and pull requests?
We would love to hear your feedback! Tweet at us @planetscale to tell us what you would like to build or see with PlanetScale and automated workflows.]]></content>
        <summary><![CDATA[Learn how to build automated workflows to develop and operate PlanetScale databases with GitHub Actions.]]></summary>
      </entry>
    
      <entry>
        <title>Using entropy for user-friendly strong passwords</title>
        <link href="https://planetscale.com/blog/using-entropy-for-user-friendly-strong-passwords" />
        <id>https://planetscale.com/blog/using-entropy-for-user-friendly-strong-passwords</id>
        <published>2022-01-24T16:57:52.732Z</published>
        <updated>2022-01-24T16:57:52.732Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Signup forms with specific and archaic password rules can be incredibly frustrating for anyone using a password manager. "Must have one special character, but only from !@#$ these allowed characters." It attempts to help users be more secure but causes more frustration than security.
When building PlanetScale’s signup form, we wanted to enforce strong passwords while also working well with password managers. After researching the problem, we found the best method for doing this is using an entropy-based password strength calculation.
Entropy-based password strength
A password’s strength can be defined by how many attempts it would take for a computer to successfully guess it.
2^(entropy) = number of attempts needed to crack the password
If you have a computer that makes 1000 attempts per second, you can get an estimate of how long your password would stand up to an attack.
Weak password example
Password: mike1
This simple password has an entropy of ~16. With 1000 attempts per second, it could be cracked in just 60 seconds.
2^16 / 1000 = ~60 seconds
Strong password example
Password: cTzk9*R6-uf9
This password would generally satisfy most websites' password rules. It has numbers, special characters, upper and lower case.
2^61 / 1000 = ~73 million years to try every guess (about 36.5 million years on average)
Long and strong password example
Password: mikemikemikemikem
This password has similar entropy to the above password but would fail most common password requirements.
2^61 / 1000 = ~73 million years to try every guess (about 36.5 million years on average)
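The arithmetic in these examples can be checked with a short Ruby sketch. It assumes the 1000 attempts per second used above and a 365-day year, and the method names are illustrative only. It separates the worst case (trying all 2^entropy guesses) from the average case (half of that), which is where a figure of roughly 36.5 million years for 61 bits comes from.

```ruby
# Assumptions for illustration: 1000 guesses per second, 365-day years.
ATTEMPTS_PER_SECOND = 1000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Worst case: the attacker has to try every one of the 2^entropy candidates.
def seconds_to_exhaust(entropy_bits, attempts_per_second = ATTEMPTS_PER_SECOND)
  (2**entropy_bits) / attempts_per_second.to_f
end

# Average case: a guess is expected to succeed halfway through the search.
def average_seconds_to_crack(entropy_bits, attempts_per_second = ATTEMPTS_PER_SECOND)
  seconds_to_exhaust(entropy_bits, attempts_per_second) / 2
end

weak_seconds = seconds_to_exhaust(16)
strong_years = seconds_to_exhaust(61) / SECONDS_PER_YEAR
average_years = average_seconds_to_crack(61) / SECONDS_PER_YEAR

puts format("16 bits: about %d seconds to try every guess", weak_seconds.round)
puts format("61 bits: about %.1f million years to try every guess", strong_years / 1_000_000)
puts format("61 bits: about %.1f million years on average", average_years / 1_000_000)
```

Each extra bit of entropy doubles these numbers, which is why length alone can make a password hard to guess.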
Entropy is better than specific rules
The last two examples highlight the frustration of having specific password rules. Both cTzk9*R6-uf9 and mikemikemikemikem have ~61 bits of entropy, making them very difficult to guess.
The mikemikemikemikem example would never satisfy most password strength rules.
We still consider the first password stronger because it does not contain dictionary words (and more importantly, my own name!). But this does not mean the second password isn’t secure enough for most applications.
Slowing down attempts
A clear lesson we can learn from these examples is that the more attempts an attacker can make, the faster they can crack a password. There are a couple of steps you can take to improve your applications' defense against these attacks.
Add rate limits to your login form. This is a simple way to protect against a program using your password form to check password validity.
Use a password hashing algorithm for storing your passwords. These algorithms are purposefully slow to reduce how quickly attempts can be made. We chose to use Argon2.
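To make the rate limiting idea concrete, here is a minimal in-memory sketch. LoginRateLimiter is a hypothetical class written for this example, not something from our codebase, and it keeps its state in a single process. A real deployment would more likely use a shared store or an existing middleware such as Rack::Attack.

```ruby
# Sliding-window limiter: at most `limit` attempts per `window` seconds
# for each key (for example, an email address or an IP address).
class LoginRateLimiter
  def initialize(limit:, window:)
    @limit = limit
    @window = window
    @attempts = Hash.new { |hash, key| hash[key] = [] }
  end

  # Records the attempt and returns true when it is allowed.
  def allow?(key, now: Time.now.to_f)
    # Keep only the attempts that are still inside the window.
    recent = @attempts[key].select { |timestamp| now - timestamp < @window }
    allowed = recent.size < @limit
    recent << now if allowed
    @attempts[key] = recent
    allowed
  end
end

limiter = LoginRateLimiter.new(limit: 5, window: 60)
puts limiter.allow?("user@example.com")
```

With a limit of 5 attempts per minute, an attacker working through the login form is capped at 7,200 guesses per day per key, far below the 1000 attempts per second assumed earlier.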
Dictionary words and past breaches
Beyond an entropy check, there are other methods you should also consider when designing strong password validation.
Dictionary words
Password crackers will often use a prepared list of actual words to speed up cracking. The strong_password gem includes a dictionary word list and adjusts the password strength rating when actual words are used.
Breached passwords
A strong password is useless if it was previously exposed in a password breach. Integrating the haveibeenpwned API with your password validation is an additional way to protect your users from this.
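As a sketch of how such a check can work without sending the password anywhere: the Pwned Passwords range endpoint (https://api.pwnedpasswords.com/range/ followed by a 5-character prefix) takes the first five hex characters of the password's SHA-1 hash and returns matching hash suffixes with counts. The helper names below are made up for this example, and the HTTP request itself is left out.

```ruby
require "digest"

# Split the SHA-1 of a password into the 5-character prefix that is sent to
# the range endpoint and the 35-character suffix that never leaves our server.
def pwned_range_parts(password)
  sha1 = Digest::SHA1.hexdigest(password).upcase
  [sha1[0, 5], sha1[5..]]
end

# Given the plain-text body of a range response ("SUFFIX:COUNT" per line),
# return how many times the password appeared in breaches (0 if absent).
def breach_count(range_body, suffix)
  range_body.each_line do |line|
    candidate, count = line.strip.split(":")
    return count.to_i if candidate == suffix
  end
  0
end

prefix, suffix = pwned_range_parts("password")
# Made-up response body for illustration; a real one comes from the API.
sample_body = "#{suffix}:42\r\n#{'0' * 35}:1\r\n"
puts "request prefix: #{prefix}"
puts "seen #{breach_count(sample_body, suffix)} times in the sample body"
```

A count above zero is a good reason to reject the password, regardless of its entropy.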
Better signup forms
Now that we know about password strength and entropy, we can use this knowledge to improve our applications’ signup experience.
Instead of showing a list of rules and validating against them, you can implement a password meter that measures the password’s entropy and provides feedback to the user as they type. It plays nicely with password managers and also gives users quick feedback if they type in a password manually.
Here is how we implemented this for PlanetScale’s signup form:

How we built it
Our authorization pages are in a Ruby on Rails application. We used the strong_password gem + the auto-check-element web component to quickly give users feedback on their password strength as they type it.
The password form UI
We wrapped our existing password form with auto-check-element. This web component posts the value of the password to our backend as the user types. Our backend then calculates the password’s entropy and returns the rendered meter SVG to show the user their progress.
<auto-check csrf="<%= form_authenticity_token %>" src="/users/password-strength" required="">
  <div class="mb-1.5 flex items-center justify-between leading-none">
    <%= f.label(:password, "New password", class: "mb-0") %>
    <div class="js-password-strength-container" aria-live="polite"></div>
  </div>
  <%= f.password_field(:password, class: "js-password-strength", autofocus: true, autocomplete: "new-password",
  required: true) %>
</auto-check>

This form element also has a tiny bit of extra JavaScript added to update the meter after each check.
<%
  # Make 10% the min so _some_ red appears.
  # For values > 90, keep arc slightly unclosed.
  strength = 10 if strength < 10
  strength = 90 if (strength > 90) && (strength < 100)
  radius = 40
  perimeter = Math::PI * radius * 2
  stroke_dashoffset = (perimeter - (perimeter * strength)) / 100
  arc_color = case strength
    when 0..33 then "rgba(var(--red-500))"
    when 33..66 then "var(--orange-500)"
    when 66..100 then "var(--yellow-500)"
  end
%>

<span class="flex items-center text-sm">
  <span class="mr-sm text-secondary"><%= strength == 100 ? "Strong" : "Too weak" %></span>
  <svg width="14" height="14" viewBox="0 0 100 100">
    <circle cx="50" cy="50" r="<%= radius %>" stroke="var(--border-action)" stroke-width="15" fill="transparent" />
    <% if strength == 100 %>
    <circle
      cx="50"
      cy="50"
      r="<%= radius %>"
      stroke="rgba(var(--green-600))"
      stroke-width="15"
      fill="rgba(var(--green-600))"
    />
    <path
      fill-rule="evenodd"
      clip-rule="evenodd"
      d="M72.3156 38.0312C73.6824 36.6644 73.6824 34.4483 72.3156 33.0815C70.9488 31.7146 68.7327 31.7146 67.3659 33.0815L41.5565 58.8909L33.4247 50.7591C32.0579 49.3923 29.8418 49.3923 28.475 50.7591C27.1082 52.126 27.1082 54.3421 28.475 55.7089L39.0816 66.3155C40.4484 67.6823 42.6645 67.6823 44.0313 66.3155C44.1514 66.1955 44.2609 66.0689 44.3598 65.9369C44.4918 65.8379 44.6184 65.7284 44.7384 65.6084L72.3156 38.0312Z"
      fill="white"
    />
    <% else %>
    <circle
      cx="50"
      cy="50"
      r="<%= radius %>"
      stroke="<%= arc_color %>"
      stroke-width="15"
      fill="transparent"
      stroke-dasharray="<%= perimeter %>"
      stroke-dashoffset="<%= stroke_dashoffset %>"
      transform="rotate(-90 50 50)"
    />
    <% end %>
  </svg>
</span>

Password strength calculation
The controller code determines the password strength percentage and then renders the meter.
def create
  checker = User.password_checker
  entropy = checker.calculate_entropy(params[:value] || "")
  percentage = (entropy / STRONG_ENTROPY) * 100

  percentage = 100 if percentage > 100

  render(partial: "users/shared/password_strength_meter", locals: { strength: percentage.to_i })
end
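
As a rough illustration of the mapping in the controller, the sketch below turns an entropy value into the 0 to 100 meter percentage. The STRONG_ENTROPY value of 18 is a made-up placeholder, since the real threshold is not shown in this post, and strength_percentage is a hypothetical helper rather than part of the strong_password gem.

```ruby
# Hypothetical threshold: the entropy at which the meter reads "Strong".
STRONG_ENTROPY = 18.0

def strength_percentage(entropy, strong_entropy = STRONG_ENTROPY)
  # Divide as floats so an Integer entropy does not truncate to 0.
  percentage = (entropy.to_f / strong_entropy) * 100
  # Cap the result so very strong passwords still read as 100.
  percentage.clamp(0, 100).to_i
end

puts strength_percentage(9)
puts strength_percentage(40)
```

Anything at or above the threshold maps to 100, which is the value the meter partial treats as "Strong".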

Learn more on how to implement entropy-based password forms
Give it a try yourself by playing around with our sign up form: https://auth.planetscale.com/sign-up
We found these resources useful when implementing our password strength meter:
How to calculate password strength
Strong_password gem
Password strength test tool]]></content>
        <summary><![CDATA[When implementing user authentication with passwords throw out the password rules you know.]]></summary>
      </entry>
    
      <entry>
        <title>How to set up Next.js with Prisma and PlanetScale</title>
        <link href="https://planetscale.com/blog/how-to-setup-next-js-with-prisma-and-planetscale" />
        <id>https://planetscale.com/blog/how-to-setup-next-js-with-prisma-and-planetscale</id>
        <published>2022-01-20T20:19:00.000Z</published>
        <updated>2022-01-20T20:19:00.000Z</updated>
        
        <author>
          <name>Camila Ramos</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[What is PlanetScale?
PlanetScale is a database-as-a-service platform that offers Postgres and Vitess clusters. Vitess is the open-source technology that powers YouTube, Slack, and other hyperscale companies, letting them serve millions of queries per second across a virtually unlimited number of connections.
What is Prisma?
Prisma is an open-source ORM that allows developers to write queries in JavaScript/TypeScript. Prisma offers three products:
Prisma Client: a type-safe API that lets you write queries without writing SQL.
Prisma Migrate: a database migration tool.
Prisma Studio: a visual editor for the data in your database.
In this blog, we’ll be working with Prisma Client and Prisma Studio.
PlanetScale with Next.js
We’ll walk through setting up a new database, installing Prisma, defining your data models, and writing the API route to write to your database. In this example, I will be showing how you can get data from your user via a form and save it into your database.
Creating your database
Head over to PlanetScale.com and create an account.
Create a new database and select the closest region to your physical location. This will help reduce latency.
Select your cluster and storage size.
Setting up Prisma
If you already have an existing Next app, skip this step. Create a new Next app using npx create-next-app@latest . If you want to follow this exact project, create a form component and render it in your index.js file. Run npm run dev to start your local server and navigate to the URL your terminal suggests, usually http://localhost:3000/.
Open a terminal in your project directory and run the following command to generate a Prisma folder as well as a .env file: npx prisma init
Go into your .env file and update your DATABASE_URL variable with the following:
DATABASE_URL="mysql://root@127.0.0.1:3309/YOUR-DB-NAME-HERE"
Defining your schema
Now in your schema.prisma file, update your data source and client to the following:
generator client {
  provider = "prisma-client-js"
  previewFeatures = ["referentialIntegrity"]
}

datasource db {
  provider = "mysql"
  url = env("DATABASE_URL")
  referentialIntegrity = "prisma"
}

Because PlanetScale doesn't recommend foreign key constraints and Prisma defaults to using foreign keys to express relations, we need to set this referentialIntegrity property when using Prisma with PlanetScale. If you'd prefer to use foreign key constraints, you can skip this step, but you need to enable foreign key constraint support for your database in your PlanetScale database settings page.
“Referential integrity is a property of a data set that states that all its references are valid. Referential integrity requires that if one record references another, then the referenced record must exist. For example, if a Post model defines an author, then the author must also exist.” (Source: Prisma docs) To read more about referential integrity, check out these Prisma docs.
In this same file you should define your data models, in this case Inquiry. This model will store the inquirer’s name, email, subject of the inquiry and message. I’ve made the subject field optional and denoted this by using a ? on the field.
You should also add an id field, using the @id and @default attributes. The @id attribute defines a single-field ID on the model. The @default attribute defines a default value for the field, and you can pass in autoincrement() to create a sequence of integers and assign the incremented values to the ID.
Install the Prisma VS Code extension to get syntax highlighting and autocompletion for your schema file.
model Inquiry {
  id      Int     @default(autoincrement()) @id
  name    String
  email   String
  subject String?
  message String
}

Running your database locally
Once you’ve defined your data model, open a new terminal in your project directory and run this command:
pscale connect YOUR-DB-NAME-HERE main --port 3309
This command runs a local proxy to your database, which gives you a simpler way to connect when running your app locally. Instead of having to make multiple connection strings for the different database branches you are working in, you can just change the branch in one command line argument.
For this step, ensure that the main branch hasn’t been promoted to production. In the next step, we will synchronize our Prisma schema and our database schema, and you can’t make schema changes to a production branch. You don’t have to worry about this if you’ve been following this guide. You can read more about branching with PlanetScale here.
In a new terminal, run this command to sync your schema.prisma with your PlanetScale schema: npx prisma db push
You should see a success message similar to this:

To verify that your database is in sync with your schema, as well as run any SQL commands, run: pscale shell YOUR-DB-NAME-HERE main
Run this line, replacing Inquiry with an entity you defined in your schema:
describe Inquiry; -- Don't forget the semicolon here.

You can exit the MySQL shell by typing exit and hitting enter.
Now that you have your schema, promote your branch to production: pscale branch promote YOUR-DB-NAME-HERE main

Creating your API route
Now, you can create an API route by creating a file with a descriptive name inside the API folder of your Next.js app. Here you will set up your Prisma client and define your function for handling requests. I named my file inquiry.js, and here’s what it looks like:
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

export default async function handler(req, res) {
  if (req.method === 'POST') {
    return await createInquiry(req, res)
  } else {
    return res.status(405).json({ message: 'Method not allowed', success: false })
  }
}

async function createInquiry(req, res) {
  const body = req.body
  try {
    const newEntry = await prisma.inquiry.create({
      data: {
        name: body.firstName,
        email: body.email,
        subject: body.subject,
        message: body.message
      }
    })
    return res.status(200).json({ ...newEntry, success: true })
  } catch (error) {
    console.error('Request error', error)
    res.status(500).json({ error: 'Error creating question', success: false })
  }
}

To learn more about how Next.js API routes work check out the documentation.
First we define and instantiate our Prisma client.
import { PrismaClient } from '@prisma/client'
const prisma = new PrismaClient()

We create a function called handler, which checks that the request method is POST and then calls createInquiry, the function that actually writes to our database. If the method is not POST, it sends back a response with a status code of 405 and a message letting the user know that this operation is not allowed.
export default async function handler(req, res) {
  if (req.method === 'POST') {
    return await createInquiry(req, res)
  } else {
    return res.status(405).json({ message: 'Method not allowed', success: false })
  }
}

We define an asynchronous function, createInquiry, which will take the data in the request body and send it to our database.
Wrapped in a try/catch block, we define a new variable newEntry and create a new entry in our Inquiry table.
We define which properties in the body contain the pieces of data we are looking for and assign those to the fields we want to create in our database: name, email, subject, and message. If this is successful, we return a response status code of 200. If it was not successful, we log the error and respond with a status code of 500, signaling to the user that there was some unknown error.
This is beyond the scope of this blog, but in your application you should always validate the user's input before writing to the database.
async function createInquiry(req, res) {
  const body = req.body
  try {
    const newEntry = await prisma.inquiry.create({
      data: {
        name: body.firstName,
        email: body.email,
        subject: body.subject,
        message: body.message
      }
    })
    return res.status(200).json({ ...newEntry, success: true })
  } catch (error) {
    console.error('Request error', error)
    res.status(500).json({ error: 'Error creating question', success: false })
  }
}

Check that your API route is working as expected by using something like Insomnia or Postman. Make sure that you start your server by running npm run dev, and in Insomnia paste in the URL to your API endpoint. In my case, the API is expecting a JSON request body with firstName, email, subject, and message.
{
  "firstName": "Camila",
  "email": "testing@gmail.com",
  "subject": "Come speak",
  "message": "We have an opportunity for you"
}

Testing my API with Insomnia would look like this for me. Once you get back a 200, you’ve confirmed that your endpoint is working as expected.

To confirm that your data was written to your database, you can open a new terminal in your project and run npx prisma studio. This will open up a visual instance of your data in the browser to verify your data is in there.

Saving data from your front-end to your database
You’ll have to build out a way to get this data from your user in the front end and then pass that data to your API. In this example, we are working with a form. We know we have all the data and are ready to write to the database when the user hits submit. With this, one approach would be to write a function that gets executed when the form is submitted, making the call to our API with the data gathered from our user in the form. Here’s what mine looks like:
const handleSubmit = async (e) => {
  e.preventDefault()
  const body = { firstName, email, subject, message }
  try {
    const response = await fetch('/api/inquiry', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body)
    })
    if (response.status !== 200) {
      console.log('something went wrong')
      //set an error banner here
    } else {
      resetForm()
      console.log('form submitted successfully !!!')
      //set a success banner here
    }
    //check response, if success is false, dont take them to success page
  } catch (error) {
    console.log('there was an error submitting', error)
  }
}

const resetForm = () => {
  setFirstName('')
  setEmail('')
  setSubject('')
  setMessage('')
}

In this case, I am using the useState hook and setting the state for each variable (firstName, email, subject, and message) by passing an anonymous function to the onChange property of the form inputs.
<input
  type='text'
  name='first-name'
  id='first-name'
  autoComplete='given-name'
  onChange={(e) => setFirstName(e.target.value)}
  value={firstName}
  className='bg-zinc-300 text-gray-200-900 focus:ring-indigo-400 focus:border-indigo-400 border-warm-gray-300 block w-full rounded-md px-4 py-3 shadow-sm'
/>

You’ll have to call the handleSubmit function somewhere to execute this code. Because I’m using a form, I can pass the function call to the onSubmit property like this:
<form action="#" method="POST" onSubmit={(e) => handleSubmit(e)}>

All done! Now you’re ready to deploy your database to work in production and take your app live.
Deploying to production
Navigate back to your PlanetScale database. Hit the connect button and select Prisma from the dropdown menu. Hit the button to generate a new password, and be sure to copy/paste this somewhere for you to access later.

In my example, I’m using Netlify. You can use Vercel to deploy your app, and the steps will be similar. In my case, the project was already deployed, so I’ll have to go back and make some changes to my environment variables and redeploy. If you are deploying this project for the first time, you can set the environment variables in your initial configuration and won’t have to redeploy as outlined below.
Create a Netlify account and connect the GitHub repo that is connected to this project. Navigate to Site Settings.

Using the side navigation, go to Build and deploy, and select Environment.

Add a variable called DATABASE_URL and set the value to be the URL you were given from your PlanetScale-generated password. Be sure to remove the quotes that wrap the URL.

Save these changes. In the Deploys tab, hit the button that says Trigger Redeploy. Now you’re ready to either push this code to your main branch or merge the branch you’re working on into main to see your new database live.

Give yourself a pat on the back because you just deployed your first PlanetScale database 🥳.
Try it out
Follow this guide and spin up a working app in just a few minutes! Create a new database, define your data models, and write your API to write to your database directly from your Next app. Tweet the team with any questions you have @planetscale.]]></content>
        <summary><![CDATA[A step-by-step guide for using PlanetScale and Prisma with Next.js.]]></summary>
      </entry>
    
      <entry>
        <title>How our Rails test suite runs in 1 minute on Buildkite</title>
        <link href="https://planetscale.com/blog/how-our-rails-test-suite-runs-in-1-minute-on-buildkite" />
        <id>https://planetscale.com/blog/how-our-rails-test-suite-runs-in-1-minute-on-buildkite</id>
        <published>2022-01-18T15:37:51.755Z</published>
        <updated>2022-01-18T15:37:51.755Z</updated>
        
        <author>
          <name>Mike Coutermarsh</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[At PlanetScale, our backend API is built with Ruby on Rails. It’s a pretty standard Rails application. We use minitest for our test suite and FactoryBot for creating test data.
Everyone on our team has worked on Rails applications with slow test suites in the past and knows how much they hurt team productivity. As our app has grown, we have continually invested time into keeping our test suite fast. We know how much a quick feedback cycle pays off for our team, and a little extra work on it makes every feature we build easier.
Local development
We never run all of our application’s tests in local development. It’s not a good use of time and will never be as fast as running them on CI. When working locally, we’ll run the tests for the single file we modified, or just a single test at a time. Then we push the commit and get feedback for the whole test suite quickly.
Our whole test suite locally takes around 12 minutes running serially on a MacBook Pro. We haven’t put much effort here because it’s not something our engineers ever run.
Parallel Tests on CI
Rails now can run tests in parallel with minitest. If you’re using another test framework, various gems enable this as well.
This had the biggest impact and was also the easiest step for improving our test suite speed. We began by running our tests in parallel on 2 workers. You’re limited by the number of cores on the machine you’re running on.
This gave us some speed gains, but we wanted it really fast. Our infrastructure team set us up with some 64 core machines on Buildkite.
# Only run in parallel on CI
if ENV["CI"]
  parallelize(workers: 64)
end
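
A small sketch of the same idea in plain Ruby, for cases where you would rather not hard-code the core count: choose a worker count from the CI environment variable and the number of processors Ruby reports. The worker_count helper is hypothetical and not part of Rails or our test helper. The number it returns is what you would pass to parallelize(workers: ...).

```ruby
require "etc"

# Use every available core on CI, and a single worker everywhere else.
# `env` and `cores` are injectable so the choice is easy to test.
def worker_count(env = ENV, cores: Etc.nprocessors)
  env["CI"] ? cores : 1
end

puts worker_count
```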

After this change, our test suite ran in around 3-4 minutes. We clearly still had some issues to figure out. The next step was improving the tests themselves.
Auditing FactoryBot
After a bit of digging, we noticed most of our test time was spent setting up test data. We use FactoryBot for this.
We began investigating this by putting a debugger in our tests to stop execution right after the test setup. We used pry here to look around at all the objects created and see if they matched our expectations. We found a few surprising places where we were creating up to 8× as many objects as we thought we were.
This is a common mistake in FactoryBot. The library makes it so easy to set up relationships between data that it’s possible to trigger the creation of more associated objects than you expect.
Fixing our Factories
Solving this was more straightforward once we knew the problem. We set up tests with our expectations for the amount of data our factories should create.
test "factory doesn’t create tons of databases" do
  create(:database)
  assert_equal 1, Database.count
end

These tests failed at first, but we worked through the factories and eventually got them down to creating the correct number of objects.
This gave us another huge gain, and after a few of these changes, the test run time dropped to ~1 minute.
We keep these tests in our models, protecting us from any regressions when making changes to our factories.
Read more
Parallel tests in Ruby on Rails
Buildkite custom agents
FactoryBot]]></content>
        <summary><![CDATA[Learn how we use minitest and FactoryBot with parallel tests to get our Rails test suite to run in 1 minute on Buildkite.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing Prisma’s Data Platform PlanetScale integration</title>
        <link href="https://planetscale.com/blog/planetscale-mysql-database-on-prisma-platform" />
        <id>https://planetscale.com/blog/planetscale-mysql-database-on-prisma-platform</id>
        <published>2021-11-18T14:55:55.366Z</published>
        <updated>2021-11-18T14:55:55.366Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[The Prisma Data Platform integration was updated in September 2022. The information in this blog post is no longer up to date. Read the PlanetScale and Prisma Data Platform integration documentation for the latest setup instructions. 
As developers, we often want to build faster, but that comes with tradeoffs that we have to deal with in the long run. At PlanetScale, we want to empower developers to be able to build without having to worry about issues of database scalability as their application grows. Similarly, Prisma wants to empower developers to efficiently work with data while making fewer errors.
This is why I’m excited that PlanetScale and Prisma have partnered up to allow developers to create PlanetScale databases in the new Prisma Data Platform. You can have a starter database schema and a live PlanetScale database ready to accept thousands of new database connections with a few clicks.
PlanetScale, paired with Prisma's next-generation ORM (Object-Relational Mapping) tools, already allowed developers to query a deployed database in minutes. With the new Prisma Data Platform, you can now get started with PlanetScale and Prisma without leaving your browser. Your PlanetScale databases can instantly use the Prisma Data Platform's Query Console, Data Browser, and Data Proxy features.
The Prisma Data Platform gives you application templates with Prisma data schemas, so you don’t even have to think about a data model to get started. Once set up, you can deploy to Vercel immediately or use the Prisma Data Browser and Query Console to explore your PlanetScale database.

Prisma Data Browser
As developers, unless we are SQL experts, it can be a pain to quickly add and delete data from our databases when developing new features, especially when working locally without a seed script. But with the Prisma Data Browser, you can instantly add data to your database in the browser, validate the result of a query, or quickly view the data in your PlanetScale database. You can also invite other members of your team and use Prisma Data Browser collaboratively.

Prisma Query Console
After you have added data to your database, the next step is to query your database. Whether you are planning out queries to add to your application or analyzing the data, the Prisma Query Console is ready to query your PlanetScale databases immediately after database creation.

Prisma Data Proxy
Lastly, the Prisma Data Proxy is an intermediary between your application and your database. We’re excited about this because you will soon be able to connect PlanetScale with Cloudflare Workers using the Prisma Data Proxy. Stay tuned for documentation and examples of how to get your Cloudflare Workers up and running with PlanetScale.
Get started today
Create your first PlanetScale database with one of the application templates and experience the power of Prisma with PlanetScale for yourself. You can find more in PlanetScale’s Prisma Data Platform documentation.
Note: This is in Prisma Early Access right now.
Also, today at 1:30pm EST/10:30am PST during the Prisma Serverless Conference, don’t miss my conversation with PlanetScale CTO and co-founder Sugu Sougoumarane about scaling databases in a serverless world. We will dig into Vitess, which powers PlanetScale, and learn more about what does and doesn’t work when you need to scale database infrastructure for serverless applications.]]></content>
        <summary><![CDATA[Create a PlanetScale database on the Prisma Data Platform; immediately store and query data from the browser]]></summary>
      </entry>
    
      <entry>
        <title>Bring your data to PlanetScale</title>
        <link href="https://planetscale.com/blog/import-your-mysql-data-to-planetscale" />
        <id>https://planetscale.com/blog/import-your-mysql-data-to-planetscale</id>
        <published>2021-11-17T15:30:00.000Z</published>
        <updated>2021-11-17T15:30:00.000Z</updated>
        
        <author>
          <name>Phani Raju</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[For full guidance on how to import your database to PlanetScale with no downtime, check out our Database Imports tutorial and our Migration guides. 
We’re happy to announce that PlanetScale now supports zero downtime data migrations from your existing MySQL database. We’re leveraging the power of Vitess to let you connect your database to PlanetScale, try us out as a replica and then switch over your database to PlanetScale completely. That’s right, no dumping your data, no restoring from backup. Just give us a connection and let us handle the rest.
When you connect your existing MySQL database to PlanetScale, you instantly get to experience the power of PlanetScale’s platform, which includes:
Secure Passwords that scale to thousands of simultaneous connections
Query Insights to start finding and optimizing slow queries
Web Console to query your database directly from the PlanetScale web UI
PlanetScale’s data imports are powered by Vitess’s Unmanaged Tablet technology. In this blog post, we will go over how to perform an import of your existing MySQL database into PlanetScale.
Connect PlanetScale to your existing MySQL database server
Let PlanetScale know how we can reach your MySQL server and which database you’d like to import. You’ll need credentials for a user who has read and write permissions to your database, and you will need to have binary logs enabled.

We will notify you of any schema incompatibility or connection errors that might interfere with using PlanetScale:

Head on over to our documentation to see how to resolve these compatibility issues, and then try importing your database again.
Initial data copy from your database to PlanetScale
If there are no issues, you can get started with the import. The first step is an initial online data copy from your database.

In this phase, PlanetScale will connect to your database and automatically begin copying over the schema and data from your existing database, writing it in chunks to PlanetScale.
When this is done, you’ll have an up-to-date copy of your data securely stored and ready to be queried.
Replicating changes from your database to PlanetScale
After the initial data copy is complete, we will keep PlanetScale’s database in sync with your database by utilizing Binary Log File replication. Any changes made to your database, including inserts, deletes, and updates, are replayed on PlanetScale.
Connect your application to PlanetScale’s copy of your database
Now that you’ve made it this far, you can deploy your application pointed to PlanetScale as the database.
This works because your new PlanetScale database is also acting as a “data router”. Queries will be served by PlanetScale with writes transparently routed to your existing database, and then replicated back to PlanetScale using Vitess’ powerful Routing Rules.
This step of routing your application’s reads and writes through PlanetScale allows you to safely validate that your application is fully compatible with PlanetScale without taking downtime and without fragile application-level migration mechanisms like dual writes.
Switching over to PlanetScale as the Primary database
After you have ensured that your application is successful using PlanetScale, you can promote it as the primary or cancel the data import. You do this by clicking on the “Enable primary mode” button in the data import banner.

This will reverse the direction of the routing — reads and writes will now go directly to PlanetScale, and we’ll replicate any writes and updates back to your original database, to make sure it stays in sync. This allows you to cut traffic over without worrying about a difficult or unsafe migration process if anything goes wrong.
Detaching PlanetScale from your database
Once you’re fully migrated and ready to use PlanetScale forever, we’re going to detach your external database from PlanetScale. Click on the “Detach external database” button and follow the prompts to disconnect PlanetScale from your database. We will delete all connection details from PlanetScale and no longer read/write from/to your external database.

Now, you’re fully onboarded to PlanetScale and your database is at home here!
Try it out
Try importing your existing MySQL Database on PlanetScale and let us know how we can make this better. We’d love to hear from you on any ideas you have to improve onboarding. We are always looking for ways to polish the Planet.]]></content>
        <summary><![CDATA[PlanetScale now supports zero downtime data migrations from your existing MySQL Database]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale is GA</title>
        <link href="https://planetscale.com/blog/ga" />
        <id>https://planetscale.com/blog/ga</id>
        <published>2021-11-16T00:03:57.138Z</published>
        <updated>2021-11-16T00:03:57.138Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[It has been an incredible six months since we released our product in beta. From the outset, we focused on creating a platform that would delight developers, built on top of the only open source database proven at hyper scale, Vitess. For years, our technology has allowed companies like Square, GitHub, and YouTube to scale their database and, in turn, their businesses. Our mission is to bring this power to everyone.
We believe databases should be powerful, easy to use, and have impeccable developer experience. This is why we chose to build our serverless platform so that developers can be productive without ever having to worry about scale.
The beginning of this journey was December 1st, 2020. This was when the first line of code was committed on PlanetScale’s cloud database platform. Everything you see and experience on our platform, apart from Vitess, has been created in less than a year. This incredible pace will not stop. We are on a mission to bring in a new era for databases.
So far, the journey has been wild. There are already major sites running on top of the platform with some of the world’s largest brands working with us to solve their database challenges. The response from the community has been almost overwhelming. We have felt your love and excitement and it’s incredibly energizing.
All of this energy and momentum has led us to our last announcement. I am excited to say that we have closed $50M in Series C funding led by Kleiner Perkins. We are proud to welcome Bucky Moore to our board and are delighted to be working with him. The round includes further investments from our existing investors a16z, SignalFire and Insight Partners, alongside new investments from Tom Preston-Werner, Max Mullen, and Jack Altman.
I am so incredibly proud of the PlanetScale team. They bring an unrelenting spirit and love for our mission. This is a group of people who are deeply unsatisfied with the state of databases and show up every day to build the future. To date, we have not even exposed 10% of Vitess’ power and functionality. We have truly only just begun.
Thank you team, and thank you to our community. We are doing this for you: the builders, the optimists, the creators, the scalers.]]></content>
        <summary><![CDATA[PlanetScale is now GA.]]></summary>
      </entry>
    
      <entry>
        <title>Introducing PlanetScale Managed on AWS and GCP</title>
        <link href="https://planetscale.com/blog/introducing-planetscale-managed" />
        <id>https://planetscale.com/blog/introducing-planetscale-managed</id>
        <published>2021-11-03T15:00:00.000Z</published>
        <updated>2023-10-18T15:00:00.000Z</updated>
        
        <author>
          <name>James Cunningham</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[When the PlanetScale beta was released in May of 2021, we initially set out to launch on a shared tenancy model where databases from different organizations run in separate processes while sharing compute resources.
Shortly after our beta launch, we saw high demand from organizations with regulatory requirements and compliance constraints that prohibited them from running in a shared tenancy model. It was clear that we needed to offer a single-tenancy model as well.
For these customers who require a single-tenancy option, we now offer an alternative that allows you to use PlanetScale’s database workflow and management functionality from inside your own cloud network and accounts: PlanetScale Managed.
Single-tenant deployment on AWS and GCP with PlanetScale Managed
In this configuration, you're able to use the same API, CLI, and web interface that PlanetScale offers, with the benefit of it running on your own AWS or GCP account. We have packaged the best parts of PlanetScale into a to-go container and delivered them to your own account, bringing you the best of SaaS with the added benefit of a deployment free of noisy neighbors.
PlanetScale Managed is available for deployment on Amazon Web Services (AWS) and Google Cloud Platform (GCP).
How does it work?
PlanetScale Managed is a packaged data plane that’s deployed to an AWS or GCP sub-account that you own and we operate. Additionally, it is available in any AWS region with at least 3 Availability Zones, even ones we don’t offer in the hosted product!
We offer private database connectivity via AWS PrivateLink for AWS accounts or Private Service Connect for GCP accounts.
You and your team will interact with your databases through app.planetscale.com, as you normally would with our hosted product, without requiring any changes to pscale tooling or Continuous Integration workflows.

Benefits of PlanetScale Managed
Single-tenancy isn't the only benefit when it comes to PlanetScale Managed. With this premium, white-glove service, you also get:
A truly fully managed database solution
Expert assistance for setting up and managing your horizontally sharded database
BAAs for HIPAA compliance
Deployment to additional regions
PCI compliance
Additional Support options
Available on AWS Marketplace (AWS only). Your PlanetScale purchase through the AWS Marketplace and the resources you use on PlanetScale will qualify against your EDP commitment.
How do I get PlanetScale Managed?
If you're interested in enhancing your development workflow and planning for long term scale with PlanetScale Managed, click here to set up time to talk with our database experts.
We’d love to talk about your team’s needs and how PlanetScale Managed can meet them.
You can also read more in the PlanetScale Managed documentation.]]></content>
        <summary><![CDATA[Deploy PlanetScale in your AWS or GCP account with our Enterprise PlanetScale Managed plan.]]></summary>
      </entry>
    
      <entry>
        <title>New PlanetScale pricing: Scaler plan upgrades and our new enterprise plan</title>
        <link href="https://planetscale.com/blog/introducing-new-planetscale-pricing" />
        <id>https://planetscale.com/blog/introducing-new-planetscale-pricing</id>
        <published>2021-10-28T15:40:00.000Z</published>
        <updated>2021-10-28T15:40:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[In May of this year, we launched PlanetScale, the database for developers, to easily use developer workflows on top of Vitess, the database technology that powers hyperscalers. From the outset, a goal of ours has been to make pricing transparent and friendly to users.
We never want you to be surprised by a bill.
The initial reception to this structure has been fantastic, with over 8,000 databases created in our free tier since May. We are dedicated to always refining and making things even simpler. To this end, we’ve gathered feedback and updated our pricing to make it clearer and even more generous so you can start using PlanetScale now and never stop.
Developer and Scaler plans
Update: The Hobby plan (previously known as Developer) was deprecated on April 8th, 2024.

Update: The Scaler plan has now been deprecated in favor of the resource-based Scaler Pro plan. Scaler Pro has since been renamed to the Base plan.
For our Scaler plan, we’re massively expanding the amount of included storage and usage.
The first 25GB of storage are included, then billed at $1.25/mo per GB
The first 500 million row reads are included, then billed at $1.50/mo per 10 million rows read
The first 50 million row writes are included, then billed at $1.50/mo per 1 million rows written
Our Scaler plan is billed per database and costs $29 per month, plus any additional usage and storage over the included limit. It includes all of the Developer plan features, plus more:
10,000 connections
10 branches per database, with the option to purchase more
Automated backups every 12 hours
Ticket-based support
15 day audit log retention
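To make the Scaler plan numbers above concrete, here is a minimal sketch of the arithmetic in Python. The function and constant names are made up for illustration, and it assumes overage is prorated linearly, since the figures above don't say whether partial units are rounded up.

```python
# Scaler plan figures quoted in this post.
BASE_PRICE = 29.00                 # USD per database per month
INCLUDED_STORAGE_GB = 25
INCLUDED_ROW_READS = 500_000_000
INCLUDED_ROW_WRITES = 50_000_000
STORAGE_PRICE_PER_GB = 1.25        # USD per GB over the included storage
READ_PRICE_PER_10M = 1.50          # USD per 10 million rows read over the limit
WRITE_PRICE_PER_1M = 1.50          # USD per 1 million rows written over the limit


def scaler_monthly_cost(storage_gb, row_reads, row_writes):
    """Estimate one database's monthly Scaler bill in USD.

    Assumption: overage is prorated linearly (no rounding up to whole units).
    """
    extra_storage = max(0, storage_gb - INCLUDED_STORAGE_GB)
    extra_reads = max(0, row_reads - INCLUDED_ROW_READS)
    extra_writes = max(0, row_writes - INCLUDED_ROW_WRITES)
    return (
        BASE_PRICE
        + extra_storage * STORAGE_PRICE_PER_GB
        + extra_reads / 10_000_000 * READ_PRICE_PER_10M
        + extra_writes / 1_000_000 * WRITE_PRICE_PER_1M
    )


# 35GB stored, 600 million rows read, 52 million rows written:
# 29 + 10 * 1.25 + 10 * 1.50 + 2 * 1.50 = 59.50
print(scaler_monthly_cost(35, 600_000_000, 52_000_000))
```

A database that stays within the included storage and usage comes out at the flat $29.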
We’re also simplifying our free Developer plan. The Developer plan now includes 10GB storage, 100 million row reads per month, 10 million row writes per month, and up to 1,000 connections.
You are limited to one free database per organization.
The Developer plan allows you to get up and running with a free PlanetScale database in seconds. And once you outgrow the free plan, the upgrades we’ve made to the Scaler plan should keep you going for a while. Our intention with the Scaler plan upgrades is to give you a fantastic baseline offering with affordable options to grow with us.
You can find more information about the Developer and Scaler plans on our Billing documentation page.
Enterprise plan updates
We’ve also added a resource-based Enterprise plan for organizations with larger databases. This plan includes enterprise features like:
Single Sign-On (SSO)
Custom audit log retention
Unlimited database branches
Built-in horizontal sharding
Premium support
And more
Stay tuned for more updates to our Enterprise offering early next month.
We’d love to hear from you! If you have any questions or feedback on our pricing updates, please contact us at support@planetscale.com.
Try it out
You can sign up for a PlanetScale account now and spin up a new database that scales indefinitely, thanks to the power of Vitess, in just seconds.
Happy developing!]]></content>
        <summary><![CDATA[We’ve updated our database plans to better meet your needs]]></summary>
      </entry>
    
      <entry>
        <title>Comparing AWS’s RDS and PlanetScale</title>
        <link href="https://planetscale.com/blog/planetscale-vs-aws-rds" />
        <id>https://planetscale.com/blog/planetscale-vs-aws-rds</id>
        <published>2021-09-30T15:52:00.000Z</published>
        <updated>2021-09-30T15:52:00.000Z</updated>
        
        <author>
          <name>Jarod Reyes</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We have been blown away by the reception to our product from the developer community, but maybe even more surprising is the number of customers switching from RDS to PlanetScale lately. I’ll do my best to quickly lay out the main reasons we see so many businesses switching to PlanetScale from AWS RDS.
Amazon RDS is a relational database service by Amazon Web Services. Essentially, RDS MySQL is a database in the cloud. PlanetScale is similarly a MySQL database hosted in the cloud, but with an emphasis on unlimited scale potential and developer-first design. The main advantage of using RDS at this point is Postgres support. So, if you are devoted to Postgres, RDS is your best bet. However, if you're looking for a database that will scale with your growth and that fits into your developers' workflows, switching from RDS to PlanetScale is a no-brainer. Here are the top two reasons we see organizations and developers leaving RDS for PlanetScale: scalability and developer workflow.
 🚀 Want an in-depth look at how PlanetScale stands up to RDS? Check out our PlanetScale vs Amazon RDS comparison page. 

Connection limits and other scalability issues with RDS
I know, it’s surprising that an AWS product like RDS has scaling limits, but it turns out hosting databases isn’t enough. Scaling a business’s database layer requires more than just buying larger resources and provisioning bigger machines. Most often we see that when businesses scale, they need to scale the connections to their databases as well.
PlanetScale's Vitess offering runs on the open-source database clustering system that was built to scale YouTube in 2010. Vitess continues to power the databases of some of the web's largest companies, such as GitHub, Pinterest, and more.
Vitess and PlanetScale have the ability to scale to nearly limitless connections. In fact, we recently ran some benchmarking showing our ability to run one million concurrent connections on a PlanetScale database. We'll dig more into this next.
All of the numbers below are based on real-world Vitess clusters

Scale issue #1: Connection limits
While RDS limits connections to 16,000, PlanetScale has been designed to scale to nearly limitless database connections per database. And while you can have up to 16,000 connections on RDS, you will have to manually upgrade and increase connection limits or create and manage your own connection pool. For developers building modern web apps, which often have thousands of simultaneous connections from different clients, this does not scale.
Scale issue #2: Connection pooling
Connection pooling is a well-known database access pattern that keeps database connections active so that when a database connection is later requested, one of the active connections is used rather than having to create a new connection from scratch. RDS MySQL will allow you to create up to 16,000 connections, but you will have to create connection pools yourself. With PlanetScale, Vitess elegantly manages the state of the connection pool, meaning you just make queries to your database without worrying about connections at all.
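For a sense of what "create connection pools yourself" involves, here is a minimal sketch of the pattern in Python. It is not how Vitess implements pooling. The ConnectionPool class and the connect factory are hypothetical names, and a real pool would also need health checks, timeouts, and cleanup.

```python
import queue
import threading


class ConnectionPool:
    """Keeps up to `size` connections open and hands them out for reuse."""

    def __init__(self, connect, size):
        self._connect = connect            # hypothetical factory that opens one connection
        self._size = size
        self._idle = queue.LifoQueue()     # connections waiting to be reused
        self._created = 0                  # connections opened so far
        self._lock = threading.Lock()

    def acquire(self, timeout=None):
        # Prefer an idle connection over opening a new one.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        with self._lock:
            may_create = self._created < self._size
            if may_create:
                self._created += 1
        if may_create:
            try:
                return self._connect()
            except Exception:
                with self._lock:
                    self._created -= 1     # give the slot back if opening failed
                raise
        # Pool is at capacity: wait until another caller releases a connection.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection so the next caller can reuse it.
        self._idle.put(conn)
```

Callers wrap each query in acquire and release, so the number of open connections never exceeds the size you chose, no matter how many requests arrive.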
Bonus Feature — Query Consolidation
Vitess also makes sure that identical requests are automatically served to multiple clients simultaneously through a single query. Often, the outages we see from customers who were on NoSQL or RDS databases are cascading outages due to an initial spike in query response times. This is often due to anomalies or odd traffic patterns (think seasonal hits to your website). Vitess gets around this by identifying spikes in query attempts. So if 3 million people go to your YouTube video at once, Vitess will notice that multiple clients are simultaneously (or nearly simultaneously) attempting the same query and serve them all from the same connection.
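The idea behind query consolidation can be sketched in a few lines of Python. This is an illustration of the general technique, not Vitess's implementation. QueryConsolidator and run_query are hypothetical names, and the sketch treats two queries as identical only when their SQL text matches exactly.

```python
import threading
from concurrent.futures import Future


class QueryConsolidator:
    """Lets concurrent callers of the same query share one execution."""

    def __init__(self, run_query):
        self._run_query = run_query    # hypothetical function that runs SQL and returns rows
        self._lock = threading.Lock()
        self._in_flight = {}           # SQL text -> Future for the execution in progress

    def execute(self, sql):
        with self._lock:
            future = self._in_flight.get(sql)
            is_leader = future is None
            if is_leader:
                # The first caller registers the query and will run it.
                future = Future()
                self._in_flight[sql] = future
        if is_leader:
            try:
                future.set_result(self._run_query(sql))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[sql]
        # Later callers block here until the first caller's result is ready.
        return future.result()
```

Only callers that overlap in time share a result. Once the query finishes, the next request runs it again, so this is not a cache.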
Scale Issue #3: Total cost of RDS
Maybe most important to most businesses is pricing scalability. While RDS has a very complete picture of their pricing structure (it is 2740 words long, which is roughly 20% more words than are on the entire album Abbey Road by the Beatles), its rigidity does not lend itself well to sudden increases in database usage or quickly scaling your needs up or down. We’ve designed PlanetScale pricing to allow businesses to only pay for what they use, though we do have hybrid models that allow businesses to bring their own resources or indeed pay per machine (talk to sales if you’d like to discuss this more).
In the end, we saved 20-30% by switching to PlanetScale. — Barstool Sports
The scaling issues alone are often enough for a business to choose to switch to PlanetScale, but we also see another set of developers who are bringing us onto their teams because of our simpler developer workflow.
PlanetScale’s developer workflows and non-blocking schema changes
It’s not entirely user friendly. There’s definitely a learning curve when starting to use the software —G2 Review of RDS
If you go onto G2 or any other review board for AWS RDS, you will see the negative comments mostly revolve around complexity and ramp up time to understand the product. When we began designing the PlanetScale cloud product, we knew that this was the main way we could improve the developer experience for databases — by removing the ramp up time and making critical routines like schema changes and CI/CD processes much easier to manage with a database.
Non-blocking schema changes

Making schema changes on RDS is complicated and often requires the use of multiple external libraries. Even worse, it often requires downtime or maintenance windows. This is becoming increasingly hard for DBAs and engineering teams to stomach when the suite of tools they use daily is making continuous deployments with no downtime easier and easier to do. Without a ton of your own orchestration, this is just not easily done with databases.
PlanetScale’s technology makes non-blocking schema changes a reality. PlanetScale allows you to deploy schema changes directly in the CLI and web UI. Additionally, it will automatically check upstream for any other changes that may have been introduced in database branches and let you know if it is safe to deploy. You can read more about non-blocking schema changes in our docs.
The gist: PlanetScale will never require a maintenance window or downtime for a migration.
Staging branches and CI/CD workflow automation
I won’t go into too much detail about Continuous Integration/Continuous Deployment (CI/CD) — Red Hat describes it well here — but it has become the gold standard for engineering teams and it relies on easy automation of delivery and deployment tools. We at PlanetScale wanted to make creating a staging environment for your database much easier. But RDS developers recommend using mysqldump to make a copy of your AWS MySQL production database for staging. This is why we introduced database branching.
PlanetScale created database branches to allow our customers to handle data like they handle features on GitHub. The ability to cut a branch of your database and then make changes and redeploy to your main database is a huge time saver. We already know of developers who are using the pscale CLI. You can find additional guidance in our Using the PlanetScale CLI with GitHub Actions workflows blog post.
The gist: Making a staging environment with up-to-date data schemas should not require a ticket to the data team.
Dev tools and documentation
One of the last reasons we see developers switching from RDS is the lack of great documentation or tooling. Our customers note that configuration alone for a new RDS resource requires hours of dev time. This is too much time to sink into processes that should be standardized and can be automated. Setting up a database should be easier in 2021.
Provisioning a new RDS database requires a number of prerequisite steps followed by these 13 steps. Compare that to the ability to instantly provision a database on PlanetScale: a database that allows for nearly unlimited connections, database branches, and query insights for every instance. It’s just not a fair comparison. Honestly, if you’d like to see how fast it is, just sign up for a plan now and provision a database.
Takeaways
Clearly AWS RDS is optimizing for the incumbent industry mammoths who have engineering hours to spare and don’t move very quickly (Aurora is still on MySQL 5.7, not the latest version MySQL 8). But for every other business, we prioritized speed and scale. RDS offers a breadth of customization and configuration at the cost of needing to employ many more DBAs to manage your data store.
While AWS has managed to make compute resources globally reliable, their database layer still does not scale well. RDS does not scale nicely with businesses that are on a growth trajectory, hitting connection limits, replica limits, and indeed pricing floors that require you to manage your database more as you grow, not less. Businesses who are moving quickly need more support, more guidance, and should not be required to master a new interface. For this reason, we give all of our customers access to the database experts who built PlanetScale and Vitess in order to help them migrate to PlanetScale quickly and with the support they need.
PlanetScale does not offer the same customization, but does provision in seconds and allows businesses of all sizes (regardless of how many DBAs they hire) to reliably scale like experts. PlanetScale offers a much better developer workflow, which reduces time to feature and the overhead of needing to file a data ticket for every new feature.
If you'd like to test PlanetScale out, you can use our Database import tool, which allows you to import your RDS database to PlanetScale with no downtime if you choose to switch over. Check out our RDS migration guide for more information.]]></content>
        <summary><![CDATA[PlanetScale draws customers from RDS due to better scalability, superior developer workflow, and typically much lower cost]]></summary>
      </entry>
    
      <entry>
        <title>Quick deploys using the Web Console</title>
        <link href="https://planetscale.com/blog/sql-in-web-console" />
        <id>https://planetscale.com/blog/sql-in-web-console</id>
        <published>2021-09-13T15:18:00.000Z</published>
        <updated>2021-09-13T15:18:00.000Z</updated>
        
        <author>
          <name>Elom Gomez</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[Before today, if you preferred doing everything in a visual interface or GUI, you may have found it difficult to run SQL commands with PlanetScale. The current setup for running SQL commands against your PlanetScale branches includes assumptions, such as command line knowledge and CLI expertise, that are not friendly to some users. At PlanetScale, we are constantly thinking about different ways to improve the developer experience with our product, and today, we continue that tradition by introducing a web console. The web console can be found on your database branch page.

You can now go through the database branch lifecycle using just the website, with no need to switch between apps to connect to your database branch and run commands.
We’ve built in friendly error messages, for example, when you run an ALTER/CREATE/DROP command on a PlanetScale production database branch.

Here’s an example workflow where I deploy a TypeScript application using our Vercel integration and the web console.
The workflow walks through:
Using the PlanetScale integration to create and connect a PlanetScale database to your Vercel application
Inserting tables into your database branch using the web console
Deploying your database branch
Give it a try, and let us know what you think!]]></content>
        <summary><![CDATA[Deploy a TypeScript app using the PlanetScale Vercel integration and Web Console]]></summary>
      </entry>
    
      <entry>
        <title>Optimizing SQL with Query Statistics</title>
        <link href="https://planetscale.com/blog/optimizing-sql-with-query-statistics" />
        <id>https://planetscale.com/blog/optimizing-sql-with-query-statistics</id>
        <published>2021-08-31T15:12:41.189Z</published>
        <updated>2021-08-31T15:12:41.189Z</updated>
        
        <author>
          <name>David Graham</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[When you experience database issues, capturing queries on a running system is a hard problem. Existing solutions, such as tcpdump, SHOW PROCESSLIST, or third-party monitoring agents, can cause additional system load or gather only a small subset of queries.
Starting today, all PlanetScale database branches now track statistics about each SQL query that has executed against them without any overhead. By using Vitess, which powers PlanetScale databases, to track how many times a query runs, how many rows it returns, and how long each query takes to complete, we provide a complete view of query traffic on the database. With this insight into your database, you can debug your queries and make changes to your application or quickly deploy new indexes with non-blocking schema changes.
As a starting point, we flag any query running for longer than 100 milliseconds as a candidate for optimization.
To learn more about how to use query statistics, check out our documentation here.
Debugging a slow query
While developing this feature we found a few queries in our PlanetScale production database that needed additional indexes to complete more quickly. We reduced one query’s average runtime in the database backup feature by 98%. First let’s look at the way the database backup feature works and then dig into the slow query.
Every database branch is backed up to external storage at least once per day. A daily backup replaces the previous day's backup as soon as it completes successfully. This ensures we always have a recent backup of every branch’s data.
Deleting a daily backup requires two steps:
Mark the backup record as deleted—a soft delete—by setting its deleted_at timestamp.
Delete the backup's data files from external storage.
We periodically run a background job to find expired daily backups that need to be deleted. The job runs this query in batches:
select * from backup where deleted_at is null and expires_at <= :vtg1 order by id asc limit :vtg2

The :vtg1 syntax is a replacement parameter provided by Vitess. The actual values look like the following:
select * from backup where deleted_at is null and expires_at <= current_timestamp order by id asc limit 1000


Using the query statistics report we discovered this query takes an average of 719ms to find expired backups to delete. The only good news about this slow query is that it runs inside a background job, rather than in a web request cycle, so it doesn’t slow down users. It does consume too much time in our job workers and in the database, so let’s find out why it’s slow.
Optimizing a slow query with Query Statistics
We think this query should be faster because we have a composite index on (expires_at, data_deleted_at, deleted_at). However, the MySQL explain plan shows that the query planner chooses a full table scan over every row rather than using this index.
> explain select * from backup where deleted_at is null and expires_at <= current_timestamp order by id asc limit 1000\G;
*************************** 1. row ***************************
        id: 1
  select_type: SIMPLE
        table: backup
   partitions: NULL
         type: ALL
possible_keys: index_backup_on_expires_at_and_data_deleted_at_and_deleted_at
          key: NULL
      key_len: NULL
          ref: NULL
         rows: <100s of thousands—every row in the table>
     filtered: 5.00
        Extra: Using where

A couple of observations about the data reveal why this is a poor index and point us to a more selective index that will narrow this result set dramatically.
97% of the backup rows are expired, so querying by expires_at <= current_timestamp is not selective.
97.5% of the rows are soft deleted, so querying by deleted_at is null is highly selective.
In retrospect, this makes sense when dealing with daily backups that replace the previous day's backup. However, most of our tables do not match this workload and the relationship is the reverse: most rows are live rather than soft deleted. We often include deleted_at as a trailing key in composite indexes for this reason, but this isn’t quite right for the backup table.
If we change the order of keys in this index to (deleted_at, expires_at, data_deleted_at) then the query is highly selective. It can eliminate deleted rows from the set and search over the few remaining rows that may be expired.
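A rough way to see the effect of key order is to count how many rows survive each leading-key predicate. The Ruby sketch below uses a hypothetical 1,000-row model built from the percentages above. The Row struct, the rows_matching helper, and the assumption that 5 live rows are expired are illustrative only, and this is not how MySQL estimates cost.

```ruby
# Hypothetical 1,000-row model of the backup table.
Row = Struct.new(:id, :expired, :soft_deleted)

ROWS = (1..1000).map do |id|
  soft_deleted = id <= 975        # 97.5% soft deleted, so 25 rows are live
  expired = id <= 965 || id > 995 # 97% expired (assumed: 5 of them are live)
  Row.new(id, expired, soft_deleted)
end

# Number of rows that satisfy the given predicate.
def rows_matching(rows, &predicate)
  rows.count(&predicate)
end

by_expires_at = rows_matching(ROWS) { |r| r.expired }       # expires_at as leading key
by_deleted_at = rows_matching(ROWS) { |r| !r.soft_deleted } # deleted_at as leading key
wanted        = rows_matching(ROWS) { |r| r.expired && !r.soft_deleted }

puts "expires_at <= now:  #{by_expires_at} rows" # 970 rows
puts "deleted_at is null: #{by_deleted_at} rows" # 25 rows
puts "both predicates:    #{wanted} rows"        # 5 rows
```

With expires_at first, the range predicate leaves 970 candidate rows. With deleted_at first, the equality-style predicate leaves 25, and expires_at as the next key narrows those to the 5 rows the query actually wants.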
The explain plan for the new index shows that the query planner does indeed choose the index and estimates it needs to visit only a single row to provide the result.
> explain select * from backup where deleted_at is null and expires_at <= current_timestamp order by id asc limit 1000\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: backup
   partitions: NULL
         type: range
possible_keys: index_backup_on_deleted_at_and_expires_at_and_data_deleted_at
          key: index_backup_on_deleted_at_and_expires_at_and_data_deleted_at
      key_len: 17
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index condition; Using filesort

Conclusion
Using PlanetScale query statistics along with MySQL explain plans to optimize an index reduced a 719ms query to under 20ms in our background job workers. Full table scan performance further degrades as the table size grows, so this query would have eventually consumed enough time inside the database to impact other queries that are servicing web requests. By optimizing this query, we’ve ensured that this process will never impact the user experience.
Try it out now!
You can see the real-time query statistics for your PlanetScale databases right now. Or you can sign up for PlanetScale and migrate your data to get more insight into your databases. Check it out and let us know what you think.]]></content>
        <summary><![CDATA[Check the performance of your SQL queries in real-time.]]></summary>
      </entry>
    
      <entry>
        <title>NoneSQL All the DevEx</title>
        <link href="https://planetscale.com/blog/nonesql-all-the-devex" />
        <id>https://planetscale.com/blog/nonesql-all-the-devex</id>
        <published>2021-08-27T15:36:52.435Z</published>
        <updated>2021-08-27T15:36:52.435Z</updated>
        
        <author>
          <name>Justin Gage</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[MySQL, MongoDB, Firebase, Spanner; there has literally never been a better time to be a database user at any level of complexity or scale. But there’s still one common thread (ha!) among them – the focus is on infrastructure, not developer experience. This post is going to argue that database internals are going to matter less long term; developer experience is what will differentiate offerings.
Most database products today focus on internals
To get some perspective on the shift happening in databases, it’s worth looking back at what are essentially 3 eras (at least from this perspective) of data stores. The common thread between them all is the improvements were infrastructure focused, not developer experience focused.
1. The beginning, relational land
An early history of databases is beyond the scope of this post, but it’s safe to say that databases were synonymous with relational data for the first couple of decades of their existence. MySQL’s initial release was in 1995 (that’s 26 years ago), and Postgres’s was a year later in ‘96. They’re actually still the most popular databases in terms of usage on the planet, and there’s probably a lesson there, but for another time.
Fundamentally, SQL was about rigidity and transactional integrity (i.e. ACID) – even though today, people use it to query data that’s less structured. Before OLAP was a thing, the priority for these databases was making sure your reads were clean and your inserts worked, every time, without exception; and so to do that, you structured your schemas in advance. And this was fine, for the most part. Where things started to go wrong was scale.
Note: While SQL refers to the actual language used to query relational databases, over the years it has become synonymous with the concept of pre-defined schemas and relational data, hence the moniker “SQL Database.”

The problem with MongoDB, and NoSQL databases generally, is that they’re not ACID compliant (they’re eventually consistent, as per CAP Theorem).
NoSQL databases are maturing, for sure – we’re starting to see support for transactions (on some timeframe of consistency) and generally more stability. After years of working with “flexible” databases though, it has become clearer that rigidity up front (defining a schema) can end up meaning flexibility later on. If you decide down the road to analyze data in a different context or another area of your application, having a schema defined can produce reliable results from changing queries. And, fundamentally, developers actually often liked SELECT * over something like .findOne().
2. SQL scalers
Enter the “new thing” in databases, which people are calling SQL scalers – relational databases with transactional rigidity, but they scale horizontally. One example is Spanner, a relational data store developed at Google (paper here) that claims to scale horizontally and infinitely (e.g. cross shard transactions). Aurora, which is AWS proprietary, also claims to be distributed and faster than MySQL and co. Another good one to note is Vitess, an orchestration layer for MySQL, which is what we work on here at PlanetScale.
Again, the narrative centers around infrastructure – you don’t have to sacrifice scale for transactional rigidity, etc. Even Spanner’s product page talks a lot about scale, and not very much about anything else:

You can start to see the early seedlings of “SaaS talk” here – marketing around spending less time on manual tasks, and more time on what matters (whatever that is) – but the message is overtly performance oriented. Which makes sense, because that’s how we’ve been thinking about databases for the past 20 years.
Database internals will eventually just not matter
I believe there’s a good chance that databases will follow in the path of general compute, and what’s actually going on under the hood is going to be more commoditized; instead, databases will win based on superior developer experience. We’re converging on a shared understanding of what reliability means, and that it’s more important to move fast than own infrastructure, as long as things can scale later. This is the direction that all infrastructure is taking really, and it’s also a trend we’ve seen in developer-focused SaaS (think Stripe, Twilio). In some ways, of course, infrastructure is developer experience; but databases are getting so good and general purpose that it’s going to be academic.
First, defining terms: what’s the difference between infrastructure and DevEx as it relates to databases? Here’s a potential taxonomy:
Task | Category | Task | Category
Right sizing instances | Infrastructure | Schema / migrations | Developer Experience
Scaling to meet demand | Infrastructure | Version control | Developer Experience
Optimizing cost | Infrastructure | CLI ergonomics | Developer Experience
Optimizing performance | Infrastructure | User management | Developer Experience
Upgrades and patches | Infrastructure | IDE / Client | Developer Experience
Networking | Infrastructure | Data Branching | Developer Experience
A lot of these infrastructure tasks are already automated (or more accurately, outsourced) with DBaaS like RDS. But a lot of it isn’t – you still need to worry about sizing and scaling manually (in most cases). Infrastructure automation happens gradually; even with Lambda – which is “serverless” – you need to allocate specific amounts of memory, which sounds an awful lot like a server to me. But you don’t need to install and configure, and progress is progress.
A useful model to look to is data warehouses. When Redshift was state of the art (not long ago), you chose the size and power of your data warehouse in advance, and this is still how they price it. But Snowflake and BigQuery came along and completely removed infrastructure from the discussion. To do that, you need two things:
Pricing that completely follows usage ($/GB stored + queried)
A focus on the developer experience (query UI, permissions, marketplace, etc.)
It’s possible the reason this happened quickly in warehousing – as opposed to production data stores – is that use cases are more narrow and often not mission-critical.
In developer tools (and software more broadly), there’s always a split between smaller sized and larger sized customers:
Smaller customers value simplicity and predictable pricing
Larger customers want extreme flexibility and granularity
When I worked at DigitalOcean, this was a core dichotomy that colored how we approached building product, pricing, and go-to-market. And with databases it’s the same – the “serverless” notion is more exciting to smaller customers who don’t need to worry about what’s going on behind the scenes. Enterprises with mission critical applications care very much about the small details.
But that, too, eventually changes – at some point, the database gets good enough to scale from a tiny company all the way up to the largest apps in the world. Not to scare you away, fellow developers, but this is actually exactly the narrative in one of the most famous business books of all time, The Innovator’s Dilemma. Products start as disruptive on the low end, get laughed at by the larger guys, and then eventually eat those same larger guys.
Defining developer experience in databases
What does developer experience in databases actually mean, and what would something great look like? What do we have to look forward to, in other words?
Over the course of the eras of databases we outlined above, companies have backed their way into figuring out some answers to that question. You can separate the obvious ones into 3 large buckets, but beyond that, there are so many things that we probably can’t even imagine yet, but are going to be awesome.
Interacting with your database
How do you query your data? How easy is it to connect, wherever you are? How easy is it to get the data you want?
1. Application queries
How does your application interact with your database? NoSQL did a great job of normalizing (no pun intended) the use of client libraries in your codebase – instead of ugly triple quotes SQL formatting, you could write something like .insertOne() in whatever language you define your endpoints in. For SQL, this has existed for a while in the form of ORMs (like ActiveRecord for Ruby on Rails) but it has usually been the job of the framework, not the database.
In the future, I’d expect to see a tighter coupling between the frameworks we’re using for reactive frontends – React, Vue, etc. – and the database, via hooks or otherwise. We’re already seeing this with Prisma and Co. who define this as “a better ORM.” The model of an un-opinionated database with something like PostgREST on top is already changing (another good example is Fauna).
2. Ad-hoc queries
It’s becoming table stakes for DBaaS providers to include a UI for querying as part of the product. This started with data warehouses, but made its way into products like Supabase that are targeting production use cases. It’s a lot nicer to log into BigQuery and write queries there directly than what I had to do with Hive – install custom JDBC drivers into DBeaver.
3. Authentication and user management
User management and granular permissioning will be built into the database layer as a critical part of developer experience. You’ll be able to restrict specific tables, types of data, branches, etc. to specific users, revoke access after specific amounts of time, etc. You can technically do this in DBs like Postgres, but new databases will rethink it from the ground up and make it dead simple via a UI and/or CLI.
The CLI in general is another great place to focus on. What would psql look like if it was reimagined from the ground up? How tightly can we couple the CLI with a web UI to make authentication simple?
Workflows: migrations, version control, and environments
Ah, migrations – the reason your local environment isn’t working even though you pulled 10 seconds ago. Nobody can ever perfectly guess what their schema will be in 5 years; new features get built and existing ones get refactored. What could version control for a database look like?
If your database isn’t SQL or NoSQL under the hood but instead whatever you want, you’ll be able to choose when to apply schemas and when not to. Your schema will, paradoxically, be flexible, and because of that, it will be able to follow Git workflows just like your code. Imagine opening a pull request on your database, writing queries side by side to see the different results, having your teammates add comments, and then merging it into production.
We already (sort of) version control our databases – changes get made in staging, tested against a frontend also in staging, and then deployed to prod. Maintaining parity between your local/staging database and prod can be tedious. What if each database pull request created an entirely new deployment of your database (all data affected), and that connection string automatically got injected into your application?
Pricing, scale, and monitoring
Pricing is part of developer experience. As the infrastructure powering your database gets further abstracted, the primary component of scale will just be price: you’ll be paying per GB stored, which will scale linearly (maybe with discounts) as your app gets larger. The dashboards and UI you get will shift from monitoring infrastructure (latency, throughput, etc.) to monitoring cost and making sure you don’t get stuck with a huge bill you weren’t anticipating (I’m looking at you, AWS).
All of these questions (or guesses) are concretely more interesting than “why am I getting CONNECTION DENIED errors” and that’s the point – developer experience doesn’t have to be infrastructure, and it can be exciting.
What do you think the next era of databases is going to look like? What is DevEx going to look like for cloud databases in 5 years? Let the PlanetScale team know on Twitter (@planetscale) or join the discussion on HackerNews.]]></content>
        <summary><![CDATA[Databases will win based on superior developer experience not what is under the hood.]]></summary>
      </entry>
    
      <entry>
        <title>Automatically copy migration data in PlanetScale branches</title>
        <link href="https://planetscale.com/blog/automatically-copy-migration-data-in-planetscale-branches" />
        <id>https://planetscale.com/blog/automatically-copy-migration-data-in-planetscale-branches</id>
        <published>2021-08-23T15:23:00.000Z</published>
        <updated>2021-08-23T15:23:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Never accidentally reapply a past migration again! We've removed another barrier to using PlanetScale branches with the framework or migration tool of your choice. Now, you can automatically persist schema changes in your migration table across database branches.
Many frameworks and migration tools keep track of database schema changes in a migration table that includes data about what migrations have been applied and in what order. Without that migrations table data, many tools will incorrectly try to reapply previous migrations, which would cause errors during migration. The migrations table is the missing link that shows how we arrived at the current schema by recording which schema migrations were applied.
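As a rough illustration of why that table matters, the Ruby sketch below models a migration tool deciding what to run. The pending_migrations helper and the version numbers are hypothetical; real tools differ in how they name and store this data.

```ruby
require 'set'

# Hypothetical migration runner: `applied` stands in for the versions
# recorded in a framework's migrations table for one database branch.
def pending_migrations(all_versions, applied)
  all_versions.reject { |version| applied.include?(version) }.sort
end

all_versions = %w[20210801 20210810 20210820]

# Branch whose migrations table data was copied: only the new migration is pending.
with_table_data = Set.new(%w[20210801 20210810])
p pending_migrations(all_versions, with_table_data)    # prints ["20210820"]

# Branch with an empty migrations table: the tool would try to rerun everything.
without_table_data = Set.new
p pending_migrations(all_versions, without_table_data) # prints ["20210801", "20210810", "20210820"]
```

In the second case the schema already contains the earlier changes, so rerunning them is what produces the errors described above.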
In PlanetScale, you can now turn on the ability to automatically copy migration data from these tables any time you open a deploy request or make a new database branch. There are built-in options for Rails, Phoenix, Django, .NET, Prisma, Sequelize, or you can specify the migration table name used by your framework or migration tool.
You may ask yourself, "How does this fit into my existing workflow?" Good question! Here's an example workflow from PlanetScale Software Engineer, Iheanyi Ekechukwu, demonstrating a schema migration with Prisma in a TypeScript application:
While this demo is using Prisma, many frameworks and migration tools will use a similar workflow. You can see our migration tutorials with Prisma and Ruby on Rails in the docs.
Update: We now recommend using prisma db push instead of prisma migrate dev with a shadow branch. Read more in our documentation about prisma db push and PlanetScale.
Give it a try, and let us know what you think!]]></content>
        <summary><![CDATA[Use PlanetScale branching with the database schema migration tools of your choice]]></summary>
      </entry>
    
      <entry>
        <title>Building PlanetScale with PlanetScale</title>
        <link href="https://planetscale.com/blog/building-planetscale-with-planetscale" />
        <id>https://planetscale.com/blog/building-planetscale-with-planetscale</id>
        <published>2021-08-18T15:05:00.000Z</published>
        <updated>2021-08-18T15:05:00.000Z</updated>
        
        <author>
          <name>Iheanyi Ekechukwu</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Our mission at PlanetScale is to build the best database for developers, so of course we’ve been using PlanetScale as our primary database since we started. We have high standards for our development workflow and velocity, and we know our customers expect the same. Simply put, if it’s not good enough for us, it won’t be good enough for you. This blog post will talk about how we leverage features like branching and non-blocking schema changes to constantly ship new features to our users.
Our local development environment
There are many moving parts that power PlanetScale, but for me, an engineer on the Surfaces team (which is responsible for the UI, API, and CLI), there are two primary codebases that I focus on. My days are usually spent working on the PlanetScale front-end, which uses Next.js with TypeScript, and the Ruby on Rails API that powers it. For simplicity and ease of use, each engineer runs the Ruby on Rails application locally, which speaks to a local MySQL instance. You’re probably wondering–if you use MySQL for local development, when do you use PlanetScale? When it comes time to make schema changes, we use PlanetScale’s branching feature to deploy changes to our production database which is hosted on PlanetScale.
For developers, schema changes are a way of life
Before we talk about how we do migrations at PlanetScale, let’s walk through why schema changes are so important to get right.
There’s no escaping making changes to a database’s schema when utilizing a relational database as a product engineer. Whether it’s adding a column to a pre-existing table or adding a table that will power a new feature, it’s the developer’s way of life. In previous projects and engineering teams, shipping database changes to production has been a huge source of pain. In one case, the migrations were tied to application deployments, so the service would attempt to run the database migrations before starting. The downside of this is that when a database migration is long-running, it will prevent other application versions from being deployed until it finishes. Additionally, unexpected errors can occur during a migration, and you don’t want those affecting the application deployment process.
I’ve also been a part of engineering organizations with a dedicated team for applying database migrations. This involved opening up a ticket with the migration’s SQL statements and then waiting for the database team to review and apply them. While this sounds great in theory because it is somebody else’s responsibility, the speed of the migrations rolling out then depends on the availability of another team and any other database migrations that may come before it. This waiting game negatively affects the time it takes to ship and test a new product feature in production and hurts product velocity. After these past experiences with database migrations, I am thoroughly impressed with the PlanetScale workflow for deploying schema changes with zero downtime and without breaking production traffic.
Empowering developers with database branching and deploy requests
My previous experiences with database migrations helped me understand and appreciate the value of non-blocking schema changes in PlanetScale. First, we create our database migrations like we usually do, using the relevant Rails command:
bundle exec rails g migration CreateAuditLogTable

This creates a migration file that we can then use to change the database schema. When it is time to apply this database migration, we usually apply these migrations to two locations:
The local MySQL database server, to test the code locally
A dedicated database branch in PlanetScale, which is an isolated copy of your production database that you can make changes to
After getting our database changes to a good spot locally, we’re ready to create a pull request on GitHub and a deploy request on PlanetScale. I won’t explain how to create a pull request (although GitHub has a great guide), but here’s the workflow we use for creating and deploying a deploy request:
Creating a new database branch
With the PlanetScale CLI, we create and switch to a new branch in our database by running the following command:
pscale branch switch add-audit-logs-table --database ourdatabase --create

This command will store the configuration for using the database branch locally at .pscale.yml, which is used by planetscale-ruby for connecting when needed.
Connecting to the new database branch with Rails
To switch between our local MySQL database and our PlanetScale database branch, we use an environment variable called ENABLE_PSDB. When this setting is enabled, it connects the Rails application to the database branch in PlanetScale. We can now make any desired changes against this branch in complete isolation from our main production database.
Below, you can see how we use this environment variable in our config/database.yml file and custom config/planetscale.rb file:
# config/database.yml
development:
  primary:
    <<: *default
    port: <%= ENV['ENABLE_PSDB'] ? 3305 : nil %>
    database: <%= ENV['ENABLE_PSDB'] ? 'ourdatabase' : 'psdb_development' %>
  primary_replica:
    <<: *default
    port: <%= ENV['ENABLE_PSDB'] ? 3305 : nil %>
    database: <%= ENV['ENABLE_PSDB'] ? 'ourdatabase' : 'psdb_development' %>
    replica: true
# config/planetscale.rb
# Connect to the main production database and start the PlanetScale Proxy
if Rails.env.production?
  PlanetScale.start(
    org: 'planetscale',
    db: 'ourdatabase',
    branch: 'main'
  )
elsif Rails.env.development? && ENV['ENABLE_PSDB']
  PlanetScale.start(org: 'planetscale')
end

Applying the schema changes and creating a deploy request
When it comes time to apply this migration, we run ENABLE_PSDB=1 bundle exec rake db:migrate, which then applies the schema change against the database branch. You can then either go to the PlanetScale application and open a deploy request from the branch itself, or use the CLI to create a deploy request like so:
pscale deploy-request create ourdatabase add-audit-logs-table

Usually, I visit the Deploy requests page within our database in the web application and copy the URL to the deploy request, or construct it myself from the deploy request number. The creator then pastes that URL into the body of their pull request on GitHub so reviewers can look at both side-by-side. The deploy request also shows the DDL statements (CREATE/ALTER/DROP) for each table changed in the migration, with a line-by-line diff, so everybody with access can clearly see what will happen.



We have a dedicated Slack channel for pull requests, which usually helps decrease the turnaround time on reviews. Posting a link to the deploy request in Slack also helps decrease the review time, since schema migrations tend to be fairly brief.
Deploying the schema changes
After a deploy request has been approved by a team member, the creator can deploy the schema changes to the main production branch by adding it to the deploy queue. The deploy queue enables multiple deployments to be queued up at once, so another teammate can also queue up their schema changes for deployment after the currently running one. If anything goes wrong during the migration (such as adding a NOT NULL constraint to a column that contains NULL values), the deployment will stop and show the relevant error. After fixing the error, the deployment can then be restarted. After the deployment is successfully completed, the GitHub pull request is merged and our application gets deployed to production with the new changes. If something needs to be added or changed in the schema, it’s just as easy to create another deploy request with the updates and push those changes to a new pull request.
Database migrations aren’t scary anymore
The beauty of this process is that it decouples the deployment of database schema changes from the application deployment process without needing a database administrator to handle it. Additionally, these migration deployments come with no downtime or locking of production database tables. Since we’ve started using database branches and deploy requests to manage our schema in production, I feel empowered whenever I’m building new features or making schema changes. It’s no longer scary to make changes to the database, even after we’ve had some long-running (multi-day) migrations. We can see the operations that will occur directly in a deploy request before we deploy them. Non-blocking schema changes help us move fast and ship new features without breaking things or requiring a database administrator.
Whether you are a developer who likes to hack on side projects or part of an engineering team, I’d love for you to experience the joy of using PlanetScale. Sign up today and give us a shot. Happy hacking!
P.S. We’re hiring! If you’re interested in being a part of the team that builds the best database for developers, take a look at our careers page!]]></content>
        <summary><![CDATA[How PlanetScale uses database branching and non-blocking schema changes to build PlanetScale.]]></summary>
      </entry>
    
      <entry>
        <title>Connect any MySQL client to PlanetScale using Connection Strings</title>
        <link href="https://planetscale.com/blog/connect-any-mysql-client-to-planetscale-using-connection-strings" />
        <id>https://planetscale.com/blog/connect-any-mysql-client-to-planetscale-using-connection-strings</id>
        <published>2021-08-16T20:03:00.000Z</published>
        <updated>2021-08-16T20:03:00.000Z</updated>
        
        <author>
          <name>Taylor Barnett</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[Today, we are excited to share that connection strings are available to all PlanetScale users.
We obsess over making databases more accessible to developers, so you might have wondered why PlanetScale didn’t have a way to connect to your database with connection strings. We heard you! You can now use the tools you're familiar with to connect to PlanetScale databases, whether that’s with Rails, Python, Prisma, Laravel, or any other MySQL client. Connection strings also enable you to connect from serverless computing platforms like AWS Lambda or Vercel.
Within PlanetScale, you can generate a new password and automatically get client code for many popular frameworks and languages to connect to your PlanetScale database.
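As a small illustration of what a client does with a connection string, here is a Ruby sketch that splits a URL-style string (the mysql://USER:PASSWORD@HOST:PORT/DATABASE form used by tools such as Prisma) into individual settings. The parse_connection_string helper and every value in the example string are placeholders, not real PlanetScale credentials or hostnames.

```ruby
require 'uri'

# Split a MySQL-style URL into the settings most clients ask for.
# Note: percent-encoded characters in the password are not decoded here.
def parse_connection_string(url)
  uri = URI.parse(url)
  {
    username: uri.user,
    password: uri.password,
    host: uri.host,
    port: uri.port,
    database: uri.path.delete_prefix('/')
  }
end

# Hypothetical connection string; every part of it is a placeholder.
connection_string = 'mysql://app_user:example_password@db.example.com:3306/my_database'

settings = parse_connection_string(connection_string)
puts settings[:host]     # prints db.example.com
puts settings[:database] # prints my_database
```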

Strong passwords, never stored in plain text
PlanetScale connection strings are built with security as a priority, so you can spend less time worrying about whether your database connections are secure. PlanetScale Passwords are created for use with a single database branch. Because each password is tied to one branch, it cannot be used to access the data or schema of any other branch. PlanetScale also stores only hashes and metadata about your database passwords. To add an extra layer of security, we never store passwords in plain text.
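For readers unfamiliar with the idea of storing only a hash, the Ruby sketch below shows the general technique with a salted PBKDF2 hash from the standard library. It is not a description of PlanetScale's actual scheme: the hash_password and password_matches? helpers, the iteration count, and the choice of PBKDF2 are all assumptions made for illustration. It needs Ruby 3.0 or later for OpenSSL.fixed_length_secure_compare.

```ruby
require 'openssl'
require 'securerandom'

ITERATIONS = 100_000 # illustrative work factor, not a PlanetScale setting

# Derive a salted hash. Only the salt and hash are kept, never the password.
def hash_password(password, salt = SecureRandom.random_bytes(16))
  hash = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, ITERATIONS, 32, OpenSSL::Digest.new('SHA256'))
  { salt: salt, hash: hash }
end

# Recompute the hash from the candidate password and compare in constant time.
def password_matches?(candidate, stored)
  recomputed = hash_password(candidate, stored[:salt])[:hash]
  OpenSSL.fixed_length_secure_compare(recomputed, stored[:hash])
end

stored = hash_password('correct horse battery staple')
puts password_matches?('correct horse battery staple', stored) # prints true
puts password_matches?('wrong guess', stored)                  # prints false
```

Anyone who reads the stored record sees only the salt and the derived hash, so the original password cannot simply be read back out.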
GitHub Secret Scanning out of the box
Leaked secrets happen. We’ve all been there. If one of your PlanetScale Passwords or service tokens is committed in plain text to a public GitHub repository, or to a private repository owned by an organization where GitHub Advanced Security is enabled, we will automatically take corrective action through the GitHub Secret Scanning program: we delete the access tokens and shut down all access from them.
Built for serverless scale
Connection Strings now enable you to connect your serverless functions with PlanetScale databases on serverless platforms such as AWS Lambda and Vercel Serverless Functions. No need to worry about managing your connection count: PlanetScale can handle tens of thousands of simultaneous database connections, scaling with you as your serverless application grows.
Native MySQL authentication support
PlanetScale supports both MySQL native authentication, which is widely used to provide a secure connection to MySQL servers, and MySQL Caching SHA-2 authentication, which is the most secure authentication mechanism to connect to MySQL. Based on your application needs and platform support, you can switch between the authentication modes with the same password.
You can read more about connecting securely to PlanetScale in our docs.
Try it out
If you haven’t already done so, create a new database in PlanetScale and try connecting with a connection string by generating a new password today. Give it a try and let us know what you think.]]></content>
        <summary><![CDATA[Connect PlanetScale to any MySQL client with Connection Strings for a true database experience.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale on Vitess</title>
        <link href="https://planetscale.com/blog/planetscale-on-vitess" />
        <id>https://planetscale.com/blog/planetscale-on-vitess</id>
        <published>2021-07-20T19:30:00.000Z</published>
        <updated>2021-07-20T19:30:00.000Z</updated>
        
        <author>
          <name>Deepthi Sigireddi</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[We have all experienced the limitations of the relational model and the operational burden of running a large fleet of databases. At PlanetScale, our vision is to build a database developers love without compromising on any of the database features required to run an application that can scale up as needed. To achieve this, we knew we needed a database engine with a track record of powering companies that deal with humongous amounts of data and traffic. That’s why we chose Vitess.
The Power (and Promise) of Vitess
Vitess was created at YouTube over 10 years ago to solve the massive scalability problem that YouTube was facing. Over time, other companies with similar scalability needs adopted Vitess because its power, flexibility and scale outpaced other solutions. This means that Vitess has been powering multiple large-scale production systems for several years now.
Enterprise-grade scalability that can support companies like YouTube and Slack is a key benefit of Vitess. However, Vitess also provides valuable advantages to early-stage companies. Building on a solid foundation means that you no longer have to worry about needing to migrate to a different data platform once hypergrowth materializes.
Vitess provides an abstraction over MySQL that adds missing functionality while retaining the benefits of a relational database and allows you to automate failure detection and recovery. In addition to this, Vitess provides safety features like automatic row limits, hot row protection, query consolidation and non-blocking schema changes which reduce the likelihood of high load bringing down the database. These are all features that vanilla MySQL cannot and does not provide.
The combination of high-availability and safety means that even a small-scale database deployment can benefit hugely from deploying Vitess over MySQL.
Adopters of Vitess like Slack and JD.com have seen the benefits of a highly-available relational database system. While exact numbers from JD.com are not available, Slack runs 100% on Vitess and their uptime numbers speak for themselves.
Want to learn more about how to run Vitess? Get a crash course in setting up, deploying, and managing Vitess in our free Vitess course. 
A word about Sharding
Sharding is one of the superpowers that comes with Vitess. Vitess allows you to split out the data into multiple shards and have a unified view across them. In addition to this core capability, the maintainer team at PlanetScale has built out the underlying infrastructure in a way that facilitates many more workflows, including but not limited to:
Materialization of data while redacting sensitive information
Moving tables around to balance database size
Migration (copying) of data from one system to another
Re-sharding with a different sharding key if the initial choice turns out to be sub-optimal
While sharding is not something we expect early adopters of PlanetScale to need right away, it will be available to use as these systems grow.
All this sounds great, but...
While Vitess reduces the operational burden of managing a large fleet of MySQL instances, it comes with its own operational complexity. There are the typical issues associated with managing any large software deployment ranging from version upgrades and incompatibilities to managing the hardware (or cloud provisioning) required to run Vitess in production.
In addition to these, there is a shortage of Vitess experts who can run such a system in production. There is a steep learning curve associated with acquiring proficiency in Vitess and it is quite likely that many of the people who try it out never end up going into production for this reason.
Enter PlanetScale
At PlanetScale, we have set our minds to solving the problem of providing the easiest-to-use database possible that is designed around developer needs. There are two main strands to this theme.
Ease of adoption
Creating a new database for development purposes should be dead simple and take no more than 10 seconds. The PlanetScale onboarding experience is focused on making this as seamless as possible. This also means that we need to be able to deploy a functional Vitess cluster in that short time period and that is just what we did. Vitess’ pre-existing compatibility with Kubernetes facilitated this greatly with the addition of some secret sauce from PlanetScale.
Ease of use
We need to be solving real problems that real developers have. A particular pain point that comes up repeatedly is the difficulty of making schema changes to a running system. The fear of hurting the production environment while pushing these updates has led to schema change processes that can take hours, days, or even weeks. This slows down the pace of innovation and creates frustrating hurdles for developers.
PlanetScale’s database branching and Deploy Requests make schema changes fast and easy. Developers can ship product updates without having to worry about locking or causing downtime. The non-blocking schema change functionality is a core feature of Vitess and we have built on top of that to automatically check for potential conflicts before a schema change is deployed.
Building on PlanetScale
The combination of PlanetScale and Vitess allows developers to quickly provision a database to use with an application. Once the application is in production, schema changes can still be made in a safe, asynchronous manner without affecting system availability. The database is highly available and any failures are handled by the service. As the application scale grows, Vitess’ sharding capabilities can be leveraged to maintain performance without downtime. All of this functionality provides you with the foundation to start building and scale indefinitely.]]></content>
        <summary><![CDATA[At PlanetScale our vision is to build a database developers love that can scale indefinitely. To do this we knew we needed a database with a history of powering companies that deal with humongous amounts of data and traffic. That’s why we chose Vitess.]]></summary>
      </entry>
    
      <entry>
        <title>Sam Lambert appointed new CEO of PlanetScale</title>
        <link href="https://planetscale.com/blog/new-ceo-of-planetscale" />
        <id>https://planetscale.com/blog/new-ceo-of-planetscale</id>
        <published>2021-07-19T07:00:00.000Z</published>
        <updated>2021-07-19T07:00:00.000Z</updated>
        
        <author>
          <name>Jiten Vaidya</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[It is with great pleasure that I announce that I have made the decision to transition the role of Chief Executive Officer of PlanetScale to Sam Lambert. I will still be deeply involved in the company as the Chief Strategy Officer, and will also continue to serve on the board.
Sam joined PlanetScale as Chief Product Officer nine months ago at a pivotal time for the company, as we began the transition from the hosted Vitess platform that served as our foundation to the developer-experience-focused, limitlessly scalable MySQL database that we launched in May.
Sam has done a tremendous job in hiring a top-notch team, articulating a compelling product vision and executing on product, engineering and marketing. Sam and I have partnered closely in the last few months, and he has strongly exceeded my expectations as a leader at PlanetScale.
I would like to take this opportunity to officially congratulate Sam on his new role. I have every confidence that PlanetScale will continue to thrive under his leadership.
I am as excited today as I was the day we founded PlanetScale three and a half years ago. We have a brilliant team pushing the state of the art of what a developer-focused database can be in the cloud. I look forward eagerly to what we will achieve in the next chapter.]]></content>
        <summary><![CDATA[Announcing a new CEO for PlanetScale]]></summary>
      </entry>
    
      <entry>
        <title>The promises and realities of the relational database model</title>
        <link href="https://planetscale.com/blog/the-realities-of-the-relational-database-model" />
        <id>https://planetscale.com/blog/the-realities-of-the-relational-database-model</id>
        <published>2021-07-13T04:00:00.000Z</published>
        <updated>2021-07-13T04:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[The relational model — tables with a predefined set of typed columns and with cross table references — is one of the oldest surviving models in computer science and in real life deployments. It’s rare to see models that survive decades of software evolution. This survival suggests that the model is sensible, practical, and solid. The relational model makes sense as we organize our data into elements and their properties, and how we associate different elements with each other.
Along comes SQL, a declarative language that is expressive enough to answer both the simplest and the most complex questions about our data. It’s well understood by DBAs, developers, data scientists, analysts and more. SQL or its variants are found in all relational databases, and in popular non-relational databases.
But the relational model has drawbacks, mostly on the operational side: while database systems optimize for reads and writes, they do not optimize as much for metadata changes, and for schema changes in particular. There are various problems with schema changes, but the most demanding one is that a schema change requires an operational undertaking that is outside the hands of the developer.
I believe the issue of schema management is one of the major reasons to push developers away from the relational model and into NoSQL solutions.
There is a historical context. Back in the old days, schema changes were not so frequent. We didn’t have the internet, and products would evolve to the next version over months. DBAs commonly acted as gatekeepers to the database, ensuring the schema is valid, the queries are performant, and they would carefully evaluate any request for a schema change. The flow to make a change necessarily involved discussions between different owners. The change itself was not considered to be in the data path. You’d, for example, take the system down for scheduled maintenance, run a series of schema changes, bring the system back, and repeat every few months.
The times have changed. Products are offered as-a-service over the internet. Taking systems down for maintenance is not as tolerated. We run continuous deployments, and development velocity is increased. Today, it’s not uncommon for popular services to run multiple schema migrations per day. The roles have changed. Today’s DBAs are more of enablers; still ensuring the database service is reliable and performant, but also clearing a path for the developers to do their work and to get what they need.
And while relational database systems have evolved to meet today’s traffic volumes, they have not made similar advances in meeting today’s developers’ needs. The problem intensifies as we try to give developers the velocity they need, and the RDBMS is still an impediment in their path.
First, a schema change requires a deeper understanding of database behavior, of metadata locking, and of the operational issues that can arise as a result of a migration: locks, resource exhaustion, replication lag.
Then, it requires access or privileges to run schema change operations. Developers need to identify where in production their table is found and which specific servers serve as primaries. In the MySQL world, people commonly use third-party tools such as gh-ost or pt-online-schema-change, which run an online schema change through emulation and replacement. But these require access to your production system. The developer needs to understand how to invoke these tools, how to configure throttling, how to observe and monitor their progress, and how to clean up their artifacts.
It requires developers to be able to handle errors. These could be anything from internal database errors, to tooling errors, to mid-migration failover scenarios.
It requires coordination and scheduling. You normally don’t want to run (deploy) multiple schema changes at once. Developers need to sync with each other, prioritize work. The database system does not provide a flow, or anything similar to the common practices familiar to developers, like version control and conflict resolution for one’s code changes.
The operational expertise illustrated above, along with the need to synchronize changes, reinstates the DBA as the database’s gatekeeper, this time as a forced constraint: the single coordinator, the resolver of issues and errors, the scheduler for schema changes. A small, young company is able to get by, but as the business grows the need for a person who coordinates and runs schema changes becomes apparent.
As a result, the developer is no longer the owner of their change. They need to open tickets and grab someone’s attention, trust someone to run their schema change, and check for updates. I’ve seen developers take different routes to avoid this path:
Stalling development and aggregating multiple changes into single deployments.
Overloading schema-less JSON columns with more content.
Avoiding schema changes and tweaking the code in a non-optimal fashion.
Moving away from relational databases and into document stores, trading off the advantages of RDBMS for faster deployments.
An RDBMS schema change is an alien operation for many developers. It feels nothing like a code deployment. It does not enjoy the level of automation that developers have come to expect of code. There is no conflict resolution mechanism to deal with rapid development across large teams. The risk to production is high and the database does not offer a mechanism to undeploy your changes.
These are some of the things we had in mind while developing PlanetScale. We believe the relational model is solid, and that reducing its operational friction goes a long way. We wanted to create a developer-friendly experience that also gives developers back ownership of their changes. We believe this experience can give developers joy.]]></content>
        <summary><![CDATA[The relational model is one of the oldest surviving models in computer science but it has some drawbacks that need to be addressed.]]></summary>
      </entry>
    
      <entry>
        <title>Integrating PlanetScale with Vercel in a few steps</title>
        <link href="https://planetscale.com/blog/planetscale-vercel-integration" />
        <id>https://planetscale.com/blog/planetscale-vercel-integration</id>
        <published>2021-07-01T16:00:00.000Z</published>
        <updated>2021-07-01T16:00:00.000Z</updated>
        
        <author>
          <name>Nick Van Wiggeren</name>
        </author>
        
        
        <category term="tutorials" />
        
        <content><![CDATA[At PlanetScale, we aim to be the world’s best database for everyone by building the only database platform you can start within seconds and scale continuously to meet your needs as you grow.
Vercel, the creator of the popular open-source Next.js React framework, recognized the power of our database and how it could benefit the thousands of developers using their frontend web development platform. Just like PlanetScale, Vercel enables developers to create high-quality web experiences without having to worry about scaling and performance.
Today we are happy to announce a new integration for Vercel!
Together, Vercel and PlanetScale combine a powerful serverless platform with a scalable and easy-to-use database, providing an incredible development experience with limitless scale that is elegant and easy to use.
Recently, I joined Vercel’s Head of Developer Relations Lee Robinson on a live stream to discuss all the features and demo how simple and powerful our integrations are. We started with a SQL database on PlanetScale, deployed an entire Next.js application, connected Vercel to PlanetScale, authenticated with our account, and then deployed the application, all in a matter of minutes! The process is that easy and is a real game-changer for front-end development.
To learn about the simple and easy steps to deployment, watch the live stream recording.
Or get started today and head over to our tutorial section or visit Vercel.
If you need help, reach out to PlanetScale’s support team, or join our GitHub Discussion board to see how others are using PlanetScale.]]></content>
        <summary><![CDATA[Together, Vercel and PlanetScale combine a serverless platform with a scalable and easy-to-use database, providing an incredible development experience with limitless scale.]]></summary>
      </entry>
    
      <entry>
        <title>Serverless finally has a database</title>
        <link href="https://planetscale.com/blog/serverless-finally-has-a-database" />
        <id>https://planetscale.com/blog/serverless-finally-has-a-database</id>
        <published>2021-05-24T17:00:00.000Z</published>
        <updated>2021-05-24T17:00:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[In 2010, when the Vitess project was started to help the YouTube team at Google handle MySQL database scalability issues, the Ruby on Rails application framework was about 5 years old, and there were already several “Unicorn” companies that had leveraged both technologies to massive success. What has happened in the interim? If you trace the advancements in databases, do they compare with the advancements in application development?
More than 10 years later, Rails has gone on to be tremendously influential in the world of app dev, directly inspiring other languages and communities to embrace frameworks, convention over configuration, and other ideas that seem so commonplace now that it’s surprising that they were scarce in a pre-Rails world. In terms of what’s freely available to the typical dev in terms of databases, however, advances have been mostly relegated to the types of problems that FAANG-scale companies face, or otherwise are focused on optimization and mousetrap improvement.
Another way of looking at this is that the “compute” component of application development, as it has come to be known in the public cloud commodity era, has propelled tremendous development and change in how people write the “business logic” part of applications. The most predominant example of this seismic change is the maturation of Serverless technology, but it has to this point been hindered, ironically, by the one thing that we’re all supposed to have figured out by now: databases. Serverless has had a major, mostly unforeseen impact beyond its technical advances as well -- people love the economics. Don’t maintain what you don’t need, don’t lay out money up front, don’t overprovision -- basically, don’t pay for what you don’t use.
So why hasn’t Serverless been served by the right database? If you consider the spectrum from Single Page Applications to “Serious Distributed Backend Applications,” it’s clear that the extremes have been served relatively well. There are decent, but not great, options for the types of simple data stores required by SPAs, and the “Very Serious Enterprise Apps” (VSEAs) are being served by “Serious Enterprise Serverless Databases” which appear to delight in contorting developers’ brains to get them to “think about problems differently.” In addition to these types of solutions not being built for the typical developer, there has also been almost no innovation in terms of the types of workflows, capabilities, and platforms that application developers have enjoyed in the past decade. The whole middle, then, which is where most of us develop, is underserved. Until now.
The dream behind PlanetScale is to think beyond the database to consider what a data platform could look like that serves the modern application developer’s needs as well as something like GitHub does for code. We thought, “If you designed something from scratch to cover the entire spectrum from SPAs to VSEAs, what would it look like?” We’ve obsessed over bringing you the best developer experience possible, figured out how to deliver innovative features like non-blocking schema changes, deploy requests, and database branches, and we’ve packaged it in the same “pay as you grow” style that we’ve come to enjoy on the compute side of things. An easy on ramp, and a platform that you can start using today and stop using never. With PlanetScale, the data needs of Serverless developers can finally be served.]]></content>
        <summary><![CDATA[The dream behind PlanetScale is to serve the modern application developer’s needs as well as something like GitHub does for code.]]></summary>
      </entry>
    
      <entry>
        <title>Non-Blocking Schema Changes</title>
        <link href="https://planetscale.com/blog/non-blocking-schema-changes" />
        <id>https://planetscale.com/blog/non-blocking-schema-changes</id>
        <published>2021-05-20T18:45:00.000Z</published>
        <updated>2021-05-20T18:45:00.000Z</updated>
        
        <author>
          <name>Lucy Burns</name>
        </author>
        
        
        <category term="product" />
        
        <content><![CDATA[With the launch of PlanetScale, we are excited to share more about the non-blocking schema change workflow we’ve built for our platform.
The challenge
We love developers. We obsess over how to make them more productive with their database. In every conversation with our customers, we’ve heard that schema changes are one of the biggest pain points when it comes to using a relational database. We’ve heard about schema change processes that require opening tickets to get a DBA review for each change. This can take weeks.
At some companies, engineering teams won’t change or update certain columns in their production databases because the migration will take too long and will cause performance issues. Others have told us about how they turned columns in their relational databases into JSON stores, just to avoid schema migrations.
While other technologies have grown more and more developer friendly, databases remain difficult to use, in part because of the challenges schema changes present.
Several options are available to alleviate some of the pain. Liquibase and Flyway are two more manual tools used to handle the challenges of schema versioning and also provide some deployment management capabilities. Both pt-online-schema-change and gh-ost offer online or non-blocking schema changes (schema change migrations that don’t lock tables while being deployed). They do so by creating a new table that is a copy of the given table. The schema changes are applied to the new table and the data in the original table is copied over. Once that is complete, the original table is replaced by the new table. However, these tools are often run manually and require the support of additional infrastructure. (Read about how GitHub implements gh-ost here.)
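To make the copy-and-swap idea concrete, here is a minimal sketch in Python. It uses the standard library's SQLite driver purely for illustration, and it is not how gh-ost or pt-online-schema-change are implemented: real tools copy rows in small throttled chunks and also capture writes that arrive during the copy. The function name shadow_table_migrate and the users table are hypothetical.

```python
import sqlite3


def shadow_table_migrate(conn, table, new_schema_sql, columns):
    """Apply a schema change by copy-and-swap instead of altering in place.

    Toy illustration only: it assumes no writes arrive during the copy.
    """
    shadow = f"_{table}_new"
    old = f"_{table}_old"
    # 1. Create an empty shadow table that already has the new schema.
    conn.execute(new_schema_sql.format(name=shadow))
    # 2. Copy the existing rows across, column by column.
    cols = ", ".join(columns)
    conn.execute(f"INSERT INTO {shadow} ({cols}) SELECT {cols} FROM {table}")
    # 3. Swap: retire the original table and promote the shadow.
    conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
    conn.execute(f"ALTER TABLE {shadow} RENAME TO {table}")
    conn.execute(f"DROP TABLE {old}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])

# Add an email column without running ALTER TABLE ... ADD COLUMN on the original.
shadow_table_migrate(
    conn,
    "users",
    "CREATE TABLE {name} (id INTEGER PRIMARY KEY, name TEXT, email TEXT DEFAULT '')",
    ["id", "name"],
)
rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(rows)
```

The original table is never altered directly, which is why this style of migration avoids holding a long table lock; the cost is the extra storage and copy time for the shadow table.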
PlanetScale is a platform designed with an out-of-the-box workflow that doesn’t require additional management overhead and protects our users from making changes that block databases, lock individual tables, or slow down production during schema changes.
We want developers to push schema changes as easily, and as often, as they push code changes.
Non-Blocking Schema Changes
Using Vitess and gh-ost under the hood, we provide our users with a safe, easy, reliable way to push schema changes to production.
Our non-blocking schema change workflow:
Allows users to test out schema changes on a branch that is isolated
Analyzes schema changes in advance to ensure there are no conflicts
Deploys schema changes in the background without impact to production
How does it work?
Branching
We provide a database branching feature that allows users to create sandbox environments for testing database changes. When a user needs to make a schema change, they create a database branch from their production database, which is automatically deployed with a copy of the production schema. The user can test out schema changes on their branch without worrying about impacting the production database.
Deploy requests
Once a user is satisfied with their schema changes, they can create a deploy request. Users have the option to request a review of their changes from a teammate, or they can add their deploy request directly to the deploy queue.
Deploy queue
The deploy queue represents all of the deploy requests, or schema changes, for a given database that are awaiting deployment. PlanetScale deploys schema changes in the order in which they are received. (In our experience, deploying schema changes one at a time is generally more efficient than running them concurrently, with a few exceptions.)
When the deploy request reaches the front of the queue, the deployment to production begins. This process happens in the background and is sensitive to production traffic. If there’s a spike in traffic, the schema change migration will scale down to avoid using resources needed to handle the increased traffic.
What about conflicts?
To avoid migrating a schema change that will conflict with the production schema, PlanetScale analyzes the schema changes in advance of deployment.
When a user creates a deploy request, PlanetScale automatically checks for conflicts, analyzing the schema on the branch against the main branch schema at the time of branching. PlanetScale also analyzes the schema changes against the current schema on main, which may have changed since the branch was created, ensuring that no conflicts exist.
Additionally, when a user adds a deploy request to the deploy queue, PlanetScale checks the schema changes in the deploy requests ahead of that user’s deploy request for any potential conflicts. If a conflict exists, the deploy request is rejected from the queue, and the user is notified of the conflict. This prevents users from having to wait until it is their turn to deploy, only to discover unanticipated conflicts with their schema changes. With longer-running migrations, this can mean a time savings of up to a few days.
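One way to picture these checks is as a three-way comparison between the schema at branching time, the branch, and whatever the branch is being compared against. The sketch below is an assumption-laden illustration, not PlanetScale's implementation: schemas are modeled as plain dictionaries from table name to definition, a "conflict" is assumed to mean that both sides changed the same table in different ways, and the names changed_tables, find_conflicts, and try_enqueue are hypothetical.

```python
def changed_tables(base, other):
    """Return the names of tables that were added, dropped, or altered."""
    names = set(base) | set(other)
    return {t for t in names if base.get(t) != other.get(t)}


def find_conflicts(base, branch, other):
    """Tables changed on both sides since `base`, with different results."""
    both = changed_tables(base, branch) & changed_tables(base, other)
    return sorted(t for t in both if branch.get(t) != other.get(t))


def try_enqueue(queue, base, branch, main):
    """Append the branch schema to the queue unless it conflicts.

    Checks against current main first, then against every request ahead
    in the queue (assumed here to share the same base schema).
    Returns the conflicting table names, or an empty list on success.
    """
    for other in [main, *queue]:
        conflicts = find_conflicts(base, branch, other)
        if conflicts:
            return conflicts  # rejected early, before waiting in line
    queue.append(branch)
    return []


base = {"users": "id, name", "posts": "id, title"}
adds_email = {"users": "id, name, email", "posts": "id, title"}
adds_phone = {"users": "id, name, phone", "posts": "id, title"}

queue = []
first = try_enqueue(queue, base, adds_email, main=base)
second = try_enqueue(queue, base, adds_phone, main=base)
print(first, second, len(queue))
```

The second request is turned away at enqueue time because a request ahead of it already changes the same table differently, which is the early-rejection behavior described above.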
Can I try it out?
We’ve created a demo database on PlanetScale that walks you through the non-blocking schema change workflow with real data. Give it a try.]]></content>
        <summary><![CDATA[Non-blocking schema changes let you push updates to your database without fear of blocking your databases, locking individual tables, or slowing down production during schema migrations.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing PlanetScale: The database for developers.</title>
        <link href="https://planetscale.com/blog/announcing-planetscale-the-database-for-developers" />
        <id>https://planetscale.com/blog/announcing-planetscale-the-database-for-developers</id>
        <published>2021-05-18T17:10:00.000Z</published>
        <updated>2021-05-18T17:10:00.000Z</updated>
        
        <author>
          <name>Sam Lambert</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[Every one of us at PlanetScale is deeply unsatisfied by the current state of “modern” databases. We’ve worked at companies like YouTube, Amazon, Facebook, DigitalOcean, and GitHub, always focused on solving the same problem — database scaling.
We’ve seen the same sad pattern repeat year after year. Companies pick a stack on day 1 that optimizes for developer velocity to get that MVP out the door. After they see some success, years 3 or 4 are spent paying down immense technical debt — mostly due to that early database choice. There needs to be a better way.
PlanetScale’s technologies have been the choice of the hyperscalers for years. We’ve seen some of the largest web services in the world make Vitess their database of choice, scaling beyond imagination.
The world is full of hosted databases that don’t have much more to offer. The state of the art isn’t really that good. At PlanetScale, we hold the belief that databases should have power and flexibility that make development joyful, along with the confidence that you’ll never outgrow them.
Today we are thrilled to announce PlanetScale — the database for developers.
PlanetScale is the first database designed for developer workflows, on top of the technology of the hyperscalers.
Developers want the durability, stability, and scalability of a SQL database but do not want to be constrained by managing a schema. Our goal is to give you both, never compromising on the power of your datastore while making changes feel as easy as deploying code.
Give it a try and let us know what you think. We can’t wait to scale with you.]]></content>
        <summary><![CDATA[PlanetScale is the first database designed for developer workflows on top of the technology of the hyperscalers.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 9.0</title>
        <link href="https://planetscale.com/blog/announcing-vitess-9" />
        <id>https://planetscale.com/blog/announcing-vitess-9</id>
        <published>2021-01-27T19:30:00.000Z</published>
        <updated>2021-01-27T19:30:00.000Z</updated>
        
        <author>
          <name>Alkin Tezuysal</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[On behalf of the Vitess maintainers, I am pleased to announce the general availability of Vitess 9.
Major Themes
In this release, we have focused on making Vitess more stable after the successful release of Version 8. There have been no major issues reported, so no patches were released for Version 8. This has allowed us to push further on compatibility and adoption of common frameworks as priorities. We have compiled all improvements into the Release Notes. Please read them carefully and report any issues via GitHub. We would like to highlight the following themes for this release:
Compatibility (MySQL, frameworks)
Our ongoing work ensures that Vitess accepts all queries that MySQL does. We continued to focus on SET and information_schema queries in this release, as well as other common and complex queries. Several parts of the query serving module have been refactored to facilitate further compatibility enhancements.
Please note that reserved connections are still not enabled by default. You should plan to test them first in a test environment to ensure all your queries and frameworks are supported before enabling them in production.
Migration
Enhanced logging and metrics have been added to VReplication to help debug stalled and failing VReplication workflows and to provide increased visibility into other operational and performance-related issues.
VReplication support for JSON columns, which was previously incomplete, has been refactored and is now functionally complete.
A new version (v2) of the VReplication workflow CLI commands has been introduced. These commands incorporate functional and UX improvements based on user experience and feedback. They are deemed experimental (but fully functional), and we welcome feedback and suggestions on improving them further.
Innovation
There has been a significant push towards streamlining Online Schema Changes.
Changed syntax: The syntax for online DDL has been changed and finalized. We introduce the @@ddl_strategy session variable, or the -ddl_strategy command line flag to determine whether migration is executed normally (direct) or online (gh-ost or pt-osc). Furthermore, migrations now use the standard ALTER TABLE syntax.
Better auditing: A migration is now associated with a context as well as the identity of the issuing vttablet.
Better managed: It's possible to list migrations by context and to cancel all pending migrations. Vitess will automatically retry migrations that fail due to a failover.
More statements: Online DDL now also works for CREATE and DROP statements. This allows us to group together migrations with the same context.
Safe, lazy, and managed DROPs: Online DDL DROP statements are converted to RENAME statements, which send the tables to the lifecycle mechanism: tables are held for safekeeping for a period of time, then slowly and safely purged and dropped, without risking database lockdown. A multi-table DROP statement is exploded into distinct single-table operations.
As always, please validate any new features in your test environments before using them in production.
Documentation
Two new user guides have been created for new adopters of Vitess:
VSchema and Query Serving
Running Vitess in Production
There is a shortlist of incompatible changes in this release. We encourage you to spend a moment reading the release notes to see if any of these will affect you.
Please download Vitess 9 and try it out!]]></content>
        <summary><![CDATA[On behalf of the Vitess maintainers, I am pleased to announce the general availability of Vitess 9.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 8.0</title>
        <link href="https://planetscale.com/blog/announcing-vitess-8" />
        <id>https://planetscale.com/blog/announcing-vitess-8</id>
        <published>2020-10-27T17:30:00.000Z</published>
        <updated>2020-10-27T17:30:00.000Z</updated>
        
        <author>
          <name>Alkin Tezuysal</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[On behalf of the Vitess maintainers team, I am pleased to announce the general availability of Vitess 8.
Major Themes
In this release, we have continued to make important improvements to the Vitess project with over 200 PRs in several areas. Some of the major bug fixes and changes in behaviors are documented in the Release Notes. Please read them carefully and report any issues via GitHub. We would like to highlight the following themes for this release.
Compatibility (MySQL, frameworks)
We've continued our ongoing work to ensure that Vitess accepts all queries that MySQL accepts. In particular, work has focused on SET and information_schema queries. Reserved connections are still not enabled by default, and you might need to enable them to ensure all queries and frameworks are well supported.
We are proud to announce that we have initial support for:
PHP: WordPress, Mysqli
JavaScript: TypeORM, Sequelize
Python: Django, PyMySQL, SQLAlchemy
Ruby: Rails/ActiveRecord
Java: JDBC, Hibernate
Rust: MySQL, mysql_async, SQLx
Tooling: MySQL Workbench, Mycli
Migration
Performance and error metrics and improved logging related to VReplication workflows have been added for more visibility into operational issues. Additional vtctld commands VExec and Workflow allow easier inspection and manipulation of VReplication streams.
The VStream API was enhanced to provide more information for integration with change data capture platforms: the Debezium Vitess adapter uses this capability.
We have incorporated several small feature enhancements and bug fixes based on the increased traction that VReplication has seen both among early adopters and large production setups.
Usability
Ease of use and accessibility are very important to the Vitess community. Usability improvements, many of them driven by the community, were another highlight of this release.
Innovation
We continue to add integration of popular open-source tools and utilities on top of Vitess's dynamic framework. There are a few of these in this release we would like to highlight:
VTorc: The integration of Orchestrator has continued, and it is now part of Vitess. This proven open-source tool, which has been the de facto solution for MySQL failover, is now built into Vitess. Support is experimental in 8.0, and we will continue to harden it in future releases.
Online Schema Changes: Understanding the ALTER TABLE problem and coming up with a solution using proven tools was our goal for this release. We're able to integrate both pt-online-schema-change and gh-ost to overcome major limitations for schema migrations.
There is a shortlist of incompatible changes in this release. We encourage you to spend a moment reading the release notes.
Please download Vitess 8 and try it out!]]></content>
        <summary><![CDATA[On behalf of the Vitess maintainers team, I am pleased to announce the general availability of Vitess 8 for MySQL.]]></summary>
      </entry>
    
      <entry>
        <title>Pitfalls of isolation levels in distributed databases</title>
        <link href="https://planetscale.com/blog/pitfalls-of-isolation-levels-in-distributed-databases" />
        <id>https://planetscale.com/blog/pitfalls-of-isolation-levels-in-distributed-databases</id>
        <published>2020-10-04T18:30:00.000Z</published>
        <updated>2020-10-04T18:30:00.000Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[The more loosely coupled components are in a distributed system, the better it scales. This rule applies to distributed databases, too, and the isolation level plays a big part in this. This post attempts to explain what these isolation levels mean and the tradeoffs between them. We also give you recommendations on how to choose the isolation level best suited to your needs.
There exists a set of ANSI Standards for isolation. There is also a critique of those standards explaining the ambiguities in them. These explanations are interesting for those who are passionate about databases and transactions, but this level of understanding is not required to use a database.
In this post, we are going to cover the minimum knowledge required to use isolation levels effectively. To achieve this, we are going to study two use cases that are representative of most applications and look at their effects with respect to different isolation levels.
Case A: Bank
A customer withdraws money from a bank account:
Begin Transaction
Read the user’s balance
Create a row in the activity table (we want to avoid calling this a transaction to prevent confusion with database transactions)
Update the user’s balance after subtracting the withdrawal amount from the amount read
Commit
We do not want the user’s balance to change until the transaction completes.
Case B: Retail
An international customer buys an item from a retail store using a currency that is different from the list price:
Begin Transaction
Read the exchange_rate table to obtain the latest conversion rate
Create a row in the order table
Commit
We assume that a separate process is continuously updating the exchange rates, but we do not care if an exchange rate changes after we have read it, even if the current transaction has not completed yet.
Serializable
The Serializable isolation level is the only one that satisfies the theoretical definition of the ACID property. It essentially states that two concurrent transactions are not allowed to interfere with each other’s changes, and must yield the same result as if they had been executed one after the other.
Unfortunately, Serializable is generally considered to be impractical, even for a non-distributed database. It is not a coincidence that all the existing popular databases like Postgres and MySQL recommend against it.
Why is this setting so impractical? Let us take the two use cases:
In the Bank use case, Serializable is perfect. After we have read a user’s balance, the database guarantees that the user’s balance will not change. So, it is safe for us to apply business logic such as ensuring that the user has sufficient balance, and then finally writing the new balance based on the value we have read.
In the Retail use case, Serializable will also work correctly. However, the process that updates the exchange rates will not be allowed to perform its action until the transaction that creates the order succeeds.
This may sound like a great feature at first glance, because of the clear sequencing of events. However, what if the transaction that created orders was slow and complex? Maybe it has to call out into warehouses to check inventory. Maybe it has to perform credit checks on the user placing the order. During all this time, it is going to hold the lock on that row, preventing the exchange rate process from updating it. This possibly unintended dependency may prevent the system from scaling.
A Serializable setting is also subject to frequent deadlocks. For example, if two transactions read a user’s balance, they will both place a shared read lock on the row. If the transactions later try to modify that row, they will each try to upgrade the read lock to a write lock. This will result in a deadlock because each transaction will be blocked by the read lock held by the other transaction. As we will see below, other isolation levels can easily avoid this problem.

In other words, a contentious workload will fail to scale if using a Serializable setting. What if the workload was not contentious? In that case, we did not need this isolation level at all. A lower isolation could have worked equally well.
To work around this unnecessary and expensive safety, the application has to be refactored. For example, the code that obtains the exchange rate may have to be called before the transaction is started, or the read may have to be done using a separate connection.
The other isolation levels, although not as theoretically pure, allow you to perform Serializable reads on a case-by-case basis. This makes them more flexible and practical for writing scalable systems.
Lock Free Implementations
There are ways to provide Serializable consistency without locking data. However, such systems are subject to the same problems described above; conflicting transactions just end up failing differently. The root cause of the problem is in the isolation level itself, and no implementation can get you out of those constraints.
RepeatableRead
The RepeatableRead setting is an ambiguous one. This is because it differentiates point selects from searches, and defines different behaviors for each. This is not black and white, and has led to many different implementations. We will not go into the details of this isolation level. However, as far as our use cases are concerned, RepeatableRead offers the same guarantees as Serializable and consequently inherits the same problems.
SnapshotRead
The SnapshotRead isolation level, although not an ANSI standard, has been gaining popularity. This is also known as MVCC. The advantage of this isolation level is that it is contention-free: it creates a snapshot at the beginning of the transaction. All reads are sent to that snapshot without obtaining any locks. But writes follow the rules of strict Serializability.
A SnapshotRead transaction is most valuable for a read-only workload because you get to see a consistent snapshot of the database. This avoids surprises while loading different pieces of data that depend on each other transactionally. You can also use the snapshot feature to read multiple tables as of a certain time, and then later observe the changes that have occurred since that snapshot. This functionality is convenient for Change Data Capture tools that want to stream changes out to an analytics database.
For transactions that perform writes, the snapshot feature is not that useful. What you mainly want to control is whether to allow a value to change after the last read. If you want to allow the value to change, then it is going to be stale as soon as you read it because someone else can update it later. So, it doesn’t matter if you read from a snapshot or get the latest value. If you do not want it to change, you want the latest value, and the row must be locked to prevent changes.
In other words, SnapshotRead is useful for read-only workloads, but it is no better than ReadCommitted for write workloads, which we will cover next.
Re-applying the Retail use case in this isolation level works naturally without creating contention: The read from the exchange rate yields a value that was as of the snapshot when the transaction was created. While this transaction is in progress, a separate transaction is allowed to update the exchange rate.
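The difference between reading from a snapshot and reading the latest committed value can be sketched with a toy versioned store. This is only an illustration of the idea, not how any particular database implements it; the class VersionedStore, its methods, and the USD/EUR key are hypothetical names introduced here.

```python
class VersionedStore:
    """Toy multi-version store: every commit appends a new version of a key."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # logical commit counter

    def commit(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read_latest(self, key):
        """ReadCommitted-style read: always the newest committed value."""
        return self.versions[key][-1][1]

    def read_as_of(self, key, snapshot_ts):
        """SnapshotRead-style read: newest value committed at or before snapshot_ts."""
        visible = [v for ts, v in self.versions[key] if ts <= snapshot_ts]
        return visible[-1]


store = VersionedStore()
store.commit("USD/EUR", 0.90)

# The order transaction starts and remembers its snapshot.
snapshot_ts = store.clock

# The exchange rate process commits a new rate without being blocked.
store.commit("USD/EUR", 0.95)

snapshot_rate = store.read_as_of("USD/EUR", snapshot_ts)   # what the order transaction sees
latest_rate = store.read_latest("USD/EUR")                 # what a new reader sees
```

Neither read takes a lock, so the updater is never held up by a slow order transaction.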
What about the Bank use case? Databases allow you to place locks on data. For example, MySQL allows you to “select… lock in share mode” (read lock). This mode upgrades the read to that of a Serializable transaction. Of course, you also inherit the inherent deadlock risks of this isolation level.
In other words, a lower isolation level offers you the best of both worlds. But it gets better: you also have the option of issuing a “select… for update” (write lock). This lock prevents another transaction from obtaining any kind of lock on this row. This approach of pessimistic locking sounds worse at first, but will allow two racing transactions to successfully complete without encountering a deadlock. The second transaction will wait for the first transaction to complete, at which point it will read and lock the row as of the new value.
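The two locking patterns discussed in this post, upgrading a shared read lock versus taking the write lock up front, can be compared with a toy single-row lock table. This is a simplified sketch under stated assumptions: RowLock and its methods are hypothetical, nothing here talks to a real database, and the deadlock check only looks for two transactions waiting on each other.

```python
class RowLock:
    """Toy lock state for one row: many shared holders or one exclusive holder."""

    def __init__(self):
        self.shared = set()      # transactions holding a read lock
        self.exclusive = None    # transaction holding the write lock, if any
        self.waits_for = {}      # blocked transaction -> set of blocking transactions

    def _holders_other_than(self, txn):
        holders = set(self.shared)
        if self.exclusive is not None:
            holders.add(self.exclusive)
        holders.discard(txn)
        return holders

    def lock_shared(self, txn):
        """Models a read lock: blocked only by another transaction's write lock."""
        if self.exclusive is not None and self.exclusive != txn:
            self.waits_for[txn] = {self.exclusive}
            return False
        self.shared.add(txn)
        return True

    def lock_exclusive(self, txn):
        """Models a write lock: blocked by any lock held by another transaction."""
        blockers = self._holders_other_than(txn)
        if blockers:
            self.waits_for[txn] = blockers
            return False
        self.shared.discard(txn)
        self.exclusive = txn
        return True

    def has_deadlock(self):
        """True if two transactions are each waiting on the other."""
        return any(
            waiter in self.waits_for.get(blocker, set())
            for waiter, blockers in self.waits_for.items()
            for blocker in blockers
        )


# Scenario 1: both transactions read with a shared lock, then try to write.
upgrade = RowLock()
upgrade.lock_shared("T1")
upgrade.lock_shared("T2")
upgrade.lock_exclusive("T1")   # blocked by T2's shared lock
upgrade.lock_exclusive("T2")   # blocked by T1's shared lock
upgrade_deadlocks = upgrade.has_deadlock()

# Scenario 2: both transactions take the write lock up front.
for_update = RowLock()
t1_granted = for_update.lock_exclusive("T1")   # granted immediately
t2_granted = for_update.lock_exclusive("T2")   # waits for T1, but T1 waits for nobody
for_update_deadlocks = for_update.has_deadlock()
```

In the first scenario each transaction waits on the other, so one of them has to be aborted. In the second, T2 simply queues behind T1 and proceeds once T1 commits.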

MySQL supports the SnapshotRead isolation level by default, but misleadingly calls it REPEATABLE_READ.
Distributed databases
Although a single database has many ways of implementing Repeatable Reads efficiently, the problem becomes more complex in the case of distributed databases. This is because transactions can span multiple shards. If so, a strict ordering guarantee must be provided by the system. Such ordering requires the system to use either a centralized concurrency control mechanism or a globally consistent clock. Both these approaches essentially attempt to tightly couple events that could have otherwise executed independently of each other.
Therefore, one must understand and be willing to accept these trade-offs before wanting a distributed database to support distributed Snapshot Reads.
ReadCommitted
The ReadCommitted isolation is less ambiguous than SnapshotRead because it continuously returns the latest view of the database. This is also the least contentious of the isolation levels. At this level, you may get a different value every time you read a row.
The ReadCommitted setting also allows you to upgrade your read by issuing a read or write lock, effectively providing you with the ability to perform on-demand Serializable reads. As explained previously, this approach gives you the best of both worlds for application transactions that intend to modify data.
The default isolation level supported by Postgres is ReadCommitted.
ReadUncommitted
This isolation level is generally considered unsafe and is not recommended for distributed or non-distributed settings. This is because you may read data that might have later been rolled back (or never existed in the first place).
Distributed Transactions
This topic is orthogonal to isolation levels, but it is important to cover this here because it has significance when it comes to keeping things loosely coupled.
In a distributed system, if two rows are in different shards or databases, and you want to atomically modify them in a single transaction, you incur the overhead of a two-phase commit (2PC). This requires substantially more work:
Metadata about the distributed transaction is created and saved to durable storage.
A prepare is issued to all individual transactions.
A decision to commit is saved to the metadata.
A commit is issued to the prepared transactions.
A prepare requires you to save metadata so the transaction can be resurrected in the new leader if a node crashes before a commit (or rollback).
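The four steps above can be sketched as a small coordinator. This is a minimal illustration, assuming an in-memory list stands in for durable storage and that participants never crash; Participant, two_phase_commit, and the log entries are hypothetical names, not the API of any real system.

```python
class Participant:
    """One shard's side of a distributed transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        # A real participant would persist enough metadata here to
        # resurrect the transaction on a new leader after a crash.
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants, log):
    log.append("created")                          # 1. save transaction metadata
    votes = [p.prepare() for p in participants]    # 2. prepare every participant
    if all(votes):
        log.append("commit-decided")               # 3. save the decision to commit
        for p in participants:                     # 4. commit the prepared transactions
            p.commit()
        return True
    log.append("rollback-decided")
    for p in participants:
        p.rollback()
    return False


ok_log = []
ok_shards = [Participant("shard-1"), Participant("shard-2")]
ok_result = two_phase_commit(ok_shards, ok_log)

failed_log = []
failed_shards = [Participant("shard-1"), Participant("shard-2", can_prepare=False)]
failed_result = two_phase_commit(failed_shards, failed_log)
```

Even in this toy version, a single-shard commit becomes two log writes plus two round trips to every participant, which is the extra work the list describes.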
A distributed transaction also interacts with the isolation level. For example, let us assume that only the first commit of a 2PC transaction has succeeded and the second commit is delayed. If the application has read the effects of the first commit, then the database must prevent the application from reading the rows of the second commit until completion. Flipping this around, if the application has read a row before the second commit, then it must not see the effects of the first commit.
The database has to do additional work to support the isolation guarantees for distributed transactions. What if the application could tolerate these partial commits? Then we are doing unnecessary work that the application doesn’t care about. It may be worth introducing a new isolation level like ReadPartialCommits. Note that this is different from ReadUncommitted where you may read data that may eventually be rolled back.
Lastly, excessive use of 2PC reduces the overall availability of a system and increases its latency. This is because your effective availability and latency will be dictated by the worst-performing shard.
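A rough way to put a number on the availability cost: a 2PC transaction needs every participating shard to be up, so its availability can be no better than that of the worst shard. If we additionally assume shard failures are independent (an assumption made for illustration that real deployments may not satisfy), it is the product of the per-shard availabilities. The function name and the sample figures below are hypothetical.

```python
import math


def effective_availability(shard_availabilities):
    """Availability of a transaction that needs all of the given shards,
    assuming shard failures are independent of each other."""
    return math.prod(shard_availabilities)


# Hypothetical per-shard availabilities for a three-shard transaction.
shards = [0.999, 0.999, 0.99]
combined = effective_availability(shards)
worst_shard = min(shards)
```

The combined figure is always at or below worst_shard, and it drops further with every shard added to the transaction.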
Conclusion
To be scalable, an application should avoid relying on any advanced isolation features of a database. It should instead try to use as few of the guarantees as it can. If you can write an application to work with ReadCommitted isolation level, then moving to SnapshotRead should be discouraged. Serializable or RepeatableRead are almost always a bad idea.
It is also better to avoid multi-statement transactions if possible. However, as the application evolves, this need may become unavoidable. At that point, try mainly relying on the atomic guarantees of transactions, and stay at the lowest isolation level the database system supports.
If using a sharded database, avoid distributed transactions. This can be achieved by keeping related rows within the same shard.
These recommendations may conflict with the general advice of not prematurely optimizing your program, but this case is different. This is something that one must do from the beginning, because it is very hard to refactor a non-concurrent program to be concurrent.]]></content>
        <summary><![CDATA[The more loosely coupled components are in a distributed system, the better it scales.]]></summary>
      </entry>
    
      <entry>
        <title>MySQL semi-sync replication: durability consistency and split brains</title>
        <link href="https://planetscale.com/blog/mysql-semi-sync-replication-durability-consistency-and-split-brains" />
        <id>https://planetscale.com/blog/mysql-semi-sync-replication-durability-consistency-and-split-brains</id>
        <published>2020-10-02T04:00:00.000Z</published>
        <updated>2020-10-02T04:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[MySQL semi-sync is a plugin mechanism on top of asynchronous replication that can offer better durability and even consistency (a term we define later). It helps in high availability solutions, but can in itself reduce availability. We look at some basics, then present scenarios that require higher-level intervention to ensure availability and to avoid split brains.
I recommend reading this semi-sync blog post by Jean-François Gagné (aka JFG), which illustrates the internals of the semi-sync implementation, and debunks some myths about semi-sync. We will overlap a bit with another recommended post by JFG, about high availability and recovery.
Note: in this post we adopt the term “primary” over the term “master” in the context of MySQL replication. However, at this time there is no alternative to using the actual names of some configuration and status variables that use “master” terminology, and some duality is present.
Overview
As a quick recap, semi-synchronous replication is a mechanism where a commit on the primary does not apply the change onto internal table data and does not respond to the user until the changelog is guaranteed to have been persisted (though not necessarily applied) on a preconfigured number of replicas. We limit our discussion to MySQL 5.7 or equivalent.
Specifically, the primary is configured with rpl_semi_sync_master_wait_for_slave_count, which is 1 or above, if semi-sync is to be used. An INSERT transaction, for example, will not report being committed (i.e. the user will not get “OK”) until at least that number of replicas have acknowledged receipt of that transaction’s changelog (entries in the binary log). In practice, this means at least that number of replicas have written the changelog onto their relay logs where the change is now persisted. The replicas will apply the change later on, normally as soon as they can. For the scope of this post, we assume replicas are not intentionally stopped, and we can also ignore delayed replicas.
The primary waits up to rpl_semi_sync_master_timeout, after which it falls back to asynchronous replication mode, committing and responding to the user even if not all expected replicas have acknowledged receipt. In the scope of this post, we are interested in an “infinite” timeout, which we will accept to be a very large number.
The only replicas to acknowledge receipt of the changelog are replicas where semi-sync is enabled. There could be other, non semi-sync replicas in our topology. They, too, pull the changelog from the primary, but do not acknowledge receipt. As JFG points out in his post, the semi-sync replicas aren’t necessarily the most up to date. It’s possible that a non-semi-sync replica has pulled more changelog data than some or all semi-sync replicas.
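The waiting rule described in this overview can be sketched as a small function. It models a single commit only and is not MySQL code: semi_sync_commit, ack_delays, and the sample delays are hypothetical, with wait_for_count and timeout standing in for rpl_semi_sync_master_wait_for_slave_count and rpl_semi_sync_master_timeout.

```python
def semi_sync_commit(ack_delays, wait_for_count, timeout):
    """Decide how one commit completes on the primary.

    ack_delays: seconds until each semi-sync replica acknowledges receipt,
                or None for a replica that never acknowledges.
    Returns (mode, seconds the user waited for the commit to return).
    """
    arrived = sorted(d for d in ack_delays if d is not None and d <= timeout)
    if len(arrived) >= wait_for_count:
        # The commit returns as soon as the required number of acks are in.
        return ("semi-sync", arrived[wait_for_count - 1])
    # Not enough acks in time: the primary falls back to asynchronous mode.
    return ("async-fallback", timeout)


# Four semi-sync replicas: one nearby, two remote, one unreachable.
delays = [0.002, 0.040, 0.045, None]

fastest_only = semi_sync_commit(delays, wait_for_count=1, timeout=10)
wait_for_three = semi_sync_commit(delays, wait_for_count=3, timeout=10)
wait_for_four = semi_sync_commit(delays, wait_for_count=4, timeout=10)
```

With a wait count of 1 the commit latency is that of the fastest replica. With a wait count of 4 the unreachable replica forces the fallback, which is why an effectively infinite timeout trades availability for durability.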
Durability
The most straightforward use case for semi-sync is to guarantee durability of the data. We only approve a commit once the transaction (in the form of a changelog) is persisted elsewhere, on a different server, or on multiple servers.
As mentioned earlier, this does not mean the changelog has been applied on those servers. Replication may be lagging and the servers may be too busy to be able to apply the changelog in a timely fashion. But, at the very least, if the primary crashes, there’s “forensic evidence” in the relay logs of a semi-sync replica that can tell us what the latest changes were.
Consistency
The term “consistency” is overloaded, and especially in distributed systems. The CAP Theorem definition of consistency is, for example, very strict: once data was written on one server, any immediate follow-up read on any other server must reflect that write (or any later write).
We will settle for eventual consistency. This is in itself an overloaded term, so let’s clarify: in the case of a primary outage, we consider our system consistent if we’re able to promote a new primary within any amount of time (obviously in practice we expect the time to be short) such that it is consistent with the previous primary’s advertised state. I write “advertised” state because the end user or applications should not be surprised by any data changes when the new primary is promoted, regardless of how transactions/replication work internally.
Split brain
A split brain is a situation where there are two servers which are simultaneously taking writes, each thinking they’re the (single) primary. In a split brain scenario the data diverges between the two servers. If those servers also have replicas, then we have two divergent replication trees. Fixing and merging those divergent trees is difficult, and normally we revert the changes on one to make it look like the other (whether by backup+restore or by some rollback method).
To reiterate JFG’s observation: while engineers normally frown upon split brains, the damage of a split brain may well be within a product’s allowance.
Topologies and scenarios
As always, getting the best possible durability, consistency, response times, and split brain mitigation is costly. Product owners and engineers make reasonable tradeoffs.
Let’s consider a few configured topologies, and see what scenarios we can expect upon failure. In particular: how durable is our data, can we expect it to be consistent, and can we expect split brains?
General scenario: wait for single replica, multiple semi-sync replicas
In this common scenario our primary is configured with rpl_semi_sync_master_wait_for_slave_count=1, and there are multiple (more than one) replicas configured with semi-sync. Let’s say there are 4 semi-sync replicas, named R1, R2, R3, and R4. We nickname this setup “1-n”.
In this setup, a write takes place on the primary, and before approving the write to the user/app, the primary waits for an acknowledgement from exactly one replica. It doesn’t matter which. For some writes, it will be replica R1; at some other time, R4 is the first to acknowledge.
It’s important to note that the changelog, the binary log, is sequential. A replica that acknowledges some changelog event, has necessarily received all of its prior events. This is why the primary doesn’t need to care that different acknowledgements come from different replicas. An acknowledgement means there’s a replica with full durability of committed transactions up to and including the one in question.
Should the primary fail, we have at least one replica guaranteed to have all approved commits in its relay logs. These are not necessarily applied, but if our replication cluster is normally low-lagged, we can expect those commits to be few, and we can expect the replica to catch up (apply those commits) within a few seconds or less.
It is possible that some non-semi-sync replicas are even more up to date than our semi-sync replica. We can choose to use them. The fact that they have more transactions than the most up-to-date semi-sync replica means they have received binary log events not acknowledged by any semi-sync replica. While this may seem confusing at first, these events are fine to accept and do not contradict our promise to our users. We promised that “if the primary tells you a commit is successful, then the data is durable elsewhere”. We never said anything about the state of the data before telling the user the commit is successful. In fact, this scenario is no different from the primary getting a commit acknowledged by the replica, beginning to communicate that to the user, then crashing halfway. The user cannot distinguish between the two cases at the time of failure (though post-crash analysis may indicate which of the two was true).
And so we can choose to use the data from that even-more-up-to-date non-semi-sync replica, seed the rest of the replicas with that data, or, we can choose to throw away any server that is more-up-to-date than our most-up-to-date semi-sync replica and recycle them. It’s our choice and it’s down to operational considerations (time, capacity, load, ...).
If data is durable, does that mean it’s available?
Not necessarily. Consider this possible, even likely scenario: the primary and R1 are both in the same data center. R2, R3, R4 are in different availability zones, possibly remote, and so have higher latency than R1 communicating to the primary.
The app issues an UPDATE on the primary. R1 acknowledges the write. R2, R3, R4 do not, as yet. But the primary does not care, it commits and reports success to the app. Then the primary and R1 both get network isolated together because the DC they’re in has lost network. The loss of network cut short the delivery of the change log to R2, R3, R4. They never got it.
Should we promote any of R2, R3, or R4, we lose consistency. The app expects the result of that UPDATE to be there, but none of these servers has the data. Moreover, even human intervention is limited. Since the data is only found on the primary and R1, we cannot get hold of it remotely. A human can always travel to the physical server to extract the data and send it over via cellular network, or even by car. It’s likely that this will take substantial time.
A predefined failover plan may choose to promote, say, R2, after all, losing some data, but cutting short on outage time. Reconciliation will have to take place afterwards.
Do we even know the status?
If the primary and one of the semi-sync replicas go offline, we have an additional problem: we can’t say for sure whether any of the remaining semi-sync replicas got the latest updates from the primary. It’s possible that they did. But it’s also possible that the single semi-sync replica that went offline, was the single one to get and acknowledge some last writes taking place on the primary.
To generalize, if the number of lost semi-sync replicas is equal to, or greater than rpl_semi_sync_master_wait_for_slave_count, then we do not know whether the remaining replicas are consistent.
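This generalization can be checked by brute force: a write may exist only on lost servers exactly when some possible set of acknowledging replicas is entirely among the lost ones. The sketch below enumerates those sets. The function write_may_be_lost and the replica names are used for illustration only.

```python
from itertools import combinations


def write_may_be_lost(semi_sync_replicas, lost, wait_for_count):
    """True if some group of wait_for_count replicas that could have
    acknowledged the last write lies entirely within the lost replicas."""
    lost = set(lost)
    return any(
        set(ack_group) <= lost
        for ack_group in combinations(semi_sync_replicas, wait_for_count)
    )


replicas = ["R1", "R2", "R3", "R4"]

one_lost_wait_one = write_may_be_lost(replicas, {"R1"}, wait_for_count=1)
one_lost_wait_two = write_may_be_lost(replicas, {"R1"}, wait_for_count=2)
two_lost_wait_two = write_may_be_lost(replicas, {"R1", "R2"}, wait_for_count=2)
```

Losing one replica with a wait count of 2 is safe, because every acknowledging pair includes at least one survivor. Losing as many replicas as the wait count is not.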
Split Brain
Arguably worse, we can now be in a split brain situation. Apps in the primary datacenter were still writing to the old primary when the network went down. Those writes continued to run. The old primary now receives writes never seen by R2 (the newly promoted primary). Once we do wish to reconcile, we are faced with contradicting information.
Physical locations
The geo-distribution of our servers plays a key part in how tolerant our system is to failure and what outcomes we can expect.
Colocated primary and semi-sync replicas
On a normal day, writes to the primary have low latency: acknowledgements from semi-sync replicas are quick to arrive thanks to the fast network within a datacenter.
When the primary fails, we have at least one good semi-sync replica to promote (or we can use it to seed a different server to promote). Because we promote within the same datacenter, app behavior is unlikely to be affected, other than the obvious disruption during the failover time.
If the datacenter is network-isolated, we need to make a decision: wait out the outage, incurring downtime to our services, or promote a new primary in another datacenter. We risk losing transactions (we cannot know whether we lose transactions until we are again able to connect to the original primary). We also risk a split brain scenario.
Cross-site semi-sync replication
If we only run semi-sync replicas in a different datacenter than the primary’s, we first pay with increased write latency, as each commit runs a roundtrip outside the primary’s datacenter and back. With multiple semi-sync replicas it’s the time it takes for the fastest replica to respond.
When the primary goes down, we have the data durable outside its datacenter. But we can then also compare with non semi-sync replicas in the primary’s datacenter: they may yet have all the transactions, too, in which case we can promote one of them. Again, promoting within the same datacenter tends to be less disruptive to the application. Or, we can spend some time seeding them from one of our semi-sync replicas.
Or, we can choose to promote our semi-sync replicas, switching the primary data center for the replication cluster. We then must reassign semi-sync replicas, and ensure none run from within the new primary’s datacenter.
In the event of the primary’s datacenter network isolation, we promote one of our semi-sync replicas, in a different DC. We reconfigure semi-sync replicas, before making the new primary writable. This avoids a split brain scenario, assuming all of our semi-sync replicas are available to us.
What if two or more sites get network isolated at the same time?
It largely depends on your hardware distribution.
If your servers are distributed across three sites, then the majority of your sites is now down. Chances are you were only planning for single site outage at any given time, and a two-site outage is a hard downtime for you. You may or may not have the capacity to run with just one third of your hardware. You’d need to risk consistency and split brain.
Let’s assume we run out of 5 different sites (or availability zones), and 2 are down. In our “1-n” setup, split brain is still possible, even though the majority of sites are up. That’s because our semi-sync setup is about the communication between a primary and any of its replicas.
But this is also something you can attach a number to. What is the probability of two availability zones going down at the same time? For a given vendor? Across vendors? Is there a number you’re willing to accept? What is the probability of the two zones both being in outage for over X minutes? Will you be willing to wait out X minutes to maintain durability and consistency?
In our physical world, there is no hard limitation to how many sites could go down at the same time. We generally accept that some risks are low enough to live with.
Quorum
Of special interest is that in our “1-n” scenario, we have a quorum of two servers out of five or more. The primary, with a single additional replica, is able to form a quorum and to accept writes. That’s how we got to have a split brain. While R2, R3, R4 form a majority of the servers, writes took place without their agreement.
People familiar with Paxos and Raft consensus protocols may find this baffling. However, reliable minority consensus is achievable, and Sugu Sougoumarane’s Consensus Algorithms series of posts continues to describe this.]]></content>
        <summary><![CDATA[We look at some basics and follow up to present scenarios that require higher level intervention to ensure availability and to avoid split brains from taking place.]]></summary>
      </entry>
    
      <entry>
        <title>Consensus algorithms at scale: Part 3 - Use cases</title>
        <link href="https://planetscale.com/blog/consensus-algorithms-at-scale-part-3" />
        <id>https://planetscale.com/blog/consensus-algorithms-at-scale-part-3</id>
        <published>2020-09-26T04:00:00.000Z</published>
        <updated>2020-09-26T04:00:00.000Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[If you’re still catching up, you can find links to each article in the series at the bottom of this article.
Recap of parts 1 and 2
Here is a recap of what we covered in the previous posts:
Durability is the main reason why we want to use a consensus system.
Since Durability is use-case dependent, we made it an abstract requirement requiring the consensus algorithms to assume nothing about the durability requirements.
We started off with the original properties of a consensus system as defined by Paxos and modified it to make it usable in practical scenarios: instead of converging on a value, we changed the system to accept a series of requests.
We narrowed our scope down to single leader systems.
We came up with a new set of rules that are agnostic of durability. The essential claim is that a system that follows these rules will be able to satisfy the requirements of a consensus system. Specifically, we excluded some requirements like majority quorum that have previously been used as core building blocks in consensus algorithms.
Consensus Use Cases
If there was no need to worry about a majority quorum, we would have the flexibility to deploy any number of nodes we require. We can designate any subset of those nodes to be eligible leaders, and we can make durability decisions without being influenced by the above two decisions. This is exactly what many users have done with Vitess. The following use cases are loosely derived from real production workloads:
We have a large number of replicas spread over many data centers. Of these, we have fifteen leader capable nodes spread over three data centers. We don’t expect two nodes to go down at the same time. Network partitions can happen, but only between two data centers; a data center will never be totally isolated. A data center can be taken down for planned maintenance.
We have four zones with one node in each zone. Any node can fail. A zone can go down without notice. A partition can happen between any two zones.
We have six nodes spread over three zones. Any node can fail. A zone can go down without notice. A partition can happen between any two zones.
We have two regions, each region has two zones. We don’t expect more than one zone to go down. A region can be taken down for maintenance, in which case we want to proactively transfer writes to the other region.
I have not seen anyone ask for a durability requirement of more than two nodes. But this may be due to difficulties dealing with corner cases that MySQL introduces due to its semi-sync behavior. On the other hand, these settings have served the users well so far. So, why become more conservative?
These configurations are all uncomfortable for a majority based consensus system. More importantly, these flexibilities will encourage users to experiment with even more creative combinations and allow them to achieve better trade-offs.
Reasoning about Flexible Consensus
The configurations in the previous section seem to be all over the place. How do we design a system that satisfies all of them, and how do we future-proof ourselves against newer requirements?
There is a way to reason about why this flexibility is possible. This is because the two cooperating algorithms (Request and Election) share a common view of the durability requirements, but can otherwise operate independently.
For example, let us consider a five node system. If a user does not expect more than one node to fail at any given time, then they would specify their durability requirement as two nodes.
The leader can use this constraint to make requests durable: as soon as the data has reached one other node, it has become durable. We can return success to the client.
On the election side, if there is a failure, we know that no more than one node could have failed. This means that four nodes will be reachable. At least one of those will have the data for all successful requests. This will allow the election process to propagate that data to other nodes and continue accepting new requests after a new leader is elected.
In other words, a single durability constraint dictates both sides of the behavior; if we can find a formal way to describe the requirements, then a request has to fulfil those requirements. On the other hand, an election needs to reach enough nodes to intersect with the same requirements.

For example, if durability is achieved with 2/5 nodes, then the election algorithm needs to reach 4/5 nodes to intersect with the durability criteria. In the case of a majority quorum, both of these are 3/5. But our generalization will work for any arbitrary property.
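The intersection arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes durability is expressed purely as a node count; the function name election_reach is made up for illustration and is not part of Vitess.

```python
def election_reach(total_nodes: int, durability_nodes: int) -> int:
    """Smallest number of nodes an election must reach so that it is
    guaranteed to overlap every possible set of `durability_nodes` nodes."""
    if not 1 <= durability_nodes <= total_nodes:
        raise ValueError("durability_nodes must be between 1 and total_nodes")
    # If we reach this many nodes, fewer than `durability_nodes` remain
    # unreached, so no durable request can live only on unreached nodes.
    return total_nodes - durability_nodes + 1


print(election_reach(5, 2))  # 4: durability on 2/5 needs an election reaching 4/5
print(election_reach(5, 3))  # 3: the majority quorum case, both sides are 3/5
```

A node count is only the simplest durability rule. The same overlap idea applies to richer rules, but those would need more than a single subtraction.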
Worst Case Scenario
In the above five node case, if two nodes fail, the failure tolerance has been exceeded. We can only reach three nodes. If we don’t know about the state of the other two nodes, we will have to assume the worst case scenario that a durable request could have been accepted by the two unreachable nodes. This will cause the election process to stall.
If this were to happen, the system has to allow for a compromise: abandon the two nodes and move forward. Otherwise, the loss of availability may become more expensive than the potential loss of that data.
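One way to see why the election stalls is to enumerate every set of nodes that could have made a request durable and ask whether any of them lies entirely among the unreachable nodes. The sketch below does this by brute force, assuming durability is a plain node count; the helper name and node labels are hypothetical.

```python
from itertools import combinations


def may_miss_durable_request(nodes, durability_nodes, reachable):
    """Return True if some set of `durability_nodes` nodes lies entirely
    outside `reachable`, i.e. a durable request could be invisible to us."""
    reachable = set(reachable)
    return any(
        reachable.isdisjoint(ack_set)
        for ack_set in combinations(nodes, durability_nodes)
    )


nodes = ["n1", "n2", "n3", "n4", "n5"]
print(may_miss_durable_request(nodes, 2, ["n1", "n2", "n3", "n4"]))  # False
print(may_miss_durable_request(nodes, 2, ["n1", "n2", "n3"]))        # True
```

With four nodes reachable, the election can proceed with certainty. With three, the answer is True, and that is the point where an operator has to choose between waiting and abandoning the two missing nodes.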
Practical Balance
A two-node durability does not always mean that the system will stall or lose data. A very specific sequence of failures has to happen:
Leader accepts a request
Leader attempts to send the request to multiple recipients
Only one recipient receives and acknowledges the request
Leader returns a success to the client
Both the leader and that recipient crash
This type of failure can happen if the leader and the recipient node are network partitioned from the rest of the cluster. We can mitigate this failure by requiring the ackers to live across network boundaries.
The likelihood of a replica node in one cell failing after an acknowledgment, and the leader failing in the other cell after returning success, is much lower. This failure mode is rare enough that many users treat this level of risk as acceptable.
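The mitigation of requiring ackers to live across network boundaries can be expressed as a small durability predicate. This is only a sketch: the function name and the cell labels are assumptions for illustration, not how Vitess encodes its durability policies.

```python
def is_durable(leader_cell, acker_cells):
    """Treat a request as durable only once at least one acknowledgment
    comes from a cell other than the leader's."""
    return any(cell != leader_cell for cell in acker_cells)


print(is_durable("cell-a", ["cell-a"]))            # False: ack is in the leader's cell
print(is_durable("cell-a", ["cell-a", "cell-b"]))  # True: one ack crossed the boundary
```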
Orders of Magnitude
The most common operation performed by a consensus system is the completion of requests. In contrast, a leader election generally happens in two cases: taking nodes down for maintenance, or upon failure.
Even in a dynamic cloud environment like Kubernetes, it would be surprising to see more than one election per day for a cluster, whereas such a system could be serving hundreds of requests per second. That amounts to many orders of magnitude in difference between a request being fulfilled and a leader election.
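To put a rough number on that gap, here is the arithmetic under two assumptions taken from the low end of the figures above: 100 requests per second and one election per day.

```python
import math

requests_per_second = 100   # assumed low end of "hundreds of requests per second"
elections_per_day = 1       # assumed: one election per day
seconds_per_day = 24 * 60 * 60

requests_per_election = requests_per_second * seconds_per_day / elections_per_day
orders_of_magnitude = math.log10(requests_per_election)

print(f"{requests_per_election:,.0f} requests per election")   # 8,640,000 requests per election
print(f"about {orders_of_magnitude:.1f} orders of magnitude")  # about 6.9 orders of magnitude
```

Even with these conservative inputs, requests outnumber elections by nearly seven orders of magnitude.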
This means that we must do whatever it takes to fine tune the part that executes requests, whereas leader elections can be more elaborate and slower. This is the reason why we have a bias towards reducing the durability settings to the bare minimum. Expanding this number can adversely affect performance, especially the tail latency.
At YouTube, although the quorum size was big, a single ack from a replica was sufficient for a request to be deemed completed. On the other hand, the leader election process had to chase down all possible nodes that could have acknowledged the last transaction. We did consciously trade off on the number of ackers to avoid going on a total wild goose chase.
In the next blog post, we will take a short detour. Shlomi Noach will talk about how some of these approaches work with MySQL and semi-sync replication. Following this, we will continue pushing forward on the implementation details of these algorithms.
Read the full Consensus Algorithms series
Consensus Algorithms at Scale: Part 1 — Introduction
Consensus Algorithms at Scale: Part 2 — Rules of consensus
You just read: Consensus Algorithms at Scale: Part 3 — Use cases
Next up: Consensus Algorithms at Scale: Part 4 — Establishment and revocation
Consensus Algorithms at Scale: Part 5 — Handling races
Consensus Algorithms at Scale: Part 6 — Completing requests
Consensus Algorithms at Scale: Part 7 — Propagating requests
Consensus Algorithms at Scale: Part 8 — Closing thoughts]]></content>
        <summary><![CDATA[Consensus Use Cases]]></summary>
      </entry>
    
      <entry>
        <title>Orchestrator failure detection and recovery: New Beginnings</title>
        <link href="https://planetscale.com/blog/orchestrator-failure-detection" />
        <id>https://planetscale.com/blog/orchestrator-failure-detection</id>
        <published>2020-09-19T04:00:00.000Z</published>
        <updated>2020-09-19T04:00:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Orchestrator is an open source MySQL replication topology management and high availability solution. Vitess has recently integrated orchestrator as a native component of its infrastructure to achieve reliable failover, availability, and topology resolution of its clusters. This post first illustrates the core logic of orchestrator’s failure detection, and proceeds to share how the new integration adds new failure detection and recovery scenarios, making orchestrator’s operation goal-oriented.
Note: in this post we adopt the term “primary” over the term “master” in the context of MySQL replication.
Orchestrator’s holistic failure detection
Vitess and orchestrator both use MySQL’s asynchronous (async) or semi-synchronous replication. For the purposes of this post, the discussion is limited to async replication. In an async setup, we have one primary server and multiple replicas. The primary is the single writable server and the replicas are all read-only, mainly being used for read scale-out, backups, etc. While MySQL offers a multi-writable primaries setup, it is commonly discouraged, and Vitess does not support it (in fact, a multi-writer setup is considered a failure scenario as described later on).
The most critical and important failure scenario in an async topology is a primary’s outage. Either the primary server has crashed, or is network isolated: the result is that there are no writes on the cluster, and the replicas are left hanging with no server to replicate from.
Common failure detection practices
How does one diagnose that the primary server is healthy? A common practice is to see that port :3306 is open. More reliably, we can send a trivial query, such as SELECT 1 FROM DUAL. Even more reliable is to query for actual information: a status variable, or actual data. All these techniques share a similar problem. What if the primary server doesn’t respond?
A naive conclusion is that the primary is down, kicking off a failover sequence. However, this may well be a false positive since there could be a network glitch. It is not uncommon to miss a communication packet once in a while, so database clients are commonly configured to retry a couple times upon error. The common way to reduce such false positives is to run multiple checks, successively: if the primary fails a health check, try again in, say, 5 seconds, and again, and again, up to n times. If the nth test still fails, we determine the server is indeed down.
Yet this approach introduces a few problems:
Exactly how many tests are enough?
Exactly what is a reasonable check interval?
What if the primary is really down? We have wasted n * interval seconds to double check, triple check, etc., when we could have failed over sooner.
What if the primary is really up, and the problem is with the network between the primary and our testing endpoint? That’s a false positive and we failed over for nothing.
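The retry practice and its cost can be sketched in a few lines. This is an illustration of the common approach described above, not orchestrator’s logic; the function name, the check callable, and the default values are all assumptions.

```python
import time


def naive_primary_is_down(check, attempts=3, interval_seconds=5.0, sleep=time.sleep):
    """Declare the primary down only after `attempts` consecutive failed checks.

    `check` is any zero-argument callable returning True when the primary
    answered (for example, a wrapper around a trivial query).
    """
    for attempt in range(attempts):
        if check():
            return False  # one success clears the alarm
        if attempt < attempts - 1:
            sleep(interval_seconds)  # wait before double checking
    return True


# Simulate a primary that is really down, recording the time spent waiting.
waited = []
print(naive_primary_is_down(lambda: False, attempts=4, interval_seconds=5.0,
                            sleep=waited.append))  # True
print(sum(waited))  # 15.0 seconds spent on re-checks before any failover starts
```

Every extra attempt buys some protection against a network glitch, and costs another interval of outage when the primary really is down.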
Consider the last bullet point. Some monitoring solutions run health checks from multiple endpoints, and require a quorum, an agreement of the majority of check endpoints that there is indeed a problem. This kind of setup must be used with care; the placement of the endpoints in different availability zones is critical to achieve sensible quorum results. Once that’s done, though, the triangulation is powerful and useful.
Orchestrator’s approach
Orchestrator uses a different take on triangulation. It recognizes that there are more players in the field: the replicas. The replicas connect to the primary over MySQL protocol, and request the changelog so as to follow up on the primary’s footsteps. To evaluate a primary failure, orchestrator asks:
Am I failing to communicate with the primary? And,
Are all replicas failing to communicate with the primary?
If, for example, orchestrator is unable to reach the primary, but can reach the replicas, and the replicas are all happy and confident that they can read from the primary, then orchestrator concludes there’s no failure scenario. Possibly some of the replicas themselves are unreachable: maybe a network partitioning or some power failure took out both the primary and a few of the replicas. orchestrator can still reach a conclusion by the state of all available replicas.
It’s noteworthy that orchestrator itself runs in a highly available setup, across availability zones, where orchestrator requires quorum leadership so as to be able to run failovers in the first place, mitigating network isolation incidents. But this discussion is outside the scope of this post.
Orchestrator doesn’t do check intervals and a number of tests. It needs a single observation to act. Behind the scenes, orchestrator relies on the replicas themselves to run retries in intervals; that’s how MySQL replication works anyhow, and orchestrator utilizes that.
This holistic approach, where orchestrator triangulates its own checks with the servers’ checks, results in a highly reliable detection method. Iterating our example, if orchestrator thinks the primary is down, and all the replicas say the primary is down, then a failover is justified: the replication cluster is effectively not receiving any writes, the data becomes stale, and that much is observable all the way to the users and client apps. The holistic approach further allows orchestrator to treat other scenarios: an intermediate replica (e.g. 2nd level replica in a chained replication tree) failure is detected in exactly the same way. It further offers granularity into the failure severity. orchestrator is able to tell that the primary is seen down, while replicas still disagree. Or that replicas think the primary is down while orchestrator can still see it.
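The triangulation can be summarized as a small decision function. This is a simplified sketch of the reasoning in this section, not orchestrator’s actual analysis code, and the diagnosis labels are invented for illustration.

```python
def diagnose_primary(orchestrator_sees_primary, replicas_see_primary):
    """Combine orchestrator's own probe with what each reachable replica reports.

    `replicas_see_primary` holds one boolean per reachable replica: True if
    that replica can still replicate from the primary.
    """
    # With no reachable replicas there is no second opinion, so this stays False.
    all_replicas_failing = bool(replicas_see_primary) and not any(replicas_see_primary)
    if not orchestrator_sees_primary and all_replicas_failing:
        return "primary-down"           # both views agree: a failover is justified
    if not orchestrator_sees_primary:
        return "primary-unreachable"    # replicas disagree: no failover yet
    if all_replicas_failing:
        return "replicas-lost-primary"  # orchestrator can still see the primary
    return "healthy"


print(diagnose_primary(False, [False, False]))  # primary-down
print(diagnose_primary(False, [True, False]))   # primary-unreachable
```

Only the first outcome justifies a failover. The two middle outcomes are the finer-grained severities mentioned above.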
Emergency detection operations
If orchestrator can’t see the primary, but can see the replicas, and they still think the primary is up, should this be the end of the story?
Not quite. We may well have an actual primary outage, it’s just that the replicas haven’t realized it yet. If we wait long enough, they will eventually report the failure; but orchestrator wishes to reduce total outage time by resolving the situation as early as possible.
Orchestrator offers a few emergency detection operations, which are meant to speed up failure detection. Examples:
As in the above, orchestrator can’t see the primary. Emergently probe the replicas to check what they think. Normally each server is probed once in a few seconds, but orchestrator now chooses to probe sooner.
A first tier replica reports it can’t see the primary. The rest of the replicas are fine, and orchestrator can see the primary. This is still very suspicious, so orchestrator runs an emergency probe on the primary. If that fails, then we’re on to something, falling back to the first bullet.
orchestrator cannot reach the primary, replicas can all reach the primary, but lag on replicas is ever increasing. This may be a limbo scenario caused by either a locked primary, or a “too many connections” situation. The replicas are likely to be some of the oldest connections to the primary. New connections cannot reach the primary and to the app it seems down, but replicas are still connected. orchestrator can analyze that and emergently kick a replication restart on all replicas. This closes and reopens the TCP connections between replicas and primary. On locked primary or on “too many connections” scenarios, replicas are expected to fail reconnecting, leading to a normal detection of a primary outage.
Orchestrator and your replication clusters
An important observation is that orchestrator knows what your replication clusters actually look like, but doesn’t have the meta information about what they should look like. It doesn’t know if some standalone server should belong to this or that cluster; if the current primary server is indeed what’s advertised to your application; if you really intended to set up a multi-primary cluster. It is generic in that it allows a variety of topology layouts, as requested and used by the greater community.
Old Vitess-orchestrator integration
For the past few years, orchestrator was an external entity to Vitess. The two would collaborate over a few API calls. orchestrator did not have any Vitess awareness, and much of the integration was done through pre- and post- recovery hooks, shell scripts and API calls. This led to known situations where Vitess and orchestrator would compete over a failover, or make some operations unknown to each other, causing confusion. Clusters would end up in split state, or in co-primary state. The loss of a single event could cause cluster corruption.
Orchestrator as first class citizen in Vitess
We have recently integrated orchestrator into Vitess as an integral part of the Vitess infrastructure. This is a specialized fork of orchestrator that is Vitess-aware. In fact, the integrated orchestrator is able to run Vitess native functions, such as locking shards or fetching tablet information.
The integration makes orchestrator both cluster aware and goal driven.
Cluster-awareness
MySQL itself has no concept of a replication cluster (not to be confused with InnoDB cluster or MySQL Cluster): servers just happen to replicate from each other, and MySQL has no opinion on whether they should replicate from each other, or what’s the overall health and status of the replication tree. orchestrator can share observations and opinions on the replication tree, based on what it can see. Vitess, however, has a firm opinion on what it expects. In Vitess, each MySQL server has its own vttablet, an agent of sorts. The tablet knows the identity of the MySQL server: which schema it contains; which shard it is part of; what role it assumes (primary, replica, OLAP, ...) etc. The integrated orchestrator now gets all of the MySQL metadata directly from the Vitess topology server. It knows beyond doubt that two servers belong to the same cluster, not because they happen to be connected in a replication chain, but because the metadata provided by Vitess says so. orchestrator can now look at a standalone, detached server, and tell that it is, in fact, supposed to be part of some cluster.
Goal driven
This cluster awareness is a fundamental change in orchestrator’s approach, and allows us to make orchestrator goal-driven. orchestrator’s goal is to ensure a cluster is always in a state compatible with what Vitess expects it to be. This is accomplished by introducing new failure detection modes not possible before, and new recovery methods that would otherwise be too opinionated. Examples:
orchestrator observes a standalone server. According to Vitess’ topology server, that server is a REPLICA. orchestrator diagnoses this as a “replica without a primary” and proceeds to connect it with the proper replication cluster, after validating that GTID-wise the operation is supported.
orchestrator observes a REPLICA that is writable. Vitess does not support that setup. orchestrator turns the replica to be read-only.
Likewise, orchestrator sees that the primary is read-only. It switches it to be writable.
orchestrator detects a multi-primary setup (circular replication). Vitess strictly forbids this setup. orchestrator checks with the topology service which of the two is marked as the true primary, then makes the other(s) standard replicas. To emphasize the point, a multi-primary setup is considered to be a failure scenario.
Possibly the most intriguing scenario is where orchestrator sees a fully functional replication tree, with writable primary and read-only replicas, but notices that Vitess thinks the primary should be one of the replicas, and that the server that acts as the cluster’s primary should be a replica. This situation can result from a previous, prematurely terminated failover process. In this situation, orchestrator runs a graceful-takeover (or a planned-reparent, in Vitess jargon) to actually promote the correct server as the new primary, and to demote the “impersonator” primary.
Thus, Vitess has an opinion of what the cluster should look like, and orchestrator is the operator that makes it so. It is furthermore interesting to note that orchestrator’s operations will either fail or converge to the desired state.
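The simpler examples above amount to comparing the role Vitess expects with what is observed, then emitting corrective steps. The sketch below covers only those simple cases; the function, the dictionary keys, and the action strings are hypothetical and do not reflect the integrated orchestrator’s real interfaces.

```python
def plan_actions(expected_role, observed):
    """Compare one server's observed state with the role Vitess expects
    and return the corrective steps, in order."""
    actions = []
    if expected_role == "replica":
        if observed["replicating_from"] is None:
            # Standalone server that should be a replica.
            actions.append("validate GTIDs, then attach to the cluster's primary")
        if not observed["read_only"]:
            actions.append("set read-only")
    elif expected_role == "primary":
        if observed["read_only"]:
            actions.append("set writable")
    return actions


# A standalone, writable server that the topology server lists as a REPLICA.
print(plan_actions("replica", {"replicating_from": None, "read_only": False}))
# A server already in its expected state needs no action.
print(plan_actions("primary", {"replicating_from": None, "read_only": False}))  # []
```

Running the plan again after the actions succeed yields an empty list, which is one way to picture operations converging to the desired state.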
But, what if a primary unexpectedly fails? What server should be promoted?
Orchestrator’s promotion logic
On an unexpected failure, it is orchestrator’s job to pick and promote the most suitable server, and to advertise its identity to Vitess. The new interaction ensures this is a converging process and that orchestrator and Vitess do not conflict with each other over who should be the primary. Orchestrator promotes a server based on multiple limiting factors: is the server configured such that it can be a primary, e.g. has binary logs enabled? Does its version match the other replicas? What is the general recommendation for the specific host (metadata acquired from Vitess)? But there are also general, non server-specific rules that dictate what promotions are possible. Do we strictly have to only failover within the same data center? The same region/availability zone? Or, do we strictly have to only failover outside the data center? Do we only ever failover onto a server configured as semi-sync replica? And how do we reconfigure the cluster after promotion?
Previously, some of these questions were answered by configuration variables, and some by the user’s infrastructure. However, the new integration allows the user to choose a failover and recovery policy that is described in code. Orchestrator and Vitess already support three pre-configured modes, but will also allow the user to define any arbitrary (within a set of rules) policy they may choose.
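A few of these limiting factors can be pictured as filters over the candidate replicas. The sketch below is purely illustrative: the Candidate fields, the same_dc_only flag, and the host names are assumptions, and the real promotion logic weighs more factors than shown here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    host: str
    data_center: str
    binlog_enabled: bool
    version: str


def eligible_for_promotion(candidates, failed_primary_dc, cluster_version,
                           same_dc_only=False):
    """Filter replicas down to those that satisfy the limiting factors."""
    eligible = []
    for c in candidates:
        if not c.binlog_enabled:
            continue  # cannot act as a primary without binary logs
        if c.version != cluster_version:
            continue  # version must match the other replicas
        if same_dc_only and c.data_center != failed_primary_dc:
            continue  # policy: only fail over within the same data center
        eligible.append(c)
    return eligible


replicas = [
    Candidate("db2", "dc1", True, "8.0"),
    Candidate("db3", "dc2", True, "8.0"),
    Candidate("db4", "dc1", False, "8.0"),
]
print([c.host for c in eligible_for_promotion(replicas, "dc1", "8.0")])
print([c.host for c in eligible_for_promotion(replicas, "dc1", "8.0", same_dc_only=True)])
```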
More on that in a future post.]]></content>
        <summary><![CDATA[How the new integration adds new failure detection and recovery scenarios making orchestrator’s operation goal-oriented.]]></summary>
      </entry>
    
      <entry>
        <title>Consensus algorithms at scale: Part 2 - Rules of consensus</title>
        <link href="https://planetscale.com/blog/consensus-algorithms-at-scale-part-2" />
        <id>https://planetscale.com/blog/consensus-algorithms-at-scale-part-2</id>
        <published>2020-09-09T04:00:00.000Z</published>
        <updated>2020-09-09T04:00:00.000Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[If you’re still catching up, you can find links to each article in the series at the bottom of this article.
The Rules of Consensus
YouTube Scale
When we were running Vitess at YouTube, there were tens of thousands of nodes serving very high QPS. The scale out was in all dimensions: some of the shards had over fifty replicas. The topology was complicated with these nodes being spread out across multiple data centers. To make this work, we had to strike a balance between latency, availability and durability. To meet these requirements, we used to perform regular failovers that were mostly automated. I am happy to say we never lost data due to hardware failure.
The first time we heard about Paxos, it sounded magical: an algorithm that will dynamically elect a leader to ensure that all requests are fulfilled without errors, divergence, or data loss. The Vitess failovers used to take a few seconds, and we wanted to avoid serving errors during this period.
We started to evaluate Paxos to see if it could be retrofitted into Vitess. We quickly found that our quorum sizes would have been too big for a majority based algorithm. Also, the MySQL replication mechanism didn’t look anything like the durability mechanism Paxos was describing. Our only option was to do a gap analysis between the two systems. In a way, this is what led to the discovery of FlexPaxos: an additional knob that allows you to achieve a more meaningful performance vs. safety trade-off.
There were other differences: our failover algorithms did not look anything like what Paxos recommends. Studying how the two systems differed led to the discovery of a common set of principles that any leader-based system can follow to guarantee correctness and safety. We will cover these rules in this blog post.

Why Consensus
We are focused on using Consensus to address durability at scale. It is possible that there are other use cases, but we are not concerned about those.
No system can give you absolute durability guarantees; there is always the possibility of a catastrophic failure that is bigger than anticipated. You must decide the level of failure tolerance you want. This depends on the reliability of the resources and the criticality of the data. In other words, durability requirements are use-case dependent.
To accommodate all possible use cases, we will treat durability as an abstract requirement. The algorithms must be agnostic of these requirements, and should be able to accommodate arbitrarily complex rules. This changes the way we approach the problem, and we will go through this exercise in the following sections.
Single Value Behavior
Paraphrasing the definition from Paxos: the primary guarantee we want from a consensus system is that it must not forget a value it has acknowledged as accepted. Once a value is accepted, all other values must be rejected.
When asked to accept a single value, the operation would have one of the three following outcomes:
Accepted: the value was successfully accepted.
Rejected: the value was rejected.
Failed: the operation did not succeed, but may succeed later.
If the first request was Accepted, then any subsequent attempts to write a different value will be Rejected.
If the first request was Rejected, it likely means that a previous value was accepted before our “first” attempt. In this case, subsequent requests will also be Rejected.
If the first request Failed and a second request is made, the system can choose to finalize either of the requests as Accepted, but not both. Since the second request can also Fail, we need to restate this more generally: the system can choose to Accept any previously requested values as final. Pathological failure modes can cause the system to remain in the Failed state indefinitely. But it is generally expected to converge eventually.
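These outcomes can be modeled with a toy single-value register. This is a sketch of the behavior described above, not a consensus implementation: there is no replication, the durable flag merely simulates whether the durability requirement was met, and the class and method names are invented.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    FAILED = "failed"


class SingleValueRegister:
    def __init__(self):
        self.accepted = None   # the finalized value, once there is one
        self.requested = []    # values from attempts that Failed

    def propose(self, value, durable=True):
        """`durable=False` simulates an attempt that could not meet the
        durability requirement."""
        if self.accepted is not None:
            # Re-sending the accepted value is treated as idempotent (assumption).
            return Outcome.ACCEPTED if value == self.accepted else Outcome.REJECTED
        if not durable:
            self.requested.append(value)
            return Outcome.FAILED
        # Simplification: finalize the value of the attempt that succeeded.
        # The rules would equally allow finalizing any value in `requested`.
        self.accepted = value
        return Outcome.ACCEPTED


reg = SingleValueRegister()
print(reg.propose("a", durable=False))  # Outcome.FAILED
print(reg.propose("b"))                 # Outcome.ACCEPTED
print(reg.propose("a"))                 # Outcome.REJECTED
```

The last call shows the key guarantee: once "b" is accepted, the earlier failed value "a" can no longer win.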
In Practice
It is not very practical for a system to just accept a single value. Instead, let us see what should happen if we changed the specifications to a system that accepts a series of values, which is what storage systems typically do.
If a system first accepts a value v1, and later receives a request for v2, it must record v2 as having happened after v1. The more significant property is the following: If the request for v1 failed because the system was not able to meet the durability requirements, then a request for v2 requires the system to make a final decision on whether v1 should be completed or rejected. If completed, it will record v2 after v1. Otherwise, v1 is discarded and v2 will be the only accepted value.
Raft understood this, which is why they describe their system as a way to achieve consistent log replication.
In our case, we will think in terms of requests rather than values, which can be any operation a storage system may have to perform. It could be a transaction, or setting the value for a key, or any other atomic operation.
Let us restate: The purpose of a consensus system is to accept a series of requests in a strict order and keep them consistent across multiple nodes.
Single Leader
To limit complexity and scope, we are going to stick to single leader designs. The popular implementations that I know use the single leader approach. There is research on leaderless and multi-leader algorithms. But I am not very familiar with them.

A Single Leader consensus system is a combination of two workflows that cooperate with each other:
A leader accepts requests and makes them durable.
A new leader can be elected to resume requests without divergence or loss of data.
Paxos and Raft also use the single leader approach.
The Rules
Now that we have spent enough time building up the premise, it is time to codify the rules governing a consensus system:
1. A Leader’s job is to fulfill requests by satisfying the mandated durability requirements.
2. To elect a new Leader, the following actions must be performed:
a. Terminate the previous Leadership, if any.
b. Recruit the necessary nodes for the new Leader.
c. Propagate previously completed requests to satisfy the new Leader’s durability requirements.
3. Forward Progress: If a Leader election fails, a subsequent re-attempt should have a path to success, where a new Leader can be elected without breaking the durability and safety guarantees.
4. Race: If concurrent attempts are being made to elect a Leader, at most one Leader must prevail.
Rules 3 & 4 are actually implicit in rule number 2. But these properties are so important that it’s worth making them explicit.
These rules are intentionally generic to allow for creativity in achieving these goals. In fact, being more specific than this will exclude some valid implementations. However, we will show multiple ways to satisfy these rules. We will also validate the existing popular algorithms against this new set of rules.
You will notice the following differences:
No mention of a majority quorum.
No mention of intersection of nodes.
No proposal numbers.
This is where we deviate from traditional systems because we believe these are not strictly required for a consensus system to operate correctly. For example, you can build a consensus system with fifty nodes, but still only have a quorum size of two. There is no need for these quorums to intersect across leaders. As for proposal numbers, very few understand why they are even needed. It is better to discuss what we are trying to achieve, and then introduce proposal numbers as one option, and maybe consider alternatives that don’t involve proposal numbers. We will drill down into each of these properties and explore trade-offs between multiple approaches.
In the next post, we will cover some practical use cases that this generalized set of rules allows us to cover. We will also drill deeper into the meaning and significance of these rules.
Read the full Consensus Algorithms series
Consensus Algorithms at Scale: Part 1 — Introduction
You just read: Consensus Algorithms at Scale: Part 2 — Rules of consensus
Next up: Consensus Algorithms at Scale: Part 3 — Use cases
Consensus Algorithms at Scale: Part 4 — Establishment and revocation
Consensus Algorithms at Scale: Part 5 — Handling races
Consensus Algorithms at Scale: Part 6 — Completing requests
Consensus Algorithms at Scale: Part 7 — Propagating requests
Consensus Algorithms at Scale: Part 8 — Closing thoughts]]></content>
        <summary><![CDATA[The Rules of Consensus]]></summary>
      </entry>
    
      <entry>
        <title>On joining PlanetScale and the vision of open source database infrastructure</title>
        <link href="https://planetscale.com/blog/on-joining-planetscale-and-the-vision-of-open-source-database-infrastructure" />
        <id>https://planetscale.com/blog/on-joining-planetscale-and-the-vision-of-open-source-database-infrastructure</id>
        <published>2020-09-01T19:30:00.000Z</published>
        <updated>2020-09-01T19:30:00.000Z</updated>
        
        <author>
          <name>Shlomi Noach</name>
        </author>
        
        
        <category term="company" />
        
        <content><![CDATA[This is a personal account of why I joined PlanetScale to work on Vitess and PlanetScaleDB, and what I perceive Vitess can become in the MySQL open source ecosystem.
Background
I am a software engineer, drawn into the database world. I’ve been working with MySQL for two decades now, and am authoring or have authored various open source tools in the MySQL ecosystem: orchestrator for high availability, gh-ost for online schema migrations, freno for throttling and others. In the past decade I’ve worked mostly on infrastructure and distributed systems, work that produced those open source solutions.
The MySQL community enjoys a plethora of open source solutions, and an open discussion between its members. Many companies run MySQL infrastructure as a means to enable their product, and are happy to share advice and experience on all things database infrastructure.
Yet, throughout the years I’ve experienced a growing frustration. As much as we, the community, collaborated and shared solutions, we were still all re-inventing the same things. Writing an open source tool and sharing it with the world is great, but deploying and automating it on another platform always hits issues. Different companies have different infrastructure, different deployment mechanisms, different network setups, different availability zones, different expectations on how to address availability issues, different versions, different topology layouts, etc. The effort to integrate a 3rd party tool is sometimes measured in months. Companies would just resort to developing their own in-house solutions, tightly integrated with their own infrastructure and flow.
We’ve had some discussions about that. We tried to run some community collaboration on a project or two. But people, myself included, have their own workload and schedule, and this didn’t take off. There were ideas for better integration between some of the open source tools. Some such integrations exist. But there’s nothing to the extent that solves or seriously empowers infrastructure and deployments.
Enter Vitess
Many consider Vitess an “open source sharding framework for MySQL”, and, indeed, Vitess solves some of the hardest problems in the database world: sharding, live resharding, data locality, and geo-placement. But Vitess is more than that. Architecturally, it provides the mechanisms to run a full-blown infrastructure for a database deployment.
Vitess runs in a proxy/agent/controller setup. Clients connect to the proxy (vtgate) which routes their queries to the correct agents/shards/backend databases (vttablet). A controller (vtctld) can refactor and operate on topologies. This puts Vitess in the position to be able to automate away, and hide, much of database operational complexity.
Vitess’ design puts it in a position to further address more of the hardest problems: high availability, consistency, disaster recovery, online schema migrations, consistent reads, throttling, automated and tested backups and restores, rolling upgrades, data retention, local serving, load balancing, service discovery, and more. Some of these are already in the works, or already exist and are being improved at this time (more to come in future technical blog posts). And it is this design that allows Vitess to interact with, or fully integrate, existing open source tools in the MySQL ecosystem.
In my personal vision, Vitess is not a sharding framework, but an infrastructure framework for MySQL. Set up Vitess and get online schema migrations for free, no setup required. Get failovers and service discovery out of the box. Customize your HA/consistency rules if you like. Get throttling and consistent reads built in. I envision Vitess as the Grand Unified Theory for database infrastructure, that integrates existing tools, knowledge and experience in one well coordinated solution.
Fortunately my vision is compatible with the vision that the PlanetScale team has for Vitess and MySQL. We will work to make Vitess itself simpler and easier to install and use, while bringing in more infrastructure solutions. We plan to write and share more about the engineering challenges and developments we do here.
Shlomi Noach Software Engineer and database geek github.com/shlomi-noach @ShlomiNoach]]></content>
        <summary><![CDATA[Why I joined PlanetScale to work on Vitess and PlanetScaleDB and what I perceive Vitess can become in the MySQL open source ecosystem.]]></summary>
      </entry>
    
      <entry>
        <title>Consensus algorithms at scale: Part 1 - Introduction</title>
        <link href="https://planetscale.com/blog/consensus-algorithms-at-scale-part-1" />
        <id>https://planetscale.com/blog/consensus-algorithms-at-scale-part-1</id>
        <published>2020-08-28T07:00:00.000Z</published>
        <updated>2020-08-28T07:00:00.000Z</updated>
        
        <author>
          <name>Sugu Sougoumarane</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Be sure to follow along with this eight-part series. You will find all posts in the series linked at the bottom of each article.
Introduction
Consensus algorithms in their theoretical and applied forms can be difficult to reason about. Often, these algorithms are solutions that have stumbled upon some good problems to solve. Unfortunately, the problems are evolving. And I don’t think these solutions are going to remain relevant much longer. Let’s start with defining the problems they solve:
Distributed Durability: In case of node failures, your data is guaranteed to be elsewhere.
Availability: The ability for the system to continue serving if some nodes have become unavailable.
Automation: If there is a failure, the system knows how to remedy itself without human intervention.

Strictly speaking, one could argue that Automation is a different theoretical problem because it requires failure detection. But the reality is that today’s systems expect consensus systems to satisfy the above properties.
Let us now turn this around: If we had started out with these requirements, would we have ended up with something like Paxos or Raft as the best solution? Before we can answer this question, we need a better understanding of the requirements.
More importantly, cloud providers are coming up with complex topologies like zones and regions. They have pricing structures that encourage specific configurations. It is important that the systems we build are capable of adapting to these nuances. It is only a matter of time before these rigid algorithms start to run out of flexibility.
The spoiler here is that we are building this type of flexibility in Vitess: You specify what is important to you, and what (reasonable) trade-offs you are willing to make. And Vitess will have the knobs to exactly match these parameters without compromising on anything else.
However, we need to satisfy the skeptic’s concern: can you build such a system using vanilla MySQL? The short answer is yes.
The approach
In this series of blog posts, I’ll take you through a journey where we will dissect consensus algorithms. We’ll break them up into smaller concerns, and we’ll build a new set of rules and principles from which a variety of more flexible algorithms can be built. We will conclude with how to achieve these objectives in Vitess.
As a disclaimer, this is an engineering approach. So, if you are expecting proof, you’ll likely be disappointed. I will instead be using and sharing intuitions developed from running storage systems at massive scale. Consequently, we will make two changes to how we approach this problem:
Use engineering terminology. This is more for my own sake, because it is hard to reason about how an academic concept maps to real-world scenarios.
Use an approach based on objectives to be achieved: approaching the problem top-down, identifying the concerns, and keeping them separate.
The second aspect is significant because most consensus algorithms perform orchestrated actions that achieve multiple objectives at the same time. It is hard to know why a decision was made a certain way and what the trade-offs would be if a different approach were used.
With a better understanding of the concerns, we can make better trade-offs without being stuck with rigid implementations.
Read the full Consensus Algorithms series
You just read: Consensus Algorithms at Scale: Part 1 — Introduction
Next up: Consensus Algorithms at Scale: Part 2 — Rules of consensus
Consensus Algorithms at Scale: Part 3 - Use cases
Consensus Algorithms at Scale: Part 4 - Establishment and revocation
Consensus Algorithms at Scale: Part 5 - Handling races
Consensus Algorithms at Scale: Part 6 - Completing requests
Consensus Algorithms at Scale: Part 7 - Propagating requests
Consensus Algorithms at Scale: Part 8 — Closing thoughts]]></content>
        <summary><![CDATA[This is a multi-part blog series and will be updated with links to the corresponding posts.]]></summary>
      </entry>
    
      <entry>
        <title>Learn Horizontal Scaling on PlanetScaleDB with Vitess — Rate Puppies in a Rust app with Sharded MySQL Database</title>
        <link href="https://planetscale.com/blog/learn-horizontal-scaling-on-planetscaledb-with-vitess-rate-puppies-in-a-rust-app-with-sharded-mysql-database" />
        <id>https://planetscale.com/blog/learn-horizontal-scaling-on-planetscaledb-with-vitess-rate-puppies-in-a-rust-app-with-sharded-mysql-database</id>
        <published>2020-08-14T19:45:00.000Z</published>
        <updated>2020-08-14T19:45:00.000Z</updated>
        
        <author>
          <name>Jiten Vaidya</name>
        </author>
        
        
        <category term="engineering" />
        
        <content><![CDATA[Since writing this blog we have released a new version of PlanetScale. Learn more about what we’ve built and give it a try, and be sure to check out our docs.
Please note, this blog refers to PlanetScaleDB v1 and is not applicable to our latest product.
At PlanetScale, we have built PlanetScaleDB, a fully managed database-as-a-service on top of open source Vitess that enables horizontal scaling of MySQL far beyond what you can do with a single instance. In this blog, we’ll explain how sharding works in Vitess and on PlanetScaleDB.
A sharded database is a collection of multiple databases (shards) with identical relational schemas. Vitess allows your application to treat a sharded database as though it is a humongous monolithic database without having to worry about the complexities of sharding. Because of this, you can start with a small database on PlanetScaleDB and grow to massive scale without changing your application logic.
In this blog post, we’ll explore the Vitess sharding concepts: VSchema, Vindex, and Vitess Sequences using a sample dog rating application called “Goodest Doggo”. This sample app allows users to rate puppies and, as you can imagine, needs to be designed to grow to humongous scale.
How is Vitess sharding different?
A sharded database is a collection of multiple databases (shards) with identical relational schemas. Many database systems that use shards for scaling shard the data in a table without consideration for co-locating the data that belongs together. This results in inefficiencies around writing the data in a transactionally consistent fashion as well as reading the data. Vitess, in contrast, allows you to co-locate the related data.
Assume you have a users table with the id column as the primary key and an orders table with its own id column as its primary key and user_id column as a secondary key. Vitess allows you to shard the users table using its id column and shard the orders table using the user_id column. This ensures both the user row for a user and the order rows for that user live in the same shard. A VSchema allows you to express this information.
In other words, just as a relational schema tells us how the data is organized within a single database, using tables, columns and indexes, Vitess uses a VSchema to define how the data is organized in shards across multiple databases. This allows us to define mechanisms which make it more efficient to access the data from these shards.
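The co-location idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shard count, the sample rows, and the `shard_for` helper are hypothetical, and MD5 is a stand-in hash, not the function that the Vitess "hash" Vindex actually applies. The point is that when users is sharded by id and orders by user_id, both rows are routed using the same value and so land in the same shard.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count for this sketch


def shard_for(sharding_key: int) -> int:
    """Map a sharding key to a shard index.

    Stand-in hash: MD5 is used only for illustration and is NOT the
    function applied by the built-in Vitess "hash" Vindex.
    """
    digest = hashlib.md5(str(sharding_key).encode()).digest()
    # Use the first byte (0..255) to pick one of NUM_SHARDS equal ranges.
    return digest[0] * NUM_SHARDS // 256


# users is sharded by id, orders by user_id, so both use the user's id.
user = {"id": 42, "email": "a@example.com"}
orders = [
    {"id": 1001, "user_id": 42},
    {"id": 1002, "user_id": 42},
]

user_shard = shard_for(user["id"])
order_shards = {shard_for(order["user_id"]) for order in orders}
print(user_shard, order_shards)
```

Had orders been sharded by its own id column instead, the two order rows could hash to different shards than the user row, and reading a user together with their orders would have to touch several shards.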
What are the elements of a VSchema?
Just as you would write a SQL statement that defines your relational database schema by creating table definitions which define columns and indexes, you define a VSchema (or a sharding scheme as it is called in PlanetScaleDB) by populating a JSON document that has the following information:
"sharded": whether the database is sharded or not
"vindexes": definition of all vindex types used in the VSchema
"tables": one entry for each table; each table entry has the following information:
a. Primary Vindex applied to a column in the table
b. (optional) Secondary Vindexes applied to columns
c. (optional) Sequence definitions for id columns
As an example, consider the following table named “puppers” that represents the dogs in our application:
CREATE TABLE IF NOT EXISTS puppers (
  `id` BIGINT(22),
  `name` VARCHAR(256),
  `image` VARCHAR(256),
  PRIMARY KEY(id)
);

Here is an example of a simple VSchema for a sharded database for the table above that illustrates the three elements of a VSchema:
{
  "sharded": true,
  "vindexes": {
    "hash_vdx": {
      "type": "hash"
    }
  },
  "tables": {
    "puppers": {
      "column_vindexes": [
        {
          "column": "id",
          "name": "hash_vdx"
        }
      ]
    }
  }
}
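
To make the three elements concrete, here is a small Python sketch that loads a VSchema document like the one above and reports each table's primary Vindex. The `primary_vindexes` helper is hypothetical and not part of Vitess. It assumes the lowercase key `column_vindexes` and that the first entry in that list acts as the primary Vindex.

```python
import json

# A VSchema document with the three elements: sharded, vindexes, tables.
VSCHEMA = """
{
  "sharded": true,
  "vindexes": {"hash_vdx": {"type": "hash"}},
  "tables": {
    "puppers": {
      "column_vindexes": [{"column": "id", "name": "hash_vdx"}]
    }
  }
}
"""


def primary_vindexes(vschema: dict) -> dict:
    """Return {table: (column, vindex type)} using each table's first Vindex."""
    result = {}
    for table, spec in vschema["tables"].items():
        first = spec["column_vindexes"][0]  # assumed to be the primary Vindex
        vindex_name = first["name"]
        vindex_type = vschema["vindexes"][vindex_name]["type"]
        result[table] = (first["column"], vindex_type)
    return result


parsed = json.loads(VSCHEMA)
print(primary_vindexes(parsed))
```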

What is a Vindex?
A relational database has a schema that consists of tables, columns and indexes amongst other elements. In the same way an index makes it efficient to access a given row in a given table in an unsharded database, a Vindex for a given table in a sharded database allows you to access a row quickly by allowing you to determine which shard a row lives in. Like indexes, Vindexes can be primary or secondary and a given table can only have a single primary Vindex, but can have multiple secondary Vindexes.
How does a Vindex map a row to a shard?
To understand this, you will have to understand the concept of keyspaces and keyspace_ids. In the Vitess world every row in every sharded database has a keyspace_id. The keyspace_id is not stored, but is computed by applying a specific sharding function to the value of a specific column in that row. The keyspace_ids range from 0x00 - 0xFF and this range represents the entire keyspace. Vitess shards cover this entire range. This is why a sharded database in Vitess is called a keyspace. Each shard spans a range in the keyspace and shards are named by starting and ending keyspace_id values for the shard. Let us take the example of a 4-shard keyspace. If we divide the range 0x00 to 0xFF (hexadecimal values) in four equal ranges, we get:
0x00-0x40
0x40-0x80
0x80-0xC0
0xC0-0xFF

Thus the four shards would be named: “00-40”, “40-80”, “80-C0”, “C0-FF”. Vitess drops the “00” at the beginning of the keyspace range and the FF at the end, so the shards are called “-40”, “40-80”, “80-C0”, “C0-” instead.
Vindexes allow you to map the value of a particular column in a given row to one or more keyspace_ids. Each shard has a starting and ending keyspace_id, thus given a keyspace_id, you can deterministically tell which shard a row belongs to.
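The naming and lookup described above can be sketched as follows. This is a simplified illustration, not Vitess code: it treats the keyspace as the one-byte range 0x00 to 0xFF used in this post, assumes the number of shards divides 256 evenly, and the function names are hypothetical.

```python
def shard_names(num_shards: int) -> list[str]:
    """Name equal-width shards over the one-byte range 0x00-0xFF.

    Follows the naming in the text: the "00" at the start of the range
    and the value at the end of the range are dropped, giving names
    such as "-40" and "C0-". Assumes num_shards divides 256 evenly.
    """
    width = 256 // num_shards
    bounds = [i * width for i in range(num_shards + 1)]
    names = []
    for low, high in zip(bounds, bounds[1:]):
        start = "" if low == 0 else f"{low:02X}"
        end = "" if high == 256 else f"{high:02X}"
        names.append(f"{start}-{end}")
    return names


def shard_for_keyspace_id(keyspace_id: bytes, num_shards: int) -> str:
    """Pick the shard whose range contains the first byte of keyspace_id."""
    width = 256 // num_shards
    return shard_names(num_shards)[keyspace_id[0] // width]


print(shard_names(4))
print(shard_for_keyspace_id(bytes([0x9A]), 4))
```

Because each shard owns a contiguous range, finding the shard for a keyspace_id is a simple range check, which is what makes the mapping deterministic.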
Just like an index, a Vindex is applied to a column in a table, but a Vindex has an additional property called sharding function. Vitess gives you 15 predefined sharding functions and certain sharding functions go well with columns of certain types, for example you would typically use “hash” sharding function with a numeric column, or you would use “unicode_loose_md5” for a varchar or varbinary column. You can also write custom sharding functions and use those instead.
What is a Vitess sequence?
In an unsharded database, when you need to assign a monotonically increasing value to a row, you can define a column to be of type “autoincrement”. In the sharded world, if you do this, you would end up having duplicate values in the same column across shards. Vitess solves this problem by allowing you to create ids which are monotonically increasing and unique across shards for a given table by defining sequences in the VSchema. The implementation of Vitess Sequences is backed by a row in a table in a secondary database. Vitess limits the number of writes needed by allowing you to cache a certain number of values.
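To see why caching reduces writes, here is a toy Python model of a sequence backed by a single row. It is a sketch of the general idea only, not the Vitess implementation: the class names are hypothetical, and the row is assumed to hold a next id and a cache size.

```python
class SequenceRow:
    """Stands in for the single row (next_id, cache) backing a sequence."""

    def __init__(self, next_id: int, cache: int):
        self.next_id = next_id
        self.cache = cache
        self.writes = 0  # counts updates to the backing row

    def reserve_block(self) -> range:
        """Reserve `cache` ids with one write and return them."""
        start = self.next_id
        self.next_id += self.cache
        self.writes += 1
        return range(start, start + self.cache)


class CachedSequence:
    """Hands out ids from a reserved block, refilling when it runs dry."""

    def __init__(self, row: SequenceRow):
        self.row = row
        self.block = iter(())  # start with an empty block

    def next_value(self) -> int:
        # Return the next cached id if one is left in the current block.
        for value in self.block:
            return value
        # Block exhausted: reserve a new one (a single write to the row).
        self.block = iter(self.row.reserve_block())
        return next(self.block)


row = SequenceRow(next_id=1, cache=3)
seq = CachedSequence(row)
ids = [seq.next_value() for _ in range(7)]
print(ids, row.writes)
```

In this model, handing out seven ids with a cache of three touches the backing row three times instead of seven. A larger cache means fewer writes, at the cost of skipping any ids that were reserved but never used.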
Let us make the id column in the puppers table above of type Vitess sequence by adding the following segment to the table entry for puppers in our VSchema:
"auto_increment": {
	"column": "id",
	"sequence": "pupper_seq"
}

Which gives the following VSchema:
{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "puppers": {
      "column_vindexes": [
        {
          "column": "id",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "id",
        "sequence": "pupper_seq"
      }
    }
  }
}

How to define sequences?
You will notice that we are using the term pupper_seq without defining it anywhere. Here is how that process works. You typically define sequences in an unsharded keyspace that lives alongside your main keyspace. In the Vitess world this keyspace is typically called a lookup keyspace. Here is how you would define the sequence in the lookup keyspace. You first create a table named pupper_seq and you insert one row in it initializing the sequence. Note the comment vitess_sequence associated with the CREATE TABLE statement. It’s important for you to keep that because that is used by Vitess to treat this table distinctly from other tables which hold real data.
CREATE TABLE IF NOT EXISTS pupper_seq (
  `id` INT,
  `next_id` BIGINT,
  `cache` BIGINT,
  PRIMARY KEY(id)
) comment 'vitess_sequence';

INSERT INTO pupper_seq (id, next_id, cache) VALUES (0, 1, 3);

After defining this table, you apply the following VSchema to the lookup keyspace:
{
  "sharded": false,
  "tables": {
    "pupper_seq": {
      "type": "sequence"
    }
  }
}

These two steps define pupper_seq as a Vitess sequence and it can now be used by the VSchema for the sharded keyspace as we have used it above.
Putting all this together
So, for the Goodest Doggo dog rating application, here is how we will organize our data. We will create two keyspaces, one called lookup, unsharded and one called puppers which is sharded. We will start with two shards and we can reshard as needed as we go. The lookup keyspace will hold tables we need for sequences and for lookup Vindexes. The doggers keyspace will hold the actual data. Here is the schema and VSchema for these two keyspaces:
The keyspace doggers has three tables: puppers, ratings, and users. We want the id column to be of type autoincrement, but as you can see below, because this is a sharded database, we do not specify the column as autoincrement in the schema. Instead, we define a Vitess sequence on that column backed by the pupper_seq table in the lookup schema. The same applies to the id column of the ratings table.
We want the table puppers sharded by its id column, but the table ratings sharded by the user_id column. This is expressed in the VSchema.
Here is the Schema and VSchema for the puppers database:
CREATE TABLE `puppers` (
  `id` bigint(22) NOT NULL,
  `name` varchar(256) DEFAULT NULL,
  `image` varchar(256) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;

CREATE TABLE `ratings` (
  `id` bigint(22) DEFAULT NULL,
  `user_id` bigint(22) DEFAULT NULL,
  `rating` bigint(20) DEFAULT NULL,
  `pupper_id` bigint(22) DEFAULT NULL,
  KEY `pupper_id` (`pupper_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;

CREATE TABLE `users` (
  `id` bigint(22) NOT NULL,
  `email` varchar(64) DEFAULT NULL,
  `password` varbinary(256) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;
{
  "sharded": true,
  "vindexes": {
    "binary_md5_vdx": { "type": "binary_md5" },
    "hash_vdx": { "type": "hash" }
  },
  "tables": {
    "puppers": {
      "columnVindexes": [{ "column": "id", "name": "hash_vdx" }],
      "autoIncrement": { "column": "id", "sequence": "pupper_seq" }
    },
    "ratings": {
      "columnVindexes": [{ "column": "pupper_id", "name": "hash_vdx" }],
      "autoIncrement": { "column": "id", "sequence": "rating_seq" }
    },
    "users": {
      "columnVindexes": [{ "column": "id", "name": "binary_md5_vdx" }]
    }
  }
}

Here is the Schema and VSchema for the lookup database:
CREATE TABLE IF NOT EXISTS pupper_seq (
  id INT,
  next_id BIGINT,
  cache BIGINT,
  PRIMARY KEY(id)
) comment 'vitess_sequence';

INSERT INTO pupper_seq (id, next_id, cache) VALUES (0, 1, 3);

CREATE TABLE IF NOT EXISTS rating_seq (
  id INT,
  next_id BIGINT,
  cache BIGINT,
  PRIMARY KEY(id)
) comment 'vitess_sequence';

INSERT INTO rating_seq (id, next_id, cache) VALUES (0, 1, 3);
{
  "sharded": false,
  "tables": {
    "pupper_seq": {
      "type": "sequence"
    },
    "rating_seq": {
      "type": "sequence"
    }
  }
}

Want to see this in action?
We have created a Quickstart Demo on PlanetScaleDB that creates the databases we described above and populates them with data from our sample dog rating application.
To get started, you will need to:
Create a PlanetScaleDB account.
Select the Quickstart Demo VSchema.
This will spin up a cluster and two databases and when successful will present you with a database URL.
In the meantime, download and start the application for your platform.
Once the application is running, it will provide you with a URL (such as http://localhost:8000) on your local host that you can browse to.
This web application will prompt you for the connection string for the cluster you created with the Quickstart Demo on PlanetScaleDB.
Input the connection string into the app and start rating the puppies.
You can look at how the data is distributed across shards by clicking on the tab “Show Data”.
You can also connect to the database with a MySQL client and try running queries as you run the app to see how the data is distributed across shards.]]></content>
        <summary><![CDATA[Rate Puppies in a Rust app with Sharded MySQL Database]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 7</title>
        <link href="https://planetscale.com/blog/announcing-vitess-7" />
        <id>https://planetscale.com/blog/announcing-vitess-7</id>
        <published>2020-07-28T07:00:00.000Z</published>
        <updated>2020-07-28T07:00:00.000Z</updated>
        
        <author>
          <name>Deepthi Sigireddi</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[On behalf of the Vitess maintainers team, I am pleased to announce the general availability of Vitess 7.
Major Themes
Improved SQL Support
We continued to progress towards (almost) full MySQL compatibility. The highlights in Vitess 7 are replica transactions, savepoint support, and the ability to set system variables per session. We expect to continue down this path for Vitess 8.
Stability
Vitess had accumulated significant technical debt due to functionality that was added organically over time. Some parts of the code had become unmaintainable. In this release, VTGate's healthcheck and VTTablet's tabletserver and tabletmanager have been rewritten. These rewrites have already paid dividends. Replica transaction support and system variable support are built on the foundation of the new healthcheck and tabletserver. The VTTablet rewrites are expected to facilitate several new features in upcoming releases.
Innovation
Vitess 7 adds ease-of-use and many new features built on top of VReplication. VStream Copy allows streaming of entire tables or databases, thus enabling change data capture applications. Schema Versioning enables correct handling of binlog events on replication streams based on older versions of the schema. VExec and Workflow commands make it possible to manage vreplication workflows without manual edits to metadata. A novel framework has been built to allow dedicated connections alongside connection pooling. Locks and system variables have been implemented using this framework.
Tutorials
Vitess 7 adds three new tutorials to the documentation. We have added a tutorial that demonstrates how to use the open source vitess-operator from PlanetScale, a tutorial for region-based sharding, and one for a local Docker installation.
There is a short list of incompatible changes in this release. We encourage you to spend a moment reviewing the release notes.
Please download Vitess 7 and try it out!]]></content>
        <summary><![CDATA[On behalf of the Vitess maintainers team, I am pleased to announce the general availability of Vitess 7. Major themes include improved SQL support as we continue to progress towards (almost) full MySQL compatibility.]]></summary>
      </entry>
    
      <entry>
        <title>Debunking 3 myths about Vitess fault tolerance</title>
        <link href="https://planetscale.com/blog/debunking-3-myths-about-vitess-fault-tolerance" />
        <id>https://planetscale.com/blog/debunking-3-myths-about-vitess-fault-tolerance</id>
        <published>2020-06-10T21:00:00.000Z</published>
        <updated>2020-06-10T21:00:00.000Z</updated>
        
        <author>
          <name>Abhi Vaidyanatha</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Here at PlanetScale we hear some concerns about the reliability of Vitess and its capabilities with regards to data loss. When one hears “cloud-native, highly-available, distributed database running on Kubernetes,” it does sound too good to be true, so we understand the initial apprehension. Even though multiple reputable companies such as Slack, Square, GitHub, and JD run their production databases on Vitess, we still get questions about whether or not Vitess will lose data. We’re here to debunk myths about Vitess and address some lingering questions about how Vitess handles failures.
Myth: Because Vitess does not use a consensus based commit protocol, if your master goes down, you will lose data.
Because Vitess is based on MySQL, it solves this potential problem using MySQL’s lossless semi-synchronous replication feature. Simply put, before a transaction is considered committed, the master must first receive an acknowledgment from at least one replica that it has received the transaction as well.
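As a rough mental model, the acknowledgment rule can be sketched in Python. This is a toy simulation with hypothetical names, not MySQL or Vitess code, and it simplifies the real behavior: here a commit simply reports failure when no replica acknowledges, whereas a real master waits for the acknowledgment.

```python
class Replica:
    """A toy replica that records transactions it has received."""

    def __init__(self, reachable: bool = True):
        self.reachable = reachable
        self.relay_log = []

    def receive(self, txn: str) -> bool:
        """Store the transaction and acknowledge it, if reachable."""
        if not self.reachable:
            return False
        self.relay_log.append(txn)
        return True


def commit(txn: str, replicas: list, required_acks: int = 1) -> bool:
    """Report the commit only once enough replicas acknowledged receipt."""
    acks = sum(1 for replica in replicas if replica.receive(txn))
    return acks >= required_acks


replicas = [Replica(reachable=False), Replica()]
committed = commit("txn-1", replicas)
print(committed, replicas[1].relay_log)
```

If the master is lost right after acknowledging "txn-1" to the client, the transaction still exists on the replica that acknowledged it, which is why a master failure alone does not lose committed data in this model.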
Myth: Even if you’ve saved your data, when your Vitess master goes down, you cannot automatically perform self-recovery.
While out of the box this is true, MySQL deployments (to which Vitess is no exception) are very commonly run with Orchestrator to automatically reparent upon master failure. Even better, PlanetScale’s proprietary Vitess operator will automatically detect a dead master and will do the reparenting for you.
Myth: Configuring self-recovery with Vitess requires a lot of extra steps.
Also not true! Vitess’ control plane includes workflows such as PlannedReparentShard and EmergencyReparentShard that are available right out of the box. Your orchestration tool simply needs to detect the failure, send the reparent request, and Vitess handles the rest. The case of a network partition does require manual intervention; this is due to Vitess’ tradeoffs to optimize around a very low p99 latency for high performance.
If you still don’t believe us, check out this longer form piece from our CEO Jiten Vaidya, where he dives into the difference between theoretical and practical durability.
If you want to know more about Vitess and its capabilities, contact us in the Vitess Slack community, try it out for yourself with the quickstart guide, or check out our newly open-sourced Kubernetes operator!]]></content>
        <summary><![CDATA[Here at PlanetScale we hear some concerns about the reliability of Vitess and its capabilities with regards to data loss.]]></summary>
      </entry>
    
      <entry>
        <title>Announcing Vitess 6</title>
        <link href="https://planetscale.com/blog/announcing-vitess-6" />
        <id>https://planetscale.com/blog/announcing-vitess-6</id>
        <published>2020-04-29T07:01:00.000Z</published>
        <updated>2020-04-29T07:01:00.000Z</updated>
        
        <author>
          <name>Morgan Tocker</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[I am excited to announce the general availability of Vitess 6, the second release to follow our new accelerated release schedule.
While only 12 weeks have elapsed since the previous release, it feels like a few key investments have started to pay dividends all at once. To provide some personal highlights:
Improved SQL Support
Vitess now understands much more of MySQL’s syntax. We have taken the approach of studying the queries issued by common applications and frameworks, and baking them right into the end-to-end test suite.
Common issues such as SHOW commands not returning correct results or MySQL’s SQL_CALC_FOUND_ROWS feature have now been fixed. In Vitess 7, we plan to add support for setting session variables, which will address one of the largest outstanding compatibility issues.
Kubernetes Topology Service
The Helm charts now default to using Kubernetes as the Topology Service. This helps remove a dependency on etcd-operator, which has since been discontinued.
This change also unlocked the adoption of Helm 3 and support for a wider range of Kubernetes versions, making installing Vitess much easier.
General Availability of VReplication-based Workflows
While VReplication made its appearance in Vitess 4, it has now been promoted from experimental to general availability, and the documentation now points to MoveTables and Resharding.
These workflows require significantly fewer steps than their predecessors (Vertical Split Clone and Horizontal Sharding), which we intend to deprecate at some point in the future.
In addition to this, the end-to-end test suite is now fully migrated to Golang, and we’ve improved the health of the code base by removing a lot of legacy code specific to Statement-Based Replication and “V2” query routing.
There is a slightly higher number of incompatible changes than in prior releases, so we encourage you to spend a moment reading the release notes.
Please download Vitess 6 and take it for a spin!]]></content>
        <summary><![CDATA[I am excited to announce the general availability of Vitess 6, the second release to follow our new accelerated release schedule.]]></summary>
      </entry>
    
      <entry>
        <title>ACID Transactions are not just for banks — the Vitess approach</title>
        <link href="https://planetscale.com/blog/acid-transactions-are-not-just-for-banks-vitess-approach" />
        <id>https://planetscale.com/blog/acid-transactions-are-not-just-for-banks-vitess-approach</id>
        <published>2020-04-29T07:00:00.000Z</published>
        <updated>2020-04-29T07:00:00.000Z</updated>
        
        <author>
          <name>Jiten Vaidya</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[The outdated example of transferring money from account A to account B needs to be updated for today's emerging applications. Vitess provides an answer.
Remember the early 2000s world of Office Space? Java was still new, usenet passed for social media, and applications were using proprietary Sun hardware with Oracle databases. Then the tech bubble burst, and in that time of frugality, companies shifted to using commodity hardware with open source databases. In many cases, the database of choice was MySQL. The overall solution was cheaper, but you did have to give up some features. Due to the limits of both hardware and software, MySQL was often configured in such a way that it would lose data if there was a system failure.
ACID (Atomicity, Consistency, Isolation, and Durability) is a set of guarantees that traditional database systems provide applications. Applications should be able to depend on these guarantees during regular operations or if there is a failure.
Of particular importance is Durability, which means that once the database system acknowledges a transaction (a commit in database terminology), that transaction should survive permanently, even if there is a system crash or power failure.
Durability, does it really matter?
As companies started using MySQL on cheaper hardware, they did not always configure it to provide full ACID guarantees. Around the same time, NoSQL systems such as MongoDB became popular, and these systems initially did not even offer ACID transactions or durability guarantees. What you got in return was improved performance: these systems buffered a greater number of changes in memory and then batched their writes to storage devices.
Being able to scale ("web-scale") was the priority. Providing correctness when failures occurred (and in complex systems, failures always do occur) was a secondary consideration.
I remember this clearly from the early days of YouTube. Most of the team had previously worked at PayPal, and when the team took shortcuts around durability, the refrain always was "it's okay, it's not money." This eventually changed as monetization became important and was tied to view counts.
In today's world, there are better examples of the need for durability than financial transactions.
Better examples for today's applications
The traditional example of ACID demonstrates the potential failures that could occur when withdrawing $20 from Account A and depositing it into Account B, and ensuring that the system never encounters a case where the money is lost, both accounts are credited, or the transaction is acknowledged as successful, only to later be reversed.
For today's applications, a better example would be:
While participating in a collaboration tool like Slack, you might decide to create a private channel with a few colleagues. You realize that you have added the wrong person to this channel and you want to remove them. When you do so, you receive a confirmation that this update was successful. This change updates an entry in a database server. If the server crashes two seconds later, the expectation is that the user will have been removed and will no longer see the channel.
Similarly, when I hit send on a message to a colleague, I expect that the message has been sent. I do not expect it to show as sent and then, due to a system failure, never reach my colleague.
(Note: Slack actually does not have these problems because they use Vitess for storage at scale.)
These problems are created by asynchronous failures. The problem occurs because the database server has told the application that the operation was successful, the application has then informed the user of success, but then the database server later failed to deliver on its promises.
The Vitess approach
Vitess is an open source database scaling system for MySQL. We originally developed Vitess to scale YouTube, and it is now a CNCF graduated project (along with Kubernetes, Prometheus, and others).
Vitess prevents the asynchronous failure scenario that we talked about above in two different ways:
First, it ensures that the changes are saved locally on storage, with the redo log and binary logs safely written to disk. Recent MySQL versions (5.7+) will do this by default, but Vitess ensures this on all versions of MySQL it supports.
Second, Vitess makes use of semi-synchronous replication. This ensures that a change has not only been applied locally on the database server but also that at least one other server has received and persisted the change.
"Semi-sync" provides a great tradeoff between durability guarantees and performance. By contrast, some modern systems solve this problem with a quorum, where the majority of nodes must receive the modification for it to be successful. While a quorum can simplify some of the steps of failover, the overhead of reaching a majority increases as the system grows (and adds more nodes), so performance decreases.
Another advantage of the Vitess approach is that not every replica needs to be a member of the semi-sync group, so you can choose to design failure zones where at least one replica in a different availability zone/data center has a copy before the operation is considered successful.
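To make the comparison concrete, here is a minimal sketch in Go. It is a simplified model, not Vitess or MySQL code: it assumes semi-sync waits for exactly one replica, as described above, and that a quorum means a simple majority. The function names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// semiSyncCopies returns how many servers must hold a change before a commit
// is acknowledged, assuming the primary waits for exactly one replica
// (an assumption taken from the description above).
func semiSyncCopies() int {
	return 2 // the primary plus one acknowledging replica
}

// majorityCopies returns how many of n nodes must hold a change before it is
// considered successful under a simple majority quorum.
func majorityCopies(n int) int {
	return n/2 + 1
}

func main() {
	// As the cluster grows, the semi-sync requirement stays constant
	// while the majority requirement grows with the node count.
	for _, n := range []int{3, 5, 7, 9} {
		fmt.Printf("nodes=%d semi-sync copies=%d majority copies=%d\n",
			n, semiSyncCopies(), majorityCopies(n))
	}
}
```

In this model the two approaches need the same number of copies at three nodes, and the gap widens only as nodes are added.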
Vitess has been the system of record for companies like YouTube, Slack, Square Cash, Pinterest, JD.com, and HubSpot for many years. As of today, we do not know of any data loss incidents at these companies due to Vitess.
Conclusion: Build systems that do not lose data
It is easy to forget just how much we integrate modern applications into our lives. As more and more parts of our lives depend on our applications, our expectations for durability and consistency have increased. The shortcuts that we took in the early 2000s helped us scale systems when the technology was not always there. Now that we have the technology, we should be building systems that do not lose data in the face of ordinary failures and have a much lower chance of losing data in the face of catastrophic failures.
To learn more about open source Vitess, go to vitess.io or join the Vitess Slack channel.]]></content>
        <summary><![CDATA[Build systems that do not lose data. Vitess prevents asynchronous failure in two ways: (1) ensuring that the changes are saved locally on storage with the redo log and binary logs safely written to disk and (2) making use of semi-synchronous replication.]]></summary>
      </entry>
    
      <entry>
        <title>Videos: Intro to Vitess—its powerful capabilities and how to get started</title>
        <link href="https://planetscale.com/blog/videos-intro-to-vitess-its-powerful-capabilities-and-how-to-get-started" />
        <id>https://planetscale.com/blog/videos-intro-to-vitess-its-powerful-capabilities-and-how-to-get-started</id>
        <published>2020-04-23T07:00:00.000Z</published>
        <updated>2020-04-23T07:00:00.000Z</updated>
        
        <author>
          <name>Abhi Vaidyanatha</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[In 2020, we launched our first video series about Vitess to help individuals learn more about its features and capabilities. For those new to Vitess, it is an open source horizontal sharding framework for MySQL, developed at YouTube to help them keep up with their massive scale. It was later donated to the CNCF and now sits among Kubernetes, Prometheus, and others as a graduated project.
The videos feature Sugu Sougoumarane, co-creator of Vitess and CTO of PlanetScale. At a glance, these short videos are in a FAQ format and cover these topics:
How Vitess was tested on Borg at YouTube
What attributes make Vitess cloud-native and how does it run on Kubernetes?
What is VReplication?
What can be done with VReplication?
Real time rollups explained
In 2024, we released a free course on getting up and running with Vitess. This course covers what Vitess is and guides you through how to set up your own cluster, in both an unsharded and a sharded configuration. Check it out.
If you are interested in getting started with Vitess, join the Vitess Slack community, try it out for yourself with the quickstart guide, or check out our newly open-sourced Vitess Operator for Kubernetes.]]></content>
        <summary><![CDATA[This video playlist featuring Vitess co-creator Sugu Sougoumarane is an excellent resource to learn more about the features and capabilities of open source Vitess.]]></summary>
      </entry>
    
      <entry>
        <title>PlanetScale migrates open source Vitess test suite from Python to Go</title>
        <link href="https://planetscale.com/blog/planetscale-migrates-open-source-vitess-test-suite-from-python-to-go" />
        <id>https://planetscale.com/blog/planetscale-migrates-open-source-vitess-test-suite-from-python-to-go</id>
        <published>2020-03-20T07:00:00.000Z</published>
        <updated>2020-03-20T07:00:00.000Z</updated>
        
        <author>
          <name>Deepthi Sigireddi</name>
        </author>
        
        
        <category term="vitess" />
        
        <content><![CDATA[Over the last three quarters, the team at PlanetScale has focused on the dual goals of making open source Vitess easy to use and easy to contribute to. A part of this effort was a migration of all the integration tests written in Python to Go.
There were several reasons for this project:
The Python tests were very time-consuming to develop and debug.
The Python tests added additional install dependencies for anyone getting started as a contributor.
Support for the Python version being used (2.7) ended on January 1, 2020.
This was a fairly massive project that required several people working on it for almost four months. The project was started around November 1, 2019 and completed on February 25, 2020. There were 197 separate integration tests in 39 files that had to be migrated. In terms of LOC, it was over 24,000 lines of Python code.
To accomplish the migration, we first built a test framework in Go (using the command and testing packages) that allowed us to start a Vitess cluster and interact with it programmatically. The framework had to support running multiple tests in parallel without port conflicts, create non-conflicting working directories for all the relevant processes, and log sufficient information to enable failure diagnosis, among other requirements. Once that was done, it was a matter of translating Python tests into the equivalent Go code.
Along the way, we were also able to improve the CI pipeline for Vitess. While Travis CI has served us well over the years, we saw an opportunity to switch to GitHub Actions. The advantages?
Larger compute+memory instance types. While Travis CI (and CircleCI for that matter) will provide you with larger instances on paid plans, we really wanted to stay within the free tier so that contributors could run with the same technologies and experience as the core project. Larger sizes are important for Vitess, since the test suite can launch 6 or more instances of mysqld.
No limit of 5 concurrent jobs. We were using Travis matrix builds for a purpose they weren’t designed for: splitting 2 hours and 30 minutes of testing into 5 “shards” of 30 minutes. Because each run used all 5 concurrent jobs, we could effectively have only one test run in progress at a time, and during peak periods there could be a delay of an hour or more before test suite results arrived. Our new GitHub Actions configuration still uses shards, but now with over 14 of them. We are also no longer blocked by other developers running CI tasks at the same time.
The end result of the project is that it is now much easier and faster to develop new integration tests. It is also easier for someone new to the project to get started. The CI changes give us quicker feedback on pull requests and increase throughput on pull requests.
To learn more, join the Vitess Slack channel and attend the next monthly open meeting on Thursday, March 19 to hear Arindam Nayak talk about this project in detail. The meeting details can be found in the Vitess Slack channel.]]></content>
        <summary><![CDATA[It’s easier than ever to contribute to Vitess. The test suite migration from Python to golang makes Vitess more developer friendly.]]></summary>
      </entry>
    
    </feed>