MySQL, MongoDB, Firebase, Spanner; there has literally never been a better time to be a database user at any level of complexity or scale. But there’s still one common thread (ha!) among them – the focus is on infrastructure, not developer experience. This post is going to argue that database internals are going to matter less long term; developer experience is what will differentiate offerings.
Most database products today focus on internals
To get some perspective on the shift happening in databases, it’s worth looking back at what are essentially 3 eras (at least from this perspective) of data stores. The common thread between them all is that the improvements focused on infrastructure, not developer experience.
1. The beginning, relational land
An early history of databases is beyond the scope of this post, but it’s safe to say that databases were synonymous with relational data for the first couple of decades of their existence. MySQL’s initial release was in 1995 (that’s 26 years ago), and Postgres’s was a year later in ’96. They’re actually still the most popular databases in terms of usage on the planet, and there’s probably a lesson there, but for another time.
Fundamentally, SQL was about rigidity and transactional integrity (i.e. ACID) – even though today, people use it to query data that’s less structured. Before OLAP was a thing, the priority for these databases was making sure your reads were clean and your inserts worked, every time, without exception; and so to do that, you structured your schemas in advance. And this was fine, for the most part. Where things started to go wrong was scale.
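To make "schemas in advance" and "works every time" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the `transfer` helper, and the balances are made up for illustration; the point is only that a schema declared up front, plus a transaction, means a failed write leaves no partial changes behind.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# The schema is declared before any data is written, constraints included.
# (Hypothetical table, for illustration only.)
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts (id, owner, balance) VALUES (1, 'alice', 100)")
conn.execute("INSERT INTO accounts (id, owner, balance) VALUES (2, 'bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; either both updates apply or neither does."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft, so the credit is undone too.
        return False


def balances(conn):
    """Return {owner: balance} for every account."""
    return dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY id"))
```

Calling `transfer(conn, 1, 2, 500)` here fails the balance check on the second update, and the first update is rolled back along with it.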
Note: While SQL refers to the actual language used to query relational databases, over the years it has become synonymous with the concept of pre-defined schemas and relational data, hence the moniker “SQL Database.”
The problem with MongoDB, and NoSQL databases generally, is that they historically weren’t ACID compliant: many settled for eventual consistency, giving up strong consistency in favor of availability (the trade-off described by the CAP theorem).
NoSQL databases are maturing, for sure – we’re starting to see support for transactions (on some timeframe of consistency) and generally more stability. After years of working with “flexible” databases though, it has become clearer that rigidity up front (defining a schema) can end up meaning flexibility later on. If you decide down the road to analyze data in a different context or another area of your application, having a schema defined can produce reliable results from changing queries. And, fundamentally, developers actually often liked SELECT * over something like .findOne().
2. SQL scalers
Enter the “new thing” in databases, which people are calling SQL scalers – relational databases with transactional rigidity, but they scale horizontally. One example is Spanner, a relational data store developed at Google (paper here) that claims to scale horizontally and infinitely (e.g. cross-shard transactions). Aurora, which is AWS proprietary, also claims to be distributed and faster than MySQL and co. Another good one to note is Vitess, an orchestration layer for MySQL, which is what we work on here at PlanetScale.
Again, the narrative centers around infrastructure – you don’t have to sacrifice scale for transactional rigidity, etc. Even Spanner’s product page talks a lot about scale, and not very much about anything else.
You can start to see the early seedlings of “SaaS talk” here – marketing around spending less time on manual tasks, and more time on what matters (whatever that is) – but the message is overtly performance oriented. Which makes sense, because that’s how we’ve been thinking about databases for the past 20 years.
Database internals will eventually just not matter
I believe there’s a good chance that databases will follow in the path of general compute, and what’s actually going on under the hood is going to be more commoditized; instead, databases will win based on superior developer experience. We’re converging on a shared understanding of what reliability means, and that it’s more important to move fast than own infrastructure, as long as things can scale later. This is the direction that all infrastructure is taking really, and it’s also a trend we’ve seen in developer-focused SaaS (think Stripe, Twilio). In some ways, of course, infrastructure is developer experience; but databases are getting so good and general purpose that it’s going to be academic.
First, defining terms: what’s the difference between infrastructure and DevEx as it relates to databases? Here’s a potential taxonomy:
|Task|Category|Task|Category|
|---|---|---|---|
|Right sizing instances|Infrastructure|Schema / migrations|Developer Experience|
|Scaling to meet demand|Infrastructure|Version control|Developer Experience|
|Optimizing cost|Infrastructure|CLI ergonomics|Developer Experience|
|Optimizing performance|Infrastructure|User management|Developer Experience|
|Upgrades and patches|Infrastructure|IDE / Client|Developer Experience|
|Networking|Infrastructure|Data Branching|Developer Experience|
A lot of these infrastructure tasks are already automated (or more accurately, outsourced) with DBaaS like RDS. But a lot of it isn’t – you still need to worry about sizing and scaling manually (in most cases). Infrastructure automation happens gradually; even with Lambda – which is “serverless” – you need to allocate specific amounts of memory, which sounds an awful lot like a server to me. But you don’t need to install and configure, and progress is progress.
A useful model to look to is data warehouses. When Redshift was state of the art (not long ago), you chose the size and power of your data warehouse in advance, and this is still how they price it. But Snowflake and BigQuery came along and completely removed infrastructure from the discussion. To do that, you need two things:
- Pricing that completely follows usage ($/GB stored + queried)
- A focus on the developer experience (query UI, permissions, marketplace, etc.)
It’s possible the reason this happened quickly in warehousing – as opposed to production data stores – is that use cases are more narrow and often not mission-critical.
In developer tools (and software more broadly), there’s always a split between smaller sized and larger sized customers:
- Smaller customers value simplicity and predictable pricing
- Larger customers want extreme flexibility and granularity
When I worked at DigitalOcean, this was a core dichotomy that colored how we approached building product, pricing, and go-to-market. And with databases it’s the same – the “serverless” notion is more exciting to smaller customers who don’t need to worry about what’s going on behind the scenes. Enterprises with mission-critical applications care very much about the small details.
But that, too, eventually changes – at some point, the database gets good enough to scale from a tiny company all the way up to the largest apps in the world. Not to scare you away, fellow developers, but this is actually exactly the narrative in one of the most famous business books of all time, The Innovator’s Dilemma. Products start as disruptive on the low end, get laughed at by the larger guys, and then eventually eat those same larger guys.
Defining developer experience in databases
What does developer experience in databases actually mean, and what would something great look like? What do we have to look forward to, in other words?
Over the course of the eras of databases we outlined above, companies have backed their way into figuring out some answers to that question. You can separate the obvious ones into 3 large buckets, but beyond that, there are so many things that we probably can’t even imagine yet, but are going to be awesome.
Interacting with your database
How do you query your data? How easy is it to connect, wherever you are? How easy is it to get the data you want?
1. Application queries
How does your application interact with your database? NoSQL did a great job of normalizing (no pun intended) the use of client libraries in your codebase – instead of ugly triple-quoted SQL strings, you could write something like .insertOne() in whatever language you define your endpoints in. For SQL, this has existed for a while in the form of ORMs (like ActiveRecord for Ruby on Rails), but it has usually been the job of the framework, not the database.
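As a rough illustration of the two styles, the sketch below writes the same kind of insert twice against an in-memory SQLite database: once as a triple-quoted SQL string, and once through a small wrapper class. The `Collection` class, its `insert_one` / `find_one` methods, and the users table are hypothetical and written just for this example; they are not MongoDB's API or any real ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Style 1: raw SQL embedded in the application as a triple-quoted string.
conn.execute(
    """
    INSERT INTO users (name, email)
    VALUES (?, ?)
    """,
    ("Ada", "ada@example.com"),
)


class Collection:
    """Hypothetical wrapper that hides SQL behind method calls (illustration only)."""

    def __init__(self, conn, table, columns):
        self.conn = conn
        self.table = table
        self.columns = set(columns)  # allow-list, since names are interpolated

    def _check(self, doc):
        unknown = set(doc) - self.columns
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")

    def insert_one(self, doc):
        """Insert a dict as one row and return the new row id."""
        self._check(doc)
        cols = ", ".join(doc)
        marks = ", ".join("?" for _ in doc)
        cur = self.conn.execute(
            f"INSERT INTO {self.table} ({cols}) VALUES ({marks})", tuple(doc.values())
        )
        return cur.lastrowid

    def find_one(self, query):
        """Return the first row matching every key/value in `query`, or None."""
        self._check(query)
        where = " AND ".join(f"{col} = ?" for col in query) or "1 = 1"
        row = self.conn.execute(
            f"SELECT * FROM {self.table} WHERE {where} LIMIT 1", tuple(query.values())
        ).fetchone()
        return dict(row) if row is not None else None


# Style 2: the same kind of operation through the wrapper.
users = Collection(conn, "users", ["id", "name", "email"])
new_id = users.insert_one({"name": "Grace", "email": "grace@example.com"})
```

Both styles end up running SQL underneath; the difference is purely in what the application code looks like, which is the developer experience point.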
In the future, I’d expect to see a tighter coupling between the frameworks we’re using for reactive frontends – React, Vue, etc. – and the database, via hooks or otherwise. We’re already seeing this with Prisma and Co. who define this as “a better ORM.” The model of an un-opinionated database with something like PostgREST on top is already changing (another good example is Fauna).
2. Ad-hoc queries
It’s becoming table stakes for DBaaS providers to include a UI for querying as part of the product. This started with data warehouses, but made its way into products like Supabase that are targeting production use cases. It’s a lot nicer to log into BigQuery and write queries there directly than what I had to do with Hive – install custom JDBC drivers into DBeaver.
3. Authentication and user management
User management and granular permissioning will be built into the database layer as a critical part of developer experience. You’ll be able to restrict specific tables, types of data, branches, etc. to specific users, revoke access after specific amounts of time, etc. You can technically do this in DBs like Postgres, but new databases will rethink it from the ground up and make it dead simple via a UI and/or CLI.
The CLI in general is another great place to focus on. What would psql look like if it was reimagined from the ground up? How tightly can we couple the CLI with a web UI to make authentication simple?
Workflows: migrations, version control, and environments
Ah, migrations – the reason your local environment isn’t working even though you pulled 10 seconds ago. Nobody can ever perfectly guess what their schema will be in 5 years; new features get built and existing ones get refactored. What could version control for a database look like?
If your database isn’t SQL or NoSQL under the hood but instead whatever you want, you’ll be able to choose when to apply schemas and when not to. Your schema will, paradoxically, be flexible, and because of that, it will be able to follow Git workflows just like your code. Imagine opening a pull request on your database, writing queries side by side to see the different results, having your teammates add comments, and then merging it into production.
We already (sort of) version control our databases – changes get made in staging, tested against a frontend also in staging, and then deployed to prod. But maintaining parity between your local/staging database and prod can be tedious. What if each database pull request created an entirely new deployment of your database (all data affected), and that connection string automatically got injected into your application?
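For context, the "sort of" version control most teams have today tends to look like an ordered list of schema changes plus a table recording which ones have run. Here is a minimal sketch of that pattern with Python's sqlite3; the `MIGRATIONS` list, the `schema_migrations` table name, and the users schema are assumptions made up for the example, not how any particular tool does it.

```python
import sqlite3

# Ordered, append-only list of schema changes; each entry is (version, SQL).
# The table and column names are hypothetical.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_users_email ON users (email)"),
]


def applied_versions(conn):
    """Return the set of migration versions already recorded in this database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    return {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}


def migrate(conn, migrations=MIGRATIONS):
    """Apply every migration not yet recorded, in version order."""
    done = applied_versions(conn)
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in done:
            continue
        conn.execute(sql)
        # Record the version so the same change is never applied twice.
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied
```

Running `migrate` against a fresh database applies everything; running it again applies nothing. A local environment that "isn't working even though you pulled" is usually one where this step hasn't been run yet, which is the kind of chore a database with first-class branching could take off your plate.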
Pricing, scale, and monitoring
Pricing is part of developer experience. As the infrastructure powering your database gets further abstracted, the primary component of scale will just be price: you’ll be paying per GB stored, which will scale linearly (maybe with discounts) as your app gets larger. The dashboards and UI you get will shift from monitoring infrastructure (latency, throughput, etc.) to monitoring cost and making sure you don’t get stuck with a huge bill you weren’t anticipating (I’m looking at you, AWS).
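Here is a back-of-the-envelope sketch of what "scale is just price" could mean in code. The rates, the `UsagePricing` class, and the `over_budget` check are all hypothetical (no real provider's pricing is implied); the idea is simply that cost becomes a linear function of usage, and the interesting alert is about spend rather than CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePricing:
    # Hypothetical rates in cents, chosen only to keep the arithmetic easy to follow.
    cents_per_gb_stored: int
    cents_per_gb_queried: int

    def monthly_cost_cents(self, gb_stored, gb_queried):
        """Cost grows linearly with usage: no instance sizes, no tiers."""
        return (
            gb_stored * self.cents_per_gb_stored
            + gb_queried * self.cents_per_gb_queried
        )


def over_budget(pricing, gb_stored, gb_queried, budget_cents):
    """Return True when projected spend exceeds the budget."""
    return pricing.monthly_cost_cents(gb_stored, gb_queried) > budget_cents


pricing = UsagePricing(cents_per_gb_stored=25, cents_per_gb_queried=5)
```

With these made-up rates, doubling both storage and queried data doubles the bill, and a budget check like `over_budget(pricing, 100, 1000, 1000)` is the sort of thing a cost-focused dashboard would surface before the invoice does.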
All of these questions (or guesses) are concretely more interesting than “why am I getting CONNECTION DENIED errors” and that’s the point – developer experience doesn’t have to be infrastructure, and it can be exciting.
What do you think the next era of databases is going to look like? What is DevEx going to look like for cloud databases in 5 years? Let the PlanetScale team know on Twitter (@planetscaledata) or join the discussion on Hacker News.