Self-managed Vitess vs Managed Vitess with PlanetScale
By Holly Guevara
People often ask us — why would I use PlanetScale when Vitess is open source? Surely I can just run it myself? What added benefits do I get with PlanetScale? In this article, we're going to answer all of those questions and more.
PlanetScale’s relationship with Vitess
Let’s start by first defining the relationship between PlanetScale and Vitess.
History of Vitess
Vitess was created at YouTube in 2010 to scale its MySQL infrastructure, primarily to address heavy write traffic and connection limits. In 2018, Vitess was donated to the CNCF, and it achieved graduated status in 2019. Today, Vitess is used by hundreds of the largest sites on the internet to scale their MySQL clusters, including Slack, Etsy, GitHub, HubSpot, Shopify, Square, and Pinterest.
PlanetScale on Vitess
PlanetScale was founded in 2018 by the original co-creators of Vitess from YouTube. PlanetScale is the largest contributor to the Vitess codebase, and we employ roughly 75% of the Vitess maintainers.
Our vision for PlanetScale has always been to create the database platform that many of our staff wished they'd had during their years operating sites at massive scale: one that makes it as easy as possible to create and maintain databases at any scale. A crucial part of that is ensuring the underlying technology can handle scale. Vitess was the perfect choice, but the mission doesn’t end there.
Every time you spin up a PlanetScale database, no matter how small, you're getting Vitess under the hood.
PlanetScale aims to make Vitess and MySQL easy to use, operate, and maintain. To do this, we've made spinning up a new Vitess cluster as easy as clicking a button. We also put a lot of care into crafting the perfect developer experience, introducing features like database branching and deploy requests that make online schema changes an afterthought.
Resources required to run and maintain Vitess
Companies at the start of their Vitess journey often ask us what resources it takes to set up and maintain Vitess clusters on their own. The PlanetScale team has extensive experience running thousands of MySQL instances in Vitess clusters, not just at PlanetScale but also in previous roles at other companies.
In this section, I’ll walk through some of our findings about the cost, time, and other requirements involved in running and managing Vitess. I'll also highlight some public testimonials from Vitess users to help paint a better picture of what this looks like in practice.
Time to implement
As you can imagine, the time it takes to implement Vitess varies widely. There is often a non-trivial amount of prep work needed to get your system ready for Vitess; the MySQL compatibility documentation gives a good idea of what that might involve. For example, we commonly see stored procedures and CTEs as blockers. The Vitess team is constantly working to close the compatibility gap and has made great strides recently, but it's still a good idea to check the documentation to make sure you aren't relying heavily on any unsupported MySQL features.
Once this groundwork is done, the fun really begins: implementation. This involves identifying which tables (if any) you will shard, planning the sharding scheme, provisioning servers, mapping out initial resource allocation and replica usage, and so on. Next, you'll perform the actual migration, which can be just as time-consuming if you run into issues.
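Planning a sharding scheme ultimately comes down to deciding how each row maps to a shard. In Vitess, a vindex maps a sharding-key value to a keyspace ID, and each shard owns a contiguous range of keyspace IDs named by hex bounds such as `-80` or `80-`. The Python sketch below illustrates that lookup under stated assumptions: it uses SHA-256 as a stand-in hash (Vitess's built-in `hash` vindex uses a different function), and the helper names `keyspace_id`, `parse_shard`, and `shard_for` are hypothetical, not Vitess APIs.

```python
import hashlib
from typing import List, Tuple


def keyspace_id(sharding_key: int) -> bytes:
    """Map a sharding-key value to an 8-byte keyspace ID.

    Assumption: SHA-256 is a stand-in chosen for illustration; Vitess's
    built-in `hash` vindex uses a different function.
    """
    return hashlib.sha256(str(sharding_key).encode()).digest()[:8]


def parse_shard(name: str) -> Tuple[bytes, bytes]:
    """Parse a Vitess-style shard name such as '-80', '40-80', or '80-'.

    Returns (start, end) as bytes; an empty bound means the range is open
    on that side.
    """
    start, end = name.split("-")
    return bytes.fromhex(start), bytes.fromhex(end)


def shard_for(ksid: bytes, shards: List[str]) -> str:
    """Return the shard whose key range contains `ksid` (start inclusive, end exclusive)."""
    for name in shards:
        start, end = parse_shard(name)
        # Python compares bytes lexicographically, which matches key-range order.
        if ksid >= start and (not end or ksid < end):
            return name
    raise ValueError(f"no shard covers keyspace ID {ksid.hex()}")


# Usage: four shards splitting the keyspace into quarters.
shards = ["-40", "40-80", "80-c0", "c0-"]
for user_id in (1, 2, 3):
    print(user_id, shard_for(keyspace_id(user_id), shards))
```

Because shard names are just key ranges, splitting `-80` into `-40` and `40-80` changes which shard a keyspace ID lands on without changing the keyspace ID itself. That is what lets Vitess reshard by moving ranges of data rather than recomputing every row's placement.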
Let’s look at some examples of companies who have implemented Vitess on their own.
Slack
The team over at Slack wrote a blog post about their experience implementing and working with Vitess. It is a fantastic read, so I highly recommend reading through the full post.
In short, their journey to Vitess began in July 2017, and it wasn’t until late 2020 that they were fully migrated. Those three-plus years required significant resources and focus: starting with a proof of concept (PoC), then mapping out and implementing the migration, contributing changes to the Vitess codebase, and more.
To start, they decided to build a proof of concept — getting a small feature into production using Vitess.
We decided to build a prototype demonstrating that we can migrate data from our traditional architecture to Vitess and that Vitess would deliver on its promise. Of course, adopting a new datastore at Slack scale is not an easy task. It required a significant amount of effort to set up all the new infrastructure in place.
The authors go on to detail some of the work that went into this. Remember, this is just for a tiny service (relative to Slack's total datastores).
Our goal was to build a working end-to-end use case of Vitess in production for a small feature: integrating an RSS feed into a Slack channel. It required us to rework many of our operational processes for provisioning deployments, service discovery, backup/restore, topology management, credentials, and more. We also needed to develop new application integration points to route queries to Vitess, a generic backfill system for cloning the existing tables while performing double-writes from the application, and a parallel double-read diffing system so we were sure that the Vitess-powered tables had the same semantics as our legacy databases.
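The "parallel double-read diffing system" Slack mentions is a general migration technique: send the same read to both the old and the new datastore, compare the results, and keep serving the old result until the two agree reliably. Below is a minimal Python sketch of that idea, not Slack's implementation; the `legacy` and `vitess` readers are hypothetical callables standing in for real database clients.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Row = Tuple
Reader = Callable[[str], Sequence[Row]]  # hypothetical: runs a query, returns rows


def double_read(query: str, legacy: Reader, vitess: Reader,
                mismatches: List[Tuple[str, str]]) -> Sequence[Row]:
    """Run `query` against both backends in parallel and record any difference.

    The legacy result is always returned, so a bug or outage on the new
    path never changes what the application sees.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        legacy_future = pool.submit(legacy, query)
        vitess_future = pool.submit(vitess, query)
        legacy_rows = legacy_future.result()  # legacy errors still propagate
        try:
            vitess_rows = vitess_future.result()
        except Exception as exc:  # the new path must not break reads
            mismatches.append((query, f"vitess error: {exc}"))
            return legacy_rows
    # Compare as multisets: without ORDER BY, row order may legitimately differ.
    if Counter(legacy_rows) != Counter(vitess_rows):
        mismatches.append((query, "row mismatch"))
    return legacy_rows
```

This sketch waits for both reads, so the slower backend sets the latency; a production version would more likely compare asynchronously, sample only a fraction of reads, and send mismatches to a metrics system instead of a list.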
The PoC required a ton of upfront work, but proved to be the correct choice for scalability, so they decided to proceed with the full migration.
However, it was worth it: the application performed correctly using the new system, it had much better performance characteristics, and operating and scaling the cluster was simpler. Equally importantly, Vitess delivered on the promise of resilience and reliability. This initial migration gave us the confidence we needed to continue our investment in the project.
Source: https://slack.engineering/scaling-datastores-at-slack-with-vitess
Overall, their journey took a total of 3 years and required extensive work from the team. To get Vitess working for Slack, they even became one of the largest contributors to the Vitess repo.
There are many other stories to tell in these 3 years of migrations. Going from 0% to 99% adoption also meant going from 0 QPS to the 2.3 M QPS we serve today. Choosing appropriate sharding keys, retrofitting our existing application to work well with Vitess, and changes to operate Vitess at scale were necessary and each step along the way we learned something new.
Square
The team over at Square wrote a blog series detailing their experience implementing Vitess. The first post is subtitled “Ripping Vitess apart and putting it back together,” and the first sentence of the final part of the series sets the stage for what was to come:
It has been quite the challenge bringing Vitess online over our existing MySQL database, then sharding and operating it at greater scale over time.
Back in 2016, Square's Cash product was growing quickly. They knew they needed to find a new solution, and fast.
Cash was growing tremendously and struggled to stay up during peak traffic. We were running through the classic playbook for scaling out: caching, moving out historical data, replica reads, buying expensive hardware. But it wasn’t enough. Each one of these things bought us time, but we were also growing really fast and we needed a solution that would let us scale out infinitely. We had one final item left in the playbook.
We had to shard.
Once they started researching, they stumbled upon Vitess. They, like Slack and most Vitess users, had to first deal with some underlying prep work before they could begin the migration. Once that was complete, they ran into unexpected issues, some of which they had to solve while in production: deadlocks that caused outages, handling scatter queries, keeping transactions ACID, resharding, and much more.
Sharding Cash’s database with Vitess was a massive undertaking that set us up for the future, but it was just the start of the journey.
Other users of Vitess (like YouTube) can make different trade offs — maybe dropping a comment every once in a while isn’t the end of the world for them. But not us. So the first thing we had to do was change our application code so that it wouldn’t do cross shard transactions in critical money-processing portions of the code.
It was a long, arduous process, but an essential step in scaling to the next level. And their efforts paid off: Vitess proved to be the right long-term solution to set them up for near-infinite scale.
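The application change Square describes, keeping critical money-processing transactions from spanning shards, can be enforced with a guard that runs before a transaction issues any writes. The Python sketch below is illustrative only: `shard_of` is a hypothetical key-to-shard lookup, and `require_single_shard` and `CrossShardTransactionError` are names invented for this example, not part of Vitess or Square's code.

```python
from typing import Callable, Hashable, Iterable


class CrossShardTransactionError(Exception):
    """Raised when a transaction's writes would span more than one shard."""


def require_single_shard(keys: Iterable[Hashable],
                         shard_of: Callable[[Hashable], str]) -> str:
    """Return the one shard that every key in a transaction maps to.

    `shard_of` is a hypothetical key-to-shard lookup supplied by the caller.
    Call this before issuing any writes so a violating transaction fails fast.
    """
    shards = {shard_of(key) for key in keys}
    if not shards:
        raise ValueError("transaction has no sharding keys")
    if len(shards) > 1:
        raise CrossShardTransactionError(f"transaction spans shards {sorted(shards)}")
    return shards.pop()


# Toy lookup for illustration only: even customer IDs on "-80", odd on "80-".
def toy_shard_of(customer_id: int) -> str:
    return "-80" if customer_id % 2 == 0 else "80-"


print(require_single_shard([2, 4, 6], toy_shard_of))
```

Failing early is the point of the design: a transaction confined to one shard runs on a single MySQL primary with ordinary ACID guarantees. A common way to make this check pass is to choose a sharding key so that rows written together share it.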
Running Vitess on your own
Now, you may not be quite at Slack or Square scale (yet!), but as you can see, implementing and maintaining a system like Vitess isn’t exactly a hands-off task.
Of course, several companies have deployed and continue to maintain Vitess on their own, so don't shy away from it if you're set on running Vitess yourself. The Vitess website has excellent resources to support you through the work, and if you do hit any roadblocks, the Vitess community is fantastic. The Vitess Slack workspace has over 4,000 members at the time of writing, and they are always happy to provide guidance.
So, in short, it can take a long time to get all the pieces in production, but it really depends on the complexity of your application and the resources you have available to dedicate to the work.
Regardless, it's always nice to have a team with Vitess expertise on hand to assist you with the process, especially because an infrastructure change of this magnitude can be downright scary.
After I run this command we will have completed the first (out of many many) shard splits.
It’s not reversible without very significant data loss.
I am utterly terrified.
I stand up and pace around the room. Is this really it? Have we thought about everything? What if I missed something? What if I screw everything up?
If you're interested, here are some additional stories about companies implementing and working with Vitess:
- Vinted Vitess Voyage
- Horizontally scaling the Rails backend of Shop app with Vitess
- HubSpot Improving Reliability: Building a Vitess Balancer to Minimize MySQL Downtime
- How HubSpot Upgraded a Thousand MySQL Clusters at Once
- Activision’s Journey to Scaling Databases with Vitess
Ongoing management tasks
Preparing for and implementing Vitess is one thing, but the work doesn't end there. PlanetScale has huge teams of engineers dedicated to managing the infrastructure our customers run on, handling version upgrades, backups, Kubernetes cluster maintenance, 24/7 monitoring, integrations, and much more. If you run Vitess on your own, you'll need a team on hand to manage all of this.
Version upgrades
One important thing to keep in mind is that the Vitess team typically releases 2-3 major versions per year. This means it's easy to fall behind, especially when you have a smaller team managing your Vitess instances. We frequently onboard customers that are 3+ versions behind who just didn't have the time to keep up with testing and implementing the upgrades.
One nice thing about running on PlanetScale's infrastructure is that we handle all version upgrades for you, without any downtime. In addition to Vitess upgrades, we also handle MySQL upgrades.
Resharding
Once you're in production, your app will likely keep growing, and teams sometimes face the challenge of resharding much sooner than anticipated. On PlanetScale, our team works hands-on with you through this operation.
Cost
Much of the cost of running Vitess comes down to team size and the time spent implementing and maintaining it. Machine-wise, you may end up saving a bit of money on Vitess, since each shard can run on a smaller machine than you'd need if you scaled vertically. This also makes infrastructure pricing more predictable, because you grow by adding shards instead of making large jumps to bigger machines.
To give a ballpark figure, for small to mid-sized applications running Vitess (200 GB to 1 TB), we usually see teams of around 3-6 DBAs focused on day-to-day database operations, with closer to 10 needed for the initial implementation. For much larger applications (1+ TB), you're looking at a team of at least 10 at all times.
With a PlanetScale Enterprise plan, the raw infrastructure costs are usually in line with what you'd pay running Vitess on your own. Apart from your cloud bill for the infrastructure, you'll pay PlanetScale for management and enterprise support. The PlanetScale costs scale with the resources and storage that you provision.
The time and staffing costs you save by having our team manage everything for you typically make PlanetScale a more cost-effective solution than running Vitess on your own.
PlanetScale was an easy choice for us on the technology and support alone. However we were also able to generate meaningful savings on our AWS bill and reinvest that into growing our business.
What additional features do you get with PlanetScale?
While we do run on Vitess under the hood, that’s not all you get with PlanetScale. We give you the Vitess cluster with a bunch of add-ons on top.
Vitess maintenance and support
A PlanetScale-managed Vitess cluster gives you much more than Vitess under the hood. One of the biggest perks is your access to in-house Vitess support and expertise. Our support team truly becomes an extension of your own team.
Our Enterprise support package gives you direct access to our team through a shared private Slack channel and recurring video calls (if desired). You're able to ask us any questions you have about performance, architecture, query optimization, and whatever else you need. We work hard to make sure that all of our customers are successful and have a great experience on PlanetScale.
You'll also get hands-on assistance at every step of the migration process. We stay in close contact with you from the proof of concept through planning the sharding scheme, planning and implementing the migration, and performing version upgrades, and we keep supporting you throughout our entire relationship, well beyond getting into production. We even hold the pager for you, often detecting and mitigating issues before your team is aware they exist.
Databases are hard. We would rather PlanetScale manage them. We wanted the support PlanetScale offers because they are the experts in the field. We’ve seen this come to fruition in our relationship.
Another huge perk is our close relationship with Vitess. Because we employ around 75% of the Vitess maintainers and are the largest contributors to the repository, we have a huge wealth of knowledge and a large concentration of Vitess expertise.
Dashboard for branching, deploy requests, and more
Because PlanetScale’s mission is to make it as easy as possible to manage your database clusters at any size, we put a lot of thought and care into our dashboard to make it simple to leverage both Vitess and PlanetScale features.
Vitess comes with built-in support for online schema changes. However, you cannot do schema changes through the VTAdmin UI. With PlanetScale, you get access to deploy requests. Deploy requests are used to make safe, reviewable, diffable schema changes to your database.
This enables your team to work quickly, efficiently, and safely.
PlanetScale Global Network
The PlanetScale Global Network is our edge infrastructure that is responsible for automatically routing reads to the closest replica. This is an additional layer on top of Vitess that you only get with PlanetScale. It also supports the following features:
- Latency-based routing
- Near-infinite connection pooling
- In-app private data access
- First-class serverless support
- PlanetScale Connect
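Latency-based routing boils down to measuring how long each replica takes to respond and sending reads to the fastest one. The following Python sketch shows one common approach, an exponentially weighted moving average (EWMA) of observed round-trip times. It is a hypothetical illustration, not how the PlanetScale Global Network is implemented, and `Replica`, `observe`, and `pick_replica` are invented names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    region: str
    ewma_ms: Optional[float] = None  # smoothed round-trip time; None until measured

    def observe(self, rtt_ms: float, alpha: float = 0.2) -> None:
        """Fold a new round-trip measurement into the moving average."""
        if self.ewma_ms is None:
            self.ewma_ms = rtt_ms
        else:
            self.ewma_ms = alpha * rtt_ms + (1 - alpha) * self.ewma_ms


def pick_replica(replicas: List[Replica]) -> Replica:
    """Route a read to the measured replica with the lowest smoothed latency."""
    measured = [r for r in replicas if r.ewma_ms is not None]
    if not measured:
        raise ValueError("no latency measurements yet")
    return min(measured, key=lambda r: r.ewma_ms)
```

A smaller `alpha` smooths out one-off spikes, while a larger one reacts faster when a replica's latency genuinely changes.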
Compliance and security
PlanetScale is committed to delivering a powerful and easy-to-use database platform while keeping your data secure. We have a number of crucial certifications and compliance measures in place as well, which are essential for customer audits. Some of these security measures that are baked into the PlanetScale product include:
- SOC 2 Type 2+ HIPAA
- BAAs available
- PCI compliant (on PlanetScale Managed for AWS)
- Compliance with GDPR and other global privacy regulations
- Data locality
- Private database connectivity through PrivateLink or GCP Private Service Connect
- Audit and security logs
- IP restrictions
- And more
Learn more in our Security documentation.
Query monitoring and insights
PlanetScale Insights is our in-dashboard query performance analytics tool that allows you to track performance down to the individual query level. It gives you a clear view of metrics such as query latency, errors, and anomalies (query activity that we detect as out of the ordinary). You'll also receive schema recommendations, based on your production database traffic, to improve performance and reduce memory and storage usage.
PlanetScale can help
Adding a massive piece of technology to your infrastructure can certainly be daunting. The path to production sometimes doesn't go according to plan and can often require much more time and money than originally budgeted.
At PlanetScale, we see time and time again companies who know Vitess is the right solution for them but aren't sure they want to spend the time implementing and managing it on their own. If you're in that boat, it's easy to set up a call with our Technical Solutions team to see if we're the right fit for you. Our sales process is designed to get all of your questions answered as quickly as possible, and you can jump straight into a technical evaluation, either via email or call.
There are also cases where your desired configuration wouldn't work well with PlanetScale, and we won't shy away from telling you so. At the very least, a quick chat with us should help confirm that you're choosing the right path.
Again, we employ the majority of the Vitess maintainers, and our entire team at PlanetScale has extensive experience implementing and maintaining huge Vitess clusters. If you are thinking about implementing Vitess on your own and are curious how we can help, we'd love to hear from you. Simply fill out our contact form, and we will be in touch.