Debunking 3 myths about Vitess fault tolerance

Here at PlanetScale, we hear concerns about the reliability of Vitess and whether it can lose data. When one hears “cloud-native, highly available, distributed database running on Kubernetes,” it does sound too good to be true, so we understand the initial apprehension. Even though reputable companies such as Slack, Square, GitHub, and JD run their production databases on Vitess, we still get asked whether Vitess will lose data. We’re here to debunk three common myths and address some lingering questions about how Vitess handles failures.

Myth: Because Vitess does not use a consensus-based commit protocol, if your master goes down, you will lose data.

Because Vitess is built on MySQL, it addresses this problem with MySQL’s lossless semi-synchronous replication. Simply put, before a transaction is considered committed, the master must receive an acknowledgment from at least one replica that the replica has received the transaction. If the master then fails, every committed transaction already exists on another host.
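To make the commit rule concrete, here is a toy Python model of it. This is a minimal sketch, not MySQL’s implementation: `Replica`, `Master`, and `min_acks` are hypothetical names, `min_acks` only mirrors the idea behind MySQL’s `rpl_semi_sync_master_wait_for_slave_count`, and the network is reduced to a `reachable` flag. The “lossless” part is modeled by making a transaction visible (`committed`) only after a replica has acknowledged it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """Toy replica: stores the transactions it receives in its relay log."""
    name: str
    reachable: bool = True
    relay_log: List[str] = field(default_factory=list)

    def receive(self, txn: str) -> bool:
        # An unreachable replica never gets the event, so it cannot acknowledge it.
        if not self.reachable:
            return False
        self.relay_log.append(txn)
        return True  # the acknowledgment sent back to the master


@dataclass
class Master:
    """Toy master following the semi-synchronous commit rule."""
    replicas: List[Replica]
    min_acks: int = 1                                   # "at least one replica"
    binlog: List[str] = field(default_factory=list)     # written before waiting for acks
    committed: List[str] = field(default_factory=list)  # visible to clients

    def commit(self, txn: str) -> bool:
        self.binlog.append(txn)
        acks = sum(1 for r in self.replicas if r.receive(txn))
        if acks < self.min_acks:
            # A real server would keep waiting here (and, after rpl_semi_sync_master_timeout,
            # fall back to asynchronous replication); this sketch just reports "not committed".
            return False
        # Lossless mode: the transaction becomes visible only once a replica has it.
        self.committed.append(txn)
        return True


healthy = Replica("replica-1")
down = Replica("replica-2", reachable=False)
master = Master([healthy, down])
print(master.commit("txn-1"), healthy.relay_log, master.committed)
```

With one healthy replica the commit succeeds and `txn-1` sits in that replica’s relay log; a master whose only replica is unreachable would never report the transaction as committed.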

Myth: Even if your data is safe, Vitess cannot recover automatically when your master goes down.

While this is true out of the box, MySQL deployments (and Vitess is no exception) are very commonly run with Orchestrator, which automatically reparents when the master fails. Even better, PlanetScale’s proprietary Vitess operator automatically detects a dead master and does the reparenting for you.
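The detection step is the subtle part: a monitor that cannot reach the master may simply have lost its own network link. Below is a minimal Python sketch, assuming a hypothetical monitor that knows both whether it can reach the master and whether each replica is still replicating from it. `ShardView`, `master_is_dead`, and `check_and_reparent` are illustrative names, not Orchestrator or operator APIs. Cross-checking with the replicas is loosely inspired by how Orchestrator avoids declaring a master dead after a single failed probe; the real logic is considerably more involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ShardView:
    """What a hypothetical monitor observes about one shard."""
    master_reachable: bool          # can the monitor itself connect to the master?
    replica_io_ok: Dict[str, bool]  # per replica: is it still receiving from the master?


def master_is_dead(view: ShardView) -> bool:
    # A failed probe alone may only mean the monitor's own link is broken,
    # so also require every replica to have lost its connection to the master.
    if view.master_reachable:
        return False
    return len(view.replica_io_ok) > 0 and not any(view.replica_io_ok.values())


def check_and_reparent(view: ShardView, reparent: Callable[[], str]) -> Optional[str]:
    # `reparent` stands in for whatever triggers the failover (for example, a request
    # to Vitess's EmergencyReparentShard); it is only called when the master is judged dead.
    if master_is_dead(view):
        return reparent()
    return None


view = ShardView(master_reachable=False, replica_io_ok={"replica-1": False, "replica-2": False})
print(check_and_reparent(view, lambda: "reparent requested"))
```

If even one replica still reports a healthy connection to the master, this sketch does nothing, treating the failed probe as a problem on the monitor’s side rather than a dead master.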

Myth: Configuring self-recovery with Vitess requires a lot of extra steps.

Also not true! Vitess’ control plane includes workflows such as PlannedReparentShard (for planned maintenance) and EmergencyReparentShard (for when the current master is dead) right out of the box. Your orchestration tool only needs to detect the failure and send the reparent request; Vitess handles the rest. A network partition does require manual intervention, a tradeoff Vitess makes to keep p99 latency very low for high performance.
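At its core, an emergency reparent promotes whichever surviving replica has received the most data. The Python sketch below illustrates only that selection step, under simplifying assumptions: replication positions are plain integers (Vitess actually compares GTID positions), and the real workflow’s steps of stopping replication and waiting for relay logs to apply are omitted. `Tablet` and `emergency_reparent` are hypothetical names, not Vitess APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tablet:
    alias: str
    position: int  # simplified: transactions received, standing in for a GTID position
    reachable: bool = True


def emergency_reparent(replicas: List[Tablet]) -> Dict[str, object]:
    """Toy selection step: promote the most advanced reachable replica."""
    candidates = [t for t in replicas if t.reachable]
    if not candidates:
        raise RuntimeError("no reachable replica to promote")
    new_master = max(candidates, key=lambda t: t.position)
    # Every other reachable replica is repointed at the new master and catches up from it.
    repointed = [t.alias for t in candidates if t is not new_master]
    return {"new_master": new_master.alias, "position": new_master.position, "repointed": repointed}


shard = [
    Tablet("zone1-101", 41),
    Tablet("zone1-102", 42),
    Tablet("zone1-103", 40, reachable=False),
]
print(emergency_reparent(shard))
```

Combined with semi-synchronous replication, at least one replica holds every acknowledged commit, so promoting the most advanced reachable replica should not drop any of them, provided that replica is among the reachable ones.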

If you still don’t believe us, check out this longer-form piece from our CEO, Jiten Vaidya, where he dives into the difference between theoretical and practical durability.

If you want to know more about Vitess and its capabilities, contact us in the Vitess Slack community, try it out for yourself with the quickstart guide, or check out our newly open-sourced Kubernetes operator!