In the last lesson, we had the 101 script do the hard work of getting a Vitess cluster set up for us. Let's take a deeper dive into what it actually takes to spin up a Vitess cluster and look at each of the various components.
Open up 101_initial_cluster. One of the first things you see it doing is starting up a topo server. In our case we are using etcd, but Vitess can also operate with other servers such as ZooKeeper. Etcd is a distributed key-value store, which Vitess uses to store and query topology information and to coordinate between the many components of a cluster, so having it up and running is a prerequisite to running Vitess.
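To make this concrete, here is a rough sketch of what starting etcd for Vitess can look like; the data directory, port, and log location below are assumptions for illustration rather than the exact values used by the example scripts:

```sh
# Start a single-node etcd to act as the Vitess topo server.
# VTDATAROOT is the working directory Vitess uses for data and logs;
# the port and paths here are assumptions, not the script's exact values.
etcd --data-dir "${VTDATAROOT}/etcd/" \
  --listen-client-urls "http://localhost:2379" \
  --advertise-client-urls "http://localhost:2379" \
  > "${VTDATAROOT}/tmp/etcd.out" 2>&1 &
```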
When creating a new Vitess cluster, even a simple one, there are several Vitess-specific processes that need to be spun up. The first is typically VTCtld. To see how it is started in our simple example, check out ../common/scripts/vtctld-up.sh. VTCtld is a server that accepts commands from vtctldclient and can be used to control and configure a running Vitess cluster. It is typically started right after getting the topo server set up!
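As a hedged sketch (the cell name, ports, and paths are assumptions, and flag spellings can vary between Vitess versions), starting VTCtld and then talking to it with vtctldclient looks roughly like this:

```sh
# Start vtctld, pointing it at the etcd topo server started earlier.
vtctld \
  --topo_implementation etcd2 \
  --topo_global_server_address localhost:2379 \
  --topo_global_root /vitess/global \
  --cell zone1 \
  --port 15000 \
  --grpc_port 15999 \
  --log_dir "${VTDATAROOT}/tmp" \
  > "${VTDATAROOT}/tmp/vtctld.out" 2>&1 &

# Once it is up, vtctldclient can issue commands against the cluster,
# for example listing all known tablets.
vtctldclient --server localhost:15999 GetTablets
```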
Vitess is built on top of MySQL, which is used under the hood to store data and process queries.
You can take a look at ../common/scripts/mysqlctl-up.sh to see how MySQL is started up. mysqlctl is a tool used to start up instances of MySQL. This part of the script gets three instances up and running. Ultimately, one of these will be used as a primary, one as a replica, and one as an rdonly instance (which is also a replica of the primary). These are the three instances that are going to be managed in this example. Vitess can manage small MySQL clusters like this, but it can also be used to manage clusters with thousands of MySQL instances.
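A rough sketch of what that looks like with mysqlctl is below; the tablet UIDs and the port scheme are assumptions modeled on the local example's conventions, not copied from the script:

```sh
# Bring up three MySQL instances, one per future tablet.
# Each instance is identified by a tablet UID and gets its own MySQL port.
for uid in 100 101 102; do
  mysqlctl \
    --tablet_uid "${uid}" \
    --mysql_port "$((17000 + uid))" \
    init \
    > "${VTDATAROOT}/tmp/mysqlctl_${uid}.out" 2>&1 &
done
wait  # wait for all three initializations to finish
```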
For each instance of MySQL in a Vitess cluster, a buddy process called a VTTablet needs to be spun up. Check out ../common/scripts/vttablet-up.sh to see how this is done. In a Vitess cluster, all connections to MySQL go through a tablet. This process allows Vitess to monitor the MySQL instances and helps facilitate connection pooling, leading to better performance and high availability.
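Here is a hedged sketch of starting a single VTTablet for one of those MySQL instances; the keyspace, shard, cell, tablet alias, and ports are illustrative assumptions, and flag spellings vary between Vitess versions:

```sh
# Start one vttablet, registered in the topo under the alias zone1-0000000100
# and paired with the MySQL instance that has tablet UID 100.
vttablet \
  --topo_implementation etcd2 \
  --topo_global_server_address localhost:2379 \
  --topo_global_root /vitess/global \
  --tablet-path "zone1-0000000100" \
  --init_keyspace commerce \
  --init_shard 0 \
  --init_tablet_type replica \
  --port 15100 \
  --grpc_port 16100 \
  --log_dir "${VTDATAROOT}/tmp" \
  > "${VTDATAROOT}/tmp/vttablet_100.out" 2>&1 &
```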
Next is starting up the Vitess Orchestrator (VTOrc). You can see how this is done by looking in ../common/scripts/vtorc-up.sh. The orchestrator is a component that monitors the health of the Vitess cluster and is responsible for "repairing" it if problems are detected. For instance, let's say you have a primary and two replicas, as is set up in this simple intro cluster. Now, what happens if the primary crashes, or the underlying server hosting it goes down? VTOrc can detect these types of issues and perform an automatic failover, so that they are handled with no overall cluster downtime.
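Sketching what that startup can look like (the config file path and port below are assumptions, not the script's actual values):

```sh
# Start VTOrc and point it at the same topo server so it can watch
# the keyspaces and tablets and repair problems it detects.
vtorc \
  --topo_implementation etcd2 \
  --topo_global_server_address localhost:2379 \
  --topo_global_root /vitess/global \
  --config "./vtorc/config.json" \
  --port 16000 \
  > "${VTDATAROOT}/tmp/vtorc.out" 2>&1 &
```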
After this, the script applies a schema and a VSchema to the database. The file create_commerce_schema.sql has a bunch of CREATE TABLE statements in it, as you might expect for a schema definition for a MySQL database. The vschema_commerce_initial.json file contains the VSchema, which allows you to do Vitess-specific schema configuration. In this case, there isn't much interesting going on. However, the VSchema is used to do things like specify how a database is going to be sharded, how to handle incrementing IDs in a sharded environment, and more.
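Applying both files boils down to two vtctldclient calls, roughly like the sketch below; ApplySchema and ApplyVSchema are real commands, but the server address and exact flag spellings here are assumptions:

```sh
# Apply the MySQL schema (CREATE TABLE statements) to the commerce keyspace.
vtctldclient --server localhost:15999 ApplySchema \
  --sql-file create_commerce_schema.sql commerce

# Apply the VSchema (Vitess-specific configuration) to the same keyspace.
vtctldclient --server localhost:15999 ApplyVSchema \
  --vschema-file vschema_commerce_initial.json commerce
```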
All of the connections to this database are going to pass through VTGates. In this cluster, we only spin up one, but in a production environment you typically spin up multiple for a single Vitess cluster. The more you have, the better your cluster can handle a large number of connections. Connections hit the VTGate, which figures out which tablet to send each query to, and that tablet then communicates with MySQL itself. The ../common/scripts/vtgate-up.sh script is responsible for getting the VTGate up and running here.
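As a rough, hedged sketch (the cell, ports, and flag spellings below are assumptions based on common local-example conventions), starting a VTGate and connecting to it looks something like this:

```sh
# Start vtgate; it serves the MySQL protocol on one port and gRPC/HTTP on others.
vtgate \
  --topo_implementation etcd2 \
  --topo_global_server_address localhost:2379 \
  --topo_global_root /vitess/global \
  --cell zone1 \
  --cells_to_watch zone1 \
  --tablet_types_to_wait PRIMARY,REPLICA \
  --port 15001 \
  --grpc_port 15991 \
  --mysql_server_port 15306 \
  > "${VTDATAROOT}/tmp/vtgate.out" 2>&1 &

# Applications connect to VTGate just like they would to a MySQL server.
mysql -h 127.0.0.1 -P 15306 -e "SHOW DATABASES;"
```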
Given the complexity of a Vitess cluster, it's useful to have a UI to observe what is going on. Vitess provides VTAdmin, a server and web client that give you a web interface for monitoring your cluster. Starting these is what the ../common/scripts/vtadmin-up.sh script does.
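As a very rough sketch (the port, cluster name, and discovery-file flags are assumptions, and the real script also builds and serves the web UI separately), starting the VTAdmin API server can look like this:

```sh
# Start the VTAdmin API server, telling it how to discover the local
# cluster's vtctld and vtgate from a static discovery file.
vtadmin \
  --addr ":14200" \
  --cluster "id=local,name=local,discovery=staticfile,discovery-staticfile-path=./vtadmin/discovery.json" \
  > "${VTDATAROOT}/tmp/vtadmin-api.out" 2>&1 &
```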
Now keep in mind, in this example, all of these processes were spun up on the same computer! Generally, if you're using Vitess for a production cluster, you'd want to spread these components out across many machines. For example, each of the MySQL / VTTablet pairs should probably reside on a separate machine, both for (A) high availability and (B) so that each pair gets its own separate set of resources. You also might want to have your VTGate on a separate instance, or you could spin one up on each of the machines that host the MySQL instances. You as the DBA have the flexibility to configure all of this the way that you want, but it can take a lot of tedious setup and customization.