What is MySQL replication?

Introduction

MySQL replication is a process that allows for your MySQL database to be synchronized across multiple servers. When configured properly, replication results in increased application performance, reliability, and improved scalability. The most common type of replication includes one primary server and one or more secondary servers known as replicas.. As data is written to the primary server, the replication process will automatically perform those same data operations on the replicas, which only permit read operations. Replication does, however, come with drawbacks including potential data inconsistency, while increasing infrastructure complexity and maintenance difficulty.

One of the primary components of the replication process is the binary log, which is a file that is maintained by the primary server that contains information in a serialized format about all the changes made to the database. The replica servers read the events stored on the log and execute them in the same order to ensure the data on the primary is replicated properly. An important thing to note is that newer versions of MySQL support another method of replication based on global transaction identifiers (GTIDs) which does not require the use of a log file.

MySQL replication types and formats

Types

Based on behavior, there are three types of MySQL replication, and they are synchronous, semi-synchronous, and asynchronous.

Synchronous

In synchronous replication, the primary server waits for the replica servers to confirm that it has received and applied the changes before committing. It ensures that all the changes made on the primary server are immediately copied to the replica servers before committing. This ensures that all the replicas are up to date, which provides strong data consistency. However, using this method could lead to an impact on the performance and response time of the primary server and limit the scalability of the system.

Semi-synchronous

Semi-synchronous replication provides a compromise between synchronous and asynchronous replication. It allows the primary server to commit without waiting for all the replica servers to confirm the changes. However, it makes sure that at least one of the replica servers has confirmed that it has received and applied the changes before it commits the transaction. This reduces the risk of data loss that using asynchronous replication could cause and offers better performance than synchronous. There is still a possibility of data loss if the replica server with the data fails before other replica servers receive the data.

Asynchronous

This is MySQL’s default and most used form of replication. Here, the primary server commits the transaction immediately without waiting for confirmation from the replica servers. There is no assurance that the data on the replica servers is always up to date with the primary server. This can lead to the possibility of data loss and weaker data consistency. Asynchronous replication is, however, faster and has increased scalability, as the primary server has no need to wait for the replica servers.

Formats

Formats are simply the different ways that events in the binary log are recorded given the type of event. The formats are statement-based, row-based, and mixed.

Statement-based

Statement-based replication copies the SQL statements executed on the primary server to the replica server by recording each SQL statement that modifies data (INSERT, UPDATE, DELETE) on the binary log of the primary server and then executing them on the replica servers. This type of replication is the default mode in MySQL 5.7 and earlier versions. However, it has some limitations, such as potential issues with non-deterministic functions and trigger-based operations.

Row-based

Row-based replication copies the actual changes made to individual rows in the primary database to the replica servers. Instead of replicating the SQL statements, the primary server records the before and after values of each modified row in the binary log. The replica server then applies the same changes to its own copy of the database. Row-based replication provides more precise replication, especially when dealing with non-deterministic functions, triggers, and stored procedures. This is the default mode in MySQL 8.0 and offers better compatibility and reliability compared to statement-based replication.

Mixed Mode

Mixed Mode replication is a combination of statement-based replication and row-based replication. It allows the server to choose the appropriate replication method based on the nature of the SQL statement. Most SQL statements are replicated using statement-based replication, while some statements that cannot be safely logged using statement-based replication (e.g., non-deterministic functions) are replicated using row-based replication. Mixed Mode replication provides flexibility and is the default mode in versions of MySQL between 5.7 and 8.0.

Setting up MySQL replication

In this guide, it is assumed that you have more than one MySQL server running to represent the primary and replica servers. If you don’t have two or more servers running, you can create a second server using a virtual machine if you’re using Windows OS or mysqlid_multi if you’re using a Linux-based OS, but configuring these is beyond the scope of this article.

Configuring the primary server

The first step to configuring the primary server is to look for the my.ini or my.cnf file. On Windows, it’s in the \ProgramData\MySQL\MySQL Server 8.0 folder. On Ubuntu, it’s in the /etc/mysql/ folder. Open the file in your favorite editor and change the following settings. Remember to save the file and restart MySQL service after you’re done editing the file:

server-id=1
log-bin=mysql-bin

The server-id value is 1 for the primary, and other replicas can have server ids of 2 and above. The most important thing is that the numbers are unique. The log-bin value determines the prefix for the binary log files. Run the command:

SHOW PRIMARY STATUS

It shows the name of the binary file and the position. Since we specified the log-bin value as mysql-bin in the previous step,, the name should be mysql-bin,followed by a series of numbers like .000001. Take note of this number as you’ll need to use it when configuring the replica. In this example, the name is mysql-bin.000004:

+---------------+----------+--------------+------------------+
| File          | Position | Binlog_Do_db | Binlog_Ignore_DB |
+---------------+----------+--------------+------------------+
| mysql-bin.000004 | 157  	|             |  	            |
+---------------+----------+--------------+------------------+
1 row in set (0.00 sec)

The next step is to create a replication user on the primary server. This can be done through a MySQL shell or through a graphic interface like MySQL workbench. The following script will create a user named replication_user and grant the necessary permissions to that user. If you are following along, make sure to use a complex password for security.

CREATE USER 'replication_user'@'%' IDENTIFIED BY '<SUPER_SECURE_PASSWORD>';
GRANT REPLICATION REPLICA ON *.* TO 'replication'@'%';
FLUSH PRIVILEGES;

Configuring the replica server

On the replica server, open the my.ini or my.cnf file and change the server-id to:

server-id=2

Save the file and restart the MySQL service. By default, this server will automatically assume that it is the primary. We need to change that by showing the primary server to the replica server. To do that, run the following command:

CHANGE PRIMARY TO
PRIMARY_HOST='primary_server_ip',
PRIMARY_USER='replication_user',
PRIMARY_PASSWORD='password',
PRIMARY_LOG_FILE='mysql-bin.0000X',
PRIMARY_LOG_POS=XX;
 	START PRIMARY;

Replace the XX with the numeric value from running the SHOW PRIMARY STATUS when configuring the primary server. If you get an error, try stopping the replica first before running the change primary command:

STOP REPLICA;

Initializing replication

To initialize the replication, run the command:

START REPLICA;

After configuring the replicas, you can check the status of the replicas by running the command:

SHOW REPLICA STATUS;

Check the Last_IO_Errno and Last_IO_Error fields to ensure the replication is running without errors.

Conclusion

In conclusion, MySQL replication is a powerful feature that enables the synchronization of data across multiple servers, providing redundancy, scalability, and improved performance. It allows you to create replicas of your primary database and distribute the workload, enhancing the availability and reliability of your system. By default, every single production database in PlanetScale comes with at least one replica, and this is done with zero additional configuration by your engineers. Additional replicas can be added any time to further increase application uptime, or to bring your data closer to your users which increases performance.