Replication allows you to copy data from the primary server (also known as the source server) to one or more replicas. MySQL replication automatically keeps data in sync between the source and replicas so that when changes are made to the data in the source, those changes are automatically applied to each replica in the order they were performed.
Database query traffic can be split between the primary and replicas, where read-only queries are dispatched to a replica, and queries that modify data are sent to the source. The source still handles all write operations, as it needs to ensure everything stays in sync and that transactions are handled correctly. However, the replicas are free to help with as much read traffic as they can handle.
While the main workload for the database comes from the application servers, there may be some additional analytical work that data engineering or business intelligence teams want to perform on a company's database. You can set up one or more dedicated replicas for performing these longer-running analytical queries and workloads. Having all these queries handled by a replica can keep these heavy workloads away from the server(s) responsible for keeping your application up and running.
Backups can also be performed on replicas. Taking a backup from a replica eliminates the need for the primary to bear the burden of sending backup feed information to another server.
Replicas can be used to help in failover situations. Your primary server may go down at times. This can be a planned outage, such as taking a server down for maintenance and upgrades. In this case, you can temporarily have one of the replicas take over as a primary during the maintenance window and then switch back over when the primary is back online.
MySQL uses what's known as the binary log to replicate changes from the source to the replicas. It uses the log file as well as the position in the log to determine which changes to replicate.
When data is added, updated, or deleted in a table in your database, those changes are logged to the binary log. An entry in the binary log is created for each changed row. Any replicas you have set up off the primary will connect to the primary and watch for changes to the binary log. For each change observed, the replica will perform the changes needed on the corresponding row within themselves to keep data synchronized. This process is known as row-based replication.
GTID (Global Transaction Identifier) mode enhances replication by assigning a unique ID to each transaction performed on the source. Replicas will use these IDs to track which statements were already performed in a table if this is enabled. This helps to ensure that statements are not accidentally double-replicated — so use it if you have high data integrity needs.
Replication can be set up manually with MySQL if you are configuring your servers. However, some managed db services configure replication for you, including PlanetScale! In the next lesson, I'll show how to set this up both manually and how to use replicas in PlanetScale.
For further reading about MySQL replication, check out this blog post.