Let's experiment with partitioning Juniper Studio's message table for the GalaxySurfer game. As a reminder, here's what the table looks like in the original game's schema:
CREATE TABLE IF NOT EXISTS message (
id INT AUTO_INCREMENT NOT NULL,
to_id INT NOT NULL,
from_id INT NOT NULL,
message varchar(512),
created_at DATETIME NOT NULL,
edited_at DATETIME,
deleted_at DATETIME,
flagged BOOLEAN,
PRIMARY KEY (id),
FOREIGN KEY (to_id) REFERENCES player(id),
FOREIGN KEY (from_id) REFERENCES player(id)
);
If we eventually end up with one million users, each user has an average of 10 friends they message, and each pair has an average of 100 messages, this table would reach 1 billion rows! Since this table could get large, it is a candidate to be partitioned.
Say I want to see all messages sent for the month of October 2020. To do this, I might write a query like this:
SELECT * FROM message WHERE created_at BETWEEN '2020-10-01 00:00:00' AND '2020-10-31 23:59:59';
If I don't have an index on the created_at
column, this would be slow for a large table. You can use EXPLAIN
and EXPLAIN ANALYZE
to further investigate how the query would be executed without partitioning.
The following commands can be run to change the table to be broken up into ten partitions:
ALTER TABLE message DROP CONSTRAINT message_ibfk_1;
ALTER TABLE message DROP CONSTRAINT message_ibfk_2;
ALTER TABLE message DROP PRIMARY KEY, ADD PRIMARY KEY(id, created_at);
ALTER TABLE message PARTITION BY RANGE ( year(created_at) ) (
PARTITION p1 VALUES LESS THAN (2015),
PARTITION p2 VALUES LESS THAN (2016),
PARTITION p3 VALUES LESS THAN (2017),
PARTITION p4 VALUES LESS THAN (2018),
PARTITION p5 VALUES LESS THAN (2019),
PARTITION p6 VALUES LESS THAN (2020),
PARTITION p7 VALUES LESS THAN (2021),
PARTITION p8 VALUES LESS THAN (2022),
PARTITION p9 VALUES LESS THAN (2023),
PARTITION p10 VALUES LESS THAN MAXVALUE
);
There are a few things to take note of here. For one, I had to drop the foreign key constraints since this is not allowed with a partition like this. Secondly, I added created_at
to the primary key, as the partition column must be a part of a PK.
Another thing to keep in mind here is that there are options that don't require you to do disk management from within MySQL. Suppose you want to spread data in a table across multiple physical disks. In that case, this can also be done by configuring a RAID or LVM setup of the disks outside of MySQL and then placing the database on one logical volume, though there may be multiple physical volumes under the hood that MySQL need not bother with. RAID / LVM may be a better route, depending on your use case.
Now that we have a partition, let's look at the explanation for the query.
mysql> EXPLAIN SELECT * FROM message;
+----+-------------+---------+-------------------------------+------+---------------+------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+-------------------------------+------+---------------+------+---------+------+---------+----------+-------+
| 1 | SIMPLE | message | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9 | ALL | NULL | NULL | NULL | NULL | 9923747 | 100.00 | NULL |
+----+-------------+---------+-------------------------------+------+---------------+------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
This shows us that the partitions would be used in the scan. If we instead execute a query that better leverages the partitioning, such as getting all of the messages for a single month in 2015, we can see different results:
mysql> EXPLAIN SELECT * FROM message WHERE created_at BETWEEN '2015-10-01 00:00:00' AND '2015-10-31 00:00:00';
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | message | p5 | ALL | NULL | NULL | NULL | NULL | 322965 | 11.11 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Notice that it only needed to look at one partition!
When creating the partitioned table, you can specify a different location for each partition if desired. First, ensure any additional locations you want to store them in are added to the innodb_directories
list. You can change this option in the MySQL configuration file. For example, in my case, it would be:
innodb_directories="/Users/bddicken/Desktop"
Restart MySQL, and try re-running the partitioning command. This time, you can specify a DATA DIRECTORY
to send a partition to, as long as it is in the default data dir or one of the paths in the innodb_directories
configuration option.
....
PARTITION p0 VALUES LESS THAN (2015) DATA DIRECTORY = '/Some/path',
....