Three common MySQL database design mistakes
By Brian Morrison II |
Many years ago, I worked for a telematics company that ingested data from hundreds of thousands of devices worldwide. There was a point of incredible growth where we onboarded a customer that gave us a massive number of new devices and a huge bump in revenue. It was a great moment for the company's trajectory, but the increased amount of data being processed highlighted a massive flaw in our system.
The ID column of the data history table (which logged every event that occurred across all devices) was created with the INT
data type, and it was quickly running out of space.
It wasn't an issue immediately, but if that column ran out of space, our entire system would come to a halt. Funnily enough, we built a quick tool called “the doomsday clock,” which would roughly calculate the date this would occur. Had we expected this, we would have designed the database with a different type that would have more easily accommodated growth like this, but of course, the results of our decisions are always more obvious in hindsight.
Let's take a look at this issue and a few other common database design mistakes when setting up your database.
Suboptimal data type
The scenario described in the intro of this article highlights the importance of selecting a data type that's big enough to accommodate your existing data, as well as any potential growth you might experience.
This applies to more than just numerical types, though. For example, if you attempted to write a string with 300 characters into a VARCHAR(255)
column, MySQL would return an error and reject the write if it is in strict mode, which is the default setting for MySQL. If MySQL is not in strict mode, attempting to insert string data into a column that exceeds its length causes the data to be truncated, losing potentially important data.
Note
If you want to learn more about the different string types in MySQL, check out our article breaking down the text options available to you.
Conversely, you can also select columns that store too much data. While this won't have as much of a negative impact as not having enough room, there are storage and performance implications with over-provisioning columns. Let's assume you have a column storing the US zip code, which is typically five digits. You could default to using INT
for the column type (which stores a 32-bit integer), but you'd allocate far more storage than necessary. Utilizing a SMALLINT
would be a better choice, as it stores a 16-bit integer and would be more than enough to store a zip code.
These are only a few small examples of selecting an inappropriate data type for your columns.
Missing or redundant indexes
Indexes in MySQL speed up data access by building a separate structure that's optimized to return data if the query's criteria match the configuration of that index.
Indexes are very important when designing a fast database. When indexes aren't utilized, any SQL queries that do not use pagination or provide a LIMIT
will perform a scan on that table. When scanning, MySQL will start reading from the first row until it has found every row that matches the criteria. If you have a heavily used query on a particularly large table, repeatedly scanning the table can have massive negative performance implications.
Note
To see the practical effects of missing indexes, our very own Aaron Francis saved the SaaS for one of our customers and documented his process on YouTube!
On the other hand, you can have too many indexes as well.
Every index created will utilize additional storage, so having unused or duplicate indexes directly impacts how much you are paying for that storage. Whenever data is updated or inserted into a table with indexes, MySQL needs to update those indexes (along with their associated statistics) to ensure they are accurate regardless of whether they are used. This can be a time-intensive operation that can create a bad user experience.
Note
If you want to learn more about how to effectively use indexes, we have a whole section covering them in our MySQL for Developers course.
Improperly storing semi-structured data
Over the past 20 years, utilizing NoSQL to store semi-structured data has gained favor with companies that need to process vast amounts of data very quickly.
Plenty of dedicated solutions are available on the market for storing this kind of data. However, MySQL is actually very capable in this area as well. Most semi-structured data stored in a database is represented as JSON. The most obvious way to store this would be to store the string in a TEXT
column, but this is definitely not the most optimal way.
MySQL has a dedicated JSON
column type that is designed to store JSON in an efficient binary storage format.
Using JSON
over TEXT
has several key benefits. The first is that InnoDB, the most commonly used MySQL database engine, natively supports querying and filtering based on data within the JSON object stored in the column, removing the need to manually filter after results have been returned to your application code. MySQL also supports building indexes based on data within JSON. This enables fast searches, allowing you to return rows based on your queries more quickly.
Note
If you want to learn more about the JSON data type, our blog has several articles on the topic, including JSON type basics and how to index JSON columns.
Conclusion
Whatever happened to that ID column that was running out of space? Well, luckily enough, the column type was a signed integer, meaning we were able to reseed it to -2,147,483,648. This effectively doubled our ID capacity by assigning negative numbers as IDs and incrementing towards 0. It's not the prettiest solution, but it did help us avoid a rather large amount of downtime that would be required to update the schema for nearly all tables in our database.
Designing a database for growth is no simple task, and things can get out of hand in a hurry. We've touched on only a few potential database design mistakes here, but every MySQL use case is unique and has its own challenges.
If you've encountered design issues, tell us more on Twitter, and make sure to tag @planetscale!