Why isn’t MySQL using my index?
By Aaron Francis |
One of the most frustrating experiences when dealing with databases is when you've designed the perfect index, but MySQL still doesn't use it. There are several reasons why this could be the case, and in this article, we'll explore some of the most common ones.
Throughout this article, we'll be working with a very simple people
table that looks like this:
CREATE TABLE `people` ( `id` bigint unsigned NOT NULL AUTO_INCREMENT, `first_name` varchar(50) NOT NULL, `last_name` varchar(50) NOT NULL, `state` char(2) NOT NULL, PRIMARY KEY (`id`), KEY `first_name` (`first_name`), KEY `state` (`state`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
We'll be adding and dropping keys throughout to show different scenarios, but this is a good starting place.
Determining what index is being used
Before you can determine why your index isn't being used, you must first determine that your index isn't being used. You can run an EXPLAIN
on your query to understand what indexes are considered and which index is ultimately used.
EXPLAIN select * from people where first_name = 'Aaron';
Running the EXPLAIN
gives you a look at how MySQL is planning to execute the query:
| id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |----|--------|------|---------------|------------|---------|-------|------|----------|-------| | 1 | people | ref | first_name | first_name | 202 | const | 180 | 100.00 | |
We can see that the first_name
index is the only index considered (possible_keys
) and that it is also the index that is chosen (key
).
Tip
Check out this article for more a more in-depth guide on how to read anEXPLAIN
output. Our index on first_name
was considered, and it was chosen. These are separate pieces of information, both of which are valuable! Before your index can be chosen, it must first be considered. MySQL's query optimizer, responsible for determining the best way to execute a query, looks at the query and the available indexes and decides which indexes are applicable. Having decided on the indexes that apply to the query, it must then choose between those indexes as to which one is the most efficient.1
Now that we know how to determine what index is (or isn't!) being used, let's look at some of the reasons why your index might not be used.
Another index is better
In our example above, only one possible index could satisfy the query, so the optimizer doesn't have to choose which one is best. In the case where multiple indexes might work, the optimizer must make a decision between multiple viable options.
Consider the following query, which searches for people named "Aaron" that live in Texas:
SELECT * FROM people WHERE first_name = 'Aaron' AND state = 'TX'
Running an EXPLAIN
on this query, we see two possible indexes that could be used: first_name
and state
.
| id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |----|--------|------|------------------|------------|---------|-------|------|----------|-------------| | 1 | people | ref | first_name,state | first_name | 202 | const | 180 | 50.00 | Using where |
In this case, the optimizer has decided that the first_name
index is the best choice because the first_name
index is more selective.
Selectivity and cardinality of indexes
Selectivity and cardinality go hand in hand but differ slightly. Cardinality refers to the number of distinct values in a particular column, while selectivity is a percentage of how unique those values are.
To calculate the cardinality of a column, you can use the COUNT(DISTINCT column)
function:
SELECT COUNT(DISTINCT first_name) as first_name, COUNT(DISTINCT state) as state FROM people
Running this query will tell you how many unique values are in each column:
| first_name | state | |------------|-------| | 3009 | 2 |
We can see here that the first_name
column has way more unique values than the state
column. This is interesting information, but it's not that helpful without another piece of information: the total number of rows. Knowing that there are 3,009 distinct values in the first_name
column is interesting, but we have no idea if that's a relatively high or low number compared to the whole table!
This is where selectivity comes into play. Selectivity is a measure of how unique the values in a column are. The higher the selectivity of an index, the better it is for optimizing query performance.
To calculate the selectivity of a column, you can use the formula COUNT(DISTINCT column) / COUNT(*)
. This will give you a decimal between zero and one that represents how unique the values in this column are on average.
SELECT COUNT(DISTINCT first_name) / COUNT(*) as first_name, COUNT(DISTINCT state) / COUNT(*) as state FROM people
By running this query, you can see that the first_name
column has a higher selectivity than the state column. The state column has such poor selectivity that it rounds down to zero!
| first_name | state | |------------|--------| | 0.0060 | 0.0000 |
This statistic tells us that in this table, filtering by state
is less useful than filtering by first_name
. More people share a common state
than share a common first_name
, so it would be faster to use the first_name
index. MySQL keeps track of this information and uses it when planning which index to use when presented with multiple options.
All unique indexes are, by their nature, perfectly selective, meaning that only one record will be returned for each value. We can prove this by adding our id
to our selectivity calculation:
SELECT COUNT(DISTINCT id) / COUNT(*) as id, COUNT(DISTINCT first_name) / COUNT(*) as first_name, COUNT(DISTINCT state) / COUNT(*) as state FROM people
Here you'll see that the id
has far higher selectivity than any other column, as you might expect!
| id | first_name | state | |--------|------------|--------| | 1.0000 | 0.0060 | 0.0000 |
Selectivity is query-dependent
Calculating selectivity across an entire table can be misleading if the data is not evenly distributed. In some cases, an index might be highly selective for one query and not selective at all for another.
Consider a table of one million users, where 99% are of type = "user"
and 1% are of type = "admin"
. In this case, an index on type
might seem useless because it's not very selective on average. But when you're querying for admins, it is highly selective. So while checking average selectivity is a good rule of thumb, pay careful attention to unevenly distributed data.
All other things being equal, MySQL will choose the most selective index possible. To speed this choice up, MySQL keeps statistics about the data's rough shape, which can become outdated.
Outdated or inaccurate statistics
We've been calculating cardinality by running COUNT(DISTINCT column)
each time. It would be inefficient for MySQL to calculate that each time, so instead, it keeps track of the cardinality over time using random sampling. You can see this stored value by running SHOW INDEXES from [table]
:
| Table | Non_unique | Key_name | Column_name | Collation | Cardinality | Index_type | Visible | |--------|------------|------------|-------------|-----------|-------------|------------|---------| | people | 0 | PRIMARY | id | A | 491583 | BTREE | YES | | people | 1 | first_name | first_name | A | 3028 | BTREE | YES | | people | 1 | state | state | A | 1 | BTREE | YES |
The Cardinality
column shows you the stored value MySQL will use to make its selectivity decisions. These statistics are automatically updated after 10% of a table has changed. This happens in the background, and you shouldn't ever notice it. If the statistics are so outdated that it is causing problems, you can force an update by running ANALYZE TABLE [table]
.
If you find that the statistics need to be more accurate for a particular table and are continually causing the optimizer to make poor decisions, you can change how the sampling is done. This is beyond the scope of this article, but the MySQL documentation has a comprehensive page on the topic.
Usually, you won't need to worry about the table sampling or updating the statistics manually.
Scanning the table is faster
Paradoxically, an index is only sometimes the fastest way to access data! The MySQL optimizer will always try to pick the quickest way to get the data it needs, which sometimes means choosing a slightly counter-intuitive method. For small tables or queries that select a large portion of a table, it can be faster for MySQL to skip the index scan and scan the table directly. Sometimes the dreaded table scan is the best access method possible!
An index is a secondary data structure (a B+ tree) apart from the table that must be traversed to find the matching row IDs. Once the IDs have been found, those rows must be found and read from disk. In situations where most of the rows will be fetched, reading all of the rows in order off of the disk is faster than going to the index first.
Usually, a table scan is bad news, but sometimes, even if rarely, it's the best possible outcome.
Index limitations
Our index may have been considered but not chosen in all the preceding scenarios. In the rest of this article, we will look at instances where your favorite index isn't even considered for use.
Indexes may seem magical, but they aren't magic. Without going too deeply into how a B+ tree works, it is essential to understand that there are some queries that an index cannot satisfy.
Tip
We have a 5 minute video overview of how B-trees work if you'd like to dive in further.Let's look at three situations where indexes cannot be considered, due to the nature of their underlying structure.
Wildcard searching
MySQL allows you to search against string columns using wildcards, which is commonly used for searching for strings that start with a particular substring.
Searching for first_name
s that start with "Aa," you'll see that MySQL considers and uses our index on the first_name
column.
explain select * from people where first_name like 'Aa%'; -- | id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | -- |----|--------|-------|---------------|------------|---------|-----|------|----------|-----------------------| -- | 1 | people | range | first_name | first_name | 202 | | 356 | 100.00 | Using index condition |
However, if we search for names that end with "ron," we have no such luck.
explain select * from people where first_name like '%ron'; -- | id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | -- |----|--------|------|---------------|-----|---------|-----|--------|----------|-------------| -- | 1 | people | ALL | | | | | 493889 | 11.11 | Using where |
MySQL can use the index until it reaches the first wildcard character. The index is not considered if the search string starts with a wildcard character.
If you need more robust string searching, we have videos on strategies for indexing wildcard searches and an introduction to fulltext indexes that may be helpful.
Composite indexes
Like wildcard searches, there are precise rules for how composite indexes can be used. A composite index is an index that covers more than one column. Instead of creating an index on first_name
, we create one that covers two columns: (first_name, state)
. Let's drop all of our indexes and create a new one.
ALTER TABLE people drop index first_name; ALTER TABLE people drop index state; ALTER TABLE people ADD INDEX multi (first_name, state);
When creating a composite index, think carefully about the order you put the columns in, because MySQL will only be able to use the columns starting on the left and working toward the right. It cannot skip any columns.
This means that our new multi
index is useful when we're querying for both first_name
and state
but useless when we're only querying against state
.
Let's run an EXPLAIN
both queries to prove that. The multi
key is considered and chosen for the first query.
explain select * from people where first_name = 'Aaron' and state = 'TX' -- | id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | -- |----|--------|------|---------------|-------|---------|-------------|------|----------|-------| -- | 1 | people | ref | multi | multi | 210 | const,const | 178 | 100.00 | |
The key is not considered for the second query that doesn't include a condition on first_name
.
explain select * from people where state = 'TX' -- | id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | -- |----|--------|------|---------------|-----|---------|-----|--------|----------|-------------| -- | 1 | people | ALL | | | | | 493889 | 10.00 | Using where |
You must form what is called a "leftmost prefix" for the index to be used. Start at the left and work your way toward the right. You don't have to use every part of a composite key, but it is only accessible from the left side. If your index isn't being considered, ensure you've formed a leftmost prefix.
Joining on mismatched columns
MySQL can use an index to speed up the operation when joining two tables. There are, again, a few rules you must pay attention to, though! If the columns are not of the same type and size, this will preclude using an index.
For this purpose, VARCHAR(10)
and CHAR(10)
would be considered the same type and size, but VARCHAR(10)
and CHAR(15)
would not be. It may, in fact, be beneficial to lengthen the VARCHAR
column to match the CHAR
and allow the use of an index, even if the data won't be 15 characters long.
String columns must also use the same charset for an index to be used. If one column uses utf8mb4
and the other uses latin1
, this will also preclude the use of an index.
There are cases when dissimilar types can be compared using an index, but it's always best to declare columns that you plan to use in joins as the same size and type.
Index obfuscation
Index obfuscation refers to the scenario where you've wrapped your indexed column in a function, thereby hiding it from MySQL.
This can happen easily, especially when using a SQL abstraction like an ORM or query builder. Given a table of people where you want to find people created this year, you might wrap a created_at
column in the year
function:
SELECT * FROM people WHERE YEAR(created_at) = 2023;
This is a prime example of index obfuscation. If you had an index on created_at
, MySQL cannot use it! Because you've wrapped the indexed column in a function, it no longer matches the data stored in the index, which is the full timestamp value of created_at
. You've hidden the indexed column, and MySQL cannot see it.
In this case, it would be better to unwrap that function and use a range scan on the index:
SELECT * FROM people WHERE created_at BETWEEN '2023-01-01' AND '2023-12-31';
Not all obfuscation can be unwrapped or undone. Sometimes you need to use functions on columns, and that's perfectly ok! As a general rule of thumb, try to leave your columns untouched and move any operations to the other side of the comparison.
Invisible indexes
This is perhaps the least common reason your index wouldn't be used, but it is possible! You can explicitly make an index invisible, preventing it from being considered for query usage. Usually, this is done to test the effects of dropping an index before committing it.
When you make an index invisible, MySQL maintains it even though it won't be used for queries. If you realize that making it invisible negatively affects performance, you can turn it back on immediately without rebuilding the index.
To make an index invisible, you can alter it and add the INVISIBLE
keyword:
ALTER TABLE people ALTER INDEX first_name INVISIBLE;
Now, running SHOW INDEXES
will show you that the first_name
index is no longer Visible
:
| Key_name | Column_name | Collation | Cardinality | Index_type | Visible | |------------|-------------|-----------|-------------|------------|---------| | PRIMARY | id | A | 493889 | BTREE | YES | | first_name | first_name | A | 2965 | BTREE | NO | | state | state | A | 1 | BTREE | YES |
If your index isn't being considered when you think it should be, double-check that it hasn't been turned invisible.
Forcing an index
The optimizer is a complicated and sophisticated piece of software written by talented people over decades. It usually makes the right decision. Usually... but not always. If you're entirely sure that the optimizer is wrong and you're right, you can force an index to be used. Forcing in an index is as easy as putting USE INDEX([name])
in your query:
EXPLAIN SELECT * FROM people USE INDEX (state) WHERE first_name = 'Aaron' AND state = 'TX' | id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |----|--------|------|---------------|-------|---------|-------|--------|----------|-------------| | 1 | people | ref | state | state | 8 | const | 246944 | 10.00 | Using where |
Taking control away from the optimizer should be done with caution. If you understand what the optimizer is doing and why it's making a bad choice, telling it which index to use is a good escape hatch. Remember that as your data changes over time, you'll need to reevaluate if forcing a particular index is still the most performant option.
Learn more
Indexing is a broad and deep topic about which many books have been written. If you'd like to learn even more, here are some good resources for you:
- How MySQL uses indexes (MySQL Documentation)
- Video course on indexes in MySQL (PlanetScale)
- High Performance MySQL Chapter 4 (O'Reilly)
Footnotes
- Technically more than one index can be used to satisfy a single query, but it's not an ideal strategy. An optimization called the index merge optimization can combine the results of two index scans. It's still better to plan your indexes so that only one is used, but it's good to know this optimization exists!