In our previous video, we talked about joins and how they work in MySQL. In this post, we're going to take a deeper dive into indexing joins and how it can affect the performance of your queries.
When MySQL joins tables together, it needs to figure out which rows from one table match which rows from the other table. One way to do this is by doing a full table scan, which is slow and inefficient. The better way is to use an index on the related columns, which allows MySQL to quickly retrieve the matching rows and combine them.
Sometimes you'll come across a many-to-many relationship between tables. For example, in a film database, an actor can be in many movies and a movie can have many actors. To link these two entities, you need a joining table with composite primary keys.
Let's take a look at an example. We have a film
table full of movies and an actor
table full of actors. To get a list of all the movies and the actors that were in those movies, we need to use the film_actor
table as the joining table. This table has a composite primary key made up of actor_id
and film_id
.
To illustrate the impact of indexing, let's run some queries with and without indexes. We'll use the explain
statement to see the query execution plan and the cost of each query. Let's say we want to get a list of the first 10 films and all the actors that were in those movies. Here's the query we'll use:
SELECT
film.title, actor.first_name, actor.last_name
FROM film
LEFT JOIN film_actor ON film_actor.film_id = film.id
LEFT JOIN actor ON actor.id = film_actor.actor_id
WHERE
film.id <= 10
LIMIT 10;
We'll start by adding indexes to the joining table (film_actor
) on the film_id
columns:
ALTER TABLE film_actor ADD INDEX idx_fk_film_id (film_id);
Now let's run with the added index:
SELECT
film.title, actor.first_name, actor.last_name
FROM film
LEFT JOIN film_actor ON film_actor.film_id = film.id
LEFT JOIN actor ON actor.id = film_actor.actor_id
WHERE
film.id <= 10
Using the explain format=tree
statement, we can see that MySQL is using the added indexes to join the tables and the cost of the query is only 29 cost units.
-> Nested loop left join (cost=29.42 rows=55)
-> Nested loop left join (cost=10.25 rows=55)
-> Filter: (film.id <= 10) (cost=2.82 rows=10)
-> Index range scan on film using PRIMARY (cost=2.82 rows=10)
-> Covering index lookup on film_actor using idx_fk_film_id (film_id=film.id) (cost=1.06 rows=5)
-> Single-row index lookup on actor using PRIMARY (id=film_actor.actor_id) (cost=0.84 rows=1)
Next, let's turn the index off (by making it invisible) and run the same query again:
ALTER TABLE film_actor ALTER INDEX idx_film_id INVISIBLE;
SELECT
film.title, actor.first_name, actor.last_name
FROM film
LEFT JOIN film_actor ON film_actor.film_id = film.id
LEFT JOIN actor ON actor.id = film_actor.actor_id
WHERE
film.id <= 10
This time, using the explain
statement, we can see that MySQL is doing a full table scan and the cost of the query is now 12,000 units.
-> Nested loop left join (cost=12347.65 rows=54620)
-> Left hash join (film_actor.film_id = film.id) (cost=5519.98 rows=54620)
-> Filter: (film.id <= 10) (cost=3.02 rows=10)
-> Index range scan on film using PRIMARY (cost=3.02 rows=10)
-> Hash
-> Index scan on film_actor using PRIMARY (cost=54.93 rows=5462)
-> Single-row index lookup on actor using PRIMARY (id=film_actor.actor_id) (cost=0.08 rows=1)
Indexing is critical when it comes to joins in MySQL. By properly indexing the related columns between tables, you can significantly improve the performance of your queries. Always use explain
to analyze query execution plans and optimize your queries for performance.