Navigation

Blog|Tutorials

What is the N+1 Query Problem and How to Solve it?

By JD Lien |

Have you ever been working on an app, staring at your screen waiting for it to load, wondering what on Earth is going on? There are a lot of reasons why you could be experiencing performance issues, but a classic cause of performance issues in database-driven applications is the dreaded N+1 query problem.

Tip

If you're wondering if you have an N+1 problem, you can sign up for a PlanetScale account to access our Insights query monitoring dashboard. More information about identifying N+1s with Insights at the end of this article.

What is the N+1 query problem?

The chief symptom of this problem is that there are many, many queries being performed. Typically, this happens when you structure your code so that you first do a query to get a list of records, then subsequently do another query for each of those records.

You might expect that many small queries would be fast and one large, complex query will be slow. This is rarely the case. In practice, the opposite is true. Each query has to be sent to the database, the database has to perform the query, then it sends the results back to your app. The more queries you perform, the more time it takes to get the results back, with each trip to the database server taking time and resources. In contrast, a single query, even if it's complex, can be optimized by the database server and only requires one trip to the database, which will usually be much faster than many small queries.

An example N+1 query

Tip

As you read through these examples, you can view a live demo here detailing the results and query run time.

Let's look at an example. Applications typically query several related records from the same database tables. Let's take an example of grocery items and categories from my previous article on joins. In this example scenario, we have a PlanetScale database with an items table and a categories table. The items table contains a list of grocery store items with their corresponding categories in the categories table. The examples are in PHP, but the same principles apply to any language.

categories table:

idname
1Produce
2Deli
3Dairy

items table:

idnamecategory_id
1Apples1
2Cheese2
3BreadNULL

Let's say we want our application to list all of the items, including the name of the category they belong to. One straightforward way we could do this is by first querying a list of categories, and then looping over each of the categories, querying for each category's items.

First query — Grabbing the categories:

<?php
    $dbh = new Dbh();
    $conn = $dbh->connect();
    $sql = "SELECT * FROM categories;";
    $stmt = $conn->prepare($sql);
    $stmt->execute();

Second query — Looping over each category and grabbing the items:

<?php
while ($row = $stmt->fetch()) {
    // Show category name
    echo $row['name'];

    // Now query for the items for this category
    $sql = "
        SELECT id, name FROM items
        WHERE category_id = :category_id
        ORDER BY name;
    ";

    $stmt2 = $conn->prepare($sql);
    $stmt2->bindParam(':category_id', $row['id']);
    $stmt2->execute();
    $rowCount += $stmt2->rowCount();

    while ($row2 = $stmt2->fetch()) {
        // Show item ID and name
        echo $row2['id'];
        echo $row2['name'];
    }
}

This approach has the benefits of having two simple queries and clear, procedural code. Unfortunately, this approach is flawed, and you should avoid this situation where you are executing many database queries in a loop.

What caused the N+1 query problem?

This type of query execution is often called "N+1 queries" because instead of doing the work in a single query, you are running one query to get the list of categories, then another query for every N categories. Hence the term "N+1 queries".

You can find the demo of the results being ran here: Demo — N+1 queries to get categories and items

In the above example, our database contains about 800 items across 17 categories. It takes over 1 second to run the 18 simple queries involved in this! That's pretty slow. If you have a more complex queries with a lot of data, it will take even longer.

For this simple example, it's possible to perform the exact same job 10× faster by using only one query that uses a JOIN clause. We could refactor the above code to look something like this:

<?php
    $dbh = new Dbh();
    $conn = $dbh->connect();
    // Record the time before the query is executed
    $timeStart = microtime(true);

    $sql = "
        SELECT
            c.id AS category_id,
            c.name AS category_name,
            i.id AS item_id,
            i.name AS item_name
        FROM categories c
        LEFT JOIN items i ON c.id = i.category_id
        ORDER BY c.name, i.name;
    ";
    $stmt = $conn->prepare($sql);

    $stmt->execute();
    $rowCount = $stmt->rowCount();

    $lastCategoryId = null;

    while ($row = $stmt->fetch()) {
        // Render the heading for each category if this category is new
        if ($row['category_id'] != $lastCategoryId) {
            echo $row['category_name'];
        }

        // Display the row for each item
        if (!is_null($row['item_id'])) {
            echo $row['item_id'];
            echo $row['item_name'];
        }

        $lastCategoryId = $row['category_id'];
    }

With this update, we accomplished much the same work with a single, slightly more complicated query. Attempting our demo of this again, we can observe a significant performance difference between the original page and this one! The page loads in about 0.16 seconds, instead of 1.4 seconds.

In this simple example, with a database that isn't very large, the n+1 approach takes about 10 times longer!

Imagine you had thousands, or millions of records. The performance delta could be the difference between a reasonable load time and a page that takes so long to load that it causes a timeout on the server.

Creating data structures for more complicated queries

Sometimes you may have a more complicated operation in mind. Say you wanted to show the categories along with the count of each item in each category. You could use an aggregate query (GROUP BY), as shown below:

SELECT c.id, c.name, count(i.id) AS item_count FROM categories c
LEFT JOIN items i ON c.id = i.category_id
GROUP BY c.id, c.name
ORDER BY c.name;

But then how would we also get the list of items from a query like this where we are grouping?

While it's often most efficient to let the database server do a lot of the heavy lifting instead of our server-side code, for something like a simple count of items, it may not be necessary. If we actually just queried for the items, it's pretty easy to let the server-side code (PHP, in our example) do the count for us!

We can refactor this such that we do the job with a single query, then turn that query into a clean data structure.

<?php
$dbh = new Dbh();
$conn = $dbh->connect();
// Record the time before the query is executed
$timeStart = microtime(true);

$sql = "
    SELECT
        c.id AS category_id,
        c.name AS category_name,
        i.id AS item_id,
        i.name AS item_name
    FROM categories c
    -- Using a normal JOIN would not get the categories with 0 items
    LEFT JOIN items i ON c.id = i.category_id
    ORDER BY c.name, i.name;
";
$stmt = $conn->prepare($sql);

$stmt->execute();
$rowCount = $stmt->rowCount();

$lastCategoryId = null;
$lastCategoryName = null;

// Build a 2D array of categories with their items
$categories = [];
// A categoryItems array will become the value for each category
$categoryItems = [];

// Alternative approach: build a data structure with the data we want as a 2D array.
while ($row = $stmt->fetch()) {
    // Render the heading for each category if this category is new
    if (!is_null($lastCategoryId) && $row['category_id'] != $lastCategoryId) {
        $categories[$lastCategoryName] = $categoryItems;
        // Reset the categoryItems array
        $categoryItems = array();
    }

    // Create an array of all the non-null items
    if (!is_null($row['item_id'])) $categoryItems[$row['item_id']] = $row['item_name'];

    $lastCategoryId = $row['category_id'];
    $lastCategoryName = $row['category_name'];
}
// Add the last category to the array with its items
$categories[$lastCategoryName] = $categoryItems;

Now that we have this $categories array with arrays of items within, we can do a nested loop to render the data in the way we see fit. When we want the count of items, you can simply run count($items) to get the quantity.

<?php
foreach ($categories as $categoryName => $items) {
    echo $categoryName;
    // Show the count of items in the category
    echo count($items) . ' items';

    if (count($items)) {
        // Loop through all the items in the category and display them
        foreach($items as $itemId => $itemName) {
            echo $itemId;
            echo $itemName;
        }
    }

Using techniques like this, you can keep your page load times quite fast by being efficient with your use of the database. Instead of writing your code such that you have 1 query plus another for each record of that query, it is well-worth the effort to write your code such that you have 1 query that returns all the data you need.

Using this approach, you can also create data structures that are more useful for your application. For example, you may want to create a data structure that is keyed by the category ID, and then have the items as sub-arrays. This would allow you to easily access the items for a specific category by its ID.

Identifying N+1 queries

If you have a more complex application, you may have a lot of N+1 queries and not know it. There are a few ways to identify these queries and fix them.

If you're working on a Laravel app you can use Laravel Debug Bar. Laravel also allows you to fully disable N+1 queries by adding the following line to your AppServiceProvider inside the boot method:

Model::preventLazyLoading(!app()->isProduction());

This will cause the application to throw an exception if it detects an N+1 query when not in production, allowing you to detect and fix these issues.

PlanetScale Insights

PlanetScale also offers an analytics and monitoring solution called PlanetScale Insights. This is accessible from your PlanetScale dashboard and allows you to see the queries that are being run on your database. Using this, you can identify many types of issues with your queries, including N+1 queries and long-running queries. The screenshot below is from the demo database we've been using in this article.

PlanetScale Insights queries over 24 hours

The first query is our more complex but efficient JOIN query, which read 834 rows, returned 815 rows, and took a total of 14ms.

The two queries below that are inefficient queries that resulted in the N+1 problem. Together, they took a total of 42ms and 13,889 rows read to give us the same results as the more complex query.

Overall, this shows us right away that our N+1 queries:

  • Ran way too many times
  • Read way more rows than returned
  • And performance was relatively slow

Now you know how to identify N+1 queries, how to fix them, and how to use PlanetScale Insights to monitor your queries and identify performance issues so you can get out there and write some fast, lean code!