Pagination is essential for any website or application that displays a large amount of data. It allows users to navigate the content more easily and intuitively. However, traditional offset limit based pagination can become slower as users go deeper into the results. This is where the deferred join technique comes in handy.
The deferred join technique is an optimization solution that enables pagination in a more efficient way. It works by performing the pagination on a subset of the data, instead of the entire table. This subset is generated by a subquery, which is joined with the original table later on. The technique is called "deferred" because the join operation is postponed until after the pagination is done.
This technique has been widely adopted, and there are libraries available for popular web frameworks such as Rails (FastPage) and Laravel (Fast Paginate). However, in this post, we'll show you how to implement it manually, step by step.
Let's start with an example of traditional pagination. Suppose we have a table called "people," and we want to display the results sorted by birthday ID. We'll limit the results to 20 records and offset them by 450,000.
Here's what the query would look like:
SELECT
*
FROM
people
ORDER BY
birthday, id
LIMIT 20
OFFSET 450000;
When we run this query, it takes around 600-700 milliseconds to execute. As the number of offset increases, the query becomes slower and slower.
Now, let's implement the deferred join technique and see how it performs.
The first step is to create a subquery that generates the subset of data that we'll use for pagination. The subquery should have a simple SELECT
statement that retrieves only the ID column:
SELECT id FROM people;
Our main query will then join this subquery with the original table using the ID column as a matching criterion:
SELECT * FROM people
INNER JOIN (
SELECT id FROM people ORDER BY birthday, id LIMIT 20 OFFSET 450000
) AS people2 USING (id)
ORDER BY
birthday, id
Here, we're using the USING
clause to specify the matching column between the main table and the subquery. We're also sorting the results by birthday in ascending order.
If we run this query, we'll see that it takes only 200 milliseconds to execute. That's around three times faster than the traditional pagination!
The deferred join technique may seem counterintuitive at first, but it's actually quite simple. By generating a subset of data that contains only the ID column, we're able to apply the pagination on a much smaller dataset. This means we're throwing away less work.
When we join this subset with the main table, we're able to retrieve only the rows that match our pagination criteria. The join operation is more efficient than filtering the whole table with a large offset.
To further optimize the deferred join technique, we can use indexes. If we add an index to the birthday
column, for example, we can make the inner subquery use a covering index. This means that our pagination can be done entirely on the index without retrieving the actual data.
Here's how we can add an index on the birthday column:
ALTER TABLE people ADD INDEX birthday (birthday);
When we run our optimized query again, we'll see that it takes only 60 milliseconds to execute! That's a full ten times faster than the traditional pagination.
The deferred join technique is an excellent solution for websites or applications that require deep pagination. It's more efficient than traditional pagination and can be optimized further with indexes.
Although there are libraries available for popular web frameworks, it's still useful to know how to implement it manually. By understanding the principles behind the technique, you'll be able to apply it in other contexts where libraries aren't available.
We hope this post has been useful in explaining the deferred join technique.