As we continue our discussion of efficient querying, we will now focus on the importance of limiting the number of rows returned in a query. The principle remains the same: do not bring back data that you do not need and will simply throw away.
Indexes play a crucial role in optimizing queries in a relational database. When querying a table, we want our indexes to narrow down to just the data that we need so that we can minimize the amount of data that we request from the actual table. Sometimes, even when using indexes, we may still need to do some additional elimination on a column that was not indexed! That's totally fine.
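As a rough illustration of an index narrowing the rows and a second condition eliminating the rest, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the people table, its columns, and the idx_people_last_name index are hypothetical names invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table, columns, and index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, age INTEGER)"
)
conn.execute("CREATE INDEX idx_people_last_name ON people (last_name)")
conn.executemany(
    "INSERT INTO people (last_name, age) VALUES (?, ?)",
    [("Smith", 25), ("Smith", 52), ("Jones", 61), ("Smith", 47)],
)

# last_name is indexed, so it can narrow the candidate rows;
# age is not indexed, so it is checked on the rows that remain.
query = (
    "SELECT id, last_name, age FROM people "
    "WHERE last_name = ? AND age > ? ORDER BY id"
)
rows = conn.execute(query, ("Smith", 40)).fetchall()

# Print the plan the engine chose (the wording varies by SQLite version).
for step in conn.execute("EXPLAIN QUERY PLAN " + query, ("Smith", 40)):
    print(step)
print(rows)
```

Only the matching rows reach the application, no matter how many rows the table holds.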
When querying a table, we should only request the rows that we need instead of requesting all of the rows and then discarding most of them. Here are a few scenarios where we should only return the necessary rows:
If we want to count the number of rows in a table, we should not select all of the data and send it back to our application. Instead, we should push this down to the database and have the database do the counting. This way, the counting happens where the data lives, and only a single number is sent back over the wire instead of every row. To count rows, we can use the following:
SELECT COUNT(*) FROM table_name;
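To make the difference concrete, the sketch below counts the same rows both ways. It assumes Python's built-in sqlite3 module with an in-memory database in place of MySQL, and the orders table is a hypothetical example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; `orders` is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(float(i),) for i in range(1000)],
)

# Wasteful: every row is transferred to the application just to be counted.
count_in_app = len(conn.execute("SELECT * FROM orders").fetchall())

# Better: the database counts and returns one row holding one number.
count_in_db = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(count_in_app, count_in_db)
```

Both approaches give the same answer, but the second one transfers a single value rather than a thousand rows.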
Calculations such as minimums, maximums, and averages should be done in the database instead of in our application. By doing these calculations in the database, our application never has to receive and inspect every row; only the computed result is sent back. To find the minimum in a column, we can use the following:
SELECT MIN(column_name) FROM table_name;
Likewise, to find the maximum in a column, we can use SELECT MAX(column_name) FROM table_name, and to find the average, we can use SELECT AVG(column_name) FROM table_name.
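Several aggregates can also be requested in one query. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical orders table with an amount column.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; `orders` and `amount` are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(10.0,), (20.0,), (60.0,)],
)

# One round trip returns one row with all three aggregates.
lowest, highest, average = conn.execute(
    "SELECT MIN(amount), MAX(amount), AVG(amount) FROM orders"
).fetchone()

print(lowest, highest, average)
```

The application receives three numbers, however many rows the table contains.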
If we want to get a distinct list of values for a particular column, we should use SELECT DISTINCT column_name FROM table_name. By doing this in the database, we can avoid sending back unnecessary rows to our application. Note that SELECT DISTINCT with several columns returns the distinct combinations of those columns, so if we want a separate distinct list for each of multiple columns, we will need to issue one query per column.
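The difference between a distinct list per column and distinct combinations of columns can be seen in a small sketch. It assumes Python's built-in sqlite3 module in place of MySQL, and the customers table with its city and country columns is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; `customers` is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (city, country) VALUES (?, ?)",
    [("Paris", "FR"), ("Lyon", "FR"), ("Paris", "FR"), ("Paris", "US")],
)

# One column: a distinct list of that column's values.
countries = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT country FROM customers ORDER BY country"
    )
]

# Two columns: distinct (city, country) combinations, not two separate lists.
pairs = conn.execute(
    "SELECT DISTINCT city, country FROM customers ORDER BY city, country"
).fetchall()

print(countries)
print(pairs)
```

A separate distinct list of cities would therefore take its own SELECT DISTINCT city query.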
Pagination refers to the technique of only returning a specific page of results that our users might be interested in. For example, if we return 500,000 results but only show 10 to our users, we have wasted a lot of processing. Thus, it is important to limit the number of rows returned and only return as many rows as will be useful.
When using pagination queries such as LIMIT 10 OFFSET 20, there are a few rules and best practices to follow. We should always include an ORDER BY clause in any query that uses LIMIT, even if it is simply ORDER BY id. Otherwise, MySQL gets to decide how to order the rows, and that order is not guaranteed to stay the same from one query to the next, so pages can overlap or skip rows. We will explore how to perform pagination in an index-assisted way to minimize the amount of data that we request from the actual table.
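A basic LIMIT and OFFSET page fetch with a deterministic order might look like the sketch below. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the articles table and the fetch_page helper are hypothetical names used only for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; `articles` is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [(f"Article {i}",) for i in range(1, 101)],
)


def fetch_page(conn, page, page_size=10):
    """Return one page of articles; pages are numbered from 1."""
    offset = (page - 1) * page_size
    # ORDER BY id keeps the row order stable, so pages never overlap.
    return conn.execute(
        "SELECT id, title FROM articles ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


# Page 3 with 10 rows per page corresponds to LIMIT 10 OFFSET 20.
page_three = fetch_page(conn, 3)
print([row[0] for row in page_three])
```

Only ten rows are returned to the application, regardless of how many articles exist.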
By using indexes and limiting the number of rows returned, we can greatly optimize our queries and improve the performance of our relational databases. We should always strive to only request the data that we need and minimize the amount of unnecessary data that we send back and forth between the database and our application.