MySQL for Developers

Using a summary table to optimize your data

As businesses collect more and more data, it can become difficult to manage and process all of it in a timely manner. One optimization technique that can help reduce the burden on your database is using a summary table, also known as a roll-up table. This involves taking large sets of historic data and rolling it up into a smaller, more manageable summary table. In this video, we'll explore how to create a summary table and integrate it into your existing database.

Getting started: filtering your data

Let's start by examining an example payments table. Our goal is to roll up old months and put them into a summary table. To begin, we need to filter down to just older months:

SELECT
  amount,
  YEAR(payment_date),
  MONTH(payment_date)
FROM
  payments
WHERE
  payment_date < DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')

Here, we're selecting the amount of the payment as well as the year and month that the payment was made. By limiting the results to payment_date values that are less than the first day of the current month, we ensure that we're only selecting data from previous months.

Grouping your data

Once we have our filtered data, we need to group it by year and month so we can roll it up into the summary table:

SELECT
  sum(amount) as amount,
  YEAR(payment_date) as `year`,
  MONTH(payment_date) as `month`
FROM
  payments
WHERE
  payment_date < DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')
GROUP BY
  `year`, `month`

This gives us a sum of the amount for each month that is in our historical data set.

Creating your summary table

Now it's time to create the summary table itself. We define the schema using the following statement:

CREATE TABLE payment_summary (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  amount DECIMAL(9,2),
  `year` YEAR,
  `month` TINYINT UNSIGNED
);

We'll use this table to store our rolled-up historical data. Note that we've defined the year column to use the YEAR data type, which is typically not used since it only stores four-digit years. In this case it is actually useful!

Populating your summary table

To insert the data from our filtered query into the summary table, we can use the following statement:

INSERT INTO payment_summary (amount, year, month) SELECT
  sum(amount) as amount,
  YEAR(payment_date) as `year`,
  MONTH(payment_date) as `month`
FROM
  payments
WHERE
  payment_date < DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')
GROUP BY
  `year`, `month`

This will populate the summary table with the historic data we've rolled up.

Combining your summary table with live data

The last step is to incorporate this summary table with live data. We want to ensure that the summary table is up-to-date, but we also want to be able to query both historic and current data together. Here's one way we can do it:

SELECT
  amount,
  year,
  month
FROM
  payment_summary

UNION ALL

SELECT
  sum(amount) as amount,
  YEAR(payment_date) as `year`,
  MONTH(payment_date) as `month`
FROM
  payments
WHERE
  payment_date >= DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')

This combines our rolled-up historical data with our current data. Note that we're using UNION ALL instead of UNION here, which preserves duplicate rows (of which there should be none in our case). We're using the same extract statement as before, but with the opposite condition to ensure that we're only selecting data that's been added this month.

Using a common table expression

Finally, we can use a common table expression (CTE) to treat this entire query as a single table:

WITH payment_data AS (
  SELECT
    amount,
    year,
    month
  FROM
    payment_summary

  UNION ALL

  SELECT
    sum(amount) as amount,
    YEAR(payment_date) as `year`,
    MONTH(payment_date) as `month`
  FROM
    payments
  WHERE
    payment_date >= DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')
)

SELECT * FROM payment_data

Here, we wrap our combined query in a WITH statement to define a new CTE called payment_data. We can then select from this CTE as if it were a real table.

Conclusion

By using a summary table, we can greatly reduce the amount of data processing required by our query. We've demonstrated how to filter and group large data sets, create a new summary table, and roll up data into it. We've also combined live and historical data using UNION and CTE statements. Taking the time to optimize your database in this way can result in faster queries and a more efficient system overall.

About this lesson

Calculating statistics over historical data can be inefficient as data grows. Using a summary table is one way to work around that.

5:27
Closed captioned

Feedback or questions? Reach out to our team at education@planetscale.com.