MySQL for Developers

Creating unique MD5 hashes from multiple columns

In previous videos, we discussed using an MD5 hash to enforce uniqueness on a single column. In this post, we will take a look at how to create MD5 hashes from four columns to enforce uniqueness and perform equality lookups without needing a composite index.

Consider the example of a table that stores normalized addresses. The table consists of four columns – primary_line, secondary_line, urbanization, and last_line. The goal is to store unique addresses and only keep the pretty, normalized version of an address. However, since there is no natural ID or other unique identifier, we need to create our unique ID.

Creating the MD5 hash

To generate the MD5 hash, we create a new column, simply called MD5. Since the table includes binary 16, we will use that for the MD5 hash column. We can do this by running the following command:

ALTER TABLE addresses ADD COLUMN md5 BINARY(16) GENERATED ALWAYS AS (
  UNHEX(MD5(
    CONCAT_WS('|', primary_line, secondary_line, urbanization, last line)
  ))
);

Note that we are using MD5 hash concatenated with separator (|). The first value passed in is the separator. We use the CONCAT_WS function to combine all four columns and then generate the MD5 hash.

Making the MD5 hash unique

Since we want to ensure that we store only unique values, we can add a unique index on the MD5 column. This is done by running the following command:

ALTER TABLE addresses ADD UNIQUE INDEX (md5);

This ensures that we can perform strict equality checks across multiple columns since we have one logical unit with one logical ID created through this new column.

Binary vs. character columns

Note that because the MD5 column is a binary column (a string of bytes, not characters), we must use the UNHEX function to convert the characters to a binary string.

By using this method to create unique MD5 hashes from multiple columns, we can now enforce uniqueness and perform lookups faster than a composite index across multiple character columns. This also saves us from having to maintain a more complex index structure.

Creating an MD5 hash from multiple columns is a powerful tool that can be used in various applications to achieve fast and efficient search results.

About this lesson

If you need to enforce a uniqueness constraint across multiple columns you can use an generated MD5 column to keep that index very very small.

6:12
Closed captioned

Feedback or questions? Reach out to our team at education@planetscale.com.