Creating unique MD5 hashes from multiple columns
In previous videos, we discussed using an MD5 hash to enforce uniqueness on a single column. In this post, we will take a look at how to create MD5 hashes from four columns to enforce uniqueness and perform equality lookups without needing a composite index.
Consider the example of a table that stores normalized addresses. The table consists of four columns –
last_line. The goal is to store unique addresses and only keep the pretty, normalized version of an address. However, since there is no natural ID or other unique identifier, we need to create our unique ID.
Creating the MD5 hash
To generate the MD5 hash, we create a new column, simply called MD5. Since the table includes binary 16, we will use that for the MD5 hash column. We can do this by running the following command:
ALTER TABLE addresses ADD COLUMN md5 BINARY(16) GENERATED ALWAYS AS ( UNHEX(MD5( CONCAT_WS('|', primary_line, secondary_line, urbanization, last line) )) );
Note that we are using MD5 hash concatenated with separator (
|). The first value passed in is the separator. We use the
CONCAT_WS function to combine all four columns and then generate the MD5 hash.
Making the MD5 hash unique
Since we want to ensure that we store only unique values, we can add a unique index on the MD5 column. This is done by running the following command:
ALTER TABLE addresses ADD UNIQUE INDEX (md5);
This ensures that we can perform strict equality checks across multiple columns since we have one logical unit with one logical ID created through this new column.
Binary vs. character columns
Note that because the MD5 column is a binary column (a string of bytes, not characters), we must use the
UNHEX function to convert the characters to a binary string.
By using this method to create unique MD5 hashes from multiple columns, we can now enforce uniqueness and perform lookups faster than a composite index across multiple character columns. This also saves us from having to maintain a more complex index structure.
Creating an MD5 hash from multiple columns is a powerful tool that can be used in various applications to achieve fast and efficient search results.