As a developer, one of the challenges you face regularly is deciding which data type to use in your database. While the choice often depends on the specifics of the project and its requirements, it is essential to weigh the advantages and disadvantages of each option. In this video, we'll explore using timestamps instead of booleans and the tradeoffs that come with them.
To better illustrate this, let's consider an example where we'll create a table named "posts." This table will have several columns, but for demonstration, we'll focus on the "is_archived" column, which indicates whether a post has been archived.
CREATE TABLE posts(
title VARCHAR(125),
-- ....
is_archived BOOLEAN
);
Initially, we set the "is_archived" column as boolean, which is a useful data type. It allows us to store a true/false value in a single bit, providing an optimal storage solution for storing binary values. However, using timestamps instead of booleans has its benefits too.
When we change the data type to timestamps, we get access to an extra piece of information. Instead of just storing a boolean value, we get to store an archive timestamp, which tells us when the post was archived.
CREATE TABLE posts(
title VARCHAR(125),
-- ....
archived_at timestamp null
);
With this in place, we can still use the archived_at
column as if it is a boolean value with the following query:
SELECT * FROM posts WHERE archived_at IS NULL;
This query returns all unarchived posts, which is equivalent to the boolean query
SELECT * FROM posts WHERE is_archived = false;
However, the archived_at
column provides us with more information, namely the time at which the post was archived, which could be useful in the long run.
While using timestamps provides us with extra information, it has a few tradeoffs that we must consider.
A timestamp column uses four times more storage space than a boolean column, which uses only one byte. As a result, when using timestamps, we need to be mindful of the amount of storage space we use. In a large database, the extra space usage can be significant but it usually doesn't make too much of a difference.
If we decide to put an index on the is_archived
column, it is unlikely to be selective enough to benefit the query. Most posts won't be archived, so querying for posts that aren't archived will not use the index. This is another reason why using a slightly larger column doesn't matter that much, as it's unlikely to be indexed.
As a developer, it is always essential to understand the available options and make informed decisions based on the project's specific requirements. Ultimately, it depends on the project's specific requirements and the tradeoffs you're willing to accept.