What are data structures? Types of data structures used in SQL
Introduction
Data structures are fundamental to computer science and programming. They are crucial to making sense of data for programmers and for computers to follow the instructions of programmers.
It’s also important to learn data structures when starting out in your career. That’s because they play a crucial role in enabling efficient storage, retrieval, and manipulation of information.
In this article, we’ll discuss the importance of data structures, the most common ones, and how to model these data structures in SQL.
Understanding data organization
Data is naturally chaotic and unstructured. Like products in a supermarket, it’s not useful without proper organization.
Organizing data involves choosing what data to store and what not to store. It also involves choosing how to store that data in computer programs to make it easily retrievable, sortable, and searchable. Organizing data poorly can lead to programmers having to use inefficient queries to achieve their goals.
What is a data structure?
Data structures are formats for organizing and storing data. They help organize different types of data based on what the data is and how users will interact with it.
Computers need data to be organized in consistent and predictable ways. Over the years, data science programmers have invented various data structures that make sense to computers’ operating systems, while also serving practical needs for programmers.
It’s crucial for programmers to select the right data structure, as different types of data structures have pros and cons. Some data structures are good at organizing linear or hierarchical data. Others are better at efficiently accessing individual elements without complex algorithms. This is why knowing the available data structures can help you make an educated decision in architecting your data.
The complexity of implementing different data structures in SQL also varies. Programmers should consider how maintainable and efficient their desired data structure will be both at the application and database level. It also depends on their data values.
Common types of SQL data structures
Array data structure
An array is a collection of items, generally of the same data type, where each item is stored in contiguous memory locations. Each item in an array has an index position starting from 0 for the first item and is incremented by 1 for each subsequent item in the array. Data items are directly accessible without traversing the whole collection by using an item’s index to select it.
If the programmer, however, wants to traverse the whole array, most programming languages have baked-in methods for looping through an array item-by-item. There are also numerous algorithms for searching and sorting arrays.
Arrays have some limitations. For example, the required algorithm for inserting or deleting a data element is very inefficient because it requires shifting each element after the modified one.
If you need to store arrays in SQL, use multiple tables where the child table has a foreign key column referring to the parent table record. For example, for a user object with an array of blog posts, you would create tables for users and blog posts. The blog posts table would have a “user_id” column, referring to the user who created those blog posts.
CREATE TABLE user (
Id INT PRIMARY KEY,
username VARCHAR(50),
);
CREATE TABLE blog_posts (
post_id INT PRIMARY KEY,
user_id INT,
post_title VARCHAR(255),
post_content TEXT,
FOREIGN KEY (user_id) REFERENCES user(id)
);
INSERT INTO user (id, username)
VALUES (1, 'John'),
(2, 'Jane');
INSERT INTO blog_posts (post_id, user_id, post_title, post_content)
VALUES (1, 1, 'First Post', 'This is the content of the first post.'),
(2, 1, 'Second Post', 'This is the content of the second post.'),
(3, 2, 'Third Post', 'This is the content of the third post.');
Linked list data structure
A linked list is a collection of items, usually called nodes, where each node points to the next node in the collection. Some linked list implementations also have pointers to previous nodes.
Linked lists are more efficient than arrays for inserting and removing items from a collection because you only need to update the pointers around the modified object. The disadvantage is that linked lists do not give direct random access like arrays. You have to traverse the linked list to find a particular element.
Linked lists could be used to navigate through the pages of an ebook, where each page has a pointer to the next page. This example could be modeled in a SQL database with a pages table that has a “next_page_id” column for each page to point to the next page.
CREATE TABLE page (
id INT PRIMARY KEY AUTO_INCREMENT,
next_page_id INT NULLABLE,
content TEXT
);
INSERT INTO page (id, next_page_id, content)
VALUES (1, 2, ‘Page #1’),
(2, 3, ‘Page #2’),
(3, 4, ‘Page #3’),
(4, null, ‘Last page’);
Stack data structure
A stack is a collection of items of the same type, similar to an array, but follows a last-in-first-out (LIFO) model, where the last item added is the first one removed.
One of the most common stacks in software is the undo function in text editors. The text editor tracks each change in a stack, and when the user hits ‘undo’, it rolls back the most recent change. As the user continues to hit ‘undo’, additional changes are selected and rolled back.
You can implement a stack in a SQL database by creating a table for your stack that has either a “created_at” timestamp column or a numerical ID that increments for each created item. One of these is required to identify which item was the last one created.
When your application needs to retrieve the last item in the stack, it selects the record with the most recent “created_at” timestamp or the largest ID. Items are removed from the stack by hard-deleting the database row, or soft-deleting them by populating a “deleted_at” timestamp or a boolean “deleted” column.
CREATE TABLE stack (
id INT PRIMARY KEY AUTO_INCREMENT,
data BLOB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL
);
INSERT INTO stack (data)
VALUES ('Item A'),
('Item B'),
('Item C');
Hard delete:
DELETE FROM stack
WHERE id = 1; -- Delete item with specific ID
Soft-delete:
UPDATE stack
SET deleted_at = CURRENT_TIMESTAMP
WHERE id = 1;
Tree data structure
Trees are used to model hierarchical structures. They function similarly to a linked list, but instead of pointing to a next node, nodes have child nodes. Some types of trees used in computer science are binary search trees, AVL trees, and B-trees.
SQL database indexes are actually implemented with binary trees, making for efficient record lookup with O log(n) time complexity.
Family trees are an everyday example of this data structure. Grandparents would be at the top, and each subsequent level of the tree represents children of the previous generation. You could implement a family tree in SQL by creating a people table where each row has a “mother_id” and “father_id”.
CREATE TABLE people(
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(64),
mother_id INT,
father_id INT,
);
INSERT INTO people (id, name, mother_id, father_id)
VALUES (1, 'Grandparent A', NULL, NULL),
(2, 'Grandparent B', NULL, NULL),
(3, 'Parent A', 1, 2),
(4, 'Parent B', NULL, NULL),
(5, 'Child A1', 3, 4),
(6, 'Child A2', 3, 4),
(7, 'Last Person', NULL, NULL);
Graph data structure
Graphs are a collection of nodes where any node can point at any other node. There is not necessarily any linear or hierarchical structure as found with linked lists and trees.
These are commonly seen with social networks. For example, a user has other users that either follow them and/or are followed by them. Because a user can follow or be followed by an infinite number of other users, implementing this in a SQL database requires a table to manage these relationships. This relationships table would have a “followed_id” and “follower_id.” The ID of the user who is the follower gets populated in “follower_id” and the ID of the followed user is set in “followed_id.”
CREATE TABLE user (
id INT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(50),
);
CREATE TABLE relationships (
relationship_id INT PRIMARY KEY,
follower_id INT,
followed_id INT,
);
INSERT INTO user (user_id, username)
VALUES (1, 'User A'),
(2, 'User B'),
(3, 'User C'),
(4, 'User D');
INSERT INTO relationships (relationship_id, follower_id, followed_id)
VALUES (1, 1, 2), -- User A follows User B
(2, 1, 3), -- User A follows User C
(3, 2, 1); -- User B follows User A
Queue data structure
A queue is a collection of items of the same type that follows a first-in-first-out (FIFO) model. Queues are used in computer science for processing records in the order they were added to the queue. For example, an email application enqueues messages for the server to process in order.
SQL can support priority queues in the same way as a stack, but instead of processing items with the most recent “created_at” timestamp or largest numerical ID, you would process items ordered by the oldest “created_at” timestamp or smallest numerical ID. Records can be deleted the same way as stacks by hard-deleting the entire row or soft-deleting it.
CREATE TABLE queue (
id INT PRIMARY KEY AUTO_INCREMENT,
data BLOB,,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO queue (data)
VALUES ('Item A'),
('Item B'),
('Item C');
Retrieve the first element inserted in the queue:
SELECT * FROM queue
ORDER BY created_at ASC -- Order by oldest timestamp
-- or ORDER BY id ASC -- Order by smallest ID
LIMIT 1;
Hash table data structure
A hash table, also known as a dictionary, is a collection of keys and values. Each key points to a value. Programmers access values by referencing the appropriate key.
Hash tables are useful for storing data where the value has a descriptive field name. In many ways, a SQL row is already similar to a hash table because each row has columns that function like hash keys and refer to values.
MySQL doesn’t natively support storing hash tables. But MySQL version 8 supports storing JSON, which effectively functions the same way as a hash table with keys and values. When your application pulls a row with a JSON field, it is easy to convert the JSON into a hash object in whatever programming language you’re using.
Conclusion
This was only a brief tour of the most common data structures, and the best way to continue learning about data structures is through practical application. Fire up a text editor and try creating real-world examples of each of these data structures in Python, Java, or whatever your preferred programming language is.