What is a database? Databases 101

Introduction

Before the time of databases, electronic data was stored in flat files in an hierarchical structure within directories and subdirectories. To overcome the inefficiencies of this method, the Database Management System (DBMS) was developed by Charles Bachman in the 1960s. It was originally called Information Management System (IMS). The evolution of DBMS has seen significant advancements from that period. This is driven by the need for more efficient data storage, improved data management, and the emergence of new technologies. Some DMBS are open source, while others are closed source.

What is a database and what are databases used for?

A database is a collection of data that is structured, organized, and stored in a manner that makes it easy to access, retrieve, and manage information.

To simplify things, think of a database as a virtual filing cabinet for storing and organizing information. It is like a collection of spreadsheets, where each spreadsheet represents a table with rows and columns. Each row in a table holds specific information about something, like a product on an ecommerce site, and each column represents a specific attribute or characteristic of that thing (such as the price of the product).

Databases have a wide range of applications, from small-scale personal projects to large-scale enterprise systems. If a project involves data, a database is needed. Examples include business applications where databases are used for customer relationship management (CRM), enterprise resource planning (ERP), inventory management, sales and marketing, human resources, and financial systems. They play a crucial role in healthcare systems by enabling the storage of patient records, medical histories, diagnostic reports, and treatment plans.

Types of databases

There are different types of databases, depending on their methods of operation and use cases. Relational databases, object-oriented databases, NoSQL databases, graph databases, and distributed databases serve specific data management needs and requirements. An increasing number of database providers like PlanetScale now offer DBaaS (Database-as-a-Service) solutions. Such solutions offer cloud database systems where clients do not have to install a database software, set up their own hardware, or hire database administrators to manage their infrastructure.

Relational databases

Relational database management systems (RDBMS) are one of the most popular types of databases. They are based on the concept of relationships between data entities. These databases organize data into tables, with each table representing a specific entity or concept. Database applications like MySQL, PostgreSQL, Oracle Database, Microsoft SQL server, and IBM Db2 are examples of RDBMS. The relationships between tables are established through keys, allowing for efficient data retrieval and manipulation. Relational databases are recognized by their characteristics, which are:

Tabular structure, where tables are entities, each row represents a unique record of the entity, and each column represents a specific attribute of that record.
Primary and foreign keys, where primary keys uniquely identify each record, and foreign keys establish relationships between tables by referencing the primary table of another table.
Use of structured query language (SQL) to query the databases. Oracle Databases, however, use procedural language SQL (PL/SQL), which is simply an extension of SQL.
Their capability to scale and handle heavy workloads. Due to their resilience, they are used at companies such as Twitter, Facebook, Airbnb, Uber, LinkedIn and more, to handle heavy-traffic applications and manage massive volumes of data with ease.
ACID compliance, which gives SQL an edge over NoSQL by ensuring reliable processing, data integrity, and complex relationship management in a single transaction.

Non-relational databases

Non-relational databases, also called NoSQL (Not only SQL) databases provide an alternative approach to data storage and management. They have the ability to store data that do not come in the traditional table and column format. Types of NoSQL database applications and their examples are document databases (MongoDB, Couchbase), key-value stores (Redis, Amazon DynamoDB), wide-column stores (Apache Cassandra, Apache HBase), graph stores (Neo4j, Amazon Neptune), and time series database (InfluxDB, Prometheus). NoSQL databases are recognized by their characteristics, which are:

The ability to handle unstructured data, semi-structured data, and structured data, making them suitable for handling a wide range of data types.
Having high scalability and performance, especially while dealing with enormous amounts of data and high read/write loads. This is achieved by distributing data across multiple nodes to achieve throughput and horizontally scalability.
Having a high tolerance for dynamic and schema-less data models, where each record can have different attributes or structures.

Object-oriented databases

Object-oriented databases (OODBs) are a type of database management system that extends the concepts of object-oriented programming to data storage and management. They store data in the form of objects, which encapsulate both data and the operations that can be performed on that data. Object-oriented databases are characterized by:

Having a data model where data is represented as objects with attributes (data) and methods (operations).
Providing persistent storage for objects, allowing them to be stored beyond the runtime of an application.
Excelling at managing complex relationships between objects, which could be one-to-one, one-to-many, and many-to-many relationships through object references, navigation, and object traversal.

Distributed databases

Distributed databases are designed to store and manage data across multiple interconnected nodes or servers. Characteristically, they make up more than one system, and data processing and storage are distributed across several systems. This allows for improved performance, scalability, fault tolerance, and data availability. Distributed databases are characterized by:

Partitioning and storing data across multiple nodes in a network.
Having a great capacity for scalability. They offer horizontal scalability by allowing additional nodes to be added to the system, thereby increasing the processing power and storage capacity.
Providing fault tolerance in case one or more nodes fail. That way, they can continue to function without data loss or downtime.

Data warehouses

A data warehouse is a central repository that stores structured, historical, and integrated data from various sources within an organization, and is designed for data analytics. It is designed to support business intelligence (BI) and decision-making processes by providing a consolidated view of data for analysis and reporting. Popular warehouses include AWS Redshift, Microsoft Azure Synapse Analytics, and BigQuery. An increasing number of them are becoming cloud database warehouses, as they store and analyze large volumes of data in the cloud. Data warehouses are characterized by:

Ability to integrate data from multiple operational systems, such as transactional databases, CRM systems, ERP systems, and external sources, and optimize it for data analytics.
Use of cloud storage for storing and analyzing big data in recent times.
Being a foundation for business intelligence and reporting. They provide a consolidated view of data that can be used for ad hoc queries, predefined reports, dashboards, and data visualizations to support decision-making processes at various levels within an organization.

Graph databases

Graph databases are a type of NoSQL database that is designed to represent and store data in the form of nodes and relationships. They are particularly well-suited for situations where connections between entities are crucial and excel at managing densely linked data. They allow for the management of complex hierarchies. Some allow for one-way relationships (directed edges) between nodes. Graph databases are characterized by:

Representing entities with nodes, and relationships representing the connections or associations between those entities.
Providing efficient traversal and querying capabilities by enabling pattern matching and pathfinding, making it easy to navigate through the graph.
Not having a predefined structure or schema, which supports flexibility.
Components that make up a database
A database typically consists of several key components that work together to store, manage, and organize data.

Data models

Data models define the structure, relationships, and constraints of the data stored in a database. As discussed earlier, the main types of database models are relational, non-relational, entity-relationship data model, and object-based data model.

Tables

Tables are the foundation of a relational database. They are made up of rows (records) and columns (fields) and represent entities or concepts. Each column defines a specific attribute of the entity. For example, a user entity might have the columns: name, age, and email. The actual data values (e.g. Tim, 52, me@test.com) for each attribute are included in rows. Using keys, linkages between tables are created, providing for the consistency and integrity of data.

Queries

Queries are used to retrieve and manipulate data within a database. They allow users to interact with the database and perform operations such as retrieving specific data, filtering, sorting, aggregating, and updating records. The most common query language is SQL.

Advantages of using databases

Using databases offers a lot of advantages that contribute to efficient and effective data management.

Data organization and structure

Databases provide a structured and organized way of storing data. Relational databases are organized in rows and columns, and NoSQL databases employ different data models such as key-value, document, etc. to ensure efficient storage, retrieval, and management of data.

Data consistency and integrity

Databases enforce data integrity by applying constraints, validations, and relationships. They ensure that data is accurate, valid, and consistent, preventing the entry of duplicate or conflicting information.

Data scalability and performance

Databases are designed in a manner in which they can handle large volumes of data and scale to accommodate growing datasets and data requirements.

Conclusion

Databases are essential tools for efficient information management. They enable structured storage, management, and retrieval of data, ensuring data integrity and consistency. With various types of databases available, organizations can choose the most suitable one based on their specific requirements.

If you’re interested in learning more about databases and database solutions, you can read more articles on the PlanetScale blog or sign up for a PlanetScale account.