According to the DB-Engines ranking, four of the most popular database management systems are relational. Relational databases take up the lion's share of the market, so they are often the only ones a beginner is aware of. However, there are several other types of databases, each representing a different approach to data storage.
Understanding their differences and strengths is crucial for making the right choice. We have prepared a detailed guide to database types to help you know your options like the back of your hand.
What came before the modern database
The early types of databases had rather limited functionality. At first, computer enthusiasts used so-called flat databases, which were basically plain text files. This meant the data had to be textual and of fairly modest length. To mark where each new field started, the programmer had to type a delimiter: a special character chosen to define the border, such as a comma or a colon. As there are no relations between records, a flat database is hard to search and navigate. However, it works for a small amount of data that only needs to be read, not manipulated. For a basic example of this type of database, look at CSV (Comma-Separated Values) files.
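A flat file needs nothing more than a delimiter to separate its fields. The minimal sketch below assumes a small, made-up comma-separated file held in a string and reads it with Python's standard `csv` module. It also shows the main weakness described above: with no index or relations to follow, finding records means scanning every line.

```python
import csv
import io

# Hypothetical contents of a flat "database": one record per line,
# fields separated by a comma delimiter, first line naming the fields.
FLAT_FILE = """name,city,year_joined
Alice,Vilnius,2018
Bob,Berlin,2020
Carol,Vilnius,2021
"""

def load_records(text: str) -> list[dict[str, str]]:
    """Parse delimited text into a list of records (one dict per line)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    return list(reader)

def find_by_city(records: list[dict[str, str]], city: str) -> list[str]:
    """Without indexes or relations, every lookup is a full linear scan."""
    return [r["name"] for r in records if r["city"] == city]

records = load_records(FLAT_FILE)
print(find_by_city(records, "Vilnius"))  # ['Alice', 'Carol']
```

Note that every value comes back as a string, including `year_joined`, which reflects the text-only nature of flat files.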
In the 1960s, IBM introduced hierarchical databases. As the name suggests, the records are connected in a tree structure based on parent-child relationships: one parent can have multiple children, but each child can have only one parent. This was an early step on the road to relational databases. However, the strict one-parent rule does not suit every type of record, which makes some data tricky to organize. To solve this, Charles William Bachman III developed a more flexible approach known as the network model. Records were still linked as parents and children, but a child could now have multiple parents, turning the tree into a graph. Today, network databases are virtually extinct: most companies that used them jumped on the relational bandwagon as soon as it arrived.
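To make the difference concrete, here is a minimal Python sketch using hypothetical order, customer, and product records. In the hierarchical version, a dict maps each child to exactly one parent; in the network version, each child maps to a list of parents. It illustrates the data shapes only, not how IBM's or Bachman's systems actually stored records.

```python
# Hypothetical records, for illustration only.

# Hierarchical model: each child maps to exactly ONE parent, forming a tree.
hierarchical_parent: dict[str, str] = {
    "Order#1": "Customer:Ann",
    "Order#2": "Customer:Ann",
    "Order#3": "Customer:Ben",
}

# Network model: each child maps to a LIST of parents, forming a graph.
network_parents: dict[str, list[str]] = {
    "Order#1": ["Customer:Ann", "Product:Lamp"],
    "Order#2": ["Customer:Ann", "Product:Desk"],
    "Order#3": ["Customer:Ben", "Product:Lamp"],
}

def children_in_tree(parent: str) -> list[str]:
    """Children whose single parent is `parent`."""
    return sorted(c for c, p in hierarchical_parent.items() if p == parent)

def children_in_network(parent: str) -> list[str]:
    """Children that list `parent` among their (possibly many) parents."""
    return sorted(c for c, ps in network_parents.items() if parent in ps)

print(children_in_tree("Customer:Ann"))     # ['Order#1', 'Order#2']
print(children_in_network("Product:Lamp"))  # ['Order#1', 'Order#3']
```

The tree cannot answer "which orders include a lamp?" because each order already has its one parent, the customer. The network model can, since an order may hang under both its customer and its product.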
Relational vs. non-relational databases
When looking at the modern types of databases, relational ones are clearly the most prominent. Edgar F. Codd introduced the relational model in 1970 while working at IBM. MySQL, PostgreSQL, and SQL Server are all well-known examples of relational databases, and their names contain a hint: to access and manipulate the data, you use SQL (Structured Query Language). SQL is standardized by ANSI and ISO, which makes queries and skills largely portable between systems, although each vendor adds its own dialect.
In a relational database, data is stored in tables made up of columns and rows. Each row represents an individual record, and each column stands for a field with an assigned data type. Tables holding related information can be linked through keys: a foreign key in one table references the primary key of another.
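The sketch below shows primary and foreign keys in practice, using SQLite through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are made up for illustration. One SQLite-specific detail: foreign key enforcement is off by default and has to be switched on per connection with `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# In-memory SQLite database; the tables and rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,          -- primary key: identifies each row
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
    total       REAL NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO orders (id, customer_id, total)
    VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# The foreign key links each order to its customer, so the tables can be joined.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ann', 65.0), ('Ben', 15.5)]

# An order pointing at a customer that does not exist is rejected.
fk_rejected = False
try:
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (13, 99, 5.0)")
except sqlite3.IntegrityError:
    fk_rejected = True  # "FOREIGN KEY constraint failed"
print("orphan order rejected:", fk_rejected)  # orphan order rejected: True
```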
In recent years, non-relational databases have also seen an impressive rise. The main reason is the growing need to store unstructured data. In the age of big data, information is increasingly diverse: data can now mean images, videos, and even social media posts. For data that does not fit neatly into tables, a non-relational database is often the better choice. Such databases are commonly called NoSQL databases because, unlike relational ones, they typically do not use SQL as their main query language.
There are four main types of non-relational databases. Depending on which one you choose, you can store your data as documents, key-value pairs, graphs, or column families.
Document databases
In a document-oriented database (often simply called a document store), data is kept in collections of documents, usually in the JSON, XML, or BSON format. There is no fixed schema: a record can hold whatever fields and data types you need. Each document has its own internal structure, but that structure can differ from one document to the next, and documents can also be nested inside one another.
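The flexibility of a document store can be sketched with plain Python dicts standing in for JSON documents. The collection, its field names, and the `find()` helper below are hypothetical: they mimic the idea of querying schema-less, nested documents rather than any real database's API (MongoDB, for instance, also addresses nested fields with dot notation such as `address.city`, but through its own query methods).

```python
import json

# A hypothetical "collection": documents need not share the same fields.
collection = [
    {"_id": 1, "name": "Ann", "email": "ann@example.com"},
    {"_id": 2, "name": "Ben", "tags": ["admin", "beta"],
     "address": {"city": "Berlin", "zip": "10115"}},  # nested document
]

_MISSING = object()  # sentinel for "this document has no such field"

def find(docs: list[dict], path: str, value: object) -> list[dict]:
    """Return documents whose (possibly nested) field at `path` equals `value`.

    `path` uses dot notation such as "address.city"; documents that lack
    the field are skipped instead of raising an error.
    """
    results = []
    for doc in docs:
        current: object = doc
        for key in path.split("."):
            if not isinstance(current, dict) or key not in current:
                current = _MISSING
                break
            current = current[key]
        if current is not _MISSING and current == value:
            results.append(doc)
    return results

matches = find(collection, "address.city", "Berlin")
print(json.dumps(matches[0]["tags"]))  # ["admin", "beta"]
```

Because the lookup tolerates missing fields, the first document, which has no `address` at all, is simply skipped rather than breaking the query.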
Out of all the non-relational types of databases, document stores are the most popular. The best-known example is MongoDB, which, at the time of writing, had over 400 million downloads worldwide. First released in 2009, it is now used by industry giants like Barclays and Bosch. Developers appreciate its gentle learning curve and flexibility. MongoDB comes in a free Community edition and a paid Enterprise edition, and both run on Windows, Linux, and macOS.
Key-value databases
In a key-value database, as the name suggests, each record consists of a key and a value. As in a dictionary, the key is used to identify and look up its value. It really is as simple as that. Developers mostly use key-value databases when the data is not too complex and speed is a priority; for example, they are a great choice for storing configuration data.
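The core idea fits in a few lines. The `KeyValueStore` class below is a hypothetical, in-memory stand-in, not the API of Redis or any other product. It shows why lookups are fast (a single hash-table access by key) and why no schema is needed (values are stored as opaque strings that the store never inspects).

```python
from typing import Optional

class KeyValueStore:
    """A toy in-memory key-value store (hypothetical, for illustration).

    Real key-value databases add persistence, networking, expiry, and more;
    this sketch keeps only the get/set/delete-by-key model.
    """

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        """Store or overwrite the value under `key`."""
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        """Return the value for `key`, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None

# Configuration data is a typical use case.
config = KeyValueStore()
config.set("app:theme", "dark")
config.set("app:max_upload_mb", "25")
print(config.get("app:theme"))  # dark
```

In Redis, the equivalent basic operations are the `SET`, `GET`, and `DEL` commands.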
Key-value stores impose no schema on the stored data, and the database itself is usually far more lightweight than a relational one. This also makes the key-value model one of the best choices for embedding in other applications. As of 2020, the most popular key-value database is Redis, which was also voted the most loved database in Stack Overflow's annual Developer Survey in 2017, 2018, and 2020.
Graph databases
A graph database handles two kinds of data: nodes, which represent the items in the database, and edges, which represent the relationships between them. Together, nodes and edges form a graph. At first glance, graph databases look similar to the old-timey network databases, but there is a key difference. Network databases offered little abstraction: programs had to navigate from record to record along predefined links. Graph databases like Neo4j or Dgraph model this abstraction much more cleanly, letting you query relationships directly.
Of all the types of databases, this one is the best option when relationships and their analysis are a priority. However, graph databases have one clear disadvantage: while you do need a query language to access the data, there has historically been no universally adopted standard comparable to SQL (ISO published a standard graph query language, GQL, only in 2024). Because of this lack of standardization, most graph query languages, such as Neo4j's Cypher, work with only one or a few graph database products.
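Following chains of relationships is exactly the kind of work graph databases are built for. The sketch below uses a plain Python adjacency list with made-up people and "KNOWS" edges, plus breadth-first search, to count the hops between two people. It is an illustration of the graph model, not how Neo4j or Dgraph work internally; a graph database would express the same question in its own query language rather than in application code.

```python
from collections import deque

# Hypothetical social graph: nodes are people, edges are "KNOWS" relationships.
edges = [("Ann", "Ben"), ("Ben", "Cid"), ("Cid", "Dee"), ("Ann", "Eve")]

# Store the graph as an adjacency list so each node's neighbours are one lookup away.
graph: dict[str, set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)  # treat KNOWS as undirected

def degrees_of_separation(start: str, goal: str) -> int:
    """Breadth-first search: hops on the shortest path, or -1 if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return -1

print(degrees_of_separation("Eve", "Dee"))  # 4
```

In a relational database, the same question would typically need one self-join per hop, which is why relationship-heavy workloads tend to favour the graph model.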
Column store databases
The last of the non-relational database types is the column family database, also known as a wide-column store. It is sometimes loosely called a column store, although that term more often refers to columnar analytics databases. What makes these databases a good option for handling big data is fast performance, efficient data compression, and great scalability.
Where a relational database groups its tables into a schema, a column family database groups its column families into a keyspace. Like a table, a column family contains columns and rows, but with a clear difference: a column does not span all the rows. Instead, each column belongs to a single row, which means different rows can have different columns. Apart from its columns, each row also has an identifier, called a row key, and every column holds a name, a value, and a timestamp. Well-known examples of databases using the column family model are Cassandra and HBase. Vertica and Druid, sometimes listed alongside them, are columnar analytics databases, a related but distinct design.
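The nesting described above can be mimicked with Python dicts. Everything below (the `shop` keyspace, the `users` column family, and the `put()` helper) is hypothetical and leaves out what real wide-column stores add on top, such as partitioning, replication, and on-disk storage. It only shows that each row carries its own set of timestamped columns.

```python
import time

# Hypothetical layout:
# keyspace -> column family -> row key -> column name -> (value, timestamp).
Cell = tuple[str, int]
keyspace: dict[str, dict[str, dict[str, dict[str, Cell]]]] = {"shop": {"users": {}}}

def put(family: str, row_key: str, column: str, value: str) -> None:
    """Write one cell, stamping it with the current time in microseconds."""
    rows = keyspace["shop"][family]
    rows.setdefault(row_key, {})[column] = (value, time.time_ns() // 1000)

put("users", "ann", "email", "ann@example.com")
put("users", "ben", "email", "ben@example.com")
put("users", "ben", "twitter", "@ben")  # a column only this row has

# Each row lists only the columns it actually holds; there are no empty cells.
for row_key, columns in keyspace["shop"]["users"].items():
    print(row_key, sorted(columns))
# ann ['email']
# ben ['email', 'twitter']
```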
Types of databases: what's next?
In 2011, Matthew Aslett was the first to use the term NewSQL. It refers to the newest generation of data storage solutions: ones that combine the scalability of NoSQL with the ACID guarantees of relational databases. ACID stands for Atomicity, Consistency, Isolation, and Durability, the properties that keep transactions reliable even when errors or crashes occur. One way to get the best of both worlds is to give up the general-purpose ideal and focus on handling one task very well. For example, MemSQL (since renamed SingleStore) focuses specifically on clustered analytics.
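Atomicity, the "A" in ACID, is easiest to see in a money transfer. The sketch below uses SQLite through Python's built-in `sqlite3` module with a made-up `accounts` table; SQLite is a conventional relational database, chosen here only because it ships with Python, not an example of NewSQL. The transfer's two updates run in one transaction, so when the second update violates a constraint, the first is rolled back as well.

```python
import sqlite3

# Hypothetical accounts table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("ann", 100), ("ben", 50)])
conn.commit()

def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money as one atomic transaction: both updates happen, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed on the debit; the credit was rolled back too

print(transfer("ann", "ben", 30))   # True
print(transfer("ann", "ben", 500))  # False
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'ann': 70, 'ben': 80}
```

Without the transaction, the failed transfer would have left Ben 500 richer while Ann's balance stayed the same.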
According to The Economist, data is the new oil, so it's only natural that the range of database types keeps growing. While relational databases are still the most popular, different use cases call for different tools. We hope our detailed guide has shed some light on this topic: after all, understanding the various types of databases makes it easier to make the right choice.