When you stumbled onto this post, you were probably hoping for some simple advice that would help you choose correctly every single time, but there really isn’t an absolute answer to this question. The decision generally depends on the application and how that application will be developed and used. I know, developers really hate answers like that. “Just tell me how to choose!” has been a refrain I’ve heard over and over again in my time as a technology architect. My goal is to help you learn how to choose, not to make the choice for you or to give you a step-by-step decision tree, because it’s never that easy. Let’s get started.
It helps a lot to invest the time to understand the architectural differences between relational and non-relational database products, which go well beyond the rather superficial differences conveyed by the term “relational”. There are some general differences in approach that characterize the SQL and NoSQL worlds, but you’ve really got to compare specific technologies (e.g., PostgreSQL compared to MongoDB, or MySQL compared to DynamoDB) because as these products continue to evolve there is more and more overlap between them in areas that used to be key differentiators.
In comparing, be sure you’re thinking about the needs of your application and not just comparing the capabilities in the abstract. Think about the patterns of data query and update in your application and consider how this will be accomplished with a given database. A significant amount of your thought and effort in developing and maintaining an application will go into the work of getting your data into and out of your database of choice reasonably efficiently (relative to the expectations of your users), and the time you invest to learn how that will work for different database choices will pay off handsomely.
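To make the idea of query and update patterns concrete, here is a minimal sketch that reads the same data in two shapes. It assumes Python’s built-in sqlite3 module as a stand-in for a relational database and a plain dictionary of JSON strings as a stand-in for a document store; the customers and orders tables, the key format, and the helper function names are all hypothetical and chosen only for illustration.

```python
import json
import sqlite3

# Relational shape (hypothetical schema): customers and orders live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 2500), (11, 1, 4000);
""")

# The "customer with orders" view is assembled at read time with a join.
rows = conn.execute(
    """
    SELECT c.name, o.id, o.total_cents
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = ?
    ORDER BY o.id
    """,
    (1,),
).fetchall()
relational_view = {
    "name": rows[0][0],
    "orders": [{"id": r[1], "total_cents": r[2]} for r in rows],
}

# Document shape (a dict standing in for a document store): the same view is
# stored pre-assembled under one key, so this read is a single lookup.
document_store = {
    "customer:1": json.dumps({
        "name": "Ada",
        "orders": [
            {"id": 10, "total_cents": 2500},
            {"id": 11, "total_cents": 4000},
        ],
    })
}
document_view = json.loads(document_store["customer:1"])


def total_revenue_relational(connection):
    """A cross-customer question is one aggregate query over the orders table."""
    return connection.execute("SELECT SUM(total_cents) FROM orders").fetchone()[0]


def total_revenue_documents(store):
    """The same question here means visiting every stored document."""
    return sum(
        order["total_cents"]
        for raw in store.values()
        for order in json.loads(raw)["orders"]
    )
```

Both approaches return the same answers. What differs is where the work happens: the document shape makes the per-customer read trivial, while the relational shape makes the cross-customer question trivial. Real products blur this line with indexes and aggregation features, so treat the sketch as a way to frame the question, not as a benchmark.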
Learn the CAP theorem (in brief: when a network partition occurs, a distributed data store has to give up either consistency or availability) and how it applies to the database products you’re considering. Learn about transactions: how they support consistency, and over what scope of changes to the structures stored in the database that support applies. Learn how a particular database choice scales to accommodate a growing data set. As a developer you don’t need a deep understanding of all of these aspects of the choice, but you really do need a basic understanding in order to make good choices for yourself or to help your team make good choices.
All of that said, there are some key considerations that often help in making a decision.
- Relational databases have been around for a long time and there are many excellent and technically comparable implementations, which are well understood and supported. There’s a lot of collective know-how that you can leverage in getting things done with relational databases. There are some great communities of support around NoSQL databases, but knowledge of SQL and relational databases is unquestionably more pervasive.
- Non-relational databases tend to be very flexible about data modeling, allowing you to store almost any data structure without having to statically define that structure through a schema. This allows the database structure to evolve somewhat more easily and naturally with the evolution of your application, and is often cited as the reason to choose a non-relational approach. That said, there are lots of nice approaches available these days to support schema migration as your application evolves, which eases the burden if your choice is to use a relational database.
- It used to be the case that transaction support was a key differentiator. Non-relational databases tend to be really good at (horizontal) scaling and partitioning, and the CAP theorem implies that a partitioned system can’t be quite as good at consistency, which is where transactions come in. These days, though, you can’t just say database A supports transactions but database B does not, because there are shades of support across most database products. You have to think about the transactions your application needs to perform on your data and consider your consistency requirements. A very large set of applications that perform relatively basic CRUD operations, even on rather sophisticated data structures, can be implemented quite successfully with the “less capable” transaction support of almost any NoSQL database.
- In enterprise situations, there are a host of other factors to consider because no application database is an island in an enterprise. Here you’ll want to chat with the person who plays the role of enterprise architect to make sure that all of those larger enterprise-y considerations (like distributed transactions, data integration, enterprise-perspective consistency across databases, and many, many more) are properly addressed. Also, some enterprises tend to standardize on a single technology, and that choice is rarely left to development teams.
- If your database is going to be very large or distributed to more than one location, or needs to support a very large set of concurrent users, you should spend a lot more time consulting with experienced database architects before making any decision about database choice.
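The transaction point in the list above is easier to reason about with a small example in hand. The following is a minimal sketch, assuming Python’s built-in sqlite3 module as the relational database; the accounts table, its CHECK constraint, and the transfer and balances helpers are hypothetical. It shows the guarantee you are evaluating when you compare transaction support: two related updates either both take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 1000), (2, 0)")
conn.commit()


def transfer(connection, src, dst, amount_cents):
    """Move money between two accounts atomically; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with connection:
            # Credit first, so a failed debit has something to undo.
            connection.execute(
                "UPDATE accounts SET balance_cents = balance_cents + ? WHERE id = ?",
                (amount_cents, dst),
            )
            connection.execute(
                "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
                (amount_cents, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(connection):
    """Return {account_id: balance_cents} for all accounts."""
    return dict(connection.execute("SELECT id, balance_cents FROM accounts ORDER BY id"))
```

Calling transfer(conn, 1, 2, 400) succeeds and moves the money, while transfer(conn, 1, 2, 5000) fails and leaves both balances untouched. When you evaluate a non-relational product, the useful question is whether it can give you this same all-or-nothing behavior across the records your application updates together, or only within a single record or document.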
As with so many architecture decisions, good technology architects rely on a very broad set of knowledge, along with experience and an eye for subtle nuances, when making choices about database technologies. As a developer, it will serve you well to collect experiences to better inform your understanding of how these choices really work in practice. When you have the freedom to choose for yourself, be sure to take the opportunity to expand your experience by making a different choice every once in a while.
Lastly, don’t get too hung up on making the “correct” choice. Except in those extremes mentioned in the last item of the list above, few of the decisions that you might make about database choice will be categorically wrong, because most database products that you might choose can successfully support a very wide range of applications. There might be some significant advantages in one choice versus another, but it can often be difficult to appreciate those advantages until you’ve had a lot of experience working in many different camps.