One of the most fundamental choices to make when developing an application is whether to use a SQL or NoSQL database to store the data. Conventional databases, meaning relational databases that use SQL (Structured Query Language) for queries, are the product of decades of technology evolution, good practice, and real-world stress testing. They are designed for reliable transactions and ad hoc queries, the staples of line-of-business applications. But they also come burdened with restrictions, such as rigid schema, that make them less suitable for other kinds of applications.
NoSQL databases arose in response to those limitations. NoSQL systems store and manage data in ways that allow for high operational speed and great flexibility on the part of the developers. Many were developed by companies like Google, Amazon, Yahoo, and Facebook that sought better ways to store content or process data for massive websites. Unlike SQL databases, many NoSQL databases can be scaled horizontally across hundreds or thousands of servers.
The advantages of NoSQL don’t come without a cost, though. NoSQL systems favor speed and scalability over the ACID properties behind reliable transactions promised by SQL databases. And the metaphors used to work with data in NoSQL systems are also relatively new, compared to the decades of institutional knowledge built up around SQL.
SQL and NoSQL databases offer different tradeoffs. While they may compete in the context of a specific project—as in, which to choose for this application or that application—they are complementary in the bigger picture. Each is suited to different use cases. The decision is not so much a case of either/or as it is a question of which tool is right for the job.
NoSQL vs. SQL
The fundamental difference between SQL and NoSQL is not all that complicated. Each has a different philosophy for how data should be stored and retrieved.
With SQL databases, all data has an inherent structure. A conventional database like Microsoft SQL Server, MySQL, PostgreSQL, or Oracle Database uses a schema—a formal definition of how data inserted into the database will be composed. For instance, a certain column in a table may be restricted to integers only. As a result, the data recorded in the column will have a high degree of normalization. The rigid schema of a SQL database also makes it relatively easy to perform aggregations on the data, for instance by combining data from two tables using the SQL
With NoSQL, data can be stored in a schema-less or free-form fashion. Any data can be stored in any record. Among the NoSQL databases, you will find four common models for storing data, which lead to four common types of NoSQL systems:
- Document databases (e.g. MongoDB). Inserted data is stored in the form of schema-less JSON structures, or “documents,” where the data could be anything from integers to strings to free-form text. There is no inherent need to specify what fields, if any, a JSON document will contain.
- Key-value stores (e.g. Redis). Free-form values, from simple integers or strings to complex JSON documents, are accessed in the database by way of keys, such as strings.
- Wide column stores (e.g. Cassandra). Data is stored in columns instead of rows as in a conventional SQL system. Any number of columns (and therefore many different types of data) can be grouped or aggregated as needed for queries or data views.
- Graph databases (e.g. Neo4j). Data is represented as a network or graph of entities and their relationships, where each node in the graph is a free-form chunk of data.
Schema-less data storage is useful in the following scenarios:
- You want fast access to the data, and you’re more concerned with speed and simplicity of access than reliable transactions or consistency.
- You’re storing a large volume of data, and you don’t want to lock yourself into a schema, as changing the schema later could be slow and painful.
- You’re taking in unstructured data from one or more sources, and you want to keep the data in its original form for maximum flexibility.
- You want to store data in a hierarchical structure, but you want those hierarchies to be described by the data itself, not an external schema. NoSQL allows data to be casually self-referential in ways that are more complex for SQL databases to emulate.
Querying NoSQL databases
The Structured Query Language used by relational databases provides a uniform way to communicate with the server when storing and retrieving data. SQL syntax is highly standardized, so while individual databases may handle certain operations differently (e.g., window functions), the basics remain the same.
By contrast, each NoSQL database tends to have its own syntax for querying and managing the data. CouchDB, for instance, uses requests in the form of JSON, sent via HTTP, to create or retrieve documents from its database. MongoDB sends JSON objects over a binary protocol, by way of a command-line interface or a language library.
Some NoSQL products can use SQL-like syntax to work with data, but only to a limited extent. For example, Apache Cassandra, a wide column store, has its own SQL-like language, the Cassandra Query Language or CQL. Some of the CQL syntax is straight out of the SQL playbook, like the
INSERT keywords. But there is no native way to perform a
JOIN or subquery in Cassandra, and thus the related keywords don’t exist in CQL.
A design choice common to NoSQL systems is a “shared-nothing” architecture. In a shared-nothing design, each server node in the cluster operates independently of every other node. The system doesn’t have to get consensus from other nodes to return data to a client. Queries are fast because they can be returned from whichever node is closest or most convenient.
Another advantage of a shared-nothing system is resiliency and scale-out expansion. Scaling out the cluster is as easy as spinning up new nodes in the cluster and waiting for them to sync with the others. If one NoSQL node goes down, the other servers in the cluster will continue to chug along. All of the data remains available, even if fewer nodes are available to serve requests.
Note that a shared-nothing design is not exclusive to NoSQL databases. Many conventional SQL systems can be set up in a shared-nothing fashion, such as MySQL, although that typically involves sacrificing consistency across the cluster for performance.
If NoSQL provides so much freedom and flexibility, why not abandon SQL entirely? The simple answer is that many applications still call for the kinds of constraints, consistency, and safeguards that SQL databases provide. In those cases, some “advantages” of NoSQL may turn to disadvantages. Other limitations stem from the fact that NoSQL systems lack certain features one takes for granted in the SQL space.
Even if you’re taking in free-form data, you almost always need to impose constraints on data to make it useful. With NoSQL, imposing constraints involves shifting the responsibility from the database to the application developer. For instance, the developer could impose structure through an object relational mapping system, or ORM. But if you want the schema to live with the data itself, NoSQL does not typically support that.
Some NoSQL solutions provide optional data typing and validation mechanisms for data. Apache Cassandra, for instance, has a slew of native data types reminiscent of those found in conventional SQL.
NoSQL systems offer the option of trading strong or immediate consistency for better availability and performance. Conventional databases ensure that operations are atomic (all parts of a transaction succeed, or none do), consistent (all users have the same view of the data), isolated (transactions don’t compete), and durable (once completed they will survive a server failure).
These four properties, collectively referred to as ACID, can be handled differently in NoSQL systems. Instead of demanding strong consistency across the cluster, which would necessarily delay responses to requests, you can opt for eventual consistency, which allows requests to be served without waiting for the latest writes to be copied to other nodes in the cluster. Data inserted into the cluster is eventually available everywhere, but you can’t guarantee when.
For some NoSQL systems, you can choose one of a number of compromises between consistency and speed, although what’s available will vary between products. Microsoft’s Azure Cosmos DB, for instance, lets you select a level of consistency per request, so you can choose the behavior that fits your use case. Transaction semantics, which in a SQL system guarantee that all steps in a transaction (e.g. executing a sale and reducing inventory) are either completed or rolled back, are available in some NoSQL systems, such as MongoDB.
Most NoSQL systems are conceptually similar, but implemented differently. Each tends to have its own metaphors and mechanisms for how data is queried and managed.
One side effect of that is a potentially high degree of coupling between the application logic and the database. This coupling isn’t so bad if you pick one NoSQL system and stick with it, but it can become a stumbling block if you change systems down the road.
If you migrate, say, from MongoDB to CouchDB (or vice versa), you must do more than just migrate data. You must also navigate the differences in data access and programmatic metaphors. In other words, you must rewrite the parts of your application that access the database.
Another downside to NoSQL is the relative lack of expertise. Where the market for conventional SQL talent is quite large, the market for NoSQL skills is nascent.
For reference, Indeed.com reports that as of 2022, the volume of job listings for conventional SQL databases—MySQL, Microsoft SQL Server, Oracle Database, and so on—remains higher than the volume of jobs for MongoDB, Couchbase, and Cassandra. The demand for NoSQL expertise remains a fraction of the market for SQL skills.
Merging SQL and NoSQL
We can expect some of the differences between SQL and NoSQL systems to disappear over time. Already many SQL databases now accept JSON documents as a native data type, and can perform queries against that data. Some even have native ways to impose constraints on JSON data, so that it is handled with the same rigors as conventional row-and-column data.
On the flip side, NoSQL databases are adding not only SQL-like query languages, but also other features of traditional SQL databases, such as MongoDB’s ACID properties.
One likely path is that future generations of database, as well as future versions of current database systems, will straddle the paradigms and offer both SQL and NoSQL functionality, helping to make the database world less fragmented. For example, Microsoft’s Azure Cosmos DB uses a set of primitives under the hood to interchangeably reproduce the
behaviors of both kinds of systems. Google Cloud Spanner combines SQL and strong consistency with the horizontal scalability of NoSQL systems.
Still, pure SQL and pure NoSQL systems will have their place for many years to come. Look to NoSQL in scenarios where design flexibility, horizontal scalability, and high availability are more important considerations than strong read consistency and other safeguards common to SQL databases. For many applications, those safeguards may well be worth trading for what NoSQL offers.
Copyright © 2022 IDG Communications, Inc.