Portuguese Version: https://inside.contabilizei.com.br/quando-utilizar-nosql-sql-9c9255cd5f12
Note: First of all, I would like to clarify that everything I put in this article is based on professional experiences in projects that I had the honor to participate that gave me this level of knowledge. From now on I make myself available for contact and discussion on the subject.
It is a fact that NoSQL came to break a series of paradigms and solve a series of problems that were understood as fact and / or routine. When such disruptive technology comes to market, it certainly meets something that was already a necessity.
NoSQL databases have arrived offering a more flexible, scalable, and less expensive alternative to what the relational database offers. Everyone should ask the following questions before starting any project:
- To what extent can NoSQL actually replace relational databases?
- What problems can we have with this type of structure?
- What alternatives do I have in the use of each?
The bad decision of this architecture can impact one of the most important things a company can have, its data and information.
SQL (Structured Query Language):
SQL is the default language for managing relational databases and manipulating your data. Until the arrival of NoSQL any database required the definition of a previously designed, standardized data structure that would allow the storage of data by linking tables through the relationship of the primary keys with foreign keys and SQL was the means of communicating and set this information.
Large companies emerged from this demand, they set out to contemplate the need and importance of storing data safely and efficiently. At first it was expensive to pay for it, but with the passing of the years came free solutions that hit the front very competitively.
An SQL database, if defined correctly, makes it impossible to duplicate information through its structure. This is a very important feature as it ensures the integrity of the data and its individual identity.
The term NoSQL was first used in 1998 as the name of a non-relational open source database. But only from 2009 the term was used to talk about non-relational databases in general. The term only came after Eric Evans (a Rackspace employee) decided to use it at an event that discussed distributed distributed databases.
Unlike the relational database, a NoSQL database is horizontally scalable, allowing much more traffic by sharding (data partitioning).
There are four main points that differentiate NoSQL databases from relational:
- Data Models: A NoSQL database allows you to create an application without having to define the schema first, unlike relational databases, that cause you to define your schema before you can add data to the system. However, information duplicity is very common depending on the structure of the development or the data refresh rate.
- Data structure: Relational databases were built in an era in which data was reasonably structured and clearly defined by their relationships. NoSQL databases are designed to handle unstructured data (for example, texts, social media posts, video, email), which make up much of the data that exists today.
- Sizing: It is much cheaper to scale a NoSQL database than a relational database because you can increase capacity by scaling up cheap commodity servers. Relational databases, on the other hand, require a single server to host your entire database. To scale, you need to buy a larger, more expensive server.
- Development model: NoSQL databases are open source, while relational databases are generally closed source, with license fees built into the use of their software. With NoSQL, you can start a project without heavy investments in software fees in advance.
Another extremely important point that can affect in deciding which model (SQL / NoSQL) to use, is the impedance mismatch. An impedance mismatch may occur when accessing a relational database in an object-oriented programming language (OO). Problems can arise because OO programming languages, such as Java, C ++ or Python, have very different approaches to accessing data. The point is that relational databases work fundamentally differently than the application code. A relational database is concerned with two things: records and relationships between records, as denoted by foreign key constraints. The application code works with all kinds of complicated structures and data techniques: hierarchies, iteration, chart path. Therefore, when you create a second parallel data model that works for your application code, it is not so easy to translate it into the existing data model expressed by the database schema.
Some of these differences include:
- Object-oriented languages make heavy use of attributes by reference, while this is typically prohibited in relational databases. Scalar types also often differ between the database and OO languages.
- In OO languages, objects can be composed of other objects, while this is impossible in integrity in relational database languages.
- Relational databases have well-defined primitive operations to manipulate and query data, while OO languages have lower-level operations.
- Relational databases have more robust approaches to transactions to preserve atomicity and consistency. The only way to guarantee this through an OO language is at the field level of primitive types.
Methods to mitigate impedance mismatch include the use of NoSQL databases and the design of relational databases with object-oriented programming languages, in addition to paying attention to the differences between OO languages and code-relational databases a project.
But which is the best choice?
It depends! That’s right, it depends! It seems like a vague answer, but it is critical to know what the features of your project will be prior to that definition. I will bring here the learning acquired from a great leader who went through my career, saying: “Make the best of each tool”, and for me that is the secret.
If there is no defined schema and if the IO is down, do not think twice about choosing NoSQL.
If your business does not require a clear definition of a database schema or you do not need multi-line storage, your choice should clearly be NoSQL because you will have an economic structure that does not require a lot of paperwork. In addition, you will benefit from the incredible speed that this structure will give you.
But if you need the benefits that a relational database gives you and also have extremely costly information for consultation or storage, or need some of the data to have attributes that can be varied, I suggest the mixed use of those structures.
Keep the data mutable in relational database and store the fixed in NoSQL.
To illustrate this information, I will bring a common example in Brazil that is the storage of an electronic invoice. For several moments I came across storage of the XML file in relational database. And in all cases, this storage was responsible for the largest slice of the database’s consumed size. This can easily be solved by linking this information, storing only necessary information in the relational database and the XML itself in NoSQL.
Taking the case of the NFe as an example, we could have just the key of the note and the link with the order table inside a relational database and a record in a NoSQL where the index is exactly the key of the note and the content the XML itself .
For large SQL queries, make a NoSQL summary
This type of situation is very common in several scenarios. The most common is the famous BigData where you have a summary with all the necessary information in a table. To do this, use NoSQL! This choice will save you from several headaches.
Let’s bring a practical example where you have a business that stores your company’s sales. Imagine that you have a screen on any system that needs to bring all the information of a customer, including orders, products, notes, tracking of orders, payment data, service signatures, signature details, etc …
Thought? Now think about the query that the API will make in your database to bring all this information as more data arrives in the related tables. The result will definitely not be enjoyable if you have just using relational database. For this, I think the best solution is to evaluate the ideal period for updating data, store all this information in a summarized way in an indexed NoSQL, in this case, with a customer ID.
When I say it in a summarized way, it is really summed up! That is, it is not a replica of data, but a summary with the information needed for one or more requesters.
The choice depends on several factors, for example, the type of data being analyzed, how much data it has and how quickly you need it. For example, for applications such as user behavior analysis, the relational database is best for social media, text, or geographic data applications that require large amounts of text mining or image processing, NoSQL type data works best.
So, we can say that both SQL and NoSQL have their advantages and disadvantages and to define the best model (which can be SQL, NoSQL or both) depends on the application, the way of development and the structure planned for the environment.
I hope I have helped in some way, now the challenge lies in your hands. It’s up to you to come to your own conclusion!