8
votes
When should you really use a NoSQL database?
I've always used Postgres. For a medium-sized app that I work on right now it's running great. Were I to seriously need more throughput I'd either shard (no small task I know) or use CockroachDB (which to my understanding is basically Postgres with built-in sharding and no extension support). Throwing away relationships, constraints, unique compound indices and all of the tools I love that Postgres provides just to get schemaless JSON with high write throughput out of the box doesn't sound like a good deal. But so many people have made the decision to go NoSQL so there must be something I'm missing.
This question is a tad broad. NoSQL is a very broad category, after all - it's ALL types of datastores that do not use SQL as their query interface.
Beyond that, I would say that there are two main types of NoSQL databases, ones that aim to replace the general purpose databases that are the bread and butter of your application, and ones that are more hyperspecialized. For the former, of course, there's Mongo, and for that matter it has fallen out of favor quite a bit. But there are reasons to use Mongo.
For the latter, well, it's because SQL isn't always the interface you want - depending on which two axis of CAP you want to fulfill, sometimes it's just not the best tool for the job. For example, for something like a cache, Redis, a simple key-value store, will do the job MUCH better than postgresql or mysql. For wide-column, heavy writes situations, where availability is less of a concern, something like Cassandra will beat out your traditional RDMBS.
In an even more extreme case, you have something like Elasticsearch, which is all about textual search. If full text search is your goal, it's hard to even compare the abilities of ES vs Postgres.
Now, back to Mongo, in its defense I will say that sharded SQL is a completely different beast. Remember, you can't really do joins across sharding keys (well, safely anyway - lots of fanout queries defeats the purpose of sharding and is also slow as balls). That's a huge limitation on schema design - joins are the lifeblood of relational databases.
Some applications just don't naturally have structures where's there's a good candidate for a column to shard on. I've seen sharded mysql quickly become basically Mongo in some applications, because you can barely do effective joins and you end up point loading the foreign keys on the client doing manual joins. When you can't really do joins, and your queries in general are kneecapped by sharding requirements, Mongo has its advantages (and has the ability to make ACID compliant transactions now).
When you're at Big Big Tech scale, then some kind of horizontal scaling does become important. From what I've seen, sharded SQL (via Aurora, typically) is the more popular solution now. But it does have its sharp edges - it's not going to look like the relational databases at smaller scale. Of course most apps do not reach this scale nor really have to care.
However, even for medium sized applications, it's pretty common to have some niches where a NoSQL database comes into use. Redis for caching and ES for full text search are pretty common ones. It's hard for Postgres or MySQL to ever reach anything near Redis's performance for its niche and of course it just kinda sucks at text search if that's what you need.
I don't know about that. Postgres has a pretty good built-in full-text search! You can create functional indexes over multiple columns in a table in order to make Postgres FTS pretty damn fast. You can weight different columns differently in the search results. Postgres can also weight searches based on the proximity of the search terms in the text.
It is limited in that you can only index the search across one table, so if you need to join tables in your search, ES is probably better, but it comes with its own downsides: now you have to manage essentially two databases and two services.
I've found Postgres full-text search to be brilliant in all the applications I've needed to implement full text search so far.
I've also used it myself on a small database with much success. I expect for much bigger applications a domain-specific db must do better.
Thanks for the insight! Thankfully I’ve never actually needed to shard a database yet, so the shortcomings are only things I’ve heard in passing.
I’m definitely on board with Redis for caching. The out of the box least-recently-used cache eviction is nice. And it can act as a distributed semaphore. It’s really well suited to its use cases.
I remember spending quite a lot of time playing with “nosql” databases and I couldn’t really find any particular use where it was superior to a tradition RDBMS. Especially because the bigger names were stepping up with the features that were making those kinds of data stores popular like JSON compatibility and web apis to make it easier to access via JavaScript.
I’m sure that there are still some niche applications where nosql makes more sense but I don’t know what they are. To the best of my knowledge the idea has largely been supplanted by the ideas in GraphQL. But if I didn’t know how much RDBMS systems had grown in the past decade, I would have said that nosql was just another stupid webdev fad. The one big thing it has over a traditional RDBMS is that it is perceived to be easier to use because it doesn’t require you to learn SQL.