The polite interpretation of "NoSQL" has become Not Only SQL
. If you have data that is indeed truly relational, or if your functionality depends on things like joins and ACIDity, then you should store that data in a relational way. In this post, I'll explain how I use MySQL alongside two NoSQL data stores. Modern, web-scale data storage is all about understanding how to pick the best tool(s) for the job(s).
That said, NoSQL is really a reaction to the fact that the relational method and way of thinking has been applied to problems where it's not actually a very good fit (typically huge tables with tens of millions of rows or more). Once tables get that large, the typical SQL "best practice" has been to manually shard the data -- that is, putting records 1 through 10,000,000 in table A, 10,000,001 through 20,000,001 in table B, and so on. Then, typically in the application model layer, the lookups are performed according to this scheme. This is what's called application-aware
scaling. It's time-intensive and error prone, but to scale something up while maintaining MySQL for the long table store, it's become a more or less standard MO. NoSQL represents, to me, the application-unaware
alternative.
Key-Value
When I had a MySQL prototype start getting too big for its own good, I personally moved as much data as possible to the lightning-fast Membase, which outperforms Memcached and adds persistence. Membase is a distributed key-value store that scales more or less linearly (Zynga uses it to handle a half-million ops per second, for instance) by adding more commodity servers into a cluster -- it's therefore a great fit for the cloud age of Amazon EC2, Joyent, etc.
It's well known that distributed key-value stores are the best way to get enormous, linear scale. The weakness of key-value is queryability and indexing. But even in the relational world, the best practice for scalability is to offload as much effort onto the application servers as possible, doing joins in memory on commodity app servers instead of asking the central RDB cluster to handle all of that logic. Since simple select
plus application logic
are really the best way to achieve massive scale even on MySQL, the transition to something like Membase (or its competitors like Riak) isn't really too bad.
Document Stores
Sometimes -- though I would argue less often than many think -- an application's design inherently requires secondary indices, range queryability, etc. The NoSQL approach to this is through a document store
like MongoDB. Like Membase, Mongo is very good in some areas where relational databases are particularly weak, like application-unaware
scaling, auto-sharding
, and maintaining flat response times even as dataset size balloons
. It's significantly slower than Membase and a bit trickier to do pure horizontal scale, but the benefit is that it's highly queryable. You can query on parameters and ranges in real time, or you can use Map/Reduce to perform complex batch operations on truly enormous data sets.
On the same project I mentioned above, which uses Membase to serve tons of live player data, we use MongoDB to store analytics/metrics data, which is really where MongoDB shines.
Why to keep things in SQL
I touched briefly on the fact that 'truly relational' information should stay in relational databases. As commenter Dan K. points out, I missed the part where I discuss the disadvantages of leaving RDBMS, or at least of leaving it entirely.
First, there's SQL itself. SQL is well-known and has been an industry standard for a long time. Some "NoSQL" databases like Google's App Engine Datastore (built on Big Table) implement their own SQL-like language (Google's is called, cutely, GQL for Google Query Language
). MongoDB takes a fresh approach to the querying problem with its delightful JSON query objects. Still, SQL itself is a powerful tool for getting information out of data, which is often the whole point of databases to begin with.
The most important reason to stay with RDBMS is ACID, or Atomicity, Consistency, Isolation, Durability
. I won't re-hash the state of Acid-NoSQL, as it's well-addressed in this post on SO. Suffice it to say, there's a rational reason Oracle's RDBMS has such a huge market that isn't going anywhere: some data needs pure ACID compliance. If your data does (and if it does, you're probably well aware of that fact), then so does your database. Keep that pH low!
Edit: Check out Aaronaught's post here. He represents the business-to-business perspective far better than I could, in part because I've spent my entire career in the consumer space.