I\'ve been looking at MongoDB and I\'m fascinated. It appears (although I have to be suspicious) that in exchange for organizing my database in a slightly different way, I get a
If your data does not take advantage of relational algebra, nor do you need ACID guarantees, then you don't gain anything by using languages that cater exclusively for those uses.
How often do you think Facebook does arbitrary queries against its datastore(s)? Not everything is a web app, and conversely not every set of data needs to be analyzed deeply.
NoSQL in my opinion, is largely a reactionary response to what basically amounted to people using RDBMS for tasks they were not well suited because people didn't actively make a decision based on their needs and chose some default. To "jump ship from MySQL" (or RDBMSs in general) industry-wide would be to make the same mistake all over again and the pendulum will end up swinging back the other way.
If MongoDB works for your use case, by all means go ahead. Just don't assume your use case is all use cases. There is no technology that fits all scenarios. The invention of the supersonic jets didn't eliminate the use of freight trains.
I have used MongoDB, Redis (more than key-value pair supports list, set and sorted set), Tokyo Tyrant, Memcached and MySql & PostgreSQL.
The arguments between NoSQL DB And SQL based DB are completely baseless. You need to choose the appropriate model based on your use case.. If you need ACID compliances, go ahead with SQL DB like PostgreSQL, Oracle etc. You need high performance, but you less care about data, then you may consider noSQL DB. They are fundamentally different technologies. You can even use the combination of models. With NoSQL, you will be missing relationships, constraints and sometimes transaction.. In fact, thats is the one of the reason NoSQL are faster..
Once I have lost two months of aggregate data with MongoDB.. No clue how I lost them..But I had backup and I have lost few minutes of data. I brought back MongoDB with backup.. If you use NoSQL, take occasional backup or schedule cron jobs for DB backup. This is applicable for SQL DB also.
Compared to SQL RDBMS, NoSQL DBs are younger and they are currently under full fledged development but NoSQL DBs are matured in their scope ie they meant for high performance, easy replication.
In my website(stacked.in), I have used only redis DB, it works much much faster than MySQL.
Let me hit the questions one at a time:
I know I can't do transactions across relationships... when would this be a big deal?
Picture cascading deletes. Or even just basic referential integrity. The concept of "foreign keys" can't really be enforced across "collections" (the Mongo term for tables). You can do atomic writes to only a single "document" (AKA record). So if you have a DB issue, you can orphan data in the DB.
I get as much performance as I have CPUs and RAM for free?
Not free, but definitely with a different set of trade-offs. For example, Mongo is great at running single-record, key/value look-ups. However, Mongo is poor at running relational queries. You'll need to use map-reduce for many of these. Mongo is a "RAM-whore". Mongo basically demands 64-bit for any significant dataset. Mongo will suck up drive space, load up a 140GB DB and you can end up using 200+ GB as the swap file grows during use.
And you're still going to want a fast drive.
In fact, I think it's safe to say the MongoDB is really a DB system that caters to leading-edge hardware (64-bit, lots of RAM, SSDs). I mean, the whole DB is centered around looking up data index data in RAM (hello 64-bit) and then doing focused random lookups on the drive (hello SSD).
why ... doesn't the entire industry jump ship from MySQL?
As I understood it, as you scale, you get MySQL to feed Memcache. Now it appears I can start with something equally performant from the beginning.
Honestly, I'm working towards this on a few of my projects. Again, I think that MongoDB actually does make a valid caching layer. In fact, it makes a file-backed caching layer. So if you're capable of pushing MySQL change to Mongo, then you're getting getting Memcached without cache misses. It also makes it easy to "warm the cache" on new server, just copy files and start Mongo pointing at the correct folder, it really is that easy.
The big backlash against NoSQL is rooted in the mentality of many of the NoSQL advocates. Specifically, the attitude best summarized as "SQL is too hard, I shouldn't have to do it". I dislike NoSQL because it seems in many cases to be elevating ignorance.
I know I can't do transactions across relationships... when would this be a big deal?
More often than you might expect. There are a lot of things that can go wrong when you can't assume a consistent dataset.
I write this but as a dispute to Rex's answer.
I dispute the idea that nosql is relationless and fuzzy.
I had been working with CODASYL many years ago with C and Cobol - entity relationships are very tight in CODASYL.
In contrast, relational database systems have a very liberal policy towards relationships. As long as you can identiy a foreign key, you could form a relationship adhoc.
It is frequently taken for granted that SQL is synonymous with RDBMS, but people have been writing SQL drivers for CODASYL, XML, inverted sets, etc.
RDBMS/SQL do not equal precision in data or relationship. In fact, RDBMS has been a constant cause in imprecision and misperception of relationships. I do not see how RDBMS offer better data and relationship integrity than hadoop, for example. Put on a layer of JDO - and we can construct a network of good and clean relationships between entities in hadoop.
However, I like working with SQL because it gives me the ability to script adhoc relationships, even though I realise that adhoc relationships is a constant cause of relationship adulteration and problems.
Having the opportunity to work with statistical analysis of business and industrial processes, SQL gave me the ability to explore relationships where no relationships had previously been perceived. The opportunity to work with statistical analysis gave me insights that would not normally come the way of SQL programmers.
For example, you would design and normalise your schema to reflect a set of processes. What you might not realise is that relationships change over time. The statistical characteristics would reveal that a schema may no longer be as "properly normalised" as it once had been. That the principal components of the processes have mutated over time. But non-statistical programmers do not understand that and continue to tout RDBMS as the perfect solution for data integrity and relationship precision.
However, in a relationship-linking database, you could link entities in relationships as they appear. When relationships mutate, the linking naturally mutate with the data. Relationships and their mutation are documented within the database system without the expensive need to renormalise the schema. At which point, RDBMS is good only as temp dbs.
But then you might counter that RDBMS too allows you to flexibly mutate your relationships, since that is what SQL does best. True, very true - so long as you perform BCNF or even 4NF. Otherwise, you would begin to see that your queries and data loaders performing replicated operations. But then your many years in the RDBMS business have so far certainly at least made you realise that BCNF is very expensive and operationally inefficient and that we are constantly guilty of 2.5 NFing our schemata.
To say that RDBMS and SQL promotes data and relationship integrity is a gross mis-statement. Either you work in a company that is so tiny or you didn't stay in your positions for more than two years - you would not see the amount of data or the information mutation and the problems caused by RDBMS. The abuse of RDBMS is the cause of executives being restricted in the view by computer applications and the cause of financial failures of companies failing to see changes in market behaviour because their views were restricted by the programmers whose views were restricted to their veneration of their beloved RDBMS schemata.
That is why SQL programmers do not understand why your company statistician refuses to use your application which you crafted meticulously but they employed a college intern to write SQL to download data into their personal servers and that your company executives learn to trust the accountants' and statisticians' spreadsheets rather than your elegant multi-tiered applications because of your applications' inability to mutate with processes.
It might not be possible, but I still urge you to acquire some statistical understanding to perceive how processes mutate over time so that you can make the right technological decision.
The reason people are not moving to SQL-less is lack of a good scripting environment like SQL to perform adhoc relationship analysis. Not because SQL-less technology is deficient in precision or integrity. Adhoc relationship analysis is very important nowadays due to the rapid and agile application development attitudes and strategies we have nowadays.