For some time I've been trying to put some of the 'cool' things I've read about NoSQL (CouchDB, MongoDB, Redis...) over the past few years into practical use.
I'm quite
This is what I get for trolling Stack Overflow. Once in a great while an outstanding question gets asked and I am compelled to offer my two cents (at the risk of my own project timelines).
I just finished up a project where I had to de-couple an ORM from the model so I could implement a NoSQL solution, and found it not that difficult, although it was rough at times trying to figure out the best approach. So without getting too specific about my implementation, I will touch on what I had to do to make it work, as it may offer some enlightenment when you travel down the same path.
My setup:
Goal:
I didn't want to store images as blobs within the persistent store (database), and I didn't want to store image paths in the database, as I didn't want to pay the overhead of creating a database connection and querying for the path. So I decided to store the path information within a NoSQL persistent store (filesystem).
The same goes for the HTML descriptions: I didn't want to create a text column on my table and store what could potentially be hundreds of lines of HTML in the database, for the same reasons as above.
All my NoSQL files relate to an object (refrigerator for example). These files contain paths to their related assets (html description and images), in what I call pointers, which point to the assets on the filesystem. I opted to use XML format for storing the data so it looks something like this:
// Path to pointer file
/home/files/app/needle/myApp/refrigerator/1/1.xml
// Example pointer
<pointer>/home/files/app/file/myApp/refrigerator/1.png</pointer>
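A minimal sketch of how such a pointer file could be written and read back, assuming Python and its standard xml.etree.ElementTree module (the language of my actual implementation doesn't matter here). The `<pointers>` wrapper element and the function names are made up for illustration; only the `<pointer>` element comes from the example above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_pointers(pointer_file, asset_paths):
    """Write one <pointer> element per asset path into an XML pointer file."""
    root = ET.Element("pointers")  # hypothetical wrapper element
    for asset in asset_paths:
        ET.SubElement(root, "pointer").text = asset
    os.makedirs(os.path.dirname(pointer_file), exist_ok=True)
    ET.ElementTree(root).write(pointer_file, encoding="utf-8", xml_declaration=True)


def read_pointers(pointer_file):
    """Return the asset paths stored in a pointer file."""
    root = ET.parse(pointer_file).getroot()
    return [el.text for el in root.iter("pointer") if el.text]


def demo():
    # Use a temporary directory instead of /home/files so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as base:
        pointer_file = os.path.join(base, "needle", "myApp", "refrigerator", "1", "1.xml")
        write_pointers(pointer_file, ["/home/files/app/file/myApp/refrigerator/1.png"])
        return read_pointers(pointer_file)
```

Reading an asset location then costs one small file read and no database connection.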
Now, within the framework I had to override the save() methods so I could save the aforementioned assets using the NoSQL API. It was pretty easy: I checked the parent calls and left the values coming into the methods untouched, so they wouldn't break any chain logic (methods calling other methods with the same arguments) that I wasn't aware of. I also made my custom NoSQL API calls throw exceptions, as the main save() call was wrapped in a try/catch block. The only thing you have to be careful of here is deciding whether a failure in your NoSQL assets is worth stopping the entire transaction. In my example, I had to figure out whether a failed image upload should stop the rest of the form fields from being saved in the database (I opted to break the transaction).
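Roughly, the save() override looks like the sketch below. It is written in Python with a stand-in BaseModel, since the real framework isn't named here; every class, method, and field name is hypothetical. The points it illustrates are the ones above: arguments are passed to the parent unchanged, and an asset failure raises an exception that the outer try/except turns into an aborted save.

```python
import os
import tempfile


class AssetStoreError(Exception):
    """Raised when an asset cannot be written to the filesystem store."""


class BaseModel:
    """Stand-in for the framework's ORM model (hypothetical)."""

    columns = ("name",)

    def __init__(self):
        self.committed = []

    def save(self, data, *args, **kwargs):
        # A real ORM would write to the database; here we only record known columns.
        self.committed.append({k: v for k, v in data.items() if k in self.columns})
        return True


class RefrigeratorModel(BaseModel):
    def __init__(self, asset_dir):
        super().__init__()
        self.asset_dir = asset_dir

    def _store_asset(self, name, content):
        if not isinstance(content, bytes):
            raise AssetStoreError(f"asset {name!r} is not bytes")
        path = os.path.join(self.asset_dir, name)
        with open(path, "wb") as fh:
            fh.write(content)
        return path

    def save(self, data, *args, **kwargs):
        # Store the asset first; a failure raises before the database is touched.
        if data.get("image") is not None:
            self._store_asset("1.png", data["image"])
        # Pass the incoming arguments through unchanged to the parent.
        return super().save(data, *args, **kwargs)


def save_form(model, data):
    """Plays the role of the framework's try/catch around the main save() call."""
    try:
        return model.save(data)
    except AssetStoreError:
        return False
```

If you would rather keep the form fields even when the upload fails, catch the exception inside the overridden save() instead of letting it propagate.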
I also had to alter the load() methods to retrieve the assets using the NoSQL API instead of the standard model logic. As with the save methods, this wasn't too hard to do either. I just had to see what the parent classes were doing and not muck with any argument values.
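The load() side can be sketched the same way. Again this is Python with a hypothetical stand-in BaseModel, and the directory layout (`<pointer_dir>/<id>/<id>.xml`) is only modelled on the example path earlier in this answer: the parent does its normal work with untouched arguments, and the override attaches the asset paths read from the pointer file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class BaseModel:
    """Stand-in for the framework's ORM model (hypothetical)."""

    def load(self, record_id, *args, **kwargs):
        # A real ORM would query the database here.
        return {"id": record_id, "name": "Fridge"}


class RefrigeratorModel(BaseModel):
    def __init__(self, pointer_dir):
        self.pointer_dir = pointer_dir

    def load(self, record_id, *args, **kwargs):
        # Let the parent load the regular fields with the arguments unchanged.
        record = super().load(record_id, *args, **kwargs)
        pointer_file = os.path.join(
            self.pointer_dir, str(record_id), f"{record_id}.xml"
        )
        record["assets"] = []
        if os.path.exists(pointer_file):
            root = ET.parse(pointer_file).getroot()
            record["assets"] = [el.text for el in root.iter("pointer") if el.text]
        return record
```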
When all was said and done, I was able to store images and HTML descriptions on the filesystem, with an XML file made up of pointers to their locations. So now I don't incur a database call every time I need an asset.
Some considerations (these may be included in other NoSQL solutions, I had to write my own):
I think I covered all the major hurdles I faced when implementing a NoSQL solution with an ORM. If you have any other questions, feel free to hit me up.
-- Edit --
Responses to comments:
As I mentioned, I didn't want to create a database connection and run a query just to get a path to an asset. I feel it's better to use a NoSQL solution for this type of information, as there is really no reason to run queries against it (images or HTML descriptions).
Developing my own NoSQL solution was more of an ego challenge. At work there was a project to implement a custom NoSQL solution (after bad experiences with MogileFS), and to be frank, it was poorly designed and poorly implemented. But rather than just point out the bad, I challenged myself to come up with a better solution, for a side project. And because of the challenge aspect, I didn't research any already available NoSQL solutions, but in hindsight I probably should have.
I still think you can implement MongoDB or any NoSQL solution relatively easily by overriding the CRUD functions in the Model layer of your ORM. In fact, not only did I implement my NoSQL solution, I also added the ability to index data into Solr (for full-text searching) during the CRUD functions as well, so anything is possible.
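To give an idea of what hooking extra work into the CRUD functions can look like, here is a small Python sketch. It does not talk to Solr: InMemoryIndexer is a toy stand-in, and the add()/remove() interface, the class names, and BaseModel are all assumptions made for the example.

```python
class BaseModel:
    """Stand-in for the framework's ORM model (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def save(self, record_id, data):
        self.rows[record_id] = dict(data)

    def delete(self, record_id):
        self.rows.pop(record_id, None)


class InMemoryIndexer:
    """Toy full-text index standing in for a real search backend."""

    def __init__(self):
        self.docs = {}

    def add(self, record_id, data):
        self.docs[record_id] = " ".join(str(v) for v in data.values()).lower()

    def remove(self, record_id):
        self.docs.pop(record_id, None)

    def search(self, term):
        return sorted(rid for rid, text in self.docs.items() if term.lower() in text)


class IndexedModel(BaseModel):
    """Model whose CRUD overrides keep a search index in step with the store."""

    def __init__(self, indexer):
        super().__init__()
        self.indexer = indexer  # any object with add() and remove()

    def save(self, record_id, data):
        super().save(record_id, data)
        self.indexer.add(record_id, data)

    def delete(self, record_id):
        super().delete(record_id)
        self.indexer.remove(record_id)
```

Swapping the toy indexer for a real client only changes what add() and remove() do; the overrides stay the same.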
What I am also trying to understand is if it wouldn't be better to drop using an ORM when switching to noSQL
It's not really beneficial to drop the ORM entirely. But you may have to rewrite it quite a bit. There are lots of little things like transactions, evented sequential writes, error handling, and data integrity checks that an ORM could handle for you in a noSQL fashion.
ORMs aren't meant to handle every possible feature, even in SQL. They just do "most" of the heavy lifting. That's why the Django ORM lets you drop down to raw SQL (for example through Manager.raw() or django.db.connection) when you need it.
It's quite an open question; there is no "correct" answer. The decision between NoSQL and SQL (ORM) depends on too many factors. Some questions I would ask:
As I told you, it's open ended. My personal suggestion would be to start modelling with the technology you know. You can always integrate new components later, if you really need them.
Of course, if your interest in NoSQL is purely academic, don't worry about the optimal scenarios. Just use it, and you'll see what's good and bad about it.
EDIT on comment (answer doesn't fit in comment's area):
@Stefano I'm afraid I don't see your point, as the usage of NoSQL in frameworks (or the usage of an ORM) depends on your needs.
It's not a question of "is it good to use this tool in this framework", as support is (often) excellent. The question should be "Do I need to use this tool, why, and what benefits does it give me?".
If the answer is "yes, I need this because A,B and/or C" then just go ahead and use it.
If the answer is "no, because A or B" or "it doesn't make a difference", then either don't use it or choose the option you are most familiar with from the ones available to you.
That is, the fact that a framework supports something doesn't make it better or worse, or mean that it should or shouldn't be used. That's why I asked my questions. In the end, it IS a question about NoSQL vs SQL, as the tools you use to integrate it (ORM, SQL, whatever) are just a channel to access the data, and that's less relevant than the storage system you pick for your problem (as the tool will be limited by the storage system by definition).