Question
I am developing a cloud platform and I want to store documents and video files.
My first idea was to use MongoDB for the simple documents and Cassandra for the large video files. But I have read that with Cassandra I will have a problem if a file is larger than 64 MB.
On the other hand, MongoDB has GridFS, which allows files larger than 100 MB.
I have connected MongoDB with Java. At the start the database was 80 MB. When I inserted a 1.80 GB video file into MongoDB with GridFS, I expected the database to take up something like 1.9 GB of disk space, but it took up 6 GB, about three times the size of the file.
Here is the code I used:
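import java.io.File;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSInputFile;

MongoClient mongo = new MongoClient("localhost", 27017);
DB db = mongo.getDB("testdb");
DBCollection table = db.getCollection("user"); // not used by the GridFS upload below
String newFileName = "Video";
File videoFile = new File("e:\\Magnificent.mp4");
// The "video" bucket is backed by the video.files and video.chunks collections
GridFS gfsText = new GridFS(db, "video");
GridFSInputFile gfsFile = gfsText.createFile(videoFile);
gfsFile.setFilename(newFileName);
gfsFile.save();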
Also, after running the application and saving the file into MongoDB, the following files were created in the DB folder. Please notice the size of the files:
First question: why does it take up so much disk space?
Second question: is there any chance of using Cassandra to store video files of 500 MB to 1 GB?
Thanks for any advice.
Answer 1:
Ad 1: It's because the files are split into chunks and two collections are created, one for the binary parts (the chunks) and one for the metadata. You can read much more in the MongoDB documentation about GridFS and about how the storage is actually maintained.
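To get a feel for the chunking, here is a rough sketch of how many chunk documents a file turns into. It assumes a chunk size of 255 KB, which I believe is the default in recent drivers (older ones used 256 KB); the class and method names are made up for illustration and are not part of the MongoDB driver.

```java
/** Hypothetical helper, not part of any MongoDB API. */
public class GridFsChunkMath {
    // Assumed default chunk size: 255 KB. Check your driver version.
    public static final long ASSUMED_CHUNK_SIZE = 255L * 1024L;

    /** Number of chunk documents needed for a file: ceil(fileSize / chunkSize). */
    public static long chunkCount(long fileSizeBytes, long chunkSizeBytes) {
        if (fileSizeBytes < 0 || chunkSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    public static void main(String[] args) {
        // Roughly 1.80 GB, the file size mentioned in the question.
        long fileSize = 1_932_735_283L;
        System.out.println("chunk documents: " + chunkCount(fileSize, ASSUMED_CHUNK_SIZE));
    }
}
```

Each of those chunks is one document in the chunks collection, plus a single document in the files collection that holds the metadata.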
Ad 2: Yes, there is a way to store a file that is well above 64 MB in Cassandra. You can easily build a mechanism that splits the files into chunks and keeps them as separate entries. That will work in exactly the same way as the MongoDB GridFS implementation. And you will not be the first one: DataStax, a company that claims to be "behind" Cassandra, has already implemented this in its commercial stack, DataStax Enterprise. You can read about Cassandra File System Design here and some documentation here. Overall, if you decide to build your own solution, it should be fairly simple and straightforward. All you need to do is split the files and put the content in more than one record.
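A minimal sketch of that split-and-reassemble idea, just to show the mechanics. It does not talk to Cassandra at all: the in-memory TreeMap is a stand-in for a table, and ChunkedStore, put and get are hypothetical names. In a real table the file id could serve as the partition key and the chunk index as a clustering column, but that schema is an assumption on my part.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical chunk store; the maps stand in for a Cassandra table. */
public class ChunkedStore {
    private final int chunkSize;
    // fileId -> (chunk index -> chunk bytes), one entry per stored record
    private final Map<String, TreeMap<Integer, byte[]>> rows = new TreeMap<>();

    public ChunkedStore(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    /** Splits the stream into chunkSize pieces, one record each; returns the chunk count. */
    public int put(String fileId, InputStream in) throws IOException {
        TreeMap<Integer, byte[]> chunks = new TreeMap<>();
        byte[] buffer = new byte[chunkSize];
        int index = 0;
        int filled;
        while ((filled = fill(in, buffer)) > 0) {
            chunks.put(index++, Arrays.copyOf(buffer, filled));
        }
        rows.put(fileId, chunks);
        return chunks.size();
    }

    /** Reads until the buffer is full or the stream ends; returns the bytes read. */
    private static int fill(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }

    /** Concatenates the chunks in index order to rebuild the original content. */
    public byte[] get(String fileId) throws IOException {
        TreeMap<Integer, byte[]> chunks = rows.get(fileId);
        if (chunks == null) throw new IOException("unknown file: " + fileId);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks.values()) {
            out.write(chunk);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[2500];
        for (int i = 0; i < original.length; i++) original[i] = (byte) i;
        ChunkedStore store = new ChunkedStore(1024);
        int count = store.put("video-1", new ByteArrayInputStream(original));
        System.out.println(count + " chunks, round trip ok: "
                + Arrays.equals(original, store.get("video-1")));
    }
}
```

Reading through a fixed-size buffer means the whole file never has to sit in memory on the write path, which matters for files in the 500 MB to 1 GB range. For the read path you would probably stream the chunks out one by one instead of collecting them into a single byte array as this sketch does.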
On another note, the philosophical question is "why". Why would you want to use a database system to store such big files? There are many better ways to handle that, including distributed and replicated file/storage systems such as Amazon S3 or similar implementations, which will make your life much easier on many levels. Consider that as well, as a good replacement for BLOBs.
Source: https://stackoverflow.com/questions/23713038/nosql-mongodb-vs-cassandra