Is it a good idea to store hundreds of millions small images to a key/value store or other nosql database?

天涯浪子 提交于 2019-12-24 05:05:39

问题


I am developing a web system to handle a very large set of small images, about 100 millions images of 50kb ~ 200kb, working on ReiserFS.

For now, it is very difficult to backup and sync those large number of small files.

My question is that if it a good idea to store these small images to a key/value store or other nosql database such as GridFS (Mongodb), Tokyo Tyrant, Voldemort to gain more performance and bring better backup support?


回答1:


First off, have a look at this: Storing a millon images in the filesystem. While it isn't about backups, it is a worthwile discussion of the topic at hand.

And yes, large numbers of small files are pesky; They take up inodes, require space for filenames &c. (And it takes time to do backup of all this meta-data). Basically it sounds like you got the serving of the files figured out; if you run it on nginx, with a varnish in front or such, you can hardly make it any faster. Adding a database under that will only make things more complicated; also when it comes to backing up. Alas, I would suggest working harder on a in-place FS backup strategy.

First off, have you tried rsync with the -az-switches (archive and compression, respectively)? They tend to be highly effective, as it doesn't transfer the same files again and again.

Alternately, my suggestion would be to tar + gz into a number of files. In pseudo-code (and assuming you got them in different sub-folders):

foreach prefix (`ls -1`):
    tar -c $prefix | gzip -c -9 | ssh -z destination.example.tld "cat > backup_`date --iso`_$prefix.tar.gz"
end

This will create a number of .tar.gz-files that are easily transferred without too much overhead.




回答2:


Another alternative is to store the images in SVN and actually have the image folder on the web server be an svn sandbox of the images. That simplifies backup, but will have zero net effect on performance.

Of course, make sure you configure your web server to not serve the .svn files.




回答3:


If all your images, or at least the ones most accessed, fit into memory, then mongodb GridFS might outperform the raw file system. You have to experiment to find out.

Of course, depending on your file-system, breaking up the images into folders or not would affect images. In the past I noticed that ReiserFS is better for storing large numbers of files in a single directory. However, I don't know if thats still the best file system for the job.



来源:https://stackoverflow.com/questions/4164024/is-it-a-good-idea-to-store-hundreds-of-millions-small-images-to-a-key-value-stor

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!