CouchDB .view file growing out of control?

后端 未结 4 866
天命终不由人
天命终不由人 2021-02-02 01:51

I recently encountered a situation where my CouchDB instance used all available disk space on a 20GB VM instance. Upon investigation I discovered that a directory in /usr/local/

相关标签:
4条回答
  • 2021-02-02 02:08

    That your .view files grow, each time you access a view is because CouchDB updates views on access. CouchDB views need compaction like databases too. If you have frequent changes to your documents, resulting in changes in your view, you should run view compaction from time to time. See http://wiki.apache.org/couchdb/HTTP_view_API#View_Compaction

    To reduce the size of your views, have a look at the data, you are emitting. When you emit(foo, doc) the entire document is copied to the view to it is very instantly available when you query the view. the function(doc) { emit(doc.title, doc); } will result in a view as big as the database itself. You could also emit(doc.title, nil); and use the include_docs option to let CouchDB fetch the document from the database when you access the view (which will result in a slightly performance penalty). See http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options

    0 讨论(0)
  • 2021-02-02 02:09

    Use sequential or monotonic id's for documents instead of random

    Yes, couchdb is very disk hungry, and it needs regular compactions. But there is another thing that can help reducing this disk usage, specially sometimes when it's unnecessary.

    Couchdb uses B+ trees for storing data/documents which is very good data structure for performance of data retrieval. However use of B-tree trades in performance for disk space usage. With completely random Id, B+-tree fans out quickly. As the minimum fill rate is 1/2 for every internal node, the nodes are mostly filled up to the 1/2 (as the data spreads evenly due to its randomness) generating more internal nodes. Also new insertions can cause a rewrite of full tree. That's what randomness can cause ;)

    Instead, use of sequential or monotonic ids can avoid all.

    0 讨论(0)
  • 2021-02-02 02:16

    CouchDB is very disk hungry, trading disk space for performance. Views will increase in size as items are added to them. You can recover disk space that is no longer needed with cleanup and compaction.

    Every time you create update or delete a document then the view indexes will be updated with the relevant changes to the documents. The update to the view will happen when it is queried. So if you are making lots of document changes then you should expect your index to grow and will need to be managed with compaction and cleanup.

    If your views are very large for a given set of documents then you may have poorly designed views. Alternatively your design may just require large views and you will need to manage that as you would any other resource.

    It would be easier to tell what is happening if you could describe what document updates (inc create and delete) are happening and what your view functions are emitting, especially for the large view.

    0 讨论(0)
  • 2021-02-02 02:20

    I've had this problem too, trying out CouchDB for a browsed-based game.

    We had about 100.000 unexpected visitors on the first day of a site launch, and within 2 days the CouchDB database was taking about 40GB in space. This made the server crash because the HD was completely full.

    Compaction brought that back to about 50MB. I also set the _revs_limit (which defaults to 1000) to 10 since we didn't care about revision history, and it's running perfectly since. After almost 1M users, the database size is usually about 2-3GB. When i run compaction it's about 500MB.

    Setting document revision limit to 10:
    curl -X PUT -d "10" http://dbuser:dbpassword@127.0.0.1:5984/yourdb/_revs_limit

    Or without user:password (not recommended):
    curl -X PUT -d "10" http://127.0.0.1:5984/yourdb/_revs_limit

    0 讨论(0)
提交回复
热议问题