Including documents in the emit compared to include_docs = true in CouchDB

后悔当初 2020-12-31 03:07

I ran across a mention somewhere that doing an emit(key, doc) will increase the amount of time an index takes to build (or something to that effect).

Is

2 Answers
  • 2020-12-31 03:39

    Yes, it will increase the size of your index, because in that case CouchDB effectively copies the entire document into the index. Where you can, use include_docs=true instead.
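
    A minimal sketch of the two approaches. It assumes a hypothetical database `mydb`, a hypothetical design document `_design/app`, and a hypothetical `type` field on each document. The map functions are JavaScript, as CouchDB expects, stored as strings in a Python dict. `view_url` is a hypothetical helper that JSON-encodes each query parameter (`key`, `include_docs`), which is how CouchDB view queries expect those values.

```python
import json
from urllib.parse import urlencode

# Hypothetical design document with two views over the same documents.
# CouchDB map functions are JavaScript, so they are stored here as source strings.
DESIGN_DOC = {
    "_id": "_design/app",
    "views": {
        # Copies the whole document into the index: bigger index, no per-row lookup.
        "by_type_full": {
            "map": "function (doc) { if (doc.type) { emit(doc.type, doc); } }"
        },
        # Emits only the key; documents are fetched at query time via include_docs=true.
        "by_type_ids": {
            "map": "function (doc) { if (doc.type) { emit(doc.type, null); } }"
        },
    },
}


def view_url(db: str, ddoc: str, view: str, **params) -> str:
    """Build a view query path; each parameter value is JSON-encoded (e.g. True -> "true")."""
    base = f"/{db}/_design/{ddoc}/_view/{view}"
    query = urlencode({name: json.dumps(value) for name, value in params.items()})
    return f"{base}?{query}" if query else base


# Full documents come straight from the index:
print(view_url("mydb", "app", "by_type_full", key="post"))
# prints /mydb/_design/app/_view/by_type_full?key=%22post%22

# Only keys and ids in the index; CouchDB reads each document at query time:
print(view_url("mydb", "app", "by_type_ids", key="post", include_docs=True))
# prints /mydb/_design/app/_view/by_type_ids?key=%22post%22&include_docs=true
```

    Both views return the same rows for the same key. They differ only in where the document body comes from when the view is read.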

    There is, however, a race condition to be aware of with this approach, documented in the wiki under "Querying Options": in the time between reading the view data and fetching the document, that document may have changed (or been deleted, in which case _deleted will be true).
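
    The race can be reproduced with a toy in-memory model. This is not CouchDB itself. The view rows are read first, then another writer updates the document, and only after that is each document fetched. The check in `is_stale` is an assumption added for illustration: if the view emits `doc._rev` as its value, a client can compare it with the fetched document's `_rev` to notice the mismatch. All names and data here are hypothetical.

```python
from typing import Callable, Optional


def query_with_include_docs(index_rows: list, store: dict,
                            between: Callable[[], None] = lambda: None) -> list:
    """Toy model of include_docs=true: read rows from the index, then fetch each doc separately."""
    rows = [dict(row) for row in index_rows]  # step 1: rows as stored in the view index
    between()                                   # a concurrent write can happen here
    for row in rows:
        row["doc"] = store.get(row["id"])       # step 2: look up the *current* document
    return rows


def is_stale(row: dict) -> bool:
    """True if the fetched doc is missing or its _rev differs from the _rev emitted as the row value."""
    doc: Optional[dict] = row["doc"]
    return doc is None or doc.get("_rev") != row["value"]


# Hypothetical data: the view is assumed to emit(doc.title, doc._rev).
docs_db = {"a": {"_id": "a", "_rev": "1-abc", "title": "old"}}
view_rows = [{"id": "a", "key": "old", "value": "1-abc"}]


def concurrent_update() -> None:
    docs_db["a"] = {"_id": "a", "_rev": "2-def", "title": "new"}


rows = query_with_include_docs(view_rows, docs_db, concurrent_update)
print(rows[0]["key"], rows[0]["doc"]["title"], is_stale(rows[0]))  # old new True
```

    The row's key still reflects the old title, while the attached document is the new version. This is the inconsistency the wiki warns about.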

  • 2020-12-31 03:44

    This is a classic time/space tradeoff.

    Emitting document data into your index will increase the size of the index file on disk, because CouchDB stores the emitted data directly in the index file. However, this means that when you query your data, CouchDB can stream the content straight from the index file on disk, which is fast.

    Relying on include_docs=true instead will decrease the size of your on-disk index, it's true. When querying, however, CouchDB must perform a separate document read for every returned row. These are essentially random lookups in the main data file, so the cost and time of returning data increase significantly.
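
    A toy in-memory model of this tradeoff. It is a sketch only and does not reflect CouchDB's actual B-tree storage. `build_index` stores either the whole document or `None` as each row's value. `index_size` approximates on-disk size by the length of each row serialized as JSON. `query` counts how many separate document reads an include_docs-style query performs. The `type` field and the sample documents are hypothetical.

```python
import json


def build_index(docs: list, emit_doc: bool) -> list:
    """Rows of (key, doc_id, value) sorted by key then id; value is the full doc or None."""
    rows = [(d["type"], d["_id"], d if emit_doc else None) for d in docs]
    return sorted(rows, key=lambda row: (row[0], row[1]))


def index_size(rows: list) -> int:
    """Rough stand-in for on-disk size: total length of each row serialized as JSON."""
    return sum(len(json.dumps(row)) for row in rows)


def query(rows: list, key: str, store: dict, include_docs: bool):
    """Return (docs, lookups); lookups counts the extra per-row reads include_docs needs."""
    docs, lookups = [], 0
    for row_key, doc_id, value in rows:
        if row_key != key:
            continue
        if include_docs:
            lookups += 1                 # separate read from the main data file
            docs.append(store[doc_id])
        else:
            docs.append(value)           # streamed straight from the index
    return docs, lookups


store = {f"doc{i}": {"_id": f"doc{i}", "type": "post", "body": "x" * 200} for i in range(3)}
full = build_index(list(store.values()), emit_doc=True)
slim = build_index(list(store.values()), emit_doc=False)

print(index_size(full) > index_size(slim))                # True: emitted docs enlarge the index
print(query(full, "post", store, include_docs=False)[1])  # 0 extra reads
print(query(slim, "post", store, include_docs=True)[1])   # 3 extra reads, one per row
```

    Both queries return the same documents. One approach pays for them in index size, the other in per-row reads at query time.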

    While the query-time difference is small for a small number of documents, it adds up over every call the application makes. For me, therefore, emitting the fields I need from a document into the index is usually the right call: disk is cheap, users' attention spans less so. This is broadly similar to using covering indexes in a relational database, another widely echoed piece of advice.
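
    A sketch of emitting only the needed fields, in the spirit of a covering index. It assumes hypothetical `type`, `date`, `title`, and `author` fields. The JavaScript map function is shown as a string. `emulate_map` is a hypothetical Python stand-in for the rows that function would put in the index. It skips absent fields, because JavaScript `undefined` properties are dropped when the value is serialized to JSON.

```python
import json

# Hypothetical map function: key is doc.date, value holds only the fields a list page needs.
COVERING_MAP = """
function (doc) {
  if (doc.type === 'post') {
    emit(doc.date, {title: doc.title, author: doc.author});
  }
}
"""

NEEDED_FIELDS = ("title", "author")


def emulate_map(doc: dict) -> list:
    """Python stand-in for COVERING_MAP: the (key, value) rows it would emit for one doc."""
    if doc.get("type") != "post":
        return []
    return [(doc.get("date"), {f: doc[f] for f in NEEDED_FIELDS if f in doc})]


doc = {"_id": "p1", "type": "post", "date": "2020-12-31",
       "title": "Hello", "author": "ann", "body": "x" * 3000}
rows = emulate_map(doc)
print(rows)                                         # [('2020-12-31', {'title': 'Hello', 'author': 'ann'})]
print(len(json.dumps(rows)) < len(json.dumps(doc)))  # True: the large body stays out of the index
```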

    I did a totally unscientific test on this to get a feel for the difference. Reading 100,000 documents from a view with include_docs=true took about 8x as long and used about 50% more CPU than reading them from a view where the documents were emitted directly into the index.
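
    A minimal timing harness for running a similar comparison against your own data. It is a sketch, not the answerer's actual test. The base URL and the database, design document, and view names are hypothetical, and the harness assumes the database is readable without credentials. `make_fetch` uses only the standard library's `urllib.request`. `median_seconds` accepts any zero-argument callable, so it can also be exercised without a server. It measures wall-clock time only, not CPU.

```python
import statistics
import time
import urllib.request
from typing import Callable


def make_fetch(url: str) -> Callable[[], bytes]:
    """Return a callable that GETs one view URL and reads the whole response body."""
    def fetch() -> bytes:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    return fetch


def median_seconds(fetch: Callable[[], object], repeats: int = 5) -> float:
    """Call fetch() `repeats` times and return the median wall-clock duration in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical URLs: the same query against a view with emitted docs vs include_docs=true.
BASE = "http://localhost:5984/mydb/_design/app/_view"
# emitted = median_seconds(make_fetch(f"{BASE}/by_type_full?limit=100000"))
# joined = median_seconds(make_fetch(f"{BASE}/by_type_ids?limit=100000&include_docs=true"))
# print(f"include_docs / emitted time ratio: {joined / emitted:.1f}")
```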
