Solr performance with commitWithin does not make sense

五迷三道 posted on 2019-12-10 16:54:26

Question


I am running a very simple performance experiment in which I post 2000 documents to my application, which in turn persists them to a relational DB and sends them to Solr for indexing (synchronously, in the same request).

I am testing 3 use cases:

  1. No indexing at all - ~45 sec to post 2000 documents
  2. Indexing included - commit after each add. ~8 minutes (!) to post and index 2000 documents
  3. Indexing included - commitWithin 1ms. ~55 seconds (!) to post and index 2000 documents

The 3rd result does not make any sense to me; I would expect the behavior to be similar to that in point 2. At first I thought that the documents were not really committed, but I could actually see them being added by running some queries during the experiment (via the Solr web UI).

I am worried that I am missing something very big. Is it possible that committing after each add will degrade performance by a factor of 400?!

The code I use for point 2:

SolrInputDocument doc = // get doc
SolrServer solrConnection = // get connection 
solrConnection.add(doc);
solrConnection.commit(); 

Whereas the code for point 3 is:

SolrInputDocument doc = // get doc
SolrServer solrConnection = // get connection
solrConnection.add(doc, 1); // According to API documentation I understand there is no need to call an explicit commit after this

Answer 1:


According to this wiki:

https://wiki.apache.org/solr/NearRealtimeSearch

commitWithin is a soft commit by default. Soft commits are very efficient at making the added documents immediately searchable, but the documents are not on disk yet: they are committed into RAM only. In this setup you would use the updateLog to make the Solr instance crash tolerant.

What you do in point 2 is a hard commit, i.e. flushing the added documents to disk. Doing this after every document add is very expensive. Instead, post a batch of documents and then issue a single hard commit, or set your autoCommit to some reasonable value, such as 10 minutes or 1 hour (depending on your users' expectations).
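
To illustrate the "post a bunch of documents, then hard commit once" idea, here is a minimal sketch. It is not taken from the question's code base: the class name BatchingIndexer, the batch size, and the two callbacks are all hypothetical. The callbacks stand in for the SolrJ calls, so the sketch compiles without Solr on the classpath. In a real application sendBatch would presumably wrap solrConnection.add(batch) and hardCommit would wrap solrConnection.commit(), each with the checked exceptions those calls throw handled inside the callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: buffers documents and hands them on in batches,
 * then issues a single hard commit at the end instead of one per document.
 */
class BatchingIndexer<D> {
    private final int batchSize;
    // Assumption: in real code this would wrap solrConnection.add(batch).
    private final Consumer<List<D>> sendBatch;
    // Assumption: in real code this would wrap solrConnection.commit().
    private final Runnable hardCommit;
    private final List<D> pending = new ArrayList<>();

    BatchingIndexer(int batchSize, Consumer<List<D>> sendBatch, Runnable hardCommit) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sendBatch = sendBatch;
        this.hardCommit = hardCommit;
    }

    /** Buffers one document; sends the buffer once it reaches batchSize. */
    void add(D doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any remaining documents, then commits exactly once. */
    void finish() {
        flush();
        hardCommit.run();
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Pass a copy so the callback may keep the list after we clear ours.
        sendBatch.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With a batch size of 500, adding the 2000 documents from the experiment and then calling finish() would result in 4 add calls and 1 hard commit, rather than 2000 of each. Whether batching fits depends on the application: in the question each document arrives in its own request, so commitWithin or autoCommit may be the more natural choice there.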



Source: https://stackoverflow.com/questions/21729785/solr-performance-with-commitwithin-does-not-make-sense
