How to configure Solr for improved indexing speed

挽巷 2021-02-07 15:13

I have a client program which generates 1-50 million Solr documents and adds them to Solr.
I'm using ConcurrentUpdateSolrServer for pushing the documents from the client,

2 answers
  • 2021-02-07 15:32

    It looks like you are doing a bulk import of data into Solr, so you don't need to search any data right away.

    First, you can increase the number of documents per request. Since your documents are small, I would try 100K docs per request, or even more, and see how it performs.
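
    As a rough illustration of packing several documents into a single request, an XML update message can carry many <doc> elements inside one <add>. The field names id and title below are hypothetical; use the fields from your own schema.

    ```xml
    <!-- One update request carrying several documents.
         Field names are placeholders, not taken from the question. -->
    <add>
      <doc>
        <field name="id">doc-1</field>
        <field name="title">First document</field>
      </doc>
      <doc>
        <field name="id">doc-2</field>
        <field name="title">Second document</field>
      </doc>
      <!-- ... more <doc> elements, up to the batch size you settle on ... -->
    </add>
    ```

    With SolrJ the client builds the request for you; the message above only shows the shape of a multi-document request.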

    Second, you want to reduce the number of times commits happen when you are bulk indexing. In your solrconfig.xml look for:

    <!-- AutoCommit
    
         Perform a hard commit automatically under certain conditions.
         Instead of enabling autoCommit, consider using "commitWithin"
         when adding documents.
    
         http://wiki.apache.org/solr/UpdateXmlMessages
    
         maxDocs - Maximum number of documents to add since the last
                   commit before automatically triggering a new commit.
    
         maxTime - Maximum amount of time in ms that is allowed to pass
                   since a document was added before automatically
                   triggering a new commit.
    
         openSearcher - if false, the commit causes recent index changes
         to be flushed to stable storage, but does not cause a new
         searcher to be opened to make those changes visible.
      -->
     <autoCommit>
       <maxTime>15000</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>
    

    You can disable autoCommit altogether and then call a commit after all your documents are posted. Otherwise you can tweak the numbers as follows:

    The default maxTime is 15 seconds, so an auto commit happens every 15 seconds whenever there are uncommitted docs. Set it to something large, say 3 hours (i.e. 3*60*60*1000 = 10800000 ms). You can also add <maxDocs>50000000</maxDocs>, so that an auto commit is triggered once 50 million documents have been added; whichever limit is reached first triggers the commit. After you post all your documents, call commit once, manually or from SolrJ. That final commit will take a while, but the import will be much faster overall.
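
    Put together, the bulk-import settings described above might look like this in solrconfig.xml. This is a sketch using the example numbers from this answer (3 hours and 50 million documents), not tuned values.

    ```xml
    <autoCommit>
      <!-- 3 hours = 3*60*60*1000 ms -->
      <maxTime>10800000</maxTime>
      <!-- also commit once 50 million documents have been added -->
      <maxDocs>50000000</maxDocs>
      <!-- flush to stable storage without opening a new searcher -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    ```

    If you trigger the final commit by hand, posting the XML update message <commit/> to the update handler is one way to do it.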

    Also, once the bulk import is done, reduce maxTime and maxDocs again so that any incremental updates you post to Solr get committed much sooner. Alternatively, use commitWithin, as suggested in the solrconfig.xml comment.
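
    For the commitWithin route, the limit travels with the update itself rather than living in solrconfig.xml. A minimal sketch of an XML update message, assuming a 10 second limit and a hypothetical id field:

    ```xml
    <!-- Ask Solr to commit this document within 10000 ms of receiving it. -->
    <add commitWithin="10000">
      <doc>
        <field name="id">doc-42</field>
      </doc>
    </add>
    ```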

  • 2021-02-07 15:35

    In addition to what was written above: if you are running SolrCloud and indexing through SolrJ, consider using CloudSolrClient. This client class is ZooKeeper-aware and can send updates directly to the shard leader, which speeds up indexing in some cases.
