Optimizing Solr for Sorting

一生所求 2021-02-04 18:36

I'm using Solr for a realtime search index. My dataset is about 60 million large documents. Instead of sorting by relevance, I need to sort by time. Currently I'm using the sort flag.
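For reference, the kind of request in question looks roughly like this (the collection name and the created_at field are placeholders, not taken from the original post):

    http://localhost:8983/solr/mycollection/select?q=<your query terms>&sort=created_at desc&rows=20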

3 Answers
  • 2021-02-04 19:21

    I found the answer.

    If you want to sort by time rather than relevance, put all of your filters in fq= instead of q=. That way Solr doesn't waste time computing relevance scores for the documents matching q=. It turned out that Solr was spending most of its time scoring, not sorting.
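    As a concrete illustration (the field names and filter values below are invented for the example), the difference is roughly:

        Slow (every matching document is scored before the sort):
        q=type:article AND source:blog&sort=created_at desc&rows=20

        Faster (filters only, nothing is scored):
        q=*:*&fq=type:article&fq=source:blog&sort=created_at desc&rows=20

    A further benefit is that each fq= clause is cached in Solr's filterCache, so repeated filters are essentially free on subsequent queries.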

    Additionally, you can speed sorting up by pre-warming your sort fields with the newSearcher and firstSearcher event listeners in solrconfig.xml. This ensures the sort field values are already loaded into the cache before real queries hit a new searcher.
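    For example, a warming entry of this shape goes inside the <query> section of solrconfig.xml (created_at stands in for whatever your sort field is called):

        <listener event="newSearcher" class="solr.QuerySenderListener">
          <arr name="queries">
            <lst>
              <!-- match everything, sort on the date field, return no rows:
                   this loads the sort field into the cache -->
              <str name="q">*:*</str>
              <str name="sort">created_at desc</str>
              <str name="rows">0</str>
            </lst>
          </arr>
        </listener>

    An identical <listener event="firstSearcher"> block covers the very first searcher after startup.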

  • 2021-02-04 19:34

    Obvious first question: what is the type of your time field? If it's a string, then sorting is obviously very slow. tdate is even faster than date.
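    In schema.xml terms, the difference looks something like this (created_at is a placeholder name; the tdate type shown is the pre-Solr-7 Trie field, newer releases use DatePointField/pdate instead):

        <!-- slow to sort: the timestamp indexed as a plain string -->
        <field name="created_at" type="string" indexed="true" stored="true"/>

        <!-- fast to sort: a trie-encoded date field -->
        <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
        <field name="created_at" type="tdate" indexed="true" stored="true"/>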

    Another point: do you have enough memory for Solr? If it starts swapping, then performance is immediately awful.

    And a third one: on older Lucene/Solr versions, a date is effectively indexed as a string, which is very slow to sort on.

  • 2021-02-04 19:38

    Warning: Wild suggestion, not based on prior experience or known facts. :)

    1. Perform a query with rows=0 and no sorting to get the number of matches. Disable faceting etc. to improve performance - we only need the total number of matches.
    2. Based on the number of matches from Step #1, the distribution of your data, and the count/offset of the results that you need, fire another query which sorts by date and also adds a filter on the date, like fq=date:[NOW-xDAYS TO *], where x is the estimated time period in days during which we will find the required number of matching documents.
    3. If the number of results from Step #2 is less than what you need, then relax the filter a bit and fire another query.

    For starters, you can use the following to estimate x:

    If you are uniformly adding n documents a day to an index of N documents, and a specific query matched d documents in Step #1, then roughly n*d/N matching documents arrive per day, so to get the top r results you can use x = (N*r*1.2)/(d*n), where 1.2 is a safety margin. If you have to relax your filter too often in Step #3, slowly increase the 1.2 factor as required.
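    To make the estimate concrete with invented numbers: if the index holds N = 60 million documents, you add n = 10,000 documents a day, Step #1 reports d = 60,000 matches, and you need the top r = 100 results, then x = (60,000,000 * 100 * 1.2) / (60,000 * 10,000) = 12, so the query in Step #2 would use fq=date:[NOW-12DAYS TO *].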
