Optimizing Solr for Sorting

一生所求 2021-02-04 18:36

I'm using Solr for a realtime search index. My dataset is about 60 million large documents. Instead of sorting by relevance, I need to sort by time. Currently I'm using the sort flag.
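For reference, the kind of request in question looks roughly like this (the collection name and the created_at field are placeholders, not taken from the original post):

    http://localhost:8983/solr/mycollection/select?q=<your query terms>&sort=created_at desc&rows=20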

3 Answers
  • 2021-02-04 19:21

    I found the answer.

    If you want to sort by time rather than relevance, put all of your filters in fq= instead of q=. That way Solr doesn't waste time computing relevance scores for the documents matching q=. It turned out that Solr was spending most of its time scoring, not sorting.
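    As a concrete illustration (the field names and filter values below are invented for the example), the difference is roughly:

        Slow (every matching document is scored before the sort):
        q=type:article AND source:blog&sort=created_at desc&rows=20

        Faster (filters only, nothing is scored):
        q=*:*&fq=type:article&fq=source:blog&sort=created_at desc&rows=20

    A further benefit is that each fq= clause is cached in Solr's filterCache, so repeated filters are essentially free on subsequent queries.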

    Additionally, you can speed sorting up by pre-warming your sort fields with the newSearcher and firstSearcher event listeners in solrconfig.xml. This ensures the sort field values are already loaded into the cache before real queries hit a new searcher.
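    For example, a warming entry of this shape goes inside the <query> section of solrconfig.xml (created_at stands in for whatever your sort field is called):

        <listener event="newSearcher" class="solr.QuerySenderListener">
          <arr name="queries">
            <lst>
              <!-- match everything, sort on the date field, return no rows:
                   this loads the sort field into the cache -->
              <str name="q">*:*</str>
              <str name="sort">created_at desc</str>
              <str name="rows">0</str>
            </lst>
          </arr>
        </listener>

    An identical <listener event="firstSearcher"> block covers the very first searcher after startup.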

  • 2021-02-04 19:34

    Obvious first question: what is the type of your time field? If it's a string, then sorting is obviously very slow. tdate is even faster than date.
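    In schema.xml terms, the difference looks something like this (created_at is a placeholder name; the tdate type shown is the pre-Solr-7 Trie field, newer releases use DatePointField/pdate instead):

        <!-- slow to sort: the timestamp indexed as a plain string -->
        <field name="created_at" type="string" indexed="true" stored="true"/>

        <!-- fast to sort: a trie-encoded date field -->
        <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
        <field name="created_at" type="tdate" indexed="true" stored="true"/>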

    Another point: do you have enough memory for Solr? If it starts swapping, then performance is immediately awful.

    And a third one: on older Lucene/Solr versions, a date is effectively indexed as a string, which is very slow to sort on.

  • 2021-02-04 19:38

    Warning: Wild suggestion, not based on prior experience or known facts. :)

    1. Perform a query with rows=0 and no sorting to get the number of matches. Disable faceting etc. to improve performance - we only need the total number of matches.
    2. Based on the number of matches from Step #1, the distribution of your data, and the count/offset of the results that you need, fire another query which sorts by date and also adds a filter on the date, like fq=date:[NOW-xDAYS TO *], where x is the estimated time period in days during which we will find the required number of matching documents.
    3. If the number of results from Step #2 is less than what you need, then relax the filter a bit and fire another query.

    For starters, you can use the following to estimate x:

    If you are uniformly adding n documents a day to an index of N documents, and a specific query matched d documents in Step #1, then roughly n*d/N matching documents arrive per day, so to get the top r results you can use x = (N*r*1.2)/(d*n), where 1.2 is a safety margin. If you have to relax your filter too often in Step #3, slowly increase the 1.2 factor as required.
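    To make the estimate concrete with invented numbers: if the index holds N = 60 million documents, you add n = 10,000 documents a day, Step #1 reports d = 60,000 matches, and you need the top r = 100 results, then x = (60,000,000 * 100 * 1.2) / (60,000 * 10,000) = 12, so the query in Step #2 would use fq=date:[NOW-12DAYS TO *].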
