I'm using Solr for a realtime search index. My dataset is about 60M large documents. Instead of sorting by relevance, I need to sort by time. Currently I'm using the sort flag
I found the answer.
If you want to sort by time, and not relevance, use fq= instead of q= for all of your filters. This way, Solr doesn't waste time figuring out the weighted value of the documents matching q=. It turns out that Solr was spending too much time weighting, not sorting.
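As a minimal sketch of that idea, the request parameters could be assembled like this. The filter strings, the helper name, the match-all q=*:* and the descending sort direction are assumptions for illustration; the field name date is borrowed from the filter example elsewhere in this thread.

```python
from urllib.parse import urlencode


def build_time_sorted_params(filters, time_field="date", rows=10):
    """Build Solr request parameters that filter via fq and sort by time.

    Hypothetical helper: every restriction goes into fq, so Solr only has
    to decide whether a document matches, not how well it scores.
    """
    params = [("q", "*:*")]  # match everything; no relevance scoring needed
    params += [("fq", f) for f in filters]  # one fq parameter per filter
    params += [("sort", f"{time_field} desc"), ("rows", str(rows))]
    return params


# Example filters are made up; substitute your own field names.
params = build_time_sorted_params(["type:article", "lang:en"])
query_string = urlencode(params)  # append to your core's /select URL
```

Keeping each restriction in its own fq also lets Solr cache the filters independently, so repeated filters can be reused across requests.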
Additionally, you can speed sorting up by pre-warming your sort fields in the newSearcher and firstSearcher event listeners in solrconfig.xml. This will ensure that sorts are done via cache.
Obvious first question: what's the type of your time field? If it's a string, then sorting is obviously very slow. tdate is even faster than date.
Another point: do you have enough memory for Solr? If it starts swapping, then performance is immediately awful.
And third one: if you have an older Lucene version, then date is just a string, which is very slow.
Warning: Wild suggestion, not based on prior experience or known facts. :)
fq=date:[NOW-xDAYS TO *]
where x is the estimated time period in days during which we will find the required number of matching documents. For starters, you can use the following to estimate x:
If you are uniformly adding n documents a day to the index of size N documents and a specific query matched d documents in Step #1, then to get the top r results you can use x = (N*r*1.2)/(d*n). If you have to relax your filter too often in Step #3, then slowly increase the value 1.2 in the formula as required.
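The estimate can be sketched in a few lines. The function and parameter names are hypothetical, the numbers in the example are made up, and rounding up to at least one whole day is an added assumption; the filter string uses Solr date math syntax (for example NOW-5DAYS).

```python
import math


def estimate_window_days(index_size, docs_per_day, matched, wanted, slack=1.2):
    """Estimate x = (N * r * slack) / (d * n) from the formula above.

    index_size = N, docs_per_day = n, matched = d, wanted = r.
    slack is the 1.2 safety factor; raise it if the window is too narrow.
    """
    if matched <= 0 or docs_per_day <= 0:
        raise ValueError("matched and docs_per_day must be positive")
    return (index_size * wanted * slack) / (matched * docs_per_day)


def window_filter(days, time_field="date"):
    """Turn the estimate into an fq value, rounded up to whole days."""
    whole_days = max(1, math.ceil(days))  # assumption: never below one day
    return f"{time_field}:[NOW-{whole_days}DAYS TO *]"


# Made-up example: 60M documents, 50k added per day, the query matched
# 30k documents overall, and we want the newest 100 of them.
x = estimate_window_days(60_000_000, 50_000, 30_000, 100)
fq = window_filter(x)
```

The reasoning behind the formula: a query matching d of N documents matches roughly n*d/N new documents per day, so r results take about r*N/(d*n) days, padded by the slack factor.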