We have a Solr instance with 86,315,770 documents. It's using up to 4 GB of memory, and we need it for faceting on a tokenized field called content. The index size on disk is 23 GB.
Since Solr computes facets on in-memory data structures, facet computation is likely to be CPU-bound. The code to compute facets is already highly optimised (the getCounts method in UnInvertedField for a multi-valued field).
One idea would be to parallelize the computation. The easiest way to do this might be to split your collection into several shards, as described in Do multiple Solr shards on a single machine improve performance?
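To see why sharding helps, here is a minimal sketch of the idea in Python: partition the documents, count facet values in each partition in parallel, then merge the partial counts, which is conceptually what Solr's distributed faceting does across shards. The documents and field values below are purely illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def facet_counts(partition):
    """Count facet values in one partition (one 'shard')."""
    counts = Counter()
    for doc in partition:
        counts.update(doc["content"])
    return counts

# Two illustrative "shards", each holding a slice of the collection.
shards = [
    [{"content": ["solr", "facet"]}, {"content": ["solr"]}],
    [{"content": ["facet", "index"]}, {"content": ["solr", "index"]}],
]

# Count each shard in parallel, then merge the partial counters.
with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    partials = pool.map(facet_counts, shards)

merged = sum(partials, Counter())
print(merged)  # Counter({'solr': 3, 'facet': 2, 'index': 2})
```

In Solr itself you would get the same merge for free by sending a facet query with the shards parameter listing each core.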
Otherwise, if your term dictionary is small enough and queries can only take a limited number of forms, you could set up a separate system that maintains the count matrix for every (term, query) pair. For example, if you only allow term queries, this means maintaining the counts for every pair of terms. Beware that this could require a lot of disk space, depending on the total number of terms and queries. If you don't need the counts to be exact, the easiest approach would be to compute them in a batch process. Otherwise, it might be possible, but a little tricky, to keep the counts in sync with Solr.
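As a sketch of the batch approach for term queries: the (term, term) count matrix is just, for every pair of terms, the number of documents containing both. The corpus and function name below are illustrative assumptions.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_counts(docs):
    """Batch-compute, for every pair of terms, how many documents
    contain both -- i.e. the facet count one term would get under a
    query on the other. Pairs are stored with terms in sorted order."""
    counts = Counter()
    for terms in docs:
        # set() deduplicates terms within a document; sorted() gives
        # each pair a canonical key.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts

# Tiny illustrative corpus; each inner list is one document's terms.
docs = [
    ["solr", "facet", "index"],
    ["solr", "facet"],
    ["solr", "index"],
]
counts = cooccurrence_counts(docs)
print(counts[("facet", "solr")])  # 2 documents contain both terms
```

The memory and disk cost is quadratic in the number of distinct terms, which is why this only works with a small term dictionary.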
You could use the topTerms feature of the LukeRequestHandler.
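For instance, the Luke handler can return the most frequent terms of a field directly. A minimal sketch of building such a request, where the host, port, and path are assumptions for illustration (the handler is commonly mounted at /admin/luke):

```python
from urllib.parse import urlencode

# Parameters for a topTerms request against the LukeRequestHandler.
params = {
    "fl": "content",  # field whose top terms we want
    "numTerms": 10,   # how many topTerms entries to return
    "wt": "json",     # response format
}
luke_url = "http://localhost:8983/solr/admin/luke?" + urlencode(params)
print(luke_url)
```

The JSON response contains a topTerms list for the requested field, which gives you the highest-frequency terms without running a facet query.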