How does Lucene/Solr achieve high performance in multi-field / faceted search?


Context

This is a question mainly about Lucene (or possibly Solr) internals. The main topic is faceted search, in which search can happen along


2 Answers

慢半拍i

2021-02-06 02:59

An explanatory post can be found at: http://yonik.wordpress.com/2008/11/25/solr-faceted-search-performance-improvements/

The new method works by un-inverting the indexed field to be faceted, allowing quick lookup of the terms in the field for any given document. It’s actually a hybrid approach – to save memory and increase speed, terms that appear in many documents (over 5%) are not un-inverted, instead the traditional set intersection logic is used to get the counts.
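To make the quoted description concrete, here is a minimal sketch of the hybrid idea in Python. It is not Solr's actual code: the function names, the dict-based data structures, and the toy data are all assumptions for illustration; only the 5% cutoff comes from the quote above.

```python
from collections import Counter, defaultdict


def build_facet_structures(inverted, num_docs, threshold=0.05):
    """Split an inverted field (term -> set of doc ids) into two parts.

    Terms that match more than `threshold` of all documents keep their
    doc-id sets. Every other term is un-inverted into a doc id -> terms map.
    (Hypothetical helper, not a Solr API.)
    """
    big_terms = {}
    uninverted = defaultdict(list)
    for term, doc_ids in inverted.items():
        if len(doc_ids) > threshold * num_docs:
            big_terms[term] = set(doc_ids)
        else:
            for doc_id in doc_ids:
                uninverted[doc_id].append(term)
    return big_terms, dict(uninverted)


def facet_counts(result_docs, big_terms, uninverted):
    """Count facet terms over the set of doc ids matched by a query."""
    counts = Counter()
    # Rare terms: one lookup per matching document in the un-inverted map.
    for doc_id in result_docs:
        counts.update(uninverted.get(doc_id, ()))
    # Frequent terms: traditional set intersection with the result set.
    for term, doc_ids in big_terms.items():
        matched = len(doc_ids & result_docs)
        if matched:
            counts[term] = matched
    return counts


# Toy index of 100 documents (made-up data).
inverted = {
    "book": set(range(0, 60)),    # frequent: stays as a doc-id set
    "dvd": set(range(60, 100)),   # frequent: stays as a doc-id set
    "signed": {3, 61},            # rare: un-inverted
    "rare": {7},                  # rare: un-inverted
}
big_terms, uninverted = build_facet_structures(inverted, num_docs=100)
counts = facet_counts({3, 7, 61, 99}, big_terms, uninverted)
print(dict(counts))
```

The point of the split is that counting rare terms costs one lookup per matching document, while only the few frequent terms pay for a full set intersection.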

温柔的废话

2021-02-06 03:02
Faceting

There are two answers for faceting, because there are two types of faceting. I'm not certain that either of these is faster than an RDBMS.
1. Enum faceting. The result of a query is a bit vector where the ith bit is 1 if the ith document was a match. Each facet value is also a bit vector, so intersection is just a bitwise AND. I don't think this is a novel approach, and most RDBMSs probably support it.
2. Field Cache. This is just a normal (non-inverted) index. The SQL-style query that is run here is like:
  
  select facet, count(*) from field_cache where docId in query_results group by facet
Again, I don't think this is anything that a normal RDBMS couldn't do. The index is a skip list, with the docId as the key.
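Here is a rough sketch of both approaches in Python, just to show that they give the same counts for values that match. Nothing here is a Lucene or Solr API: the function names, the use of Python integers as bit vectors, the list standing in for the field cache, and the toy "color" field are all assumptions.

```python
from collections import Counter


def enum_facet_counts(query_bits, facet_bits):
    """Enum faceting: AND the query bit vector with each facet value's
    bit vector and count the bits that remain set."""
    return {
        value: bin(query_bits & bits).count("1")
        for value, bits in facet_bits.items()
    }


def field_cache_facet_counts(query_docs, field_cache):
    """Field-cache faceting: look up each matching document's value and
    group by it, like the SQL-style query above."""
    return Counter(field_cache[doc_id] for doc_id in query_docs)


# Documents 0..5 with a single-valued "color" field (made-up data).
# The list index is the docId, so this plays the role of the field cache.
field_cache = ["red", "blue", "red", "green", "blue", "red"]

# One bit vector per facet value: bit i is set if document i has that value.
facet_bits = {}
for doc_id, value in enumerate(field_cache):
    facet_bits[value] = facet_bits.get(value, 0) | (1 << doc_id)

# A query that matched documents 0, 1, 2 and 5.
query_docs = [0, 1, 2, 5]
query_bits = sum(1 << doc_id for doc_id in query_docs)

enum_counts = enum_facet_counts(query_bits, facet_bits)
cache_counts = field_cache_facet_counts(query_docs, field_cache)
print(enum_counts)
print(dict(cache_counts))
```

The cost profiles differ: the enum version does one AND per distinct facet value, while the field-cache version does one lookup per matching document, so which one is cheaper depends on how many distinct values the field has versus how many documents matched.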

Multi-term search

This is where Lucene shines. Why Lucene's approach is so good is too long to post here, but I can recommend this post on Lucene Performance, or the papers linked therein.