How does Lucene/Solr achieve high performance in multi-field / faceted search?

臣服心动 2021-02-06 02:41

Context

This is a question mainly about Lucene (or possibly Solr) internals. The main topic is faceted search, in which search can happen along

2 Answers
  • 2021-02-06 02:59

    An explanatory post can be found at: http://yonik.wordpress.com/2008/11/25/solr-faceted-search-performance-improvements/

    The new method works by un-inverting the indexed field to be faceted, allowing quick lookup of the terms in the field for any given document. It’s actually a hybrid approach – to save memory and increase speed, terms that appear in many documents (over 5%) are not un-inverted, instead the traditional set intersection logic is used to get the counts.
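To make the hybrid idea concrete, here is a minimal sketch in Python, not Solr's actual implementation. It assumes a toy inverted index held as a dict from term to a set of document ids; the names `build_hybrid_index` and `facet_counts` are hypothetical. The `threshold` default of 0.05 mirrors the 5% figure quoted above.

```python
from collections import Counter, defaultdict


def build_hybrid_index(inverted, num_docs, threshold=0.05):
    """Split a toy inverted index (term -> set of doc ids) into two parts.

    Terms matching more than `threshold` of all documents keep their posting
    sets; every other term is un-inverted into a doc id -> terms mapping.
    """
    big_terms = {}                   # frequent terms: counted by set intersection
    uninverted = defaultdict(list)   # doc id -> list of (infrequent) terms
    for term, docs in inverted.items():
        if len(docs) > threshold * num_docs:
            big_terms[term] = set(docs)
        else:
            for doc in docs:
                uninverted[doc].append(term)
    return big_terms, dict(uninverted)


def facet_counts(big_terms, uninverted, matches):
    """Count facet terms over the set of doc ids matched by a query."""
    counts = Counter()
    # Infrequent terms: look up each matching document's terms directly.
    for doc in matches:
        counts.update(uninverted.get(doc, ()))
    # Frequent terms: fall back to intersecting posting sets with the matches.
    for term, docs in big_terms.items():
        hits = len(docs & matches)
        if hits:
            counts[term] = hits
    return dict(counts)


# Toy data: 100 documents, one frequent term and two rare ones.
inverted = {
    "common": set(range(50)),
    "rare_a": {1, 2, 3},
    "rare_b": {3, 60},
}
big_terms, uninverted = build_hybrid_index(inverted, num_docs=100)
result = facet_counts(big_terms, uninverted, matches={1, 2, 3, 60})
```

The per-document lookup does work proportional to the number of matching documents, while the intersection path costs one set operation per frequent term. That is why only the few very common terms are left inverted.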

  • 2021-02-06 03:02

    Faceting

    There are two answers for faceting, because there are two types of faceting. I'm not certain that either of these is faster than an RDBMS.

    1. Enum faceting. The results of a query are a bit vector in which the ith bit is 1 if the ith document matched. Each facet value is also a bit vector, so intersection is just a bitwise AND, and the facet count is the number of set bits in the result. I don't think this is a novel approach, and most RDBMSs probably support it.
    2. Field Cache. This is just a normal (non-inverted) index. The SQL-style query that is run here is like:

      select facet, count(*) from field_cache where docId in query_results group by facet

    Again, I don't think this is anything that a normal RDBMS couldn't do. The index is a skip list, with the docId as the key.
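Both strategies can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Lucene code: bit vectors are modelled as plain Python integers, the field cache is a list indexed by docId with exactly one facet value per document, and the helper names (`to_bits`, `enum_facet_counts`, `field_cache_facet_counts`) are hypothetical.

```python
from collections import Counter


def to_bits(doc_ids):
    """Encode a collection of doc ids as an integer used as a bit vector."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def enum_facet_counts(query_bits, facet_bits):
    """Enum faceting: AND the query bit vector with each facet value's vector."""
    return {
        value: bin(query_bits & bits).count("1")  # popcount of the intersection
        for value, bits in facet_bits.items()
    }


def field_cache_facet_counts(query_doc_ids, field_cache):
    """Field-cache faceting: look up each matching doc's value and tally it.

    Equivalent in spirit to:
      select facet, count(*) from field_cache
      where docId in query_results group by facet
    """
    return dict(Counter(field_cache[doc_id] for doc_id in query_doc_ids))


# Toy data: field_cache[docId] is the facet value of that document.
field_cache = ["red", "blue", "red", "green", "blue", "red"]
query_docs = [0, 2, 3, 4]

# Build one bit vector per facet value for the enum approach.
facet_bits = {
    value: to_bits(d for d, v in enumerate(field_cache) if v == value)
    for value in set(field_cache)
}

enum_result = enum_facet_counts(to_bits(query_docs), facet_bits)
cache_result = field_cache_facet_counts(query_docs, field_cache)
```

On this data both functions give the same counts. The enum approach does one AND per distinct facet value, so it tends to suit fields with few distinct values, whereas the field-cache approach does one lookup per matching document.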

    Multi-term search

    This is where Lucene shines. The reasons Lucene's approach works so well are too long to cover here, but I can recommend this post on Lucene Performance, or the papers linked therein.
