How does Lucene/Solr achieve high performance in multi-field / faceted search?

臣服心动 2021-02-06 02:41

Context

This is a question mainly about Lucene (or possibly Solr) internals. The main topic is faceted search, in which search can happen along

2 Answers
  •  温柔的废话
    2021-02-06 03:02

    Faceting

    There are two answers for faceting, because there are two types of faceting. I'm not certain that either of these is faster than an RDBMS.

    1. Enum faceting. The result of a query is a bit vector in which the i-th bit is 1 if the i-th document matched. Each facet value is also stored as a bit vector, so intersecting the two is just a bitwise AND. I don't think this is a novel approach, and most RDBMSs probably support it.
    2. Field Cache. This is just a normal (non-inverted) index. The SQL-style query that is run here is like:

      select facet, count(*) from field_cache where docId in query_results group by facet

    Again, I don't think this is anything that a normal RDBMS couldn't do. The index is a skip list, with the docId as the key.
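
    To make the two strategies concrete, here is a minimal Python sketch. It is not Lucene or Solr code: the toy `field_cache` list, the `color` field, and all function names are assumptions made up for illustration, and Python integers stand in for bit vectors.

```python
from collections import Counter

# Hypothetical toy data: position = docId, value = that document's "color" field.
# This list plays the role of the (non-inverted) field cache.
field_cache = ["red", "blue", "red", "green", "blue", "red"]


def build_facet_bitsets(values):
    """Enum faceting setup: one bit vector per facet value (bit i set if doc i has it)."""
    bitsets = {}
    for doc_id, value in enumerate(values):
        bitsets[value] = bitsets.get(value, 0) | (1 << doc_id)
    return bitsets


def enum_facet_counts(query_bits, facet_bitsets):
    """Intersect the query bit vector with each facet bit vector and count set bits."""
    return {
        value: bin(query_bits & bits).count("1")
        for value, bits in facet_bitsets.items()
    }


def field_cache_facet_counts(query_doc_ids, values):
    """Look up each matching docId in the field cache and group by value,
    mirroring: select facet, count(*) ... where docId in query_results group by facet."""
    return Counter(values[doc_id] for doc_id in query_doc_ids)


# Assume some query matched documents 0, 1, 2 and 5.
query_doc_ids = [0, 1, 2, 5]
query_bits = sum(1 << doc_id for doc_id in query_doc_ids)

enum_counts = enum_facet_counts(query_bits, build_facet_bitsets(field_cache))
cache_counts = field_cache_facet_counts(query_doc_ids, field_cache)

print(enum_counts)
print(dict(cache_counts))
```

    In this sketch the enum approach does one AND per distinct facet value, while the field-cache approach does one lookup per matching document, so which one is cheaper would depend on how many distinct values the field has relative to the size of the result set.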

    Multi-term search

    This is where Lucene shines. Why Lucene's approach is so good is too long to post here, but I can recommend this post on Lucene Performance, or the papers linked therein.
