How does Lucene/Solr achieve high performance in multi-field / faceted search?

跟風遠走 submitted on 2019-12-03 02:58:16


There are two answers for faceting, because there are two types of faceting. I'm not certain that either of these is faster than an RDBMS.

  1. Enum faceting. The result of a query is a bit vector whose ith bit is 1 if the ith document matched. Each facet value is also a bit vector, so intersection is just a bitwise AND, and the facet count is the number of set bits in the result. I don't think this is a novel approach, and most RDBMSs probably support it.
  2. Field Cache. This is just a normal (non-inverted) index: for each document it stores the field's value. The SQL-style query that is run here is roughly:

    select facet, count(*) from field_cache where docId in query_results group by facet

Again, I don't think this is anything that a normal RDBMS couldn't do. The index is a skip list, with the docId as the key.
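To make the two strategies concrete, here is a toy Python sketch. It assumes a tiny hypothetical corpus, uses a Python int as the bit vector for enum faceting, and uses a plain list indexed by docId as the field cache. The function names are made up for illustration; this is not Lucene's or Solr's implementation.

```python
# Hypothetical toy corpus: doc i has the single facet value facet_values[i].
facet_values = ["books", "music", "books", "video", "music", "books"]

def bitset(doc_ids):
    """Pack doc ids into a Python int used as a bit vector (bit i = doc i)."""
    bits = 0
    for d in doc_ids:
        bits |= 1 << d
    return bits

# 1. Enum faceting: one bit vector per facet value, as an inverted index would give.
facet_bitsets = {}
for doc_id, value in enumerate(facet_values):
    facet_bitsets[value] = facet_bitsets.get(value, 0) | (1 << doc_id)

def enum_facet_counts(query_bits):
    """Count = popcount(query AND facet) for every facet value; zero counts dropped."""
    counts = {v: bin(query_bits & bits).count("1") for v, bits in facet_bitsets.items()}
    return {v: c for v, c in counts.items() if c}

# 2. Field cache: un-inverted array, docId -> ordinal of that doc's facet value.
ordinals = sorted(set(facet_values))
field_cache = [ordinals.index(v) for v in facet_values]

def field_cache_facet_counts(matching_docs):
    """Like: select facet, count(*) ... where docId in query_results group by facet."""
    counts = [0] * len(ordinals)
    for d in matching_docs:
        counts[field_cache[d]] += 1
    return {ordinals[o]: c for o, c in enumerate(counts) if c}

if __name__ == "__main__":
    hits = [0, 1, 2, 4]  # docs matched by some query
    print(enum_facet_counts(bitset(hits)))
    print(field_cache_facet_counts(hits))
```

Both functions produce the same counts; the difference is cost. Enum faceting does one AND per facet value, so it suits fields with few distinct values, while the field cache does one array lookup per matching document, so it suits fields with many distinct values.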

Multi-term search

This is where Lucene shines. Explaining why Lucene's approach is so good would take too long here, but I can recommend this post on Lucene performance, or the papers linked therein.
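The answer does not spell the mechanism out, but one well-known ingredient is that each term's postings list is sorted by docId and can be jumped forward to the first doc at or beyond a target, so an AND of several terms can leapfrog past non-matching documents instead of scanning them all. Below is a minimal sketch of that idea under stated assumptions: `bisect_left` stands in for Lucene's skip-list-based advance, and the function names are hypothetical. It is not Lucene's actual code.

```python
from bisect import bisect_left

def advance(postings, pos, target):
    """Index of the first doc id >= target, searching from pos onward.
    Binary search is an assumption standing in for Lucene's skip data."""
    return bisect_left(postings, target, pos)

def intersect(postings_lists):
    """Leapfrog AND of sorted doc-id lists (one list per query term)."""
    if not postings_lists or any(not p for p in postings_lists):
        return []
    lists = sorted(postings_lists, key=len)  # rarest term leads the iteration
    positions = [0] * len(lists)
    result = []
    target = lists[0][0]
    while True:
        all_match = True
        for i, plist in enumerate(lists):
            positions[i] = advance(plist, positions[i], target)
            if positions[i] == len(plist):
                return result  # one list is exhausted: no more matches possible
            doc = plist[positions[i]]
            if doc > target:
                target = doc  # leapfrog: raise the target and restart the round
                all_match = False
                break
        if all_match:
            result.append(target)  # every term contains this doc
            target += 1

if __name__ == "__main__":
    print(intersect([[1, 3, 5, 7, 9], [3, 4, 5, 9, 10], [2, 3, 9]]))
```

Starting from the shortest list keeps the number of jumps proportional to the rarest term rather than the most common one, which is why conjunctions of a rare and a frequent term stay cheap.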

A post explaining this can be found at:

The new method works by un-inverting the indexed field to be faceted, allowing quick lookup of the terms in the field for any given document. It’s actually a hybrid approach: to save memory and increase speed, terms that appear in many documents (over 5%) are not un-inverted; instead, the traditional set intersection logic is used to get the counts.
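The hybrid scheme in the quoted post can be sketched roughly as follows. This is a simplified illustration, not Solr's actual data structure: the function names and the `big_term_ratio` parameter are hypothetical (its 0.05 default mirrors the "over 5%" in the quote), and plain Python sets and lists stand in for Solr's compact encodings.

```python
from collections import Counter

def build_hybrid_index(doc_terms, big_term_ratio=0.05):
    """doc_terms[d] is the set of facet terms in doc d (hypothetical input).
    Frequent ("big") terms keep a doc-id set; all other terms are un-inverted
    into a per-document term list."""
    num_docs = len(doc_terms)
    df = Counter(t for terms in doc_terms for t in set(terms))
    big_terms = {t for t, n in df.items() if n > big_term_ratio * num_docs}
    big_postings = {t: set() for t in big_terms}
    uninverted = []  # uninverted[d] -> sorted non-big terms of doc d
    for d, terms in enumerate(doc_terms):
        small = set()
        for t in terms:
            if t in big_terms:
                big_postings[t].add(d)
            else:
                small.add(t)
        uninverted.append(sorted(small))
    return big_postings, uninverted

def hybrid_facet_counts(matching_docs, big_postings, uninverted):
    matching = set(matching_docs)
    counts = Counter()
    # Big terms: traditional set intersection with the query result.
    for t, docs in big_postings.items():
        n = len(docs & matching)
        if n:
            counts[t] = n
    # Other terms: per-document lookup in the un-inverted structure.
    for d in matching:
        for t in uninverted[d]:
            counts[t] += 1
    return dict(counts)

if __name__ == "__main__":
    docs = [{"common", "rare"}, {"common", "mid"}, {"common"}, {"mid"}]
    big, uninv = build_hybrid_index(docs, big_term_ratio=0.5)
    print(hybrid_facet_counts([0, 1, 3], big, uninv))
```

The split keeps the un-inverted per-document lists short: a term present in most documents would otherwise be repeated in almost every list, whereas a single set intersection counts it in one pass.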
