With Lucene: Why do I get a Too Many Clauses error if I do a prefix search?

慢半拍i 2021-01-12 07:19

I've had an app doing prefix searches for a while. Recently the index size was increased and it turned out that some prefixes were too darned numerous for Lucene to handle.

3 Answers
  • 2021-01-12 08:04

    I've hit this before. It happens because Lucene, under the covers, turns many (all?) things into boolean queries when you call Query.rewrite().

    From: http://web.archive.org/web/20110915061619/http://lucene.apache.org:80/java/2_2_0/api/org/apache/lucene/search/Query.html

    public Query rewrite(IndexReader reader)
                  throws IOException
    
        Expert: called to re-write queries into primitive queries.
                For example, a PrefixQuery will be rewritten into a
                BooleanQuery that consists of TermQuerys.
    
        Throws:
            IOException
    
  • 2021-01-12 08:14

    When running a prefix query, Lucene looks up all terms in its term dictionary that start with the prefix and adds one clause per matching term. If more than 1024 terms (the default limit) match, a BooleanQuery.TooManyClauses exception is thrown.

    You can call BooleanQuery.setMaxClauseCount to increase the maximum number of clauses permitted per BooleanQuery.
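To make the mechanism concrete, here is a minimal sketch of the expansion described above. It is a toy model in plain Java, not Lucene's code: the class `PrefixExpansionSketch`, the method `expand`, and the exception `TooManyClausesException` are hypothetical names invented for illustration. Only the default limit of 1024 and the idea of one clause per matching term come from the answers in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of prefix expansion; hypothetical names, not Lucene's implementation.
class PrefixExpansionSketch {
    // Mirrors the default limit of 1024 clauses mentioned above.
    static int maxClauseCount = 1024;

    // Hypothetical stand-in for BooleanQuery.TooManyClauses.
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(String message) {
            super(message);
        }
    }

    // Collects one "clause" per dictionary term that starts with the prefix.
    static List<String> expand(SortedSet<String> dictionary, String prefix) {
        List<String> clauses = new ArrayList<>();
        // In a sorted dictionary, all terms sharing a prefix are contiguous
        // and begin at the first term >= prefix.
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // walked past the last matching term
            }
            if (clauses.size() >= maxClauseCount) {
                throw new TooManyClausesException(
                        "more than " + maxClauseCount + " terms match '" + prefix + "'");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            dictionary.add(String.format("item%04d", i));
        }
        try {
            expand(dictionary, "item");
        } catch (TooManyClausesException e) {
            System.out.println("Failed: " + e.getMessage());
        }
        maxClauseCount = 4096; // analogous to raising the limit
        System.out.println("Clauses after raising the limit: "
                + expand(dictionary, "item").size());
    }
}
```

The sketch also shows why raising the limit only postpones the problem: the number of clauses grows with the number of distinct terms under the prefix, so a growing index can exceed any fixed limit.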

  • 2021-01-12 08:25

    The API reference page of TooManyClauses shows that PrefixQuery, FuzzyQuery, WildcardQuery, and RangeQuery are expanded this way (into BooleanQuery). Since it is in the API reference, it should be a behavior that users can rely on. Lucene does not place arbitrary limits on the number of hits (other than a document ID being an int) so a "too many hits" exception might not make sense. Perhaps PrefixQuery.rewrite(IndexReader) should catch the TooManyClauses and throw a "too many prefixes" exception, but right now it does not behave that way.

    By the way, another way to search by prefix is to use PrefixFilter. Either filter your query with it or wrap the filter with a ConstantScoreQuery.
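The filter approach avoids the clause limit because it never builds one clause per term. A rough sketch of that idea, again as a toy model in plain Java rather than Lucene's code: `PrefixFilterSketch`, `matchingDocs`, and the in-memory `postings` map are hypothetical stand-ins for an index, assumed here only for illustration. Matching documents are collected into a bit set, and every hit is treated the same, which is roughly what a constant score amounts to.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a prefix filter; hypothetical names, not Lucene's implementation.
class PrefixFilterSketch {
    // postings maps each term to the IDs of the documents containing it.
    static BitSet matchingDocs(SortedMap<String, int[]> postings, String prefix, int maxDoc) {
        BitSet docs = new BitSet(maxDoc);
        // Walk the contiguous run of terms that start with the prefix.
        for (Map.Entry<String, int[]> entry : postings.tailMap(prefix).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // walked past the last matching term
            }
            // OR this term's documents into the result; no per-term clause is kept.
            for (int docId : entry.getValue()) {
                docs.set(docId);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {0, 2});
        postings.put("apply", new int[] {2, 5});
        postings.put("banana", new int[] {1});
        System.out.println(matchingDocs(postings, "app", 6));
    }
}
```

The trade-off in this model is that per-term relevance is lost: a document is either in the bit set or not, however many matching terms it contains.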
