How to improve a single character PrefixQuery performance?

耶瑟儿~ 2021-01-21 12:43

I have a RAMDirectory with 1.5 million documents and I'm searching using a PrefixQuery for a single field. When the search text has a length of 3 or more characters, the search

1 answer
  •  -上瘾入骨i
    2021-01-21 13:38

    Consider removing stop words from your index if you haven't already.

    To understand how stop words slow down a PrefixQuery, consider how a PrefixQuery works: it is rewritten as a BooleanQuery that includes every term in the index beginning with the PrefixQuery's term. For example, a* becomes a OR and OR aardvark OR anchor OR ... By itself this isn't bad, and it performs surprisingly well even with thousands of terms. The real drain comes when stop words like "a" and "and" are included, because they are likely to appear multiple times in every single document in your index. That creates far more work for the gathering/collecting/scoring portion of the search and slows things down.
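
    The effect described above can be illustrated with a small model that does not use Lucene at all. This is only a sketch: the class name PrefixExpansionSketch, the sample terms, and the document counts are made up for illustration, and a sorted map stands in for the index's term dictionary.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of prefix expansion. Not Lucene code; all names and numbers are hypothetical.
final class PrefixExpansionSketch {

    // Hypothetical term dictionary: term -> number of documents containing the term.
    static NavigableMap<String, Integer> sampleTerms() {
        NavigableMap<String, Integer> terms = new TreeMap<>();
        terms.put("a", 1_400_000);      // stop word, present in almost every document
        terms.put("aardvark", 12);
        terms.put("anchor", 350);
        terms.put("and", 1_450_000);    // stop word, present in almost every document
        terms.put("banana", 900);
        return terms;
    }

    // Every term that starts with the prefix, found by a range scan over the sorted keys.
    static Map<String, Integer> expand(NavigableMap<String, Integer> terms, String prefix) {
        return terms.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    // Rough proxy for collection/scoring work: one posting per (term, document) pair.
    static long postingsToVisit(Map<String, Integer> expandedTerms) {
        long total = 0;
        for (int docFreq : expandedTerms.values()) {
            total += docFreq;
        }
        return total;
    }
}
```

    With these made-up numbers, expanding "a" matches four terms and touches 2,850,362 postings. After removing "a" and "and" from the map, the same expansion touches only 362, which is the gap the stop-word advice is aimed at.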

    On a side note, I highly recommend not running the autocomplete search when the user has entered fewer than 2 or 3 characters, purely from a usability perspective. I can't imagine the results would be at all relevant: for a search like a*, there's no way to tell which results are more relevant than others. If you must display something to the user, consider an n-gram approach like the one Jf Beaulac suggested in the comments.
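
    A minimal sketch of both ideas follows, in plain Java with no Lucene dependency. The threshold of 3 and the names AutocompleteSketch, shouldSearch, and edgeNGrams are assumptions for illustration. The comment referred to above is not reproduced here, so the edge n-gram variant shown (indexing the leading substrings of each term so that short input can be matched as an exact term) is one possible reading of that suggestion, not necessarily the one intended.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helpers only; names and the threshold are hypothetical.
final class AutocompleteSketch {

    // Assumed minimum input length before an autocomplete search is issued.
    static final int MIN_PREFIX_LENGTH = 3;

    // Skip the search for null, blank, or very short input.
    static boolean shouldSearch(String input) {
        return input != null && input.trim().length() >= MIN_PREFIX_LENGTH;
    }

    // Leading substrings of a term with lengths from minGram to maxGram (inclusive),
    // capped at the term's own length.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }
}
```

    For example, edgeNGrams("anchor", 1, 3) yields "a", "an", and "anc". If such grams are indexed in their own field at indexing time, a one- or two-character input can be looked up as a single term rather than expanded into every term sharing that prefix.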
