Lucene Synonym Filter behavior

Submitted by 别来无恙 on 2019-12-08 08:01:36

Question


I am trying to figure out how Lucene's analyzer works. Specifically, how does Lucene handle synonyms? Here is the situation: we have single-word and multi-word synonyms:

single word: foo = bar
multi-word: foo bar = foobar

For single words:

  • Does Lucene expand the indexed records or not? I guess that if a query contains a word like "foo", it adds "bar" to the query too, but I don't know whether the same happens at indexing time.

For multi words:

  • Does Lucene expand both the query and the index? For example, if we have "foo bar", does it add "foobar" to the index and/or the query?

My second question: Lucene passes a stream of tokens through filters such as the lowercase filter. How does it find multi-word synonyms? That is, how does it figure out that "foo" followed by "bar" forms the multi-word entry "foo bar"?

thanks


Answer 1:


SynonymFilter can optionally keep the original word and add the synonym to the token stream alongside it, by passing includeOrig=true when adding the rule with SynonymMap.Builder.add() (the SynonymFilter docs call this keepOrig). This behavior can cause problems for PhraseQuery and similar queries; see the first Note in the SynonymFilter docs.
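To picture what "keeping the original and adding the synonym" means, here is a simplified, dependency-free model (not Lucene's API). Each token carries a position increment, and a synonym emitted alongside the original gets an increment of 0, so both occupy the same position; this same-position stacking is roughly what the SynonymFilter Note cautions about. The names `SynonymStackingSketch`, `Token`, and `apply` are hypothetical, and only single-word rules are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SynonymStackingSketch {
    // Hypothetical token: text plus position increment relative to the previous token
    // (in Lucene, PositionIncrementAttribute carries this information).
    static final class Token {
        final String text;
        final int posInc;

        Token(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "(+" + posInc + ")";
        }
    }

    // Applies single-word rules such as "foo" -> "bar".
    // includeOrig = true: keep the original and stack the synonym at the same position (posInc 0).
    // includeOrig = false: the synonym replaces the original token.
    static List<Token> apply(List<String> words, Map<String, String> rules, boolean includeOrig) {
        List<Token> out = new ArrayList<>();
        for (String w : words) {
            String syn = rules.get(w);
            if (syn == null) {
                out.add(new Token(w, 1));   // no rule: pass the token through
            } else if (includeOrig) {
                out.add(new Token(w, 1));   // original keeps its own position
                out.add(new Token(syn, 0)); // synonym stacked on the same position
            } else {
                out.add(new Token(syn, 1)); // synonym replaces the original
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> rules = Map.of("foo", "bar");
        System.out.println(apply(List.of("foo", "baz"), rules, true));  // [foo(+1), bar(+0), baz(+1)]
        System.out.println(apply(List.of("foo", "baz"), rules, false)); // [bar(+1), baz(+1)]
    }
}
```

Because "foo" and "bar" share a position when the original is kept, a query for either term can match that spot in the document.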

If you use the same Analyzer for querying and indexing, then queries and the documents written to the index will, of course, be analyzed the same way. SynonymFilter with the original token kept is one of the few filters that is fairly often applied differently at query time and at index time, but that choice is entirely up to your implementation.
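The index-time versus query-time choice can be illustrated with a toy inverted index in plain Java (no Lucene involved). The analyzers `plain` and `withSynonyms` and the `search` method are hypothetical stand-ins; the rules treat foo and bar as two-way equivalents, like `foo = bar` in the question, and the original token is always kept. The sketch shows that expanding on either side alone is enough for a query on "bar" to find a document containing only "foo".

```java
import java.util.*;
import java.util.function.Function;

class AnalyzerSymmetrySketch {
    // Two-way equivalence, modeling the question's "foo = bar" rule.
    static final Map<String, String> RULES = Map.of("foo", "bar", "bar", "foo");

    // Stand-in for a tokenizer chain: lowercase, then split on whitespace.
    static List<String> plain(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Same as plain, then adds each token's synonym while keeping the original.
    static List<String> withSynonyms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : plain(text)) {
            out.add(t);
            String syn = RULES.get(t);
            if (syn != null) out.add(syn);
        }
        return out;
    }

    // Builds a tiny inverted index with indexAnalyzer, analyzes the query with
    // queryAnalyzer, and returns ids of documents containing any query term.
    static Set<Integer> search(List<String> docs, String query,
                               Function<String, List<String>> indexAnalyzer,
                               Function<String, List<String>> queryAnalyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : indexAnalyzer.apply(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        Set<Integer> hits = new TreeSet<>();
        for (String term : queryAnalyzer.apply(query)) {
            hits.addAll(index.getOrDefault(term, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("foo baz", "bar qux");
        // No expansion anywhere: only the literal "bar" document matches.
        System.out.println(search(docs, "bar", AnalyzerSymmetrySketch::plain, AnalyzerSymmetrySketch::plain));        // [0] is missing: prints [1]
        // Expansion at index time only.
        System.out.println(search(docs, "bar", AnalyzerSymmetrySketch::withSynonyms, AnalyzerSymmetrySketch::plain)); // [0, 1]
        // Expansion at query time only.
        System.out.println(search(docs, "bar", AnalyzerSymmetrySketch::plain, AnalyzerSymmetrySketch::withSynonyms)); // [0, 1]
    }
}
```

With the same analyzer on both sides the two paths are trivially consistent; applying the synonym step on only one side is the "incongruous" setup the answer mentions, and for simple term matching it still connects foo and bar.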

As for how it is implemented, including how multi-word rules are matched, the source code is available to you.
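On the second question: the SynonymFilter javadoc describes its matching as greedy, so when several rules could apply, the rule that starts earliest and consumes the most tokens wins. To do that the filter has to buffer incoming tokens and look ahead. Below is a simplified model of that idea in plain Java; it is not Lucene's code (which matches against an FST built by SynonymMap), `MultiWordMatchSketch.rewrite` is a hypothetical name, and it replaces matched tokens rather than stacking them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MultiWordMatchSketch {
    // Greedy longest-match rewrite: at each position, try the longest rule first.
    // Matched input tokens are replaced by the rule's output (no original kept).
    static List<String> rewrite(List<String> tokens, Map<List<String>, List<String>> rules) {
        int maxLen = 0;
        for (List<String> lhs : rules.keySet()) maxLen = Math.max(maxLen, lhs.size());
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Look ahead up to maxLen tokens, longest candidate first.
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                List<String> output = rules.get(tokens.subList(i, i + len));
                if (output != null) {
                    out.addAll(output);
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                out.add(tokens.get(i)); // no rule starts here: pass the token through
                i++;
            } else {
                i += matched;           // skip past the consumed input tokens
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<List<String>, List<String>> rules = Map.of(List.of("foo", "bar"), List.of("foobar"));
        System.out.println(rewrite(List.of("foo", "bar", "baz"), rules)); // [foobar, baz]
    }
}
```

With the question's rule `foo bar = foobar`, the tokens foo, bar, baz come out as foobar, baz, while a lone foo passes through unchanged unless a single-word rule also covers it.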



Source: https://stackoverflow.com/questions/17283100/lucene-synonym-filter-behavior
