Multi-word synonym search in Solr

星月不相逢 2021-02-10 00:52

I'm trying to use a synonym filter to search for a phrase.

peter=> spider man, spiderman, Mary Jane, .....

I use the default configuration.

3 answers
  •  独厮守ぢ
    2021-02-10 01:48

    It's a known limitation in Solr / Lucene. Essentially, you have to provide an alternative form of tokenization so that specific space-delimited word sequences (i.e. phrases) are treated as single words. One way to achieve this is on the client side: in the application that calls Solr, keep a list of synonym phrases and, when indexing, find and replace each phrase with an alternative form (for example, by removing the spaces or replacing them with a delimiter that isn't treated as a token boundary).

    E.g. if you have "Hello There" as a phrase you want to use in a synonym, then replace it with "HelloThere" when indexing.

    Now in your synonyms.txt file you can have (for example):

    Hi, HelloThere, Wotcha => Hello
    

    Similarly, when you search, replace any occurrences of "Hello There" in the query string with "HelloThere"; they will then be matched as a synonym of Hello.
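
    The client-side replacement described above could be sketched roughly as follows. This is only an illustration: the phrase list and the helper names (SYNONYM_PHRASES, build_phrase_pattern, collapse_phrases) are made up for the example, and it assumes that simply removing the whitespace inside a phrase is an acceptable single-token form. The same function would be applied to documents before indexing and to the query string before searching.

```python
import re

# Hypothetical phrase list; in practice it would mirror the multi-word
# entries you want to reference from synonyms.txt.
SYNONYM_PHRASES = ["Hello There", "spider man", "Mary Jane"]


def build_phrase_pattern(phrases):
    """Compile one case-insensitive regex that matches any listed phrase."""
    # Longest phrases first, so a longer phrase wins over one it contains.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = [
        r"\s+".join(re.escape(word) for word in phrase.split())
        for phrase in ordered
    ]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def collapse_phrases(text, pattern):
    """Remove the whitespace inside each matched phrase, keeping its casing."""
    return pattern.sub(lambda m: re.sub(r"\s+", "", m.group(0)), text)


PATTERN = build_phrase_pattern(SYNONYM_PHRASES)

# Apply the same step to indexed text and to query strings.
indexed_text = collapse_phrases("She said Hello There to Mary Jane", PATTERN)
query_text = collapse_phrases("hello there", PATTERN)
```

    Because the match is case-insensitive but the original casing is kept, any case normalisation is left to whatever analysis chain the Solr field already uses.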

    Alternatively, you could use the AutoPhraseTokenFilter that LucidWorks created, available on GitHub. It works by maintaining a token stream so that it can work out whether a combination of two or more sequential tokens matches one of the synonym phrases; if it doesn't, it throws away the first token as not matching the phrase. I'm not sure how much overhead this adds, but it seems a good approach, and it would be nice to have it by default in Solr as part of the SynonymFilter.
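
    To make the token-level idea more concrete, here is a simplified sketch of matching phrases over a sequence of tokens. It is not the LucidWorks code (which is a Java token filter inside the analysis chain); merge_phrases is a hypothetical helper, and the greedy longest-match strategy and the choice to pass non-matching tokens through unchanged are assumptions made for the example.

```python
def merge_phrases(tokens, phrases):
    """Greedily merge runs of tokens that form a known phrase into one token."""
    # Store each phrase as a tuple of lowercased words for matching.
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)

    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, down to two tokens.
        for size in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + size])
            if window in phrase_set:
                merged.append("".join(tokens[i:i + size]))
                i += size
                break
        else:
            # No phrase starts here: emit the token unchanged and move on.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = "say Hello There to spider man".split()
result = merge_phrases(tokens, ["Hello There", "spider man"])
```

    The lookahead window is bounded by the longest phrase, so the extra work per token depends on phrase length rather than on the size of the phrase list.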
