Multi-word synonym search in Solr

星月不相逢 2021-02-10 00:52

I'm trying to use a synonym filter to search for a phrase.

peter=> spider man, spiderman, Mary Jane, .....

I use the default configuration.

3 answers

独厮守ぢ (original poster)

2021-02-10 01:48
It's a known limitation within Solr / Lucene. Essentially, you have to provide an alternative form of tokenization so that specific space-delimited words (i.e. phrases) are treated as single tokens. One way to achieve this is client-side: in the application that calls Solr, keep a list of synonym phrases and, when indexing, find and replace those phrases with an alternative form (for example, by removing the spaces or replacing them with a delimiter that isn't treated as a token boundary).

E.g. if you have "Hello There" as a phrase you want to use in a synonym, then replace it with "HelloThere" when indexing.

Now in your synonyms.txt file you can have (for example):
```
Hi, HelloThere, Wotcha => Hello
```
Similarly, when you search, replace any occurrences of "Hello There" in the query string with "HelloThere", and they will then be matched as a synonym of Hello.
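A minimal sketch of that client-side replacement, in Python. The phrase list and the function names (`SYNONYM_PHRASES`, `build_phrase_pattern`, `collapse_phrases`) are made up for illustration and are not part of Solr; the idea is only that the same function runs over document text before indexing and over the query string before searching.

```python
import re

# Hypothetical phrase list; in practice it would mirror the
# multi-word entries you want to use in synonyms.txt.
SYNONYM_PHRASES = ["Hello There", "spider man", "Mary Jane"]


def build_phrase_pattern(phrases):
    """Compile one case-insensitive regex that matches any listed phrase."""
    # Longest phrases first, so overlapping phrases prefer the longest match.
    ordered = sorted(phrases, key=len, reverse=True)
    # Allow any run of whitespace between the words of a phrase.
    alternatives = [
        r"\s+".join(re.escape(word) for word in phrase.split())
        for phrase in ordered
    ]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def collapse_phrases(text, pattern):
    """Drop the whitespace inside each matched phrase ("Hello There" -> "HelloThere")."""
    return pattern.sub(lambda m: "".join(m.group(0).split()), text)


PATTERN = build_phrase_pattern(SYNONYM_PHRASES)

# Apply the same function at index time and at query time.
indexed_text = collapse_phrases("Hello There, nice to meet you", PATTERN)
query_text = collapse_phrases("hello there", PATTERN)
```

The sketch keeps the original casing of the matched text, so it assumes the field's analyzer lowercases tokens (or that the synonym filter is configured to ignore case); otherwise "hellothere" and "HelloThere" would not line up.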

Alternatively, you could use the AutoPhraseTokenFilter that LucidWorks created, available on GitHub. It buffers the token stream so that it can work out whether a combination of two or more sequential tokens matches one of the synonym phrases; if it doesn't, it throws away the first token as not matching the phrase. I'm not sure how much overhead this adds, but it seems a good approach, and it would be nice to have by default in Solr as part of the SynonymFilter.
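To illustrate the general idea of merging phrase tokens inside the token stream, here is a simplified Python sketch. It is not the AutoPhraseTokenFilter code and makes no claim about how that filter is implemented; `merge_phrase_tokens` and its parameters are hypothetical names, and the matching is a plain greedy longest-match over an in-memory token list.

```python
def merge_phrase_tokens(tokens, phrases, joiner=""):
    """Merge runs of tokens that spell a known phrase into a single token."""
    # Store each phrase as a lowercased tuple of its words for lookup.
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)

    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrase_set:
                merged.append(joiner.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here: emit the token unchanged and move on.
            merged.append(tokens[i])
            i += 1
    return merged


result = merge_phrase_tokens(
    ["I", "saw", "spider", "man", "today"], ["spider man", "Mary Jane"]
)
```

A real token filter would work incrementally on the stream and keep position information, which this sketch ignores.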