Multi-word synonym search in Solr

星月不相逢 2021-02-10 00:52

I'm trying to use a synonym filter to search for a phrase.

peter=> spider man, spiderman, Mary Jane, .....

I use the default configuration.

3 Answers

执笔经年 (OP)

2021-02-10 01:38
Yes, sadly this is a well-known problem, caused by how the Solr query parser splits the query on whitespace before analyzing it. Instead of seeing "spider" followed by "man" in the token stream, the analyzer sees each word on its own: just "spider" with nothing before or after it, and just "man" with nothing before or after it.

This is because most Solr query forms treat a space as basically an "OR". A search for "spider man" is handled as "spider OR man", instead of the parser looking at the full text, analyzing it to generate synonyms, and then generating a query from that.
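To make the difference concrete, here is a toy model in Python. It is not Solr code: `SYNONYMS`, `analyze`, `split_then_analyze`, and `analyze_whole_query` are hypothetical names invented for this illustration, and the single rule "spider man => spiderman" is an assumed example. It only shows why a multi-word rule can never match when each whitespace-separated chunk is analyzed on its own.

```python
# Toy model only, not Solr internals. SYNONYMS is a hypothetical rule set
# keyed by the token sequence that must appear in the stream.
SYNONYMS = {("spider", "man"): ["spiderman"]}


def analyze(text):
    """Stand-in for an analyzer: lowercase, tokenize, then apply synonym rules.

    Returns a list of (matched_text, synonyms) pairs.
    """
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest rules first so multi-word rules win over shorter ones.
        for key in sorted(SYNONYMS, key=len, reverse=True):
            if tuple(tokens[i:i + len(key)]) == key:
                out.append((" ".join(key), SYNONYMS[key]))
                i += len(key)
                matched = True
                break
        if not matched:
            out.append((tokens[i], []))
            i += 1
    return out


def split_then_analyze(query):
    """Split on whitespace first, then analyze each chunk in isolation."""
    result = []
    for chunk in query.split():
        result.extend(analyze(chunk))
    return result


def analyze_whole_query(query):
    """Analyze the full query text, so adjacent tokens are visible together."""
    return analyze(query)


if __name__ == "__main__":
    print(split_then_analyze("spider man"))   # the rule never fires
    print(analyze_whole_query("spider man"))  # the rule fires
```

In the first call the analyzer only ever sees one token at a time, so the two-token rule has nothing to match against. The solutions listed in this answer all amount to getting the full phrase in front of an analyzer, in one way or another.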

For more background, there's this blog post

There's a large number of solutions to this problem, including the following:
- hon-lucene-synonyms. This plugin runs an analyzer before generating an edismax query over multiple fields. It's a bit of a black box, and I've found it can produce some complex query forms that cause weird performance and relevance bugs.
- Lucidworks' autophrase query parser. By selectively autophrasing, this plugin lets you specify key phrases (spider man) that should not be broken into OR queries and can have synonym expansion applied.
- OpenSource Connections' Match query parser. Searches a single field using a query-specified analyzer that is run before the field is searched. Also searches multi-word synonyms as phrases. My favorite, but disclaimer: I'm the author :)
- Rene Kriegler's Querqy. Querqy is a Solr plugin for query preprocessing rules. These rules can identify your key phrases and rewrite the query to non-multiterm form.
- Roll your own. Learn to write your own query parser plugin and handle the problem however you want.
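As a rough illustration of the idea behind autophrasing and query preprocessing, here is a minimal Python sketch that joins known key phrases into single tokens before the query string would be sent to Solr. It is not how any of the plugins above are implemented; `KEY_PHRASES`, `autophrase`, and the `_` joiner are assumptions made up for this example, and the index side would need to apply the same joining for the rewritten token to match anything.

```python
import re

# Hypothetical list of phrases that should survive whitespace splitting.
KEY_PHRASES = ["spider man", "mary jane"]


def autophrase(query, phrases=KEY_PHRASES, joiner="_"):
    """Rewrite each known multi-word phrase in `query` into a single token."""
    # Handle longer phrases first so they are not pre-empted by shorter ones.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        # Match the words separated by any run of whitespace, on word boundaries.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        joined = joiner.join(words)
        # A function replacement avoids any escape handling in `joined`.
        query = re.sub(pattern, lambda _m, j=joined: j, query, flags=re.IGNORECASE)
    return query


if __name__ == "__main__":
    print(autophrase("Spider  Man comics"))
    print(autophrase("spiderman and mary jane"))
```

Once "spider man" has become one token, a parser that splits on whitespace can no longer separate it, and an ordinary single-token synonym rule can be applied to it.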