How to match a phrase in Elasticsearch with expandable prefix and suffix?

Submitted by 与世无争的帅哥 on 2019-12-22 01:21:49

Question


We have a use case in which we want to match phrases in Elasticsearch, but in addition to exact phrase queries we also want to match partial phrases.

Example:

A search for "welcome you", "lcome you", "welcome yo", or "lcome yo" should match documents containing any of these phrases:

"welcome you"

"we welcome you"

"welcome you to"

"we welcome you to"

In other words, we want to preserve word order, as a phrase query does, while also returning results that contain the phrase as a partial substring, with the prefix and suffix expandable up to a configurable length. The closest thing I found in Elasticsearch is match_phrase_prefix, but it only matches phrases that start with a particular prefix.

For example, this returns results starting with the prefix d:

$ curl -XGET 'localhost:9200/startswith/test/_search?pretty' -H 'Content-Type: application/json' -d '{
    "query": {
        "match_phrase_prefix": {
            "title": {
                "query": "d",
                "max_expansions": 5
            }
        }
    }
}'

Is there any way to achieve this for the suffix as well?


Answer 1:


I would strongly encourage you to look into the shingle token filter.

You can define an index with a custom analyzer that uses shingles to index each pair of adjacent tokens together, in addition to the tokens themselves.

curl -XPUT localhost:9200/startswith -H 'Content-Type: application/json' -d '{
  "settings": {
      "analysis": {
        "analyzer": {
          "my_shingles": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "shingles"
            ]
          }
        },
        "filter": {
          "shingles": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 2,
            "output_unigrams": true
          }
        }
      }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_shingles"
        }
      }
    }
  }
}'

For instance, "we welcome you to" would be indexed as the following tokens:

  • we
  • we welcome
  • welcome
  • welcome you
  • you
  • you to
  • to
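
To make the token list above concrete, here is a minimal Python sketch that imitates what the my_shingles analyzer does. It is only an approximation: it splits on whitespace instead of using the real standard tokenizer, and the function name shingle_tokens is made up for this illustration, not part of any Elasticsearch client.

```python
def shingle_tokens(text, min_size=2, max_size=2, output_unigrams=True):
    """Rough imitation of lowercase + shingle filtering.

    Assumption: words are separated by whitespace only; the real
    standard tokenizer also handles punctuation and Unicode rules.
    """
    words = text.lower().split()
    tokens = []
    for i, word in enumerate(words):
        # Emit the single word first (output_unigrams: true).
        if output_unigrams:
            tokens.append(word)
        # Then emit every shingle of the configured sizes starting here.
        for size in range(min_size, max_size + 1):
            if i + size <= len(words):
                tokens.append(" ".join(words[i:i + size]))
    return tokens


if __name__ == "__main__":
    for token in shingle_tokens("we welcome you to"):
        print(token)
```

Running it prints the same seven tokens as the list above, in the same order. Raising max_size in the sketch shows how quickly the number of indexed tokens grows with larger shingles.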

Then you can index a few sample documents:

curl -XPUT localhost:9200/startswith/test/_bulk -H 'Content-Type: application/x-ndjson' -d '
{"index": {}}
{"title": "welcome you"}
{"index": {}}
{"title": "we welcome you"}
{"index": {}}
{"title": "welcome you to"}
{"index": {}}
{"title": "we welcome you to"}
'

Finally, the following query matches all four documents above:

curl -XPOST localhost:9200/startswith/test/_search -H 'Content-Type: application/json' -d '{
   "query": {
       "match": {"title": "welcome you"}
   }
}'

Note that this approach is more powerful than the match_phrase_prefix query, because it lets you match adjacent tokens anywhere in the text, whether at the beginning or the end.
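
As a rough way to reason about what the query matches, the sketch below models a match query as "the document shares at least one analyzed token with the query", which is the default OR behaviour. This is a simplified model, not how Elasticsearch actually builds or scores the query. The fifth document ("thank you") and the helper names match_any and match_shingle are hypothetical additions for illustration.

```python
def shingle_tokens(text, size=2):
    """Unigrams plus word shingles of the given size (whitespace split only)."""
    words = text.lower().split()
    unigrams = set(words)
    shingles = {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}
    return unigrams | shingles


# The four sample documents from the answer, plus one hypothetical extra.
DOCS = [
    "welcome you",
    "we welcome you",
    "welcome you to",
    "we welcome you to",
    "thank you",  # hypothetical: not in the original sample data
]


def match_any(query, docs=DOCS):
    """Simplified OR matching: any shared token counts as a hit."""
    query_tokens = shingle_tokens(query)
    return [doc for doc in docs if query_tokens & shingle_tokens(doc)]


def match_shingle(query, docs=DOCS):
    """Stricter check: the whole query must exist as one indexed token."""
    needle = " ".join(query.lower().split())
    return [doc for doc in docs if needle in shingle_tokens(doc)]


if __name__ == "__main__":
    print(match_any("welcome you"))      # hits on unigrams as well as the shingle
    print(match_shingle("welcome you"))  # only documents containing the pair
    print(match_shingle("lcome you"))    # partial words are not indexed
```

Under this model, the plain OR match also returns the hypothetical "thank you" document because it shares the unigram "you", while requiring the "welcome you" shingle itself (roughly what a term query on that shingle would check) returns only the four original documents. It also suggests that partial words such as "lcome you" from the question are not covered by shingles alone, since shingles are built from whole tokens.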



Source: https://stackoverflow.com/questions/43757857/how-to-match-a-phrase-in-elastic-search-with-expandable-prefix-and-suffix
