Return sets of keywords derived from fields in ElasticSearch

Submitted by 跟風遠走 on 2021-02-08 11:57:30

Question


I'm fairly new to this and need some help; I looked online but couldn't find the answer I'm looking for. Basically, I'm trying to build autocomplete based on keywords derived from some text fields.

Here is an example of the documents in my index:

"name": "One liter of Chocolate Milk"
"name": "Milo Milk 250g"
"name": "HiLow low fat milk"
"name": "Yoghurt strawberry"
"name": "Milk Nutrisoy"

So when I type in "mi", I'm expecting to get results like:

"milk"
"milo"
"milo milk"
"chocolate milk" 
etc

A very good example is the autocomplete on aliexpress.com.

Thanks in advance


Answer 1:


That seems like a good use case for the shingle token filter, which emits word n-grams (here, pairs of adjacent words) alongside the single words:

curl -XPUT localhost:9200/your_index -d '{
  "settings": {
      "analysis": {
        "analyzer": {
          "my_shingles": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "shingles"
            ]
          }
        },
        "filter": {
          "shingles": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 2,
            "output_unigrams": true
          }
        }
      }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "field": {
          "type": "string",
          "analyzer": "my_shingles"
        }
      }
    }
  }
}'

If you analyze Milo Milk 250g with this analyzer, you'll get the following tokens:

curl -XGET 'localhost:9200/your_index/_analyze?analyzer=my_shingles&pretty' -d 'Milo Milk 250g'

{
  "tokens" : [ {
    "token" : "milo",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "milo milk",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "shingle",
    "position" : 0
  }, {
    "token" : "milk",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "milk 250g",
    "start_offset" : 5,
    "end_offset" : 14,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "250g",
    "start_offset" : 10,
    "end_offset" : 14,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

So when searching for mi, the tokens of this document that start with that prefix are:

  • milo
  • milo milk
  • milk
  • milk 250g
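
To make the idea concrete, here is a minimal pure-Python sketch of what the analyzer above does and how prefix matching over its tokens would behave across all five example names. It is only an approximation for illustration, not Elasticsearch's implementation: the regex stands in for the standard tokenizer, and the function names `analyze` and `suggest` are made up for this example.

```python
import re

def analyze(text, min_size=2, max_size=2, output_unigrams=True):
    """Approximate the my_shingles analyzer: tokenize, lowercase, emit shingles."""
    # Crude stand-in for the standard tokenizer: runs of letters and digits.
    words = [w.lower() for w in re.findall(r"\w+", text)]
    tokens = []
    for i in range(len(words)):
        if output_unigrams:
            tokens.append(words[i])
        # Emit every shingle of the allowed sizes that starts at position i.
        for size in range(min_size, max_size + 1):
            if i + size <= len(words):
                tokens.append(" ".join(words[i:i + size]))
    return tokens

def suggest(names, prefix):
    """Collect distinct tokens, in first-seen order, that start with the prefix."""
    prefix = prefix.lower()
    seen = []
    for name in names:
        for token in analyze(name):
            if token.startswith(prefix) and token not in seen:
                seen.append(token)
    return seen

names = [
    "One liter of Chocolate Milk",
    "Milo Milk 250g",
    "HiLow low fat milk",
    "Yoghurt strawberry",
    "Milk Nutrisoy",
]

print(analyze("Milo Milk 250g"))
# ['milo', 'milo milk', 'milk', 'milk 250g', '250g']
print(suggest(names, "mi"))
# ['milk', 'milo', 'milo milk', 'milk 250g', 'milk nutrisoy']
```

Note that with plain prefix matching on the tokens, "chocolate milk" is not returned for "mi", because that shingle starts with "chocolate". Getting it would require matching the prefix against any word inside a shingle rather than only its beginning.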


Source: https://stackoverflow.com/questions/42990382/return-sets-of-keywords-derived-from-fields-in-elasticsearch
