Elasticsearch using shingle filter with synonym

佐手、 submitted on 2021-02-10 05:55:12

Question


I have the following documents:

  • south africa
  • north africa

I want to retrieve my "south africa" document from:

  • s africa (a)
  • southafrica (b)
  • safrica (c)

I defined the following filters and analyzers:

PUT test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "south,s",
            "north,n"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter"]
        },
        "my_shingle_synonym": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter", "synonym_filter"]
        },
        "my_synonym_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["synonym_filter", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {}
}

1) With my_shingle, "south africa" is indexed as: south, southafrica, africa

2) With my_shingle_synonym, "south africa" is indexed as: south, s, southafrica, africa

3) With my_synonym_shingle, "south africa" is indexed as: south, souths, southsafrica, s, safrica, africa
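To make the three token lists above easier to reason about, here is a simplified Python model of the filter chains. It is a hypothetical sketch, not Lucene's implementation: the standard tokenizer is approximated by whitespace splitting, token positions are ignored, and stacked synonyms are flattened into the token stream. That last point is an assumption, chosen because it reproduces the outputs reported above. The helper names (tokenize, synonym_filter, shingle_filter, ANALYZERS) are illustrative only.

```python
# Hypothetical, simplified model of the analyzers in the question.
# Not Lucene's implementation: whitespace tokenizer, positions ignored,
# stacked synonyms flattened into the token stream (an assumption).

SYNONYMS = [["south", "s"], ["north", "n"]]  # mirrors "synonym_filter"


def tokenize(text):
    """Approximate the standard tokenizer for simple lowercase input."""
    return text.split()


def synonym_filter(tokens):
    """Emit each token followed by its synonyms (expand-style rules)."""
    out = []
    for tok in tokens:
        out.append(tok)
        for group in SYNONYMS:
            if tok in group:
                out.extend(t for t in group if t != tok)
    return out


def shingle_filter(tokens, min_size=2, max_size=3, sep=""):
    """Emit each unigram, then the shingles of min_size..max_size starting at it.

    Synonyms are treated as ordinary consecutive tokens here, which is what
    yields "souths" and "southsafrica" in the third analyzer.
    """
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out


ANALYZERS = {
    "my_shingle": lambda text: shingle_filter(tokenize(text)),
    "my_shingle_synonym": lambda text: synonym_filter(shingle_filter(tokenize(text))),
    "my_synonym_shingle": lambda text: shingle_filter(synonym_filter(tokenize(text))),
}

if __name__ == "__main__":
    for name, analyze in ANALYZERS.items():
        print(name, analyze("south africa"))
```

In this model the only difference between the last two analyzers is filter order. When synonyms run first, the shingle filter sees "s" as simply the next word after "south", which produces "souths" and "southsafrica". When shingles run first, "southafrica" is built before "s" exists, so "safrica" is never produced.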

So:

  • with (1) I find b
  • with (2) I find a and b
  • with (3) I find a and c

I want "south africa" to be indexed as: south, s, southafrica, safrica, africa


Answer 1:


You do not need a single analyzer that outputs every token in your list. Instead, you can index the same field with different analyzers by using multi-fields.

Define the mapping of your field like this (note that "string" is the pre-5.0 field type; on Elasticsearch 5.x and later, use "text" instead):

"mappings": {
  "your_mapping": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "my_shingle",
        "fields": {
          "synonym": {
            "type": "string",
            "analyzer": "my_synonym_shingle"
          }
        }
      }
    }
  }
}

Sample document to index:

PUT test_index/your_mapping/1
{
  "name" : "south africa"
}

Then query all variants of the name field at once by using a wildcard in the field list:

GET test_index/your_mapping/_search
{
  "query": {
    "query_string": {
      "fields": [
        "name*"
      ],
      "query": "safrica"
    }
  }
}
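A rough way to check that the two sub-fields together cover the desired token list is to look at which field path holds each term. This is a self-contained, simplified model (whitespace tokenizer, positions ignored, synonyms flattened into the stream, hypothetical helper names such as index_name_field), not Elasticsearch's implementation. It only inspects indexed terms and does not model query-time analysis, scoring, or how query_string expands "name*".

```python
# Hypothetical, simplified model of the "name" multi-field from the answer.
# Not Elasticsearch's implementation: whitespace tokenizer, positions ignored,
# synonyms flattened into the token stream (an assumption).

SYNONYMS = [["south", "s"], ["north", "n"]]


def synonym_filter(tokens):
    """Emit each token followed by its synonyms."""
    out = []
    for tok in tokens:
        out.append(tok)
        for group in SYNONYMS:
            if tok in group:
                out.extend(t for t in group if t != tok)
    return out


def shingle_filter(tokens, min_size=2, max_size=3):
    """Emit each unigram followed by the shingles that start at it."""
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append("".join(tokens[i:i + size]))
    return out


def index_name_field(text):
    """Return the terms indexed for each path of the multi-field."""
    tokens = text.split()
    return {
        "name": shingle_filter(tokens),                          # my_shingle
        "name.synonym": shingle_filter(synonym_filter(tokens)),  # my_synonym_shingle
    }


def fields_containing(term, fields):
    """List the field paths whose indexed terms include `term`."""
    return [path for path, terms in fields.items() if term in terms]


if __name__ == "__main__":
    fields = index_name_field("south africa")
    for term in ["south", "s", "southafrica", "safrica", "africa"]:
        print(term, fields_containing(term, fields))
```

In this model every desired term lands in at least one sub-field: "southafrica" only in name, "s" and "safrica" only in name.synonym, and "south" and "africa" in both. That is why querying "name*" can reach all three query variants without one analyzer producing every token.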


Source: https://stackoverflow.com/questions/40681178/elasticsearch-using-shingle-filter-with-synonym
