How to configure the synonyms_path in Elasticsearch

渐次进展 2021-02-07 08:57

I'm pretty new to Elasticsearch and I want to use synonyms. I added these lines to the configuration file:

index :
    analysis :
        analyzer : 
                   


        
1 answer
  • 2021-02-07 09:52

    I don't know whether your problem comes from a bad synonym definition for "bar". Since you said you are pretty new, here is an example similar to yours that works. It shows how Elasticsearch deals with synonyms at search time and at index time. Hope it helps.

    First, create the synonym file:

    foo => foo bar, baz
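
    To make the rule format concrete, here is a minimal Python sketch of how such an explicit mapping line can be read: everything left of `=>` is replaced by the comma-separated alternatives on the right, and an alternative may contain several words. The function name `parse_synonym_rule` is hypothetical, and the sketch ignores comments, escaping and comma-only equivalence lines; it is not how Elasticsearch itself parses the file.

```python
def parse_synonym_rule(line):
    """Parse one explicit mapping such as 'foo => foo bar, baz'.

    Returns (sources, replacements); each replacement is a list of tokens.
    Simplified illustration only, not the real Elasticsearch parser.
    """
    left, right = line.split("=>")
    # Terms on the left are the ones that get replaced.
    sources = [term.strip() for term in left.split(",")]
    # Each comma-separated alternative on the right may be multi-word.
    replacements = [alternative.split() for alternative in right.split(",")]
    return sources, replacements


sources, replacements = parse_synonym_rule("foo => foo bar, baz")
print(sources)       # ['foo']
print(replacements)  # [['foo', 'bar'], ['baz']]
```

    So in this file "foo" maps to the two-word alternative "foo bar" and to the single word "baz".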
    

    Now I create the index with the particular settings you are trying to test:

    curl -XPUT 'http://localhost:9200/test/' -d '{
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "synonym": {
                "tokenizer": "whitespace",
                "filter": ["synonym"]
              }
            },
            "filter" : {
              "synonym" : {
                  "type" : "synonym",
                  "synonyms_path" : "synonyms.txt"
              }
            }
          }
        }
      },
      "mappings": {
    
        "test" : {
          "properties" : {
            "text_1" : {
               "type" : "string",
               "analyzer" : "synonym"
            },
            "text_2" : {
               "search_analyzer" : "standard",
               "index_analyzer" : "standard",
               "type" : "string"
            },
            "text_3" : {
               "type" : "string",
               "search_analyzer" : "synonym",
               "index_analyzer" : "standard"
            }
          }
        }
      }
    }'
    

    Note that synonyms.txt must be in the same directory as the configuration file, since that path is relative to the config directory.
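
    As a small illustration of that placement, the following Python sketch writes the rule file next to the configuration file. The helper name `write_synonyms` is hypothetical, and a temporary directory stands in for the real config directory, whose location depends on your installation.

```python
import tempfile
from pathlib import Path


def write_synonyms(config_dir, rules, filename="synonyms.txt"):
    """Write one rule per line into <config_dir>/<filename> and return the path."""
    path = Path(config_dir) / filename
    path.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return path


# Assumption: a temporary directory stands in for the Elasticsearch config directory.
config_dir = tempfile.mkdtemp()
path = write_synonyms(config_dir, ["foo => foo bar, baz"])
print(path.name)  # synonyms.txt
```

    With the file placed this way, the value "synonyms.txt" in the filter settings resolves against the config directory.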

    Now index a doc:

    curl -XPUT 'http://localhost:9200/test/test/1' -d '{
      "text_3": "baz dog cat",
      "text_2": "foo dog cat",
      "text_1": "foo dog cat"
    }'
    

    Now the searches

    Searching in field text_1

    curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'

    result:

    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
          {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
              "text_3": "baz dog cat",
              "text_2": "foo dog cat",
              "text_1": "foo dog cat"
            }
          }
        ]
      }
    }
    

    You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.

    Searching in field text_2

    curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'
    

    result:

    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
      }
    }
    

    I get no hits because synonyms were not expanded while indexing (standard analyzer), and the standard search analyzer does not expand them either. Since I'm searching for baz and baz is not in the text, there is no result.

    Searching in field text_3

    curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'
    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
          {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
              "text_3": "baz dog cat",
              "text_2": "foo dog cat",
              "text_1": "foo dog cat"
            }
          }
        ]
      }
    }
    

    Note: text_3 is "baz dog cat"

    text_3 was indexed without expanding synonyms. Since I'm searching for foo, which has "baz" as one of its synonyms, the query is expanded at search time and I get the result.
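
    The three searches can be summarized with a rough Python simulation. It is only a sketch under loud assumptions: the `standard` and `synonym` functions are crude stand-ins for the real analyzers (whitespace splitting, no token positions, no scoring), and a "hit" simply means the indexed tokens and the query tokens overlap.

```python
# Flattened form of the rule "foo => foo bar, baz" (token positions are ignored here).
SYNONYMS = {"foo": ["foo", "bar", "baz"]}


def standard(text):
    """Stand-in for the standard analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def synonym(text):
    """Stand-in for the custom analyzer: whitespace tokens plus synonym expansion."""
    tokens = set()
    for token in text.split():
        tokens.update(SYNONYMS.get(token, [token]))
    return tokens


# (index analyzer, search analyzer) per field, mirroring the mapping above.
FIELDS = {
    "text_1": (synonym, synonym),
    "text_2": (standard, standard),
    "text_3": (standard, synonym),
}
DOC = {"text_1": "foo dog cat", "text_2": "foo dog cat", "text_3": "baz dog cat"}


def matches(field, query):
    """True when any query token is among the tokens indexed for the field."""
    index_analyzer, search_analyzer = FIELDS[field]
    return bool(index_analyzer(DOC[field]) & search_analyzer(query))


print(matches("text_1", "baz"))  # True: foo was expanded at index time
print(matches("text_2", "baz"))  # False: no expansion on either side
print(matches("text_3", "foo"))  # True: foo is expanded at search time
```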

    If you want to debug, you can use the _analyze endpoint, for example:

    curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'
    

    result:

    {
      "tokens": [
        {
          "token": "foo",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 1
        },
        {
          "token": "baz",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 1
        },
        {
          "token": "bar",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 2
        }
      ]
    }
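
    The positions in this output follow from the rule itself: every alternative starts at the position of the original token, and the second word of a multi-word alternative lands on the next position. A minimal Python sketch of that bookkeeping, with the hypothetical helper `expand` and positions starting at 1 as in the output above:

```python
def expand(replacements, start_position=1):
    """Assign positions to the tokens of each alternative.

    Every alternative starts at start_position; later words in a
    multi-word alternative take the following positions.
    """
    tokens = []
    for alternative in replacements:
        for offset, term in enumerate(alternative):
            tokens.append((term, start_position + offset))
    # Stable sort by position keeps the rule order within each position.
    return sorted(tokens, key=lambda token: token[1])


# Alternatives from the rule "foo => foo bar, baz".
print(expand([["foo", "bar"], ["baz"]]))
# [('foo', 1), ('baz', 1), ('bar', 2)]
```

    That is why foo and baz share position 1 while bar sits at position 2.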
    