How to search on Elasticsearch for words with or without an apostrophe, and deal with spelling mistakes?

Posted by 会有一股神秘感 on 2021-02-11 13:38:48

Question


I'm trying to move my full-text search logic from MySQL to Elasticsearch. In MySQL, to find all rows containing the word "woman" I would just write:

SELECT b.code
FROM BIBLE b 
WHERE ((b.DISPLAY_NAME LIKE '%woman%')
 OR (b.BRAND LIKE '%woman%')
 OR (b.DESCRIPTION LIKE '%woman%'));

On Elasticsearch I tried something similar:

curl -X GET "localhost:9200/bible/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "WOMAN",
      "fields": ["description", "display_name", "brand"]
    }
  },
  "sort": { "code": { "order": "asc" } },
  "_source": ["code"]
}
'

but it didn't return the same count. On further checking I found that words like "woman's" were matched by MySQL but not by Elasticsearch. How do I solve this?

AND

How do I also handle searches with spelling mistakes, or match words that are phonetically the same?


Answer 1:


First, what does your mapping look like? Are you using any tokenizer? If not, and you want wildcard-style searching, I would suggest the ngram tokenizer. It is mostly used for partial matches.

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
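To see how an ngram tokenizer handles a word like "woman's", you can test it directly with the `_analyze` API. This is a quick sketch with illustrative `min_gram`/`max_gram` values; adjust them to your needs:

```
POST _analyze
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 5,
    "token_chars": ["letter", "digit"]
  },
  "text": "woman's"
}
```

Because `token_chars` is restricted to letters and digits, the apostrophe acts as a boundary, so the word is split and ngrams such as "wom", "woman", and "man" are emitted. A query for "woman" will then match documents containing "woman's".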




Answer 2:


In Elasticsearch, you have to define the mapping for the fields before indexing the data. The mapping tells Elasticsearch how to index the data so that it can be retrieved the way you want.

Try the below DSL query (JSON format) for creating custom analyzer and mapping:

PUT {YOUR_INDEX_NAME}
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "max_ngram_diff": 20
  },
  "mappings": {
    "properties": {
      "code": { "type": "long" },
      "description": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "display_name": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "brand": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

(The max_ngram_diff setting is needed for Elasticsearch v6 and above, because the difference between max_gram and min_gram here exceeds the default limit.)

Sample query:

GET {YOUR_INDEX_NAME}/_search
{
  "query": {
    "multi_match" : {
      "query" : "women",
      "fields" : [ "description^3", "display_name", "brand" ] 
    }
  }
}

I suggest you take a look at the fuzzy query for spelling mistakes.
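As a sketch of how that might look (not tested against your data), you can add a fuzziness parameter to the same multi_match query:

```
GET {YOUR_INDEX_NAME}/_search
{
  "query": {
    "multi_match": {
      "query": "womam",
      "fields": ["description", "display_name", "brand"],
      "fuzziness": "AUTO"
    }
  }
}
```

With "fuzziness": "AUTO", Elasticsearch permits an edit distance that scales with the term length, so a typo like "womam" can still match documents containing "woman".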

Try using the Kibana UI to test the index with DSL queries instead of cURL; it will save you time.

Hope this helps.



Source: https://stackoverflow.com/questions/55770760/how-to-search-on-elasticsearch-for-words-with-or-without-apostrophe-and-deal-w
