How to search on Elasticsearch for words with or without an apostrophe, and deal with spelling mistakes?

Posted by 会有一股神秘感 on 2021-02-11 13:38:48

Question


I'm trying to move my full-text search logic from MySQL to Elasticsearch. In MySQL, to find all rows containing the word "woman" I would just write:

SELECT b.code
FROM BIBLE b 
WHERE ((b.DISPLAY_NAME LIKE '%woman%')
 OR (b.BRAND LIKE '%woman%')
 OR (b.DESCRIPTION LIKE '%woman%'));

On Elasticsearch I tried something similar:

curl -X GET "localhost:9200/bible/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "WOMAN",
      "fields": ["description", "display_name", "brand"]
    }
  },
  "sort": { "code": { "order": "asc" } },
  "_source": ["code"]
}
'

but it didn't return the same count. On further checking I found that words like "woman's" were matched by MySQL but not by Elasticsearch. How do I solve this?

AND

How do I also handle searches with spelling mistakes, or match words that are phonetically the same?


Answer 1:


First, what does your mapping look like? Are you using any tokenizer? If not, and you want wildcard-style searching, I would suggest the ngram tokenizer. It is mostly used for partial matches.

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
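To see how an ngram tokenizer handles a word like "woman's", you can test it directly with the `_analyze` API. This is a quick sketch with illustrative `min_gram`/`max_gram` values; adjust them to your needs:

```
POST _analyze
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 5,
    "token_chars": ["letter", "digit"]
  },
  "text": "woman's"
}
```

Because `token_chars` is restricted to letters and digits, the apostrophe acts as a boundary, so the word is split and ngrams such as "wom", "woman", and "man" are emitted. A query for "woman" will then match documents containing "woman's".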




Answer 2:


In Elasticsearch, you have to define the mapping for the fields before indexing the data. The mapping tells Elasticsearch how to index the data so that it can be retrieved the way you want.

Try the below DSL query (JSON format) for creating custom analyzer and mapping:

PUT {YOUR_INDEX_NAME}
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "max_ngram_diff": 20
  },
  "mappings": {
    "properties": {
      "code": { "type": "long" },
      "description": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "display_name": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "brand": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

(The max_ngram_diff setting is needed for Elasticsearch v6 and above, because the difference between max_gram and min_gram here exceeds the default limit.)

Sample query:

GET {YOUR_INDEX_NAME}/_search
{
  "query": {
    "multi_match" : {
      "query" : "women",
      "fields" : [ "description^3", "display_name", "brand" ] 
    }
  }
}

I suggest you take a look at the fuzzy query for spelling mistakes.
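As a sketch of how that might look (not tested against your data), you can add a fuzziness parameter to the same multi_match query:

```
GET {YOUR_INDEX_NAME}/_search
{
  "query": {
    "multi_match": {
      "query": "womam",
      "fields": ["description", "display_name", "brand"],
      "fuzziness": "AUTO"
    }
  }
}
```

With "fuzziness": "AUTO", Elasticsearch permits an edit distance that scales with the term length, so a typo like "womam" can still match documents containing "woman".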

Try using the Kibana UI to test the index with DSL queries instead of cURL; it will save you time.

Hope this helps.



Source: https://stackoverflow.com/questions/55770760/how-to-search-on-elasticsearch-for-words-with-or-without-apostrophe-and-deal-w
