What is the best practice of fuzzy search (like '%aaa%' in MySQL) in Elasticsearch 6.8

一曲冷凌霜 submitted on 2021-02-11 13:41:32


Background: I use MySQL with millions of rows, each with twenty columns. We run some complex searches, and some columns need fuzzy matching, such as username LIKE '%aaa%'. MySQL cannot use an index for this unless the leading % is removed, but we need this kind of matching to offer a search similar to Stack Overflow's. I also looked at MySQL's full-text index, but it does not support complex searches within a single SQL statement when other indexes are used.

My solution: add Elasticsearch as our search engine, write data to both MySQL and Elasticsearch, and search only in Elasticsearch.

I looked into fuzzy search in Elasticsearch. A wildcard query works, but many people advise against putting * at the beginning of the pattern because it makes the search very slow.
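
To illustrate why a leading wildcard is costly, here is a simplified Python model. It is not how Lucene actually stores terms; the term list and both helper functions are hypothetical. The point is only that a prefix pattern can jump into a sorted term list with a binary search, while an infix pattern such as *hn* has to inspect every term.

```python
from bisect import bisect_left

# Hypothetical sorted term dictionary, standing in for the terms of an index.
terms = sorted(["Arya_Stark", "John_Snow", "Johnny", "Sansa_Stark", "Tyrion"])

def prefix_lookup(prefix):
    """Model of 'John*': binary-search to the first candidate, then read neighbours."""
    start = bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order lets the scan stop early
        matches.append(term)
    return matches

def infix_scan(fragment):
    """Model of '*hn*': ordering does not help, so every term is inspected."""
    return [term for term in terms if fragment in term]

prefix_matches = prefix_lookup("John")
infix_matches = infix_scan("hn")
print(prefix_matches)
print(infix_matches)
```

Both calls find the same two usernames here, but the infix scan touches all terms to do so, which is what becomes expensive with millions of distinct values.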

For example: username: 'John_Snow'

wildcard works but may be very slow

GET /user/_search
{
  "query": {
    "wildcard": {
      "username": "*hn*"
    }
  }
}

match_phrase doesn't work; it seems to work only on tokenized phrases like 'John Snow'

{
  "query": { "match_phrase": { "dbName": "hn" } }
}

My question: Is there a better way to run complex queries that include a fuzzy match like '%no%' or '%hn_Sn%'?


You can use the ngram tokenizer, which first breaks text down into words whenever it encounters one of a list of specified characters, then emits N-grams of each word of the specified length.
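
As a rough illustration of what the ngram tokenizer produces, the following Python sketch emulates it outside Elasticsearch. The function name ngram_tokens is hypothetical, min_gram=2 and max_gram=10 are taken from the mapping in this answer, and the sketch assumes that only letters and digits count as token characters, so any other character (such as the underscore) splits the text into words.

```python
import re

def ngram_tokens(text, min_gram=2, max_gram=10):
    """Split text into words, then emit every n-gram of each word."""
    tokens = []
    # Assumption: token characters are letters and digits only;
    # [^\W_]+ matches runs of word characters excluding the underscore.
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break  # n-gram would run past the end of the word
                tokens.append(word[start:start + size])
    return tokens

tokens = ngram_tokens("John_Snow")
print(tokens)
```

Under these assumptions, "John_Snow" is first split into "John" and "Snow", and each word contributes its own n-grams; no n-gram spans the underscore.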

Adding a working example with index data, mapping, search query, and results.

Index Mapping:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "my_tokenizer"
                }
            },
            "tokenizer": {
                "my_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": [ ... ]
                }
            }
        },
        "max_ngram_diff": 50
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "my_analyzer",
                "search_analyzer": "standard"
            }
        }
    }
}

Analyze API

POST /test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "John_Snow"
}

The tokens are :

{
    "tokens": [
        { "token": "Jo",   "start_offset": 0, "end_offset": 2, "type": "word", "position": 0 },
        { "token": "Joh",  "start_offset": 0, "end_offset": 3, "type": "word", "position": 1 },
        { "token": "John", "start_offset": 0, "end_offset": 4, "type": "word", "position": 2 },
        { "token": "oh",   "start_offset": 1, "end_offset": 3, "type": "word", "position": 3 },
        { "token": "ohn",  "start_offset": 1, "end_offset": 4, "type": "word", "position": 4 },
        { "token": "hn",   "start_offset": 2, "end_offset": 4, "type": "word", "position": 5 },
        { "token": "Sn",   "start_offset": 5, "end_offset": 7, "type": "word", "position": 6 },
        { "token": "Sno",  "start_offset": 5, "end_offset": 8, "type": "word", "position": 7 },
        { "token": "Snow", "start_offset": 5, "end_offset": 9, "type": "word", "position": 8 },
        { "token": "no",   "start_offset": 6, "end_offset": 8, "type": "word", "position": 9 },
        { "token": "now",  "start_offset": 6, "end_offset": 9, "type": "word", "position": 10 },
        { "token": "ow",   "start_offset": 7, "end_offset": 9, "type": "word", "position": 11 }
    ]
}

Index Data:

{
    "title": "John_Snow"
}


Search Query:

{
    "query": {
        "match": { "title": "hn" }
    }
}

Search Result:

"hits": [
    {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
            "title": "John_Snow"
        }
    }
]

Refer to this blog if you want to do an autocomplete search.

Another search query

{
    "query": {
        "match": { "title": "ohr" }
    }
}

The above search query returns no results, because "ohr" is not one of the n-grams indexed for John_Snow.
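
The difference between the two queries can be reproduced with a small Python model. This is a sketch under stated assumptions, not Elasticsearch itself: index_terms and matches are hypothetical helpers, only letters and digits are assumed to be token characters, and the search side is modelled as a single lowercased term, which is what the standard analyzer would produce for a one-word query.

```python
import re

def index_terms(text, min_gram=2, max_gram=10):
    """Set of n-grams stored for a document (index-time analysis)."""
    terms = set()
    # Assumption: anything that is not a letter or digit splits the text.
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            longest = min(start + max_gram, len(word))
            for end in range(start + min_gram, longest + 1):
                terms.add(word[start:end])
    return terms

def matches(query, text):
    """Search-time: the query is one lowercased term, not split into n-grams."""
    return query.lower() in index_terms(text)

hit_hn = matches("hn", "John_Snow")
hit_ohr = matches("ohr", "John_Snow")
print(hit_hn, hit_ohr)
```

In this model "hn" is found because it is one of the stored n-grams, while "ohr" is not a substring of either word and therefore matches nothing. Since the index-side analyzer shown above has no lowercase filter, a mixed-case query such as "Sn" would also fail to match in this model; adding a lowercase filter to the custom analyzer is the usual way to get case-insensitive matching.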

