ElasticSearch - Searching with hyphens

离开以前 2020-12-16 02:17

Elasticsearch 1.6

I want to index text that contains hyphens, for example U-12, U-17, WU-12, t-shirt... and to be able to use a "Simple Query String" query to sea

3 Answers
  • 2020-12-16 03:11

    The quote from Igor Motov is correct: you have to add "analyze_wildcard": true to make wildcard queries work. But it is important to note that the hyphen causes "u-12" to be tokenized into "u" and "12", two separate words.

    If preserving the original text is important, do not use a mapping char filter. Otherwise it can be useful.
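
    For reference, a mapping char filter of the kind mentioned here might be configured roughly as in the sketch below, which builds the index settings as a Python dict. The names `hyphen_to_underscore` and `hyphen_keeping` are hypothetical, and the sketch assumes the standard tokenizer does not split on underscores, so "u-12" would be indexed as the single token "u_12".

```python
import json


def build_index_settings():
    """Return index settings with a mapping char filter (illustrative names)."""
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    # Hypothetical filter name; rewrites "-" to "_" before tokenizing.
                    "hyphen_to_underscore": {
                        "type": "mapping",
                        "mappings": ["-=>_"],
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name; assign it to the fields you search.
                    "hyphen_keeping": {
                        "type": "custom",
                        "char_filter": ["hyphen_to_underscore"],
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent when creating the index.
    print(json.dumps(build_index_settings(), indent=2))
```

    Because the analyzer runs at both index and search time, the stored document is unchanged; only the indexed tokens differ.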

    Imagine that you have "m0-77", "m1-77" and "m2-77". If you search for m*-77 you will get zero hits. However, you can replace the "-" (hyphen) with AND to connect the two separate words and then search for m* AND 77, which gives you the correct hits.

    You can do this on the client side.

    In your case, for u-*:

    {
      "query":{
        "simple_query_string":{
          "query":"u AND 1*",
          "analyze_wildcard":true
        }
      }
    }
    

    t-sh*

    {
      "query":{
        "simple_query_string":{
          "query":"t AND sh*",
          "analyze_wildcard":true
        }
      }
    }
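
    The client-side rewrite described in this answer could be sketched in Python roughly as follows. The function names `rewrite_hyphens` and `build_search_body` are hypothetical. The `joiner` parameter is an assumption added for flexibility: it defaults to the AND used in this answer, but the simple_query_string documentation lists `+` as its AND operator (AND is query_string syntax), so it may be worth trying `joiner=" + "` if AND is not treated as an operator on your version.

```python
import json
import re


def rewrite_hyphens(user_query, joiner=" AND "):
    """Split the user's input on hyphens and connect the pieces with `joiner`."""
    parts = [p for p in re.split(r"-+", user_query.strip()) if p]
    return joiner.join(parts)


def build_search_body(user_query, joiner=" AND "):
    """Build a simple_query_string request body from hyphenated user input."""
    return {
        "query": {
            "simple_query_string": {
                "query": rewrite_hyphens(user_query, joiner),
                "analyze_wildcard": True,
            }
        }
    }


if __name__ == "__main__":
    for text in ("m*-77", "u-1*", "t-sh*"):
        print(json.dumps(build_search_body(text)))
```

    Input without a hyphen passes through unchanged, so the same helper can wrap every search box value.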
    
  • 2020-12-16 03:11

    If anyone is still looking for a simple workaround for this issue: replace the hyphen with an underscore (_) when indexing data.

    For example, O-000022334 should be indexed as O_000022334.

    When searching, apply the same substitution to the search term, and replace the underscore with a hyphen again when displaying results. This way a search for "O-000022334" will find the correct match.
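
    A minimal sketch of this workaround in Python, with hypothetical helper names. It assumes the search term is normalized the same way as the indexed value, and that the original values never contain underscores of their own (otherwise the display step would turn those into hyphens too).

```python
def to_indexed(value):
    """Normalize a value (or a search term) before it is sent to Elasticsearch."""
    return value.replace("-", "_")


def to_display(value):
    """Restore hyphens before showing a stored value to the user.

    Caveat: underscores that were present in the original are also converted.
    """
    return value.replace("_", "-")


if __name__ == "__main__":
    stored = to_indexed("O-000022334")
    print(stored)
    print(to_display(stored))
```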

  • 2020-12-16 03:19

    The answer is really simple:

    Quote from Igor Motov: Configuring the standard tokenizer

    By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:

    {
      "_source":true,
      "query":{
        "simple_query_string":{
          "query":"u-1*",
          "analyze_wildcard":true,
          "default_operator":"AND"
        }
      }
    }
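
    To make the reasoning in the quote concrete, here is a toy model in Python. It is not Elasticsearch's implementation: `rough_tokens` is a crude, hypothetical stand-in for the standard analyzer (lowercase, split on anything that is not an ASCII letter, digit or underscore), and `analyzed_prefix_matches` only approximates what analyzing the wildcard term with an AND operator achieves.

```python
import re


def rough_tokens(text):
    """Crude approximation of the standard analyzer, for illustration only."""
    return [t for t in re.split(r"[^0-9a-z_]+", text.lower()) if t]


def raw_prefix_matches(pattern, text):
    """Without analysis: the raw prefix is compared against each indexed token."""
    prefix = pattern.rstrip("*").lower()
    return any(tok.startswith(prefix) for tok in rough_tokens(text))


def analyzed_prefix_matches(pattern, text):
    """With analysis: the pattern is tokenized too; all pieces must match,
    and only the last piece is treated as a prefix."""
    parts = rough_tokens(pattern.rstrip("*"))
    if not parts:
        return True
    tokens = rough_tokens(text)
    *whole, last = parts
    return all(w in tokens for w in whole) and any(
        t.startswith(last) for t in tokens
    )


if __name__ == "__main__":
    print(rough_tokens("i-mac"))
    print(raw_prefix_matches("i-ma*", "i-mac"))
    print(analyzed_prefix_matches("i-ma*", "i-mac"))
```

    In this model the raw prefix "i-ma" matches neither "i" nor "mac", while the analyzed form looks for "i" plus a token starting with "ma".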
    