How to find out result of elasticsearch parsing a query_string?


Is there a way to find out via the elasticsearch API how a query string query is actually parsed? You can do that manually by looking at the lucene query syntax, but it woul


半阙折子戏

2021-01-05 04:08

As javanna mentioned in the comments, there is the _validate API. Here is what works on my local Elasticsearch (version 1.6):

curl -XGET 'http://localhost:9201/pl/_validate/query?explain&pretty' -d'
{
  "query": {
    "query_string": {
      "query": "a OR (b AND c) OR (d AND NOT(e or f))",
      "default_field": "t"
    }
  }
}
'

pl is the name of an index on my cluster. Different indices can have different analyzers, which is why query validation is executed in the scope of an index.

The result of the above curl command is the following:

{
  "valid" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "explanations" : [ {
    "index" : "pl",
    "valid" : true,
    "explanation" : "filtered(t:a (+t:b +t:c) (+t:d -(t:e t:or t:f)))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@ce2d82f1)"
  } ]
}

I made one OR lowercase on purpose and, as you can see in the explanation, it is interpreted as a token and not as an operator.

As for interpreting the explanation, the format is similar to the + and - operators of the query string query:

( and ) characters start and end a bool query
a + prefix means the clause goes into must
a - prefix means the clause goes into must_not
no prefix means the clause goes into should (with default_operator equal to OR)

So the query above is equivalent to the following:

{
  "bool" : {
    "should" : [
      {
        "term" : { "t" : "a" }
      },
      {
        "bool": {
          "must": [
            {
              "term" : { "t" : "b" }
            },
            {
              "term" : { "t" : "c" }
            }
          ]
        }
      },
      {
        "bool": {
          "must": {
              "term" : { "t" : "d" }
          },
          "must_not": {
            "bool": {
              "should": [
                {
                  "term" : { "t" : "e" }
                },
                {
                  "term" : { "t" : "or" }
                },
                {
                  "term" : { "t" : "f" }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
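
The prefix rules can also be applied mechanically. The following is a minimal Python sketch, not part of Elasticsearch, that turns the inner part of an explanation string into a bool query. The function names parse_clauses and explanation_to_bool are hypothetical, and the sketch assumes the input contains only field:value terms, parentheses and +/- prefixes, with the surrounding filtered(...)->cache(...) wrapper already removed by hand. Phrases, boosts and ranges are not handled.

```python
import json


def parse_clauses(text, pos=0):
    """Parse clauses until a closing parenthesis or the end of the input.

    Returns a (bool_query, next_position) pair.
    """
    groups = {"must": [], "must_not": [], "should": []}
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        if ch == ")":
            pos += 1  # consume the parenthesis that closes this group
            break
        # No prefix means "should"; + means "must"; - means "must_not".
        occur = "should"
        if ch == "+":
            occur, pos = "must", pos + 1
        elif ch == "-":
            occur, pos = "must_not", pos + 1
        if text[pos:pos + 1] == "(":
            # A parenthesised group becomes a nested bool query.
            clause, pos = parse_clauses(text, pos + 1)
        else:
            # Read a field:value term up to whitespace or ")".
            end = pos
            while end < len(text) and not text[end].isspace() and text[end] != ")":
                end += 1
            field, _, value = text[pos:end].partition(":")
            clause, pos = {"term": {field: value}}, end
        groups[occur].append(clause)
    # Drop empty groups so the result stays compact.
    return {"bool": {k: v for k, v in groups.items() if v}}, pos


def explanation_to_bool(explanation):
    """Convert an explanation such as 't:a (+t:b +t:c)' into a bool query."""
    query, _ = parse_clauses(explanation)
    return query


result = explanation_to_bool("t:a (+t:b +t:c) (+t:d -(t:e t:or t:f))")
print(json.dumps(result, indent=2))
```

The sketch always emits lists for must, must_not and should, so its output differs cosmetically from the hand-written JSON above, where single clauses are written as plain objects.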

I have used the _validate API quite heavily to debug complex filtered queries with many conditions. It is especially useful if you want to check how an analyzer tokenized input such as a URL, or whether some filter is cached.

There is also a very useful parameter, rewrite, that I was not aware of until now. It makes the explanation even more detailed, showing the actual Lucene query that will be executed.
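
For scripting such checks, the request can be assembled in Python. This is a minimal sketch using only the standard library. The helper name build_validate_request is hypothetical, the host, port and index are the ones from the curl example, and passing the flags as explain=true and rewrite=true is an assumption about how your Elasticsearch version expects them. The request is only built here, not sent. Sending it needs a running cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_validate_request(base_url, index, query, default_field, rewrite=False):
    """Build (but do not send) a _validate/query request for a query_string."""
    params = {"explain": "true"}
    if rewrite:
        # Assumed spelling of the rewrite flag; check the docs for your version.
        params["rewrite"] = "true"
    url = f"{base_url}/{index}/_validate/query?{urlencode(params)}"
    body = {
        "query": {
            "query_string": {"query": query, "default_field": default_field}
        }
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


request = build_validate_request(
    "http://localhost:9201", "pl",
    "a OR (b AND c) OR (d AND NOT(e or f))", "t",
    rewrite=True,
)
print(request.full_url)
# To send it against a live cluster: urllib.request.urlopen(request)
```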
