Multi-field, multi-word, match without query_string

后端未结

关注

 4  1558

I would like to be able to match a multi word search against multiple fields where every word searched is contained in any of the fields, any combination. T

相关标签:

4条回答

-上瘾入骨i

2020-12-23 22:48

What you are looking for is the multi-match query, but it doesn't perform in quite the way you would like.

Compare the output of validate for multi_match vs query_string.

multi_match (with operator and) will make sure that ALL terms exist in at least one field:

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "multi_match" : {
      "operator" : "and",
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "((+lastname:john +lastname:smith) | (+firstname:john +firstname:smith))",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }

While query_string (with default_operator AND) will check that EACH term exists in at least one field:

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "query_string" : {
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith",
      "default_operator" : "AND"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "+(firstname:john | lastname:john) +(firstname:smith | lastname:smith)",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }

So you have a few choices to achieve what you are after:

Preparse the search terms, to remove things like wildcards, etc, before using the query_string
Preparse the search terms to extract each word, then generate a multi_match query per word
Use index_name in your mapping for the name fields to index their data into a single field, which you can then use for search. (like your own custom all field):

As follows:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "index_name" : "name",
               "type" : "string"
            },
            "lastname" : {
               "index_name" : "name",
               "type" : "string"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "VJFU_RWbRNaeHF9wNM8fRA",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 33
# }

Note however, that firstname and lastname are no longer searchable independently. The data for both fields has been indexed into name.

You could use multi-fields with the path parameter to make them searchable both independently and together, as follows:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "fields" : {
                  "firstname" : {
                     "type" : "string"
                  },
                  "any_name" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            },
            "lastname" : {
               "fields" : {
                  "any_name" : {
                     "type" : "string"
                  },
                  "lastname" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

Searching the any_name field works:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "any_name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 11
# }

Searching firstname for john AND smith doesn't work:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [],
#       "max_score" : null,
#       "total" : 0
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 2
# }

But searching firstname for just john works correctly:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.30685282,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.30685282,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 3
# }

0 讨论(0)

南笙

2020-12-23 22:48

I think "match" query is what you are looking for:

"The match family of queries does not go through a “query parsing” process. It does not support field name prefixes, wildcard characters, or other “advance” features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does)"

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html

0 讨论(0)
发布评论:

提交评论
- 加载中...
暗喜

2020-12-23 22:49
Nowadays you can use cross_fields type in multi_match
```
GET /_validate/query?explain
{
    "query": {
        "multi_match": {
            "query":       "peter smith",
            "type":        "cross_fields", 
            "operator":    "and",
            "fields":      [ "firstname", "lastname", "middlename" ]
        }
    }
}
```
Cross-fields take a term-centric approach. It treats all of the fields as one big field, and looks for each term in any field.

One thing to note though is that if you want it to work optimally, all fields analyzed should have the same analyzer (standard, english, etc.):

For the cross_fields query type to work optimally, all fields should have the same analyzer. Fields that share an analyzer are grouped together as blended fields.

If you include fields with a different analysis chain, they will be added to the query in the same way as for best_fields. For instance, if we added the title field to the preceding query (assuming it uses a different analyzer), the explanation would be as follows:

(+title:peter +title:smith) ( +blended("peter", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name]) )
0 讨论(0)
发布评论:

提交评论
- 加载中...
无人及你

2020-12-23 22:51

I would rather avoid using query_string in case the user passes "OR", "AND" and any of the other advanced params.

In my experience, escaping the special characters with backslash is a simple and effective solution. The list can be found in the documentation http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description, plus AND/OR/NOT/TO.

0 讨论(0)
发布评论:

提交评论
- 加载中...