Question
I have a collection of addresses. Let's simplify and say the only fields are postcode, city, street, streetnumber and name.
I'd like to be able to suggest a list of streets when the user enters a postcode, a city and some query for the street.
For example, if the user enters the following in an HTML form:
postcode: 75010
city: Paris
street: rue des
I'd like to get a list of streets like
'rue des petites écuries'
'rue des messageries'
...
'rue du faubourg poissonnière'
...
that I could suggest to the user.
So I'd like to obtain a list of unique values of the "street" field, sorted by how well they match my query on that field. Specifically, I'd like the 10 best-matching streets for this query.
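To pin down the desired result, here is a minimal plain-Python sketch of the same semantics. It assumes the matching hits have already been reduced to (street, score) pairs; top_unique_streets and the scores are hypothetical illustrations, not real Elasticsearch output. Doing this on the client only works on the hits actually fetched, which is why a server-side aggregation is attractive.

```python
def top_unique_streets(hits, k=10):
    """Return the k best-matching distinct streets (hypothetical helper).

    hits is an iterable of (street, score) pairs, assumed to come from the
    street value and the relevance score of each search hit.
    """
    best = {}
    for street, score in hits:
        # Keep a single score per street; taking the max is a safe way to
        # pick one, since every hit for a street scores the same here.
        if street not in best or score > best[street]:
            best[street] = score
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [street for street, _ in ranked[:k]]


# Illustrative input: duplicate streets with made-up scores.
hits = [
    ("rue des petites écuries", 2.1),
    ("rue des messageries", 2.1),
    ("rue des petites écuries", 2.1),
    ("rue du faubourg poissonnière", 0.7),
]
print(top_unique_streets(hits))
```
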
A query returning documents would look like:
{
  "query": {
    "bool": {
      "must": [
        {"term": {"postcode": "75010"}},
        {"term": {"city": "Paris"}},
        {"match": {"street": "rue des"}}
      ]
    }
  }
}
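The same request body can also be built as a plain Python dict, for example to pass to a client library. This is only a sketch: build_street_query is a hypothetical helper, and the body simply mirrors the query above.

```python
import json


def build_street_query(postcode, city, street_text):
    """Build the bool query from the question (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Exact-value filters on postcode and city.
                    {"term": {"postcode": postcode}},
                    {"term": {"city": city}},
                    # Full-text match on the street, which drives the score.
                    {"match": {"street": street_text}},
                ]
            }
        }
    }


body = build_street_query("75010", "Paris", "rue des")
print(json.dumps(body, indent=2, ensure_ascii=False))
```
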
But of course the same street would appear many times, since each street can appear in many different addresses in the collection.
I tried to use the aggregation framework and added an "aggs" section:
{
  "query": {
    "bool": {
      "must": [
        {"term": {"postcode": "75010"}},
        {"term": {"city": "Paris"}},
        {"match": {"street": "rue des"}}
      ]
    }
  },
  "aggs": {
    "street_agg": {
      "terms": {
        "field": "street",
        "size": 10
      }
    }
  }
}
The problem is that the buckets are automatically sorted by the number of documents in each bucket, not by score.
I'd like the buckets to be sorted by the score of an arbitrary document in each bucket (yes, the score of a single document per bucket is enough, since in my example the score depends only on the content of the street field).
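A small simulation of the two orderings, using made-up (street, score) pairs rather than real search results: by_count mimics the terms aggregation's default order (document count, descending), and by_score mimics the desired order, taking the score of whichever document of each bucket is seen first.

```python
from collections import Counter

# Illustrative matching hits as (street, score) pairs; the numbers are invented.
hits = [
    ("rue du faubourg poissonnière", 0.7),
    ("rue du faubourg poissonnière", 0.7),
    ("rue du faubourg poissonnière", 0.7),
    ("rue des messageries", 2.1),
    ("rue des petites écuries", 2.0),
    ("rue des petites écuries", 2.0),
]

# Default terms-aggregation behaviour: buckets ordered by document count.
counts = Counter(street for street, _ in hits)
by_count = [street for street, _ in counts.most_common()]

# Desired behaviour: buckets ordered by the score of any one document.
# The score depends only on the street text, so the first hit is enough.
score_of = {}
for street, score in hits:
    score_of.setdefault(street, score)
by_score = sorted(score_of, key=score_of.get, reverse=True)

print(by_count)
print(by_score)
```

Here the most frequent street is the worst match, so ordering by count pushes it to the top of the list.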
How would you achieve that?
Answer 1:
OK, so the solution can actually be found in the question "Elasticsearch aggregation order by top hit score", but only if you read Shadocko's comment there, which I hadn't.
So here's the solution for anyone interested, and for my future self:
{
  'query': {
    'bool': {
      'must': [
        {'term': {'postcode': '75010'}},
        {'term': {'city': 'Paris'}},
        {'match': {'street.autocomplete': 'rue des'}}
      ]
    }
  },
  'aggs': {
    'street_agg': {
      'terms': {
        'field': 'street',
        'size': 10,
        'order': {
          'max_score': 'desc'
        }
      },
      'aggs': {
        'max_score': {
          'max': {'script': '_score'}
        }
      }
    }
  }
}
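Once the search returns, the suggestions can be read straight from the buckets, which already arrive in max_score order. A minimal sketch, assuming the request used the street_agg and max_score names above: street_suggestions is a hypothetical helper, and fake_response is a hand-written fragment in the shape of a terms aggregation response (key, doc_count and the sub-aggregation's value), with illustrative numbers.

```python
def street_suggestions(response):
    """Return (street, score) pairs from the street_agg buckets (hypothetical helper)."""
    buckets = response["aggregations"]["street_agg"]["buckets"]
    # The "order" clause already sorted the buckets by max_score, descending.
    return [(bucket["key"], bucket["max_score"]["value"]) for bucket in buckets]


# Hand-written response fragment; the values are illustrative only.
fake_response = {
    "aggregations": {
        "street_agg": {
            "buckets": [
                {"key": "rue des messageries", "doc_count": 1,
                 "max_score": {"value": 2.1}},
                {"key": "rue des petites écuries", "doc_count": 2,
                 "max_score": {"value": 2.0}},
            ]
        }
    }
}
print(street_suggestions(fake_response))
```
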
It's not perfect, since it uses the max aggregation, which does some unnecessary computation (taking the score of just one document from each bucket would have been enough). But there seems to be no "pick one" aggregation, only min, max, avg and sum, so you have to use one of those. Well, I think computing the max is not that costly anyway.
Source: https://stackoverflow.com/questions/50685190/elasticsearch-how-to-get-the-top-unique-values-of-a-field-sorted-by-matching-sc