Question
I'm trying to find duplicate contacts by first name, last name, and email address. To do that, I tried a composite aggregation on the fields firstName, lastName, and emails.email. The response contains bucketed values for the non-nested fields (firstName and lastName), but the nested field emails.email has no value at all; it comes back as null: https://www.screencast.com/t/98CKr0I5
Am I missing something here? Any help would be greatly appreciated.
Below is an example document:
{
  "regionId": 10,
  "firstName": "John",
  "lastName": "mayer",
  "emails": [
    {
      "isPrimary": true,
      "email": "sample@gmail.com"
    }
  ]
}
I'm querying Elasticsearch as follows:
GET contacts/_search
{
  "size" : 0,
  "query" : {
    "term" : {
      "regionId" : {
        "value" : 10,
        "boost" : 1.0
      }
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "firstNameField" : {
              "terms" : {
                "field" : "firstName.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          },
          {
            "lastNameField" : {
              "terms" : {
                "field" : "lastName.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          },
          {
            "emailField" : {
              "terms" : {
                "field" : "emails.email.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          }
        ]
      },
      "aggregations" : {
        "having.3483" : {
          "bucket_selector" : {
            "buckets_path" : {
              "a0" : "_count"
            },
            "script" : {
              "source" : "InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.gt(params.a0,params.v0))",
              "lang" : "painless",
              "params" : {
                "v0" : 1
              }
            },
            "gap_policy" : "skip"
          }
        }
      }
    }
  }
}
Answer 1:
That's unfortunately not possible. All sources in the composite would need to be under the same nested context.
I'd recommend extracting the primary email and setting it in the top-level context:
POST contacts/_update_by_query
{
  "query": {
    "nested": {
      "path": "emails",
      "query": {
        "exists": {
          "field": "emails.isPrimary"
        }
      }
    }
  },
  "script": {
    "source": """
      ctx._source.primary_email = ctx._source.emails.find(egroup -> egroup.isPrimary).email;
    """,
    "lang": "painless"
  }
}
Then perform the composite agg on primary_email.keyword.
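As a rough sketch of that last step, the query from the question could be reworked along the following lines. It assumes primary_email was dynamically mapped with a .keyword sub-field and reuses the regionId filter from the question. The aggregation name only_duplicates and the buckets_path key count are made up for illustration, and the plain Painless comparison stands in for the SQL-generated script in the original query.

```
GET contacts/_search
{
  "size": 0,
  "query": {
    "term": { "regionId": 10 }
  },
  "aggs": {
    "groupby": {
      "composite": {
        "size": 1000,
        "sources": [
          { "firstNameField": { "terms": { "field": "firstName.keyword", "missing_bucket": true } } },
          { "lastNameField": { "terms": { "field": "lastName.keyword", "missing_bucket": true } } },
          { "emailField": { "terms": { "field": "primary_email.keyword", "missing_bucket": true } } }
        ]
      },
      "aggs": {
        "only_duplicates": {
          "bucket_selector": {
            "buckets_path": { "count": "_count" },
            "script": "params.count > 1"
          }
        }
      }
    }
  }
}
```

All three sources now live in the top-level context, so the email key should be populated alongside the name keys. Because bucket_selector filters each page of composite buckets after it has been built, a page may hold fewer than size buckets even when more groups remain; when paging with the after parameter, it is safer not to treat a short page as the end of the results.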
Source: https://stackoverflow.com/questions/63017759/using-nested-fields-in-composite-aggregation