Question
I have a case where I need something like top hits in a transform: for each email, I want to show only the most recent row.
I have this data:
email  col2  col3  col4  col5  Time
a.com  a     a     a     a     11:00
a.com  a     a     a     a     11:01
a.com  a     b     a     a     11:02
I want to remove the duplicate emails and show only the row with the latest time. I'm using a transform that groups by every field I need (email, col2, col3, col4) and aggregates on max(Time). It returns data like this:
Current destination index:
email  col2  col3  col4  col5  Time
a.com  a     a     a     a     11:01
a.com  a     b     a     a     11:02
What I want it to show (my target):
email  col2  col3  col4  col5  Time
a.com  a     b     a     a     11:02
How can I make the transform group by email only instead of every field? I need all of the fields in the output, but adding all of them as group-by fields doesn't seem right, and a transform offers only two places to put a field: group by or aggregation.
My transform definition, which does not produce what I need:
{
  "id": "transform_baru",
  "source": {
    "index": [
      "email-profile-nov-bug*"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "transform_baru"
  },
  "pivot": {
    "group_by": {
      "Email.keyword": {
        "terms": {
          "field": "Email.keyword"
        }
      },
      "fa.keyword": {
        "terms": {
          "field": "fa.keyword"
        }
      },
      "ever.keyword": {
        "terms": {
          "field": "ever.keyword"
        }
      },
      "bln.keyword": {
        "terms": {
          "field": "bln.keyword"
        }
      },
      "domain.keyword": {
        "terms": {
          "field": "domain.keyword"
        }
      },
      "Email_age_category.keyword": {
        "terms": {
          "field": "Email_age_category.keyword"
        }
      },
      "Status_Category.keyword": {
        "terms": {
          "field": "Status_Category.keyword"
        }
      },
      "Vintage_cat.keyword": {
        "terms": {
          "field": "Vintage_cat.keyword"
        }
      }
    },
    "aggregations": {
      "extract_date.max": {
        "max": {
          "field": "extract_date"
        }
      }
    }
  },
  "settings": {},
  "version": "7.8.0",
  "create_time": 1607832008196
}
Answer 1:
Problem solved by using this top hits workaround. At first I wasn't able to figure out how to use it, so here is how:
- Choose only the group-by field you need. In my case I just added Email.
- Edit the JSON config and add the aggregation with the latest_doc script.
- Change the '@timestamp' field to your own time field.
- So technically, you use only email as the group by and latest_doc as the aggregation.
- The preview might show only the fields you chose as group by, but once the transform index is created, the rest of the fields will show up under the latest_doc aggregation. So don't worry, and just create the transform.
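As a rough sketch of what the steps above could look like, here is a transform body that groups by email only and carries the whole latest document through a scripted_metric aggregation named latest_doc. The index names and the extract_date field are taken from the question; the Painless scripts are my reconstruction of the workaround and assume extract_date is mapped as a date, so treat them as a starting point rather than a verified config:

```json
{
  "source": {
    "index": ["email-profile-nov-bug*"],
    "query": { "match_all": {} }
  },
  "dest": {
    "index": "transform_baru"
  },
  "pivot": {
    "group_by": {
      "Email.keyword": {
        "terms": { "field": "Email.keyword" }
      }
    },
    "aggregations": {
      "latest_doc": {
        "scripted_metric": {
          "init_script": "state.timestamp_latest = 0L; state.last_doc = ''",
          "map_script": "def current_date = doc['extract_date'].getValue().toInstant().toEpochMilli(); if (current_date > state.timestamp_latest) { state.timestamp_latest = current_date; state.last_doc = new HashMap(params['_source']); }",
          "combine_script": "return state",
          "reduce_script": "def last_doc = ''; def timestamp_latest = 0L; for (s in states) { if (s.timestamp_latest > timestamp_latest) { timestamp_latest = s.timestamp_latest; last_doc = s.last_doc; } } return last_doc"
        }
      }
    }
  }
}
```

The map script keeps the _source of the newest document seen on each shard, and the reduce script picks the newest across shards, so each email ends up with one destination document whose other fields sit under latest_doc.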
I hope this helps other Elastic newbies use this workaround.
Thank you to everyone who tried to help me. Cheers
Source: https://stackoverflow.com/questions/65246803/kibana-tophits-on-transform-group-by-a-field-not-all-field