Question
There is a list of conversations, and every conversation has a list of messages. Every message has several fields, including an action field. In the first messages of a conversation the action A is used, after a few messages action A.1 is used, after a while A.1.1, and so on (there is a list of chatbot intents).
Grouping the message actions of a conversation gives something like: A > A > A > A.1 > A > A.1 > A.1.1 ...
Problem:
I need to create a report using Elasticsearch that returns the actions group of every conversation. Next, I need to group identical actions groups and add a count. The end result is a Map<actionsGroup, count> with entries such as 'A > A.1 > A > A.1 > A.1.1', 3.
Constructing the actions group
I need to eliminate every run of consecutive duplicates: instead of A > A > A > A.1 > A > A.1 > A.1.1, I need A > A.1 > A > A.1 > A.1.1.
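The deduplication described here is a run-length collapse: consecutive repeats are merged, while a later reappearance of the same action is kept. A minimal client-side sketch in Python, assuming the actions of one conversation are already sorted by timestamp (the function name `collapse_runs` is hypothetical and not part of Elasticsearch):

```python
from itertools import groupby


def collapse_runs(actions):
    """Merge consecutive duplicates, keeping later reappearances."""
    # groupby() without a key only groups adjacent equal items,
    # so "A, A, A.1, A" becomes "A, A.1, A" rather than "A, A.1".
    return [action for action, _run in groupby(actions)]


actions = ["A", "A", "A", "A.1", "A", "A.1", "A.1.1"]
print(" > ".join(collapse_runs(actions)))  # A > A.1 > A > A.1 > A.1.1
```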
Steps I started to do:
{
  "collapse": {
    "field": "context.conversationId",
    "inner_hits": {
      "name": "logs",
      "size": 10000,
      "sort": [
        {
          "@timestamp": "asc"
        }
      ]
    }
  },
  "aggs": {}
}
What I need next:
- I need to map the result of the collapse into a single value like A > A.1 > A > A.1 > A.1.1. I've seen that with aggr it is possible to run scripts over the result and build the list of actions I need, but aggr operates over all messages, not only over the grouped messages I have in collapse. Is it possible to use aggr inside collapse, or is there a similar solution?
- I need to group the resulting values (A > A.1 > A > A.1 > A.1.1) from all collapses, adding a count and producing the Map<actionsGroup, count>.
Or:
- Group the conversation messages by the conversationId field using aggr (I don't know how to do this).
- Use a script to iterate over all values and create the actions group for every conversation (not sure if this is possible).
- Use another aggr over all values and group the duplicates, returning Map<actionsGroup, count>.
Update 2: I managed to get a partial result, but one issue remains. Please check here what I still need to fix.
Update 1: adding some extra details
Mappings:
"mappings": {
  "properties": {
    "@timestamp": {
      "type": "date",
      "format": "epoch_millis"
    },
    "context": {
      "properties": {
        "action": {
          "type": "keyword"
        },
        "conversationId": {
          "type": "keyword"
        }
      }
    }
  }
}
Sample documents of the conversations:
Conversation 1.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "A",
    "conversationId": "conv_id1"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "A.1",
    "conversationId": "conv_id1"
  }
},
{
  "@timestamp": 1579632745002,
  "context": {
    "action": "A.1.1",
    "conversationId": "conv_id1"
  }
}
Conversation 2.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "A",
    "conversationId": "conv_id2"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "A.1",
    "conversationId": "conv_id2"
  }
},
{
  "@timestamp": 1579632745002,
  "context": {
    "action": "A.1.1",
    "conversationId": "conv_id2"
  }
}
Conversation 3.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "B",
    "conversationId": "conv_id3"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "B.1",
    "conversationId": "conv_id3"
  }
}
Expected result:
{
  "A -> A.1 -> A.1.1": 2,
  "B -> B.1": 1
}
Something similar, having this or any other format.
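For comparison, the expected result can be reproduced on the client side from the sample documents. This is only a sketch, assuming the documents have already been fetched from the index (the fetching is not shown); the names `actions_group_counts` and `make_doc` are hypothetical helpers, not part of any Elasticsearch client:

```python
from collections import Counter, defaultdict
from itertools import groupby


def actions_group_counts(docs, separator=" -> "):
    """Return {actions group: number of conversations with that group}."""
    # Step 1: collect (timestamp, action) pairs per conversation.
    by_conversation = defaultdict(list)
    for doc in docs:
        ctx = doc["context"]
        by_conversation[ctx["conversationId"]].append(
            (doc["@timestamp"], ctx["action"])
        )

    # Step 2: order each conversation by time, merge consecutive
    # duplicates, and count identical actions groups.
    counts = Counter()
    for messages in by_conversation.values():
        messages.sort(key=lambda pair: pair[0])
        ordered = [action for _ts, action in messages]
        deduped = [action for action, _run in groupby(ordered)]
        counts[separator.join(deduped)] += 1
    return dict(counts)


def make_doc(timestamp, action, conversation_id):
    """Build a document shaped like the samples in the question."""
    return {
        "@timestamp": timestamp,
        "context": {"action": action, "conversationId": conversation_id},
    }


sample_docs = [
    make_doc(1579632745000, "A", "conv_id1"),
    make_doc(1579632745001, "A.1", "conv_id1"),
    make_doc(1579632745002, "A.1.1", "conv_id1"),
    make_doc(1579632745000, "A", "conv_id2"),
    make_doc(1579632745001, "A.1", "conv_id2"),
    make_doc(1579632745002, "A.1.1", "conv_id2"),
    make_doc(1579632745000, "B", "conv_id3"),
    make_doc(1579632745001, "B.1", "conv_id3"),
]

print(actions_group_counts(sample_docs))
# {'A -> A.1 -> A.1.1': 2, 'B -> B.1': 1}
```

This pulls every message to the client, so it is only practical when the number of documents is manageable.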
Since I'm new to Elasticsearch, any hint is more than welcome.
Answer 1:
Using a script in a terms aggregation, we can create buckets on the first character of "context.action". With a similar terms sub-aggregation we can get all the "context.action" values under the parent bucket, e.g. A -> A.1 -> A.1.1 ...
Query:
The "conversations" script returns the first character of the action (A, B, C, etc.). The "sub_conversations" script returns every context.action under that bucket; the length check skips the bare parent action (e.g. A). The "count" script counts the distinct context.action values under the bucket.
{
  "size": 0,
  "aggs": {
    "conversations": {
      "terms": {
        "script": {
          "source": "def term=doc['context.action'].value; return term.substring(0,1);"
        },
        "size": 10
      },
      "aggs": {
        "sub_conversations": {
          "terms": {
            "script": {
              "source": "if(doc['context.action'].value.length()>1) return doc['context.action'];"
            },
            "size": 10
          }
        },
        "count": {
          "cardinality": {
            "script": {
              "source": "if(doc['context.action'].value.length()>1) return doc['context.action'];"
            }
          }
        }
      }
    }
  }
}
Since Elasticsearch cannot join different documents, you will have to build the combined key on the client side by iterating over the aggregation buckets.
Result:
"aggregations" : {
  "conversations" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "A",
        "doc_count" : 6,
        "sub_conversations" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "A.1",
              "doc_count" : 2
            },
            {
              "key" : "A.1.1",
              "doc_count" : 2
            }
          ]
        },
        "count" : {
          "value" : 2
        }
      },
      {
        "key" : "B",
        "doc_count" : 2,
        "sub_conversations" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "B.1",
              "doc_count" : 1
            }
          ]
        },
        "count" : {
          "value" : 1
        }
      }
    ]
  }
}
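The client-side step mentioned above could look roughly like the following Python sketch. It assumes the response has already been parsed into a dict; `aggregations` below is a trimmed copy of the result shown above, and `combined_keys` is a hypothetical helper name. Two caveats: the sub-bucket keys are sorted lexicographically because a terms aggregation does not return them in message order, and the attached number is simply whatever the "count" cardinality aggregation returned (the number of distinct sub-actions), which for this sample data happens to equal the number of conversations.

```python
def combined_keys(aggregations, separator=" -> "):
    """Join each parent bucket key with the keys of its sub-buckets."""
    result = {}
    for bucket in aggregations["conversations"]["buckets"]:
        sub_keys = sorted(
            sub["key"] for sub in bucket["sub_conversations"]["buckets"]
        )
        key = separator.join([bucket["key"], *sub_keys])
        result[key] = bucket["count"]["value"]
    return result


# Trimmed copy of the aggregation result, keeping only the fields used here.
aggregations = {
    "conversations": {
        "buckets": [
            {
                "key": "A",
                "sub_conversations": {
                    "buckets": [{"key": "A.1"}, {"key": "A.1.1"}]
                },
                "count": {"value": 2},
            },
            {
                "key": "B",
                "sub_conversations": {"buckets": [{"key": "B.1"}]},
                "count": {"value": 1},
            },
        ]
    }
}

print(combined_keys(aggregations))
# {'A -> A.1 -> A.1.1': 2, 'B -> B.1': 1}
```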
Source: https://stackoverflow.com/questions/60650823/elasticsearch-mapping-the-result-of-collapse-do-operations-on-a-grouped-docume