ElasticSearch mapping the result of collapse / do operations on a grouped documents

瘦欲@ 提交于 2020-03-16 08:11:03

问题


There is a list of conversations and every conversation has a list of messages. Every message has different fields and an action field. We need to consider that in the first messages of the conversation there is used the action A, after a few messages there is used action A.1 and after a while A.1.1 and so on (there is a list of chatbot intents).

Grouping the messages actions of a conversation will be something like: A > A > A > A.1 > A > A.1 > A.1.1 ...

Problem:

I need to create a report using ElasticSearch that will return the actions group of every conversation; next, I need to group the similar actions groups adding a count; in the end will result in a Map<actionsGroup, count> as 'A > A.1 > A > A.1 > A.1.1', 3.

Constructing the actions group I need to eliminate every group of duplicates; Instead of A > A > A > A.1 > A > A.1 > A.1.1 I need to have A > A.1 > A > A.1 > A.1.1.

Steps I started to do:

{
   "collapse":{
      "field":"context.conversationId",
      "inner_hits":{
         "name":"logs",
         "size": 10000,
         "sort":[
            {
               "@timestamp":"asc"
            }
         ]
      }
   },
   "aggs":{
   },
}

What I need next:

  1. I need to map the result of the collapse in a single result like A > A.1 > A > A.1 > A.1.1. I've seen that in the case or aggr is possible to use scripts over the result and there is possible to create a list of actions like I need to have, but aggr is doing the operations over all messages, not only over the grouped messages that I have in collapse. It is there possible to use aggr inside collapse or a similar solution?
  2. I need to group the resulted values(A > A.1 > A > A.1 > A.1.1) from all collapses, adding a count and resulting in the Map<actionsGroup, count>.

Or:

  1. Group the conversations messages by conversationId field using aggr (I don't know how can I do this)
  2. Use script to iterate all values and create the actions group for every conversation. (not sure if this is possible)
  3. Use another aggr over all values and group the duplicates, returning Map<actionsGroup, count>.

Update 2: I managed to have a partial result but still remaining one issue. Please check here what I still need to fix.

Update 1: adding some extra details

Mappings:

"mappings":{
  "properties":{
     "@timestamp":{
        "type":"date",
        "format": "epoch_millis"
     }
     "context":{
        "properties":{
           "action":{
              "type":"keyword"
           },
           "conversationId":{
              "type":"keyword"
           }
        }
     }
  }
}

Sample documents of the conversations:

Conversation 1.
{
    "@timestamp": 1579632745000,
    "context": {
        "action": "A",
        "conversationId": "conv_id1",
    }
},
{
    "@timestamp": 1579632745001,
    "context": {
        "action": "A.1",
        "conversationId": "conv_id1",
    }
},
{
    "@timestamp": 1579632745002,
    "context": {
        "action": "A.1.1",
        "conversationId": "conv_id1",
    }
}

Conversation 2.
{
    "@timestamp": 1579632745000,
    "context": {
        "action": "A",
        "conversationId": "conv_id2",
    }
},
{
    "@timestamp": 1579632745001,
    "context": {
        "action": "A.1",
        "conversationId": "conv_id2",
    }
},
{
    "@timestamp": 1579632745002,
    "context": {
        "action": "A.1.1",
        "conversationId": "conv_id2",
    }
}

Conversation 3.
{
    "@timestamp": 1579632745000,
    "context": {
        "action": "B",
        "conversationId": "conv_id3",
    }
},
{
    "@timestamp": 1579632745001,
    "context": {
        "action": "B.1",
        "conversationId": "conv_id3",
    }
}

Expected result:

{
    "A -> A.1 -> A.1.1": 2,
    "B -> B.1": 1
}
Something similar, having this or any other format.

Since I'm new with elasticsearch every hint is more than welcome.


回答1:


Using script in Terms aggregation we can create buckets on first character of "context.action". Using similar terms sub aggregation we can get all the "context.action" under parent bucket ex A-> A.1->A.1.1 ...

Query:

{
  "size": 0,
  "aggs": {
    "conversations": {
      "terms": {
        "script": {
          "source": "def term=doc['context.action'].value; return term.substring(0,1);" 
--->  returns first character ex A,B,C etc
        },
        "size": 10
      },
      "aggs": {
        "sub_conversations": {
          "terms": {
            "script": {
              "source": "if(doc['context.action'].value.length()>1) return doc['context.action'];"--> All context.action under [A], length check to ignore [A]
            },
            "size": 10
          }
        },
        "count": {
          "cardinality": {
            "script": {
              "source": "if(doc['context.action'].value.length()>1) return doc['context.action'];"--> count of all context.action under A
            }
          }
        }
      }
    }
  }
}

Since in elastic search it not possible to join different documents. you will have to get combined key in client side by iterating over the aggregation bucket.

Result:

  "aggregations" : {
    "conversations" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "A",
          "doc_count" : 6,
          "sub_conversations" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "A.1",
                "doc_count" : 2
              },
              {
                "key" : "A.1.1",
                "doc_count" : 2
              }
            ]
          },
          "count" : {
            "value" : 2
          }
        },
        {
          "key" : "B",
          "doc_count" : 2,
          "sub_conversations" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "B.1",
                "doc_count" : 1
              }
            ]
          },
          "count" : {
            "value" : 1
          }
        }
      ]
    }
  }


来源:https://stackoverflow.com/questions/60650823/elasticsearch-mapping-the-result-of-collapse-do-operations-on-a-grouped-docume

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!