Elasticsearch Aggregation by Day of Week and Hour of Day

Posted by 十年热恋 on 2019-12-06 07:04:34
Heschoon

The same kind of problem has been solved in this thread.

Adapting that solution to your problem, we need a script that converts the date into a day-of-week and hour-of-day string:

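// The date field's value is epoch milliseconds; wrap it in a java.util.Date
Date date = new Date(doc['created_time'].value);
// 'EEE, HH' gives the abbreviated day name and the 24-hour hour, e.g. "Wed, 14" in an English locale
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE, HH');
format.format(date)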

And use it in a query:

{
    "aggs": {
        "perWeekDay": {
            "terms": {
                "script": "Date date = new Date(doc['created_time'].value) ;java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE, HH');format.format(date)"
            }
        }
    }
}
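
To see what bucket keys this script produces, the same conversion can be sketched outside Elasticsearch. This is only a minimal Python sketch, not the script itself: week_hour_key and count_by_week_hour are hypothetical helpers, created_time is assumed to hold epoch milliseconds, and the sketch uses UTC and English day abbreviations, whereas SimpleDateFormat uses the JVM's default time zone and locale.

```python
from collections import Counter
from datetime import datetime, timezone

# English abbreviations in datetime.weekday() order (Monday == 0).
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def week_hour_key(epoch_millis):
    """Mimic SimpleDateFormat('EEE, HH') for an epoch-millisecond timestamp (UTC assumed)."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return "{}, {:02d}".format(DAY_NAMES[moment.weekday()], moment.hour)


def count_by_week_hour(timestamps):
    """Group timestamps by key, roughly as a terms aggregation on the script would."""
    return Counter(week_hour_key(ts) for ts in timestamps)
```

Each distinct key, such as "Thu, 14", becomes one bucket, so at most 7 x 24 = 168 buckets can appear.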

Re-post from my answer here: https://stackoverflow.com/a/31851896/6247

Does this help?

"aggregations": {
    "timeslice": {
        "histogram": {
            "script": "doc['timestamp'].value.getHourOfDay()",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
                "min": 0,
                "max": 23
            },
            "order": {
                "_key": "desc"
            }
        }
    }
}
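
The effect of min_doc_count and extended_bounds can be illustrated with a small local emulation. This is a sketch under assumptions, not Elasticsearch code: hourly_histogram is a hypothetical helper that takes hour values already extracted from the timestamps.

```python
from collections import Counter


def hourly_histogram(hours, min_hour=0, max_hour=23):
    """Emulate interval=1, min_doc_count=0 and extended_bounds for hour values."""
    counts = Counter(hours)
    # extended_bounds widens the range but never clips real data.
    low = min([min_hour] + list(counts))
    high = max([max_hour] + list(counts))
    # Emit every hour in range, including empty ones, ordered by key descending.
    return [
        {"key": hour, "doc_count": counts.get(hour, 0)}
        for hour in range(high, low - 1, -1)
    ]
```

Calling hourly_histogram([9, 9, 17]) yields 24 buckets, from key 23 down to key 0, with zero counts everywhere except hours 9 and 17.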

This is nice because min_doc_count: 0 includes hours with zero results, and extended_bounds stretches the buckets to cover the entire 24-hour period.

You can also use other Joda-Time getters such as getDayOfWeek or getHourOfDay (see the Joda-Time documentation for more).

This works well for hours, but for days or months the script returns a number rather than a name. As a workaround you can return the time slot as a string and use a terms aggregation, but extended_bounds is not available there, so days with no documents are simply missing from the results (for example, you might only get [Mon, Tues, Fri, Sun]).

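In case you want that, here it is (note that the text form comes from Joda-Time's dayOfWeek() property, since getDayOfWeek() returns a plain number):

"aggregations": {
    "dayOfWeek": {
        "terms": {
            "script": "doc['timestamp'].value.dayOfWeek().getAsText()",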
            "order": {
                "_term": "asc"
            }
        }
    }
}
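
Since the terms aggregation omits days that have no documents, one option is to fill the gaps on the client after the response arrives. The following is a minimal sketch: fill_missing_days is a hypothetical helper, and it assumes the bucket keys are full English day names, which depends on the locale the script runs with.

```python
# Assumed key format: full English day names, in calendar order.
WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def fill_missing_days(buckets):
    """Return one bucket per weekday, inserting doc_count 0 for absent days."""
    found = {bucket["key"]: bucket["doc_count"] for bucket in buckets}
    return [{"key": day, "doc_count": found.get(day, 0)} for day in WEEKDAYS]
```

This also puts the days in calendar order, which an alphabetical term ordering would not do.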

Even if this doesn't help you, hopefully someone else will find it and benefit from it.

The simplest way would be to define a dedicated day-of-week field that holds only the day of the week for each document, then do a terms aggregation on that field.
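
As an illustration of the dedicated-field idea, the day of the week can be computed before the document is indexed. This is a sketch under assumptions: with_day_of_week is a hypothetical helper, the field name day_of_week is made up for the example, and the date is assumed to arrive as a yyyy-MM-dd string.

```python
from datetime import datetime

# English names in datetime.weekday() order (Monday == 0).
DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def with_day_of_week(doc, date_field="date", target_field="day_of_week"):
    """Return a copy of doc with an extra field holding the day name."""
    parsed = datetime.strptime(doc[date_field], "%Y-%m-%d")
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[target_field] = DAY_NAMES[parsed.weekday()]
    return enriched
```

The enriched document is what gets indexed, and the aggregation then becomes a plain terms aggregation on that field with no script involved.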

If for whatever reason you don't want to do that (or can't), here is a hack that might help you get what you want. The basic idea is to define a "date.raw" sub-field of type string, analyzed with the standard analyzer, so that the original date string is split into lowercase terms, including one for the day of the week (for example "wed"). You can then aggregate on those terms to get your counts, using include to keep only the day-of-week terms.

Here is the mapping I used for testing:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   },
   "mappings": {
      "doc": {
         "properties": {
            "msg": {
               "type": "string"
            },
            "date": {
               "type": "date",
               "format": "E, dd MMM yyyy",
               "fields": {
                  "raw": {
                     "type": "string"
                  }
               }
            }
         }
      }
   }
}

and a few sample docs:

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"msg": "hello","date": "Wed, 11 Mar 2015"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"msg": "hello","date": "Tue, 10 Mar 2015"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"msg": "hello","date": "Mon, 09 Mar 2015"}
{"index":{"_index":"test_index","_type":"doc","_id":4}}
{"msg": "hello","date": "Wed, 04 Mar 2015"}

and the aggregation and results:

POST /test_index/_search?search_type=count
{
    "aggs":{
        "docs_by_day":{
            "terms":{
                "field": "date.raw",
                "include": "mon|tue|wed|thu|fri|sat|sun"
            }
        }
    }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "docs_by_day": {
         "buckets": [
            {
               "key": "wed",
               "doc_count": 2
            },
            {
               "key": "mon",
               "doc_count": 1
            },
            {
               "key": "tue",
               "doc_count": 1
            }
         ]
      }
   }
}
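
The counts in this response can be reproduced locally to see why the hack works. This is a rough sketch, not the real analyzer: tokenize is a simplified stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and docs_by_day is a hypothetical helper that applies the include pattern to whole terms.

```python
import re
from collections import Counter

# The four sample dates from the bulk request.
SAMPLE_DATES = [
    "Wed, 11 Mar 2015",
    "Tue, 10 Mar 2015",
    "Mon, 09 Mar 2015",
    "Wed, 04 Mar 2015",
]
INCLUDE = re.compile(r"mon|tue|wed|thu|fri|sat|sun")


def tokenize(text):
    """Simplified stand-in for the standard analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def docs_by_day(dates):
    """Count documents per day-of-week term, keeping only included terms."""
    counts = Counter()
    for value in dates:
        # Each document counts once per distinct term it contains.
        for term in set(tokenize(value)):
            if INCLUDE.fullmatch(term):  # the pattern must match the whole term
                counts[term] += 1
    return counts
```

Terms such as "mar", "11" and "2015" are also produced by the analyzer, which is why the include filter is needed.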

Here is the code all together:

http://sense.qbox.io/gist/0292ddf8a97b2d96bd234b787c7863a4bffb14c5
