How to filter logs easily with awk?

邮差的信 submitted on 2019-11-28 01:37:40

Use ISO 8601 time format!

However, this seems to be quite a bit of work for something that should be more straightforward.

Yes, this should be straightforward; it is not because the logs do not use ISO 8601. Application logs should display times in ISO 8601 format and in UTC; any other setting should be considered broken and fixed.

Your request should be split into two parts: the first normalises the logs by converting dates to ISO 8601 format, and the second performs the search:

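awk '
# Requires GNU awk (gawk): the three-argument match() is a gawk extension.
match($0, /([0-9]+)\/([A-Z][a-z]{2})\/([0-9]{4}):([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}) ([+-][0-9]{4})/, a) {
  day=a[1]
  month=a[2]
  year=a[3]
  hour=a[4]
  min=a[5]
  sec=a[6]
  utc=a[7]
  # Map the month name to its number: position in the string, divided by 3.
  month=(match("JanFebMarAprMayJunJulAugSepOctNovDec",month)+2)/3
  # Zero-pad every field so that string order matches chronological order.
  myisodate=sprintf("%04d-%02d-%02dT%02d:%02d:%02d%s", year,month,day,hour,min,sec,utc)
  # Replace the first field with the ISO timestamp.
  $1 = myisodate
  print
}' mylog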
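The three-argument match() used above is specific to GNU awk. Where only a POSIX awk is available, the same normalisation can be sketched with split() and index(). This is a minimal sketch, not the original author's method: it assumes every line starts with "[dd/Mon/yyyy:HH:MM:SS +zzzz] ", and the file name sample.log and the function name to_iso are hypothetical.

```shell
# Hypothetical sample in the assumed layout "[dd/Mon/yyyy:HH:MM:SS +zzzz] message".
cat > sample.log <<'EOF'
[01/Oct/2015:18:00:01 +0200] GET /index.html
[01/Nov/2015:00:59:59 +0200] GET /about.html
EOF

# to_iso FILE...: print each line with its timestamp rewritten as ISO 8601.
to_iso() {
    awk '{
        # Drop the leading "[" and split "01/Oct/2015:18:00:01" on "/" and ":".
        n = split(substr($1, 2), t, "[/:]")
        if (n != 6) next                  # skip lines that do not fit the assumed layout
        # Month number from the position of the name in the lookup string.
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3
        off = substr($2, 1, 5)            # "+0200]" -> "+0200"
        iso = sprintf("%04d-%02d-%02dT%02d:%02d:%02d%s", t[3], mon, t[1], t[4], t[5], t[6], off)
        # Everything after "] " is the rest of the log line.
        rest = substr($0, index($0, "]") + 2)
        print iso, rest
    }' "$@"
}

to_iso sample.log
```

Note that this only changes the notation; it does not convert the times to UTC, so the result sorts correctly only while every line carries the same offset.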

The nice thing about ISO 8601 dates, besides their being a standard, is that chronological order coincides with lexicographic order as long as all timestamps use the same UTC offset. You can therefore use the /…/,/…/ range operator to extract the dates you are interested in. For instance, to find what happened between 1 Oct 2015 18:00 +0200 and 1 Nov 2015 01:00 +0200, append the following filter to the previous, standardising filter (note the T separator, and the escaped +, which would otherwise act as a regex quantifier):

awk '/2015-10-01T18:00:00\+0200/,/2015-11-01T01:00:00\+0200/'
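One caveat with the range operator is that it only starts printing when some line matches the first pattern exactly, so it prints nothing if no entry was logged at precisely 18:00:00. Since the timestamps sort lexicographically, a plain string comparison is a possible alternative. The following is a sketch under the assumption that the ISO timestamp is the first whitespace-separated field and that all lines share one UTC offset; iso.log and filter_iso_range are hypothetical names.

```shell
# Hypothetical, already normalised log lines (ISO 8601 timestamp in field 1).
cat > iso.log <<'EOF'
2015-10-01T17:59:59+0200 before the window
2015-10-01T18:00:01+0200 inside the window
2015-10-31T23:00:00+0200 still inside
2015-11-01T01:00:01+0200 after the window
EOF

# filter_iso_range START END FILE: print lines whose first field lies in [START, END].
# The comparison is a string comparison, which is valid only because every
# timestamp has the same fixed-width layout and the same UTC offset.
filter_iso_range() {
    awk -v start="$1" -v end="$2" '$1 >= start && $1 <= end' "$3"
}

filter_iso_range "2015-10-01T18:00:00+0200" "2015-11-01T01:00:00+0200" iso.log
```

With the sample above, this prints the two middle lines even though no line matches either boundary exactly.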

Without getting into time formats (assuming all records are formatted the same way), you can achieve the same with a sort | awk combination.

This assumes the logs are not ordered. Based on your format, it uses sort's month option (M) to order the month names, and awk to pick the range of interest. The sort keys are year, month, and day, in that order.

$ sort -k1.9,1.12 -k1.5,1.7M -k1.2,1.3 log | awk '/01\/Oct\/2015/,/01\/Nov\/2015/'

You can easily extend this to include the time as well, and drop the sort if the file is already sorted.

The following applies the time constraint as well:

awk -F: '/01\/Oct\/2015/ && $2>=18{p=1} 
         /01\/Nov\/2015/ && $2>=1 {p=0} p'

I would use the date command inside awk to achieve this, though I have no idea how it would perform with large log files.

awk -F "[][]" -v start="$(date -d"1 Oct 2015 18:00 +0200" +"%s")" \
    -v end="$(date -d"1 Nov 2015 01:00 +0200" +"%s")" '{
        gsub(/\//,"-",$2);sub(/:/," ",$2);
        cmd="date -d\""$2"\" +%s" ;
        cmd|getline mytimestamp;
        close(cmd);
        if (start<=mytimestamp && mytimestamp<=end) print
}' mylog