How to filter logs easily with awk?

一整个雨季 2020-12-06 23:32

Suppose I have a log file mylog like this:

[01/Oct/2015:16:12:56 +0200] error number 1
[01/Oct/2015:17:12:56 +0200] error number 2
[01/Oct/2015:         


        
3 Answers
  • 2020-12-06 23:45

    Use ISO 8601 time format!

    However, this seems to be quite a bit of work for something that should be more straight forward.

    Yes, this should be straightforward. The reason it is not is that the logs do not use ISO 8601. Application logs should record times in ISO 8601 format and in UTC; any other setting should be considered broken and fixed.

    Your request should be split into two parts. The first part canonicalises the logs by converting dates to ISO 8601 format; the second performs the search:

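    # Requires GNU awk: the three-argument form of match() is a gawk extension.
    awk '
    match($0, /([0-9]+)\/([A-Z][a-z]{2})\/([0-9]{4}):([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}) ([+-][0-9]{4})/, a) {
      day = a[1]; month = a[2]; year = a[3]
      hour = a[4]; min = a[5]; sec = a[6]; utc = a[7]
      # Abbreviations start at positions 1, 4, 7, ...; index() leaves RSTART and RLENGTH untouched.
      month = (index("JanFebMarAprMayJunJulAugSepOctNovDec", month) + 2) / 3
      # Zero-pad every field so that the timestamps sort lexicographically.
      myisodate = sprintf("%04d-%02d-%02dT%02d:%02d:%02d%s", year, month, day, hour, min, sec, utc)
      # Replace only the matched timestamp; the brackets and the message stay as they were.
      $0 = substr($0, 1, RSTART - 1) myisodate substr($0, RSTART + RLENGTH)
      print
    }' mylog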
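
The month conversion in the script above works because each abbreviation starts at position 1, 4, 7, ... of the lookup string, so (position + 2) / 3 is the month number. A minimal sketch of just that step, using POSIX `index()`; the function name `month_num` is made up for illustration:

```shell
# Hypothetical helper: print the two-digit month number for an English abbreviation.
month_num() {
  awk -v m="$1" 'BEGIN {
    pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", m)  # "Oct" is found at position 28
    if (pos == 0) exit 1              # not a known abbreviation
    printf "%02d\n", (pos + 2) / 3    # (28 + 2) / 3 = 10
  }'
}

month_num Oct
```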
    

    The nice thing about ISO 8601 dates, besides being a standard, is that chronological order coincides with lexicographic order (as long as all timestamps use the same UTC offset). You can therefore use the /…/,/…/ range operator to extract the dates you are interested in. For instance, to find what happened between 1 Oct 2015 18:00 +0200 and 1 Nov 2015 01:00 +0200, append the following filter to the previous, standardising filter:

    awk '/2015-10-01T18:00:00\+0200/,/2015-11-01T01:00:00\+0200/'
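
A range pattern starts and stops only on lines that match the two timestamps exactly, so it can miss the window when nothing was logged at precisely those seconds. Because ISO timestamps with the same UTC offset sort lexicographically, a plain string comparison may be more robust. A minimal sketch, assuming each normalised line starts with the 24-character timestamp, optionally preceded by `[`; the sample lines and the variable names `from` and `to` are made up for illustration:

```shell
# Sample normalised log lines (invented for illustration).
normalised='[2015-10-01T16:12:56+0200] error number 1
[2015-10-01T18:30:00+0200] error number 2
[2015-10-15T09:00:00+0200] error number 3
[2015-11-01T02:00:00+0200] error number 4'

# Keep lines whose timestamp lies in [from, to]; valid only if all offsets are equal.
selected=$(printf '%s\n' "$normalised" |
  awk -v from="2015-10-01T18:00:00+0200" -v to="2015-11-01T01:00:00+0200" '{
    ts = $0
    sub(/^\[/, "", ts)          # drop an optional leading bracket
    ts = substr(ts, 1, 24)      # YYYY-MM-DDTHH:MM:SS+hhmm is 24 characters
    if (ts >= from && ts <= to) print
  }')

printf '%s\n' "$selected"
```

Unlike the range pattern, this does not need the input to be sorted, and neither boundary has to appear in the log.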
    
  • 2020-12-07 00:04

    Without getting into the time format (assuming all records are formatted the same way), you can use a sort | awk combination to achieve the same result with ease.

    This assumes the logs are not already ordered. Based on your format, it uses sort's month option (M) to order the month names and awk to pick the range of interest. The sort keys are year, month, and day, in that order.

    $ sort -k1.9,1.12 -k1.5,1.7M -k1.2,1.3 log | awk '/01\/Oct\/2015/,/01\/Nov\/2015/'
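
To see what the three sort keys do, here is a small sketch on shuffled sample lines (invented for illustration). It assumes a `sort` that supports the month flag `M`, such as GNU sort, and sets `LC_ALL=C` so that English month abbreviations are recognised:

```shell
# Shuffled sample entries in the question's format (made up for illustration).
unsorted='[01/Nov/2015:00:30:00 +0200] error number 3
[01/Oct/2015:16:12:56 +0200] error number 1
[15/Oct/2015:09:00:00 +0200] error number 2
[31/Dec/2014:23:59:59 +0200] error number 0'

# Keys: year (chars 9-12), then month name (chars 5-7, M flag), then day (chars 2-3).
sorted=$(printf '%s\n' "$unsorted" |
  LC_ALL=C sort -k1.9,1.12 -k1.5,1.7M -k1.2,1.3)

printf '%s\n' "$sorted"
```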
    

    You can easily extend this to include the time as well, and drop the sort if the file is already sorted.

    The following adds the time constraint as well:

    awk -F: '/01\/Oct\/2015/ && $2>=18{p=1} 
             /01\/Nov\/2015/ && $2>=1 {p=0} p'
    
  • 2020-12-07 00:08

    I would use the date command inside awk to achieve this, though I have no idea how it would perform on large log files.

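    # date -d is GNU date syntax; the trailing backslash keeps this a single command.
    awk -F "[][]" -v start="$(date -d"1 Oct 2015 18:00 +0200" +"%s")" \
        -v end="$(date -d"1 Nov 2015 01:00 +0200" +"%s")" '{
            # Work on a copy so that the printed line keeps its original form.
            ts=$2
            # Turn "01/Oct/2015:16:12:56 +0200" into "01-Oct-2015 16:12:56 +0200".
            gsub(/\//,"-",ts);sub(/:/," ",ts);
            cmd="date -d\""ts"\" +%s" ;
            cmd|getline mytimestamp;
            close(cmd);
            if (start<=mytimestamp && mytimestamp<=end) print
    }' mylog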
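
Spawning `date` once per line can become slow on large files. If every entry carries the same UTC offset, one alternative is to build a `YYYYMMDDHHMMSS` key inside awk and compare it numerically, with no external command. A minimal sketch under that assumption; the sample lines and the `from`/`to` variables are made up for illustration:

```shell
# Sample entries in the question's format (invented for illustration).
sample='[01/Oct/2015:16:12:56 +0200] error number 1
[01/Oct/2015:18:30:00 +0200] error number 2
[15/Oct/2015:09:00:00 +0200] error number 3
[01/Nov/2015:02:00:00 +0200] error number 4'

# Bounds as YYYYMMDDHHMMSS, expressed in the same offset as the log lines.
in_window=$(printf '%s\n' "$sample" |
  awk -F '[][]' -v from="20151001180000" -v to="20151101010000" '{
    # $2 looks like "01/Oct/2015:16:12:56 +0200"; split it on "/", ":" and space.
    split($2, t, "[/: ]")
    mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3
    # Adding 0 forces a numeric value; 14-digit integers are exact in awk doubles.
    key = sprintf("%04d%02d%02d%02d%02d%02d", t[3], mon, t[1], t[4], t[5], t[6]) + 0
    if (key >= from + 0 && key <= to + 0) print
  }')

printf '%s\n' "$in_window"
```

This ignores the offset field entirely, so it is not a substitute for the `date`-based version when offsets differ between lines.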
    