Question
Suppose I have a log file mylog like this:
[01/Oct/2015:16:12:56 +0200] error number 1
[01/Oct/2015:17:12:56 +0200] error number 2
[01/Oct/2015:18:07:56 +0200] error number 3
[01/Oct/2015:18:12:56 +0200] error number 4
[02/Oct/2015:16:12:56 +0200] error number 5
[10/Oct/2015:16:12:58 +0200] error number 6
[10/Oct/2015:16:13:00 +0200] error number 7
[01/Nov/2015:00:10:00 +0200] error number 8
[01/Nov/2015:01:02:00 +0200] error number 9
[01/Jan/2016:01:02:00 +0200] error number 10
I want to find the lines that fall between 1 Oct at 18:00 and 1 Nov at 01:00. That is, the expected output would be:
[01/Oct/2015:18:07:56 +0200] error number 3
[01/Oct/2015:18:12:56 +0200] error number 4
[02/Oct/2015:16:12:56 +0200] error number 5
[10/Oct/2015:16:12:58 +0200] error number 6
[10/Oct/2015:16:13:00 +0200] error number 7
[01/Nov/2015:00:10:00 +0200] error number 8
I have managed to convert the times to timestamps by using match() and then mktime(). match() finds the specified pattern and stores the captured groups in the array a[] so they can be accessed (see glenn jackman's answer to "access captured group from line pattern" for a good example). Since mktime() requires the format YYYY MM DD HH MM SS[ DST], I also have to convert the month from the form Xxx into a number, for which I use an answer by Ed Morton to "convert month from Aaa to xx": awk '{printf "%02d\n",(match("JanFebMarAprMayJunJulAugSepOctNovDec",$0)+2)/3}'.
Putting it all together, I finally have the timestamp in the variable mytimestamp:
awk 'match($0, /([0-9]+)\/([A-Z][a-z]{2})\/([0-9]{4}):([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}) ([+-][0-9]{4})/, a) {
day=a[1]; month=a[2]; year=a[3];
hour=a[4]; min=a[5]; sec=a[6]; utc=a[7];
month=sprintf("%02d",(match("JanFebMarAprMayJunJulAugSepOctNovDec",month)+2)/3);
mydate=sprintf("%s %s %s %s %s %s %s", year,month,day,hour,min,sec,utc);
mytimestamp=mktime(mydate)
print mytimestamp
}' mylog
Returns:
1443708776
1443712376
1443715676
etc.
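The month lookup used above can be checked in isolation. The following is a minimal sketch, not the code from the question: the function name month_to_number is hypothetical, and it swaps match() for index(), which searches for a literal substring instead of treating the month as a regular expression. It assumes English three-letter abbreviations.

```shell
# month_to_number: hypothetical helper. Maps an English three-letter month
# abbreviation (Jan..Dec) to a zero-padded number via the lookup-string trick.
month_to_number() {
    printf '%s\n' "$1" | awk '{
        # pos is 1 for Jan, 4 for Feb, 7 for Mar, ...; 0 means "not found"
        pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", $0)
        # Reject anything that is not a whole abbreviation (e.g. "anF")
        if (length($0) != 3 || pos % 3 != 1) { print "??"; next }
        printf "%02d\n", (pos + 2) / 3
    }'
}

month_to_number Oct   # prints 10
month_to_number Jan   # prints 01
```

The guard on pos matters because a three-letter slice that straddles two abbreviations would otherwise produce a plausible-looking but wrong month.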
So now I am ready to compare against the given dates. Since awk needs a lot of work to handle such a format, I prefer to provide them through external shell variables, using date -d"my date" +"%s" to print the timestamp:
start="$(date -d"1 Oct 2015 18:00 +0200" +"%s")"
end="$(date -d"1 Nov 2015 01:00 +0200" +"%s")"
All together, this works:
awk -v start="$(date -d"1 Oct 2015 18:00 +0200" +"%s")" \
    -v end="$(date -d"1 Nov 2015 01:00 +0200" +"%s")" '
match($0, /([0-9]+)\/([A-Z][a-z]{2})\/([0-9]{4}):([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}) ([+-][0-9]{4})/, a) {
    day=a[1]; month=a[2]; year=a[3]
    hour=a[4]; min=a[5]; sec=a[6]; utc=a[7]
    month=sprintf("%02d",(match("JanFebMarAprMayJunJulAugSepOctNovDec",month)+2)/3)
    mydate=sprintf("%s %s %s %s %s %s %s", year,month,day,hour,min,sec,utc)
    mytimestamp=mktime(mydate)
    if (start<=mytimestamp && mytimestamp<=end) print
}' mylog
[01/Oct/2015:18:07:56 +0200] error number 3
[01/Oct/2015:18:12:56 +0200] error number 4
[02/Oct/2015:16:12:56 +0200] error number 5
[10/Oct/2015:16:12:58 +0200] error number 6
[10/Oct/2015:16:13:00 +0200] error number 7
[01/Nov/2015:00:10:00 +0200] error number 8
However, this seems to be quite a bit of work for something that should be more straightforward. After all, the introduction to the "Time functions" section in man gawk says:
Since one of the primary uses of AWK programs is processing log files that contain time stamp information, gawk provides the following functions for obtaining time stamps and formatting them.
So I wonder: is there a better way to do this? For example, what if the format, instead of dd/Mmm/YYYY:HH:MM:ss, was something like dd Mmm YYYY HH:MM:ss? Would it be possible to provide the match pattern externally instead of having to change it every time this happens? Do I really have to use match() and then process its output to feed mktime()? Doesn't gawk provide a simpler way to do this?
Answer 1:
Use ISO 8601 time format!
However, this seems to be quite a bit of work for something that should be more straightforward.
Yes, this should be straightforward, and the reason it is not is that the logs do not use ISO 8601. Application logs should use the ISO format and UTC to display times; other settings should be considered broken and fixed.
Your request should be split into two parts. The first part canonicalises the logs, converting dates to the ISO format; the second performs the search:
awk '
match($0, /([0-9]+)\/([A-Z][a-z]{2})\/([0-9]{4}):([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}) ([+-][0-9]{4})/, a) {
    day=a[1]; month=a[2]; year=a[3]
    hour=a[4]; min=a[5]; sec=a[6]; utc=a[7]
    month=sprintf("%02d", (match("JanFebMarAprMayJunJulAugSepOctNovDec",month)+2)/3)
    # Zero-pad every component so that the result sorts lexicographically.
    myisodate=sprintf("%04d-%02d-%02dT%02d:%02d:%02d%s", year,month,day,hour,min,sec,utc)
    # Replace the whole bracketed stamp, not just the first field.
    sub(/\[[^]]*\]/, myisodate)
    print
}' mylog
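The nice thing about ISO 8601 dates, besides being a standard, is that chronological order coincides with lexicographic order (as long as every stamp uses the same UTC offset). Therefore, you can use plain string comparison on the first field to extract the dates you are interested in. The /…/,/…/ range operator is less suitable here, because it only starts and stops on lines that match the endpoints exactly. For instance, to find what happened between 1 Oct 2015 18:00 +0200 and 1 Nov 2015 01:00 +0200, append the following filter to the previous, standardising filter:
awk -v start="2015-10-01T18:00:00+0200" -v end="2015-11-01T01:00:00+0200" '$1 >= start && $1 <= end'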
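A minimal sketch of the two-stage idea using only POSIX awk features (two-argument match(), split(), index()) follows. The function names to_iso and between are hypothetical, and the sketch assumes every line starts with a stamp of the shape shown in the question and that all stamps share one UTC offset. It filters by string comparison, since a range pattern would need lines that hit the endpoints exactly.

```shell
# to_iso: hypothetical helper. Rewrites a leading "[dd/Mmm/YYYY:HH:MM:SS +ZZZZ]"
# stamp as "YYYY-MM-DDTHH:MM:SS+ZZZZ"; lines without such a stamp are dropped.
to_iso() {
    awk '
    match($0, /^\[[^]]*\]/) {
        stamp = substr($0, 2, RLENGTH - 2)       # text between the brackets
        rest  = substr($0, RLENGTH + 1)          # remainder of the line
        if (split(stamp, t, "[/: ]") != 7) next  # dd Mmm YYYY HH MM SS +ZZZZ
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3
        printf "%04d-%02d-%02dT%02d:%02d:%02d%s%s\n",
               t[3], mon, t[1], t[4], t[5], t[6], t[7], rest
    }' "$@"
}

# between: hypothetical helper. Keeps lines whose first field lies in
# [START, END]; appending "" forces a string (not numeric) comparison.
between() {
    awk -v start="$1" -v end="$2" \
        '($1 "") >= (start "") && ($1 "") <= (end "")'
}

to_iso <<'EOF' | between 2015-10-01T18:00:00+0200 2015-11-01T01:00:00+0200
[01/Oct/2015:17:12:56 +0200] error number 2
[01/Oct/2015:18:07:56 +0200] error number 3
[01/Nov/2015:00:10:00 +0200] error number 8
[01/Nov/2015:01:02:00 +0200] error number 9
EOF
```

On this sample only errors 3 and 8 survive the filter. Keeping the two stages as separate functions means the normaliser can be dropped once the application writes ISO stamps itself.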
Answer 2:
Without getting into time formats (assuming all records are formatted the same), you can use a sort | awk combination to achieve the same with ease.
This assumes the logs are not ordered. It relies on your format, on sort's special option for sorting months (M), and on awk to pick the range of interest. The sorting is based on year, month, and day, in that order.
$ sort -k1.9,1.12 -k1.5,1.7M -k1.2,1.3 log | awk '/01\/Oct\/2015/,/01\/Nov\/2015/'
You can easily extend this to include the time as well, and drop the sort if the file is already sorted.
The following includes the time constraint as well:
awk -F: '/01\/Oct\/2015/ && $2>=18{p=1}
/01\/Nov\/2015/ && $2>=1 {p=0} p'
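To see what the sort keys in the pipeline above do, here is a small sketch that applies them to shuffled sample lines. The wrapper name sort_log is hypothetical. Setting LC_ALL=C is an added assumption that pins the month names recognised by -M to the English abbreviations.

```shell
# sort_log: hypothetical wrapper around the sort keys from the answer.
# Key 1: characters 9-12 of field 1 (year), compared as text.
# Key 2: characters 5-7 (month abbreviation), compared with -M month order.
# Key 3: characters 2-3 (day), compared as text.
sort_log() {
    LC_ALL=C sort -k1.9,1.12 -k1.5,1.7M -k1.2,1.3 "$@"
}

sort_log <<'EOF'
[01/Nov/2015:00:10:00 +0200] error number 8
[10/Oct/2015:16:12:58 +0200] error number 6
[01/Jan/2016:01:02:00 +0200] error number 10
[01/Oct/2015:18:07:56 +0200] error number 3
EOF
```

The lines come out in the order 01/Oct/2015, 10/Oct/2015, 01/Nov/2015, 01/Jan/2016. Lines that tie on all three keys fall back to a whole-line comparison, which happens to order HH:MM:SS correctly within a day.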
Answer 3:
I would use the date command inside awk to achieve this, though I have no idea how this would perform with large log files.
awk -F "[][]" -v start="$(date -d"1 Oct 2015 18:00 +0200" +"%s")" \
    -v end="$(date -d"1 Nov 2015 01:00 +0200" +"%s")" '{
    # $2 is the text between the brackets; reshape it into a form date -d accepts
    gsub(/\//,"-",$2); sub(/:/," ",$2)
    cmd="date -d\""$2"\" +%s"
    cmd | getline mytimestamp
    close(cmd)
    if (start<=mytimestamp && mytimestamp<=end) print
}' mylog
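On the performance concern: the command above starts one date process per log line. A possible variation, sketched below, converts all stamps in a single date invocation. It assumes GNU date, whose -f option reads one date string per line, and it assumes that every line of the file starts with a well-formed "[dd/Mmm/YYYY:HH:MM:SS +ZZZZ]" stamp, because a line that fails to convert would shift the pairing done by paste. The function name filter_by_epoch is hypothetical.

```shell
# filter_by_epoch: hypothetical helper. Usage: filter_by_epoch START END FILE
# START and END are epoch seconds, e.g. from: date -d "1 Oct 2015 18:00 +0200" +%s
filter_by_epoch() {
    # 1. Turn "[01/Oct/2015:16:12:56 +0200] ..." into "01 Oct 2015 16:12:56 +0200"
    # 2. Let one GNU date process convert every stamp to epoch seconds
    # 3. Pair each epoch with its original line (tab-separated)
    # 4. Keep lines whose epoch is in range and strip the epoch column again
    sed 's|^\[\([0-9]*\)/\([A-Za-z]*\)/\([0-9]*\):\([^]]*\)].*|\1 \2 \3 \4|' "$3" |
        date -f - +%s |
        paste - "$3" |
        awk -F '\t' -v start="$1" -v end="$2" '
            $1 >= start && $1 <= end { sub(/^[^\t]*\t/, ""); print }'
}

# Example call (assumes a file named mylog in the current directory):
#   filter_by_epoch "$(date -d "1 Oct 2015 18:00 +0200" +%s)" \
#                   "$(date -d "1 Nov 2015 01:00 +0200" +%s)" mylog
```

The file is read twice (once by sed, once by paste), so this form needs a regular file and not a pipe.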
Source: https://stackoverflow.com/questions/34311140/how-to-filter-logs-easily-with-awk