Question
A system wraps lines in a log file if they exceed X characters. I am trying to extract various data from the log, but first I need to combine all the split lines so gawk can parse the fields as a single record.
For example:
2012/11/01 field1 field2 field3 field4 fi
eld5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 fi
eld5 field6 field7 field8 field9 field10
field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4
I want to return
2012/11/01 field1 field2 field3 field4 field5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4
The actual max line length in my case is 130. I'm reluctant to test for that length and use getline to join the next line, in case there is an entry that is exactly 130 chars long.
Once I've cleaned up the log file, I'm also going to want to extract all the relevant events, where "relevant" may involve criteria like:
- 'foo' is anywhere in any field in the record
- field2 ~ /bar|dtn/
- if field1 ~ /xyz|abc/ && field98 == "0001"
I'm wondering if I will need to run two successive gawk programs, or if I can combine all of this into one.
I'm a gawk newbie and come from a non-Unix
Answer 1:
$ awk '{printf "%s%s",($1 ~ "/" ? rs : ""),$0; rs=RS} END{print ""}' file
2012/11/01 field1 field2 field3 field4 field5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4
Now that I've noticed you don't actually want to just print the recombined records, here's an alternative that makes it easier to run tests on the recombined record ("s" in this script):
$ awk 'NR>1 && $1~"/"{print s; s=""} {s=s $0} END{print s}' file
Now with that structure, instead of just printing s you can perform tests on s, for example (note "foo" in 3rd record):
$ cat file
2012/11/01 field1 field2 field3 field4 fi
eld5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 foo field4 fi
eld5 field6 field7 field8 field9 field10
field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4
$ awk '
function tst(rec,    flds,nf,i) {
    nf=split(rec,flds)
    if (rec ~ "foo") {
        print rec
        for (i=1;i<=nf;i++)
            print "\t",i,flds[i]
    }
}
NR>1 && $1~"/" { tst(s); s="" }
{ s=s $0 }
END { tst(s) }
' file
2012/12/31 field1 field2 foo field4 field5 field6 field7 field8 field9 field10 field11 field12 field13
1 2012/12/31
2 field1
3 field2
4 foo
5 field4
6 field5
7 field6
8 field7
9 field8
10 field9
11 field10
12 field11
13 field12
14 field13
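As a rough sketch of how the criteria from the question could be folded into that same structure, so that joining and filtering happen in a single awk pass: relevant() is a made-up helper name, the sample file is adapted from the question so that every wrap falls mid-word, and I'm assuming "fieldN" means the N-th whitespace-separated token of the joined record (shift the indices if the date should not be counted).

```shell
# Sample input adapted from the question (hypothetical file name).
cat > wrapped.log <<'EOF'
2012/11/01 field1 field2 field3 field4 fi
eld5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 foo field4 fi
eld5 field6 field7 field8 field9 fiel
d10 field11 field12 field13
2013/01/10 field1 field2 field3
EOF

awk '
# relevant(): hypothetical helper, returns 1 when a joined record should be kept.
# f is a local array; f[N] is assumed to be "fieldN" from the question.
function relevant(rec,    f) {
    split(rec, f)
    return (rec ~ /foo/) ||
           (f[2] ~ /bar|dtn/) ||
           (f[1] ~ /xyz|abc/ && f[98] == "0001")
}
# A line whose first field contains "/" starts a new record:
# test the record collected so far, then start over.
NR > 1 && $1 ~ "/" { if (relevant(s)) print s; s = "" }
# Every line, wrapped or not, is appended to the current record.
{ s = s $0 }
# The last record has no following date line, so test it here.
END { if (relevant(s)) print s }
' wrapped.log > relevant.log

cat relevant.log
```

With this sample only the 2012/12/31 record is kept, because it is the only one containing "foo". The three tests are OR-ed together here; if your criteria need to be combined differently, that is the one expression to change.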
Answer 2:
gawk '{ gsub( "\n", "" ); printf "%s%s", $0, RT }
END { print "" }' RS='\n[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]' input
This can be somewhat simplified with:
gawk --re-interval '{ gsub( "\n", "" ); printf "%s%s", $0, RT }
END { print "" }' RS='\n[0-9]{4}/[0-9]{2}/[0-9]{2}' input
Answer 3:
This might work for you (GNU sed):
sed -r ':a;$!N;\#\n[0-9]{4}/[0-9]{2}/[0-9]{2}#!{s/\n//;ta};P;D' file
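In case the one-liner is hard to read, here is the same script spread over several lines with comments, as a sketch of what I understand each command to do. The file names are made up, the sample is adapted from the question so that every wrap falls mid-word, and -E is used as the portable spelling of GNU sed's -r.

```shell
# Sample input adapted from the question (hypothetical file name).
cat > wrapped.log <<'EOF'
2012/11/01 field1 field2 field3 field4 fi
eld5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 fi
eld5 field6 field7 field8 field9 fiel
d10 field11 field12 field13
2013/01/10 field1 field2 field3
EOF

sed -E '
:a
# Append the next input line to the pattern space, unless this is the last line.
$!N
# If the pattern space does not contain a newline followed by a date,
# the appended line is a continuation: delete the newline and loop back.
\#\n[0-9]{4}/[0-9]{2}/[0-9]{2}#!{
    s/\n//
    ta
}
# Otherwise the text before the newline is a complete record: print it,
# then delete it and restart with the date line left in the pattern space.
P
D
' wrapped.log > joined.log

cat joined.log
```

The loop only ends when a date line shows up or the input runs out, so a record can span any number of wrapped lines.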
Answer 4:
Here's a slightly bigger Perl solution which also handles the additional filtering (as you tagged this perl as well):
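root@virtualdeb:~# cat combine_and_filter.pl
#!/usr/bin/perl -n
# A line starting with a date begins a new record.
if (m!^2\d{3}/\d{2}/\d{2} !){
    # Print the previous, now complete, record if it passes the filter.
    print $prevline if defined $prevline && $prevline =~ m/field13/;
    $prevline = $_;
}else{
    # Continuation line: drop the newline and append.
    chomp($prevline);
    $prevline .= $_;
}
# Flush the last record, which no later date line would trigger.
END { print $prevline if defined $prevline && $prevline =~ m/field13/ }
root@virtualdeb:~# perl combine_and_filter.pl < /tmp/in.txt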
2012/12/31 field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13
Answer 5:
This may work for you:
awk --re-interval '/^[0-9]{4}\//&&s{print s;s=""}{s=s $0}END{print s}' file
Test with your example:
kent$ echo "2012/11/01 field1 field2 field3 field4 fi
eld5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 fi
eld5 field6 field7 field8 field9 field10
field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4"|awk --re-interval '/^[0-9]{4}\//&&s{print s;s=""}{s=s $0}END{print s}'
2012/11/01 field1 field2 field3 field4 field5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4
Answer 6:
Here is a very short script to accomplish this.
sed '/^[[:digit:]]/ { :r N; /\n\([^[:digit:]]\)/ s::\1:g; tr; } ' FILE
Are you happy with it in this form?
Source: https://stackoverflow.com/questions/14780745/combine-split-lines-with-awk-gawk