Question
I need to be able to search for a string (let's use 4320101), print the 20 lines above it, and then print everything after it until another string (</eventUpdate>) is found.
For example:
Random text I do not want or blank line
16 Apr 2013 00:14:15
id="4320101"
</eventUpdate>
Random text I do not want or blank line
I just want the following result outputted to a file:
16 Apr 2013 00:14:15
id="4320101"
</eventUpdate>
There are multiple groups like this in the file, and I want all of them.
I tried using this below:
cat filename | grep "</eventUpdate>" -A 20 4320101 -B 100 > greptest.txt
But it only ever shows for 20 lines either side of the string.
Notes:
- the line number the text is on is inconsistent, so I cannot go by line numbers; hence I am using -A 20.
- ideally, when it searches after the string, it should stop when it finds </eventUpdate> and then carry on searching for the next match.
Summary: find 4320101, output the 20 lines above 4320101 (or back to the preceding blank line), and then output all lines below 4320101 up to </eventUpdate>.
From my research, I am unsure how to get awk, nawk or sed to do this.
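A minimal single-pass sketch of the behaviour summarized above, assuming that "blank" means an empty or whitespace-only line (other unwanted text such as "Random text" is not detected) and that 4320101 and </eventUpdate> are matched as plain strings. The helper name extract_groups is hypothetical:

```sh
# extract_groups KEY LIMIT [FILE...]   (hypothetical helper name)
# For every line containing KEY: print up to 20 lines above it, cut off at
# the most recent blank line, then every line up to and including the next
# line containing LIMIT. KEY and LIMIT are plain strings, not regexes.
extract_groups() {
  _key=$1 _limit=$2
  shift 2
  awk -v key="$_key" -v limit="$_limit" -v max=20 '
    inrange {                        # inside a group: print through LIMIT
      print
      if (index($0, limit)) inrange = 0
      next
    }
    index($0, key) {                 # a new group starts here
      start = (n > max) ? n - max : 0
      for (i = start; i < n; i++)    # replay buffered lines, oldest first
        print buf[i % max]
      n = 0                          # empty the buffer
      print
      if (!index($0, limit)) inrange = 1
      next
    }
    NF == 0 { n = 0; next }          # blank line: forget everything above
    { buf[n++ % max] = $0 }          # otherwise remember the line
  ' "$@"
}

# usage:
# extract_groups 4320101 '</eventUpdate>' filename > greptest.txt
```

The 20 buffered lines live in a circular array indexed by n % max, so memory stays constant however long the file is, and every group in the file is extracted, not just the first.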
Answer 1:
This might work for you (GNU sed):
sed ':a;s/\n/&/20;tb;$!{N;ba;};:b;/4320101/!D;:c;n;/<\/eventUpdate>/!bc' file
EDIT:
:a;s/\n/&/20;tb;$!{N;ba;};
keeps a sliding window in the pattern space (PS): 21 lines, i.e. the 20 lines above plus the current line.
:b;/4320101/!D;
moves the above window through the file, one line at a time, until the pattern 4320101 is found.
:c;n;/<\/eventUpdate>/!bc
prints the window and every subsequent line until the pattern <\/eventUpdate> is found.
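To see the sliding window at work on a small input, here is the same program with the window shrunk to 2 lines above (s/\n/&/2) and the placeholder patterns KEY and END standing in for 4320101 and </eventUpdate>. This assumes GNU sed, and show_window is a hypothetical name:

```sh
# Same sed program as in the answer, but with a 2-line look-back window
# and the placeholder patterns KEY and END. Assumes GNU sed.
show_window() {
  sed ':a;s/\n/&/2;tb;$!{N;ba;};:b;/KEY/!D;:c;n;/END/!bc'
}

printf '%s\n' a b c d KEY e END f | show_window
```

This prints c, d, KEY, e and END: a and b have already been dropped off the front of the window by D when KEY arrives, and f is never printed because no KEY follows it.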
Answer 2:
Here is an ugly awk solution :)
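awk 'BEGIN { last = 0 }
# remember the most recent blank line or line containing "Random"
length($0) == 0 || $0 ~ /Random/ { last = NR }
/4320101/ {
    flag = 1
    # start 20 lines above, or just after "last", whichever is nearer
    if ((NR - last) > 21) last = NR - 21
    if (last + 1 <= NR - 1) {
        cmd = "sed -n \"" (last + 1) "," (NR - 1) "p\" \"" FILENAME "\""
        fflush()    # flush our own output before sed writes to stdout
        system(cmd)
    }
}
flag == 1 { print }
/eventUpdate/ { flag = 0 }' file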
So basically, it keeps track of the last blank line (or line containing the Random pattern) in the last variable. When 4320101 is found, it prints the lines starting 20 lines above it or just after last, whichever is nearer, by running sed through system(), and it sets flag. The flag causes that line and the following lines to be printed until eventUpdate is found. I have not tested it, but it should work.
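The "whichever is nearer" step is just a maximum of two line numbers. A dry-run sketch that prints the range that would be handed to sed -n for each match, assuming only empty or whitespace-only lines act as the boundary (the Random check is left out, and show_ranges is a hypothetical name):

```sh
# Dry run of the range arithmetic only: prints "first,last" line numbers
# for each match instead of calling sed.
show_ranges() {
  awk '
    NF == 0 { last = NR }                  # most recent blank line
    /4320101/ {
      # 20 lines above the match, or just after the blank line if nearer
      first = (NR - 20 > last + 1) ? NR - 20 : last + 1
      if (first <= NR - 1)
        print first "," (NR - 1)
      else
        print "none"                       # blank line directly above
    }
  ' "$@"
}
```

For example, with a blank line 2 and the match on line 5 it reports 3,4, and with 25 plain lines before a match on line 26 it reports 6,25.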
Answer 3:
Look-behind in sed/awk is always tricky. This self-contained awk script keeps the last 20 lines stored; when it gets to 4320101, it prints those stored lines back to the point where a blank or undesired line is found. At that point it switches into printall mode and prints all lines until eventUpdate is encountered, then prints that line and quits (so it only extracts the first group).
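awk '
function store(line,    i) {
    # shift the window up one slot and put the new line in slot 20
    for (i = 0; i < 20; i++)
        last[i] = last[i + 1]
    last[20] = line
}
function purge(    i, stop) {
    # find the nearest blank or undesired line above the match;
    # slots not yet filled are empty, so they also act as a boundary
    stop = -1
    for (i = 19; i >= 0; i--) {
        if (length(last[i]) == 0 || last[i] ~ /Random/) {
            stop = i
            break
        }
    }
    # print everything after it, up to and including the matching line
    for (i = stop + 1; i <= 20; i++)
        print last[i]
}
{
    store($0)
    if (/4320101/) {
        purge()
        printall = 1
        next
    }
    if (printall == 1) {
        print
        if (/eventUpdate/)
            exit 0
    }
}' file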
Answer 4:
Let's see if I understand your requirements. You have two strings, which I'll call KEY and LIMIT, and you want to print:
(1) At most 20 lines before a line containing KEY, but stopping if there is a blank line.
(2) All the lines between a line containing KEY and the following line containing LIMIT. (This ignores your requirement that there be no more than 100 such lines; if that's important, it's relatively straightforward to add.)
The easiest way to accomplish (1) is to keep a circular buffer of 20 lines and print it out when you hit KEY. (2) is trivial in either sed or awk, because you can use the two-address form to print the range.
So let's do it in awk:
#file: extract.awk
# Initialize the circular buffer
BEGIN { count = 0 }

# When we hit KEY, print and clear the circular buffer
index($0, KEY) {
    for (i = (count < 20 ? 0 : count - 20); i < count; ++i)
        print buf[i % 20]
    count = 0
}

# While we're between KEY and LIMIT, print the line
# (the action must start on the same line as the range pattern)
index($0, KEY), index($0, LIMIT) { print; next }

# When we hit an empty line, clear the circular buffer
# (placed after the range rule so blank lines inside a group are still printed)
length($0) == 0 { count = 0; next }

# Otherwise, save the line
{ buf[count++ % 20] = $0 }
In order to get that to work, we need to set the values of KEY and LIMIT. We can do that on the command line:
awk -v "KEY=4320101" -v "LIMIT=</eventUpdate>" -f extract.awk "$FILENAME"
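The count % 20 indexing used for the circular buffer can be tried on its own. A small sketch, assuming a hypothetical helper last_lines that takes the window size N as its first argument, keeps only the last N lines of its input and prints them oldest first (essentially what tail -n N does):

```sh
# last_lines N: keep the last N lines of stdin in a circular buffer
# and print them oldest first (hypothetical helper name).
last_lines() {
  awk -v n="$1" '
    { buf[count++ % n] = $0 }              # overwrite the oldest slot
    END {
      # start at the oldest surviving line, walk forward to the newest
      for (i = (count < n ? 0 : count - n); i < count; i++)
        print buf[i % n]
    }
  '
}
```

For example, printf '%s\n' a b c d e | last_lines 3 prints c, d and e, while an input shorter than N is printed in full.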
Notes:
- I used index($0, foo) instead of the more usual /foo/, because it avoids having to escape regex special characters, and nothing in the requirements suggests that regexes are even wanted.
- index(haystack, needle) returns the position of needle in haystack, with positions starting at 1, or 0 if needle is not found. Used as a true/false value, it is true if needle is found.
- next ends processing of the current line: awk skips the remaining rules and moves on to the next input line. It can be quite handy, as this little program shows.
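To make the index() versus regex difference concrete, a small comparison (the sample strings are made up, and index_demo is a hypothetical name):

```sh
# Compare index() (plain substring search) with ~ (regex match)
# on a pattern containing the regex metacharacter ".".
index_demo() {
  awk 'BEGIN {
    print index("id=4320101", "4320101")   # 4: position of the substring
    print index("a.b", ".")                # 2: "." is just a character here
    print index("axb", "a.b")              # 0: no literal "a.b" in "axb"
    print ("axb" ~ "a.b")                  # 1: as a regex, "." matches "x"
  }'
}

index_demo
```

The last two lines are the reason index() is safer for a literal key: the same text "a.b" is found by the regex match but not by index().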
Answer 5:
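You can try something like this:
awk '{
    a[NR] = $0                        # keep every line in memory
}
END {
    for (i = 1; i <= NR; i++) {       # walk the lines in order
        if (a[i] ~ /4320101/) {
            start = (i > 20) ? i - 20 : 1
            for (j = start; j < i; j++)
                print a[j]            # up to 20 lines above the match
            for (j = i; j <= NR; j++) {
                print a[j]            # the match and the lines after it
                if (a[j] ~ /<\/eventUpdate>/) break
            }
            i = j                     # resume scanning after this group
        }
    }
}' file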
Answer 6:
The simplest way is to make two passes over the file: the first to identify the line numbers in the range within which your target regexp is found, and the second to print the lines in the selected range, e.g.:
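awk '
NR == FNR {                            # first pass: collect line numbers
    if ($0 ~ /\<4320101\>/) {          # \< and \> are GNU awk word boundaries
        for (i = NR - 20; i < NR; i++)
            range[i]                   # mark the 20 lines above
        inRange = 1
    }
    if (inRange) {
        range[NR]
    }
    if ($0 ~ /<\/eventUpdate>/) {
        inRange = 0
    }
    next
}
FNR in range                           # second pass: print marked lines
' file file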
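\< and \> are GNU extensions, so other awks (for example mawk) may not honour them. A portable sketch that approximates the same check by requiring a non-digit, or the start or end of the line, on each side of 4320101. Note this is a slightly different definition of a boundary (a letter touching the number, as in a4320101, still counts as a match), and has_id is a hypothetical name:

```sh
# Portable stand-in for /\<4320101\>/: the number must not be preceded
# or followed by another digit.
has_id() {
  awk '$0 ~ /(^|[^0-9])4320101([^0-9]|$)/ { print "match: " $0 }' "$@"
}
```

With the input lines id="4320101", 43201010, 14320101 and x 4320101, only the first and last are reported.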
Source: https://stackoverflow.com/questions/16694469/search-e-g-awk-grep-sed-for-string-then-look-for-x-lines-above-and-another