I see lots of examples and man pages on how to do things like search-and-replace using sed, awk, or gawk.
But in my case, I have a regular expression that I want to run
I use perl
to make this easier for myself. e.g.
perl -ne 'print $1 if /.*abc([0-9]+)xyz.*/'
This runs Perl, the -n
option instructs Perl to read in one line at a time from STDIN and execute the code. The -e
option specifies the instruction to run.
The instruction runs a regexp on the line read, and if it matches prints out the contents of the first set of bracks ($1
).
You can do this will multiple file names on the end also. e.g.
perl -ne 'print $1 if /.*abc([0-9]+)xyz.*/' example1.txt example2.txt
gawk '/.*abc([0-9]+)xyz.*/' file
If you want to select lines then strip out the bits you don't want:
egrep 'abc[0-9]+xyz' inputFile | sed -e 's/^.*abc//' -e 's/xyz.*$//'
It basically selects the lines you want with egrep
and then uses sed
to strip off the bits before and after the number.
You can see this in action here:
pax> echo 'a
b
c
abc12345xyz
a
b
c' | egrep 'abc[0-9]+xyz' | sed -e 's/^.*abc//' -e 's/xyz.*$//'
12345
pax>
Update: obviously if you actual situation is more complex, the REs will need to me modified. For example if you always had a single number buried within zero or more non-numerics at the start and end:
egrep '[^0-9]*[0-9]+[^0-9]*$' inputFile | sed -e 's/^[^0-9]*//' -e 's/[^0-9]*$//'
For awk. I would use the following script:
/.*abc([0-9]+)xyz.*/ {
print $0;
next;
}
{
/* default, do nothing */
}
you can do it with the shell
while read -r line
do
case "$line" in
*abc*[0-9]*xyz* )
t="${line##abc}"
echo "num is ${t%%xyz}";;
esac
done <"file"
The OP's case doesn't specify that there can be multiple matches on a single line, but for the Google traffic, I'll add an example for that too.
Since the OP's need is to extract a group from a pattern, using grep -o
will require 2 passes. But, I still find this the most intuitive way to get the job done.
$ cat > example.txt <<TXT
a
b
c
abc12345xyz
a
abc23451xyz asdf abc34512xyz
c
TXT
$ cat example.txt | grep -oE 'abc([0-9]+)xyz'
abc12345xyz
abc23451xyz
abc34512xyz
$ cat example.txt | grep -oE 'abc([0-9]+)xyz' | grep -oE '[0-9]+'
12345
23451
34512
Since processor time is basically free but human readability is priceless, I tend to refactor my code based on the question, "a year from now, what am I going to think this does?" In fact, for code that I intend to share publicly or with my team, I'll even open man grep
to figure out what the long options are and substitute those. Like so: grep --only-matching --extended-regexp