I see lots of examples and man pages on how to do things like search-and-replace using sed, awk, or gawk.
But in my case, I have a regular expression that I want to run
If your version of grep supports it, you could use the -o option to print only the portion of any line that matches your regexp.
If not, then here's the best sed I could come up with:
sed -e '/[0-9]/!d' -e 's/^[^0-9]*//' -e 's/[^0-9]*$//'
... which deletes/skips lines with no digits and, for the remaining lines, removes all leading and trailing non-digit characters. (I'm only guessing that your intention is to extract the number from each line that contains one.)
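For comparison, here is a rough sketch of the grep -o route mentioned above. It assumes a grep that supports -o (GNU and BSD grep both do); the sample input and the function name only_digits are made up for illustration.

```shell
# Hypothetical helper: print only the matched digit runs, one per output line.
# [0-9][0-9]* is the basic-regex spelling of "one or more digits".
only_digits() {
  grep -o '[0-9][0-9]*'
}

# Invented sample input: the second line has no digits and prints nothing.
printf 'foo 123 bar\nno digits here\nx9y\n' | only_digits
```

One difference worth knowing: on a line like a1b22, grep -o prints 1 and 22 on separate lines, while the sed command above prints 1b22 (everything from the first digit to the last).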
The problem with something like:
sed -e 's/.*\([0-9]*\).*/&/'
... or
sed -e 's/.*\([0-9]*\).*/\1/'
... is that sed only supports "greedy" matching, so the first .* will match the rest of the line. Unless we can use a negated character class to achieve a non-greedy match, or a version of sed with Perl-compatible or other extensions to its regexes, we can't extract a precise pattern match from within the pattern space (a line).
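A small sketch of that greedy behaviour and of the negated-character-class workaround, using an invented sample line (the variable names are only for illustration):

```shell
line='abc12345xyz'   # sample input, invented for illustration

# Greedy: the first .* swallows the whole line, so the group matches the
# empty string and sed prints an empty line.
greedy=$(printf '%s\n' "$line" | sed -e 's/.*\([0-9]*\).*/\1/')

# Negated character class: [^0-9]* cannot run past the first digit, so the
# group captures the digit run that follows it.
bounded=$(printf '%s\n' "$line" | sed -e 's/^[^0-9]*\([0-9]*\).*/\1/')

printf 'greedy=[%s] bounded=[%s]\n' "$greedy" "$bounded"
```

This should print greedy=[] bounded=[12345]. The workaround only helps when the text before the match can be described by a character class; it is not a general replacement for non-greedy quantifiers.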
My sed (Mac OS X) didn't work with +. I tried * instead and added the p flag to print the match:
sed -n 's/^.*abc\([0-9]*\)xyz.*$/\1/p' example.txt
For matching at least one numeric character without +, I would use:
sed -n 's/^.*abc\([0-9][0-9]*\)xyz.*$/\1/p' example.txt
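Two other spellings of "one or more digits" that may be worth trying, sketched here on an invented sample line. The first uses the POSIX interval \{1,\} in a basic regex; the second assumes a sed that accepts -E for extended regexes (BSD/macOS sed and recent GNU sed do). The function names are hypothetical.

```shell
# Basic regex: \{1,\} means "one or more", a portable stand-in for +.
extract_bre() {
  sed -n 's/^.*abc\([0-9]\{1,\}\)xyz.*$/\1/p'
}

# Extended regex via -E: + and unescaped parentheses work as in other tools.
extract_ere() {
  sed -n -E 's/^.*abc([0-9]+)xyz.*$/\1/p'
}

# Invented sample input; both commands should print the digits between
# abc and xyz.
printf 'foo abc12345xyz bar\n' | extract_bre
printf 'foo abc12345xyz bar\n' | extract_ere
```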
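perl has the cleanest syntax, but if you don't have perl (it's not always there, I understand), then one way to use components of a regex in gawk is the gensub function. The abc and xyz anchors inside the gensub regex matter: with a bare .*([0-9]+).* the leading .* is greedy and leaves only the last digit for the group.
gawk '/abc[0-9]+xyz/ { print gensub(/.*abc([0-9]+)xyz.*/, "\\1", "g"); }' < file
The output for the sample input file will be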
12345
Note: gensub replaces everything the regex (between the slashes) matches, so you need a .* before and after the capture group to get rid of the text surrounding the number in the substitution.
You can use awk with match() to access the captured group (the three-argument form of match() is a GNU awk extension, so this needs gawk):
$ awk 'match($0, /abc([0-9]+)xyz/, matches) {print matches[1]}' file
12345
This tries to match the pattern abc[0-9]+xyz. If it does so, it stores its slices in the array matches, whose first item is the block [0-9]+. Since match() returns the character position, or index, of where the matched substring begins (1 if it starts at the beginning of the string, 0 if there is no match), a successful match is a true condition and triggers the print action.
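If gawk isn't available, a possible fallback is the two-argument match(), which POSIX awk provides. The sketch below relies on RSTART and RLENGTH and on the fact that abc and xyz are each three characters long; the function name and sample input are made up for illustration.

```shell
# Hypothetical helper using only POSIX awk features.
# match() sets RSTART (start of the match) and RLENGTH (its length);
# skip the 3 characters of "abc" and drop 6 in total for "abc" plus "xyz".
extract_posix_awk() {
  awk 'match($0, /abc[0-9]+xyz/) { print substr($0, RSTART + 3, RLENGTH - 6) }'
}

# Invented sample input: only the first line matches.
printf 'foo abc12345xyz bar\nno match here\n' | extract_posix_awk
```

The hard-coded 3 and 6 are the price of doing without capture groups, so they have to change whenever the surrounding literals do.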
With grep you can use a look-behind and look-ahead (this needs -P, i.e. a grep with Perl-compatible regex support, such as GNU grep):
$ grep -oP '(?<=abc)[0-9]+(?=xyz)' file
12345
$ grep -oP 'abc\K[0-9]+(?=xyz)' file
12345
This matches the pattern [0-9]+ only when it occurs between abc and xyz, and prints just the digits.
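One thing the -o approach gives you for free is several matches per line. A quick sketch, assuming GNU grep with -P support; the input and the function name are invented for illustration:

```shell
# Hypothetical helper: print every digit run that sits between abc and xyz.
between_abc_xyz() {
  grep -oP '(?<=abc)[0-9]+(?=xyz)'
}

# Invented sample input: two matches on the first line, none on the second
# (abcxyz has no digits between the markers).
printf 'abc1xyz abc22xyz\nabcxyz\n' | between_abc_xyz
```

Each match comes out on its own line, whereas a sed substitution that replaces the whole line with \1 yields at most one value per input line.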
You can use sed to do this:
sed -rn 's/.*abc([0-9]+)xyz.*/\1/gp'
-n: don't print the resulting line
-r: this makes it so you don't have to escape the capture group parens ()
\1: the capture group match
/g: global match
/p: print the result
I wrote a tool for myself that makes this easier:
rip 'abc(\d+)xyz' '$1'