Question
I want to process this list (this is just an excerpt, of course):
1 S3 -> PC-8-Set
2 S3 -> PC-850-Set
3 S3 -> ANSI-Set
4 S3 -> 7-Bit-NRC
5 PC-8-Set -> S3
6 PC-850-Set -> S3
7 ANSI-Set -> S3
This is what I did:
awk -F '[[:blank:]]+' '{printf ("%s ", $2)}' list
This is what I got:
1 2 3 4 5 6 7
Now I thought the quantifier + is equivalent to {1,}, but when I changed the line to
awk -F '[[:blank:]]{1,}' '{printf ("%s ", $2)}' list
I got just blanks, and the whole line was read into $1.
Can someone explain this behaviour please? I'm thankful for every answer!
Answer 1:
Try
awk --re-interval -F '[[:blank:]]{1,}' '{printf ("%s ", $2)}' list
--re-interval
Allow interval expressions (see Regexp Operators) in regexps. This is now gawk's default behavior. Nevertheless, this option remains both for backward compatibility, and for use in combination with the --traditional option.
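To check how a given awk behaves before relying on {1,}, a small probe can help. The following is a minimal sketch, assuming a POSIX shell and some awk on the PATH. The sample input (including its leading blank, which is assumed so that $2 holds the line number), the variable names, and the use of [ \t] in place of [[:blank:]] are illustrative choices, not part of the original question.

```shell
#!/bin/sh
# Hypothetical sample input; the leading blank on each line is an assumption.
sample=' 1 S3 -> PC-8-Set
 2 S3 -> PC-850-Set
 3 S3 -> ANSI-Set'

# Probe: with interval support, /^a{2}$/ matches the input "aa".
# Without it, the braces are taken literally and nothing matches.
if printf 'aa\n' | awk '/^a{2}$/ { found = 1 } END { exit !found }'; then
    interval_support=yes
    fs='[ \t]{1,}'   # interval form, safe to use here
else
    interval_support=no
    fs='[ \t]+'      # equivalent form that needs no interval support
fi

# Print the second field of every line, as in the question.
result=$(printf '%s\n' "$sample" | awk -F "$fs" '{ printf "%s ", $2 }')
echo "interval support: $interval_support"
echo "second fields: $result"
```

Either branch should yield the same second fields; only the spelling of the field separator differs. On a gawk that needs it, passing --re-interval as shown above is the alternative to rewriting the separator.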
Answer 2:
You are using a Gawk that predates this November 2010 commit, which I found with git bisect:
http://git.savannah.gnu.org/cgit/gawk.git/commit/?id=40b3741f63c19e38077d57f4ce4737916ec5073e
The change indeed hinges on the default behavior with respect to intervals, which became enabled by default (as POSIX requires).
It looks like the --re-interval option is now relegated to use with --traditional; i.e. if --traditional is enabled, support for {m,n} goes away, but it can be selectively brought back with --re-interval.
In your version, {m,n} is unrecognized by default, with or without --traditional. This is true up to this commit:
commit 00ef0423acd97cb964a2bae54c93a03a8ab50e5e
Author: Arnold D. Robbins <arnold@******>
Date: Fri Jul 16 14:55:10 2010 +0300
Move to 3.1.8.
and you're still behind that, on 3.1.5.
Source: https://stackoverflow.com/questions/20393781/quantifiers-in-a-regular-expression-used-with-awk-behave-unexpected
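To see which awk and which version is actually installed, something like the following can be used. It is only a sketch, assuming a POSIX shell; the variable names are illustrative, and awks other than gawk may not understand --version at all, which is why errors are discarded and stdin is redirected.

```shell
#!/bin/sh
# Ask awk for its version banner; non-GNU awks may print nothing useful here.
version_line=$(awk --version 2>/dev/null </dev/null | head -n 1)

# gawk's banner starts with "GNU Awk", followed by the version number.
case $version_line in
    "GNU Awk"*) awk_kind=gawk ;;
    *)          awk_kind=other ;;
esac

echo "awk kind: $awk_kind"
echo "version banner: ${version_line:-unavailable}"
```

If the banner reports a gawk older than the commits mentioned above, {m,n} needs --re-interval to be recognized.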