Question
In Cygwin, this does not return a match:
$ echo "aaab" | grep '^[ab]+$'
But this does return a match:
$ echo "aaab" | grep '^[ab][ab]*$'
aaab
Are the two expressions not identical? Is there any way to express "one or more characters of the character class" without typing the character class twice (as in the second example)?
According to Regular-Expressions.info the two expressions should be the same, but perhaps that site does not cover bash in Cygwin.
Answer 1:
grep has multiple "modes" of matching, and by default it uses only the basic set, which does not recognize a number of metacharacters unless they're escaped. You can put grep into extended or Perl mode to have + treated as a quantifier.
From man grep:
Matcher Selection
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression. This is highly experimental and grep -P may warn of unimplemented features.
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.
GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax
error in the regular expression. POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.
Alternatively, you can use egrep instead of grep -E.
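The difference between the modes can be sketched as follows. This is a minimal illustration assuming GNU grep (the variant normally shipped with Cygwin); the variable names are made up for the example, and `|| true` is only there so that a non-matching grep does not abort a script running under `set -e`.

```shell
# Basic mode (the default): + is an ordinary character, so "aaab" does not match.
bre_plain=$(echo "aaab" | grep '^[ab]+$' || true)

# In basic mode the same pattern does match a string that ends in a literal plus sign.
bre_literal=$(echo "a+" | grep '^[ab]+$' || true)

# Basic mode with the backslashed form: \+ means "one or more" (a GNU extension).
bre_escaped=$(echo "aaab" | grep '^[ab]\+$' || true)

# Extended mode: an unescaped + is a quantifier.
ere=$(echo "aaab" | grep -E '^[ab]+$' || true)

printf 'basic, plain +    : [%s]\n' "$bre_plain"
printf 'basic, literal +  : [%s]\n' "$bre_literal"
printf 'basic, escaped \\+ : [%s]\n' "$bre_escaped"
printf 'extended, plain + : [%s]\n' "$ere"
```

Only the first variable ends up empty, which matches the behavior described in the question.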
Answer 2:
In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
So use the backslashed version:
$ echo aaab | grep '^[ab]\+$'
aaab
Or activate extended syntax:
$ echo aaab | egrep '^[ab]+$'
aaab
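If portability beyond GNU grep matters, note that \+ in basic mode is a GNU extension, whereas the interval form \{1,\} is part of POSIX basic regular expressions. A small sketch of that alternative, with illustrative variable names that are not from the original answers:

```shell
# \{1,\} means "at least one repetition" in POSIX basic regular expressions,
# so the character class is still written only once.
portable=$(echo "aaab" | grep '^[ab]\{1,\}$' || true)

# An empty line contains zero characters from the class, so it does not match.
empty=$(echo "" | grep '^[ab]\{1,\}$' || true)

echo "$portable"   # prints aaab
```

This keeps the default basic mode and avoids both the GNU-specific \+ and the egrep command.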
Answer 3:
Escape the + with a backslash, or use egrep (extended grep), which is equivalent to grep -E:
$ echo "aaab" | egrep '^[ab]+$'
aaab
$ echo "aaab" | grep '^[ab]\+$'
aaab
Source: https://stackoverflow.com/questions/5650761/how-do-you-use-a-plus-symbol-with-a-character-class-as-part-of-a-regular-express