I'm reading Jan Goyvaerts' "Regular Expressions: The Complete Tutorial and Reference" to touch up on my Regex.
In the second chapter, Jan has a section on "spec
The regex flavors in my book do not require } and ] to be escaped (except for ] in character classes in JavaScript). So I don't because I like to have as few backslashes in my regexes as possible. You can escape them if you find your regexes clearer that way.
First of all, anyone learning about regular expressions needs to understand the importance of the qualifier "In the regex flavors discussed in this tutorial..." You cannot discuss regular expressions without stating which regex flavor(s) you're talking about.
What I wrote is true for the flavors my book (2006 edition) discusses. In those flavors, ) is treated as a token that closes a group. It is a syntax error if used without a corresponding (. So ) has a special meaning when used all on its own.
} does not have a special meaning when used all on its own. You never need to escape it with these flavors. If you wanted to match something like {7} or {7,42} literally, you only need to escape the opening {. If you want to argue that } is special because it sometimes has a special meaning, then you would have to say the same about , which becomes special in the same situation.
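To make the difference between ) and } concrete, here is a minimal sketch. It assumes Python's built-in re module purely for illustration; other flavors report the error differently, and the sample patterns are my own.

```python
import re

# A lone ) is a syntax error: it closes a group that was never opened.
try:
    re.compile(r"a)")
    lone_paren_compiles = True
except re.error:
    lone_paren_compiles = False

# A lone } is an ordinary literal character.
lone_brace = re.fullmatch(r"a}", "a}")

# With an unescaped {, the braces form a quantifier: exactly seven x's.
quantifier = re.fullmatch(r"x{7}", "xxxxxxx")

# Escaping only the opening { is enough to match the text literally.
literal_single = re.fullmatch(r"\{7}", "{7}")
literal_range = re.fullmatch(r"\{7,42}", "{7,42}")
```

In this sketch only the first pattern fails to compile; the closing } and the comma never need a backslash of their own.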
] does not have a special meaning outside character classes in these regex flavors. You never need to escape it outside character classes. The paragraph you quoted does not talk about special characters inside character classes. That's a totally different list (\, ], ^, and -) discussed in a later chapter.
Now as to why: most regular expressions have plenty of backslashes already. My preferred style is to escape as few characters as needed. So I never escape }. I escape ] in character classes when using JavaScript because that's the only way. But with other flavors I place ] at the start of the character class or after the negating caret so I don't need to escape it. My teaching materials teach this style. When my products RegexBuddy or RegexMagic convert or generate regular expressions, they also use as few backslashes as needed.
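The placement trick for ] can be sketched as follows. This again assumes Python's re module as the example flavor (JavaScript would need the backslash, as noted above), and the patterns are hypothetical.

```python
import re

# ] placed first in the class is taken as a literal member of the class.
first_in_class = re.compile(r"[]a-z]+")

# ] placed right after the negating caret is literal as well.
after_caret = re.compile(r"[^]a-z]+")

# Outside a character class, ] needs no escape at all.
outside_class = re.compile(r"a]")

matches_bracket = first_in_class.fullmatch("ab]c")
rejects_bracket = after_caret.search("]")
matches_digits = after_caret.fullmatch("123")
matches_outside = outside_class.fullmatch("a]")
```

Here the first class matches lowercase letters and ], while the negated class matches anything except those characters.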
I often see people new to regular expressions needlessly escape characters like ", ', or / because they need to be escaped when the regular expression is quoted as a source code literal in certain programming languages. But the regular expression itself does not require these to be escaped.
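A small illustration of the difference between escaping for the programming language and escaping for the regex, assuming Python string literals; the variable names and sample text are made up for this sketch.

```python
import re

# Inside a double-quoted literal, " must be escaped for the Python parser...
double_quoted = "\"[^\"]*\""

# ...but the regex engine receives exactly the same pattern as from this
# single-quoted literal, which needs no backslashes at all.
single_quoted = '"[^"]*"'
same_pattern = double_quoted == single_quoted

# The pattern itself contains no backslash: the escapes belonged to the
# string literal, not to the regular expression.
has_backslash = "\\" in double_quoted

# / is not special in the regex either.
found = re.search(single_quoted, 'say "hello" now')
path = re.fullmatch(r"/usr/[a-z]+", "/usr/bin")
```

The backslashes in the first literal are consumed by the language before the regex engine ever sees the pattern.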
I even see people escape characters like < or >. This is a bad habit because in some regex flavors \< and \> are word boundaries. This includes recent versions of PCRE (but not the PCRE that was current in 2006).
But, if you find it confusing to see unescaped } and ] used as literals, you are free to escape them in your regexes. Except for < and >, all the flavors discussed in my book allow you to escape any punctuation character to match that character literally, even if the character on its own would be a literal already.
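As a quick check that the escaped and unescaped spellings are interchangeable, here is a sketch that assumes Python's re module, where a backslash before } or ] is accepted and simply matches that character. The helper name and test subjects are hypothetical.

```python
import re

def same_result(pattern_a, pattern_b, subject):
    """Return True if both patterns agree on whether they fully match subject."""
    a = re.fullmatch(pattern_a, subject) is not None
    b = re.fullmatch(pattern_b, subject) is not None
    return a == b

# Minimal escaping versus escaping every brace and bracket.
minimal_brace, verbose_brace = r"\{7}", r"\{7\}"
minimal_bracket, verbose_bracket = r"a]", r"a\]"

brace_agrees = all(
    same_result(minimal_brace, verbose_brace, s) for s in ["{7}", "7", "{7"]
)
bracket_agrees = all(
    same_result(minimal_bracket, verbose_bracket, s) for s in ["a]", "a", "]"]
)
```

Which spelling to use is then purely a matter of readability.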
So somebody saying that } and ] are special characters in regular expressions is not wrong if "special characters" means "characters that have a special meaning either on their own or when used in combination with other characters". But that list would also include , (quantifier), : (non-capturing group), - (mode modifier), ! (negative lookaround), < (lookbehind), and - (character class range).
But if "special characters" means "characters that have a special meaning on their own", then } and ] are not included in the list for the flavors my book covers.