Now I'm using VC++ 2010, but the syntax_option_type of VC++ 2010 only contains the following options:
static const flag_type icase = regex_cons
For the particular regex you want to convert, the equivalent in ECMA regex is:
/^(\d{3,4})[- ]?(\d{4})[- ]?(\d{4})[- ]?(\d{4})$/
In this case, \A (in Perl regex) has the same meaning as ^ (in ECMA regex), matching the beginning of the string, and \Z (in Perl regex) has the same meaning as $ (in ECMA regex), matching the end of the string. Note that the meaning of ^ and $ in ECMA regex changes to matching the beginning and the end of a line if you enable multiline mode.
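As a rough sketch of how the converted pattern might be used from C++, assuming a compiler that ships <regex> with the ECMAScript grammar (the default for std::regex). The function name is hypothetical, and the backslashes are doubled only because of C++ string literal escaping:

```cpp
#include <regex>
#include <string>

// Hypothetical helper (the name is illustrative, not from the question).
// Returns true when the whole string is 3-4 digits followed by three
// groups of 4 digits, each optionally separated by '-' or ' '.
bool matches_grouped_digits(const std::string& text) {
    // ^ and $ play the role that \A and \Z play in the Perl pattern.
    const std::regex pattern(
        "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$",
        std::regex::ECMAScript);
    return std::regex_search(text, pattern);
}
```

With std::regex_match the anchors would be redundant, since that function only succeeds when the entire input matches; they are kept here so the pattern stays identical to the one above.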
ECMA regex is a subset of Perl regex, so if the regex uses features exclusive to Perl regex, it is likely not convertible to ECMA regex. Even for the same syntax, the meaning may differ slightly between the two dialects, so it is always wise to check the documentation and compare the usage.
I'm only going to say what is similar between ECMA regex and Perl regex. What is not similar but convertible, I will mention to the best of my ability.
ECMA regex lacks features for working with Unicode, which compels you to look up the code points and specify them in character classes.
Going by the documentation for Perl regular expressions:
i, g, m are in the ECMA Standard, and they behave the same as in Perl.
The s (dot-all) modifier can be simulated in ECMA regex by using 2 complementing character classes, e.g. [\S\s], [\D\d].
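The dot-all workaround can be illustrated with a small sketch, assuming std::regex with the ECMAScript grammar. Both function names are hypothetical and exist only to contrast the two patterns:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: plain '.' does not match a line terminator
// in the ECMAScript grammar.
bool dot_matches(const std::string& text) {
    const std::regex plain_dot("a.b", std::regex::ECMAScript);
    return std::regex_match(text, plain_dot);
}

// Hypothetical helper: [\s\S] is the union of two complementing classes,
// so it matches any single character, including '\n'.
bool any_char_matches(const std::string& text) {
    const std::regex any_char("a[\\s\\S]b", std::regex::ECMAScript);
    return std::regex_match(text, any_char);
}
```

Here dot_matches("a\nb") is expected to fail while any_char_matches("a\nb") succeeds, which is the behavior the s modifier would give to '.' in Perl.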
The x and p flags have no equivalent in ECMA regex.
As for \ followed by a non-meta character that doesn't resolve to any special meaning, it should be fine if you don't escape where you don't need to. The dot (.) in ECMA excludes a few more characters. The rest behaves the same in ECMA regex (even the effect of the m flag on ^ and $).
There are no \a and \e in ECMA regex. \t, \n, \r, \f are the same.
\cX - there are differences.
\xhh is common to ECMA regex and Perl regex (specifying 2 hexadecimal digits is the safest; otherwise, you will have to look up the documentation to see how the language deals with the case where there are fewer than 2 hexadecimal digits). \uhhhh is an ECMA regex exclusive feature to specify a Unicode character. Perl has other exclusive ways to specify a character, such as \x{}, \N{}, \o{}, \000.
\l, \u, \L, \U are exclusive to Perl regex.
\Q and \E can be simulated by escaping the quoted section by hand.
\w, \W, \s, \S, \d, \D are equivalent in ECMA regex and Perl regex, assuming US-ASCII. If Unicode is involved, things will be a bloody mess.
\w, \s, \d or specify yourself in character class.
[] and already mentioned escape sequences) are unsupported in ECMA regex.
\b and \B are equivalent in both languages, with regard to how they are defined based on \w.
() and back references are the same. $n, which is used in the replacement string to refer back to matched text, is the same. The rest in the section are Perl exclusive features.
s flag is one that can always be converted to an equivalent expression in ECMA regex).
(?:pattern) (non-capturing group), (?=pattern) (positive lookahead), (?!pattern) (negative lookahead) are common between Perl and ECMA.
(?#text) can be ignored.
Conclusion:
If the regex utilizes the full power of Perl regex, or works at the level that the Boost library supports (e.g. recursive regex), it is not possible to convert it to ECMA regex. Fortunately, ECMA regex covers the most commonly used features, so it's likely that the regex is convertible.
Reference:
ECMA RegExp Reference on MDN