I have some useful regular expressions in Perl. Is there a simple way to translate them to .NET's dialect of regular expressions?
If not, is there a concise reference of the differences?
There is a big comparison table in http://www.regular-expressions.info/refflavors.html.
Most of the basic elements are the same. The differences are:

Minor differences:

- Unicode code points: in .NET a code point is written `\u200A`, in Perl it is `\x{200A}`.
- `\v` in .NET is just the vertical tab (U+000B). In Perl it stands for the "vertical whitespace" class, which is why Perl also has `\V`.
- A conditional on a named group is written `(?(name)yes|no)` in .NET, but `(?(<name>)yes|no)` in Perl.

Some elements are Perl-only:
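- Possessive quantifiers (`x?+`, `x*+`, `x++`, etc.). Use a non-backtracking subexpression (`(?>…)`) instead.
- Named Unicode escape sequences: `\N{LATIN SMALL LETTER X}`, `\N{U+200A}`.
- Case modification and quoting:
  - `\l` (lower-case the next char), `\u` (upper-case the next char).
  - `\L` (lower case), `\U` (upper case), `\Q` (quote metacharacters), each lasting until `\E`.
- Shorthand notation for Unicode properties, `\pL` and `\PL`. You have to include the braces in .NET, e.g. `\p{L}`.
- Odd things like `\X` (extended grapheme cluster) and `\C` (single byte).
- Special character classes like `\v`, `\V`, `\h`, `\H`, `\N`, `\R`.
- Backreferences written with `\g`, such as `\g1` or the relative `\g{-1}`. .NET only supports absolute group numbers.
- Named backreference `\g{name}`. Use `\k<name>` instead.
- POSIX character classes such as `[[:alpha:]]`.
- Branch reset `(?|…)`.
- `\K`. Use a look-behind (`(?<=…)`) instead.
- Code evaluation assertion `(?{…})` and postponed subexpression `(??{…})`.
- Recursive subpattern references: `(?0)`, `(?R)`, `(?1)`, `(?-1)`, `(?+1)`, `(?&name)`.
- Some predicates of conditional expressions:
  - code: `(?{…})`
  - recursion: `(R)`, `(R1)`, `(R&name)`
  - `(DEFINE)`
- Backtracking control verbs `(*VERB:ARG)`.
- Python-style syntax:
  - `(?P<name>…)`. Use `(?<name>…)` instead.
  - `(?P=name)`. Use `\k<name>` instead.
  - `(?P>name)`. No equivalent in .NET.

Some elements are .NET only: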
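- Variable-length look-behind. Perl restricts the length of a look-behind, so for a positive look-behind use `\K` instead.
- An arbitrary regular expression as the condition of a conditional expression, `(?(pattern)yes|no)`.
- Character class subtraction, `[a-z-[d-w]]`.
- Balancing groups, `(?<-name>…)`. In Perl this could be simulated with a code evaluation assertion `(?{…})` followed by a `(?&name)`.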
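The following C# sketch makes the .NET-only items concrete by exercising three of them. It uses a balancing group to check for balanced parentheses, class subtraction to accept consonants only, and a variable-length look-behind. The class and method names are made up for illustration. The patterns use only documented `System.Text.RegularExpressions` syntax.

```csharp
using System.Text.RegularExpressions;

// Illustrative demos of .NET-only regex features (names are hypothetical).
public static class DotNetOnlyDemo
{
    // Balancing group: each "(" pushes a capture onto "open", each ")" pops one
    // (and fails if the stack is empty); the final conditional fails if any
    // "(" is still unmatched.
    public static bool IsBalanced(string s) =>
        Regex.IsMatch(s, @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

    // Character class subtraction: lower-case ASCII letters minus the vowels.
    public static bool IsConsonantsOnly(string s) =>
        Regex.IsMatch(s, @"^[a-z-[aeiou]]+$");

    // Variable-length look-behind: digits preceded by "$" and any amount of
    // whitespace. Returns "" when there is no match.
    public static string AmountAfterDollar(string s) =>
        Regex.Match(s, @"(?<=\$\s*)\d+").Value;
}
```

Perl rejects the look-behind above because `\s*` has no fixed length. There, `\$\s*\K\d+` would be the usual rewrite.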
.NET regexes were designed to be compatible with Perl 5 regexes. As such, most Perl 5 regexes should just work in .NET, as long as they avoid Perl-only constructs.
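When a pattern does use one of the Perl-only constructs listed earlier, part of the rewrite is mechanical. Below is a minimal C# sketch of a hypothetical helper; `PerlToDotNet.Translate` is not a .NET API. It handles only a few token-level cases:

- `\x{HHHH}` with up to four hex digits, zero-padded because .NET's `\u` takes exactly four.
- `\pL` and `\PL`.
- `\g1` and `\g{1}`.
- `\g{name}`.
- `(?P<name>…)` and `(?P=name)`.

It consumes escapes pairwise so an escaped backslash is not misread, but it does not parse character classes. Everything else (possessive quantifiers, `\Q…\E`, recursion, and so on) is left for manual translation.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not a .NET API): rewrites a few Perl-only regex tokens
// into their .NET equivalents. It does not parse character classes, and any
// construct it does not recognise is copied through unchanged.
public static class PerlToDotNet
{
    // At each position the alternatives are tried left to right. The last
    // branch consumes any other escape pair (\d, \\, \( ...), so an escaped
    // backslash is never mistaken for the start of \x{...}, \p or \g.
    private static readonly Regex Token = new Regex(
        @"\\x\{(?<hex>[0-9A-Fa-f]{1,4})\}" +          // \x{200A}   -> \u200A
        @"|\\(?<p>[pP])(?<prop>[A-Za-z])" +            // \pL, \PL   -> \p{L}, \P{L}
        @"|\\g(?:(?<num>\d+)|\{(?<num>\d+)\})" +      // \g1, \g{1} -> \k<1>
        @"|\\g\{(?<gname>[A-Za-z_]\w*)\}" +           // \g{name}   -> \k<name>
        @"|\(\?P<" +                                  // (?P<name>  -> (?<name>
        @"|\(\?P=(?<pname>\w+)\)" +                   // (?P=name)  -> \k<name>
        @"|\\.",                                      // other escapes: unchanged
        RegexOptions.Singleline);

    public static string Translate(string perlPattern) =>
        Token.Replace(perlPattern, m =>
        {
            if (m.Groups["hex"].Success)   // .NET's \u takes exactly four hex digits
                return @"\u" + m.Groups["hex"].Value.PadLeft(4, '0');
            if (m.Groups["prop"].Success)
                return "\\" + m.Groups["p"].Value + "{" + m.Groups["prop"].Value + "}";
            if (m.Groups["num"].Success)   // \k<1> cannot run into a following digit like \1 can
                return @"\k<" + m.Groups["num"].Value + ">";
            if (m.Groups["gname"].Success)
                return @"\k<" + m.Groups["gname"].Value + ">";
            if (m.Groups["pname"].Success)
                return @"\k<" + m.Groups["pname"].Value + ">";
            if (m.Value == "(?P<")
                return "(?<";
            return m.Value;                // any other escape pair, copied as-is
        });
}
```

Passing the result to `new Regex(...)` is a quick sanity check. Leftover Perl-only syntax such as `\K` or `x++` usually makes the constructor throw an `ArgumentException`, though not always: `\v`, for instance, is valid in both dialects but means different things.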
You can translate some `RegexOptions` flags to Perl modifiers as follows:

```csharp
// Flag values of System.Text.RegularExpressions.RegexOptions
[Flags]
public enum RegexOptions
{
    Compiled = 8,
    CultureInvariant = 0x200,
    ECMAScript = 0x100,
    ExplicitCapture = 4,
    IgnoreCase = 1,                  // i in Perl
    IgnorePatternWhitespace = 0x20,  // x in Perl
    Multiline = 2,                   // m in Perl
    None = 0,
    RightToLeft = 0x40,
    Singleline = 0x10                // s in Perl
}
```
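The mapping can also run in the Perl-to-.NET direction, turning a modifier string such as the `ix` in `/…/ix` into these flags. Here is a minimal sketch that assumes only the four correspondences in the comments (i, m, s, x). `PerlModifiers.ToRegexOptions` is a hypothetical helper, and it rejects any other letter rather than guessing.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: maps Perl pattern modifiers to .NET RegexOptions,
// using only the i/m/s/x correspondences listed in the enum comments.
public static class PerlModifiers
{
    public static RegexOptions ToRegexOptions(string modifiers)
    {
        RegexOptions options = RegexOptions.None;
        foreach (char c in modifiers)
        {
            options |= c switch
            {
                'i' => RegexOptions.IgnoreCase,
                'm' => RegexOptions.Multiline,
                's' => RegexOptions.Singleline,
                'x' => RegexOptions.IgnorePatternWhitespace,
                // Anything else is rejected. For example, /g is a matching mode
                // rather than a pattern option; in .NET use Regex.Matches instead.
                _ => throw new ArgumentException(
                    $"No RegexOptions mapping for Perl modifier '{c}'.", nameof(modifiers)),
            };
        }
        return options;
    }
}
```

For example, `Regex.IsMatch("HELLO", "h e l l o", PerlModifiers.ToRegexOptions("ix"))` returns `true`. The whitespace in the pattern is ignored and the comparison is case-insensitive.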
Another tip is to use verbatim strings so that you don't need to double every backslash in C#:

```csharp
// Both strings hold the same regex pattern.
string badOnTheEyesRx = "\\d{4}/\\d{2}/\\d{2}";
string easierOnTheEyesRx = @"\d{4}/\d{2}/\d{2}";
```
It really depends on the complexity of the regular expression. Many will work the same out of the box.
Take a look at this .NET regex cheat sheet to see if an operator does what you expect it to do.
I don't know of any tool that automatically translates between RegEx dialects.