Translate Perl regular expressions to .NET

陌清茗 2020-11-29 02:47

I have some useful regular expressions in Perl. Is there a simple way to translate them to .NET's dialect of regular expressions?

If not, is there a concise referen

3 Answers
  • 2020-11-29 03:25

    There is a big comparison table in http://www.regular-expressions.info/refflavors.html.


    Most of the basic elements are the same; the differences are:

    Minor differences:

    • Unicode escape sequences. In .NET it is \u200A, in Perl it is \x{200A}.
    • \v in .NET is just the vertical tab (U+000B); in Perl it stands for the "vertical whitespace" class, so Perl also has \V as its negation.
    • The conditional expression for named reference in .NET is (?(name)yes|no), but (?(<name>)yes|no) in Perl.
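
    The first two differences above can be sketched in C# as follows. This is a minimal illustration, not a complete translation rule: the class and method names are hypothetical, and the hand-written character class is only an approximation of Perl's \v.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    // Hypothetical demo class; the names are illustrative only.
    public static class MinorDifferencesDemo
    {
        // Perl: /\x{200A}/  ->  .NET: \u200A (exactly four hex digits).
        public static bool ContainsHairSpace(string input) =>
            Regex.IsMatch(input, @"\u200A");

        // In .NET, \v is only the vertical tab (U+000B), so it does not match a line feed.
        public static bool VMatchesLineFeed() =>
            Regex.IsMatch("\n", @"\v");

        // Assumed stand-in for Perl's \v class: the vertical whitespace characters spelled out by hand.
        public static bool IsVerticalWhitespace(string input) =>
            Regex.IsMatch(input, @"^[\n\v\f\r\u0085\u2028\u2029]$");
    }
    ```

    A Perl pattern that relies on \v therefore needs an explicit class like the one in IsVerticalWhitespace when it is moved to .NET.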

    Some elements are Perl-only:

    • Possessive quantifiers (x?+, x*+, x++ etc). Use an atomic (non-backtracking) group (?>…) instead.
    • Named unicode escape sequence \N{LATIN SMALL LETTER X}, \N{U+200A}.
    • Case folding and escaping
      • \l (lower case next char), \u (upper case next char).
      • \L (lower case), \U (upper case), \Q (quote meta characters) until \E.
    • Shorthand notation for Unicode property \pL and \PL. You have to include the braces in .NET e.g. \p{L}.
    • Odd things like \X, \C.
    • Special character classes like \v, \V, \h, \H, \N, \R
    • Backreference to a specific or previous group \g1, \g{-1}. In .NET you can only use an absolute group index (e.g. \1).
    • Named backreference \g{name}. Use \k<name> instead.
    • POSIX character class [[:alpha:]].
    • Branch-reset pattern (?|…)
    • \K. Use look-behind ((?<=…)) instead.
    • Code evaluation assertion (?{…}), postponed subexpression (??{…}).
    • Subexpression reference (recursive pattern) (?0), (?R), (?1), (?-1), (?+1), (?&name).
    • Some conditional-expression predicates are Perl-specific:
      • code (?{…})
      • recursive (R), (R1), (R&name)
      • define (DEFINE).
    • Special Backtracking Control Verbs (*VERB:ARG)
    • Python syntax
      • (?P<name>…). Use (?<name>…) instead.
      • (?P=name). Use \k<name> instead.
      • (?P>name). No equivalent in .NET.
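
    Three of the substitutions suggested in the list above (atomic group for a possessive quantifier, \k<name> for \g{name}, look-behind for \K) might look like this in C#. This is a sketch only: the class, the method names, and the sample patterns are hypothetical.

    ```csharp
    using System.Text.RegularExpressions;

    // Hypothetical demo class; the names and patterns are illustrative only.
    public static class PerlOnlyRewrites
    {
        // Perl: /^\d++5$/ (possessive)  ->  .NET: atomic group (?>\d+).
        // The atomic group keeps every digit it consumed, so the trailing 5 can never match.
        public static bool PossessiveStyleMatch(string input) =>
            Regex.IsMatch(input, @"^(?>\d+)5$");

        // Perl: /(?P<ch>\w)\g{ch}/  ->  .NET: (?<ch>\w)\k<ch>
        public static bool HasDoubledChar(string input) =>
            Regex.IsMatch(input, @"(?<ch>\w)\k<ch>");

        // Perl: s/foo\Kbar/X/g  ->  .NET: look-behind (?<=foo)bar
        public static string ReplaceBarAfterFoo(string input) =>
            Regex.Replace(input, @"(?<=foo)bar", "X");
    }
    ```

    For example, PossessiveStyleMatch("12345") is false, whereas the backtracking pattern ^\d+5$ would match the same input.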

    Some elements are .NET only:

    • Variable length look-behind. In Perl, for positive look-behind, use \K instead.
    • Arbitrary regular expression in conditional expression (?(pattern)yes|no).
    • Character class subtraction [a-z-[d-w]]
    • Balancing groups (?<-name>…). In Perl this could be simulated with a code evaluation assertion (?{…}) followed by a (?&name).
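
    A short C# sketch of three of the .NET-only features listed above. The class name, method names, and sample patterns are hypothetical, and the balanced-parentheses pattern is one common way to use balancing groups rather than the only one.

    ```csharp
    using System.Text.RegularExpressions;

    // Hypothetical demo class; the names and patterns are illustrative only.
    public static class DotNetOnlyDemo
    {
        // Character class subtraction: a-z minus d-w leaves a-c and x-z.
        public static bool IsOutsideDToW(string s) =>
            Regex.IsMatch(s, @"^[a-z-[d-w]]+$");

        // Variable-length look-behind: \s* inside (?<=…) is accepted by .NET.
        public static string NumberAfterLabel(string s) =>
            Regex.Match(s, @"(?<=price:\s*)\d+").Value;

        // Balancing group: each "(" pushes onto the "open" stack, each ")" pops it,
        // and (?(open)(?!)) fails the match if anything is left on the stack.
        public static bool IsBalanced(string s) =>
            Regex.IsMatch(s, @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");
    }
    ```

    The last pattern also uses the (?(name)yes|no) conditional form, with an empty "no" branch.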

    References:

    • .NET Framework 4: Regular Expression Language Elements
    • perlre
  • 2020-11-29 03:36

    .NET regular expressions were designed to be compatible with Perl 5 regexes, so most Perl 5 patterns work in .NET unchanged.

    You can translate some RegexOptions as follows:

    [Flags]
    public enum RegexOptions
    {
      Compiled = 8,
      CultureInvariant = 0x200,
      ECMAScript = 0x100,
      ExplicitCapture = 4,
      IgnoreCase = 1,                 // i in Perl
      IgnorePatternWhitespace = 0x20, // x in Perl
      Multiline = 2,                  // m in Perl
      None = 0,
      RightToLeft = 0x40,
      Singleline = 0x10               // s in Perl
    }
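
    As a rough illustration of that mapping, Perl modifiers can be passed as RegexOptions flags when the Regex is constructed. The class name and the sample patterns below are hypothetical.

    ```csharp
    using System.Text.RegularExpressions;

    // Hypothetical demo class; the names and patterns are illustrative only.
    public static class PerlFlagsDemo
    {
        // Perl: /^hello \s+ world$/imx
        // With IgnorePatternWhitespace the literal spaces in the pattern are ignored.
        public static readonly Regex Greeting = new Regex(
            @"^hello \s+ world$",
            RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

        // Perl: /a.b/s  ->  Singleline lets "." match a newline as well.
        public static bool DotMatchesNewline(string input) =>
            Regex.IsMatch(input, "a.b", RegexOptions.Singleline);
    }
    ```

    The inline form (?imsx) at the start of a pattern is accepted by both Perl and .NET, so patterns that already carry their modifiers inline need no RegexOptions at all.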
    

    Another tip is to use verbatim strings (@"…") so that you don't need to double every backslash in C#:

    string badOnTheEyesRx    = "\\d{4}/\\d{2}/\\d{2}";
    string easierOnTheEyesRx = @"\d{4}/\d{2}/\d{2}";
    
  • 2020-11-29 03:48

    It really depends on the complexity of the regular expression; many will work the same out of the box.

    Take a look at this .NET regex cheat sheet to see if an operator does what you expect it to do.

    I don't know of any tool that automatically translates between RegEx dialects.
