Are Java and C# regular expressions compatible?

醉酒成梦 2020-11-29 21:55

Both languages claim to use Perl-style regular expressions. If I have one language test a regular expression for validity, will it work in the other? Where do the regular ex…

6 answers
  • 2020-11-29 22:00

    C# regex has its own convention for named groups: (?<name>...). I don't know of any other differences.
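    For comparison, here is a minimal sketch that assumes Java 7 or later. It suggests that java.util.regex accepts the same (?<name>...) group syntax and the \k<name> backreference. The class and method names are hypothetical, chosen for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupDemo {
    // Extracts the year through a named group; (?<name>...) is accepted since Java 7.
    static String year(String date) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(date);
        return m.find() ? m.group("year") : null;
    }

    // Detects a doubled word with the named backreference \k<word>.
    static boolean hasDoubledWord(String text) {
        return Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(year("2020-11"));               // 2020
        System.out.println(hasDoubledWord("the the cat")); // true
    }
}
```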

  • 2020-11-29 22:02

    From my experience:

    Java 7 regular expressions as compared to .NET 2.0 regular expressions:

    • The underscore is not allowed in group names

    • Multiple groups with the same name (in the same regular expression) are not supported (although that can be really useful in expressions using "or"!)

    • Groups that captured nothing have a value of null, not an empty string

    • Group 0 contains the whole match (same as in .NET), BUT it is not counted by groupCount()

    • Backreferences in replacement expressions are also written with a dollar sign (e.g. $1), but any dollar sign in the replacement that is not a backreference must be escaped (\$); otherwise Java reports an "illegal group reference" error

    • The end-of-line anchor ($) behaves greedily. Consider, for example, the following expression (Java-string is given): "bla(bla(?:$|\r\n))+)?$". Here the last line of text will NOT be captured! To capture it, we must replace "$" with "\z".

    • There is no "Explicit Capture" mode.

    • An empty string doesn't satisfy the ^.{0}$ pattern.

    • The "-" symbol must be escaped when used inside square brackets. That is, the pattern "[a-z+-]+" doesn't match the string "f+g-h" in Java, but it does in .NET. To match in Java, the pattern should look like this (Java-string is given): "[a-z+\\-]+".

    NOTE: "(Java-string is given)" is there only to explain the double escapes in the expression.
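    A small sketch can check several of the points above. It assumes Java 7+ and java.util.regex; JavaGroupQuirks and its methods are hypothetical names for illustration. Matcher.quoteReplacement is used to escape the literal dollar sign:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class JavaGroupQuirks {
    // Returns true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Value of group 1 after "(a)|(b)" matches "b": group 1 did not take part.
    static String unusedGroupValue() {
        Matcher m = Pattern.compile("(a)|(b)").matcher("b");
        m.matches();
        return m.group(1); // null in Java; .NET's Group.Value would be ""
    }

    // groupCount() reports the explicit groups only; group 0 is not counted.
    static int explicitGroups() {
        return Pattern.compile("(a)|(b)").matcher("").groupCount();
    }

    // A literal '$' in the replacement must be escaped; quoteReplacement does that.
    static String addDollarSign(String text) {
        return text.replaceAll("(\\d+)", Matcher.quoteReplacement("$") + "$1");
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?<firstName>x)"));  // true
        System.out.println(compiles("(?<first_name>x)")); // false: '_' not allowed
        System.out.println(compiles("(?<n>a)|(?<n>b)"));  // false: duplicate name
        System.out.println(unusedGroupValue());           // null
        System.out.println(explicitGroups());             // 2
        System.out.println(addDollarSign("cost: 5"));     // cost: $5
    }
}
```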

  • 2020-11-29 22:03

    Check out http://www.regular-expressions.info/refflavors.html. There is plenty of regex info on that site, and there's a nice chart that details the differences between Java and .NET.

  • 2020-11-29 22:06

    There are quite a lot of differences.

    Character Class

    1. Character classes subtraction [abc-[cde]]
      • .NET YES (2.0)
      • Java: Emulated via character class intersection and negation: [abc&&[^cde]]
    2. Character classes intersection [abc&&[cde]]
      • .NET: Emulated via character class subtraction and negation: [abc-[^cde]]
      • Java YES
    3. \p{Alpha} POSIX character class
      • .NET NO
      • Java YES (US-ASCII)
    4. In (?x) mode (COMMENTS / IgnorePatternWhitespace), a space (U+0020) inside a character class is significant.
      • .NET YES
      • Java NO
    5. Unicode Category (L, M, N, P, S, Z, C)
      • .NET YES: \p{L} form only
      • Java YES:
        • From Java 5: \pL, \p{L}, \p{IsL}
        • From Java 7: \p{general_category=L}, \p{gc=L}
    6. Unicode Category (Lu, Ll, Lt, ...)
      • .NET YES: \p{Lu} form only
      • Java YES:
        • From Java 5: \p{Lu}, \p{IsLu}
        • From Java 7: \p{general_category=Lu}, \p{gc=Lu}
    7. Unicode Block
      • .NET YES: \p{IsBasicLatin} only. (Supported Named Blocks)
      • Java YES: (block names are case-insensitive)
        • From Java 5: \p{InBasicLatin}
        • From Java 7: \p{block=BasicLatin}, \p{blk=BasicLatin}
    8. Spaces and underscores are allowed in all long block names (e.g. BasicLatin can be written as Basic_Latin or Basic Latin)
      • .NET NO
      • Java YES (Java 5)
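    The sketch below exercises some of the Java-side character class syntax listed above. It assumes Java 7+, and CharClassDemo is a hypothetical name. The subtraction is emulated through intersection with a negated class, as described in item 1:

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    // Java has no [abc-[cde]]; intersecting with a negated class gives the same set {a, b}.
    static boolean inAbcMinusCde(String s) {
        return Pattern.matches("[abc&&[^cde]]+", s);
    }

    // POSIX-style class, US-ASCII letters only by default.
    static boolean asciiAlpha(String s) {
        return Pattern.matches("\\p{Alpha}+", s);
    }

    // Unicode general category; \p{L} is the spelling both flavors accept.
    static boolean allLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    // Java block syntax; .NET would spell this \p{IsBasicLatin}.
    static boolean basicLatin(String s) {
        return Pattern.matches("\\p{InBasicLatin}+", s);
    }

    public static void main(String[] args) {
        System.out.println(inAbcMinusCde("abba"));  // true
        System.out.println(inAbcMinusCde("abc"));   // false: 'c' is excluded
        System.out.println(asciiAlpha("\u00e9"));   // false: e-acute is not ASCII
        System.out.println(allLetters("\u00e9"));   // true
        System.out.println(basicLatin("abc"));      // true
    }
}
```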

    Quantifier

    1. ?+, *+, ++ and {m,n}+ (possessive quantifiers)
      • .NET NO
      • Java YES
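    Here is a minimal sketch of the possessive quantifier difference; PossessiveDemo is a hypothetical name. The atomic group (?>...) version is included on the assumption that it is the more portable spelling, since both flavors accept atomic groups:

```java
final class PossessiveDemo {
    // Greedy: a+ can give back one 'a' so the trailing 'a' still matches.
    static boolean greedy(String s) {
        return s.matches("a+a");
    }

    // Possessive (Java only): a++ keeps every 'a' and never backtracks.
    static boolean possessive(String s) {
        return s.matches("a++a");
    }

    // Atomic group: same no-backtracking behavior, in a form .NET also accepts.
    static boolean atomic(String s) {
        return s.matches("(?>a+)a");
    }

    public static void main(String[] args) {
        System.out.println(greedy("aaa"));     // true
        System.out.println(possessive("aaa")); // false
        System.out.println(atomic("aaa"));     // false
    }
}
```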

    Quotation

    1. \Q...\E escapes a string of metacharacters
      • .NET NO
      • Java YES
    2. \Q...\E escapes a string of character class metacharacters (in character sets)
      • .NET NO
      • Java YES
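    A small sketch of \Q...\E and of Pattern.quote, which builds the quoted form for you; QuoteDemo is a hypothetical name. On the .NET side, Regex.Escape serves a similar purpose by backslash-escaping metacharacters instead:

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // '+' is a quantifier here, so this pattern matches "11=2", not "1+1=2".
    static boolean unquoted(String s) {
        return s.matches("1+1=2");
    }

    // \Q...\E turns everything in between into literal text.
    static boolean quoted(String s) {
        return s.matches("\\Q1+1=2\\E");
    }

    // Pattern.quote wraps arbitrary text so it is matched literally.
    static boolean equalsLiteral(String s, String literal) {
        return s.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println(unquoted("1+1=2"));               // false
        System.out.println(quoted("1+1=2"));                 // true
        System.out.println(equalsLiteral("1+1=2", "1+1=2")); // true
    }
}
```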

    Matching construct

    1. Conditional matching (?(?=regex)then|else), (?(regex)then|else), (?(1)then|else) or (?(group)then|else)
      • .NET YES
      • Java NO
    2. Named capturing group and named backreference
      • .NET YES:
        • Capturing group: (?<name>regex) or (?'name'regex)
        • Backreference: \k<name> or \k'name'
      • Java YES (Java 7):
        • Capturing group: (?<name>regex)
        • Backreference: \k<name>
    3. Multiple capturing groups can have the same name
      • .NET YES
      • Java NO (Java 7)
    4. Balancing group definition (?<name1-name2>regex) or (?'name1-name2'regex)
      • .NET YES
      • Java NO

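    One way to see which of these constructs Java rejects is simply to try compiling them. The sketch below assumes Java 7+ and that each unsupported construct raises PatternSyntaxException; the class and helper names are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class DotNetOnlyConstructs {
    // Returns true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] dotNetOnly = {
            "(?(1)then|else)",  // conditional matching
            "(?'name'x)",       // apostrophe-delimited named group
            "(?<n>a)|(?<n>b)",  // two groups with the same name
            "(?<open-close>x)", // balancing group definition
        };
        for (String regex : dotNetOnly) {
            System.out.println(regex + " -> " + compiles(regex)); // false for each
        }
        // The angle-bracket form with a named backreference is accepted.
        System.out.println(compiles("(?<name>x)\\k<name>"));   // true
    }
}
```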
    Assertions

    1. (?<=text) (positive lookbehind)
      • .NET Variable-width
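      • Java Bounded width (the lookbehind must have an obvious maximum length)
    2. (?<!text) (negative lookbehind)
      • .NET Variable-width
      • Java Bounded width (the lookbehind must have an obvious maximum length)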
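    Java needs a lookbehind with a bounded maximum length, so a common workaround for a .NET pattern like (?<=ab*) is an explicit upper bound. In the sketch below, the bound of 10 is an arbitrary assumption and must cover the longest prefix you expect. LookbehindDemo is a hypothetical name:

```java
import java.util.regex.Pattern;

final class LookbehindDemo {
    // .NET would accept (?<=ab*)c; here b* becomes the bounded b{0,10}
    // so the lookbehind has an obvious maximum length (10 is arbitrary).
    private static final Pattern C_AFTER_AB = Pattern.compile("(?<=ab{0,10})c");

    // True if some 'c' is directly preceded by 'a' plus up to 10 'b's.
    static boolean hasCAfterAb(String s) {
        return C_AFTER_AB.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasCAfterAb("abbbc")); // true
        System.out.println(hasCAfterAb("xbbbc")); // false
    }
}
```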

    Mode Options/Flags

    1. ExplicitCapture option (?n)
      • .NET YES
      • Java NO
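    Java has no ExplicitCapture option, so the equivalent is to write every non-capturing group as (?:...) by hand. The sketch assumes Java rejects the (?n) inline flag with PatternSyntaxException. ExplicitCaptureDemo is a hypothetical name:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class ExplicitCaptureDemo {
    // Returns true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Number of capturing groups the pattern defines (group 0 excluded).
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?n)(ab)+(c)")); // false: no (?n) flag
        System.out.println(groupCount("(ab)+(c)"));   // 2
        System.out.println(groupCount("(?:ab)+(c)")); // 1: only (c) captures
    }
}
```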

    Miscellaneous

    1. (?#comment) inline comments
      • .NET YES
      • Java NO
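    Java has no (?#...) comment group. However, its COMMENTS flag (the (?x) mode from the character class list) ignores whitespace and treats # as the start of a comment, and .NET's IgnorePatternWhitespace does the same. A sketch with a hypothetical year-month pattern:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class CommentDemo {
    // Returns true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // In COMMENTS mode, whitespace is ignored and '#' starts a comment up to the end of the line.
    private static final Pattern YEAR_MONTH = Pattern.compile(
            "(\\d{4})   # year\n"
          + "-(\\d{2})  # month\n",
            Pattern.COMMENTS);

    static boolean isYearMonth(String s) {
        return YEAR_MONTH.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isYearMonth("2020-11"));  // true
        System.out.println(compiles("a(?#note)b")); // false: no inline comment group
    }
}
```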

    References

    • regular-expressions.info - Comparison of Different Regex Flavors
    • MSDN Library Reference - .NET Framework 4.5 - Regular Expression Language
    • Pattern (Java Platform SE 7)
  • 2020-11-29 22:12

    .NET regex supports counting (through balancing groups), so you can match nested parentheses, which is something you normally cannot do with a regular expression. According to Mastering Regular Expressions, it is one of the few implementations that can do that, so that could be a difference.
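    Java has no balancing groups, so a common fallback is to check nesting in plain code rather than in a regex. This is a different technique from the .NET construct. A minimal sketch; Nesting.balanced is a hypothetical helper that only looks at round parentheses:

```java
final class Nesting {
    // Tracks nesting depth with a counter instead of a regex.
    static boolean balanced(String s) {
        int depth = 0;
        for (char ch : s.toCharArray()) {
            if (ch == '(') {
                depth++;
            } else if (ch == ')' && --depth < 0) {
                return false; // a ')' with no matching '('
            }
        }
        return depth == 0; // every '(' was closed
    }

    public static void main(String[] args) {
        System.out.println(balanced("(a(b)c)")); // true
        System.out.println(balanced("(()"));     // false
        System.out.println(balanced(")("));      // false
    }
}
```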

  • 2020-11-29 22:25

    Java uses standard Perl-style regex as well as POSIX regex. Looking at the C# documentation on regexes, it looks like Java has all of C#'s regex syntax, but not the other way around.

    Compare them yourself: Java: C#:

    EDIT: Currently, no other regex flavor supports Microsoft's version of named capture.
