Both languages claim to use Perl style regular expressions. If I have one language test a regular expression for validity, will it work in the other? Where do the regular ex
C# regex has its own convention for named groups: (?<name>). I don't know of any other differences.
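For comparison, here is a minimal Java sketch, assuming Java 7 or later, where the same (?<name>...) syntax is accepted. The class name, method name, and sample date are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Returns the text captured by the "year" group, or null when nothing matches.
    static String year(String input) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(input);
        return m.find() ? m.group("year") : null;
    }

    public static void main(String[] args) {
        System.out.println(year("2012-05-17"));
    }
}
```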
From my experience:
Java 7 regular expressions as compared to .NET 2.0 regular expressions:
The underscore symbol is not supported in group names.
Groups with the same name (in the same regular expression) are not supported (although they would be really useful in expressions using "or"!).
Groups that captured nothing have a value of null, not an empty string.
The group with index 0 also contains the whole match (same as in .NET), BUT it is not included in groupCount().
A group back reference in a replacement expression is also denoted with a dollar sign (e.g. $1), but if the same expression also contains a dollar sign as the end-of-line marker, the back reference dollar should be escaped (\$); otherwise Java reports an "illegal group reference" error.
The end-of-line symbol ($) behaves greedily. Consider, for example, the following expression (Java-string is given): "bla(bla(?:$|\r\n))+)?$". Here the last line of text will NOT be captured! To capture it, we must replace "$" with "\z".
There is no "Explicit Capture" mode.
Empty string doesn't satisfy the ^.{0}$ pattern.
The "-" symbol must be escaped when used inside square brackets. That is, the pattern "[a-z+-]+" doesn't match the string "f+g-h" in Java, but it does in .NET. To match in Java, the pattern should look like this (Java-string is given): "[a-z+\-]+".
NOTE: "(Java-string is given)" - just to explain double escapes in the expression.
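A few of the points above can be checked directly. The following is only a sketch, assuming Java 7 or later; the class and method names are hypothetical. It covers underscores in group names, the null value of a group that did not participate, groupCount() excluding group 0, and one way to insert a literal dollar sign in a replacement string (Matcher.quoteReplacement).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class JavaGroupBehavior {
    // True if java.util.regex.Pattern refuses to compile the regex.
    static boolean rejected(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    // Group 1 of (a)|(b) matched against "b": the group never participates.
    static String unmatchedGroup() {
        Matcher m = Pattern.compile("(a)|(b)").matcher("b");
        m.matches();
        return m.group(1);
    }

    // Number of capturing groups in (a)(b); group 0 is not counted.
    static int groupCount() {
        return Pattern.compile("(a)(b)").matcher("ab").groupCount();
    }

    // quoteReplacement escapes "$" so it is not read as a group reference.
    static String literalDollar() {
        return "cost: X".replaceAll("X", Matcher.quoteReplacement("$5"));
    }

    public static void main(String[] args) {
        System.out.println(rejected("(?<my_name>a)")); // true
        System.out.println(unmatchedGroup());          // null
        System.out.println(groupCount());              // 2
        System.out.println(literalDollar());           // cost: $5
    }
}
```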
Check out: http://www.regular-expressions.info/refflavors.html There is plenty of regex info on that site, including a nice chart that details the differences between Java and .NET.
There are quite a lot of differences.
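Character class subtraction: .NET has [abc-[cde]]; Java has no subtraction syntax (emulate it via intersection: [abc&&[^cde]]).
Character class intersection: Java has [abc&&[cde]]; .NET has no intersection syntax (emulate it via subtraction: [abc-[^cde]]).
\p{Alpha} POSIX character class: Java only.
Under (?x) mode (COMMENTS in Java, IgnorePatternWhitespace in .NET), a space (U+0020) in a character class is significant in .NET but ignored in Java.
Unicode general category L: .NET accepts the \p{L} form only; Java accepts \pL, \p{L}, \p{IsL}, \p{general_category=L}, \p{gc=L}.
Unicode general category Lu: .NET accepts the \p{Lu} form only; Java accepts \p{Lu}, \p{IsLu}, \p{general_category=Lu}, \p{gc=Lu}.
Unicode block: .NET accepts \p{IsBasicLatin} only (see Supported Named Blocks); Java accepts \p{InBasicLatin}, \p{block=BasicLatin}, \p{blk=BasicLatin}.
Spaces and underscores in block names: Java allows them (e.g. BasicLatin can be written as Basic_Latin or Basic Latin).
?+, *+, ++ and {m,n}+ (possessive quantifiers): Java only.
\Q...\E escapes a string of metacharacters: Java only.
\Q...\E escapes a string of character class metacharacters (in character sets): Java only.
Conditionals, (?(?=regex)then|else), (?(regex)then|else), (?(1)then|else) or (?(group)then|else): .NET only.
Named capturing group: .NET has (?<name>regex) or (?'name'regex); Java 7 has (?<name>regex) only.
Named backreference: .NET has \k<name> or \k'name'; Java 7 has \k<name> only.
Balancing group, (?<name1-name2>regex) or (?'name1-name2'subexpression): .NET only.
(?<=text) (positive lookbehind): .NET accepts any pattern; Java requires a pattern of bounded length.
(?<!text) (negative lookbehind): .NET accepts any pattern; Java requires a pattern of bounded length.
(?n) (explicit capture mode): .NET only.
(?#comment) inline comments: .NET only.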
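As a rough illustration of the Java-side entries in the comparison above, the sketch below exercises class intersection, a possessive quantifier, and \Q...\E quoting. The class and method names are hypothetical, and the sample patterns are chosen only for the demonstration.

```java
import java.util.regex.Pattern;

class JavaOnlySyntax {
    // Intersection with a negated class emulates subtraction: matches a or b, not c.
    static boolean subtraction(String s) {
        return Pattern.matches("[abc&&[^cde]]", s);
    }

    // Possessive a*+ keeps every 'a' it consumed, so the trailing 'a' can never match.
    static boolean possessive(String s) {
        return Pattern.matches("a*+a", s);
    }

    // Between \Q and \E the dot is a literal character, not a wildcard.
    static boolean quoted(String s) {
        return Pattern.matches("\\Qa.b\\E", s);
    }

    public static void main(String[] args) {
        System.out.println(subtraction("a") + " " + subtraction("c"));
        System.out.println(possessive("aaa"));
        System.out.println(quoted("a.b") + " " + quoted("axb"));
    }
}
```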
.NET Regex supports counting, so you can match nested parentheses, which is something you normally cannot do with a regular expression. According to Mastering Regular Expressions, that's one of the few implementations to do so, so that could be a difference.
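One way to see whether a pattern written for .NET is even valid in Java is to try compiling it. The sketch below is illustrative only: the helper name compilesInJava and the sample patterns are assumptions, and a pattern that compiles in both flavors may still behave differently in each.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DotNetSyntaxCheck {
    // True if java.util.regex.Pattern accepts the regex.
    static boolean compilesInJava(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "(?<a>x)(?<b-a>y)", // balancing-group syntax
            "(?(1)a|b)",        // conditional
            "(?'name'a)",       // quoted group name
            "(?#note)a",        // inline comment
            "(?<name>a)"        // angle-bracket named group
        };
        for (String regex : samples) {
            System.out.println(regex + " -> " + compilesInJava(regex));
        }
    }
}
```

On Java 7 or later, only the last sample should compile.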
Java uses standard Perl-type regex as well as POSIX regex. Looking at the C# documentation on regexes, it looks like Java has all of the C# regex syntax, but not the other way around.
Compare them yourself: Java: C#:
EDIT: When this was written, no other regex flavor supported Microsoft's version of named capture; Java 7 has since added the (?<name>regex) form.