Every single flavor of regex I have ever used has always had the "." character match everything but a new line (\r or \n)... unless, of course, you enable the single-lin
I think the point here is that the dot is supposed to match anything that's not a line separator, and \r is a line separator. Perl gets away with recognizing only \n because it is (as others have pointed out) rooted in the Unix world, and because it's the inspiration for the regex flavors found in most other languages.
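To make the "not a line separator" rule concrete, here is a small sketch of how Java's dot treats each single-character line terminator listed in the java.util.regex.Pattern documentation. The class and method names are made up for illustration; only Pattern, DOTALL, and UNIX_LINES are real API.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class DotVsLineTerminators {
    // Single-character line terminators documented for java.util.regex.Pattern.
    static final String[] TERMINATORS = {"\n", "\r", "\u0085", "\u2028", "\u2029"};

    // True if a lone "." matches the whole one-character input under the given flags.
    static boolean dotMatches(String ch, int flags) {
        return Pattern.compile(".", flags).matcher(ch).matches();
    }

    public static void main(String[] args) {
        for (String t : TERMINATORS) {
            System.out.printf("U+%04X default=%b dotall=%b%n",
                    (int) t.charAt(0),
                    dotMatches(t, 0),
                    dotMatches(t, Pattern.DOTALL));
        }
    }
}
```

By default the dot should reject every one of these characters, and accept them all once DOTALL (single-line mode) is switched on.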
(But I note that in Perl 6 regexes (or Rules, to use their formal name), /\n/ matches anything that's recognized by Unicode as a line separator, including both characters of a \r\n sequence.)
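Java has something roughly comparable: the \R linebreak matcher (Java 8 and later, as far as I know), which consumes \r\n as a single unit and also accepts the other Unicode line breaks. A minimal sketch, with a made-up class and helper name:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; splitLines is an illustrative helper, not a library method.
class LinebreakMatcherDemo {
    // Split on any linebreak sequence; \R treats \r\n as one separator.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\\R"));
    }

    public static void main(String[] args) {
        // Same mixed-separator sample string used elsewhere in this answer.
        System.out.println(splitLines("fee\nfie\r\nfoe\rfum"));
        // prints [fee, fie, foe, fum]
    }
}
```

Because \r\n is consumed as a single separator, no empty string appears between "fie" and "foe".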
.NET was born in the Unicode era; it should recognize all Unicode-endorsed line separators, including \r (older Mac style) and \r\n (which is used by some network protocols as well as Windows). Consider this example in Java:
    String s = "fee\nfie\r\nfoe\rfum";
    // (?m) turns on multiline mode, so ^ and $ match at every line boundary
    Pattern p = Pattern.compile("(?m)^.+$");
    Matcher m = p.matcher(s);
    while (m.find())
    {
        System.out.println(m.group().length());
    }
result:
3
3
3
3
The dot, ^, and $ all work correctly with all three line separators. Now try it in C#:
    string s = "fee\nfie\r\nfoe\rfum";
    Regex r = new Regex(@"(?m)^.+$");
    foreach (Match m in r.Matches(s))
    {
        Console.WriteLine(m.Value.Length);
    }
result:
3
4
7
Does that look right to anyone else? Here we have the regex flavor built into Microsoft's .NET framework, and it doesn't even handle the Windows-standard line separator correctly. And it completely disregards a lone \r, as it does the other Unicode line separators. .NET came out several years after Java, and its Unicode support is at least as good, so why did they choose to stick on this point?
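For comparison, Java can be pushed into the same \n-only behavior with its UNIX_LINES flag, written inline as (?d). The sketch below runs the same regex over the same string with and without that flag. The class and the lineLengths helper are names I made up for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; lineLengths is an illustrative helper.
class UnixLinesDemo {
    // Collect the length of every match of regex in text.
    static List<Integer> lineLengths(String regex, String text) {
        List<Integer> lengths = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            lengths.add(m.group().length());
        }
        return lengths;
    }

    public static void main(String[] args) {
        String s = "fee\nfie\r\nfoe\rfum";
        // Default Java behavior: all documented line terminators are honored.
        System.out.println(lineLengths("(?m)^.+$", s));   // prints [3, 3, 3, 3]
        // With (?d), only a line feed counts as a line terminator for ., ^ and $.
        System.out.println(lineLengths("(?dm)^.+$", s));  // prints [3, 4, 7]
    }
}
```

With (?d) in effect the stray \r characters get swallowed by the dot, which gives the same 3, 4, 7 that the C# version prints.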