When matching certain characters (such as line feed), you can use the regex "\\n" or indeed just "\n". For example, the following splits a string into an array of lines:
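As a hedged illustration of that kind of split, here is a minimal sketch that compares both spellings side by side. The class name, method names, and sample text are hypothetical and not taken from the original post.

```java
import java.util.Arrays;

public class SplitLines {
    // Regex escapes: the regex engine receives \r?\n and interprets the escapes itself.
    static String[] splitWithRegexEscapes(String text) {
        return text.split("\\r?\\n");
    }

    // String escapes: the compiler inserts real CR and LF characters into the pattern,
    // and the regex engine matches them literally.
    static String[] splitWithStringEscapes(String text) {
        return text.split("\r?\n");
    }

    public static void main(String[] args) {
        String text = "first\r\nsecond\nthird"; // sample input, assumed for the demo
        System.out.println(Arrays.toString(splitWithRegexEscapes(text)));  // [first, second, third]
        System.out.println(Arrays.toString(splitWithStringEscapes(text))); // [first, second, third]
    }
}
```

Both calls return the same three lines, which is the behavior the question is asking about.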
Yes, they are different. The Java compiler has special handling for Unicode escapes, as described in The Java Language Specification, section 3.3:
The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.
So how does this affect \n vs \\n? From the Javadoc for java.util.regex.Pattern:
It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler.
An example from the same doc:
The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
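The distinction in that quote can be checked with a short sketch. The class and method names below are hypothetical and exist only for illustration; the patterns themselves are the ones named in the Javadoc.

```java
import java.util.regex.Pattern;

public class BackspaceVsBoundary {
    // "\b" is a one-character Java string holding U+0008 (backspace).
    static boolean containsBackspace(String input) {
        return Pattern.compile("\b").matcher(input).find();
    }

    // "\\b" is the two-character string \b, which the regex engine reads as a word boundary.
    static boolean hasWordBoundary(String input) {
        return Pattern.compile("\\b").matcher(input).find();
    }

    // "\\(hello\\)" reaches the regex engine as \(hello\) and matches literal parentheses.
    static boolean isParenthesizedHello(String input) {
        return Pattern.matches("\\(hello\\)", input);
    }

    public static void main(String[] args) {
        System.out.println(containsBackspace("a\bb"));       // true
        System.out.println(containsBackspace("hello"));      // false
        System.out.println(hasWordBoundary("hello"));        // true
        System.out.println(isParenthesizedHello("(hello)")); // true
        System.out.println(isParenthesizedHello("hello"));   // false
    }
}
```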
There is no difference in the current scenario. The usual string escape sequences are formed with a single backslash followed by a valid escape char ("\n", "\r", etc.), while regex escape sequences are formed with a literal backslash (that is, a double backslash in the Java string literal) followed by a valid regex escape char ("\\n", "\\d", etc.).
"\n" (an escape sequence) is a literal LF (newline), and "\\n" is a regex escape sequence that matches an LF symbol.
"\r" (an escape sequence) is a literal CR (carriage return), and "\\r" is a regex escape sequence that matches a CR symbol.
"\t" (an escape sequence) is a literal tab symbol, and "\\t" is a regex escape sequence that matches a tab symbol.
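To make the two layers concrete, the following sketch shows that the two spellings are different strings that nevertheless behave as equivalent patterns. The helper name matchesWhole is an assumption made for this example.

```java
import java.util.regex.Pattern;

public class EscapeLayers {
    // Returns true when the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Different strings: one LF character versus a backslash followed by 'n'.
        System.out.println("\n".length());   // 1
        System.out.println("\\n".length());  // 2

        // Equivalent patterns: both match a single LF.
        System.out.println(matchesWhole("\n", "\n"));   // true
        System.out.println(matchesWhole("\\n", "\n"));  // true
        System.out.println(matchesWhole("\\r", "\r"));  // true
        System.out.println(matchesWhole("\\t", "\t"));  // true

        // The regex \n does not match the two-character text backslash + 'n'.
        System.out.println(matchesWhole("\\n", "\\n")); // false
    }
}
```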
See the list in the Java regex docs for the supported list of regex escapes.
However, if you use the Pattern.COMMENTS flag (used to introduce comments and format a pattern nicely, making the regex engine ignore all unescaped whitespace in the pattern), you will need to use either "\\n" or "\\\n" to define a newline (LF) in the Java string literal, and "\\r" or "\\\r" to define a carriage return (CR).
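A minimal sketch of a commented pattern compiled with Pattern.COMMENTS follows. The pattern, class name, and method name are hypothetical. The literal "\n" at the end of each source line only terminates the comment, while the line ending to be matched is written with regex escapes.

```java
import java.util.regex.Pattern;

public class CommentsFlagDemo {
    // Hypothetical pattern for illustration: digits followed by a line ending.
    static final Pattern NUMBER_LINE = Pattern.compile(
            "\\d+   # one or more digits\n"
          + "\\r?   # optional CR, written as a regex escape\n"
          + "\\n    # LF, written as a regex escape",
            Pattern.COMMENTS);

    // True when the whole input is digits followed by LF or CRLF.
    static boolean isNumberLine(String input) {
        return NUMBER_LINE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumberLine("42\r\n")); // true
        System.out.println(isNumberLine("42\n"));   // true
        System.out.println(isNumberLine("42"));     // false
    }
}
```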
See a Java test:
String s = "\n";
System.out.println(s.replaceAll("\n", "LF")); // => LF
System.out.println(s.replaceAll("\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\n", "<LF>"));
// => <LF>
//<LF>
Why does the last one produce <LF> + newline + <LF>? Because "(?x)\n" is equivalent to "", an empty pattern: with the flag on, the literal newline is ignored as whitespace, so the pattern matches the empty string before the newline and again after it.
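The equivalence with an empty pattern can be checked directly. This is a small sketch with hypothetical class and method names.

```java
public class EmptyPatternDemo {
    // In comments mode the literal LF in the pattern is ignored, leaving nothing to match.
    static String replaceWithCommentsFlag(String input) {
        return input.replaceAll("(?x)\n", "<LF>");
    }

    // An explicitly empty pattern matches at every position between characters.
    static String replaceWithEmptyPattern(String input) {
        return input.replaceAll("", "<LF>");
    }

    public static void main(String[] args) {
        String s = "\n";
        // Both calls insert <LF> before and after the newline and leave the newline in place.
        System.out.println(replaceWithCommentsFlag(s).equals(replaceWithEmptyPattern(s))); // true
    }
}
```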