问题
I have a string "\\u003c", which belongs to UTF-8 charset. I am unable to decode it to unicode because of the presence of double backslashes. How do i get "\u003c" from "\\u003c"? I am using java.
I tried with,
myString.replace("\\\\", "\\");
but could not achieve what i wanted.
This is my code,
String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();
// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);
and content of the file is
\u003c
回答1:
Not sure if you're still looking for a solution to your problem (since you have an accepted answer) but I will still add my answer as a possible solution to the stated problem:
String str = "\\u003c";
Matcher m = Pattern.compile("(?i)\\\\u([\\da-f]{4})").matcher(str);
if (m.find()) {
String a = String.valueOf((char) Integer.parseInt(m.group(1), 16));
System.out.printf("Unicode String is: [%s]%n", a);
}
OUTPUT:
Unicode String is: [<]
Here is online demo of the above code
回答2:
You can use String#replaceAll:
String str = "\\\\u003c";
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);
It looks weird because the first argument is a string defining a regular expression, and \
is a special character both in string literals and in regular expressions. To actually put a \
in our search string, we need to escape it (\\
) in the literal. But to actually put a \
in the regular expression, we have to escape it at the regular expression level as well. So to literally get \\
in a string, we need write \\\\
in the string literal; and to get two literal \\
to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\
. That is:
String Literal String Meaning to Regex −−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−− \ Escape the next character Would depend on next char \\ \ Escape the next character \\\\ \\ Literal \ \\\\\\\\ \\\\ Literal \\
In the replacement parameter, even though it's not a regex, it still treats \
and $
specially — and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.
回答3:
Another option, capture one of the two slashes and replace both slashes with the captured group:
public static void main(String args[])
{
String str = "C:\\\\";
str= str.replaceAll("(\\\\)\\\\", "$1");
System.out.println(str);
}
回答4:
Regarding the problem of "replacing double backslashes with single backslashes" or, more generally, "replacing a simple string, containing \
, with a different simple string, containing \
" (which is not entirely the OP problem, but part of it):
Most of the answers in this thread mention replaceAll
, which is a wrong tool for the job here. The easier tool is replace
, but confusingly, the OP states that replace("\\\\", "\\")
doesn't work for him, that's perhaps why all answers focus on replaceAll
.
Important note for people with JavaScript background: Note that replace(CharSequence, CharSequence) in Java does replace ALL occurrences of a substring - unlike in JavaScript, where it only replaces the first one!
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
On the other hand, replaceAll(String regex, String replacement) -- more docs also here -- is treating both parameters as more than regular strings:
Note that backslashes () and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string.
(this is because \
and $
can be used as backreferences to the captured regex groups, hence if you want to used them literally, you need to escape them).
In other words, both first and 2nd params of replace
and replaceAll
behave differently. For replace
you need to double the \
in both params (standard escaping of a backslash in a string literal), whereas in replaceAll
, you need to quadruple it! (standard string escape + function-specific escape)
To sum up, for simple replacements, one should stick to replace("\\\\", "\\")
(it needs only one escaping, not two).
https://ideone.com/ANeMpw
System.out.println("a\\\\b\\\\c"); // "a\\b\\c"
System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\\\")); // "a\b\c"
//System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\")); // runtime error
System.out.println("a\\\\b\\\\c".replace("\\\\", "\\")); // "a\b\c"
https://www.ideone.com/Fj4RCO
String str = "\\\\u003c";
System.out.println(str); // "\\u003c"
System.out.println(str.replaceAll("\\\\\\\\", "\\\\")); // "\u003c"
System.out.println(str.replace("\\\\", "\\")); // "\u003c"
回答5:
This is for replacing the double back slash to single back slash
public static void main(String args[])
{
String str = "\\u003c";
str= str.replaceAll("\\\\", "\\\\");
System.out.println(str);
}
回答6:
"\\u003c"
does not 'belong to UTF-8 charset' at all. It is five UTF-8 characters: '\
', '0', '0', '3', and 'c'. The real question here is why are the double backslashes there at all? Or, are they really there? and is your problem perhaps something completely different? If the String "\\u003c"
is in your source code, there are no double backslashes in it at all at runtime, and whatever your problem may be, it doesn't concern decoding in the presence of double backslashes.
回答7:
Try using,
myString.replaceAll("[\\\\]{2}", "\\\\");
来源:https://stackoverflow.com/questions/11012253/replacing-double-backslashes-with-single-backslash