Replacing double backslashes with single backslash

Submitted by 半腔热情 on 2019-12-18 05:04:43

Question


I have a string "\\u003c", which belongs to the UTF-8 charset. I am unable to decode it to Unicode because of the presence of double backslashes. How do I get "\u003c" from "\\u003c"? I am using Java.

I tried

myString.replace("\\\\", "\\");

but could not achieve what I wanted.

This is my code,

String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();

// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);

and content of the file is

\u003c


Answer 1:


Not sure if you're still looking for a solution to your problem (since you have an accepted answer) but I will still add my answer as a possible solution to the stated problem:

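import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "\\u003c";
// Match a literal backslash, the letter u, and four hex digits (case-insensitive)
Matcher m = Pattern.compile("(?i)\\\\u([\\da-f]{4})").matcher(str);
if (m.find()) {
    // Parse the captured hex digits and convert the code unit to a one-character string
    String a = String.valueOf((char) Integer.parseInt(m.group(1), 16));
    System.out.printf("Unicode String is: [%s]%n", a);
}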

OUTPUT:

Unicode String is: [<]

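The snippet above decodes only the first escape it finds. If the input may contain several escapes, a loop over all matches could look like the sketch below. The class name UnicodeEscapeDecoder and the method decode are made up for illustration, and the sketch assumes every escape has exactly four hex digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeEscapeDecoder {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces every escape in the input with the character it names.
    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded backslash or dollar sign
            // from being interpreted as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("a \\u003c b \\u003e c")); // a < b > c
    }
}
```

Text without any escapes passes through unchanged, because appendTail copies whatever follows the last match.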




Answer 2:


You can use String#replaceAll:

String str = "\\\\u003c";
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);

It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a \ in the regular expression, we have to escape it at the regular expression level as well. So to literally get \\ in a string, we need to write \\\\ in the string literal; and to get two literal backslashes to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\. That is:

String Literal        String                      Meaning to Regex
−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−
\                     Escape the next character   Would depend on next char
\\                    \                           Escape the next character
\\\\                  \\                          Literal \
\\\\\\\\              \\\\                        Literal \\

The replacement parameter is not a regex, but replaceAll still treats \ and $ specially in it, so we have to escape them there as well. To get one backslash in the replacement, we need four in that string literal.
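If counting backslashes by hand feels error-prone, the standard library can do the regex-level escaping: Pattern.quote for the search text and Matcher.quoteReplacement for the replacement. A minimal sketch follows; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedReplaceAll {
    // Hypothetical helper: collapses each pair of backslashes into one.
    static String collapseDoubleBackslashes(String input) {
        String target = "\\\\";     // two backslash characters at runtime
        String replacement = "\\";  // one backslash character at runtime
        // Only the string-literal escaping remains; the regex-level escaping is delegated.
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // The argument holds two backslashes at runtime; the output has one.
        System.out.println(collapseDoubleBackslashes("\\\\u003c"));
    }
}
```

With both arguments quoted this way, the call behaves like a literal replacement, which is what String#replace gives you directly.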




Answer 3:


Another option: capture one of the two backslashes and replace both backslashes with the captured group:

public static void main(String args[])
{
    String str = "C:\\\\";
    str= str.replaceAll("(\\\\)\\\\", "$1");

    System.out.println(str);
} 



Answer 4:


Regarding the problem of "replacing double backslashes with single backslashes" or, more generally, "replacing a simple string, containing \, with a different simple string, containing \" (which is not entirely the OP's problem, but part of it):

Most of the answers in this thread mention replaceAll, which is the wrong tool for the job here. The easier tool is replace, but confusingly, the OP states that replace("\\\\", "\\") doesn't work for him, which is perhaps why all the answers focus on replaceAll.

Important note for people with JavaScript background: Note that replace(CharSequence, CharSequence) in Java does replace ALL occurrences of a substring - unlike in JavaScript, where it only replaces the first one!

Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.

On the other hand, replaceAll(String regex, String replacement) treats both parameters as more than regular strings:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string.

(This is because $ introduces references to captured regex groups and \ is the escape character, so to use either one literally you need to escape it.)

In other words, the first and second parameters behave differently in replace and replaceAll. For replace you need to double the \ in both parameters (standard escaping of a backslash in a string literal), whereas in replaceAll you need to quadruple it (standard string escape plus function-specific escape).
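The same difference shows up with characters other than the backslash. The sketch below uses a dot and a dollar sign to make it visible; rejectsReplacement is a made-up helper that only reports whether replaceAll throws for a given replacement string.

```java
final class ReplaceVersusReplaceAll {
    // Hypothetical helper: true if replaceAll refuses the given replacement string.
    static boolean rejectsReplacement(String replacement) {
        try {
            "a.b".replaceAll("\\.", replacement);
            return false;
        } catch (RuntimeException e) {
            // The exact exception type has varied between Java versions.
            return true;
        }
    }

    public static void main(String[] args) {
        // replace: both arguments are literal text, so "." matches only a dot.
        System.out.println("a.b".replace(".", "$"));     // a$b
        // replaceAll: the first argument is a regex, so "." matches any character.
        System.out.println("a.b".replaceAll(".", "x"));  // xxx
        // replaceAll: a lone "$" is an incomplete group reference.
        System.out.println(rejectsReplacement("$"));     // true
        // Escaping the dollar sign makes it a literal again.
        System.out.println(rejectsReplacement("\\$"));   // false
    }
}
```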

To sum up, for simple replacements, one should stick to replace("\\\\", "\\") (it needs only one escaping, not two).

https://ideone.com/ANeMpw

System.out.println("a\\\\b\\\\c");                                 // "a\\b\\c"
System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\\\"));  // "a\b\c"
//System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\"));  // runtime error
System.out.println("a\\\\b\\\\c".replace("\\\\", "\\"));           // "a\b\c"

https://www.ideone.com/Fj4RCO

String str = "\\\\u003c";
System.out.println(str);                                // "\\u003c"
System.out.println(str.replaceAll("\\\\\\\\", "\\\\")); // "\u003c"
System.out.println(str.replace("\\\\", "\\"));          // "\u003c"



Answer 5:


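This replaces a double backslash with a single backslash:

public static void main(String args[])
{
      String str = "\\\\u003c";                   // two backslashes at runtime
      // Eight backslashes in the literal give a regex that matches two backslashes;
      // four in the replacement literal produce a single backslash.
      str = str.replaceAll("\\\\\\\\", "\\\\");

      System.out.println(str);
}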



Answer 6:


"\\u003c" does not 'belong to UTF-8 charset' at all. It is six characters: '\', 'u', '0', '0', '3', and 'c'. The real question here is why are the double backslashes there at all? Or, are they really there? And is your problem perhaps something completely different? If the String "\\u003c" is in your source code, there are no double backslashes in it at all at runtime, and whatever your problem may be, it doesn't concern decoding in the presence of double backslashes.

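A quick way to check what a literal actually contains at runtime is to count its characters. The sketch below does that for the literal from the question and for one with a genuine double backslash; the class and helper names are made up for illustration.

```java
final class BackslashCount {
    // Hypothetical helper: counts the backslash characters in a string.
    static long countBackslashes(String s) {
        return s.chars().filter(c -> c == '\\').count();
    }

    public static void main(String[] args) {
        String fromQuestion = "\\u003c";    // two backslashes in the source literal
        String reallyDouble = "\\\\u003c";  // four backslashes in the source literal

        System.out.println(fromQuestion.length() + " chars, "
                + countBackslashes(fromQuestion) + " backslash(es)");  // 6 chars, 1 backslash(es)
        System.out.println(reallyDouble.length() + " chars, "
                + countBackslashes(reallyDouble) + " backslash(es)");  // 7 chars, 2 backslash(es)

        // With only one backslash present, the literal replace finds nothing to change.
        System.out.println(fromQuestion.replace("\\\\", "\\").equals(fromQuestion)); // true
    }
}
```

The same holds for a file whose content is a single backslash followed by u003c: reading it yields one backslash, so a double-backslash replacement has nothing to match.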



Answer 7:


Try using:

myString = myString.replaceAll("[\\\\]{2}", "\\\\");



Source: https://stackoverflow.com/questions/11012253/replacing-double-backslashes-with-single-backslash
