Text cleaning and replacement: delete \n from a text in Java

我在风中等你 2020-12-08 19:57

I'm cleaning incoming text in my Java code. The text includes a lot of "\n", not as a new line, but literally "\n" (a backslash followed by the letter n). I was using replaceAll() from the String

9 Answers
  • 2020-12-08 20:36

    I used this solution to solve that problem:

    String replacement = str.replaceAll("[\n\r]", "");
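    A quick check of what this character class actually removes. This is a sketch with illustrative class and method names: `[\n\r]` matches real line-feed and carriage-return characters, so a literal backslash followed by `n` passes through unchanged.

```java
class LineBreakDemo {
    // Removes real carriage returns and line feeds (the control characters),
    // using the same character class as the answer above.
    static String removeLineBreaks(String str) {
        return str.replaceAll("[\n\r]", "");
    }

    public static void main(String[] args) {
        String realBreaks = "first\r\nsecond"; // contains actual CR and LF characters
        String literal = "first\\nsecond";     // contains a backslash followed by 'n'

        System.out.println(removeLineBreaks(realBreaks)); // firstsecond
        System.out.println(removeLineBreaks(literal));    // first\nsecond (unchanged)
    }
}
```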
    
  • 2020-12-08 20:37

    I believe replaceAll() is relatively expensive, since String.replaceAll() compiles a new regular expression on every call. The solution below avoids regexes and will probably perform better:

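    import java.util.StringTokenizer;

    String temp = "Hi \n Wssup??";
    System.out.println(temp);

    StringBuilder result = new StringBuilder();

    // Split on real line feeds and glue the pieces back together.
    // Note: trim() also drops the spaces around each line break.
    StringTokenizer t = new StringTokenizer(temp, "\n");
    while (t.hasMoreTokens()) {
        result.append(t.nextToken().trim());
    }
    String resultOfTemp = result.toString();

    System.out.println(resultOfTemp); // HiWssup??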
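    If the goal is to avoid per-call regex overhead, here is a sketch of two alternatives. It assumes the text contains the literal two-character sequence backslash + `n`, as in the question, rather than real line breaks; the class and method names are hypothetical. `String.replace(CharSequence, CharSequence)` matches its target literally, and a precompiled `Pattern` can be reused across calls.

```java
import java.util.regex.Pattern;

class LiteralBackslashN {
    // Compiled once and reused; "\\\\n" in source is the regex \\n,
    // which matches a literal backslash followed by 'n'.
    private static final Pattern BACKSLASH_N = Pattern.compile("\\\\n");

    // Option 1: String.replace(CharSequence, CharSequence) treats its
    // target literally, so no regex escaping is needed.
    static String withReplace(String text) {
        return text.replace("\\n", "");
    }

    // Option 2: reuse a precompiled Pattern instead of calling
    // String.replaceAll, which compiles its regex on every call.
    static String withPattern(String text) {
        return BACKSLASH_N.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "Hi \\n Wssup??"; // literal backslash-n, as in the question
        System.out.println(withReplace(input)); // Hi  Wssup??
        System.out.println(withPattern(input)); // Hi  Wssup??
    }
}
```

    Unlike the StringTokenizer version, neither option trims the spaces around the removed sequence.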
    
  • 2020-12-08 20:38

    Normally \n works fine. Otherwise, you can opt for multiple replaceAll() statements: first apply one replaceAll() to the text, and then apply replaceAll() again to the result. That should do what you are looking for.
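    A sketch of the multiple-pass idea, assuming the text can contain both literal `\n` sequences (backslash + `n`) and real line feeds; the class name is illustrative.

```java
class TwoPassCleaner {
    // First pass: the regex \\n removes literal backslash-n sequences.
    // Second pass: the regex \n removes real line feeds.
    static String clean(String text) {
        String withoutLiteral = text.replaceAll("\\\\n", "");
        return withoutLiteral.replaceAll("\\n", "");
    }

    public static void main(String[] args) {
        String mixed = "one\\ntwo\nthree"; // one literal backslash-n, one real line feed
        System.out.println(clean(mixed));  // onetwothree
    }
}
```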

  • 2020-12-08 20:39

    Hooknc is right. I'd just like to post a little explanation:

    "\\n" translates to "\n" after the compiler is done (since you escape the backslash). So the regex engine sees "\n", interprets it as a new line, and removes those (and not the literal "\n" you have).

    "\n" is translated to a real new line by the compiler, so the new line character itself is sent to the regex engine.

    "\\\\n" is ugly, but right. The compiler removes the escape sequences, so the regex engine sees "\\n". The regex engine sees the two backslashes and knows that the first one escapes the second, so the pattern checks for the literal characters '\' and 'n', giving you the desired result.

    Java is nice (it's the language I work in), but having to double-escape regexes can be a real challenge. For extra fun, Stack Overflow seems to like translating backslashes too.
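    A small sketch that makes the three cases in this explanation observable, plus `Pattern.quote` as one way to skip the manual double escaping. The helper `finds` and the class name are illustrative.

```java
import java.util.regex.Pattern;

class EscapeLevels {
    // Returns true if the given regex finds a match anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String realNewline = "a\nb";    // 3 chars: 'a', line feed, 'b'
        String literalSlashN = "a\\nb"; // 4 chars: 'a', backslash, 'n', 'b'

        // "\n": the compiler emits a real line feed, which the regex matches as itself.
        System.out.println(finds("\n", realNewline));      // true
        System.out.println(finds("\n", literalSlashN));    // false

        // "\\n": the regex engine sees \n and treats it as a newline escape.
        System.out.println(finds("\\n", realNewline));     // true
        System.out.println(finds("\\n", literalSlashN));   // false

        // "\\\\n": the regex engine sees \\n, i.e. a literal backslash then 'n'.
        System.out.println(finds("\\\\n", realNewline));   // false
        System.out.println(finds("\\\\n", literalSlashN)); // true

        // Pattern.quote wraps the text so the regex engine matches it literally.
        System.out.println(finds(Pattern.quote("\\n"), literalSlashN)); // true
    }
}
```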

  • 2020-12-08 20:47

    Try this. Hope it helps.

    raw = raw.replaceAll("\t", "");
    raw = raw.replaceAll("\n", "");
    raw = raw.replaceAll("\r", "");
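    The three calls above each compile a regex and scan the whole string. Here is a sketch of the same cleanup done in a single pass without any regex; the names are illustrative. Like the answer, it removes real tab, carriage-return, and line-feed characters, not a literal backslash followed by `n`.

```java
class WhitespaceStripper {
    // Single pass over the characters, dropping tab, line feed and carriage return.
    // No regular expression is compiled.
    static String stripTabsAndLineBreaks(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '\t' && c != '\n' && c != '\r') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTabsAndLineBreaks("a\tb\r\nc")); // abc
    }
}
```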
    
  • 2020-12-08 20:50

    I think you need to add a couple more backslashes...

    String string = "line one\\nline two"; // contains a literal backslash followed by n
    string = string.replaceAll("\\\\n", "");
    

    Explanation: the number of backslashes has to do with the fact that "\n" by itself is an escape sequence for a control character (the line feed) in Java.

    So to get the real characters of "\n" somewhere, we need to use "\\n", which, if printed out, will give us: \n

    You're looking to replace all "\n" in your file, but you're not looking to replace the control character "\n". So you tried "\\n", which will be converted into the characters "\n". Great, but maybe not so much: the replaceAll method will then build a regular expression from those "\n" characters, and the regex engine reads them as the control character "\n".

    Whew, almost done.

    Using replaceAll("\\\\n", "") will first convert "\\\\n" -> "\\n", which is what the regular expression receives. The regex engine then reads "\\n" as an escaped backslash followed by n, which actually represents your text of "\n". That is what you're looking to replace.
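    To see the two layers of unescaping described above, here is a sketch that prints each literal (what the compiler hands to the regex engine) and then applies the pattern. The class name and sample input are illustrative.

```java
class PrintEscapes {
    public static void main(String[] args) {
        // What the compiler produces from each source literal:
        System.out.println("\\n");   // prints \n  (2 chars: backslash, n)
        System.out.println("\\\\n"); // prints \\n (3 chars: two backslashes, n)

        // The regex engine then reads \\n as "a literal backslash, then n".
        String input = "line1\\nline2";
        System.out.println(input.replaceAll("\\\\n", "")); // prints line1line2
    }
}
```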
