Java - removing strange characters from a String

轮回少年 2020-12-10 03:04

How do I remove strange and unwanted Unicode characters (such as a black diamond with question mark) from a String?

Updated:

Please tell me the Unicode chara

11 answers
  • 2020-12-10 03:18

    You can use String.replaceAll("[my-list-of-strange-and-unwanted-chars]", ""), listing the characters you want to drop inside the brackets.

    There is no Character.isStrangeAndUnWanted(); you have to define what counts as unwanted.

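    If you want to remove control characters, you can do:

    String str = "\u0000\u001f hi \n";
    // Drop every character in the range U+0000..U+001F (this includes '\n').
    str = str.replaceAll("[\u0000-\u001f]", "");
    System.out.println(str);
    

    This prints " hi " (the spaces are kept; the control characters, including the trailing newline, are removed).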
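
    Since the question mentions the black diamond with a question mark, here is a minimal sketch that also strips U+FFFD, the Unicode replacement character that is usually rendered that way. The class and method names (StripUnwanted, strip) are made up for illustration, and the choice of which characters count as unwanted is an assumption you should adjust.

```java
public class StripUnwanted {
    // Removes ASCII control characters (\p{Cntrl}, which also covers tab,
    // newline and DEL) and U+FFFD, the replacement character.
    static String strip(String input) {
        return input.replaceAll("[\\p{Cntrl}\\uFFFD]", "");
    }

    public static void main(String[] args) {
        String dirty = "\u0000he\uFFFDllo\u001f";
        System.out.println(strip(dirty)); // prints hello
    }
}
```

    Note that a U+FFFD in your data usually means the original character was already lost when the bytes were decoded, so removing it hides the symptom; it does not recover the text.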

    EDIT: If you want to know the numeric value of any 16-bit char in a string, you can do:

    int num = string.charAt(n);   // UTF-16 code unit at index n
    System.out.println(num);      // prints the value in decimal
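
    To identify an unknown character, it can help to dump every code point in the usual U+XXXX notation. The sketch below uses String.codePoints(), so characters outside the 16-bit range are reported as one value; the class and method names (CodePoints, describe) are hypothetical.

```java
public class CodePoints {
    // Formats every code point of the input as U+XXXX, separated by spaces.
    static String describe(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("a\uFFFD")); // prints U+0061 U+FFFD
    }
}
```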
    
  • 2020-12-10 03:19

    The same thing happened to me when I was converting a Clob to a String using getAsciiStream.

    I solved it by reading the Clob through getCharacterStream instead:

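    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringWriter;
    import java.sql.Clob;
    import java.sql.SQLException;

    public String getstringfromclob(Clob cl)
    {
        StringWriter write = new StringWriter();
        // getCharacterStream() returns decoded characters, not raw ASCII bytes.
        // try-with-resources closes the reader even if reading fails.
        try (Reader read = cl.getCharacterStream())
        {
            int c;
            while ((c = read.read()) != -1)
            {
                write.write(c);
            }
        }
        catch (SQLException | IOException ec)
        {
            ec.printStackTrace();
        }
        return write.toString();
    }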
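
    The answer does not say why getAsciiStream produced strange characters. One plausible cause, shown here as an assumption and not as a diagnosis, is that non-ASCII bytes decoded as ASCII turn into U+FFFD. The sketch below reproduces that effect with plain strings; AsciiDecodeDemo and decodeAsAscii are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDecodeDemo {
    // Encodes the text as UTF-8, then deliberately decodes the bytes as ASCII.
    // Bytes that are not valid ASCII are replaced during decoding.
    static String decodeAsAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String garbled = decodeAsAscii("caf\u00E9");
        // The accented letter does not survive the ASCII decode.
        System.out.println(garbled.contains("\uFFFD")); // prints true
    }
}
```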
    
  • 2020-12-10 03:24

    You can't remove characters from the existing String object, because strings in Java are immutable.

    It is possible, though, to make a new string that has the unwanted characters removed. Look up String#replaceAll().
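
    A short sketch of what that means in practice: replaceAll leaves the original untouched and returns a new string, so the result has to be assigned. The names ImmutableDemo and clean are hypothetical, and the control-character range is only an example pattern.

```java
public class ImmutableDemo {
    // Returns a new string without characters in U+0000..U+001F.
    static String clean(String s) {
        return s.replaceAll("[\\u0000-\\u001F]", "");
    }

    public static void main(String[] args) {
        String original = "a\u0001b";
        String cleaned = clean(original);
        System.out.println(original.length()); // prints 3, the original is unchanged
        System.out.println(cleaned);           // prints ab
    }
}
```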

  • 2020-12-10 03:26

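    I did it the other way round: I remove every character that is not in an allowed set (the ^ negates the character class):

    str = str.replaceAll("[^a-zA-Z0-9:;.?! ]", "");
    

    So for a string like "小米体验版 latin string 01234567890" we get " latin string 01234567890" (the leading space stays, because the space is in the allowed set).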

  • 2020-12-10 03:28

    To delete non-ASCII characters from the string (which also drops accented Latin letters), I use the following code:

    String s = "小米体验版 latin string 01234567890";
    s = s.replaceAll("[^\\x00-\\x7F]", "");
    

    The output string will be: " latin string 01234567890"
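
    One side effect of the pattern above is that accented Latin letters are deleted too. If you would prefer to keep their base letters, a possible variation (not part of the original answer) is to decompose the string with java.text.Normalizer first and drop the combining marks before removing what is left of the non-ASCII characters. AsciiFold and toAscii are illustrative names.

```java
import java.text.Normalizer;

public class AsciiFold {
    // Decomposes accented letters (NFD), drops the combining marks,
    // then removes any remaining non-ASCII characters.
    static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}", "")
                .replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("caf\u00E9")); // prints cafe
    }
}
```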
