Remove non-ASCII characters from String in Java

渐次进展 2021-01-04 00:06

I have a URI that contains non-ASCII characters, like this:

http://www.abc.de/qq/qq.ww?MIval=typo3_bsl_int_Smtliste&p_smtbez=Schmalbl�ttrigeSomerzischeruchtanb

5 Answers
  • 2021-01-04 00:23

    Careful: [^\x20-\x7E] is not "non-ASCII"; it matches everything outside the printable range.

    To match only real non-ASCII characters, use [^\x00-\x7F].

    Otherwise the replacement also strips newlines and other control characters that are part of the ASCII table!
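To make the difference between the two character classes concrete, here is a minimal sketch. The class and method names (`AsciiRangeDemo`, `keepPrintable`, `keepAscii`) are made up for illustration and do not come from the thread.

```java
class AsciiRangeDemo {
    // Keeps only printable ASCII (space through tilde).
    // Control characters such as \n and \t are removed as well.
    static String keepPrintable(String s) {
        return s.replaceAll("[^\\x20-\\x7E]", "");
    }

    // Keeps the whole 7-bit ASCII range, including \n, \r and \t.
    static String keepAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String input = "line1\nline2\tcaf\u00e9";
        // The newline and the tab survive only in the second call.
        System.out.println(keepPrintable(input));
        System.out.println(keepAscii(input));
    }
}
```

Which of the two is appropriate depends on whether control characters should survive; for a URI, dropping them is probably what you want anyway.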

  • 2021-01-04 00:23

    To remove non-ASCII characters from a String, the code below worked for me.

    String str="<UPC>616043287409ÂÂÂÂ</UPC>";
    
    str = str.replaceAll("[^\\p{ASCII}]", "");
    

    Output:

    <UPC>616043287409</UPC>
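If the replacement runs many times, one possible variation is to compile the pattern once and reuse it. This is only a sketch under that assumption; the names `NonAsciiStripper` and `strip` are hypothetical.

```java
import java.util.regex.Pattern;

class NonAsciiStripper {
    // Same character class as in the answer above, compiled once for reuse.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // Returns the input with every non-ASCII character removed.
    static String strip(String s) {
        return NON_ASCII.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "<UPC>616043287409\u00C2\u00C2\u00C2\u00C2</UPC>";
        System.out.println(strip(str));
    }
}
```

`String.replaceAll` compiles the regular expression on each call, so the precompiled form mainly matters in loops or hot paths.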
    
  • 2021-01-04 00:30

    Use Guava's CharMatcher:

    String onlyAscii = CharMatcher.ascii().retainFrom(original);
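If adding Guava is not an option, the same "retain only ASCII" idea can be sketched with the standard library alone. This is not Guava's implementation, just a plain loop that keeps every char below 128; `AsciiRetain` and `retainAscii` are hypothetical names.

```java
class AsciiRetain {
    // Copies only chars in the 7-bit ASCII range (0..127) to the result.
    // Surrogate halves are above 0x7F, so supplementary characters are dropped too.
    static String retainAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x7F) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(retainAscii("Schmalbl\u00e4ttrige"));
    }
}
```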
    
  • 2021-01-04 00:32

    I'm guessing that the source of the URL is more at fault. Perhaps you're fixing the wrong problem? Removing "strange" characters from a URI might give it an entirely different meaning.

    With that said, you may be able to remove all of the non-ASCII characters with a simple string replacement:

    String fixed = original.replaceAll("[^\\x20-\\x7e]", "");
    

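    Or, if that is too aggressive, you can widen the kept range to the whole Basic Multilingual Plane, so that only characters needing four bytes in UTF-8 are removed. Note that the "�" character (U+FFFD) lies inside that range, so this variant keeps it: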

    String fixed = original.replaceAll("[^\\u0000-\\uFFFF]", "");
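The following sketch shows what the wider range does in practice, assuming the input contains both a replacement character (U+FFFD) and a supplementary character. The class and method names are hypothetical, and the explicit U+FFFD removal is an addition for illustration rather than part of the answer above.

```java
class BmpFilterDemo {
    // Removes only code points above U+FFFF (four bytes in UTF-8).
    // Java's regex engine works on code points, so a surrogate pair is removed as a unit.
    static String stripSupplementary(String s) {
        return s.replaceAll("[^\\u0000-\\uFFFF]", "");
    }

    // Removes the Unicode replacement character U+FFFD specifically.
    static String stripReplacementChar(String s) {
        return s.replace("\uFFFD", "");
    }

    public static void main(String[] args) {
        String input = "a\uFFFDb\uD83D\uDE00c";
        // U+FFFD is inside the BMP, so the first call leaves it in place.
        System.out.println(stripSupplementary(input));
        System.out.println(stripReplacementChar(stripSupplementary(input)));
    }
}
```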
    
  • 2021-01-04 00:43
    yourstring=yourstring.replaceAll("[^\\p{ASCII}]", "");
    