How do I remove strange and unwanted Unicode characters (such as a black diamond with question mark) from a String?
Updated:
Please tell me the Unicode chara
You can use a String.replaceAll("[my-list-of-strange-and-unwanted-chars]","")
There is no Character.isStrangeAndUnWanted(); you have to define what you want.
If you want to remove control characters you can do
String str = "\u0000\u001f hi \n";
str = str.replaceAll("[\u0000-\u001f]", "");
This leaves " hi " (it keeps the spaces; the trailing \n is removed because it is a control character too).
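The black diamond with a question mark from the question is usually U+FFFD, the Unicode replacement character. Here is a minimal sketch, assuming that character and the control characters are what you consider unwanted; the class and method names are made up for illustration.

```java
public class UnwantedChars {
    // Hypothetical helper: strips U+FFFD (the replacement character) and
    // the control characters U+0000 to U+001F. Note that this range also
    // covers tab and newline.
    static String strip(String s) {
        return s.replaceAll("[\\x00-\\x1F\\uFFFD]", "");
    }

    public static void main(String[] args) {
        String dirty = "caf\uFFFD\u0000 au lait\n";
        System.out.println("[" + strip(dirty) + "]"); // prints [caf au lait]
    }
}
```

If you want to keep tabs and newlines, narrow the range in the character class accordingly.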
EDIT: If you want to know the numeric value of any 16-bit char (a UTF-16 code unit) in the string, you can do
int num = string.charAt(n);
System.out.println(num);
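charAt only sees 16-bit code units, so characters outside the Basic Multilingual Plane show up as two surrogate values. A small sketch that lists whole code points instead, using String.codePoints(); the class and method names are only illustrative.

```java
import java.util.stream.Collectors;

public class CodePoints {
    // Hypothetical helper: formats every code point of s as U+XXXX.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // 'a', the replacement character, and one character stored as a surrogate pair
        System.out.println(describe("a\uFFFD\uD83D\uDE00")); // prints U+0061 U+FFFD U+1F600
    }
}
```

Running this on a problem string tells you exactly which characters to put in your replaceAll pattern.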
The same thing happened to me when I was converting a Clob to a String using getAsciiStream.
I solved it by reading the Clob through getCharacterStream instead:
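import java.io.Reader;
import java.io.StringWriter;
import java.sql.Clob;

public String getstringfromclob(Clob cl)
{
    StringWriter write = new StringWriter();
    // Read characters rather than ASCII bytes so non-ASCII text survives;
    // try-with-resources closes the reader when done.
    try (Reader read = cl.getCharacterStream())
    {
        int c;
        while ((c = read.read()) != -1)
        {
            write.write(c);
        }
    }
    catch (Exception ec)
    {
        ec.printStackTrace();
    }
    return write.toString();
}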
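One plausible explanation for why getAsciiStream caused the problem (not confirmed for this particular driver) is that non-ASCII bytes get decoded as ASCII somewhere along the way. The sketch below reproduces that effect with plain strings, no database involved; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes the bytes as US-ASCII; bytes above 0x7F cannot be mapped.
    static String decodeAsAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Decodes the same bytes with the charset they were written in.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        // The wrong charset yields the replacement character U+FFFD.
        System.out.println(decodeAsAscii(utf8).indexOf('\uFFFD') >= 0); // prints true
        // The right charset round-trips the text.
        System.out.println(decodeAsUtf8(utf8).equals("caf\u00E9"));     // prints true
    }
}
```

If this is the cause, fixing the decoding is better than stripping the characters afterwards, because stripping loses the original text.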
You can't because strings are immutable.
It is possible, though, to make a new string that has the unwanted characters removed. Look up String#replaceAll().
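To illustrate the point about immutability, here is a short sketch showing that replaceAll returns a new string and leaves the original untouched. The class and method names are invented for the example.

```java
public class ImmutableDemo {
    // Hypothetical helper: returns a copy of s without U+FFFD.
    static String clean(String s) {
        return s.replaceAll("\\uFFFD", "");
    }

    public static void main(String[] args) {
        String original = "a\uFFFDb";
        String cleaned = clean(original);
        System.out.println(original.length()); // prints 3, the original is unchanged
        System.out.println(cleaned);           // prints ab
    }
}
```

The practical consequence is that you must assign the result, for example str = str.replaceAll(...); calling the method alone has no visible effect.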
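I went the other way: I remove every character that is not in an allowed set (the ^ negates the character class):
str = str.replaceAll("[^a-zA-Z0-9:;.?! ]","");
So for a string like "小米体验版 latin string 01234567890" we get " latin string 01234567890" (the leading space stays, because the space is in the allowed set).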
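An allow-list of ASCII letters also throws away legitimate text in other scripts. If that is not what you want, a variant using Unicode categories may fit better. This is only a sketch, assuming letters, digits, punctuation and plain spaces are the characters worth keeping; the names are hypothetical.

```java
public class KeepText {
    // Hypothetical helper: keeps letters (L), numbers (N), punctuation (P)
    // and space separators (Zs) of any script; removes everything else,
    // including control characters and U+FFFD.
    static String keepText(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}\\p{P}\\p{Zs}]", "");
    }

    public static void main(String[] args) {
        // Two CJK letters, then Latin text polluted with U+FFFD and NUL
        String dirty = "\u5C0F\u7C73 latin\uFFFD string\u0000 01!";
        System.out.println(keepText(dirty).equals("\u5C0F\u7C73 latin string 01!")); // prints true
    }
}
```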
To delete non-ASCII characters (which includes all non-Latin symbols) from the string I use the following code:
String s = "小米体验版 latin string 01234567890";
s = s.replaceAll("[^\\x00-\\x7F]", "");
The output string will be: " latin string 01234567890"
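The same filter can be written with the predefined \p{ASCII} class, and a trim() removes the space left at the front. A minimal sketch with made-up class and method names:

```java
public class AsciiOnly {
    // Hypothetical helper: drops every non-ASCII character, then trims
    // the whitespace left behind at either end.
    static String asciiOnly(String s) {
        return s.replaceAll("\\P{ASCII}", "").trim();
    }

    public static void main(String[] args) {
        // Two CJK characters followed by ASCII text
        String s = "\u5C0F\u7C73 latin string 0123";
        System.out.println("[" + asciiOnly(s) + "]"); // prints [latin string 0123]
    }
}
```

\P{ASCII} (capital P) is the negation of \p{ASCII}, so it matches the same characters as [^\x00-\x7F].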