I have a string that contains a character � that I haven't been able to replace correctly.
String.replace("�", "");
doesn't work.
For details, see the following example:
import java.io.UnsupportedEncodingException;

/**
 * File: BOM.java
 *
 * Checks whether the UTF-8 BOM is present in the given string, prints the
 * string after skipping the UTF-8 BOM bytes, then prints the string decoded
 * as UTF-8 on a UTF-8 console.
 */
public class BOM
{
    // Starts with the UTF-8 BOM bytes (EF BB BF) as they appear when misread as ISO-8859-1
    private final static String BOM_STRING = "\u00EF\u00BB\u00BFHello World";
    private final static String ISO_ENCODING = "ISO-8859-1";
    private final static String UTF8_ENCODING = "UTF-8";
    private final static int UTF8_BOM_LENGTH = 3;

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Re-encoding as ISO-8859-1 recovers the original raw bytes
        final byte[] bytes = BOM_STRING.getBytes(ISO_ENCODING);
        if (isUTF8(bytes)) {
            printSkippedBomString(bytes);
            printUTF8String(bytes);
        }
    }

    // Copies everything after the three BOM bytes and prints it
    private static void printSkippedBomString(final byte[] bytes) throws UnsupportedEncodingException {
        int length = bytes.length - UTF8_BOM_LENGTH;
        byte[] barray = new byte[length];
        System.arraycopy(bytes, UTF8_BOM_LENGTH, barray, 0, barray.length);
        System.out.println(new String(barray, ISO_ENCODING));
    }

    // Decodes all bytes as UTF-8; the BOM becomes the single character U+FEFF
    private static void printUTF8String(final byte[] bytes) throws UnsupportedEncodingException {
        System.out.println(new String(bytes, UTF8_ENCODING));
    }

    // True if the array starts with the UTF-8 BOM (EF BB BF)
    private static boolean isUTF8(byte[] bytes) {
        return bytes.length >= UTF8_BOM_LENGTH
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }
}
Change the encoding to UTF-8 when parsing. This will remove the special characters.
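A minimal sketch of what this suggestion could look like, assuming the data arrives as raw bytes that were written as UTF-8. The class name `DecodeDemo`, the method `decodeUtf8`, and the sample text are illustrative and not from the question:

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode raw bytes as UTF-8 instead of relying on the platform default.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical input: "cafe" with an accented e, encoded as UTF-8
        // (the accented letter takes two bytes).
        byte[] raw = "caf\u00E9".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: every byte becomes its own character.
        String misread = new String(raw, StandardCharsets.ISO_8859_1);
        // Right charset: the two bytes are combined back into one character.
        String correct = decodeUtf8(raw);

        System.out.println(misread.length()); // 5
        System.out.println(correct.length()); // 4
    }
}
```

Note that decoding as UTF-8 fixes multi-byte characters, but a leading BOM, if there is one, still shows up as the single character U+FEFF and may need to be removed separately.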
As others have said, you posted 3 characters instead of one. I suggest you run this little snippet of code to see what's actually in your string:
public static void dumpString(String text)
{
    for (int i = 0; i < text.length(); i++)
    {
        System.out.println("U+" + Integer.toString(text.charAt(i), 16)
                           + " " + text.charAt(i));
    }
}
If you post the results of that, it'll be easier to work out what's going on. (I haven't bothered padding the string - we can do that by inspection...)
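If padded output is preferred, a variant along these lines should work. It is only a sketch: the class `DumpDemo` and the helper `label` are names made up here, and the sample input in `main` is hypothetical.

```java
class DumpDemo {
    // Format one UTF-16 code unit as a zero-padded "U+XXXX" label.
    static String label(char c) {
        return String.format("U+%04X", (int) c);
    }

    // Print one line per code unit: the label, then the character itself.
    static void dumpString(String text) {
        for (int i = 0; i < text.length(); i++) {
            System.out.println(label(text.charAt(i)) + " " + text.charAt(i));
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: 'a', the replacement character, 'b'.
        dumpString("a\uFFFDb");
    }
}
```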
Character issues like this are difficult to diagnose because information is easily lost through misinterpretation of characters via application bugs, misconfiguration, cut'n'paste, etc.
As I (and apparently others) see it, you've pasted three characters:
codepoint  glyph  escaped  windows-1252  info
=======================================================================
U+00ef     ï      \u00ef   ef            LATIN_1_SUPPLEMENT, LOWERCASE_LETTER
U+00bf     ¿      \u00bf   bf            LATIN_1_SUPPLEMENT, OTHER_PUNCTUATION
U+00bd     ½      \u00bd   bd            LATIN_1_SUPPLEMENT, OTHER_NUMBER
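Taken as bytes, EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character. That suggests the text may have held a single U+FFFD that was later misread with a single-byte charset, though the question does not confirm this. A small sketch to check it, using ISO-8859-1 as a stand-in for windows-1252 (the two agree for these three bytes); the class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Encode as UTF-8, then misread the bytes with a single-byte charset.
    static String misreadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misreadAsLatin1("\uFFFD");
        // Prints U+00ef, U+00bf and U+00bd, one per line.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.println(String.format("U+%04x", (int) garbled.charAt(i)));
        }
    }
}
```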
To identify the character, download and run the program from this page. Paste your character into the text field and select the glyph mode; paste the report into your question. It'll help people identify the problematic character.
Use the Unicode escape sequence. First you'll have to find the code point of the character you want to replace (let's just say it is ABCD in hex):
str = str.replaceAll("\uABCD", "");
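For instance, if a dump of the string shows a single U+FFFD, the escape can be used directly. This is only a sketch under that assumption; if the string really holds the three characters U+00EF U+00BF U+00BD, the escape has to spell out those code points instead. `String.replace` is used here because no regular expression is needed. The class and method names are illustrative.

```java
class StripDemo {
    // Remove every U+FFFD; assumes that is the character actually present.
    static String stripReplacementChar(String s) {
        return s.replace("\uFFFD", "");
    }

    // Variant for the mis-decoded three-character form (U+00EF U+00BF U+00BD).
    static String stripGarbledForm(String s) {
        return s.replace("\u00EF\u00BF\u00BD", "");
    }

    public static void main(String[] args) {
        System.out.println(stripReplacementChar("Hello\uFFFD World"));         // Hello World
        System.out.println(stripGarbledForm("Hello\u00EF\u00BF\u00BD World")); // Hello World
    }
}
```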
None of the answers above resolved my issue. When I download the XML, extra characters get added in front of the <xml at the start of my XML. I simply did:
xml = parser.getXmlFromUrl(url);
xml = xml.substring(3); // removes the first three characters from the string
Now it is running correctly.
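An unconditional `substring(3)` also cuts real content when the extra characters are absent. A more defensive sketch, assuming the three characters are a UTF-8 BOM (the answer does not say so explicitly), strips the prefix only when it is there. `BomStripper` and `stripBom` are hypothetical names, not part of any parser API:

```java
class BomStripper {
    // The BOM as one character, when the bytes were decoded as UTF-8.
    private static final String BOM = "\uFEFF";
    // The BOM as three characters, when the bytes were misread as ISO-8859-1.
    private static final String BOM_AS_LATIN1 = "\u00EF\u00BB\u00BF";

    static String stripBom(String s) {
        if (s.startsWith(BOM)) {
            return s.substring(BOM.length());
        }
        if (s.startsWith(BOM_AS_LATIN1)) {
            return s.substring(BOM_AS_LATIN1.length());
        }
        return s; // no BOM: leave the text untouched
    }

    public static void main(String[] args) {
        System.out.println(stripBom("\uFEFF<xml/>"));             // <xml/>
        System.out.println(stripBom("\u00EF\u00BB\u00BF<xml/>")); // <xml/>
        System.out.println(stripBom("<xml/>"));                   // <xml/>
    }
}
```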