Question
I am trying to figure out how to simply exclude the BOM while using the example given by Apache.
I am reading a file from internal storage and first converting it into a String. Then I convert it into a byte array so that I get an InputStream. Then I check with BOMInputStream for BOMs, since I was getting "Unexpected Tokens" errors. Now I don't know how to exclude the BOM if one is present.
CODE:
    StringBuffer fileContent = new StringBuffer("");
    String temp = "";
    int ch;
    try {
        FileInputStream fis = ctx.openFileInput("dataxml");
        try {
            while( (ch = fis.read()) != -1)
                fileContent.append((char)ch);
            temp = temp + Character.toString((char)ch);
        } catch (IOException e) {
            e.printStackTrace();
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    InputStream ins = new ByteArrayInputStream(temp.getBytes(StandardCharsets.UTF_8));
    BOMInputStream bomIn = new BOMInputStream(ins);
    if (bomIn.hasBOM()) {
        // has a UTF-8 BOM
    }
    xpp.setInput(ins,"UTF-8");
    parseXMLAndStoreIt(xpp);
    ins.close();
The filename is "dataxml", which I write in a different class with openFileOutput.
Answer 1:
I've never used BOMInputStream before but to exclude a byte order mark from the stream you'd just have to read starting at an offset that is one greater than the location of the end of the BOM. Does BOMInputStream have a property indicating the location of the BOM? Also, you can have a look here: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
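The idea of reading from just past the BOM can be sketched with the JDK alone. This is a minimal sketch, not the BOMInputStream API: the class name BomSkipper and the method skipUtf8Bom are hypothetical, and it only handles the UTF-8 BOM (bytes EF BB BF). It peeks at the first three bytes and pushes them back when they are not a BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper for illustration; handles only the UTF-8 BOM.
class BomSkipper {
    private static final int BOM_LENGTH = 3; // EF BB BF

    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, BOM_LENGTH);
        byte[] head = new byte[BOM_LENGTH];

        // Read up to three bytes; a single read() call may return fewer.
        int count = 0;
        while (count < head.length) {
            int r = pushback.read(head, count, head.length - count);
            if (r == -1) {
                break;
            }
            count += r;
        }

        boolean hasBom = count == BOM_LENGTH
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // Not a BOM: put the bytes back so the caller sees the full content.
        if (!hasBom && count > 0) {
            pushback.unread(head, 0, count);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // prints <
    }
}
```

The returned stream, rather than the original one, is what would be handed to the parser.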
Answer 2:
You can use BOMInputStream to remove the BOM like this:
    BOMInputStream bis = new BOMInputStream(inputStream);
    if (bis.hasBOM()) {
        bis.skip(bis.getBOM().length());
    }
If that doesn't work, you can adjust the skip parameter. In my case, I got a working solution with:
bis.skip(bis.getBOM().length()-3);
Answer 3:
You are building a String by reading characters from an InputStream while disregarding both the BOM and the encoding. Reading from the stream by converting each byte to one character is bad, very bad. Please use an implementation of Reader (specifying the encoding) to read characters from a sequence of bytes.
Later you convert the String back to bytes (and there you do take care to specify the encoding). If you compare the sequence of bytes you obtain at this point, it is probably very different from the one you fetched from your store.
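A minimal sketch of what this answer suggests, assuming the file is UTF-8: decode the bytes with an InputStreamReader that names the charset, then drop a leading U+FEFF, since Java's UTF-8 decoder passes the BOM through as that character. The class name Utf8TextReader and the method readUtf8 are hypothetical names used for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; assumes the input is UTF-8.
class Utf8TextReader {

    /** Decodes the whole stream as UTF-8 and removes a leading BOM character. */
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            // Read decoded characters, not raw bytes.
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        // A UTF-8 BOM decodes to the single character U+FEFF.
        if (sb.length() > 0 && sb.charAt(0) == '\uFEFF') {
            sb.deleteCharAt(0);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "\uFEFF<note>\u00E9</note>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(data)));
    }
}
```

In the question's setup, the stream returned by ctx.openFileInput("dataxml") could be passed to such a method, and the resulting text handed to the parser through a StringReader and XmlPullParser's setInput(Reader) overload, which avoids the round trip through bytes.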
Source: https://stackoverflow.com/questions/27136230/how-to-exclude-bom-with-bom-inputstream