Why does org.apache.xerces.parsers.SAXParser not skip the BOM in UTF-8 encoded XML?

南方客 2020-12-07 01:54

I have an XML file with UTF-8 encoding, and the file contains a BOM at the beginning. So during parsing I get org.xml.sax.SAXParseException: Content is not allowe…

3 Answers
  • 2020-12-07 02:34

    I've experienced the same problem and I've solved it with this code:

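    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PushbackInputStream;

    // Returns a stream positioned after the UTF-8 BOM (EF BB BF) if one is present,
    // otherwise a stream that still yields every original byte.
    private static InputStream checkForUtf8BOM(InputStream inputStream) throws IOException {
        PushbackInputStream pushbackInputStream = new PushbackInputStream(new BufferedInputStream(inputStream), 3);
        byte[] bom = new byte[3];
        int read = pushbackInputStream.read(bom, 0, bom.length);
        boolean isBom = read == 3
                && bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB && bom[2] == (byte) 0xBF;
        // Push back only the bytes actually read, so inputs shorter than 3 bytes are not padded with zeros.
        if (read > 0 && !isBom) {
            pushbackInputStream.unread(bom, 0, read);
        }
        return pushbackInputStream;
    }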
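    A minimal end-to-end sketch of wiring a BOM-stripping stream into a SAX parse. It assumes the JDK's built-in JAXP parser (SAXParserFactory) rather than calling org.apache.xerces.parsers.SAXParser directly; the class BomStrippingSaxDemo and the method countElements are hypothetical names used only for illustration. The helper pushes back only the bytes it actually read, so inputs shorter than three bytes pass through unchanged.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of any library.
class BomStrippingSaxDemo {

    // Peeks at up to 3 bytes and pushes them back unless they are the UTF-8 BOM (EF BB BF).
    static InputStream checkForUtf8BOM(InputStream inputStream) throws IOException {
        PushbackInputStream in = new PushbackInputStream(new BufferedInputStream(inputStream), 3);
        byte[] bom = new byte[3];
        int read = in.read(bom, 0, bom.length);
        boolean isBom = read == 3
                && bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB && bom[2] == (byte) 0xBF;
        if (read > 0 && !isBom) {
            in.unread(bom, 0, read);
        }
        return in;
    }

    // Hypothetical helper: parses the BOM-free stream with the JDK's SAX parser and counts start tags.
    static int countElements(InputStream xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] count = {0};
        parser.parse(new InputSource(checkForUtf8BOM(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] body = "<root><child/></root>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);
        System.out.println(countElements(new ByteArrayInputStream(withBom))); // prints 2
    }
}
```

    Passing an InputStream rather than a Reader leaves character decoding to the parser.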
    
  • 2020-12-07 02:38
    // Requires java.io.IOException and java.io.Reader.
    // If the Reader's charset does not itself consume the BOM (Java's UTF-8 decoder does not),
    // the BOM arrives as the single character U+FEFF. Comparing decoded chars against raw byte
    // values such as 0xEFBB/0xBF can therefore never match.
    private static final char BOM = '\uFEFF';

    // Skips a leading BOM and returns true; otherwise resets the reader and returns false.
    // The reader must support mark/reset, e.g. BufferedReader.
    private static boolean removeBOM(Reader reader) throws IOException {
        reader.mark(1);
        if (reader.read() == BOM) {
            return true;
        }
        reader.reset();
        return false;
    }
    

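    Usage:

    // The XML can be read from a file, URL or string through a stream.
    // Decode with the document's actual encoding (UTF-8 here), not the platform default.
    // Requires java.io.*, java.net.URL and java.nio.charset.StandardCharsets.
    URL url = new URL("some xml url");
    BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    removeBOM(bufferedReader);
    // Then hand the reader to the parser, e.g. as new org.xml.sax.InputSource(bufferedReader).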
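    To see why a Reader-based check has to compare against U+FEFF rather than byte values such as 0xEF 0xBB 0xBF: once the bytes are decoded as UTF-8, the three-byte BOM arrives as one character, U+FEFF, because Java's UTF-8 decoder does not drop it. A small sketch; ReaderBomDemo and utf8ReaderWithBom are hypothetical names, and the reader must support mark/reset (BufferedReader does).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
class ReaderBomDemo {

    // After decoding, a BOM shows up as the single character U+FEFF.
    static final char BOM = '\uFEFF';

    // Consumes a leading U+FEFF if present; otherwise resets so nothing is lost.
    static boolean removeBOM(Reader reader) throws IOException {
        reader.mark(1);
        if (reader.read() == BOM) {
            return true;
        }
        reader.reset();
        return false;
    }

    // Hypothetical helper: a UTF-8 reader over bytes that start with EF BB BF.
    static BufferedReader utf8ReaderWithBom(String xml) {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        byte[] bytes = new byte[body.length + 3];
        bytes[0] = (byte) 0xEF;
        bytes[1] = (byte) 0xBB;
        bytes[2] = (byte) 0xBF;
        System.arraycopy(body, 0, bytes, 3, body.length);
        return new BufferedReader(new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = utf8ReaderWithBom("<root/>");
        System.out.println(removeBOM(reader)); // prints true
        System.out.println(reader.readLine()); // prints <root/>
    }
}
```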
    
  • 2020-12-07 02:42

    This has come up before, and I found the answer on Stack Overflow when it happened to me. The linked answer uses a PushbackInputStream to test for the BOM.
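    The PushbackInputStream approach also extends to the other Unicode BOMs if you compare raw bytes rather than decoded characters. A sketch under these assumptions: BomSniffer and skipBom are hypothetical names; the caller must create the PushbackInputStream with a pushback size of at least 4; and a UTF-16LE BOM followed by a NUL character would be read as UTF-32LE (not a concern for well-formed XML, where NUL is not allowed).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
class BomSniffer {

    // Byte signatures. UTF-32LE (FF FE 00 00) is tested before UTF-16LE (FF FE) because it shares that prefix.
    private static final int[][] SIGNATURES = {
        {0x00, 0x00, 0xFE, 0xFF}, // UTF-32BE
        {0xFF, 0xFE, 0x00, 0x00}, // UTF-32LE
        {0xEF, 0xBB, 0xBF},       // UTF-8
        {0xFE, 0xFF},             // UTF-16BE
        {0xFF, 0xFE},             // UTF-16LE
    };
    private static final String[] NAMES = {"UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"};

    // Consumes a leading BOM and returns the encoding it indicates, or null if there is none.
    // Any bytes read that are not part of the BOM are pushed back.
    static String skipBom(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        // Loop because a single read may return fewer bytes than requested.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        for (int i = 0; i < SIGNATURES.length; i++) {
            int[] sig = SIGNATURES[i];
            if (n >= sig.length && startsWith(head, sig)) {
                in.unread(head, sig.length, n - sig.length); // give back the bytes after the BOM
                return NAMES[i];
            }
        }
        in.unread(head, 0, n);
        return null;
    }

    private static boolean startsWith(byte[] head, int[] sig) {
        for (int i = 0; i < sig.length; i++) {
            if ((head[i] & 0xFF) != sig[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 4);
        System.out.println(skipBom(in)); // prints UTF-8
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints <a/>
    }
}
```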
