Getting the encoding type of an XML file in Java

北恋 2021-01-12 04:26

I am parsing XML using DocumentBuilder in Java 1.4.
The first line of the XML is

<?xml version="1.0" encoding="GBK"?>

I want to

3 answers
  • 2021-01-12 04:32

    This one works for various encodings, taking into account both the BOM and the XML declaration. It defaults to UTF-8 if neither applies:

    String encoding;
    FileReader reader = null;
    XMLStreamReader xmlStreamReader = null;
    try {
        InputSource is = new InputSource(file.toURI().toASCIIString());
        XMLInputSource xis = new XMLInputSource(is.getPublicId(), is.getSystemId(), null);
        xis.setByteStream(is.getByteStream());
        PropertyManager pm = new PropertyManager(PropertyManager.CONTEXT_READER);
        for (Field field : PropertyManager.class.getDeclaredFields()) {
            if (field.getName().equals("supportedProps")) {
                field.setAccessible(true);
                ((HashMap<String, Object>) field.get(pm)).put(
                        Constants.XERCES_PROPERTY_PREFIX + Constants.ERROR_REPORTER_PROPERTY,
                        new XMLErrorReporter());
                break;
            }
        }
        encoding = new XMLEntityManager(pm).setupCurrentEntity("[xml]".intern(), xis, false, true);
        if (!"UTF-8".equals(encoding)) {
            return encoding;
        }
    
        // From @matthias-heinrich’s answer:
        reader = new FileReader(file);
        xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(reader);
        encoding = xmlStreamReader.getCharacterEncodingScheme();
    
        if (encoding == null) {
            encoding = "UTF-8";
        }
    } catch (RuntimeException e) {
        throw e;
    } catch (Exception e) {
        throw new UndeclaredThrowableException(e);
    } finally {
        if (xmlStreamReader != null) {
            try {
                xmlStreamReader.close();
            } catch (XMLStreamException e) {
            }
        }
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
            }
        }
    }
    return encoding;
    

    Tested on Java 6 with:

    • UTF-8 XML file with BOM, with XML declaration ✓
    • UTF-8 XML file without BOM, with XML declaration ✓
    • UTF-8 XML file with BOM, without XML declaration ✓
    • UTF-8 XML file without BOM, without XML declaration ✓
    • ISO-8859-1 XML file (no BOM), with XML declaration ✓
    • UTF-16LE XML file with BOM, without XML declaration ✓
    • UTF-16BE XML file with BOM, without XML declaration ✓

    Standing on the shoulders of these giants:

    import java.io.*;
    import java.lang.reflect.*;
    import java.util.*;
    import javax.xml.stream.*;
    import org.xml.sax.*;
    import com.sun.org.apache.xerces.internal.impl.*;
    import com.sun.org.apache.xerces.internal.xni.parser.*;
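
For readers who would rather not depend on the internal com.sun.org.apache.xerces classes, the "BOM first, then XML declaration, else UTF-8" order described above can be approximated by hand. This is only a minimal sketch and not the approach used in this answer: the class name XmlEncodingSniffer and the method sniff are made up for illustration, only the UTF-8 and UTF-16 BOMs are recognised, and a document without a BOM is assumed to start with an ASCII-compatible XML declaration (so BOM-less UTF-16 or EBCDIC input is not handled).

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the input.
    private static final Pattern DECLARED = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses the encoding from the first bytes of an XML document (hypothetical helper). */
    public static String sniff(byte[] head) {
        // 1. A byte order mark takes precedence.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. No BOM: decode the first bytes as ISO-8859-1 (every byte maps to one char)
        //    and look for the declared encoding. Assumes an ASCII-compatible declaration.
        String text = new String(head, 0, Math.min(head.length, 256), StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(text);
        if (m.find()) return m.group(1);

        // 3. Neither a BOM nor a declared encoding: fall back to UTF-8.
        return "UTF-8";
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "<?xml version=\"1.0\" encoding=\"GBK\"?><root/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(sample));
    }
}
```

In practice you would pass in the first few hundred bytes of the file. A real parser covers more cases than this, so treat the result as a hint rather than a guarantee.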
    
  • 2021-01-12 04:40

    Use javax.xml.stream.XMLStreamReader to parse your file; you can then call getEncoding().
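
A minimal sketch of that idea, assuming Java 6 or later (StAX is not part of Java 1.4). The class name DeclaredEncodingDemo and the helper declaredEncoding are hypothetical. The reader is created from an InputStream rather than a Reader so that the parser sees the raw bytes, and the sketch calls getCharacterEncodingScheme(), which the XMLStreamReader documentation describes as returning the encoding declared in the XML declaration (or null if none was declared), whereas getEncoding() reports the input encoding if it is known.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DeclaredEncodingDemo {

    /** Returns the encoding named in the XML declaration, or null if none is declared. */
    public static String declaredEncoding(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            // A new reader starts in the START_DOCUMENT state, where the declaration is available.
            return reader.getCharacterEncodingScheme();
        } finally {
            // Frees the reader's resources; the underlying stream is not closed by this call.
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredEncoding(new ByteArrayInputStream(xml)));
    }
}
```

For a file on disk, a FileInputStream can be passed in the same way; the caller remains responsible for closing it.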

  • 2021-01-12 04:46

    One way to do this:

    final XMLStreamReader xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader( new FileReader( testFile ) );
    
    //running on MS Windows fileEncoding is "CP1251"
    String fileEncoding = xmlStreamReader.getEncoding(); 
    
    //the XML declares UTF-8 so encodingFromXMLDeclaration is "UTF-8"
    String encodingFromXMLDeclaration = xmlStreamReader.getCharacterEncodingScheme(); 
    