Question
I want to parse the following XML structure:
<?xml version="1.0" encoding="utf-8"?>
<documents>
<document>
<element name="title">
<value><![CDATA[Personnel changes: Müller]]></value>
</element>
</document>
</documents>
For parsing this structure I use XPath in the following way:
XPath xPath = XPathFactory.newInstance().newXPath();
String currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value",pCurrentXMLAsDOM, XPathConstants.STRING);
The parsing itself works fine, but there are problems with German umlauts and similar characters such as "Ü" or "ß". When I print out currentString, the String is:
Personnel changes: MÃ¼ller
But I want the String to look like it does in the XML:
Personnel changes: Müller
Just to add: I can't change the content of the XML file. I have to parse it as I receive it, so I definitely need to parse every String correctly.
Answer 1:
Sounds like an encoding problem. The XML is UTF-8 encoded Unicode which you seem to print encoded as ISO-8859-1. Check the encoding settings of your Java source.
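To illustrate the mismatch described here, the following is a minimal sketch (the class and method names are made up for this example, not taken from the question). It encodes a string as UTF-8 and then decodes those bytes as ISO-8859-1, which is one common way for "Müller" to turn into "MÃ¼ller". Non-ASCII characters are written as Unicode escapes so the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recovers the original bytes and decodes them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "M\u00fcller";        // "Müller", 6 characters
        String garbled = misdecode(original);   // 'ü' becomes the two characters U+00C3 U+00BC

        System.out.println(garbled.length());                  // 7
        System.out.println(repair(garbled).equals(original));  // true
    }
}
```

The extra character appears because "ü" occupies two bytes in UTF-8, and ISO-8859-1 maps every byte to exactly one character.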
Edit: See "Setting the default Java character encoding?" for how to set file.encoding.
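As an alternative to changing the default encoding for the whole JVM, the output stream can be given an explicit encoding. The sketch below is only an illustration under that assumption (the class and the printAs helper are hypothetical). It uses the standard PrintStream(OutputStream, boolean, String) constructor and shows that the same text produces different bytes depending on the chosen charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8PrintDemo {

    // Prints the text through a PrintStream that uses the given charset
    // and returns the bytes that were actually written.
    static byte[] printAs(String text, String charsetName) throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, charsetName);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "M\u00fcller"; // "Müller"

        System.out.println(printAs(text, "UTF-8").length);      // 7
        System.out.println(printAs(text, "ISO-8859-1").length); // 6

        // Wrapping System.out the same way makes the output encoding explicit.
        // Whether it displays correctly still depends on the terminal's own encoding.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("Personnel changes: " + text);
    }
}
```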
Answer 2:
I have now found a good, fast solution. First I read the file into a String, decoding it explicitly as UTF-8:
// IOUtils is presumably org.apache.commons.io.IOUtils (Apache Commons IO).
public static String convertXMLToString(File pCurrentXML) {
    InputStream is = null;
    String contents = null;
    try {
        is = new FileInputStream(pCurrentXML);
        // Decode the bytes explicitly as UTF-8 instead of relying on the platform default.
        contents = IOUtils.toString(is, "UTF-8");
    } catch (IOException e) {
        // Also covers FileNotFoundException, so a missing file no longer leads to a NullPointerException.
        e.printStackTrace();
    } finally {
        IOUtils.closeQuietly(is);
    }
    return contents;
}
Afterwards I convert the String to a DOM object:
static Document convertStringToXMLDocumentObject(String string) {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    Document document = null;
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The parser reads already-decoded characters from the StringReader,
        // so it does not have to guess the byte encoding.
        document = builder.parse(new InputSource(new StringReader(string)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
        e.printStackTrace();
    }
    return document;
}
And then I can query the DOM with XPath, for example, and all element values are decoded correctly. Demonstration:
currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value",pCurrentXMLAsDOM, XPathConstants.STRING);
System.out.println(currentString);
Output:
Personnel changes: Müller
:)
Answer 3:
If you know the file is UTF-8 encoded, try something like:
FileInputStream fis = new FileInputStream("yourfile.xml");
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
InputSource pCurrentXMLAsDOM = new InputSource(in);
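The snippet above can be expanded into a self-contained sketch. To keep it runnable without a file on disk, it assumes the XML arrives as a UTF-8 byte array (a ByteArrayInputStream stands in for the FileInputStream). The class and method names are illustrative, and the XPath expression is the one from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class Utf8XPathDemo {

    // Decodes the bytes explicitly as UTF-8 and evaluates the XPath from the question.
    static String readTitle(byte[] xmlBytes) throws XPathExpressionException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), StandardCharsets.UTF_8);
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (String) xPath.evaluate(
                "/documents/document/element[@name='title']/value",
                new InputSource(reader),
                XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        // Same structure as in the question; "\u00fc" is the escape for 'ü'.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<documents><document><element name=\"title\">"
                + "<value><![CDATA[Personnel changes: M\u00fcller]]></value>"
                + "</element></document></documents>";

        String title = readTitle(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(title.equals("Personnel changes: M\u00fcller")); // true
    }
}
```

Comparing against the expected string, rather than printing the title directly, keeps the check independent of the console encoding.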
Source: https://stackoverflow.com/questions/11861630/java-xpath-umlaut-vowel-parsing