stax

I can't write correctly words to xml using StAX

三世轮回 submitted on 2019-12-13 08:18:36
Question: I have a problem. I want to get something like this: <text> <sentence> <word>a</word> <word>had</word> <word>lamb</word> <word>little</word> <word>Mary</word> </sentence> <sentence> <word>Aesop</word> <word>and</word> <word>called</word> <word>came</word> <word>for</word> <word>Peter</word> <word>the</word> <word>wolf</word> </sentence> <sentence> <word>Cinderella</word> <word>likes</word> <word>shoes</word> </sentence> but I get only this: <text> <sentence> <word>a</word> <word>had</word>
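
The excerpt is cut off before the asker's code, so the cause of the truncated output is not visible here. As a point of comparison only, here is a minimal sketch of writing the desired structure with the standard XMLStreamWriter. The class name SentenceWriter and the method toXml are hypothetical, and the input is assumed to be a list of sentences, each a list of words. Every writeStartElement is paired with a writeEndElement, and the writer is flushed and closed before the result is read.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper for illustration; not the asker's code.
final class SentenceWriter {

    /** Serializes sentences (each a list of words) as text/sentence/word elements. */
    static String toXml(List<List<String>> sentences) {
        StringWriter sink = new StringWriter();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
            writer.writeStartElement("text");
            for (List<String> sentence : sentences) {
                writer.writeStartElement("sentence");
                for (String word : sentence) {
                    writer.writeStartElement("word");
                    writer.writeCharacters(word);
                    writer.writeEndElement(); // closes <word>
                }
                writer.writeEndElement(); // closes <sentence>
            }
            writer.writeEndElement(); // closes <text>
            // Buffered output can be lost if the writer is never flushed.
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return sink.toString();
    }
}
```

When writing to a file rather than a StringWriter, the underlying stream also has to be closed separately, because XMLStreamWriter.close() does not close it.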

Characters generated by Apache Commons StringEscapeUtils.unescapeHtml cannot be parsed using StAX

自作多情 submitted on 2019-12-13 07:34:20
Question: I am trying to parse the content of an HTML table and write it to CSV. I am trying the StAX parser. The HTML contains escaped characters like &nbsp; and &amp;. I am using org.apache.commons.lang3.StringEscapeUtils to unescape the HTML line by line and write it to a new file. StAX still fails to parse the unescaped characters. Please help me fix or handle this exception. I test with the XML fragment below: <root><element>A   B   </element></root> I call the code below to unescape the HTML: StringEscapeUtils.unescapeHtml4

How to get only elements with values using StAX

爱⌒轻易说出口 submitted on 2019-12-13 02:19:36
Question: I'm trying to get only the elements that have text. Example XML: <root> <Item> <ItemID>4504216603</ItemID> <ListingDetails> <StartTime>10:00:10.000Z</StartTime> <EndTime>10:00:30.000Z</EndTime> <ViewItemURL>http://url</ViewItemURL> .... </item> It should print: Element Local Name:ItemID Text:4504216603 Element Local Name:StartTime Text:10:00:10.000Z Element Local Name:EndTime Text:10:00:30.000Z Element Local Name:ViewItemURL Text:http://url This code also prints root, item, etc. Is it even possible, it
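
One way to approach this, sketched under assumptions since the asker's code is not shown: remember the most recently opened element, collect the character data that follows it, and report the element only if that text is non-blank when its end tag arrives. Container elements such as root and Item are skipped because opening a child resets the collected text. The class name TextElements and the "name=text" output format are hypothetical, and text that follows a child element (mixed content) is deliberately ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not the asker's code.
final class TextElements {

    /** Returns "name=text" for each element that directly contains non-blank text. */
    static List<String> withText(String xml) {
        List<String> result = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver adjacent text as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            String current = null;            // element opened most recently
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        current = reader.getLocalName();
                        text.setLength(0);    // a child element discards the parent's whitespace
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (current != null) {
                            text.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String value = text.toString().trim();
                        if (current != null && !value.isEmpty()) {
                            result.add(current + "=" + value);
                        }
                        current = null;       // ignore text until the next start tag
                        break;
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }
}
```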

Java - Read XML and leave all entities alone

左心房为你撑大大i submitted on 2019-12-13 02:11:20
Question: I want to read XHTML files using SAX or StAX, whatever works best. But I don't want entities to be resolved, replaced or anything like that. Ideally they should just remain as they are. I don't want to use DTDs. Here's an (executable, using Scala 2.8.x) example: import javax.xml.stream._ import javax.xml.stream.events._ import java.io._ println("StAX Test - "+args(0)+"\n") val factory = XMLInputFactory.newInstance factory.setProperty(XMLInputFactory.SUPPORT_DTD, false) factory.setProperty

How to get XML element path using stax/stax2?

删除回忆录丶 submitted on 2019-12-12 04:25:52
Question: I want to get the element path while parsing XML using the Java StAX2 parser. How do I get information about the current element path? <root> <a><b>x</b></a> </root> In this example the path is /root/a/b. Answer 1: Keep a stack. Push the element name on START_ELEMENT and pop it on END_ELEMENT. Here's a short example. It does nothing other than print the path of the element being processed. public static void main(String[] args) throws IOException, XMLStreamException { try (FileInputStream in = new
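
The answer's own example is cut off above, so the following is a separate minimal sketch of the same stack idea, not a reconstruction of it. It uses only the standard javax.xml.stream API and reads from a string instead of a file. The class name ElementPaths and the method paths are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not the code from the answer.
final class ElementPaths {

    /** Returns the path of every element, in document order. */
    static List<String> paths(String xml) {
        Deque<String> open = new ArrayDeque<>(); // names of currently open elements
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    open.addLast(reader.getLocalName());            // push on start tag
                    result.add("/" + String.join("/", open));
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.removeLast();                              // pop on end tag
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }
}
```

For the document in the question, the last path produced is /root/a/b.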

XmlStreamReader not reading complete text value

泪湿孤枕 submitted on 2019-12-12 03:39:55
Question: It seems this question has come up before, as I see in "Reading escape characters with XMLStreamReader", but the issue I am seeing here is a little different. I am reading a pretty big XML file which contains a large snippet of malformed HTML as one of the tag values. The values are enclosed in CDATA and normally they do not cause any issue. But intermittently, the getText method of the XMLStreamReader class reads only half of the text in this CDATA and the first character in the next batch is as an
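
A StAX parser is allowed to report one run of text as several consecutive events, so a single getText call is not guaranteed to return the whole value. Whether that is what happens in this asker's case cannot be confirmed from the excerpt. The sketch below shows two standard ways to get the complete value: enabling XMLInputFactory.IS_COALESCING, and calling getElementText, which concatenates everything up to the end tag of a text-only element. The class name WholeText and the method first are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not the asker's code.
final class WholeText {

    /** Returns the full text of the first element with the given local name, or null. */
    static String first(String xml, String localName) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data (including CDATA) into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        // Reads up to the matching end tag; throws if a child element appears.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```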

merge multiple XML files using STAX parser

别来无恙 submitted on 2019-12-11 22:15:36
Question: I have multiple XML files. All nodes are similar. Please provide an example of how to merge XML files using the StAX parser and apply a stylesheet to the result. Answer 1: If you want to apply XSLT to several XML documents then (with pure XSLT, I don't know about StAX) you can simply use the document function (XSLT 1.0 and 2.0) or the collection function (with XSLT 2.0), e.g. <xsl:template match="/"> <root> <xsl:apply-templates select="document('file1.xml')/* | document('file2.xml')/* | document('file3.xml')/*"/>

Avoid namespace while Parsing xml with woodstox

我是研究僧i submitted on 2019-12-11 17:06:42
Question: I am trying to parse an XML file and remove namespaces and prefixes using the Woodstox parser (the XML contains nested elements, and each element carries a namespace at every level). Below is the code I use to parse. The output I get is the same as the input I pass. Please help in resolving the issue. byte[] byteArray = null; try { File file = new File(xmlFileName); byteArray = new byte[(int) file.length()]; byteArray = FileUtils.readFileToByteArray(file); } catch (Exception e) { e.printStackTrace(); } InputStream

IllegalStateException while parsing the XML?

纵然是瞬间 submitted on 2019-12-11 09:32:53
Question: While parsing XML, if the XML has one parent tag then it works fine; if it has multiple parent tags then it throws the following exception. java.lang.IllegalStateException: Current state END_ELEMENT is not among the statesCHARACTERS, COMMENT, CDATA, SPACE, ENTITY_REFERENCE, DTD valid for getText() at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getText(Unknown Source) at com.axxonet.queue.xmlParserValues.parse(xmlParserValues.java:37) at com.axxonet.queue
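
The message says getText() was called while the reader was positioned on an END_ELEMENT event, which carries no text. The asker's parse method is not shown, so the following is only a sketch of one common guard: check the event type returned by next() and call getText() only for character data. The class name GuardedText and the method values are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not the asker's xmlParserValues class.
final class GuardedText {

    /** Collects every non-blank text value, whatever the nesting depth. */
    static List<String> values(String xml) {
        List<String> result = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                // getText() is only valid on text-bearing events, never on START/END_ELEMENT.
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    String value = reader.getText().trim();
                    if (!value.isEmpty()) {
                        result.add(value);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }
}
```

Code that calls next() right after a start tag and then getText() unconditionally works for <a>x</a> but fails as soon as an element is empty or contains a child element, which would match the "multiple parent tags" symptom described.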

Parsing an xml response from a url using STAX

时光总嘲笑我的痴心妄想 submitted on 2019-12-11 07:34:37
Question: I am sending an XML request from Java code and getting the XML response using the code below: BufferedReader rd = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8")); StringBuilder sb = new StringBuilder(); String line = null; while ((line = rd.readLine()) != null) { sb.append(line + '\n'); } Now I need to parse the XML response using StAX, so I have written a method for parsing: XMLReader reader = new XMLReader(); ArrayList<String> List = reader.parse(connection