Parsing XML with no closing tags in Java

故事扮演, submitted on 2019-12-13 08:27:49

Question


I am having trouble parsing an XML file that has no closing tags. Please see the snippet of the XML below.

I have tried the SAX and StAX parsers; both need well-formed XML with closing tags, such as <Name>XXYY</Name>. As you can see below, this format is a little different. Is there an API that can parse this, or a way to make SAX/StAX do what I want?

<Employees>
 <Employee>
  <Detail>
    <Date>2018014
    <Name>XXYY
    <Age>0
    <LANGUAGE>ENG
    <Manager>
    <MName>YYXX
    <MID>5959
    </Manager>
    <EmployeeID>1234
  </Detail>
 </Employee>
</Employees>

Answer 1:


You could "fix" the XML by adding all the missing end-tags.

Any start-tag that contains text after the tag, on the same line, could be fixed by adding an end-tag at the end of the line.

The "contains text" condition ensures that a container tag such as <Manager> is left alone, since its end-tag already appears 3 lines further down.
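
The rule can also be spelled out line by line, which makes the "contains text" condition explicit. This is only a minimal sketch: the class EndTagFixer and its methods are hypothetical names, it assumes Java 11+ for String.lines(), and it assumes tag names are plain word characters with no attributes, as in the sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the rule; not part of any library.
final class EndTagFixer {
    // Optional indentation, one start-tag, then text that has no '<'
    // and contains at least one non-whitespace character.
    private static final Pattern OPEN_WITH_TEXT =
            Pattern.compile("^(\\s*)<(\\w+)>([^<]*\\S[^<]*)$");

    /** Appends the matching end-tag to a line of the form "<Tag>text". */
    static String fixLine(String line) {
        Matcher m = OPEN_WITH_TEXT.matcher(line);
        return m.matches() ? line + "</" + m.group(2) + ">" : line;
    }

    /** Applies fixLine to every line and rejoins the lines with '\n'. */
    static String fix(String xml) {
        return xml.lines().map(EndTagFixer::fixLine).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(fixLine("    <Name>XXYY"));  // end-tag appended
        System.out.println(fixLine("    <Manager>"));   // unchanged: no text after the tag
        System.out.println(fixLine("    </Manager>"));  // unchanged: already an end-tag
    }
}
```

Requiring a non-whitespace character means a container tag followed only by trailing spaces is not closed by mistake.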

Example working code:

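import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Load file into memory (these statements go inside a method declared with "throws Exception")
String xml = new String(Files.readAllBytes(Paths.get("test.xml")), StandardCharsets.UTF_8);

// Add the missing end-tags: any line of the form <Tag>text becomes <Tag>text</Tag>
xml = xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<]+)$", "$1<$2>$3</$2>");

// Parse then print the XML, to ensure there are no errors
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                                          .parse(new InputSource(new StringReader(xml)));
TransformerFactory.newInstance().newTransformer()
                  .transform(new DOMSource(document), new StreamResult(System.out));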
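
Once the end-tags are in place, the document can be read with any standard parser. As a rough sketch, the following applies the same regex to the sample from the question, held in memory, and reads a few values through the DOM API. The class RepairedXmlDemo and its helper methods are hypothetical names used for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class RepairedXmlDemo {
    // The sample from the question, with end-tags missing on the text lines.
    static final String SAMPLE = String.join("\n",
            "<Employees>",
            " <Employee>",
            "  <Detail>",
            "    <Date>2018014",
            "    <Name>XXYY",
            "    <Age>0",
            "    <LANGUAGE>ENG",
            "    <Manager>",
            "    <MName>YYXX",
            "    <MID>5959",
            "    </Manager>",
            "    <EmployeeID>1234",
            "  </Detail>",
            " </Employee>",
            "</Employees>");

    /** Applies the regex from the answer, then parses the result into a DOM tree. */
    static Document repairAndParse(String brokenXml) throws Exception {
        String fixed = brokenXml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<]+)$", "$1<$2>$3</$2>");
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(fixed)));
    }

    /** Returns the text of the first element with the given tag name. */
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = repairAndParse(SAMPLE);
        System.out.println(text(doc, "Name"));        // XXYY
        System.out.println(text(doc, "MName"));       // YYXX
        System.out.println(text(doc, "EmployeeID"));  // 1234
    }
}
```

The same repaired string could be fed to a SAX or StAX parser instead; DOM is used here only because it keeps the lookup short.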



Answer 2:


That appears to be SGML, not XML. I've answered a newer question (for JavaScript/Node.js, but relevant to Java as well) detailing how to use the OpenSP SGML software to create XML from SGML.



Source: https://stackoverflow.com/questions/48129553/parsing-xml-with-no-closing-tags-in-java