Problems getting XML node text in StAX XMLStreamConstants.CHARACTERS event

Submitted by 折月煮酒 on 2021-02-07 03:48:44

Question


While reading an XML file using StAX and XMLStreamReader, I encountered a weird problem. I am not sure whether it's a bug or whether I am doing something wrong; I am still learning StAX.

The problem is this:

  1. In the XMLStreamConstants.CHARACTERS event, I collect the node text with the XMLStreamReader.getText() method.
  2. If the node text contains an escaped &, < or > (written in the XML as &amp;, &lt; or &gt;), or even something hidden, getText() returns only the first part of the text string. For example, ABC &amp; XYZ returns only ABC.

Simplified Java Source:

    // Start StaX reader
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    try {
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(inStream);
        int event = xmlStreamReader.getEventType();
        while (true) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    switch (xmlStreamReader.getLocalName()) {
                        case "group":
                            // Do something
                            break;
                        case "source":
                            isSource = true;
                            break;
                        case "target":
                            isTrans = true; // same flag that the default case and the CHARACTERS case use
                            break;
                        default:
                            isSource = false;
                            isTrans = false;
                            break;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (srcData != null) {
                        String srcTrns = xmlStreamReader.getText();
                        if (srcTrns != null) {
                            if (isSource) {
                                // Set source text
                                isSource = false;
                            } else if (isTrans) {
                                // Set target text
                                isTrans = false;
                            }
                        }
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (xmlStreamReader.getLocalName().equals("group")) {
                        // Add to return list
                    }
                    break;
            }
            if (!xmlStreamReader.hasNext()) {
                break;
            }
            event = xmlStreamReader.next();
        }
    } catch (XMLStreamException ex) {
        // Pass the exception itself so that its cause and stack trace are logged
        LOG.log(Level.WARNING, ex.getMessage(), ex);
    }

I am not quite sure what exactly I am doing wrong or how to collect the complete text of the node.

Any suggestions or tips would be a great help as I keep learning StAX. :-)


Answer 1:


I have solved the problem after struggling and researching a bit.

The problem was in reading text that contains escaped entity references. You need to set the IS_COALESCING property to true on the XMLInputFactory instance, before creating the reader:

    xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);

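Without this property, the parser is allowed to report the text of a single node as several consecutive CHARACTERS events, typically split at each entity reference, so one getText() call sees only the first piece. IS_COALESCING tells the parser to merge adjacent character data into a single CHARACTERS event. (Replacing internal entity references with their replacement text is controlled by a separate property, IS_REPLACING_ENTITY_REFERENCES.)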
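
To see the effect in isolation, here is a minimal sketch, assuming the StAX implementation bundled with the JDK. The class name CoalescingDemo, its helper methods, and the sample XML are made up for illustration and are not part of the original question. It reads the first CHARACTERS event after the source start tag with IS_COALESCING on and off, and also shows XMLStreamReader.getElementText() as an alternative that gathers the whole text of a text-only element regardless of how the parser splits it.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names and sample XML are illustrative only.
class CoalescingDemo {

    // Returns the text of the first CHARACTERS event reported after the <source> start tag.
    static String firstSourceChunk(String xml, boolean coalescing) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The property is set on the factory instance, before the reader is created.
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            boolean inSource = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "source".equals(reader.getLocalName())) {
                    inSource = true;
                } else if (event == XMLStreamConstants.CHARACTERS && inSource) {
                    return reader.getText();
                }
            }
            return null; // no text found after <source>
        } finally {
            reader.close();
        }
    }

    // Alternative: let getElementText() collect the full text of a text-only element.
    static String sourceElementText(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "source".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null; // no <source> element found
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<group><source>ABC &amp; XYZ</source><target>DEF</target></group>";
        System.out.println("coalescing on:  " + firstSourceChunk(xml, true));
        // With coalescing off the result depends on where the parser splits the text.
        System.out.println("coalescing off: " + firstSourceChunk(xml, false));
        System.out.println("getElementText: " + sourceElementText(xml));
    }
}
```

With coalescing on, the first chunk should be the complete string ABC & XYZ. With it off, the parser decides where to split, so the first chunk may be only the part before the entity reference. If coalescing is not an option, appending every CHARACTERS chunk to a StringBuilder until the matching END_ELEMENT is another way to get the full text.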



Source: https://stackoverflow.com/questions/22781292/problems-getting-xml-node-text-in-stax-xmlstreamconstants-characters-event
