How to parse XML with colons in some tags?

人走茶凉 submitted on 2019-12-13 18:02:35

Question


I've been reading some tutorials on XmlPullParser in Android on how to parse XML data. To be more specific, I'm using the XML from https://gdata.youtube.com/feeds/api/standardfeeds/top_rated

Here is a simplified part of an entry from this feed (I hope I haven't altered the structure):

<entry>
<id>http://gdata.youtube.com/feeds/api/videos/abc45678qwe</id>
[...]
<title type='text'>THE TITLE</title>
[...]
<link rel='alternate' type='text/html' href='https://www.youtube.com/watch?v=abc45678qwe&amp;feature=youtube_gdata'/>
[...]
<media:group>
[...]
<media:title type='plain'>THE TITLE</media:title>
<yt:duration seconds='300'/>
[...]
<yt:videoid>abc45678qwe</yt:videoid>
</media:group>
<gd:rating average='1' max='5' min='1' numRaters='1' rel='http://schemas.google.com/g/2005#overall'/>
<yt:statistics favoriteCount='0' viewCount='11111111'/>
<yt:rating numDislikes='111' numLikes='111'/>
</entry>

I successfully get the title and the link with:

private String[] readEntry(XmlPullParser parser)
        throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, null, "entry");
    String title = null;
    String link = null;

    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }

        String name = parser.getName();
        String rel = parser.getAttributeValue(null, "rel");

        if (name.equalsIgnoreCase("title")) {
            title = readTitle(parser);
        } else if (name.equalsIgnoreCase("link")
                && "alternate".equals(rel)) { // null-safe if a link has no rel attribute
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new String[] { title, link };
}

private String readLink(XmlPullParser parser)
        throws XmlPullParserException, IOException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, null, "link");

    link = parser.getAttributeValue(null, "href");
    parser.nextTag();

    parser.require(XmlPullParser.END_TAG, null, "link");

    return link;
}

private String readTitle(XmlPullParser parser)
        throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, null, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, null, "title");
    return title;
}

But no matter what I try, I'm not able to get the duration in seconds from <yt:duration seconds='300'/>.

It seems it can't be accessed with methods like the ones above. I suspect namespace handling is required, but I'm not sure. Since I'm a bit lost here, any suggestion is much appreciated. Thanks.

====

Edit: here is what I tried in order to reach the yt:duration tag.

I added other checks before skip(parser);, for example:

} else if (name.equalsIgnoreCase("yt:")) {
    Utils.logger("i", "entering yt:", TAG);
    readDuration(parser);
}

and I replaced "yt:" with "yt" or "yt:duration", with no result.
I also tried

String namespace = parser.getNamespace();

and replacing name.equalsIgnoreCase(...) with namespace.equalsIgnoreCase(...), but I never get the log entry, so I didn't even get a chance to try this:

private String readDuration(XmlPullParser parser)
        throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, "yt", "duration");

    String seconds = parser.getAttributeValue(null, "seconds");
    parser.nextTag();

    parser.require(XmlPullParser.END_TAG, "yt", "duration");

    Utils.logger("i", "duration: " + seconds + " seconds", TAG);
    return seconds;
}

Added on request; I'm not sure it's useful enough.
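One likely reason the log line never appears: in the sample entry, yt:duration is not a direct child of entry but is nested inside media:group, and readEntry calls skip(parser) on media:group. Assuming skip() is the helper from the Android XML training code, which consumes an element together with everything inside it, the parser never stops on yt:duration at all. The sketch below illustrates the idea with the JDK's StAX pull parser (javax.xml.stream.XMLStreamReader), swapped in for Android's XmlPullParser so it runs on a plain JVM. It enters media:group instead of skipping it and matches yt:duration by namespace URI and local name rather than by the prefix. The namespace URIs are assumptions based on the usual GData declarations (the simplified entry omits them), and DurationReader/findDurationSeconds are illustrative names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DurationReader {
    // Assumed namespace URIs; verify them against the xmlns declarations of the real feed.
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";
    static final String YT_NS = "http://gdata.youtube.com/schemas/2007";

    /** Returns the seconds attribute of the first yt:duration inside media:group, or null. */
    static String findDurationSeconds(String xml) throws XMLStreamException {
        // StAX readers are namespace aware by default.
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        boolean inGroup = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (isElement(reader, MEDIA_NS, "group")) {
                        inGroup = true; // descend into media:group instead of skipping it
                    } else if (inGroup && isElement(reader, YT_NS, "duration")) {
                        // A null namespace means the unprefixed attribute is matched by local name only.
                        return reader.getAttributeValue(null, "seconds");
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && isElement(reader, MEDIA_NS, "group")) {
                    inGroup = false;
                }
            }
        } finally {
            reader.close();
        }
        return null;
    }

    /** True if the current element has the given namespace URI and local name; the prefix is ignored. */
    static boolean isElement(XMLStreamReader reader, String ns, String localName) {
        return ns.equals(reader.getNamespaceURI()) && localName.equals(reader.getLocalName());
    }

    public static void main(String[] args) throws XMLStreamException {
        // Simplified entry, plus the namespace declarations the sample in the question leaves out.
        String xml = "<entry xmlns='http://www.w3.org/2005/Atom'"
                + " xmlns:media='http://search.yahoo.com/mrss/'"
                + " xmlns:yt='http://gdata.youtube.com/schemas/2007'>"
                + "<title type='text'>THE TITLE</title>"
                + "<media:group>"
                + "<media:title type='plain'>THE TITLE</media:title>"
                + "<yt:duration seconds='300'/>"
                + "</media:group>"
                + "</entry>";
        System.out.println(findDurationSeconds(xml)); // prints 300
    }
}
```

On Android, the same idea would mean handling media:group in readEntry with its own loop over its children instead of passing it to skip().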


Answer 1:


XmlPullParser can be namespace aware, but this has to be enabled explicitly. Per the documentation of XmlPullParserFactory#setNamespaceAware:

Specifies that the parser produced by this factory will provide support for XML namespaces. By default the value of this is set to false.

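You might want to try that option. With namespace support enabled, getName() returns only the local name (duration for yt:duration), getPrefix() returns yt, and getNamespace() returns the namespace URI that the feed binds to yt. So a call like parser.require(XmlPullParser.START_TAG, ns, "duration") needs that URI as ns, not the string "yt".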
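The JDK's javax.xml.parsers.DocumentBuilderFactory has an analogous setNamespaceAware switch, also off by default, so it can show what the option changes without an Android device. This is a stand-in for XmlPullParserFactory, not the Android API itself, and the namespace URI in the sample is an assumption. With namespace support off, the element is known only by its qualified name yt:duration. With it on, the parser also reports the local name duration and the URI bound to yt, which is what a prefix-independent comparison should use. NamespaceAwareDemo and parseDuration are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NamespaceAwareDemo {
    // Assumed namespace declaration for the yt: prefix.
    static final String XML = "<entry xmlns:yt='http://gdata.youtube.com/schemas/2007'>"
            + "<yt:duration seconds='300'/></entry>";

    /** Parses XML and returns the root's first child element, i.e. yt:duration. */
    static Element parseDuration(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // false is the default
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        // No whitespace between the tags, so the first child is the element itself.
        return (Element) doc.getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        for (boolean aware : new boolean[] {false, true}) {
            Element e = parseDuration(aware);
            System.out.println("namespaceAware=" + aware
                    + " nodeName=" + e.getNodeName()
                    + " localName=" + e.getLocalName()
                    + " namespaceURI=" + e.getNamespaceURI()
                    + " seconds=" + e.getAttribute("seconds"));
        }
    }
}
```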

Also, as mentioned in the comments, I traversed your XML with DOM without any issues. Below is the source code that prints all the duration values (note that it is meant to run as a standalone Java program, not within the ADT):

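import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DurationPrinter {

    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, IOException {
        // Download the feed and parse it into a DOM tree; the stream is closed afterwards
        try (InputStream path = new URL(
                "https://gdata.youtube.com/feeds/api/standardfeeds/top_rated")
                .openStream()) {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(path);
            traverse(document.getDocumentElement());
        }
    }

    // Visit every node depth-first and print the seconds attribute of each yt:duration
    public static void traverse(Node node) {
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            traverse(list.item(i));
        }

        // getNodeName() is the qualified name, so this relies on the feed using the yt prefix
        if (node.getNodeName().equals("yt:duration")) {
            Element durationElement = (Element) node;
            System.out.println(durationElement.getAttribute("seconds"));
        }
    }
}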

Output I get:

56
361
225
265
219
220
259
267
376
205
127
308
249
17
162
220
183
298
172
267
204
209

I generally prefer recursion with DOM (as above), since it makes a full traversal simple and keeps the code flexible.
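The recursive traversal above matches the literal qualified name yt:duration, so it depends on the feed always using the yt prefix. One possible alternative, sketched here under the assumption that yt is bound to the URI shown, is a namespace-aware DOM parse with getElementsByTagNameNS, which matches by namespace URI and local name whatever the prefix is. It parses an inline string rather than fetching the feed URL. DurationLookup and durations are illustrative names, and the sample values are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DurationLookup {
    // Assumed URI bound to the yt: prefix; check the feed's root element.
    static final String YT_NS = "http://gdata.youtube.com/schemas/2007";

    /** Collects the seconds attribute of every duration element in the yt namespace, in document order. */
    static List<String> durations(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, getElementsByTagNameNS finds nothing
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagNameNS(YT_NS, "duration");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(((Element) nodes.item(i)).getAttribute("seconds"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The second entry binds the same namespace to a different prefix.
        String xml = "<feed xmlns:yt='" + YT_NS + "'>"
                + "<entry><yt:duration seconds='300'/></entry>"
                + "<entry xmlns:x='" + YT_NS + "'><x:duration seconds='120'/></entry>"
                + "</feed>";
        System.out.println(durations(xml)); // prints [300, 120]
    }
}
```

The recursive version would skip the second element because its qualified name is x:duration. The lookup by namespace still finds it.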

If you want to know more about grouping these elements together, you can refer to my post here as well.



Source: https://stackoverflow.com/questions/21785054/how-to-parse-an-xml-with-colons-in-some-tags
