Parsing the CDATA section in XML using XML Pull Parser

Submitted by 本小妞迷上赌 on 2019-12-13 08:03:06

Question


Sample XML

<feed xmlns="http://www.w3.org/2005/Atom">
    <title>NDTV News - Top Stories</title>
    <link>http://www.ndtv.com/</link>
    <description>Latest entries</description>
    <language>en</language>
    <pubDate>Wed, 31 Jul 2013 22:33:00 GMT</pubDate>
    <lastBuildDate>Wed, 31 Jul 2013 22:33:00 GMT</lastBuildDate>
    <entry>
    <title>Narendra Modi to be BJP's PM candidate, announcement before crucial assembly polls: sources</title>
    <link>http://feedproxy.google.com/~r/NdtvNews-TopStories/~3/XN7dMIDe5YI/story01.htm</link>
    <published>Wed, 31 Jul 2013 13:58:31 GMT</published>
    <author>
    <name>user42715</name>
    </author>
    <content type="html"><![CDATA[<div align="center"><a href="http://www.ndtv.com/news/images/topstory_thumbnail/  Shatrughan_Sinha_agency_120.jpg"><img border="0" src="http://www.ndtv.com/news/images/topstory_thumbnail/Shatrughan_Sinha_agency_120.jpg" alt="2013-07-29-08-43-05" /></a></div><p><span style="font-size: large;">The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.</span><br /><br /><span style="font-size: large;"> The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September. </span><br /><br /><span style="font-size: large;">The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.   </span><br /><br /></p>]]></content>
   </entry>
</feed>

With the code below I was able to retrieve the <title>, <published> and <author> values within the <entry> tag.

        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        // Local variables cannot carry the "private" modifier
        XmlPullParser parser = factory.newPullParser();
        InputStream urlStream = downloadUrl(urlString);
        parser.setInput(urlStream, null);
        int eventType = parser.getEventType();
        boolean done = false;

        while (eventType != XmlPullParser.END_DOCUMENT && !done) {
            tagName = parser.getName();

            switch (eventType) {
            case XmlPullParser.START_DOCUMENT:                  
                break;
            case XmlPullParser.START_TAG:
                if (tagName.equals("entry")) {                      
                }
                if (tagName.equals("title")) {
                    title = parser.nextText().toString();
                    Log.i(TITLE, title);
                }
                if (tagName.equals("published")) {
                    pubDate = parser.nextText().toString();
                    Log.i(PUBLISHEDDATE, pubDate);
                }

                if (tagName.equals("author")) {
                    readAuthor(parser);
                    Log.i(AUTHOR, author);
                }

                break;
            case XmlPullParser.END_TAG:
                if (tagName.equals("feed")) {
                    done = true;
                } else if (tagName.equals("entry")) {

                    rssFeed = new RssFeedStructure(title);
                    rssFeedList.add(rssFeed);
                }
                break;
            }
            eventType = parser.next();
        }

        private String readAuthor(XmlPullParser parser) throws IOException,
            XmlPullParserException {
            parser.nextTag();
            parser.require(XmlPullParser.START_TAG, null, "name");
            author = parser.nextText().toString();
            parser.require(XmlPullParser.END_TAG, null, "name");
            return author;
        }

From the <content> tag, how can I retrieve the "href" value of the <a> tag and the text value (The BJP is likely to anoint Narendra Modi.....) from the <span> tag?


Answer 1:


You can use jsoup to parse the HTML inside the CDATA section. Download it from http://jsoup.org/download and add the jar to the libs folder.

To test the parser, I copied the RSS feed to an XML file in the assets folder (locally).

XmlPullParser xpp = factory.newPullParser();
InputStream is = this.getAssets().open("xmlparser.xml");
xpp.setInput(is, "UTF-8");

You can use the code below since you have the URL. I have shown how to extract the link URL and the content; extract the contents of the other tags as you normally would.

// Requires org.jsoup.Jsoup, org.jsoup.nodes.Document,
// org.jsoup.nodes.Element and org.jsoup.select.Elements.
try {
    XmlPullParser xpp = factory.newPullParser();
    xpp.setInput(urlStream, null);

    boolean insideItem = false;

    // Returns the type of the current event: START_TAG, END_TAG, etc.
    int eventType = xpp.getEventType();
    while (eventType != XmlPullParser.END_DOCUMENT) {
        if (eventType == XmlPullParser.START_TAG) {
            if (xpp.getName().equalsIgnoreCase("entry")) {
                insideItem = true;
            } else if (xpp.getName().equalsIgnoreCase("content")) {
                if (insideItem) {
                    // nextText() returns the CDATA payload as a plain string,
                    // which jsoup then parses as HTML.
                    Document doc = Jsoup.parse(xpp.nextText());

                    Elements links = doc.select("a[href]"); // <a> elements with an href
                    for (Element link : links) {
                        Log.i("........", link.attr("abs:href"));
                    }

                    // first() returns null when the HTML contains no <span>
                    Element divcontent = doc.select("span").first();
                    if (divcontent != null) {
                        Log.i("..........", divcontent.text());
                    }
                }
            }
        } else if (eventType == XmlPullParser.END_TAG
                && xpp.getName().equalsIgnoreCase("entry")) {
            insideItem = false;
        }

        eventType = xpp.next(); // move to the next event
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (XmlPullParserException e1) {
    e1.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

Log:

08-03 08:03:04.413: I/........(1524): http://www.ndtv.com/news/images/topstory_thumbnail/   Shatrughan_Sinha_agency_120.jpg
08-03 08:03:04.423: I/..........(1524): The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.
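
The approach works because a pull parser hands back a CDATA section as ordinary element text, without the <![CDATA[ ... ]]> wrapper. The sketch below illustrates that point on a plain JVM. It is not the Android code from this answer: it swaps in the JDK's StAX reader (javax.xml.stream) as a stand-in for XmlPullParser, and the class name CdataTextSketch, the method firstContentText and the shortened feed string are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; StAX is used here instead of XmlPullParser.
class CdataTextSketch {

    // Returns the text of the first <content> element, or null if there is none.
    static String firstContentText(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "content".equals(reader.getLocalName())) {
                    // getElementText() returns the CDATA payload as a plain
                    // string, much like nextText() does in XmlPullParser.
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, made-up feed in the same shape as the sample XML.
        String xml = "<feed xmlns=\"http://www.w3.org/2005/Atom\"><entry>"
                + "<content type=\"html\"><![CDATA[<p><span>Hello</span></p>]]></content>"
                + "</entry></feed>";
        System.out.println(firstContentText(xml)); // prints <p><span>Hello</span></p>
    }
}
```

The returned string is raw HTML, which is why a second, HTML-aware step such as jsoup is still needed to get at the href and the span text.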

Edit: To loop through the span elements:

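Elements divcontent = doc.select("span");
// k starts at 1, which skips the first span (the one logged above);
// start at 0 to include it as well.
for (int k = 1; k < divcontent.size(); k++) {
    String spancontent = divcontent.get(k).text();
    Log.i("..........", spancontent);
}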
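
For quick experiments on a plain JVM where the jsoup jar is not on the classpath, the same two extractions (href values and span texts) can be roughly approximated with java.util.regex. This is a swapped-in technique, not what the answer uses: regular expressions are much more fragile than a real HTML parser, handle only double-quoted attributes and non-nested spans, and do not decode entities. The class name HtmlFragmentSketch, its methods and the sample HTML are assumptions made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, dependency-free stand-in for the jsoup calls above.
class HtmlFragmentSketch {

    // Matches the double-quoted href attribute of an <a> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*\\bhref=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Matches the content of a (non-nested) <span> element.
    private static final Pattern SPAN = Pattern.compile(
            "<span\\b[^>]*>(.*?)</span>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> hrefs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    static List<String> spanTexts(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = SPAN.matcher(html);
        while (m.find()) {
            // Drop any tags nested inside the span and trim the whitespace.
            result.add(m.group(1).replaceAll("<[^>]+>", "").trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up fragment in the same shape as the CDATA content.
        String html = "<div><a href=\"http://example.com/a.jpg\"><img src=\"x.jpg\" /></a></div>"
                + "<p><span style=\"font-size: large;\">First.</span><br />"
                + "<span> Second. </span></p>";
        System.out.println(hrefs(html));     // prints [http://example.com/a.jpg]
        System.out.println(spanTexts(html)); // prints [First., Second.]
    }
}
```

For real feeds, jsoup remains the more robust choice, since it copes with malformed markup, single-quoted attributes and nested elements.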


Source: https://stackoverflow.com/questions/18030164/parsing-the-cdata-section-in-xml-using-xml-pull-parser
