KXmlParser throws “Unexpected token” exception at the start of RSS parsing

Submitted by 血红的双手。 on 2019-11-26 18:17:06

Question


I'm trying to parse an RSS feed from Monster on Android (API level 17) using this URL:

http://rss.jobsearch.monster.com/rssquery.ashx?q=java

To get the content I'm using HttpURLConnection in the following fashion:

this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(url.openStream());

What comes back is, as far as I can tell (and I verified it too), a legitimate RSS response. The response headers are:

Cache-Control:private
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:5958
Content-Type:text/xml
Date:Wed, 06 Mar 2013 17:15:20 GMT
P3P:CP=CAO DSP COR CURa ADMa DEVa IVAo IVDo CONo HISa TELo PSAo PSDo DELa PUBi BUS LEG PHY ONL UNI PUR COM NAV INT DEM CNT STA HEA PRE GOV OTC
Server:Microsoft-IIS/7.5
Vary:Accept-Encoding
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET

It starts like this (click the URL above if you want to see complete XML):

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Monster Job Search Results java</title>
    <description>RSS Feed for Monster Job Search</description>
    <link>http://rss.jobsearch.monster.com/rssquery.ashx?q=java</link>

But when I attempt to parse it:

final XmlPullParser xpp = getPullParser();
xpp.setInput(is);
for (int type = xpp.getEventType(); type != XmlPullParser.END_DOCUMENT; type = xpp.next()) { /* parsing goes here */ }

The code immediately chokes on type = xpp.next() with the following exception:

03-06 09:27:27.796: E/AbsXmlResultParser(13363): org.xmlpull.v1.XmlPullParserException: 
   Unexpected token (position:TEXT @1:2 in java.io.InputStreamReader@414b4538) 

Which actually means it cannot process the 2nd character of line 1, <?xml version="1.0" encoding="utf-8"?>

Here are the offending lines in KXmlParser.java (425-426). The type == TEXT check evaluates to true:

if (depth == 0 && (type == ENTITY_REF || type == TEXT || type == CDSECT)) {
    throw new XmlPullParserException("Unexpected token", this, null);
}

Any help? I tried setting XmlPullParser.FEATURE_PROCESS_DOCDECL to false on the parser, but that didn't help.

I researched this on the web and here, and couldn't find anything that helps.


Answer 1:


The reason you are getting the error is that the XML file doesn't actually start with <?xml version="1.0" encoding="utf-8"?>. It starts with three special bytes, EF BB BF, which are the UTF-8 byte order mark (BOM).
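
To see why those three bytes end up as a TEXT token, here is a minimal sketch in plain Java (not Android-specific). The class and method names are made up for illustration. It assumes only that the stream is decoded as UTF-8, in which case the BOM comes out of the reader as the character U+FEFF ahead of the XML declaration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
class BomDecodeDemo {
    // Returns the first character a UTF-8 InputStreamReader produces for the given bytes.
    static int firstChar(byte[] bytes) throws IOException {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return reader.read();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                .getBytes(StandardCharsets.UTF_8);
        // Prepend the UTF-8 BOM (EF BB BF), as the feed does.
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);

        // The reader does not drop the BOM; it decodes it to U+FEFF.
        System.out.printf("first char: U+%04X%n", firstChar(withBom));
    }
}
```

So the parser is handed a character that is not part of the markup before it ever reaches the `<`, which is consistent with the error position 1:2.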

InputStreamReader doesn't strip these bytes automatically, so the parser sees them as text before the root element and you have to handle them yourself. The simplest way to do it is to use BOMInputStream, available in the Apache Commons IO library:

this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(new BOMInputStream(conn.getInputStream(), false, ByteOrderMark.UTF_8), "UTF-8");

I've checked the code above and it works well for me.
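
If pulling in Commons IO is not an option, the same idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: `Utf8BomSkipper` and `skipUtf8Bom` are hypothetical names, the helper handles only the UTF-8 BOM (not UTF-16 or UTF-32), and it has not been tested against the Monster feed itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

// Hypothetical helper class, not part of Commons IO or the Android SDK.
class Utf8BomSkipper {
    // Wraps the stream and drops a leading EF BB BF if present;
    // any other leading bytes are pushed back untouched.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (read < 3) {
            int n = pushback.read(head, read, 3 - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        boolean isBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'r', 's', 's', '>'};
        Reader reader = new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(data)), "UTF-8");
        // With the BOM removed, the first character is the opening '<'.
        System.out.println((char) reader.read());
    }
}
```

In the connection code, the wrapped stream would take the place of the BOMInputStream, for example `new InputStreamReader(Utf8BomSkipper.skipUtf8Bom(conn.getInputStream()), "UTF-8")`.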



Source: https://stackoverflow.com/questions/15254089/kxmlparser-throws-unexpected-token-exception-at-the-start-of-rss-pasing
