Parsing content that contains HTML tags using XmlPullParser

Submitted by 一世执手 on 2019-12-11 13:18:13

Question


I am building an Android app using XmlPullParser.

How can I extract the content from HTML formatted like this?

<div class="content">
"Some text is here."
<br>
"some more text "<a class="link" href="adress">continues here</a>
<br>
</div>

I want to parse all the content like this:

"Some text is here. 
 some more text continues here"

The "continues here" part should also be hyperlinked.

ADDITION after some comments: The HTML is first passed through Yahoo YQL, which generates XML. My code works with that generated XML file, and the fragment shown above comes from it.


Answer 1:


HTML and XML share some syntax, but they are different formats. I don't think using an XmlPullParser for this purpose is a good idea. I recommend using one of the several Java HTML parsers instead.




Answer 2:


XmlPullParser is meant to deal with XML. Well-formed XHTML pages are rare on the web. An XML parser expects well-formed input and is not designed to be fault tolerant, whereas HTML is usually loosely structured.
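To illustrate the strictness point, here is a minimal sketch using the standard Java DOM parser rather than XmlPullParser itself (an assumption made so the example runs on plain Java; any conforming XML parser should reject the same input). The class name `StrictXmlDemo` and the helper `isWellFormed` are hypothetical names chosen for this example. It feeds the parser the question's fragment once with a bare `<br>` and once with a self-closed `<br/>`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

class StrictXmlDemo {
    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                    markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // A SAXParseException here means the input is not well-formed XML.
            return false;
        }
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\">Some text is here.<br>"
                + "some more text <a class=\"link\" href=\"adress\">continues here</a>"
                + "<br></div>";
        // Same fragment, with every <br> rewritten as a self-closing element.
        String xml = html.replace("<br>", "<br/>");

        System.out.println("HTML-style <br>:  " + isWellFormed(html));
        System.out.println("XML-style <br/>: " + isWellFormed(xml));
    }
}
```

The first input is rejected because the `<br>` element is never closed before `</div>`; a browser or a lenient HTML parser would accept it without complaint.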

So, no, it's not a good idea. You should prefer other libraries such as TagSoup or geronimo.
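That said, if the input really is well-formed XML, as the question's addendum says of the YQL output, a pull-style walk over the mixed content can be sketched as follows. This is only an illustration under several assumptions: it uses the standard Java StAX API (`XMLStreamReader`) as a stand-in for Android's XmlPullParser, whose event names differ; it assumes the line breaks arrive as `<br/>`; and `MixedContentFlattener` and `flatten` are hypothetical names. Text is concatenated, each `<br/>` becomes a newline, and the `<a>` element is kept as a tag so the link survives.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class MixedContentFlattener {
    /** Flattens mixed content: text is kept, br becomes a newline, a is re-emitted. */
    static String flatten(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.CHARACTERS:
                        // Text may arrive in several chunks; appending handles that.
                        out.append(reader.getText());
                        break;
                    case XMLStreamConstants.START_ELEMENT:
                        if ("br".equals(reader.getLocalName())) {
                            out.append('\n');
                        } else if ("a".equals(reader.getLocalName())) {
                            // Note: the href value is not re-escaped in this sketch.
                            out.append("<a href=\"")
                               .append(reader.getAttributeValue(null, "href"))
                               .append("\">");
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if ("a".equals(reader.getLocalName())) {
                            out.append("</a>");
                        }
                        break;
                    default:
                        break; // ignore comments, processing instructions, etc.
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Input was not well-formed XML.
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<div class=\"content\">Some text is here.<br/>"
                + "some more text <a class=\"link\" href=\"adress\">continues here</a>"
                + "<br/></div>";
        System.out.println(flatten(xml));
    }
}
```

On Android, a string like this could then be handed to `Html.fromHtml` and shown in a `TextView` with `LinkMovementMethod` so the link becomes clickable, although a newline would first need to be turned back into `<br>` for that route.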

PS: The best approach when asking a Stack Overflow question is to try something yourself first and ask only once you are blocked, not the other way around.



Source: https://stackoverflow.com/questions/21546147/parsing-content-which-contains-html-tags-using-xmlpullparser
