Question
I am trying to learn how to parse HTML, but since I don't have much experience with either Java or Android, it is proving a little complicated. I have read the IBM XML parsing tutorial and have learned to parse an RSS feed. My problem is that I would now like to get data from an HTML site. I have read a bit about HTMLCleaner, JSON, etc., but I can't find a good tutorial to help me. Do you know of any tutorials that might be helpful?
Thanks.
Answer 1:
Check out the following HTML parsers. There are more out there. Maybe one will work for you:
HTMLCleaner: http://htmlcleaner.sourceforge.net/
TagSoup: http://ccil.org/~cowan/XML/tagsoup/
Jericho: http://jericho.htmlparser.net/docs/index.html
Answer 2:
IMO there are two easy ways to parse HTML:
- Convert the HTML to XML (XHTML) using a library (e.g. HTML Tidy) and then use an XML parser
- Use an existing HTML parser (e.g. a standard Web browser engine like WebKit, Firefox, and/or IE) and then read the "DOM", which is a more-or-less API-friendly representation of the parsed HTML
Alternatively, if you want to write your own parser (which I doubt you should, even as a homework exercise, since implementing it properly and completely would be long and complicated), see the specs for parsing HTML.
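To illustrate the second half of the first approach, here is a minimal sketch that reads links out of markup using only the standard javax.xml.parsers and org.w3c.dom classes, which should also be available on Android. It assumes the page has already been converted to well-formed XHTML (for example by HTML Tidy); raw real-world HTML will usually make the XML parser throw. The class name XhtmlLinkExtractor, the method extractLinks, and the sample markup are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library mentioned above.
public class XhtmlLinkExtractor {

    // Returns one "link text -> href" string per <a> element.
    // Assumption: the input is well-formed XHTML; malformed markup throws.
    public static List<String> extractLinks(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Collect every <a> element anywhere in the document tree.
        NodeList anchors = doc.getElementsByTagName("a");
        List<String> links = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            links.add(a.getTextContent().trim() + " -> " + a.getAttribute("href"));
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input standing in for a page already tidied to XHTML.
        String xhtml = "<html><body>"
                + "<p><a href=\"http://htmlcleaner.sourceforge.net/\">HTMLCleaner</a></p>"
                + "<p><a href=\"http://jericho.htmlparser.net/docs/index.html\">Jericho</a></p>"
                + "</body></html>";
        for (String link : extractLinks(xhtml)) {
            System.out.println(link);
        }
    }
}
```

If the page is not well-formed, one of the HTML parsers listed in the first answer is probably the easier route, since those are designed to tolerate broken markup.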
Source: https://stackoverflow.com/questions/4831513/html-parsing-in-android