I have HTML code similar to:
1
Value 1
2
-
As described in this post, you should not be using regex to parse HTML.
Use an XML/HTML parser instead.
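Here is a minimal sketch of the parser approach. It assumes the markup is well-formed (XHTML-style), so the JDK's built-in `javax.xml.parsers` DOM parser can read it. Real-world HTML that is not well-formed needs a lenient HTML parser instead. The question's original markup was not preserved, so the sample table and the class name `TableCells` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TableCells {
    // Parses a well-formed fragment with the JDK's DOM parser (no regex) and
    // returns the trimmed text of every <td> element in document order.
    // Throws SAXException if the input is not well-formed.
    public static List<String> extractCells(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList tds = doc.getElementsByTagName("td");
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < tds.getLength(); i++) {
            cells.add(tds.item(i).getTextContent().trim());
        }
        return cells;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input standing in for the question's table.
        String html = "<table><tr><td>1</td><td>Value 1</td></tr>"
                    + "<tr><td>2</td></tr></table>";
        System.out.println(extractCells(html)); // prints [1, Value 1, 2]
    }
}
```

Because the parser builds a tree, nested tags, attributes, and whitespace inside cells don't break the extraction the way they can with a regex.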
-
The following are well-known HTML parsers for Java. You can use any of them:
http://htmlcleaner.sourceforge.net/
http://jsoup.org/
http://jericho.htmlparser.net/docs/index.html
-
Assuming the HTML is well-formed, you can parse it using HtmlUnit.
You could also write your own regular expression to process the page if it contains just a single table, but I would strongly recommend against it. Regular expressions may give strange results if the page later gains additional tables. With HtmlUnit, you can validate that the page has only one table before you start parsing, or target just the table you want.
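The "validate before parsing" idea can be sketched without HtmlUnit. This example swaps in the JDK's built-in DOM parser as a stand-in, so it again assumes well-formed markup, and `SingleTableCheck` and `requireSingleTable` are hypothetical names. It refuses to continue unless the document contains exactly one `<table>`:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SingleTableCheck {
    // Returns the only <table> element in a well-formed document, or throws
    // IllegalStateException if the page contains zero or several tables.
    public static Element requireSingleTable(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList tables = doc.getElementsByTagName("table");
        if (tables.getLength() != 1) {
            throw new IllegalStateException(
                "Expected exactly one <table>, found " + tables.getLength());
        }
        return (Element) tables.item(0);
    }

    public static void main(String[] args) throws Exception {
        String one = "<html><body><table><tr><td>1</td></tr></table></body></html>";
        String two = "<html><body><table/><table/></body></html>";
        System.out.println(requireSingleTable(one).getTagName()); // prints table
        try {
            requireSingleTable(two);
        } catch (IllegalStateException e) {
            // prints Expected exactly one <table>, found 2
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly here is the point: if the page layout changes, you find out immediately instead of silently extracting cells from the wrong table.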