HTML data extract in Java

小鲜肉 2021-01-23 16:11

I have HTML code similar to:

1    
Value 1

2    

        
3 answers
  • 2021-01-23 16:24

    As described in this post, you should not be using regex to parse HTML.

    Use an XML/HTML parser instead.
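
    As a minimal sketch of the parser approach, the snippet below uses only the XML DOM parser that ships with the JDK. It assumes the markup is well-formed XML (every tag closed, no DOCTYPE to resolve), which real-world HTML often is not; for messier pages a dedicated HTML parser is the safer choice. The class name, method name, and sample table are hypothetical, since the question's original markup did not survive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the text of every <td> in a well-formed document.
final class TableCellExtractor {

    // Returns the trimmed text of each <td> element, in document order.
    static List<String> extractCells(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList cells = doc.getElementsByTagName("td");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            values.add(cells.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample markup, not the asker's actual HTML.
        String html = "<table>"
                + "<tr><td>1</td><td>Value 1</td></tr>"
                + "<tr><td>2</td><td>Value 2</td></tr>"
                + "</table>";
        System.out.println(extractCells(html));
    }
}
```

    The parser throws an exception on malformed input instead of silently mismatching, which is the main practical advantage over a regular expression here.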

  • 2021-01-23 16:40

    http://htmlcleaner.sourceforge.net/

    http://jsoup.org/

    http://jericho.htmlparser.net/docs/index.html

    are well-known HTML parsers for Java. You can use any of them.

  • 2021-01-23 16:47

    Assuming the HTML is well formed, you can parse it using HtmlUnit.

    You could also write your own regular expression to process the page if there is just a single table, but I would strongly recommend against it: regular expressions may give strange results if additional tables are later added to the page. With HtmlUnit you can validate that the page has only a single table before you start parsing, or just target the table you want.
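
    The "validate first, then target the table" idea can be sketched without HtmlUnit, using the XPath API bundled with the JDK instead. This is a swapped-in technique, not HtmlUnit's API, and it again assumes well-formed markup. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: refuses to parse unless the page holds exactly one table.
final class SingleTableReader {

    // Returns the cell text of the only table, or throws if the table count is not one.
    static List<String> readOnlyTable(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Validate the single-table assumption before extracting anything.
        double tableCount = (Double) xpath.evaluate("count(//table)", doc, XPathConstants.NUMBER);
        if (tableCount != 1) {
            throw new IllegalStateException("Expected exactly 1 table, found " + (int) tableCount);
        }

        // Target the cells of that table explicitly.
        NodeList cells = (NodeList) xpath.evaluate("(//table)[1]//td", doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            values.add(cells.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample markup for illustration only.
        String page = "<html><body><table><tr><td>1</td><td>Value 1</td></tr></table></body></html>";
        System.out.println(readOnlyTable(page));
    }
}
```

    If the page later gains a second table, this fails loudly with an exception rather than returning cells from the wrong table, which is the failure mode the regex approach cannot guard against.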
