I have a collection of HTML documents for which I need to parse the contents of the tags in the
section. These are the only HTML tags whose values I'm ...

You can likely use the Jericho HTML Parser. In particular, have a look at this to see how you can go about finding specific tags.
If it suits your application you can use Tidy to convert HTML to valid XML, and then use as much XPath as you like!
JTidy should provide a good starting point for this.
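
To illustrate the XPath half of that suggestion, here is a minimal sketch that uses only the JDK's built-in XML and XPath classes. It assumes the Tidy/JTidy conversion has already happened and produced a well-formed document; the inline sample string stands in for that output. The class name `HeadTagExtractor`, the helper methods, and the `title` and `meta` lookups are illustrative choices, not something taken from the question. Real Tidy output may also carry a DOCTYPE or an XHTML namespace, which this sketch does not handle.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeadTagExtractor {

    // Parse a well-formed XML/XHTML string into a DOM tree.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluate an XPath expression and return the text of every matching node.
    // For attribute nodes, getTextContent() yields the attribute value.
    public static List<String> select(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a document already cleaned up by Tidy.
        String xhtml = "<html><head><title>Sample page</title>"
                + "<meta name=\"author\" content=\"Jane Doe\"/></head>"
                + "<body><p>Hello</p></body></html>";

        Document doc = parse(xhtml);
        System.out.println(select(doc, "/html/head/title"));
        System.out.println(select(doc, "/html/head/meta[@name='author']/@content"));
    }
}
```

Once the document is well-formed, each tag you care about becomes one XPath expression, so adding another lookup only means passing a different expression to `select`.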