The best known are NekoHTML and JTidy.
NekoHTML is built on Xerces and provides a simple, adaptable SAXParser that implements the standard Java SE XMLReader interface.
JTidy is intended more for cleaning up your HTML into well-formed XML, but it is still very useful as a parser and can produce a DOM tree if needed.
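Because the parser is exposed as an XMLReader, the handling code can be written against that interface alone. Below is a minimal sketch, not a definitive implementation: the class `LinkExtractor` and its methods `extractLinks` and `jdkReader` are hypothetical names made up for this illustration. To keep the sketch runnable without extra jars, it is exercised with the JDK's built-in reader on well-formed markup; with NekoHTML on the classpath you would presumably pass `new org.cyberneko.html.parsers.SAXParser()` instead to handle real-world HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the href values of <a> elements
// through any XMLReader implementation.
public class LinkExtractor {

    public static List<String> extractLinks(XMLReader reader, String markup) throws Exception {
        List<String> links = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Compare case-insensitively: an HTML-oriented reader may
                // report element names in a different case than the source.
                if ("a".equalsIgnoreCase(qName)) {
                    String href = atts.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return links;
    }

    // Stand-in reader from the JDK; it only accepts well-formed XML.
    // Assumption: a NekoHTML SAXParser instance could be passed to
    // extractLinks in its place to cope with messy HTML.
    public static XMLReader jdkReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<html><body><a href=\"http://example.com/\">x</a></body></html>";
        System.out.println(extractLinks(jdkReader(), markup));
    }
}
```

The point of the design is that only the line creating the reader depends on the chosen library; the handler itself stays the same.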
You could have a look at this list for other alternatives.
Another option is to use Hpricot through JRuby.