Looks like libxml2.2 comes in the SDK, and libxml/HTMLparser.h claims the following:
This module implements an HTML 4.0 non-verifying parser with API compatible with the XML parser ones. It should be able to parse "real world" HTML, even if severely broken from a specification point of view.
That sounds like what I need, so I'm probably going to use that.