I need to parse HTML 4 in Java. Ideally I'd like an implementation that is SAX compatible.
I'm aware that there are numerous HTML parsers for Java; however, the
You may wish to check out http://lobobrowser.org/cobra.jsp. They have implemented a pure Java web browser (Lobo) and have split out its parser component (Cobra) for standalone use. I honestly am not sure whether it meets your "no tidying" requirement, but it may be worth a look. I came across it while searching for a pure Java web browser.
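Whichever parser you end up with, "SAX compatible" in practice means it implements `org.xml.sax.XMLReader` and reports markup through a `ContentHandler`. Here is a minimal sketch of such a handler. `TagCollector` and `collectTags` are hypothetical names of my own, and because the JDK's built-in SAX parser rejects HTML 4 that isn't well-formed XML, the demo feeds it XHTML-style input. I have not checked whether Cobra exposes an `XMLReader`, so consult its documentation before assuming you can plug it in here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandlerSketch {

    // Records the name of every start tag; any parser that drives a SAX ContentHandler can feed it.
    static class TagCollector extends DefaultHandler {
        final List<String> tags = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // qName is populated even when the reader is not namespace-aware (localName may be empty).
            tags.add(qName);
        }
    }

    // Parses the input with the supplied XMLReader and returns start-tag names in document order.
    static List<String> collectTags(XMLReader reader, String input) throws Exception {
        TagCollector collector = new TagCollector();
        reader.setContentHandler(collector);
        reader.parse(new InputSource(new StringReader(input)));
        return collector.tags;
    }

    public static void main(String[] args) throws Exception {
        // The JDK's default XMLReader only accepts well-formed XML, hence the XHTML-style markup.
        XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        List<String> tags = collectTags(xmlReader, "<html><body><p>Hello</p></body></html>");
        System.out.println(tags); // prints [html, body, p]
    }
}
```

The point of writing against `XMLReader` rather than a specific parser class is that an HTML 4 parser offering a SAX interface could be passed to `collectTags` unchanged, with only the `XMLReader` construction swapped out.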