I need to convert HTML to plain text. My only formatting requirement is to retain new lines in the plain text. New lines should be displayed not only in the case of <
<
I would use a SAX parser. If your document is not well-formed XHTML, I would first convert it with JTidy.
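A minimal sketch of the SAX approach, assuming the input is already well-formed XHTML (for example after a JTidy pass, which is not shown here). The class name `HtmlToText`, the method `toPlainText`, and the set of tags treated as line breaks are my own illustrative choices, not something from the question; adjust the tag set to whatever elements should produce a new line in your case.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class HtmlToText {

    // Assumption: closing any of these elements should end the current line.
    private static final Set<String> BREAK_TAGS =
            Set.of("br", "p", "div", "li", "tr", "h1", "h2", "h3");

    // Collects character data and appends '\n' when a break element ends.
    private static final class TextHandler extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (BREAK_TAGS.contains(qName.toLowerCase(Locale.ROOT))) {
                out.append('\n');
            }
        }

        String text() {
            return out.toString();
        }
    }

    // Parses well-formed XHTML and returns its text with line breaks kept.
    static String toPlainText(String xhtml) throws Exception {
        TextHandler handler = new TextHandler();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.text();
    }
}
```

Calling `HtmlToText.toPlainText("<html><body><p>Hello</p>World<br/>Done</body></html>")` yields three lines: `Hello`, `World`, and `Done`. Whitespace inside the markup is passed through unchanged, so you may want to collapse or trim it afterwards.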