Is it possible, and what tools could be used, to parse an HTML document from a string or from a file and then construct a DOM tree so that a developer can walk the tree throu
You can use TagSoup. It is a SAX-compliant parser that cleans up malformed content, such as the HTML found on generic web pages, and reports it as well-formed XML.
For example, the improperly nested markup This is <b>bold, <i>bold italic, </b>italic, </i>normal text gets correctly rewritten as: This is <b>bold, <i>bold italic, </i></b><i>italic, </i>normal text.
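Since TagSoup only emits SAX events, something still has to turn those events into a DOM tree. A minimal sketch of one way to do that, using only JDK classes: an identity Transformer copies the events from any XMLReader into a DOMResult, and a small recursive method walks the result. The class and method names (SaxToDom, toDom, walk, elementNames) are made up for illustration. So that it runs without extra jars, the demo uses the JDK's own XMLReader on input that is already well formed; with the TagSoup jar on the classpath, passing new org.ccil.cowan.tagsoup.Parser() as the reader should let the same code accept messy HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxToDom {

    // Builds a DOM tree from whatever SAX events the given reader emits.
    // Assumption: with TagSoup available, the reader argument would be
    // new org.ccil.cowan.tagsoup.Parser(), which implements XMLReader.
    static Node toDom(XMLReader reader, String markup) throws Exception {
        DOMResult result = new DOMResult();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        // A Transformer created without a stylesheet is an identity transform.
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return result.getNode();
    }

    // Depth-first walk that records element names in document order.
    static void walk(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(node.getNodeName()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, out);
        }
    }

    // Convenience wrapper for the demo: parses with the JDK's default reader
    // and returns the element names separated by single spaces.
    static String elementNames(String markup) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            StringBuilder names = new StringBuilder();
            walk(toDom(reader, markup), names);
            return names.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String cleaned = "<p>This is <b>bold, <i>bold italic, </i></b><i>italic, </i>normal text</p>";
        System.out.println(elementNames(cleaned)); // prints "p b i i"
    }
}
```

To read from a file instead of a string, the InputSource could be built from a FileReader or an InputStream; the rest stays the same.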