I have embedded HTML Tidy in my application to clean incoming HTML, but Tidy has a huge number of bugs, and fixing them directly in the source is my worst nightmare. Tidy source
Try Pretty Diff. It is a vastly superior beautification algorithm and it does not make any assumptions about your input.
http://prettydiff.com/?m=beautify&html