I have embedded HTML Tidy in my application to clean incoming HTML. But Tidy has a huge number of bugs, and fixing them directly in the Tidy source is my worst nightmare.
Try Pretty Diff. It is a vastly superior beautification algorithm and it does not make any assumptions about your input.
http://prettydiff.com/?m=beautify&html
For something that actually fixes code, your best bet is still HTML Tidy. There are plenty of linters, but apart from Tidy there is not really anything that repairs errors in HTML.
At first glance, modern OOP programmers might think that the source code is an unreadable abomination, but in the C world, Tidy is a pretty sophisticated library that uses a lot of advanced OO concepts and offers a very thoughtful interface that exposes nearly all of its functionality through a pure C API.
A casual developer will be lost, but once immersed, the code is quite beautiful. Granted, the naming conventions are a mixed bag, but PRs are welcome!
Could you tell us what you plan to use this tool for? As in, do you want to fix static web pages, or do you want some sort of filtering step before other manipulations, so that some tool can handle buggy web pages?
Personally, I write my own tool atop Python's BeautifulSoup or lxml whenever I need to; it's at most a dozen-line script and does most of what I want.
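To give a rough idea of what such a small script can look like, here is a minimal sketch. It deliberately uses Python's built-in `html.parser` instead of BeautifulSoup or lxml so that it runs without extra packages; the `TagBalancer` class and `balance()` function are hypothetical names, and the repair rules (close whatever is still open at the end, close intervening tags on a mismatched end tag, drop stray end tags) are a simplification, not what Tidy or BeautifulSoup actually do.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Re-emit HTML with unclosed tags closed and stray end tags dropped.

    Limitation: HTML's implicit-close rules (e.g. a new <p> ending the
    previous <p>) are NOT implemented; tags are simply nested.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # Attribute values arrive unescaped, so re-escape them.
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # "<br/>" is treated like "<br>"; "<div/>" opens a div, as in HTML.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: drop it
        # Close anything left open inside this element, innermost first.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Script/style contents are raw text and must not be escaped.
        if self.stack and self.stack[-1] in ("script", "style"):
            self.out.append(data)
        else:
            self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def close(self):
        super().close()
        # Close everything still open at the end of the input.
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")


def balance(html_text):
    """Return html_text with its tags balanced."""
    parser = TagBalancer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(balance("<p>Hello <b>world"))  # <p>Hello <b>world</b></p>
```

With BeautifulSoup itself, something like `str(BeautifulSoup(text, "html.parser"))` should give a similar tag-balancing result in a single line, and the `html5lib` or lxml parsers apply more of the real HTML parsing rules.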
There is a nice new Tidy with proper HTML5 support (tidy-html5), so the alternative to old, ugly Tidy would be... Tidy (GitHub repository).