I'm interested in finding out how to scrub an HTML page and present it nicely: remove all the clutter and reformat the main text into a very readable format, like http://lab.a
Goose (https://github.com/jiminoc/goose/wiki) does something like what you're asking for. The source code is openly available, along with unit tests.
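If you want to see the basic idea before digging into a full library, here is a minimal sketch in Python using only the standard library. It is not Goose's algorithm or API. It is a simplified text-density heuristic: drop obvious clutter tags, collect the text of every `<p>`, and keep the paragraphs from whichever parent element holds the most paragraph text. The tag list in `SKIP_TAGS` and the function name `extract_main_text` are my own assumptions, and the parser assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as clutter (an assumption, not Goose's rule set).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form", "noscript"}
# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphCollector(HTMLParser):
    """Collects (parent_id, text) for every <p> outside the clutter tags."""

    def __init__(self):
        super().__init__()
        self.stack = []        # open elements as (tag, unique_id)
        self.skip_depth = 0    # > 0 while inside a clutter tag
        self.in_p = False
        self.current = []
        self.paragraphs = []
        self._next_id = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._next_id += 1
        self.stack.append((tag, self._next_id))
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_p = True
            self.current = []

    def handle_endtag(self, tag):
        # Ignore stray end tags that have no matching open element.
        if not any(t == tag for t, _ in self.stack):
            return
        # Pop until the matching element closes, tolerating unclosed children.
        while self.stack:
            t, _ = self.stack.pop()
            if t in SKIP_TAGS:
                self.skip_depth -= 1
            if t == "p" and self.in_p:
                self._finish_paragraph()
            if t == tag:
                break

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.current.append(data)

    def _finish_paragraph(self):
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self.current).split())
        if text:
            # The <p> has already been popped, so the stack top is its parent.
            parent = self.stack[-1][1] if self.stack else 0
            self.paragraphs.append((parent, text))
        self.in_p = False
        self.current = []


def extract_main_text(html):
    """Return the paragraphs of the parent element with the most paragraph text."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    totals = {}
    for parent, text in parser.paragraphs:
        totals[parent] = totals.get(parent, 0) + len(text)
    if not totals:
        return ""
    best = max(totals, key=totals.get)
    return "\n\n".join(text for parent, text in parser.paragraphs if parent == best)
```

Real extractors score candidates with many more signals (link density, class and id names, images, sentence counts), which is why looking through Goose's source and unit tests is worthwhile once this basic version makes sense.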