A description of what I'm going to accomplish:
You could start by using Beautiful Soup to parse both documents.
Then you have a choice: either use prettify() to render both documents as more or less standardized HTML and diff those, or walk the two parse trees and compare only the parts you care about. The latter allows you to, e.g., discard elements that only affect the presentation, not the content. The former is probably easier.
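Here is a minimal sketch of both options. It assumes Python 3.9+, beautifulsoup4 4.8.0 or later (for smooth()), and the standard library's difflib. The helper names (diff_prettified, content_lines, diff_content) and the choice of which tags count as "presentation only" are illustrative assumptions, not part of any library.

```python
import difflib

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Assumption: which tags count as presentation-only depends on your documents.
DROP_WITH_CONTENTS = ["style", "script"]            # removed along with their contents
UNWRAP_KEEP_TEXT = ["b", "i", "u", "font", "span"]  # tag removed, inner text kept


def diff_prettified(html_a: str, html_b: str) -> list[str]:
    """Option 1: normalize both documents with prettify() and diff the output."""
    a = BeautifulSoup(html_a, "html.parser").prettify().splitlines()
    b = BeautifulSoup(html_b, "html.parser").prettify().splitlines()
    return list(difflib.unified_diff(a, b, "a.html", "b.html", lineterm=""))


def content_lines(html: str) -> list[str]:
    """Option 2: walk the parse tree, discard presentation markup, and emit
    one "tag: text" line per element that directly contains text."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(DROP_WITH_CONTENTS):
        tag.decompose()  # delete the element and everything inside it
    for tag in soup.find_all(UNWRAP_KEEP_TEXT):
        tag.unwrap()  # replace the tag with its children
    soup.smooth()  # merge text nodes left adjacent by unwrap()
    lines = []
    for element in soup.find_all(True):  # True matches every tag
        strings = element.find_all(string=True, recursive=False)
        text = " ".join(s.strip() for s in strings if s.strip())
        if text:
            lines.append(f"{element.name}: {text}")
    return lines


def diff_content(html_a: str, html_b: str) -> list[str]:
    """Diff only the content extracted by content_lines()."""
    return list(difflib.unified_diff(
        content_lines(html_a), content_lines(html_b), "a.html", "b.html", lineterm=""))


if __name__ == "__main__":
    old = "<p>Hello <b>world</b></p>"
    new = "<p>Hello world</p>"
    print("\n".join(diff_prettified(old, new)))  # reports the removed <b> markup
    print(diff_content(old, new))  # [] because only presentation changed
```

In the sample, the two documents differ only in a `<b>` wrapper, so the prettify() diff reports a change while the content diff is empty. The flip side is that the content diff also ignores any edit inside `<style>` or `<script>` and any change in inline formatting, so tune the two tag lists to whatever counts as content for your documents.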