Of course an HTML page can be parsed using any number of Python parsers, but I'm surprised that there don't seem to be any public parsing scripts to extract meaningful content.
Goose is just the library for this task. To quote their README:
Goose will try to extract the following information:

- Main text of an article
- Main image of article
- Any Youtube/Vimeo movies embedded in article
- Meta Description
- Meta tags
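
As a rough illustration of how those fields might be read in practice, here is a minimal sketch. It assumes the `goose3` fork is installed (the original `python-goose` package is imported as `from goose import Goose` instead), and that the extracted article exposes `cleaned_text`, `top_image`, `movies`, `meta_description` and `meta_keywords`. The helper names `summarize` and `extract_from_url` are made up for this example, and `meta_keywords` is only my guess at the closest match for "Meta tags".

```python
def summarize(article):
    """Collect the README's listed fields from a Goose-style article object."""
    # top_image may be missing or None when no main image was found.
    top_image = getattr(article, "top_image", None)
    return {
        "text": article.cleaned_text,
        "image": top_image.src if top_image is not None else None,
        "movies": [movie.src for movie in (article.movies or [])],
        "meta_description": article.meta_description,
        "meta_keywords": article.meta_keywords,
    }


def extract_from_url(url):
    """Fetch a page with Goose and return the summarized fields."""
    # Imported lazily so summarize() can be used without the library installed.
    # Assumption: goose3; for python-goose use `from goose import Goose`.
    from goose3 import Goose

    goose = Goose()
    return summarize(goose.extract(url=url))
```

Calling `extract_from_url` with an article URL should then return a plain dict with the fields above. How well the main text is isolated depends on the page, so it is worth checking the output on a few sample articles first.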