Of course an HTML page can be parsed using any number of Python parsers, but I'm surprised that there don't seem to be any public parsing scripts to extract meaningful content
Have a look at templatemaker: http://www.holovaty.com/writing/templatemaker/
It was written by Adrian Holovaty, one of the co-creators of Django. Basically, you feed it a few example HTML files and it generates a "template" that you can then use to extract just the bits that differ between pages (which is usually the meaningful content).
Here's an example from the Google Code page:
# Import the Template class.
>>> from templatemaker import Template
# Create a Template instance.
>>> t = Template()
# Learn a Sample String.
>>> t.learn('this and that')
# Output the template so far, using the "!" character to mark holes.
# We've only learned a single string, so the template has no holes.
>>> t.as_text('!')
'this and that'
# Learn another string. The True return value means the template gained
# at least one hole.
>>> t.learn('alex and sue')
True
# Sure enough, the template now has some holes.
>>> t.as_text('!')
'! and !'
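
To make the idea concrete, here is a rough standard-library sketch of the same "learn the common parts, then pull out the holes" approach. This is not templatemaker's code or its algorithm: it uses `difflib.SequenceMatcher` as a stand-in, handles only two samples, and the names `make_template`, `as_text`, and `extract` are made up for this illustration.

```python
import re
from difflib import SequenceMatcher

HOLE = None  # sentinel marking a variable region in the template


def make_template(a, b):
    """Build a template from two samples: shared text stays literal,
    everything that differs collapses into a hole."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(a[i1:i2])
        elif not parts or parts[-1] is not HOLE:
            # Merge adjacent differing runs into a single hole.
            parts.append(HOLE)
    return parts


def as_text(parts, marker="!"):
    """Render the template, showing each hole as `marker`."""
    return "".join(marker if p is HOLE else p for p in parts)


def extract(parts, text):
    """Return the hole contents for `text`, or None if it does not fit."""
    pattern = "".join("(.*?)" if p is HOLE else re.escape(p) for p in parts)
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    return match.groups() if match else None


template = make_template("this and that", "alex and sue")
print(as_text(template))                   # '! and !'
print(extract(template, "red and green"))  # ('red', 'green')
```

A character-level diff like this can pick up accidental single-letter matches on longer inputs, so treat it as a toy for understanding the concept. For real pages you would want templatemaker itself or something similar.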