Python has several options for HTML scraping in addition to Beautiful Soup. Here are some others:
- mechanize: similar to Perl's WWW::Mechanize. Gives you a browser-like object to interact with web pages.
- lxml: Python binding to the libxml2 and libxslt C libraries. Supports various options to traverse and select elements (e.g. XPath and CSS selection).
- scrapemark: high-level library using templates to extract information from HTML.
- pyquery: allows you to make jQuery-like queries on XML documents.
- scrapy: a high-level scraping and web crawling framework. It can be used to write spiders for data mining, monitoring, and automated testing.
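
To give a feel for one of these, here is a minimal sketch of XPath selection with lxml. The sample markup, the `links` id, and the `extract_links` helper are made up for illustration; only `lxml.html.fromstring`, `xpath`, `text_content`, and `get` come from lxml itself.

```python
from lxml import html

# Hypothetical sample page, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <h1>Example listing</h1>
  <ul id="links">
    <li><a href="/first">First</a></li>
    <li><a href="/second">Second</a></li>
  </ul>
</body></html>
"""


def extract_links(markup):
    """Return (text, href) pairs for every link inside the list with id "links"."""
    doc = html.fromstring(markup)
    # XPath: any <a> element nested under the <ul id="links"> element.
    anchors = doc.xpath('//ul[@id="links"]//a')
    return [(a.text_content().strip(), a.get("href")) for a in anchors]


links = extract_links(SAMPLE_HTML)
print(links)
```

In a real scraper you would fetch the page first and pass the response body to `extract_links`. CSS selection via `cssselect` works along the same lines, but may need the separate cssselect package installed.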