I\'m looking for an HTML Parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects.
If I have a document of the form:>
Compared to the other parser libraries lxml
is extremely fast:
And with cssselect
it’s quite easy to use for scraping HTML pages too:
from lxml.html import parse
doc = parse('http://www.google.com').getroot()
for div in doc.cssselect('a'):
print '%s: %s' % (div.text_content(), div.get('href'))
lxml.html Documentation