Parsing HTML using Python

爱一瞬间的悲伤 2020-11-22 00:35

I'm looking for an HTML parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects.

If I have a document of the form:

7 Answers
  •  一整个雨季
    2020-11-22 01:19

    Compared to the other parser libraries, lxml is extremely fast:

    • http://blog.dispatched.ch/2010/08/16/beautifulsoup-vs-lxml-performance/
    • http://www.ianbicking.org/blog/2008/03/python-html-parser-performance.html

    And with cssselect it’s quite easy to use for scraping HTML pages too:

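    from lxml.html import parse  # cssselect() also needs the separate cssselect package

    # parse() accepts a filename, URL, or file-like object
    doc = parse('http://www.google.com').getroot()
    # Select every <a> element and print its text and href
    for link in doc.cssselect('a'):
        print('%s: %s' % (link.text_content(), link.get('href')))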
    

    lxml.html Documentation
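
    Since the question asks for tags as Python lists/dictionaries, here is a minimal sketch of one way to get there with lxml. The sample markup and the helper name links_as_dicts are hypothetical, made up for illustration; the sketch parses an inline string so it runs without network access, and it uses iter() rather than cssselect() so no extra package is needed.

```python
from lxml.html import fromstring

# Hypothetical sample markup, used so the sketch runs offline
SAMPLE_HTML = """
<html><body>
  <h1>Links</h1>
  <a href="/first">First <b>page</b></a>
  <a href="/second">Second page</a>
</body></html>
"""


def links_as_dicts(html):
    """Return one dict per <a> element holding its text and attributes."""
    doc = fromstring(html)
    return [
        {"text": a.text_content().strip(), "attrs": dict(a.attrib)}
        for a in doc.iter("a")  # iter() walks the tree in document order
    ]


links = links_as_dicts(SAMPLE_HTML)
for link in links:
    print(link)
```

    The same pattern should work for any tag name passed to iter(); each element's attrib behaves like a dictionary, so dict(a.attrib) gives a plain copy.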
