Parsing HTML using Python

Front-end · Unresolved

爱一瞬间的悲伤 2020-11-22 00:35

I'm looking for an HTML parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects.

If I have a document of the form:

7 answers

一整个雨季 (original poster)

2020-11-22 01:19
Compared to the other parser libraries, lxml is extremely fast:
- http://blog.dispatched.ch/2010/08/16/beautifulsoup-vs-lxml-performance/
- http://www.ianbicking.org/blog/2008/03/python-html-parser-performance.html
And with cssselect it’s quite easy to use for scraping HTML pages too:
```
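from lxml.html import parse  # cssselect() also needs the separate `cssselect` package

# parse() accepts a filename, URL, or file-like object; getroot() returns the <html> element
doc = parse('http://www.google.com').getroot()
for link in doc.cssselect('a'):
    # Python 3 print(); text_content() is the link's visible text
    print('%s: %s' % (link.text_content(), link.get('href')))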
```
lxml.html Documentation
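
Since the question asks for tags as plain Python lists/dictionaries, here is a rough sketch of that shape. Note that it swaps in the standard library's `html.parser` instead of lxml, so it runs without installing anything and without network access. The names `LinkCollector` and `collect_links` and the extra `"text"` key are my own inventions for illustration and are not part of any library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every <a> tag as a dict of its attributes plus its text."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished links, in document order
        self._current = None   # the <a> currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            self._current = dict(attrs)
            self._current["text"] = ""  # hypothetical extra key for the link text

    def handle_data(self, data):
        # Accumulate text only while inside an <a> element
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def collect_links(html):
    """Return a list of dicts, one per <a> tag found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p><a href="/docs">Docs</a> and <a href="/faq" class="x">FAQ</a></p>'
for link in collect_links(sample):
    print(link)
```

This only handles the simple case (no nested links, and an attribute literally named `text` would be overwritten). For real pages, the lxml approach in the answer above is likely the more robust choice.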