jQuery-like HTML parsing in Python?

前端未结

Is there a Python library that lets me parse an HTML document the way jQuery does?

That is, I'd like to be able to query it with CSS selectors.

4 answers

無奈伤痛

2020-12-04 11:36

If you are fluent with BeautifulSoup, you could just add soupselect to your libs.
Soupselect is a CSS selector extension for BeautifulSoup.

Usage (Python 2, BeautifulSoup 3):

```python
>>> from BeautifulSoup import BeautifulSoup as Soup
>>> from soupselect import select
>>> import urllib
>>> soup = Soup(urllib.urlopen('http://slashdot.org/'))
>>> select(soup, 'div.title h3')
[<h3><span><a href='//science.slashdot.org/'>Science</a>:</span></h3>,
 <h3><a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek</h3>,
 ..]
```
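The example above targets BeautifulSoup 3 on Python 2. As a sketch, assuming Python 3 with `beautifulsoup4` installed, the same `div.title h3` query can be run with bs4's built-in `select()` instead of soupselect. The inline HTML is a hypothetical stand-in shaped like the Slashdot output, so no network access is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the Slashdot output above (not the real page).
HTML = """
<div class="title"><h3><span><a href="//science.slashdot.org/">Science</a>:</span></h3></div>
<div class="title"><h3><a href="//slashdot.org/articles/x.shtml">Star Trek</a></h3></div>
<div class="other"><h3>Not matched</h3></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Same selector as the soupselect call: every <h3> nested anywhere inside a div.title.
headings = soup.select("div.title h3")

for h3 in headings:
    print(h3.get_text())  # prints "Science:" then "Star Trek"
```

The third `<h3>` is skipped because its parent `div` lacks the `title` class.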

别那么骄傲

2020-12-04 11:43

BeautifulSoup now has support for CSS selectors:

```python
import requests
from bs4 import BeautifulSoup as Soup

html = requests.get('https://stackoverflow.com/questions/3051295').content
# Name the parser explicitly; bs4 warns when none is given.
soup = Soup(html, 'html.parser')
```

Title of this question:

```python
soup.select('h1.grid--cell :first-child')[0].text
```

Number of question upvotes:

```python
# select_one() returns only the first matching item
soup.select_one('[itemprop="upvoteCount"]').text
```

Python Requests is used here to fetch the HTML page.
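To try the same two kinds of query without a network request, here is a minimal sketch that runs them against a small hypothetical HTML string (the class, `itemprop` values and the number 42 are made up to match the selectors above). It assumes a recent `beautifulsoup4` and illustrates that `select()` always returns a list while `select_one()` returns the first match or `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; class and itemprop values mirror the selectors above.
HTML = """
<h1 class="grid--cell"><a href="/questions/3051295">jquery-like HTML parsing in Python?</a></h1>
<div itemprop="upvoteCount">42</div>
<div itemprop="upvoteCount">7</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select() returns a list of all matches (possibly empty), so index into it.
title = soup.select("h1.grid--cell :first-child")[0].text

# select_one() returns only the first match...
upvotes = int(soup.select_one('[itemprop="upvoteCount"]').text)

# ...or None when nothing matches, so check before calling .text on it.
missing = soup.select_one('[itemprop="downvoteCount"]')

print(title)    # jquery-like HTML parsing in Python?
print(upvotes)  # 42
print(missing)  # None
```

Checking for `None` before reading `.text` avoids an `AttributeError` when a page lacks the expected element.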

北荒

2020-12-04 11:55

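The lxml library supports CSS selectors, through the `cssselect()` method on elements and the `lxml.cssselect.CSSSelector` class; both rely on the separately installed `cssselect` package.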

我寻月下人不归

2020-12-04 11:59

Consider PyQuery:

http://packages.python.org/pyquery/

```python
>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> import urllib
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url='http://google.com/')
>>> d = pq(url='http://google.com/', opener=lambda url: urllib.urlopen(url).read())
>>> d = pq(filename=path_to_html_file)
>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> p.html()
'Hello world !'
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> p.html()
u'you know <a href="http://python.org/">Python</a> rocks'
>>> p.text()
'you know Python rocks'
```
