jQuery-like HTML parsing in Python?

梦毁少年i 2020-12-04 10:54

Is there any Python library that allows me to parse an HTML document similar to what jQuery does?

That is, I'd like to be able to use CSS selectors.

4 Answers
  • 2020-12-04 11:36

    If you are fluent with BeautifulSoup, you could just add soupselect to your libs.
    Soupselect is a CSS selector extension for BeautifulSoup.

    Usage:

    >>> from BeautifulSoup import BeautifulSoup as Soup
    >>> from soupselect import select
    >>> import urllib
    >>> soup = Soup(urllib.urlopen('http://slashdot.org/'))
    >>> select(soup, 'div.title h3')
    [<h3><span><a href='//science.slashdot.org/'>Science</a>:</span></h3>,
     <h3><a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek</h3>,
    ..]
    
  • 2020-12-04 11:43

    BeautifulSoup now supports CSS selectors.

    import requests
    from bs4 import BeautifulSoup as Soup
    html = requests.get('https://stackoverflow.com/questions/3051295').content
    # Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
    soup = Soup(html, 'html.parser')
    

    Title of this question

    soup.select('h1.grid--cell :first-child')[0].text
    

    Number of question upvotes

    # first item 
    soup.select_one('[itemprop="upvoteCount"]').text
    

    This uses the Python Requests library to fetch the HTML page.
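
    Since the markup of a live page changes over time, here is a minimal offline sketch of the same `select` / `select_one` calls. The HTML string, class names and ids are made up for illustration and are not taken from the page above.

```python
from bs4 import BeautifulSoup

# Hypothetical inline document, so the example runs without network access.
HTML = """
<html><body>
  <h1 class="title"><a href="/q/1">jQuery-like HTML parsing</a></h1>
  <div class="votes" itemprop="upvoteCount">42</div>
  <ul id="tags"><li>python</li><li>html</li></ul>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select_one returns the first matching element, or None if nothing matches.
title = soup.select_one("h1.title > a").get_text()
votes = int(soup.select_one('[itemprop="upvoteCount"]').get_text())
missing = soup.select_one("p.does-not-exist")

# select returns a list of every matching element.
tags = [li.get_text() for li in soup.select("#tags li")]

print(title, votes, tags, missing)
```

    Checking the result of `select_one` for `None` before calling `.text` avoids an `AttributeError` when a selector stops matching.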

  • 2020-12-04 11:55

    The lxml library supports CSS selectors.

  • 2020-12-04 11:59

    Consider PyQuery:

    http://packages.python.org/pyquery/

    >>> from pyquery import PyQuery as pq
    >>> from lxml import etree
    >>> import urllib.request
    >>> d = pq("<html></html>")
    >>> d = pq(etree.fromstring("<html></html>"))
    >>> d = pq(url='http://google.com/')
    >>> d = pq(url='http://google.com/', opener=lambda url, **kw: urllib.request.urlopen(url).read())
    >>> d = pq(filename=path_to_html_file)
    >>> d("#hello")
    [<p#hello.hello>]
    >>> p = d("#hello")
    >>> p.html()
    'Hello world !'
    >>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
    [<p#hello.hello>]
    >>> p.html()
    'you know <a href="http://python.org/">Python</a> rocks'
    >>> p.text()
    'you know Python rocks'
    