Can you provide examples of parsing HTML?

走了就别回头了 2020-11-22 13:49

How do you parse HTML with a variety of languages and parsing libraries?


When answering:

Individual comments will be linked to in answers to questions

29 answers
  •  有刺的猬
    2020-11-22 14:39

    language: Python
    library: lxml.html

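    import lxml.html

    # Build a small test document with three links
    # (markup assumed to be plain <a href> tags inside <html><body>).
    html = "<html><body>"
    for link in ("foo", "bar", "baz"):
        html += '<a href="%s">%s</a>' % (link, link)
    html += "</body></html>"

    tree = lxml.html.document_fromstring(html)
    # iterlinks() yields (element, attribute, link, pos) for each link found.
    for element, attribute, link, pos in tree.iterlinks():
        if attribute == "href":
            print(link)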
    

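    lxml also supports CSS selectors for traversing the DOM through its cssselect() method (recent lxml releases need the separate cssselect package installed for this), which makes using it very similar to using jQuery: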

    for a in tree.cssselect('a[href]'):
        print(a.get('href'))
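
    For comparison, where lxml is not installed, roughly the same href extraction can be sketched with the standard library's html.parser module. This is not part of the lxml approach above; the names LinkCollector, extract_links and sample are made up for illustration, and the sample markup is assumed to match the three-link document built earlier.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Assumed sample markup: three anchors inside a minimal document.
sample = "<html><body>" + "".join(
    '<a href="%s">%s</a>' % (link, link) for link in ("foo", "bar", "baz")
) + "</body></html>"

for href in extract_links(sample):
    print(href)
```

    Unlike lxml, html.parser is event-based and builds no tree, so there is no CSS or XPath querying; it only fits simple extraction jobs like this one.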
    
