Can you provide examples of parsing HTML?

走了就别回头了 2020-11-22 13:49

How do you parse HTML with a variety of languages and parsing libraries?

When answering:

Individual comments will be linked to in answers to questions

29 answers

有刺的猬 (OP)

2020-11-22 14:39

language: Python
library: lxml.html

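import lxml.html

# Build a small test document containing three links.
html = "<html><body>"
for link in ("foo", "bar", "baz"):
    html += '<a href="%s">%s</a>' % (link, link)
html += "</body></html>"

tree = lxml.html.document_fromstring(html)
# iterlinks() yields (element, attribute, link, pos) for every link in the tree.
for element, attribute, link, pos in tree.iterlinks():
    if attribute == "href":
        print(link)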

lxml also supports CSS selectors for traversing the DOM through the cssselect method (which relies on the separately installed cssselect package), so using it can feel very similar to using jQuery:

for a in tree.cssselect('a[href]'):
    print(a.get('href'))
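
For comparison, if lxml is not available, roughly the same href extraction can be sketched with the standard library's html.parser instead. This is a different technique from the lxml calls in this answer, not a drop-in replacement; the LinkCollector class and the sample markup are illustrative assumptions, not part of the original answer.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Assumed sample markup: three anchors whose href equals their text.
html = "<html><body>"
for link in ("foo", "bar", "baz"):
    html += '<a href="%s">%s</a>' % (link, link)
html += "</body></html>"

collector = LinkCollector()
collector.feed(html)
collector.close()
for href in collector.links:
    print(href)
```

Unlike iterlinks(), this sketch only looks at a[href]; it does not report links held in other attributes such as src.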
