Python BeautifulSoup equivalent to lxml make_links_absolute

旧巷少年郎 2021-02-15 15:31

So lxml has a very handy feature, make_links_absolute:

import lxml.html

doc = lxml.html.fromstring(some_html_page)
doc.make_links_absolute(url_for_some_html_page)
1 Answer
  • 2021-02-15 15:59

    In my answer to "What is a simple way to extract the list of URLs on a webpage using python?" I covered this incidentally as part of the extraction step. You can just as easily write a function that rewrites the links in the soup in place instead of only extracting them.

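    from urllib.parse import urljoin

    def make_links_absolute(soup, url):
        # Resolve every <a href> against the page URL, modifying the soup in place.
        # find_all is the current BeautifulSoup 4 name for the older findAll.
        for tag in soup.find_all('a', href=True):
            tag['href'] = urljoin(url, tag['href'])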
    

    (Python 2: from urlparse import urljoin.)
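
As far as I know, lxml's make_links_absolute is not limited to `<a href>`; it also rewrites other link-bearing attributes such as `src`. A rough sketch of how the function above could be extended to get closer to that behaviour follows. The `LINK_ATTRS` tuple, the `attrs` parameter, and the example URLs are assumptions made for illustration, not a complete list of what lxml handles.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed attribute list for illustration; lxml's own coverage is broader.
LINK_ATTRS = ("href", "src", "action")


def make_links_absolute(soup, base_url, attrs=LINK_ATTRS):
    """Rewrite relative URLs in the given attributes, modifying the soup in place."""
    for attr in attrs:
        # attrs={attr: True} matches any tag that carries the attribute.
        for tag in soup.find_all(attrs={attr: True}):
            tag[attr] = urljoin(base_url, tag[attr])
    return soup


html = (
    '<a href="page2.html">next</a>'
    '<img src="/img/logo.png">'
    '<a href="https://other.example/x">external</a>'
)
soup = BeautifulSoup(html, "html.parser")
make_links_absolute(soup, "https://example.com/docs/index.html")

# Collect the rewritten anchors; already-absolute URLs pass through unchanged.
links = [tag["href"] for tag in soup.find_all("a", href=True)]
print(links)
```

Because urljoin leaves absolute URLs untouched, it is safe to run this over every link in the page without checking first which ones are relative.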
