Extract image links from the webpage using Python

后端未结

关注

 3  678

走了就别回头了 2021-01-07 00:39

So I wanted to get all of the pictures on this page(of the nba teams). http://www.cbssports.com/nba/draft/mock-draft

However, my code gives a bit more than that. It

3条回答

孤城傲影 (楼主)

2021-01-07 01:41

You can use this functions for getting the list of all images url from url.

#
#
# get_url_images_in_text()
#
# @param html - the html to extract urls of images from him.
# @param protocol - the protocol of the website, for append to urls that not start with protocol.
#
# @return list of imags url.
#
#
def get_url_images_in_text(html, protocol):
    urls = []
    all_urls = re.findall(r'((http\:|https\:)?\/\/[^"\' ]*?\.(png|jpg))', html, flags=re.IGNORECASE | re.MULTILINE | re.UNICODE)
    for url in all_urls:
        if not url[0].startswith("http"):
            urls.append(protocol + url[0])
        else:
            urls.append(url[0])

    return urls

#
#
# get_images_from_url()
#
# @param url - the url for extract images url from him. 
#
# @return list of images url.
#
#
def get_images_from_url(url):
    protocol = url.split('/')[0]
    resp = requests.get(url)
    return get_url_images_in_text(resp.text, protocol)

0 讨论(0)

查看其它3个回答