Extract image links from the webpage using Python

走了就别回头了 2021-01-07 00:39

I want to get all of the pictures of the NBA teams on this page: http://www.cbssports.com/nba/draft/mock-draft

However, my code gives a bit more than that. It

3 answers
  •  孤城傲影
    2021-01-07 01:41

    You can use these functions to get a list of all the image URLs on a page, given the page's URL.

    import re

    import requests


    def get_url_images_in_text(html, protocol):
        """Return a list of the image URLs (.png or .jpg) found in html.

        html     -- the HTML text to search for image URLs.
        protocol -- the scheme of the website, including the colon (e.g. "http:"),
                    prepended to protocol-relative URLs that start with "//".
        """
        urls = []
        # Non-capturing groups make findall return each full match as a string.
        all_urls = re.findall(r'(?:https?:)?//[^"\' ]*?\.(?:png|jpg)', html, flags=re.IGNORECASE)
        for url in all_urls:
            if not url.startswith("http"):
                urls.append(protocol + url)
            else:
                urls.append(url)

        return urls


    def get_images_from_url(url):
        """Download the page at url and return a list of its image URLs."""
        # "http://host/path".split('/')[0] yields the scheme with its colon, e.g. "http:".
        protocol = url.split('/')[0]
        resp = requests.get(url)
        return get_url_images_in_text(resp.text, protocol)
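
The regex above matches any .png or .jpg URL anywhere in the page text, including ones that are not inside an image tag, which may be one reason a scraper returns more than expected. As a rough alternative sketch (not the answer's method), the standard library's `html.parser` can collect only the `src` attributes of `<img>` tags, and `urllib.parse.urljoin` can resolve relative and protocol-relative paths. The names `ImageSrcParser`, `extract_image_urls`, and the sample HTML with its example.com URLs are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Hypothetical helper: collect the src attribute of every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # urljoin resolves relative and protocol-relative URLs
                # against the URL of the page they came from.
                self.image_urls.append(urljoin(self.base_url, value))


def extract_image_urls(html, base_url):
    """Return the absolute URLs of all <img src=...> values in html."""
    parser = ImageSrcParser(base_url)
    parser.feed(html)
    return parser.image_urls


# Made-up sample page; the URLs are placeholders, not real resources.
sample_html = """
<html><body>
  <img src="//cdn.example.com/logos/team-a.png">
  <img src="/images/team-b.jpg" alt="Team B">
  <img src="https://example.com/pics/team-c.gif">
  <a href="https://example.com/not-an-image.png">link</a>
</body></html>
"""

urls = extract_image_urls(sample_html, "http://www.example.com/nba/draft/mock-draft")
for u in urls:
    print(u)
```

In this sketch the link in the `<a>` tag is skipped because only `<img>` tags are inspected, and the .gif is kept because nothing filters on file extension. For a live page, the text returned by `requests.get(url).text` could be passed in place of `sample_html`.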
    
