Download image from URL using Python urllib but receiving HTTP Error 403: Forbidden

广开言路 2020-11-27 20:44

I want to download an image file from a URL using the Python module "urllib.request". This works for some websites (e.g. mangastream.com), but does not work for another (mangadoo

3 Answers
  • 2020-11-27 20:57

    I tried wget with the URL in a terminal and it works:

    wget -O out_005.png  http://mangadoom.co/wp-content/manga/5170/886/005.png
    

    So my workaround is to call wget from the script below, which works too.

    import os
    out_image = 'out_005.png'
    url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
    os.system("wget -O {0} {1}".format(out_image, url))
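
    A slightly more defensive variant of the same workaround can be sketched as follows, assuming wget is installed and on the PATH. Passing the arguments as a list to subprocess.run avoids shell quoting problems with unusual URLs, and check=True raises an exception if wget exits with an error. The helper names build_wget_command and download_with_wget are made up for this illustration.

```python
import subprocess


def build_wget_command(url, out_path):
    """Return the wget argument list; no shell is involved, so no quoting is needed."""
    return ["wget", "-O", out_path, url]


def download_with_wget(url, out_path):
    """Run wget and raise subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(build_wget_command(url, out_path), check=True)


# Example call (requires wget and network access):
# download_with_wget("http://mangadoom.co/wp-content/manga/5170/886/005.png", "out_005.png")
```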
    
  • You can build an opener. Here's the example:

    import urllib.request
    
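    # Build an opener that sends a browser-like User-Agent with every request.
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
    # Install it globally so that urlretrieve() uses it as well.
    urllib.request.install_opener(opener)
    
    url = ''    # URL of the image to download
    local = ''  # local path to save the file to
    urllib.request.urlretrieve(url, local)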
    

    By the way, the following two snippets are equivalent:

    (without an opener)

    req = urllib.request.Request(url, data, hdr)
    html = urllib.request.urlopen(req)
    

    (with the opener built above)

    html = opener.open(url, data, timeout)
    

    However, we cannot pass headers when we use:

    urllib.request.urlretrieve()
    

    So in this case, we have to build an opener.
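
    If installing a global opener is undesirable, a possible alternative is to skip urlretrieve and save the response of a Request that carries the header. The following is only a sketch: the helper names build_request and download and the short "Mozilla/5.0" value are assumptions made for illustration, and a given site may expect a fuller browser string.

```python
import shutil
import urllib.request

# Assumed browser-like value; any User-Agent the site accepts would do.
USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, local_path, user_agent=USER_AGENT):
    """Stream the response body to local_path without changing global state."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        with open(local_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
```

    Because the header travels with the Request object, other code that calls urllib.request in the same process keeps its default behavior.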

  • 2020-11-27 21:11

    This website is blocking the user-agent used by urllib, so you need to change it in your request. Unfortunately I don't think urlretrieve supports this directly.
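
    To see which user agent urllib sends by default, one can inspect the default headers of a freshly built opener. This is a small sketch with illustrative variable names; the version suffix in the printed value depends on the Python version in use.

```python
import urllib.request

# A freshly built opener carries urllib's default headers.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# The default value has the form "Python-urllib/<version>".
default_user_agent = default_headers["User-agent"]
print(default_user_agent)
```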

    I recommend using the excellent requests library; the code becomes (from here):

    import requests
    import shutil
    
    r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png', stream=True)
    if r.status_code == 200:
        with open("img.png", 'wb') as f:
            r.raw.decode_content = True
            shutil.copyfileobj(r.raw, f)
    

    Note that this website does not seem to forbid the default requests user agent. But if it needs to be changed, that is easy:

    r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png',
                     stream=True, headers={'User-agent': 'Mozilla/5.0'})
    

    Also relevant: changing user-agent in urllib
