So I\'m trying to make a Python script that downloads webcomics and puts them in a folder on my desktop. I\'ve found a few similar programs on here that do something simila
import urllib
f = open('00000001.jpg','wb')
f.write(urllib.urlopen('http://www.gunnerkrigg.com//comics/00000001.jpg').read())
f.close()
If you know that the files are located in the same directory dir
of the website site
and have the following format: filename_01.jpg, ..., filename_10.jpg then download all of them:
import requests
for x in range(1, 10):
str1 = 'filename_%2.2d.jpg' % (x)
str2 = 'http://site/dir/filename_%2.2d.jpg' % (x)
f = open(str1, 'wb')
f.write(requests.get(str2).content)
f.close()
All the above codes, do not allow to preserve the original image name, which sometimes is required. This will help in saving the images to your local drive, preserving the original image name
IMAGE = URL.rsplit('/',1)[1]
urllib.urlretrieve(URL, IMAGE)
Try this for more details.
If you need proxy support you can do this:
if needProxy == False:
returnCode, urlReturnResponse = urllib.urlretrieve( myUrl, fullJpegPathAndName )
else:
proxy_support = urllib2.ProxyHandler({"https":myHttpProxyAddress})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
urlReader = urllib2.urlopen( myUrl ).read()
with open( fullJpegPathAndName, "w" ) as f:
f.write( urlReader )
Another way to do this is via the fastai library. This worked like a charm for me. I was facing a SSL: CERTIFICATE_VERIFY_FAILED Error
using urlretrieve
so I tried that.
url = 'https://www.linkdoesntexist.com/lennon.jpg'
fastai.core.download_url(url,'image1.jpg', show_progress=False)
Using requests
import requests
import shutil,os
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir,'Images')#saving images to Images folder
def ImageDl(url):
attempts = 0
while attempts < 5:#retry 5 times
try:
filename = url.split('/')[-1]
r = requests.get(url,headers=headers,stream=True,timeout=5)
if r.status_code == 200:
with open(os.path.join(path,filename),'wb') as f:
r.raw.decode_content = True
shutil.copyfileobj(r.raw,f)
print(filename)
break
except Exception as e:
attempts+=1
print(e)
if __name__ == '__main__':
ImageDl(url)