So I'm trying to make a Python script that downloads webcomics and puts them in a folder on my desktop. I've found a few similar programs on here that do something simila
Maybe you need to send a 'User-Agent' header (this is Python 2, using urllib2):
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36')]
response = opener.open('http://google.com')
htmlData = response.read()
with open('file.txt', 'w') as f:
    f.write(htmlData)
With urllib.request (Python 3), this takes only a few lines:
import urllib.request
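opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
# Make every later urllib.request call use this opener
urllib.request.install_opener(opener)
# URL is the address of the image; the images/ directory must already exist
urllib.request.urlretrieve(URL, "images/0.jpg")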
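If you would rather not install a global opener, the header can be attached to a single request instead. This is a minimal sketch, assuming Python 3; the helper name `download_with_user_agent` and the shortened User-Agent string are made up for illustration.

```python
import os
import shutil
import urllib.request

# Illustrative value; any browser-like User-Agent string can be used here.
USER_AGENT = "Mozilla/5.0"


def download_with_user_agent(url, filename, user_agent=USER_AGENT):
    """Fetch url with a custom User-Agent header and save the body to filename."""
    # Create the target directory first; opening the file fails if it is missing.
    directory = os.path.dirname(filename)
    if directory:
        os.makedirs(directory, exist_ok=True)

    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

Calling `download_with_user_agent(URL, "images/0.jpg")` should then behave like the snippet above, without changing how other urllib.request calls in the program work.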
Just for the record, here is the same thing using the requests library:
import requests
with open('00000001.jpg', 'wb') as f:
    f.write(requests.get('http://www.gunnerkrigg.com//comics/00000001.jpg').content)
Though it should check whether requests.get() failed before writing the file.
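That check could look roughly like this. It is a sketch rather than a definitive implementation: `save_image` and its `session` parameter are names made up here (the parameter only exists so a stand-in object can replace requests in a test), and the 10-second timeout is an arbitrary choice.

```python
def save_image(url, filename, session=None, timeout=10):
    """Download url to filename, raising if the server reports an error."""
    if session is None:
        # requests is a third-party package (pip install requests).
        import requests
        session = requests.Session()

    response = session.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses, so an error page
    # is never written to disk as if it were an image.
    response.raise_for_status()

    with open(filename, "wb") as f:
        f.write(response.content)
    return filename
```

For example, `save_image('http://www.gunnerkrigg.com//comics/00000001.jpg', '00000001.jpg')`. Connection problems and timeouts raise other subclasses of `requests.RequestException`, so catching that one class around the call covers both kinds of failure.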
Python 2
Using urllib.urlretrieve
import urllib
urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
Python 3
Using urllib.request.urlretrieve (part of Python 3's legacy interface, works exactly the same)
import urllib.request
urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
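If one script has to run under both versions, the import can be chosen at run time. A small sketch; the helper name `download` is just picked for this example:

```python
try:
    # Python 3
    from urllib.request import urlretrieve
except ImportError:
    # Python 2
    from urllib import urlretrieve


def download(url, filename):
    """Save url to filename and return the path that was written."""
    path, _headers = urlretrieve(url, filename)
    return path
```

After that, `download("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")` is the same call on either version.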
This worked for me using Python 3.
It reads a list of URLs from a CSV file and downloads them into a folder. If an image does not exist, it catches the exception and moves on to the next URL.
import urllib.request
import csv
import os
errorCount = 0
# Replace $USER with your user name; Python does not expand shell variables in strings
file_list = "/Users/$USER/Desktop/YOUR-FILE-TO-DOWNLOAD-IMAGES/image_{0}.jpg"
# The CSV file must be comma-separated, with the URL in the first column
# urls.csv is expected in the current working directory; cd into it or use the full path
with open('urls.csv') as images:
    images = csv.reader(images)
    img_count = 1
    print("Please wait, this will take some time")
    for image in images:
        try:
            urllib.request.urlretrieve(image[0],
                                       file_list.format(img_count))
            img_count += 1
        except IOError:
            errorCount += 1
            # Stop once more than 100 downloads have failed
            if errorCount > 100:
                break
            else:
                print("File does not exist")
print("Done!")
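One thing to watch in the script above is that Python does not expand `$USER` inside a string, so that path has to be edited by hand. Here is a sketch of the same loop that takes the target folder as an argument and creates it if needed. The names `download_all`, `max_errors` and `fetch` are assumptions made for illustration; `fetch` only exists so the download function can be swapped out.

```python
import csv
import os
import urllib.request


def download_all(csv_path, target_dir, max_errors=100,
                 fetch=urllib.request.urlretrieve):
    """Download every URL in the first column of csv_path into target_dir.

    Returns a (downloaded, failed) pair of counts.
    """
    os.makedirs(target_dir, exist_ok=True)
    downloaded = failed = 0
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            target = os.path.join(target_dir,
                                  "image_{0}.jpg".format(downloaded + 1))
            try:
                fetch(row[0], target)
                downloaded += 1
            except (OSError, ValueError) as error:
                # URLError is a subclass of OSError; ValueError covers malformed URLs.
                failed += 1
                print("Skipping {0}: {1}".format(row[0], error))
                if failed > max_errors:
                    break
    return downloaded, failed
```

Called as `download_all("urls.csv", os.path.join(os.path.expanduser("~"), "Desktop", "webcomics"))`, it would write into a `webcomics` folder (a hypothetical name) on the current user's desktop, whatever the user name is.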
For Python 3 you will need to import urllib.request:
import urllib.request
urllib.request.urlretrieve(url, filename)
For more info, check out the link.