Question
How can I prevent urllib2 from following a redirect for one specific URL? I found this snippet while browsing, but it seems to apply globally, and I only want to disable redirects for a certain URL:
import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result
    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib2.build_opener(RedirectHandler())
webpage = opener.open('http://www.website.com').geturl()
print webpage
I should also mention that I am requesting the URL with urllib.urlopen('site.com'), and I want the first redirect to be allowed. For example, say site.com redirects to site.com/redirect, and then site.com/redirect tries to redirect again to site.com/secondredirect. I would like the script to recognise "secondredirect" in the URL and stop that request from happening. I hope I have explained this well; I have spent hours trying to figure it out.
Answer 1:
urllib2 has no built-in per-request option to disable redirect-following. One alternative is httplib, the lower-level module that urllib2 is built on; it returns the redirect response to you instead of following it.
>>> import httplib
>>> conn = httplib.HTTPConnection("www.bogosoft.com")
>>> conn.request("GET", "")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
301 Moved Permanently
>>> print r1.getheader('Location')
http://www.bogosoft.com/new/location
Another option is the Python Requests library, which gives you more fine-grained control over how redirects are handled. In my opinion, Requests is the better choice here if you are able to use a third-party library.
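To make the low-level approach concrete for the case in the question, here is a minimal Python 3 sketch (httplib is named http.client there) that follows redirects one hop at a time and stops before any hop whose target contains a marker string. The names fetch_once and follow_until, the stop_marker and max_hops parameters, and the injectable fetch argument are all assumptions made for illustration; none of them come from the standard library.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_once(url):
    """Issue a single GET without following redirects.

    Returns (status, value of the Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc)
    else:
        conn = http.client.HTTPConnection(parts.netloc)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def follow_until(url, stop_marker, max_hops=10, fetch=fetch_once):
    """Follow redirects manually, refusing any target containing stop_marker.

    Returns (last_url_requested, blocked_target); blocked_target is None
    when the chain ended without hitting the marker.
    The `fetch` parameter is a hypothetical hook so the hop logic can be
    exercised without network access.
    """
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or location is None:
            return url, None
        # Location may be relative, so resolve it against the current URL.
        target = urljoin(url, location)
        if stop_marker in target:
            return url, target
        url = target
    raise RuntimeError("too many redirects")
```

Calling follow_until('http://site.com/', 'secondredirect') would, under these assumptions, allow the hop to site.com/redirect and return as soon as the next Location points at site.com/secondredirect.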
Answer 2:
import urllib.request

class RedirectFilter(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        if newurl.endswith('.jpg'):
            return None  # do not redirect, HTTPError will be raised
        return urllib.request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, hdrs, newurl)

opener = urllib.request.build_opener(RedirectFilter)
opener.open('http://example.com/')
This is for Python 3. For Python 2, replace urllib.request with urllib2.
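The same idea can be adapted to the "secondredirect" case from the question. The sketch below is for Python 3 and is only one possible arrangement: the class name MarkerRedirectFilter, the helper open_until_blocked, and the blocked_marker parameter are hypothetical names chosen for illustration. When redirect_request returns None, the redirect is not followed and the opener raises urllib.error.HTTPError for the 3xx response, which the helper catches.

```python
import urllib.error
import urllib.request


class MarkerRedirectFilter(urllib.request.HTTPRedirectHandler):
    """Follow redirects unless the target URL contains blocked_marker."""

    def __init__(self, blocked_marker):
        super().__init__()
        self.blocked_marker = blocked_marker

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if self.blocked_marker in newurl:
            # Returning None means "do not follow"; the opener then
            # raises HTTPError for this 3xx response.
            return None
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def open_until_blocked(url, blocked_marker):
    """Open url, following redirects until one points at blocked_marker.

    Returns (final_url, blocked_location); blocked_location is None when
    no redirect was refused. Hypothetical helper, not part of urllib.
    """
    # The filter is installed on this opener only, so other code that
    # uses urllib.request.urlopen keeps the default redirect behaviour.
    opener = urllib.request.build_opener(MarkerRedirectFilter(blocked_marker))
    try:
        with opener.open(url) as resp:
            return resp.geturl(), None
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if location and blocked_marker in location:
            return err.url, location
        raise  # any other HTTP error is passed on to the caller
```

With the URLs from the question, open_until_blocked('http://site.com', 'secondredirect') would be expected to follow the first redirect and stop at site.com/redirect. Because the filter lives on its own opener, it applies only to requests made through that opener rather than globally.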
Source: https://stackoverflow.com/questions/19925583/python-dont-follow-redirect-on-one-url-only