Python - Don't follow redirect on one URL only

纵然是瞬间 submitted on 2020-01-23 11:13:28

Question


I'm wondering how to prevent urllib2 from following a redirect for one chosen URL only. I found the snippet below while browsing, but it seems to apply globally, and I only want to disable redirects for a certain URL:

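import urllib2

# Handler that refuses to follow any redirect: instead of requesting the new
# location, it returns the 3xx response itself, wrapped in an HTTPError.
class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result
    # Treat every redirect status code the same way
    http_error_301 = http_error_303 = http_error_307 = http_error_302

# Every request made through this opener is affected, hence "global"
opener = urllib2.build_opener(RedirectHandler())
# geturl() returns the original URL, because the redirect was not followed
webpage = opener.open('http://www.website.com').geturl()
print webpage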

I should also mention that I request the URL with urllib.urlopen('site.com'), and I want the first redirect to be allowed. For example, say site.com redirects to site.com/redirect; if site.com/redirect then tries to redirect again, to site.com/secondredirect, I would like the script to recognise "secondredirect" in the URL and stop that request from happening. I hope I have explained this well, and I look forward to some replies, as I have spent hours trying to figure this out.


Answer 1:


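urllib2 has no built-in switch for disabling redirect-following on a per-request basis. One option is httplib, the low-level module that urllib2 itself is built on. httplib never follows redirects, so you get the 3xx status and the Location header back and can decide yourself whether to request the next URL: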

>>> import httplib
>>> conn = httplib.HTTPConnection("www.bogosoft.com")
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
301 Moved Permanently
>>> print r1.getheader('Location')
http://www.bogosoft.com/new/location
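Applied to the scenario in the question, the httplib approach becomes a small loop that follows redirects by hand and stops before any hop whose target contains "secondredirect". The following is a minimal Python 3 sketch (in Python 3, httplib is named http.client); the function name fetch_until, its stop_marker parameter and the max_hops limit are illustrative assumptions, not part of any library API.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_until(url, stop_marker="secondredirect", max_hops=10):
    """Follow redirects manually, refusing any hop whose target contains stop_marker.

    Hypothetical helper. Returns (last_url, status, refused_target), where
    refused_target is None if the chain ended without hitting the marker.
    """
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.hostname, parts.port, timeout=10)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body before closing the connection
            location = resp.getheader("Location")
        finally:
            conn.close()
        if resp.status not in REDIRECT_CODES or location is None:
            return url, resp.status, None  # not a redirect: stop here
        target = urljoin(url, location)  # Location may be relative
        if stop_marker in target:
            return url, resp.status, target  # refuse this hop
        url = target  # allowed: request the next URL ourselves
    raise RuntimeError("too many redirects")
```

Because http.client never follows anything on its own, the loop is the only place where a redirect is acted on, so other rules (a different hop limit, a domain allow-list) could be checked at the same point.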

Another option is the third-party Requests library, which gives you finer-grained control over redirects; for example, requests.get(url, allow_redirects=False) returns the 3xx response for that single call instead of following it. In my opinion, Requests is the better choice here if you can use another library.




Answer 2:


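import urllib.request

class RedirectFilter(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        # newurl is the absolute URL the server wants to redirect to
        if newurl.endswith('.jpg'):
            return None  # refuse this redirect; opener.open() will raise HTTPError
        # otherwise follow the redirect as usual
        return urllib.request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, hdrs, newurl)

# build_opener accepts the handler class and instantiates it;
# only requests made through this opener use the filter
opener = urllib.request.build_opener(RedirectFilter)

opener.open('http://example.com/')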

This is for Python 3. For Python 2, replace urllib.request with urllib2. For the case in the question, test for 'secondredirect' in newurl instead of the .jpg suffix.
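As a hedged sketch of that filter tailored to the question: the handler below follows redirects normally but refuses one whose target contains "secondredirect", and the caller catches the resulting HTTPError to see where the server wanted to go. StopAtMarker, its marker attribute and open_without_second_redirect are hypothetical names chosen for this example.

```python
import urllib.error
import urllib.request


class StopAtMarker(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, except to URLs that contain `marker`."""

    marker = "secondredirect"  # hypothetical: substring that must not be followed

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl has already been made absolute by HTTPRedirectHandler
        if self.marker in newurl:
            return None  # refuse: the 3xx response is raised as HTTPError
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def open_without_second_redirect(url):
    """Hypothetical helper returning (last_url, status, refused_location)."""
    opener = urllib.request.build_opener(StopAtMarker)
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.geturl(), resp.status, None
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            # err.geturl() is the URL whose redirect was refused;
            # Location is returned exactly as the server sent it.
            return err.geturl(), err.code, err.headers.get("Location")
        raise
```

Only the opener built inside the helper carries the filter; code elsewhere that calls urllib.request.urlopen keeps the default redirect behaviour, which is what keeps the restriction scoped to the requests you choose.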



Source: https://stackoverflow.com/questions/19925583/python-dont-follow-redirect-on-one-url-only
