Question
I am trying to load a URL that redirects to itself. I'm assuming it sets a cookie and then checks for it, but it never sees the cookie, so the requests go around in an infinite loop.
I have tried urllib2, urlfetch, and httplib2. None work.
I tried this though:
```python
import urllib2

url = "http://www.cafebonappetit.com/menu/your-cafe/collins-cmc/cafes/details/50/collins-bistro"

# Follow redirects, storing cookies the server sets and sending them back
redirect_handler = urllib2.HTTPRedirectHandler()
cookie_handler = urllib2.HTTPCookieProcessor()  # in-memory cookielib.CookieJar
opener = urllib2.build_opener(redirect_handler, cookie_handler)
page = opener.open(url)
```
This works in the shell, but not on Google App Engine. The documentation for urlfetch (http://code.google.com/appengine/docs/python/urlfetch/fetchfunction.html) says under follow_redirects: "Cookies are not handled upon redirection. If cookie handling is needed, set follow_redirects to False and handle both cookies and redirects manually."
I have no idea how to do this, and the documentation doesn't give any clues either.
I've searched extensively, and none of the reported issues I found have a solution that works for my problem.
Answer 1:
A little more explanation. At least the website's behavior is now explained: it wants a certain cookie, and if the cookie isn't set, it redirects to itself with a cookie-setting header. It's worth reading up on how cookies work: the website sends the cookie in a Set-Cookie header, and the browser must echo it back (with some variations) in a Cookie header. Python has a library for managing collections of cookies, cookielib, to help you with this.
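As a rough sketch of that round trip with cookielib (named http.cookiejar in Python 3, which this example assumes), the code below feeds some Set-Cookie values into a CookieJar and then asks the jar for the Cookie header a follow-up request should carry. FakeResponse, echo_cookies, the example URL, and the cookie values are all made up for illustration; in this sketch the jar only reads the response headers through info().

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response carrying Set-Cookie headers."""

    def __init__(self, set_cookie_values):
        self._headers = email.message.Message()
        for value in set_cookie_values:
            # Message.__setitem__ appends, so repeated Set-Cookie headers are kept
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


def echo_cookies(url, set_cookie_values):
    """Store cookies received from `url`, then return the Cookie header
    that a follow-up request to the same URL should send."""
    jar = http.cookiejar.CookieJar()
    first_request = urllib.request.Request(url)
    jar.extract_cookies(FakeResponse(set_cookie_values), first_request)

    follow_up = urllib.request.Request(url)
    jar.add_cookie_header(follow_up)  # adds "Cookie: name=value; ..."
    return follow_up.get_header("Cookie")


print(echo_cookies("http://www.example.com/menu",
                   ["session=abc123; Path=/", "pref=veg; Path=/"]))
```

Note how the attributes in Set-Cookie (Path here) are used by the jar to decide when to send a cookie, while only the name=value pairs go back to the server.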
It's best to use the native urlfetch API: its return object has a headers attribute, a dict containing all the response headers (e.g. the Set-Cookie header). To send specific headers, use the headers argument of urlfetch.fetch(). Here you will use the Cookie header, but remember that the format of the Cookie header you send is not the same as that of the Set-Cookie header you receive; that's where cookielib comes in.
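A minimal sketch of handling redirects and cookies by hand, as the urlfetch documentation suggests. To keep it runnable outside App Engine, the actual HTTP call is passed in as a hypothetical fetch(url, headers) callable; on App Engine it would wrap urlfetch.fetch(url, headers=headers, follow_redirects=False). Having fetch return headers as a list of (name, value) pairs is an assumption made so that repeated Set-Cookie headers are not lost. For brevity the cookies live in a SimpleCookie (the Cookie module in Python 2), and Domain, Path, and expiry are ignored; cookielib would handle those properly.

```python
from http.cookies import SimpleCookie
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_with_cookies(fetch, url, max_redirects=5):
    """Follow redirects manually, echoing back every cookie the server sets.

    `fetch(url, headers)` is a hypothetical callable returning
    (status_code, [(header_name, header_value), ...], body).
    """
    cookies = SimpleCookie()
    for _ in range(max_redirects + 1):
        request_headers = {}
        if cookies:
            # Cookie header format: "name1=value1; name2=value2"
            request_headers["Cookie"] = "; ".join(
                "%s=%s" % (name, morsel.value) for name, morsel in cookies.items())

        status, response_headers, body = fetch(url, request_headers)

        # Remember every Set-Cookie header, not just the last one
        for name, value in response_headers:
            if name.lower() == "set-cookie":
                cookies.load(value)

        if status not in REDIRECT_CODES:
            return status, body

        location = dict((n.lower(), v) for n, v in response_headers)["location"]
        url = urljoin(url, location)  # Location may be a relative URL
    raise RuntimeError("too many redirects")
```

For example, against a fake server that keeps redirecting to itself until the cookie comes back, the loop makes two requests and then returns the page instead of looping forever.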
Good luck!
P.S. Running curl -v against the URL shows that the site actually sends three different Set-Cookie headers. You probably have to handle all three.
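To see why all three matter: a plain dict of headers can only keep one value per name. This sketch parses a raw header block with the standard email package (HTTP headers use the same name: value syntax) and compares get_all() with a dict. The header block and cookie names are invented for illustration.

```python
import email

# Hypothetical response headers with three Set-Cookie lines
RAW_HEADERS = (
    "Location: /menu\r\n"
    "Set-Cookie: a=1; Path=/\r\n"
    "Set-Cookie: b=2; Path=/\r\n"
    "Set-Cookie: c=3; Path=/\r\n"
    "\r\n"
)


def all_set_cookie_headers(raw_headers):
    """Every Set-Cookie value, in the order the server sent them."""
    message = email.message_from_string(raw_headers)
    return message.get_all("Set-Cookie", [])


def set_cookie_via_dict(raw_headers):
    """What survives in a plain dict: only the last Set-Cookie value."""
    message = email.message_from_string(raw_headers)
    return dict(message.items()).get("Set-Cookie")


print(all_set_cookie_headers(RAW_HEADERS))
print(set_cookie_via_dict(RAW_HEADERS))
```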
Source: https://stackoverflow.com/questions/9420795/urlfetch-redirected-into-an-infinite-loop-in-python