urlfetch redirected into an infinite loop in Python

Posted by 和自甴很熟 on 2020-01-14 18:46:46

Question


I am trying to load a URL which redirects to itself. I'm assuming it's setting a cookie and then looking for it, but it never sees it, so there is an infinite loop of requests.

I have tried urllib2, urlfetch, and httplib2. None work.

I tried this though:

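import urllib2

# The page that keeps redirecting to itself.
url = "http://www.cafebonappetit.com/menu/your-cafe/collins-cmc/cafes/details/50/collins-bistro"
redirect_handler = urllib2.HTTPRedirectHandler()
cookie_handler = urllib2.HTTPCookieProcessor()  # stores cookies and sends them back
opener = urllib2.build_opener(redirect_handler, cookie_handler)
# Note: this second assignment replaces the URL above.
url = 'http://www.nytimes.com/2005/10/26/business/26fed.html?pagewanted=print'
page = opener.open(url)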

This works in a shell, but not on Google App Engine. The documentation for urlfetch (http://code.google.com/appengine/docs/python/urlfetch/fetchfunction.html) says, under follow_redirects: "Cookies are not handled upon redirection. If cookie handling is needed, set follow_redirects to False and handle both cookies and redirects manually."

I have no idea how to do this and the documentation doesn't seem to give any clues either.

I have searched extensively for this issue and found no reports with a solution that works for my problem.


Answer 1:


A little more explanation. I'm glad that at least the website's behavior is explained: it wants some cookie, and if the cookie isn't set, it redirects to itself with a cookie-setting header. You should probably read up on how cookies work: the website sends the cookie using a Set-Cookie header, and the browser must echo it back (with some variations) in a Cookie header. Python has a library for managing collections of cookies, cookielib, to help you with this.
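To make the difference between the two header formats concrete, here is a minimal sketch. It uses SimpleCookie from the standard Cookie module (http.cookies on Python 3) rather than cookielib, simply because it can parse a raw header string directly. The function name cookie_header_from and the sample header values are made up for illustration; they are not what the real site sends.

```python
try:
    from http.cookies import SimpleCookie  # Python 3
except ImportError:
    from Cookie import SimpleCookie        # Python 2


def cookie_header_from(set_cookie_values):
    """Build a Cookie request header value from received Set-Cookie values.

    Attributes such as Path or Max-Age are dropped: they tell the client how
    to store the cookie and are not echoed back to the server.
    """
    pairs = []
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)
        for name, morsel in parsed.items():
            # coded_value is the value exactly as the server encoded it.
            pairs.append('%s=%s' % (name, morsel.coded_value))
    return '; '.join(pairs)


# Hypothetical Set-Cookie values, one string per header received.
received = [
    'session=abc123; Path=/',
    'region=west; Path=/; Max-Age=3600',
]
print(cookie_header_from(received))
```

This sketch ignores domain, path, and expiry matching, which a real cookie jar such as cookielib takes care of.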

It's best to use the native urlfetch API; the object it returns has a headers attribute, a dict giving all the response headers (e.g. the Set-Cookie header). To send specific headers, use the headers argument to the urlfetch.fetch() function. Here you will use the Cookie header (but remember that the format of the Cookie header you set is not the same as that of the Set-Cookie header you receive; that's where cookielib comes in).
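Putting this together, a manual redirect loop might look roughly like the sketch below. It is an illustration under assumptions, not a tested App Engine solution: fetch_with_cookies and appengine_fetch are hypothetical helper names, SimpleCookie stands in for cookielib to keep the example short, and the cookie store is a plain dict that ignores path, domain, and expiry. The fetch step is passed in as a callable so the loop can be tried outside App Engine.

```python
try:
    from http.cookies import SimpleCookie   # Python 3
    from urllib.parse import urljoin
except ImportError:
    from Cookie import SimpleCookie         # Python 2
    from urlparse import urljoin

REDIRECT_CODES = (301, 302, 303, 307)


def fetch_with_cookies(url, fetch, max_redirects=5):
    """Follow redirects by hand, sending back any cookies the server sets.

    `fetch` is a callable (url, headers) -> (status, header_pairs, body),
    where header_pairs is a list of (name, value) tuples.
    """
    jar = {}  # cookie name -> value (simplified cookie store)
    for _ in range(max_redirects + 1):
        request_headers = {}
        if jar:
            request_headers['Cookie'] = '; '.join(
                '%s=%s' % item for item in sorted(jar.items()))
        status, header_pairs, body = fetch(url, request_headers)

        # Remember every cookie set by this response.
        for name, value in header_pairs:
            if name.lower() == 'set-cookie':
                parsed = SimpleCookie()
                parsed.load(value)
                for key, morsel in parsed.items():
                    jar[key] = morsel.coded_value

        location = dict((n.lower(), v) for n, v in header_pairs).get('location')
        if status not in REDIRECT_CODES or location is None:
            return status, body
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError('too many redirects')


def appengine_fetch(url, headers):
    """Adapter for urlfetch; only usable inside the App Engine runtime."""
    from google.appengine.api import urlfetch
    response = urlfetch.fetch(url, headers=headers, follow_redirects=False)
    # Caveat (assumption): several Set-Cookie headers may arrive merged into
    # one value in response.headers; splitting them is not handled here.
    return response.status_code, list(response.headers.items()), response.content
```

On App Engine the call would be fetch_with_cookies(url, appengine_fetch). The max_redirects cap is there so that a site which never accepts the cookies fails with an error instead of looping forever.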

Good luck!

PS. Using curl -v it's easy to see that the site actually sends three different Set-Cookie headers. You probably have to deal with all three.



Source: https://stackoverflow.com/questions/9420795/urlfetch-redirected-into-an-infinite-loop-in-python
