302s and losing cookies with urllib2

后端未结

关注

 4  1885

I am using liburl2 with CookieJar / HTTPCookieProcessor in an attempt to simulate a login to a page to automate an upload.

I\'ve seen some questions and answers on this,

相关标签:

4条回答

野趣味

2021-01-21 06:54

I was also having the same problem where the server would respond to the login POST request with a 302 and the session token in the Set-Cookie header. Using Wireshark it was clearly visible that urllib was following the redirect but not including the session token in the Cookie.

I literally just ripped out urllib and did a direct replacement with requests and it worked perfectly first time without having to change a thing. Big props to those guys.

0 讨论(0)
发布评论:

提交评论
- 加载中...
执笔经年

2021-01-21 07:01
Depends on how the redirect is done. If it's done via a HTTP Refresh, then mechanize has a HTTPRefreshProcessor you can use. Try to create an opener like this:
```
cj = mechanize.CookieJar()
opener = mechanize.build_opener(
    mechanize.HTTPCookieProcessor(cj),
    mechanize.HTTPRefererProcessor,
    mechanize.HTTPEquivProcessor,
    mechanize.HTTPRefreshProcessor)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
自闭症患者

2021-01-21 07:05
I have been having the exact same problem recently but in the interest of time scrapped it and decided to go with mechanize. It can be used as a total replacement for urllib2 that behaves exactly as you would expect a browser to behave with regards to Referer headers, redirects, and cookies.
```
import mechanize
cj = mechanize.CookieJar()
browser = mechanize.Browser()
browser.set_cookiejar(cj)
browser.set_proxies({'http': '127.0.0.1:8888'})

# Use browser's handlers to create a new opener
opener = mechanize.build_opener(*browser.handlers)
```
The Browser object can be used as an opener itself (using the .open() method). It maintains state internally but also returns a response object on every call. So you get a lot of flexibility.

Also, if you don't have a need to inspect the cookiejar manually or pass it along to something else, you can omit the explicit creation and assignment of that object as well.

I am fully aware this doesn't address what is really going on and why urllib2 can't provide this solution out of the box or at least without a lot of tweaking, but if you're short on time and just want it to work, just use mechanize.
0 讨论(0)
发布评论:

提交评论
- 加载中...

Happy的楠姐

2021-01-21 07:05

I've just got a variation of the below working for me, at least when trying to read Atom from http://www.fudzilla.com/home?format=feed&type=atom

I can't verify that the below snippet will run as-is, but might give you a start:

import cookielib
cookie_jar = cookielib.LWPCookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cookie_jar)
handlers = [cookie_handler] #+others, we have proxy + progress handlers
opener = apply(urllib2.build_opener, tuple(handlers + [_FeedURLHandler()])) #see http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py#2848 for implementation of _FeedURLHandler
opener.addheaders = [] #may not be needed but see the comments around the link referred to below
try:
    return opener.open(request) #see http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py#2954 for implementation of request
finally:
    opener.close()

0 讨论(0)