Question
I am writing a script to retrieve transaction information from my bank's home banking website for use in a personal mobile application.
The website is laid out like so:
https://homebanking.purduefed.com/OnlineBanking/Login.aspx
-> Enter username -> Submit form ->
https://homebanking.purduefed.com/OnlineBanking/AOP/Password.aspx
-> Enter password -> Submit form ->
https://homebanking.purduefed.com/OnlineBanking/AccountSummary.aspx
Because the login takes two separate POSTs on two different pages, I first suspected that session information was being lost between them. However, I use urllib2's HTTPCookieProcessor to store the cookies across the GET and POST requests, and I have found that this isn't the issue.
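One way to double-check the cookie handling is to look at what the jar actually stores and what it would send back on the next request. The sketch below is written for Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar. It runs offline: `FakeResponse`, the `example.com` URLs, and the `SessionId` cookie are all made up for illustration and are not taken from the bank's site.

```python
import http.cookiejar
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for an HTTP response so the demo needs no network."""

    def __init__(self, set_cookie_value):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie_value

    def info(self):
        # CookieJar.extract_cookies() reads the response headers through info().
        return self._headers


def describe_cookies(jar):
    """Return (name, domain, path) for every cookie currently in the jar."""
    return sorted((c.name, c.domain, c.path) for c in jar)


jar = http.cookiejar.CookieJar()

# Pretend the first page answered with a session cookie (invented name/value).
login_request = urllib.request.Request("https://example.com/OnlineBanking/Login.aspx")
jar.extract_cookies(FakeResponse("SessionId=abc123; path=/"), login_request)

# Ask the jar which Cookie header it would attach to the second page.
next_request = urllib.request.Request("https://example.com/OnlineBanking/AOP/Password.aspx")
jar.add_cookie_header(next_request)

print(describe_cookies(jar))
print(next_request.get_header("Cookie"))
```

With a real opener, calling `describe_cookies(jar)` after each `opener.open(...)` would show whether the session cookie survives from one page to the next.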
My current code is:
import urllib
import urllib2
import cookielib
loginUrl = 'https://homebanking.purduefed.com/OnlineBanking/Login.aspx'
passwordUrl = 'https://homebanking.purduefed.com/OnlineBanking/AOP/Password.aspx'
acctUrl = 'https://homebanking.purduefed.com/OnlineBanking/AccountSummary.aspx'
LoginName = 'sample_username'
Password = 'sample_password'
values = {'LoginName': LoginName,
          'Password': Password}
class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    # Log every redirect, then let urllib2 follow it as usual.
    def http_error_302(self, req, fp, code, msg, headers):
        print "Cookie Manipulation Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
    http_error_301 = http_error_303 = http_error_307 = http_error_302
login_cred = urllib.urlencode(values)
jar = cookielib.CookieJar()
cookieprocessor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5')]
# urllib2 sends the headers listed in opener.addheaders (plural) with every request.
opener.addheaders.append(('Referer', loginUrl))
response = opener.open(loginUrl, login_cred)
reqPage = opener.open(passwordUrl)
opener.addheaders[-1] = ('Referer', passwordUrl)
response2 = opener.open(passwordUrl, login_cred)
reqPage2 = opener.open(acctUrl)
content = reqPage2.read()
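Since the Referer and the submitted field differ between the two steps, one option is to build each POST as its own request object instead of changing headers on the shared opener. This is a minimal Python 3 sketch (urllib.request in place of urllib2). The helper name `build_post` is hypothetical, and whether the site cares about extra fields or the Referer value is an assumption, not something confirmed here.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URL = "https://homebanking.purduefed.com/OnlineBanking/Login.aspx"
PASSWORD_URL = "https://homebanking.purduefed.com/OnlineBanking/AOP/Password.aspx"


def build_post(url, fields, referer):
    """Build a POST request that carries only `fields` and its own Referer."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body, headers={"Referer": referer})


# Step 1 submits only the username; step 2 submits only the password.
username_post = build_post(LOGIN_URL, {"LoginName": "sample_username"}, LOGIN_URL)
password_post = build_post(PASSWORD_URL, {"Password": "sample_password"}, PASSWORD_URL)

print(username_post.get_method(), username_post.data)
print(password_post.get_header("Referer"))
```

Each request would then go through the same cookie-aware opener, for example `opener.open(username_post)` followed by `opener.open(password_post)`.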
Currently, the script reaches the passwordUrl page, so the username is POSTed correctly. But when the POST is made to passwordUrl, the response is redirected to the Login page instead of acctUrl (the Login page is also where acctUrl redirects when it is opened with missing or invalid credentials).
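To see exactly which response sends the script back to the Login page, the redirect handler could record each hop instead of only printing a message. The following is a sketch in Python 3. `RecordingRedirectHandler` is a hypothetical name, and the demo calls `redirect_request` by hand with an invented 302 so that it runs without contacting the site.

```python
from urllib.request import HTTPRedirectHandler, Request

LOGIN_URL = "https://homebanking.purduefed.com/OnlineBanking/Login.aspx"
PASSWORD_URL = "https://homebanking.purduefed.com/OnlineBanking/AOP/Password.aspx"


class RecordingRedirectHandler(HTTPRedirectHandler):
    """Remember every redirect so the bounce back to Login.aspx can be traced."""

    def __init__(self):
        self.seen = []  # list of (status code, source URL, target URL)

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.seen.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Offline demo: simulate the server answering the password POST with a 302.
handler = RecordingRedirectHandler()
password_post = Request(PASSWORD_URL, data=b"Password=sample_password")
follow_up = handler.redirect_request(password_post, None, 302, "Found", {}, LOGIN_URL)

print(handler.seen)
# urllib follows a redirected POST with a body-less GET to the new URL.
print(follow_up.get_method(), follow_up.data)
```

In a real run, an instance would be passed to `build_opener` together with the cookie processor, and `handler.seen` inspected after each `opener.open(...)` call.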
Any thoughts or comments on how to move forward are greatly appreciated at this point!
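One thing that may be worth checking: pages ending in .aspx are often ASP.NET Web Forms, which usually embed hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION` and expect them back in the POST. Whether this particular site requires them is an assumption. The sketch below (Python 3) collects the hidden inputs from a page so they can be merged into the form data. The sample HTML and its values are invented for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in `html`."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Invented sample page; a real one would come from a GET of the login URL.
SAMPLE_PAGE = """
<form method="post" action="Login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgKx" />
  <input type="text" name="LoginName" />
</form>
"""

payload = hidden_fields(SAMPLE_PAGE)
payload["LoginName"] = "sample_username"  # add the visible field on top
print(payload)
```

The same extraction would be repeated on the password page before its POST, since each page would carry its own hidden values.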
Source: https://stackoverflow.com/questions/15605408/logging-into-website-with-multiple-pages-using-python-urllib2-and-cookielib