How to log in to a website with urllib?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-01 10:49:13

The site uses a JSESSIONID cookie to track the session, since HTTP requests are stateless. Your code posts the credentials without ever obtaining that session ID first.

I sniffed a login to that site with Fiddler and found that the POST goes to a different URL and carries the JSESSIONID cookie. So you need to make a GET request to the page URL first, capture the cookie with the cookie handler (HTTPCookieProcessor), then POST to this URL:

post_url = 'http://www.broadinstitute.org/cmap/j_security_check'

You don't need to save the response to the GET request at all; you can simply call opener.open(url). Then change the response line in your code to this:

response = opener.open(post_url, binary_data)

The payload was also missing the submit field. Here's the whole thing with the changes I suggest:

import http.cookiejar
import urllib.parse    # a bare "import urllib" is not guaranteed to load these submodules
import urllib.request

get_url = 'http://www.broadinstitute.org/cmap/index.jsp'
post_url = 'http://www.broadinstitute.org/cmap/j_security_check'

# Replace the two placeholder strings with your own credentials.
values = urllib.parse.urlencode({'j_username': '<MYCOOLUSERNAME>',
                                 'j_password': '<MYCOOLPASSWORD>',
                                 'submit': 'sign in'})
payload = bytes(values, 'ascii')  # POST data must be bytes, not str
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPRedirectHandler(),
    urllib.request.HTTPHandler(debuglevel=0),
    urllib.request.HTTPSHandler(debuglevel=0),
    urllib.request.HTTPCookieProcessor(cj))

opener.open(get_url)  # First call stores the JSESSIONID cookie in cj
resp = opener.open(post_url, payload)  # The cookie is sent back automatically
resp_html = resp.read()
resp_headers = resp.info()
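
To see what actually goes over the wire in the POST body, here is a small sketch of the encoding step on its own. The credentials 'alice' and 's3cret!' are made-up values for illustration; only the field names come from the code above.

```python
import urllib.parse

# Made-up credentials; the field names match the login form above.
fields = {'j_username': 'alice',
          'j_password': 's3cret!',
          'submit': 'sign in'}

# urlencode() joins the fields as key=value pairs separated by '&',
# turning spaces into '+' and percent-escaping characters such as '!'.
values = urllib.parse.urlencode(fields)

# opener.open(url, data) expects bytes, so encode the string.
payload = values.encode('ascii')

print(payload)
```

Note that the submit field is sent like any other form field, which is why leaving it out can make the server treat the request differently from a real browser submission.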

Any further requests made with the opener you created will reuse that cookie, so you should be able to navigate the site freely.
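
If you want to try the GET-then-POST flow without touching the real site, the sketch below runs it against a throwaway local server. Everything about that server is an assumption made for the demo: FakeLoginHandler, the cookie value abc123, and the 403 response are hypothetical and say nothing about how the real site behaves. The ProxyHandler({}) entry is only there so the request to 127.0.0.1 is not routed through a system proxy.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeLoginHandler(BaseHTTPRequestHandler):
    # Hypothetical stand-in for the real site, not its actual behaviour.

    def do_GET(self):
        # Any GET hands out a session cookie, like the index.jsp page.
        self.send_response(200)
        self.send_header('Set-Cookie', 'JSESSIONID=abc123; Path=/')
        self.end_headers()
        self.wfile.write(b'login page')

    def do_POST(self):
        # Accept the login only if the session cookie came back.
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)  # drain the form body
        ok = 'JSESSIONID=abc123' in self.headers.get('Cookie', '')
        self.send_response(200 if ok else 403)
        self.end_headers()
        self.wfile.write(b'welcome' if ok else b'no session')

    def log_message(self, format, *args):
        pass  # silence per-request logging


def make_opener():
    # Each opener gets its own cookie jar; ProxyHandler({}) bypasses any
    # system proxy so the request reaches the local server directly.
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPCookieProcessor(cj))
    return opener, cj


def run_demo():
    server = HTTPServer(('127.0.0.1', 0), FakeLoginHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = 'http://127.0.0.1:%d' % server.server_port
    payload = urllib.parse.urlencode(
        {'j_username': 'u', 'j_password': 'p', 'submit': 'sign in'}).encode('ascii')
    try:
        # POST without the initial GET: no cookie yet, so the server refuses.
        opener, cj = make_opener()
        try:
            opener.open(base + '/j_security_check', payload)
            status_without_get = 200
        except urllib.error.HTTPError as err:
            status_without_get = err.code

        # GET first, then POST with the same opener: the cookie is replayed.
        opener, cj = make_opener()
        opener.open(base + '/index.jsp').read()
        body = opener.open(base + '/j_security_check', payload).read()
        return status_without_get, [c.name for c in cj], body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(run_demo())
```

The first opener posts straight away and is turned down by the fake server; the second one does the GET first, so its cookie jar holds JSESSIONID and the POST goes through. That is the same difference as between the original code and the fixed version.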
