How to programmatically retrieve access_token from client-side OAuth flow using Python?

问题

This question was posted on StackApps, but the issue may be more a programming issue than an authentication issue, hence it may deserve a better place here.

I am working on an desktop inbox notifier for StackOverflow, using the API with Python.

The script I am working on first logs the user in on StackExchange, and then requests authorisation for the application. Assuming the application has been authorised through web-browser interaction of the user, the application should be able to make requests to the API with authentication, hence it needs the access token specific to the user. This is done with the URL: https://stackexchange.com/oauth/dialog?client_id=54&scope=read_inbox&redirect_uri=https://stackexchange.com/oauth/login_success.

When requesting authorisation via the web-browser the redirect is taking place and an access code is returned after a #. However, when requesting this same URL with Python (urllib2), no hash or key is returned in the response.

Why is it my urllib2 request is handled differently from the same request made in Firefox or W3m? What should I do to programmatically simulate this request and retrieve the access_token?

Here is my script (it's experimental) and remember: it assumes the user has already authorised the application.

#!/usr/bin/env python

import urllib
import urllib2
import cookielib
from BeautifulSoup import BeautifulSoup
from getpass import getpass    

# Define URLs
parameters = [ 'client_id=54',
               'scope=read_inbox',
               'redirect_uri=https://stackexchange.com/oauth/login_success'
             ]

oauth_url = 'https://stackexchange.com/oauth/dialog?' + '&'.join(parameters)
login_url = 'https://openid.stackexchange.com/account/login'
submit_url = 'https://openid.stackexchange.com/account/login/submit'
authentication_url = 'http://stackexchange.com/users/authenticate?openid_identifier='

# Set counter for requests:
counter = 0

# Build opener
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

def authenticate(username='', password=''):

    '''
        Authenticates to StackExchange using user-provided username and password
    '''

    # Build up headers
    user_agent = 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0'
    headers = {'User-Agent' : user_agent}

    # Set Data to None
    data = None

    # 1. Build up URL request with headers and data    
    request = urllib2.Request(login_url, data, headers)
    response = opener.open(request)

    # Build up POST data for authentication
    html = response.read()
    fkey = BeautifulSoup(html).findAll(attrs={'name' : 'fkey'})[0].get('value').encode()

    values = {'email' : username,
              'password' : password,
              'fkey' : fkey}

    data = urllib.urlencode(values)

    # 2. Build up URL for authentication
    request = urllib2.Request(submit_url, data, headers)
    response = opener.open(request)

    # Check if logged in
    if response.url == 'https://openid.stackexchange.com/user':
        print ' Logged in! :) '
    else:
        print ' Login failed! :( '

    # Find user ID URL    
    html = response.read()
    id_url = BeautifulSoup(html).findAll('code')[0].text.split('"')[-2].encode()

    # 3. Build up URL for OpenID authentication
    data = None
    url = authentication_url + urllib.quote_plus(id_url)
    request = urllib2.Request(url, data, headers)
    response = opener.open(request)

    # 4. Build up URL request with headers and data
    request = urllib2.Request(oauth_url, data, headers)
    response = opener.open(request)

    if '#' in response.url:
        print 'Access code provided in URL.'
    else:
        print 'No access code provided in URL.'

if __name__ == '__main__':
    username = raw_input('Enter your username: ')
    password = getpass('Enter your password: ')
    authenticate(username, password)

To respond to comments below:

Tamper data in Firefox requests the above URL (as oauth_url in the code) with the following headers:

Host=stackexchange.com
User-Agent=Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language=en-us,en;q=0.5
Accept-Encoding=gzip, deflate
Accept-Charset=ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection=keep-alive
Cookie=m=2; __qca=P0-556807911-1326066608353; __utma=27693923.1085914018.1326066609.1326066609.1326066609.1; __utmb=27693923.3.10.1326066609; __utmc=27693923; __utmz=27693923.1326066609.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); gauthed=1; ASP.NET_SessionId=nt25smfr2x1nwhr1ecmd4ok0; se-usr=t=z0FHKC6Am06B&s=pblSq0x3B0lC

In the urllib2 request the header provides the user-agent value only. The cookie is not passed explicitly, but the se-usr is available in the cookie jar at the time of the request.

The response headers will be first the redirect:

Status=Found - 302
Server=nginx/0.7.65
Date=Sun, 08 Jan 2012 23:51:12 GMT
Content-Type=text/html; charset=utf-8
Connection=keep-alive
Cache-Control=private
Location=https://stackexchange.com/oauth/login_success#access_token=OYn42gZ6r3WoEX677A3BoA))&expires=86400
Set-Cookie=se-usr=t=kkdavslJe0iq&s=pblSq0x3B0lC; expires=Sun, 08-Jul-2012 23:51:12 GMT; path=/; HttpOnly
Content-Length=218

Then the redirect will take place through another request with the fresh se-usr value from that header.

I don't know how to catch the 302 in urllib2, it handles it by itself (which is great). It would be nice however to see if the access token as provided in the location header would be available.

There's nothing special in the last response header, both Firefox and Urllib return something like:

Server: nginx/0.7.65
Date: Sun, 08 Jan 2012 23:48:16 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Cache-Control: private
Content-Length: 5664

I hope I didn't provide confidential info, let me know if I did :D

回答1:

The token does not appear because of the way urllib2 handles the redirect. I am not familiar with the details so I won't elaborate here.

The solution is to catch the 302 before the urllib2 handles the redirect. This can be done by sub-classing the urllib2.HTTPRedirectHandler to get the redirect with its hashtag and token. Here is a short example of subclassing the handler:

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print "Going through 302:\n"
        print headers
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

In the headers the location attribute will provide the redirect URL in full length, i.e. including the hashtag and token:

Output extract:

...
Going through 302:

Server: nginx/0.7.65
Date: Mon, 09 Jan 2012 20:20:11 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Cache-Control: private
Location: https://stackexchange.com/oauth/login_success#access_token=K4zKd*HkKw5Opx(a8t12FA))&expires=86400
Content-Length: 218
...

More on catching redirects with urllib2 on StackOverflow (of course).

来源：https://stackoverflow.com/questions/8779497/how-to-programmatically-retrieve-access-token-from-client-side-oauth-flow-using

标签

python

oauth

urllib2