Parse an HTTP request Authorization header with Python

情书的邮戳 2020-12-30 06:47

I need to take a header like this:

 Authorization: Digest qop="chap",
     realm="testrealm@host.com",
     username="Foobear",
     response="6629fae


        
10 answers
  • 2020-12-30 07:22

    Here's my pyparsing attempt:

    text = """Authorization: Digest qop="chap",
        realm="testrealm@host.com",     
        username="Foobear",     
        response="6629fae49393a05397450978507c4ef1",     
        cnonce="5ccc069c403ebaf9f0171e9517f40e41" """
    
    from pyparsing import *
    
    AUTH = Keyword("Authorization")
    ident = Word(alphas,alphanums)
    EQ = Suppress("=")
    quotedString.setParseAction(removeQuotes)
    
    valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
    authentry = AUTH + ":" + ident("protocol") + valueDict
    
    print(authentry.parseString(text).dump())
    

    which prints:

    ['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', 'testrealm@host.com'],
     ['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
     ['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
    - cnonce: 5ccc069c403ebaf9f0171e9517f40e41
    - protocol: Digest
    - qop: chap
    - realm: testrealm@host.com
    - response: 6629fae49393a05397450978507c4ef1
    - username: Foobear
    

    I'm not familiar with the RFC, but I hope this gets you rolling.

  • 2020-12-30 07:23

    You can also use urllib2, as CherryPy does. In Python 3 the same helpers live in urllib.request.

    Here is the snippet:

    from urllib.request import parse_http_list, parse_keqv_list

    header = """
     Authorization: Digest qop="chap",
         realm="testrealm@host.com",
         username="Foobear",
         response="6629fae49393a05397450978507c4ef1",
         cnonce="5ccc069c403ebaf9f0171e9517f40e41"
    """
    field, sep, value = header.partition("Authorization: Digest ")
    if value:
        items = parse_http_list(value)  # split on commas outside quotes
        opts = parse_keqv_list(items)   # key="value" items -> dict
        opts['protocol'] = 'Digest'
        print(opts)
    

    It outputs:

    {'qop': 'chap', 'realm': 'testrealm@host.com', 'username': 'Foobear', 'response': '6629fae49393a05397450978507c4ef1', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'protocol': 'Digest'}
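
The same idea can be wrapped in a small reusable function. This is only a sketch for Python 3, where the two helpers are importable from urllib.request; the function name parse_authorization is made up for illustration, and it assumes the "Authorization:" field name has already been stripped off. It does not hard-code the Digest scheme, and because parse_http_list respects quoting, a comma inside a quoted value is not treated as a separator.

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_authorization(header_value):
    """Split 'Scheme k1="v1", k2=v2' into (scheme, dict of parameters).

    Hypothetical helper: expects only the header value, without the
    'Authorization:' field name in front.
    """
    # The scheme is everything up to the first space.
    scheme, _, rest = header_value.strip().partition(" ")
    # parse_http_list splits on commas that are outside double quotes;
    # parse_keqv_list turns key="value" items into a dict and drops the quotes.
    params = parse_keqv_list(parse_http_list(rest))
    return scheme, params


scheme, params = parse_authorization(
    'Digest qop="chap",\n realm="testrealm@host.com",\n username="Foob,ear"'
)
print(scheme, params)
```

Returning the scheme separately, instead of storing it under a 'protocol' key, avoids a clash if a header ever carries a parameter with that name.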
    
  • 2020-12-30 07:25

    Your original concept of using PyParsing would be the best approach. What you've implicitly asked for requires a grammar: a regular expression or simple parsing routine is always going to be brittle, and that sounds like something you're trying to avoid.

    Getting pyparsing running on Google App Engine appears to be easy; see the question "How do I get PyParsing set up on the Google App Engine?"

    So I'd go with that, and then implement full HTTP authentication/authorization header support per RFC 2617.
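
To give a feel for what following the grammar means here, below is a minimal sketch that walks the parameter list one auth-param at a time (token "=" token-or-quoted-string, separated by commas). It uses a regex-driven scanner from the standard library rather than pyparsing, it is not a complete RFC 2617 implementation, and the names parse_auth_params, _TOKEN, _QUOTED and _PARAM are invented for this example. Unlike a one-shot regex it rejects malformed input instead of silently skipping it, and it handles backslash escapes inside quoted strings.

```python
import re

# token: one or more characters that are neither controls nor separators
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: double-quoted text in which a backslash escapes the next char
_QUOTED = r'"(?:[^"\\]|\\.)*"'
# one auth-param followed by a comma or the end of the input
_PARAM = re.compile(
    r"\s*(?P<key>" + _TOKEN + r")\s*=\s*(?P<value>" + _QUOTED + "|" + _TOKEN + r")\s*(?:,|$)"
)


def parse_auth_params(params):
    """Parse 'k1="v1", k2=v2' into a dict; raise ValueError on malformed input."""
    params = params.strip()
    result = {}
    pos = 0
    while pos < len(params):
        m = _PARAM.match(params, pos)
        if m is None:
            raise ValueError("malformed auth-param at offset %d" % pos)
        value = m.group("value")
        if value.startswith('"'):
            # Drop the surrounding quotes, then undo backslash escapes.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        # Parameter names are lower-cased so lookups are case-insensitive.
        result[m.group("key").lower()] = value
        pos = m.end()
    return result


print(parse_auth_params('qop="chap",\n realm="testrealm@host.com", nc=00000001'))
```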

  • 2020-12-30 07:30

    If those components will always be there, then a regex will do the trick:

    test = '''Authorization: Digest qop="chap", realm="testrealm@host.com", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''
    
    import re
    
    re_auth = re.compile(r"""
        Authorization:\s*(?P<protocol>[^ ]+)\s+
        qop="(?P<qop>[^"]+)",\s+
        realm="(?P<realm>[^"]+)",\s+
        username="(?P<username>[^"]+)",\s+
        response="(?P<response>[^"]+)",\s+
        cnonce="(?P<cnonce>[^"]+)"
        """, re.VERBOSE)
    
    m = re_auth.match(test)
    print(m.groupdict())
    

    produces:

    { 'username': 'Foobear', 
      'protocol': 'Digest', 
      'qop': 'chap', 
      'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 
      'realm': 'testrealm@host.com', 
      'response': '6629fae49393a05397450978507c4ef1'
    }
    
  • 2020-12-30 07:40

    I would recommend finding a proper library for parsing HTTP headers; unfortunately, I can't recall any. :(

    In the meantime, check the snippet below (it should mostly work, as long as each parameter sits on its own line):

    header = """
     Authorization: Digest qop="chap",
         realm="testrealm@host.com",
         username="Foob,ear",
         response="6629fae49393a05397450978507c4ef1",
         cnonce="5ccc069c403ebaf9f0171e9517f40e41"
    """
    
    field, sep, value = header.partition(":")
    if field.endswith('Authorization'):
        protocol, sep, opts_str = value.strip().partition(" ")
    
        opts = {}
        # Split on comma + newline so a comma inside a quoted value survives.
        for opt in opts_str.split(",\n"):
            # Split on the first '=' only; a value may itself contain '='.
            key, _, val = opt.strip().partition('=')
            opts[key.strip()] = val.strip(' "')
    
        opts['protocol'] = protocol
    
        print(opts)
    
  • 2020-12-30 07:42

    Nadia's regex only matches alphanumeric characters for the value of a parameter, so it fails to parse at least two fields: uri and qop. According to RFC 2617, the uri field duplicates the request URI from the request line (i.e. the first line of the HTTP request), so it contains characters such as '/'. And qop fails to parse correctly when the value is "auth-int", because of the non-alphanumeric '-'.

    This modified regex allows the URI (or any other value) to contain anything but ' ' (space), '"' (quote), or ',' (comma). That's probably more permissive than it needs to be, but it shouldn't cause any problems with correctly formed HTTP requests.

    import re
    reg = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')
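
A quick way to try the pattern is to feed its matches to dict(). The header below is an assumed sample, not taken from the question; it uses a uri and qop=auth-int to exercise the two cases mentioned above. Note that the field name itself is picked up as a pair ('Authorization' maps to 'Digest'), and that a quoted value containing a space or comma is cut short at that character.

```python
import re

reg = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')

# Assumed sample header for illustration only.
header = ('Authorization: Digest username="Mufasa", '
          'realm="testrealm@host.com", uri="/dir/index.html", qop=auth-int')

# findall returns (key, value) tuples, which dict() accepts directly.
fields = dict(reg.findall(header))
print(fields)

# Limitation: the value stops at the first space or comma, even inside quotes.
truncated = dict(reg.findall('username="Foob,ear"'))
print(truncated)
```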
    

    Bonus tip: From there, it's fairly straightforward to convert the example code in RFC 2617 to Python. Using Python's md5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()" and "MD5Final()" becomes "m.digest()". (The md5 module was removed in Python 3; hashlib.md5() provides the same kind of object.)
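
As a rough illustration of that conversion, here is a sketch using hashlib. It covers only the MD5 algorithm with qop="auth"; the function names md5_hex and digest_response are invented for this example. The sample values are the ones from the worked example in RFC 2617 (user "Mufasa", password "Circle Of Life"), so the result should match the response value in the header from the question.

```python
import hashlib


def md5_hex(text):
    """MD5 of a string as lowercase hex (MD5Init/Update/Final plus hex encoding)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the Digest 'response' value for algorithm MD5 with qop=auth."""
    ha1 = md5_hex("%s:%s:%s" % (username, realm, password))  # A1 = user:realm:password
    ha2 = md5_hex("%s:%s" % (method, uri))                   # A2 = method:uri
    return md5_hex(":".join([ha1, nonce, nc, cnonce, qop, ha2]))


response = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)
```

A server would compare this value against the response parameter parsed from the client's header; qop="auth-int" additionally hashes the request body into A2 and is not handled here.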
