问题
I use Python Rquests to extract full headers of responses.
I want to accurately count how many cookies (i.e. nam/variable) pairs in a response. There are two issues:
1) If a server responded with multiple Set-Cookie headers. How does Requests represent this? Does it combine both Set-Cookie values in one? Or leave it as is?
Here is my script to print headers (full header):
import requests
requests.packages.urllib3.disable_warnings() # to disable certificate warnings
response = requests.get("https://example.com",verify=False,timeout=3)
print(str(response.headers))
response_headers = response.headers.get('Set-Cookie')
But when I look at some Set-Cookie
response headers I found some name/value pairs are separated by comma like this:
dnn_IsMobile=False; path=/; secure; HttpOnly, Analytics_VisitorId=aa; expires=Mon 19-Aug-2019 14:20:02 GMT; path=/; secure; HttpOnly, Analytics=SessionId=vv&ContentItemId=-1; expires=Sat 20-Jul-2019 15:20:02 GMT; path=/; secure
2) Does this mean the server sent multiple Set-Cookie
and Requests combined them?
If requests adds the comma between the name/value pairs of the cookies, does it always separate them with a comma followed by a space? i.e. cookie1=value, cookie2=value
and not just a comma like cookie1=value,cookie2=value
.
Understanding this difference is very important to me to be able to count the right number of cookies received.
回答1:
How to count the number of cookies and fetching them
You can use the higher level .cookies
to get them, instead of using .headers
.
For example:
>>> url="https://github.com"
>>> r = requests.get(url)
>>> r.cookies
<RequestsCookieJar[Cookie(version=0, name='_octo', value='GH1.1.1081626831.1563694143', port=None, port_specified=False, domain='.github.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1626852543, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False), Cookie(version=0, name='logged_in', value='no', port=None, port_specified=False, domain='.github.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=True, expires=2194846143, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False), Cookie(version=0, name='_gh_sess', value='N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20', port=None, port_specified=False, domain='github.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=True, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False), Cookie(version=0, name='has_recent_activity', value='1', port=None, port_specified=False, domain='github.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=1563697743, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)]>
>>> len(r.cookies)
4
>>> r.cookies.keys()
['_octo', 'logged_in', '_gh_sess', 'has_recent_activity']
>>> for key in r.cookies.iterkeys(): print("{}: {}".format(key, r.cookies[key]))
...
_octo: GH1.1.1081626831.1563694143
logged_in: no
_gh_sess: N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20
has_recent_activity: 1
P.S. Sometimes it's easier to read the source code, I found that by reading cookies.py :)
Edit about the delimiter(whether ", "
or ","
) in r.headers.get("Set-Cookie")
:
Requests
usesurllib3
under the hood, you will find thatr.raw
is an object ofurllib3.response.HTTPResponse
.In urllib3, headers are represented by
HTTPHeaderDict
defined in_collections.py
, and multiple values are joined by", "
there.def __getitem__(self, key): val = self._container[key.lower()] return ", ".join(val[1:])
Also, there an issue about this in urllib3, and a test case for it.
So, you can use ", "
to count the number of cookies.
Does requests combine multiple Set-Cookies
into one in headers
?
I'm afraid the answer is yes, as by examining its value (some unrelevant headers are removed for better reading):
>>> r.headers
{
'Date': 'Sun, 21 Jul 2019 07:29:03 GMT',
'Content-Type': 'text/html; charset=utf-8',
'Transfer-Encoding': 'chunked',
'Server': 'GitHub.com',
'Status': '200 OK',
'Set-Cookie': 'has_recent_activity=1; path=/; expires=Sun, 21 Jul 2019 08:29:03 -0000, _octo=GH1.1.1081626831.1563694143; domain=.github.com; path=/; expires=Wed, 21 Jul 2021 07:29:03 -0000, logged_in=no; domain=.github.com; path=/; expires=Thu, 21 Jul 2039 07:29:03 -0000; secure; HttpOnly, _gh_sess=N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20; path=/; secure; HttpOnly',
'Content-Encoding': 'gzip',
'X-GitHub-Request-Id': 'A947:3711:E0377A:13B4CEA:5D34143E'
}
来源:https://stackoverflow.com/questions/57125660/how-does-pythons-requests-treat-multiple-cookies-in-a-header