I have this string:
Traor\u0102\u0160
It should produce Traoré. Then Traoré
utf-8 decoded shou
For me, your site returns "Traor\u00e9" (the last character is é):
import json
import requests

r = requests.get(url)  # url is the address from the question
print(json.dumps(json.loads(r.content)['Item']['LastName']))
# -> "Traor\u00e9" -> Traoré
r.json (r.text) produces incorrect content here. Either the server, requests, or both use an incorrect encoding, which results in "Traor\u0102\u0160". The encoding of a JSON text is completely determined by its content, so it is always possible to decode it regardless of the headers the server sends. From the JSON RFC:
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8
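The null-pattern table quoted above could be sketched in Python roughly as follows. The function name detect_json_encoding is hypothetical (it is not part of requests or the json module), and falling back to UTF-8 for inputs shorter than four bytes is an assumption of this sketch rather than something the RFC spells out here.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON octet stream from its first four bytes.

    Hypothetical helper that mirrors the null-byte table quoted above.
    """
    head = data[:4]
    if len(head) < 4:
        # Assumption: too short to match the table, so use the default.
        return "utf-8"
    # True where the octet is a null byte (00), False otherwise (xx).
    nulls = tuple(byte == 0 for byte in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Any other pattern (in particular xx xx xx xx) is treated as UTF-8.
    return patterns.get(nulls, "utf-8")


# Usage: decode the raw bytes with the detected encoding, then parse.
raw = '{"Item": {"LastName": "Traor\u00e9"}}'.encode("utf-16-le")
text = raw.decode(detect_json_encoding(raw))
print(json.dumps(json.loads(text)["Item"]["LastName"]))
```

With a response object, the same idea would apply to r.content instead of raw. On Python 3.6 and later, json.loads also accepts bytes in UTF-8, UTF-16 or UTF-32 directly, so a helper like this is mainly useful for illustration.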
In this case there are no zero bytes at the start of r.content, so json.loads works. Otherwise you need to convert it to a Unicode string manually if the server sends an incorrect character encoding in the Content-Type header, or to work around a requests bug.
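As a minimal sketch of the manual conversion described above, the snippet below uses a hard-coded byte string in place of r.content, so the payload is hypothetical. The garbled "Traor\u0102\u0160" is consistent with UTF-8 bytes being decoded as ISO-8859-2; that this is what actually happens for the site in question is an assumption, not something confirmed here.

```python
import json

# Stand-in for r.content: the JSON body as raw UTF-8 bytes (hypothetical payload).
raw = '{"Item": {"LastName": "Traor\u00e9"}}'.encode("utf-8")

# Decoding the bytes with the wrong charset (ISO-8859-2 is an assumption)
# reproduces the garbled name from the question.
wrong = json.loads(raw.decode("iso-8859-2"))["Item"]["LastName"]
print(json.dumps(wrong))  # "Traor\u0102\u0160"

# Decoding explicitly as UTF-8 before parsing gives the expected name.
right = json.loads(raw.decode("utf-8"))["Item"]["LastName"]
print(json.dumps(right))  # "Traor\u00e9"
```

With requests, the equivalent would be json.loads(r.content.decode("utf-8")), or setting r.encoding = "utf-8" before reading r.text, assuming the body really is UTF-8.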