Parsing HTTP Response in Python

半城伤御伤魂 提交于 2019-12-18 10:42:09

问题


I want to manipulate the information at THIS url. I can successfully open it and read its contents. But what I really want to do is throw out all the stuff I don't want, and to manipulate the stuff I want to keep.

Is there a way to convert the string into a dict so I can iterate over it? Or do I just have to parse it as is (str type)?

from urllib.request import urlopen

url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)

print(response.read()) # returns string with info

回答1:


When I printed response.read() I noticed that b was preprended to the string (e.g. b'{"a":1,..). The "b" stands for bytes and serves as a declaration for the type of the object you're handling. Since, I knew that a string could be converted to a dict by using json.loads('string'), I just had to convert the byte type to a string type. I did this by decoding the response to utf-8 decode('utf-8'). Once it was in a string type my problem was solved and I was easily able to iterate over the dict.

I don't know if this is the fastest or most 'pythonic' way of writing this but it works and theres always time later of optimization and improvement! Full code for my solution:

from urllib.request import urlopen
import json

# Get the dataset
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)

# Convert bytes to string type and string type to dict
string = response.read().decode('utf-8')
json_obj = json.loads(string)

print(json_obj['source_name']) # prints the string with 'source_name' key



回答2:


You can also use python's requests library instead.

import requests

url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'    
response = requests.get(url)    
dict = response.json()

Now you can manipulate the "dict" like a python dictionary.




回答3:


json works with Unicode text in Python 3 (JSON format itself is defined only in terms of Unicode text) and therefore you need to decode bytes received in HTTP response. r.headers.get_content_charset('utf-8') gets your the character encoding:

#!/usr/bin/env python3
import io
import json
from urllib.request import urlopen

with urlopen('https://httpbin.org/get') as r, \
     io.TextIOWrapper(r, encoding=r.headers.get_content_charset('utf-8')) as file:
    result = json.load(file)
print(result['headers']['User-Agent'])

It is not necessary to use io.TextIOWrapper here:

#!/usr/bin/env python3
import json
from urllib.request import urlopen

with urlopen('https://httpbin.org/get') as r:
    result = json.loads(r.read().decode(r.headers.get_content_charset('utf-8')))
print(result['headers']['User-Agent'])



回答4:


I guess things have changed in python 3.4. This worked for me:

print("resp:" + json.dumps(resp.json()))


来源:https://stackoverflow.com/questions/23049767/parsing-http-response-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!