What I am trying to do is extract elevation data from the Google Maps Elevation API along a path specified by latitude and longitude coordinates, as follows:
from urllib2
An improvement on the accepted answer:
The accepted answer has some practical problems, so I want to share my code, which does not rely on urllib2 (a module that only exists in Python 2) and uses requests instead:
import requests
from pandas import json_normalize
url = 'https://www.energidataservice.dk/proxy/api/datastore_search?resource_id=nordpoolmarket&limit=5'
response = requests.get(url)
dictr = response.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)
Output:
_id HourUTC HourDK ... ElbasAveragePriceEUR ElbasMaxPriceEUR ElbasMinPriceEUR
0 264028 2019-01-01T00:00:00+00:00 2019-01-01T01:00:00 ... NaN NaN NaN
1 138428 2017-09-03T15:00:00+00:00 2017-09-03T17:00:00 ... 33.28 33.4 32.0
2 138429 2017-09-03T16:00:00+00:00 2017-09-03T18:00:00 ... 35.20 35.7 34.9
3 138430 2017-09-03T17:00:00+00:00 2017-09-03T19:00:00 ... 37.50 37.8 37.3
4 138431 2017-09-03T18:00:00+00:00 2017-09-03T20:00:00 ... 39.65 42.9 35.3
.. ... ... ... ... ... ... ...
995 139290 2017-10-09T13:00:00+00:00 2017-10-09T15:00:00 ... 38.40 38.4 38.4
996 139291 2017-10-09T14:00:00+00:00 2017-10-09T16:00:00 ... 41.90 44.3 33.9
997 139292 2017-10-09T15:00:00+00:00 2017-10-09T17:00:00 ... 46.26 49.5 41.4
998 139293 2017-10-09T16:00:00+00:00 2017-10-09T18:00:00 ... 56.22 58.5 49.1
999 139294 2017-10-09T17:00:00+00:00 2017-10-09T19:00:00 ... 56.71 65.4 42.2
PS: The API serves Danish electricity prices.
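If you only need the records list, json_normalize can also walk the nested keys itself through its record_path argument, so the manual dictr['result']['records'] step is optional. A minimal offline sketch, assuming a hand-written payload with the same result → records shape (the two records copy a few fields from rows 1 and 2 of the output above; this is not a live API call):

```python
import pandas as pd

# Hand-written stand-in for response.json(); same "result" -> "records" nesting.
payload = {
    "result": {
        "records": [
            {"_id": 138428, "HourUTC": "2017-09-03T15:00:00+00:00", "ElbasMaxPriceEUR": 33.4},
            {"_id": 138429, "HourUTC": "2017-09-03T16:00:00+00:00", "ElbasMaxPriceEUR": 35.7},
        ]
    }
}

# record_path is followed key by key: payload["result"]["records"].
df = pd.json_normalize(payload, record_path=["result", "records"])
print(df)
```

Both forms give the same frame; record_path just keeps the path to the records in one place.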
billmanH's solution helped me, but it didn't work until I switched from:
n = data.loc[row,'json_column']
to:
n = data.iloc[[row]]['json_column']
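Here's the rest of it. Converting each value to a dictionary makes the JSON data much easier to work with.
import json
for row in range(len(data)):
    # iloc is position-based, so this works whatever the index labels are
    n = data.iloc[[row]]['json_column'].item()
    jsonDict = json.loads(n)
    if 'mykey' in jsonDict:
        display(jsonDict['mykey'])  # display() is Jupyter/IPython; use print() in a plain script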
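A likely reason the switch helped: .loc looks rows up by index label, while .iloc looks them up by position, so looping over range(len(data)) only lines up with .loc when the index is the default 0..n-1. A small sketch with a hypothetical frame whose first row has been dropped (the column name and values are placeholders):

```python
import pandas as pd

# Hypothetical frame; dropping row 0 leaves the index as [1, 2].
data = pd.DataFrame({"json_column": ['{"mykey": 1}', '{"other": 2}', '{"mykey": 3}']})
data = data.drop(index=0)

# .loc is label-based: there is no row labelled 0 any more.
try:
    data.loc[0, "json_column"]
    loc_failed = False
except KeyError:
    loc_failed = True

# .iloc is position-based: position 0 is the first remaining row (label 1).
first = data.iloc[0]["json_column"]
print(loc_failed, first)  # True {"other": 2}
```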
I prefer a more generic method in which the user doesn't have to supply the key 'results'. If you don't know the key, you can still flatten the JSON by recursively searching for the first key that holds a list; if you do know the key but it is buried deep in the JSON, you can look it up recursively. It is something like:
import json
from pandas import json_normalize
def findnestedlist(js):
    # Check this level for a list first, then descend into nested dicts
    for v in js.values():
        if isinstance(v, list):
            return v
    for v in js.values():
        if isinstance(v, dict):
            found = findnestedlist(v)
            if found is not None:
                return found
    return None
def recursive_lookup(k, d):
    # Depth-first search for key k; keep searching if a branch comes up empty
    if k in d:
        return d[k]
    for v in d.values():
        if isinstance(v, dict):
            found = recursive_lookup(k, v)
            if found is not None:
                return found
    return None
def flat_json(content, key):
    js = json.loads(content)
    if not key:  # None or ''
        nested_list = findnestedlist(js)
    else:
        nested_list = recursive_lookup(key, js)
    return json_normalize(nested_list, sep="_")
key = "results"  # If you don't have it, give it None
csv_data = flat_json(your_json_string, key)
print(csv_data)
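The sep="_" argument passed to json_normalize in flat_json controls how nested keys are joined into column names. A small self-contained comparison, assuming a hypothetical list of records (the coordinates are borrowed from the path in the question):

```python
from pandas import json_normalize

# Hypothetical records with one nested dict each.
records = [
    {"id": 1, "location": {"lat": 42.974049, "lng": -81.205203}},
    {"id": 2, "location": {"lat": 42.974298, "lng": -81.195755}},
]

default_cols = json_normalize(records).columns.tolist()
underscore_cols = json_normalize(records, sep="_").columns.tolist()
print(default_cols)     # ['id', 'location.lat', 'location.lng']
print(underscore_cols)  # ['id', 'location_lat', 'location_lng']
```

Underscores are handy when the columns end up in a CSV or need to be accessed as attributes (df.location_lat).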
I found a quick and easy solution to what I wanted using json_normalize(), included in pandas 1.0.1:
from urllib.request import Request, urlopen  # Python 3 (pandas 1.x needs it); urllib2 was Python 2 only
import json
import pandas as pd
path1 = '42.974049,-81.205203|42.974298,-81.195755'
# Note: Google now requires an API key (&key=...) for Maps Platform requests
request = Request('http://maps.googleapis.com/maps/api/elevation/json?locations=' + path1 + '&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
df = pd.json_normalize(data['results'])
This gives a nice flattened DataFrame with the JSON data that I got from the Google Maps API.
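To see what the flattening does without calling the API, here is a sketch with a hand-written dict in the Elevation API's response shape (results entries with elevation, a nested location, and resolution). The elevation and resolution numbers are placeholders, not real measurements; the renaming step at the end is an optional extra, not part of the original answer:

```python
import pandas as pd

# Hypothetical response; only the shape mirrors the Elevation API.
data = {
    "results": [
        {"elevation": 100.0, "location": {"lat": 42.974049, "lng": -81.205203}, "resolution": 5.0},
        {"elevation": 110.0, "location": {"lat": 42.974298, "lng": -81.195755}, "resolution": 5.0},
    ],
    "status": "OK",
}

# The nested "location" dict becomes "location.lat" and "location.lng".
df = pd.json_normalize(data["results"])

# Optional: shorter column names.
df = df.rename(columns={"location.lat": "lat", "location.lng": "lng"})
print(df)
```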
You could first load your JSON data into a Python dictionary:
data = json.loads(elevations)
Then modify the data on the fly:
for result in data['results']:
    result['lat'] = result['location']['lat']
    result['lng'] = result['location']['lng']
    del result['location']
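Rebuild the JSON string from the list of results (dumping all of data would leave each result as a dict inside a single 'results' column):
elevations = json.dumps(data['results'])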
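Finally:
from io import StringIO
pd.read_json(StringIO(elevations))  # newer pandas expects a file-like object rather than a literal JSON string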
You can probably also avoid dumping the data back to a string; I assume pandas can create a DataFrame directly from a dictionary (I haven't used it in a long time :p)
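Skipping the string round trip does work: pd.DataFrame accepts a list of flat dicts and turns each dict into a row. A minimal sketch, assuming data already went through the lat/lng loop above (the elevation and resolution values are hypothetical placeholders):

```python
import pandas as pd

# Hypothetical data after the loop: lat/lng sit at the top level of each result.
data = {
    "results": [
        {"elevation": 100.0, "resolution": 5.0, "lat": 42.974049, "lng": -81.205203},
        {"elevation": 110.0, "resolution": 5.0, "lat": 42.974298, "lng": -81.195755},
    ],
    "status": "OK",
}

# Pass the list of results, not the whole dict, to get one row per result.
df = pd.DataFrame(data["results"])
print(df)
```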
The problem is that several columns in the DataFrame contain dicts, which in turn contain smaller dicts. Useful JSON is often heavily nested. I have been writing small functions that pull the info I want out into a new column, so that I have it in the format I want to use.
for row in range(len(data)):
    # First I load the dict (one at a time)
    n = data.loc[row, 'dict_column']
    # Now I make a new column that pulls out the data that I want.
    data.loc[row, 'new_column'] = n.get('key')
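The same extraction can usually be done for the whole column at once instead of row by row. A minimal sketch, assuming a hypothetical frame in which every cell of dict_column is a dict (a NaN cell would make d.get fail, so fill or drop those first):

```python
import pandas as pd

# Hypothetical frame; column and key names are placeholders.
data = pd.DataFrame({"dict_column": [{"key": 1, "other": "a"}, {"key": 2}, {}]})

# Pull one key out of every dict; rows without the key get a missing value.
data["new_column"] = data["dict_column"].apply(lambda d: d.get("key"))

# Or expand every key of the dicts into its own column in one step.
expanded = pd.json_normalize(data["dict_column"].tolist())
print(data)
print(expanded)
```

apply still calls Python once per row, but it avoids the repeated .loc writes; json_normalize is the better fit when you want all the keys.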