JSON to pandas DataFrame

离开以前 2020-11-22 05:46

What I am trying to do is extract elevation data from the Google Maps API along a path specified by latitude and longitude coordinates, as follows:

from urllib2         


        
11 Answers
  • 2020-11-22 06:12

    An improvement on the accepted answer:

    The accepted answer has some problems running, so I want to share my code, which uses requests and does not rely on urllib2:

    import requests
    from pandas import json_normalize
    url = 'https://www.energidataservice.dk/proxy/api/datastore_search?resource_id=nordpoolmarket&limit=5'
    
    response = requests.get(url)
    dictr = response.json()
    recs = dictr['result']['records']
    df = json_normalize(recs)
    print(df)
    

    Output:

            _id                    HourUTC               HourDK  ... ElbasAveragePriceEUR  ElbasMaxPriceEUR  ElbasMinPriceEUR
    0    264028  2019-01-01T00:00:00+00:00  2019-01-01T01:00:00  ...                  NaN               NaN               NaN
    1    138428  2017-09-03T15:00:00+00:00  2017-09-03T17:00:00  ...                33.28              33.4              32.0
    2    138429  2017-09-03T16:00:00+00:00  2017-09-03T18:00:00  ...                35.20              35.7              34.9
    3    138430  2017-09-03T17:00:00+00:00  2017-09-03T19:00:00  ...                37.50              37.8              37.3
    4    138431  2017-09-03T18:00:00+00:00  2017-09-03T20:00:00  ...                39.65              42.9              35.3
    ..      ...                        ...                  ...  ...                  ...               ...               ...
    995  139290  2017-10-09T13:00:00+00:00  2017-10-09T15:00:00  ...                38.40              38.4              38.4
    996  139291  2017-10-09T14:00:00+00:00  2017-10-09T16:00:00  ...                41.90              44.3              33.9
    997  139292  2017-10-09T15:00:00+00:00  2017-10-09T17:00:00  ...                46.26              49.5              41.4
    998  139293  2017-10-09T16:00:00+00:00  2017-10-09T18:00:00  ...                56.22              58.5              49.1
    999  139294  2017-10-09T17:00:00+00:00  2017-10-09T19:00:00  ...                56.71              65.4              42.2 
    

    PS: API is for Danish electricity prices

  • 2020-11-22 06:13

    billmanH's solution helped me, but it didn't work until I switched from:

    n = data.loc[row,'json_column']
    

    to:

    n = data.iloc[[row]]['json_column']
    

    Here's the rest of it. Converting to a dictionary is helpful for working with JSON data.

    import json
    
    for row in range(len(data)):
        n = data.iloc[[row]]['json_column'].item()
        jsonDict = json.loads(n)
        if ('mykey' in jsonDict):
            display(jsonDict['mykey'])
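
    As a possible alternative to the row loop, the same idea can be sketched with `Series.apply`. The frame, the column name `json_column` and the key `mykey` below are made-up sample data, not taken from the original question:

```python
import json

import pandas as pd

# Hypothetical sample frame: one JSON document per row, stored as text.
data = pd.DataFrame({
    "json_column": [
        '{"mykey": 1, "other": "a"}',
        '{"other": "b"}',
        '{"mykey": 3}',
    ]
})

# Parse each string into a dict once.
parsed = data["json_column"].apply(json.loads)

# Pull out one key; rows without the key get a missing value.
data["mykey"] = parsed.apply(lambda d: d.get("mykey"))

print(data)
```

    Using `dict.get` instead of `jsonDict['mykey']` means a row without the key does not raise a `KeyError`.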
    
  • 2020-11-22 06:14

    I prefer a more generic method for cases where the user would rather not supply the key 'results'. You can still flatten the JSON by recursively searching for the first key that holds a nested list. If you do know the key but the JSON is deeply nested, the same kind of recursion can find it. It looks something like this:

    import json
    from pandas import json_normalize
    
    def findnestedlist(js):
        # Return the first list value, checking the current level before going deeper.
        for v in js.values():
            if isinstance(v, list):
                return v
        for v in js.values():
            if isinstance(v, dict):
                found = findnestedlist(v)
                if found is not None:
                    return found
        return None
    
    
    def recursive_lookup(k, d):
        # Return the value stored under key k at any depth, or None if it is absent.
        if k in d:
            return d[k]
        for v in d.values():
            if isinstance(v, dict):
                found = recursive_lookup(k, v)
                if found is not None:
                    return found
        return None
    
    def flat_json(content, key):
        js = json.loads(content)
        if key is None or key == '':
            nested_list = findnestedlist(js)
        else:
            nested_list = recursive_lookup(key, js)
        return json_normalize(nested_list, sep="_")
    
    key = "results" # If you don't have it, give it None
    
    csv_data = flat_json(your_json_string, key)
    print(csv_data)
    
  • 2020-11-22 06:19

    I found a quick and easy solution to what I wanted using json_normalize() included in pandas 1.0.1.

    from urllib.request import Request, urlopen  # Python 3 replacement for urllib2; pandas 1.0+ requires Python 3
    import json
    
    import pandas as pd    
    
    path1 = '42.974049,-81.205203|42.974298,-81.195755'
    request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
    response = urlopen(request)
    elevations = response.read()
    data = json.loads(elevations)
    df = pd.json_normalize(data['results'])
    

    This gives a nice flattened dataframe with the json data that I got from the Google Maps API.
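
    To see what the flattening does without calling the API, here is a small offline sketch. The dictionary imitates the shape of the response used above; the coordinates come from `path1`, while the elevation and resolution numbers are invented for illustration:

```python
import pandas as pd

# Made-up sample shaped like the parsed response; the numeric
# elevation/resolution values are invented, not real API output.
data = {
    "results": [
        {
            "elevation": 243.3,
            "location": {"lat": 42.974049, "lng": -81.205203},
            "resolution": 19.1,
        },
        {
            "elevation": 244.1,
            "location": {"lat": 42.974298, "lng": -81.195755},
            "resolution": 19.1,
        },
    ],
    "status": "OK",
}

# Each nested key becomes its own column, named parent.child by default.
df = pd.json_normalize(data["results"])

print(df.columns.tolist())
print(df)
```

    The nested `location` dict ends up as two separate columns, `location.lat` and `location.lng`, which is why no manual unpacking is needed.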

  • 2020-11-22 06:21

    You could first import your JSON data into a Python dictionary:

    data = json.loads(elevations)
    

    Then modify the data on the fly:

    for result in data['results']:
        result[u'lat']=result[u'location'][u'lat']
        result[u'lng']=result[u'location'][u'lng']
        del result[u'location']
    

    Rebuild the JSON string:

    elevations = json.dumps(data)
    

    Finally:

    pd.read_json(elevations)
    

    You can probably also avoid dumping the data back to a string; I assume pandas can create a DataFrame directly from a dictionary (I haven't used it in a long time :p).
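
    A minimal sketch of that shortcut, assuming `data` has the same shape as the parsed response: `pd.DataFrame` accepts a list of flat dicts, so the round trip through `json.dumps` and `pd.read_json` can be skipped. The elevation values below are invented sample numbers:

```python
import pandas as pd

# Made-up sample shaped like the parsed response (elevations are invented).
data = {
    "results": [
        {"elevation": 243.3, "location": {"lat": 42.974049, "lng": -81.205203}},
        {"elevation": 244.1, "location": {"lat": 42.974298, "lng": -81.195755}},
    ],
    "status": "OK",
}

# Build flat rows, hoisting lat/lng out of the nested 'location' dict.
# Unlike the in-place loop, this leaves `data` unmodified.
rows = [
    {
        "elevation": r["elevation"],
        "lat": r["location"]["lat"],
        "lng": r["location"]["lng"],
    }
    for r in data["results"]
]

# A list of flat dicts maps directly onto rows and columns.
df = pd.DataFrame(rows)
print(df)
```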

  • 2020-11-22 06:25

    The problem is that you have several columns in the data frame that contain dicts with smaller dicts inside them. Useful JSON is often heavily nested. I have been writing small functions that pull the info I want out into a new column, so that I have it in the format I want to use.

    for row in range(len(data)):
        # First I load the dict (one at a time)
        n = data.loc[row, 'dict_column']
        # Now I make a new column that pulls out the data that I want.
        data.loc[row, 'new_column'] = n.get('key')
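
    The same extraction can also be sketched without an explicit loop. The frame below is hypothetical sample data (the `id` column and the `inner` dict are assumptions added for illustration); it shows pulling a single key with `apply`, and expanding every key at once with `pd.json_normalize`:

```python
import pandas as pd

# Hypothetical frame with a column of nested dicts.
data = pd.DataFrame({
    "id": [1, 2],
    "dict_column": [
        {"key": "a", "inner": {"x": 10}},
        {"key": "b", "inner": {"x": 20}},
    ],
})

# One key: equivalent to the loop above, applied to the whole column.
data["new_column"] = data["dict_column"].apply(lambda d: d.get("key"))

# Every key: flatten the dicts into their own frame, then join by index.
expanded = pd.json_normalize(data["dict_column"].tolist())
result = data.drop(columns="dict_column").join(expanded)

print(result)
```

    The join relies on both frames sharing the default 0..n-1 index; if `data` has a different index, reset it first or assign `expanded.index = data.index` before joining.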
    