Pandas - expand nested json array within column in dataframe

礼貌的吻别 2021-02-05 17:17

I have JSON data (coming from MongoDB) containing thousands of records (an array/list of JSON objects), each with a structure like the one below:

{         


        
3 Answers
  • 2021-02-05 17:32
    import pandas as pd
    import json
    
    data = '''
    [
      {
       "id":1,
       "first_name":"Mead",
       "last_name":"Lantaph",
       "email":"mlantaph0@opensource.org",
       "gender":"Male",
       "ip_address":"231.126.209.31",
       "nested_array_to_expand":[
          {
             "property":"Quaxo",
             "json_obj":{
                "prop1":"Chevrolet",
                "prop2":"Mercy Streets"
             }
          },
          {
             "property":"Blogpad",
             "json_obj":{
                "prop1":"Hyundai",
                "prop2":"Flashback"
             }
          },
          {
             "property":"Yabox",
             "json_obj":{
                "prop1":"Nissan",
                "prop2":"Welcome Mr. Marshall (Bienvenido Mister Marshall)"
             }
          }
       ]
      }
    ]
    '''
    data = json.loads(data)
    result = pd.json_normalize(data, "nested_array_to_expand", 
                               ['email', 'first_name', 'gender', 'id', 'ip_address', 'last_name'])
    
    

    print(result)

    
      property json_obj.prop1                                     json_obj.prop2  \
    0    Quaxo      Chevrolet                                      Mercy Streets   
    1  Blogpad        Hyundai                                          Flashback   
    2    Yabox         Nissan  Welcome Mr. Marshall (Bienvenido Mister Marshall)   
    
                          email first_name gender id      ip_address last_name  
    0  mlantaph0@opensource.org       Mead   Male  1  231.126.209.31   Lantaph  
    1  mlantaph0@opensource.org       Mead   Male  1  231.126.209.31   Lantaph  
    2  mlantaph0@opensource.org       Mead   Male  1  231.126.209.31   Lantaph  
    

    More information about json_normalize: https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
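    The question mentions thousands of records, and json_normalize accepts the whole list in a single call. A minimal sketch, assuming two invented records shaped like the sample above (only id and email kept as meta fields for brevity): each element of every record's nested_array_to_expand becomes one row, and that record's meta fields are repeated on its rows.

```python
import pandas as pd

# Hypothetical sample: two records shaped like the one above (values invented for illustration).
records = [
    {
        "id": 1,
        "email": "a@example.org",
        "nested_array_to_expand": [
            {"property": "Quaxo", "json_obj": {"prop1": "Chevrolet", "prop2": "x"}},
            {"property": "Blogpad", "json_obj": {"prop1": "Hyundai", "prop2": "y"}},
        ],
    },
    {
        "id": 2,
        "email": "b@example.org",
        "nested_array_to_expand": [
            {"property": "Yabox", "json_obj": {"prop1": "Nissan", "prop2": "z"}},
        ],
    },
]

# One call handles the whole list: every element of each record's
# nested_array_to_expand becomes a row, and the meta fields are repeated per row.
result = pd.json_normalize(
    records,
    record_path="nested_array_to_expand",
    meta=["id", "email"],
)
print(result)
```

    If some records lack one of the meta fields, json_normalize raises a KeyError by default; passing errors='ignore' fills the missing values with NaN instead.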

  • 2021-02-05 17:34

    The following code is what you want. You can unroll the nested list with Python's built-in list function and pass the result to a new DataFrame: pd.DataFrame(list(json_dict['nested_array_to_expand']))

    You might have to do several iterations of this, depending on how nested your data is.

    import pandas as pd
    
    # json_dict is assumed to be a single record (one dict), e.g. one element of the MongoDB result list
    df = pd.concat([pd.DataFrame(json_dict),
                    pd.DataFrame(list(json_dict['nested_array_to_expand']))],
                   axis=1).drop(columns='nested_array_to_expand')
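    The "several iterations" mentioned above could look like this for one more level: after the first pass, json_obj still holds dicts, so the same unrolling is applied to that column. A minimal sketch, assuming a single record trimmed from the sample data in the first answer:

```python
import pandas as pd

# Hypothetical single record, trimmed from the sample data in the first answer.
json_dict = {
    "id": 1,
    "email": "mlantaph0@opensource.org",
    "nested_array_to_expand": [
        {"property": "Quaxo", "json_obj": {"prop1": "Chevrolet", "prop2": "Mercy Streets"}},
        {"property": "Blogpad", "json_obj": {"prop1": "Hyundai", "prop2": "Flashback"}},
    ],
}

# First pass (as in the answer): scalars are broadcast, the list becomes rows.
df = pd.concat(
    [pd.DataFrame(json_dict), pd.DataFrame(list(json_dict["nested_array_to_expand"]))],
    axis=1,
).drop(columns="nested_array_to_expand")

# Second pass: json_obj still holds dicts, so unroll that column the same way,
# reusing df's index so the new columns line up with the existing rows.
df = pd.concat(
    [df.drop(columns="json_obj"), pd.DataFrame(df["json_obj"].tolist(), index=df.index)],
    axis=1,
)
print(df)
```

    Each extra level of nesting needs one more pass like the second one; for deeper data, pd.json_normalize flattens all nested dicts in one step.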
    
  • 2021-02-05 17:37

    I propose an answer using pandas.json_normalize, which I think is an interesting option.
    I use it to expand the nested JSON; maybe there is a better way, but you should definitely consider this feature. After that, you just have to rename the columns as you like.

    import json
    import pandas as pd
    
    # Loading the json string into a structure
    # (json_str is assumed to hold a single record, i.e. one JSON object)
    json_dict = json.loads(json_str)
    
    df = pd.concat([pd.DataFrame(json_dict),
                    pd.json_normalize(json_dict['nested_array_to_expand'])],
                   axis=1).drop(columns='nested_array_to_expand')
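    For the renaming step, two options may help. This is a sketch assuming a trimmed copy of the sample's nested list; the target names make and title are purely hypothetical. DataFrame.rename accepts either a function applied to every column name or an explicit mapping, and json_normalize has a sep parameter that sets the separator used in flattened names (default ".").

```python
import pandas as pd

# Hypothetical nested list, trimmed from the sample data in the first answer.
nested = [
    {"property": "Quaxo", "json_obj": {"prop1": "Chevrolet", "prop2": "Mercy Streets"}},
    {"property": "Blogpad", "json_obj": {"prop1": "Hyundai", "prop2": "Flashback"}},
]

# Default flattened names use "." as separator: json_obj.prop1, json_obj.prop2.
flat = pd.json_normalize(nested)

# Option 1: strip the parent prefix from every column name with a function.
stripped = flat.rename(columns=lambda c: c.split(".")[-1])

# Option 2: choose the separator up front, then rename specific columns
# ("make" and "title" are hypothetical target names).
underscored = pd.json_normalize(nested, sep="_").rename(
    columns={"json_obj_prop1": "make", "json_obj_prop2": "title"}
)

print(stripped.columns.tolist())
print(underscored.columns.tolist())
```

    Stripping prefixes is convenient but can produce duplicate column names if two nested objects share a key, so an explicit mapping is the safer choice in that case.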
    
