Unfold a nested dictionary with lists into a pandas DataFrame


I have a nested dictionary whose sub-dictionaries hold lists as values:

nested_dict = {'string1': {69: [1231, 232], 67:[682, 12], 65: [1, 1]}, 
    `string2` :{2


                      


      
      
        
2 Answers

        
                         				            
            
           
            
                              
                
              
              
                
终归单人心 (2021-01-18 19:01)
              
            
            
                                                                       
This should give you the result you are looking for, although it's probably not the most elegant solution. There's probably a better (more pandas-idiomatic) way to do it.

I parsed your nested dict and built a list of dictionaries (one for each row).

import pandas as pd

# some sample input
nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': {28673: [83, 24], 22737: [83, 94, 1103, 103], 19424: [65, 24]},
}

# new_list holds one dictionary per output row
new_list = []
for k1, curr_dict in nested_dict.items():
    for k2, values in curr_dict.items():
        new_dict = {'col1': k1, 'col2': k2}
        # list elements become col3, col4, ... in order
        new_dict.update({'col%d' % (i + 3): v for i, v in enumerate(values)})
        new_list.append(new_dict)

# create a DataFrame from new_list
df = pd.DataFrame(new_list)


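The output (the row order reflects dict iteration order when this was run; on Python 3.7+ dicts preserve insertion order, so the string1 rows come first):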

      col1   col2  col3  col4    col5   col6
0  string2  28672    82    23     NaN    NaN
1  string2  22736    82    93  1102.0  102.0
2  string2  19423    64    23     NaN    NaN
3  string3  19424    65    24     NaN    NaN
4  string3  28673    83    24     NaN    NaN
5  string3  22737    83    94  1103.0  103.0
6  string1     65     1     1     NaN    NaN
7  string1     67   682    12     NaN    NaN
8  string1     69  1231   232     NaN    NaN


This assumes that the input always contains enough data to create col1 and col2.

I loop through nested_dict, assuming that each of its values is itself a dictionary, and then loop through that inner dictionary (curr_dict). The keys k1 and k2 populate col1 and col2. For the remaining columns, I iterate over the list and add one column per element; rows with shorter lists are padded with NaN.
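
The assumption described above (every value of the outer dictionary is itself a dictionary) can be made explicit. The following is only a sketch: the function name two_level_to_frame and the sample data are hypothetical and not part of the original answer. It wraps the same loop in a function and raises an error when the assumption does not hold.

```python
import pandas as pd

def two_level_to_frame(nested):
    """Hypothetical helper: one row per (outer key, inner key) pair."""
    rows = []
    for k1, inner in nested.items():
        # Make the "each value is a dict" assumption explicit.
        if not isinstance(inner, dict):
            raise TypeError("value under %r is not a dict" % (k1,))
        for k2, values in inner.items():
            row = {'col1': k1, 'col2': k2}
            # List elements become col3, col4, ... in order.
            row.update({'col%d' % (i + 3): v for i, v in enumerate(values)})
            rows.append(row)
    return pd.DataFrame(rows)

# Made-up sample with lists of different lengths.
sample = {'a': {1: [10, 20], 2: [30, 40, 50]}}
frame = two_level_to_frame(sample)
```

Because the second list is longer, frame gets a col5 column that is NaN for the first row, which is the same padding behaviour visible in the output above.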
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
天涯浪人 (2021-01-18 19:08)
              
            
            
                                                                       
Here's a method which uses a recursive generator to unroll the nested dictionaries. It won't assume that you have exactly two levels, but continues unrolling each dict until it hits a list.

import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': [101, 102]}

def unroll(data):
    if isinstance(data, dict):
        for key, value in data.items():
            # Recursively unroll the next level and prepend the key to each row.
            for row in unroll(value):
                yield [key] + row
    elif isinstance(data, list):
        # This is the bottom of the structure (defines exactly one row).
        yield data

df = pd.DataFrame(list(unroll(nested_dict)))


Because unroll yields lists rather than dicts, the columns will be named numerically (from 0 to 5 in this case). So you need to use rename to get the column labels you want (rename returns a new DataFrame, so assign the result if you want to keep it):

df.rename(columns=lambda i: 'col{}'.format(i+1))


This returns the following result (note that the additional string3 entry is also unrolled).

      col1   col2  col3   col4    col5   col6
0  string1     69  1231  232.0     NaN    NaN
1  string1     67   682   12.0     NaN    NaN
2  string1     65     1    1.0     NaN    NaN
3  string2  28672    82   23.0     NaN    NaN
4  string2  22736    82   93.0  1102.0  102.0
5  string2  19423    64   23.0     NaN    NaN
6  string3    101   102    NaN     NaN    NaN
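
To illustrate the claim that unroll is not tied to exactly two levels, here is a small sketch that runs it on a deeper structure. The sample dictionary deep is made up for illustration and does not come from the question; unroll is repeated so the snippet stands on its own.

```python
import pandas as pd

def unroll(data):
    if isinstance(data, dict):
        for key, value in data.items():
            # Prepend the current key to every row produced below it.
            for row in unroll(value):
                yield [key] + row
    elif isinstance(data, list):
        # A list is the bottom of the structure and defines one row.
        yield data

# Hypothetical input: three levels under 'a', a bare list under 'b'.
deep = {'a': {'x': {1: [10, 20]}}, 'b': [7]}
rows = list(unroll(deep))

# Shorter rows are padded with NaN when the DataFrame is built.
df_deep = pd.DataFrame(rows).rename(columns=lambda i: 'col{}'.format(i + 1))
```

Each row is simply the path of keys followed by the list contents, so rows from branches of different depth end up with keys and values in different columns. Whether that is acceptable depends on the data.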

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         