Accessing nested JSON data as dataframes in Pandas

I have the following data

{ "results": [
    {
        "company": "XYZ",
        "createdAt": "2014-03-27T23:21:48.758Z",
        "email": "abc@


                      
1 answer

甜味超标, 2021-01-16 13:26

I'm not sure how your multiple observations are organized in the JSON, but the cause of the problem is clear: the "profilePicture" field has a nested structure, so each observation is a nested dictionary. You need to convert each observation to a DataFrame and concat them into the final DataFrame, as in this solution.

In [3]: print(df)
                                             results
0  {u'linkedinAccount': u'', u'username': u'abc@g...
1  {u'linkedinAccount': u'', u'username': u'abc@g...

[2 rows x 1 columns]

In [4]: print(pd.concat([pd.DataFrame.from_dict(item, orient='index').T for item in df.results]))


  linkedinAccount       username registrationGate firstName title lastName  \
0                  abc@gmail.com           normal       abc    AA      xyz   
0                  abc@gmail.com           normal       abc    AA      xyz   

  company telephone                                     profilePicture  \
0     XYZ            {u'url': u'url.url.com', u'__type': u'File', u...   
0     ABC            {u'url': u'url.url.com', u'__type': u'File', u...   

  location                 updatedAt          email                 createdAt  \
0           2014-03-27T23:24:20.220Z  abc@gmail.com  2014-03-27T23:21:48.758Z   
0           2014-03-27T23:24:20.220Z  abc@gmail.com  2014-03-27T23:21:48.758Z   

  zipcode  
0   00000  
0   00000  

[2 rows x 14 columns]
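
For readers on Python 3 with a recent pandas, here is a minimal sketch of the same convert-and-concat idea. The records are made up to mirror the output above (the original JSON is not shown in full), and the "name" key inside profilePicture is an assumption. It builds each one-row frame with pd.DataFrame([item]) instead of from_dict(...).T, which is a substitution on my part and not the exact call used in the session.

```python
import pandas as pd

# Hypothetical records shaped like the output above; the values and the
# "name" key inside profilePicture are assumptions, not the original data.
data = {
    "results": [
        {
            "company": "XYZ",
            "username": "abc@gmail.com",
            "profilePicture": {"__type": "File", "name": "a.jpg", "url": "url.url.com"},
        },
        {
            "company": "ABC",
            "username": "abc@gmail.com",
            "profilePicture": {"__type": "File", "name": "b.jpg", "url": "url.url.com"},
        },
    ]
}

# One single-row frame per observation, then stack them into one frame.
frames = [pd.DataFrame([item]) for item in data["results"]]
flat = pd.concat(frames, ignore_index=True)
print(flat)
```

Passing ignore_index=True renumbers the rows 0, 1, ... instead of repeating 0 for every observation. The profilePicture column still holds one dictionary per row at this point.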


Then you may want to think about how to deal with the profilePicture column. You can do what @U2EF1 suggested in the link, but I would probably just break that column into three columns: pfPIC_url, pfPIC_type, and pfPIC_name.
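
One possible way to break the column up is sketched below. It assumes every row has a profilePicture dictionary with the keys "url", "__type" and "name" (the third key is cut off in the output above, so "name" is a guess based on the suggested column names), and the sample frame itself is hypothetical.

```python
import pandas as pd

# Hypothetical frame with a dict-valued profilePicture column, similar to
# the one in the answer's output; the "name" key is an assumption.
df = pd.DataFrame({
    "company": ["XYZ", "ABC"],
    "profilePicture": [
        {"__type": "File", "name": "a.jpg", "url": "url.url.com"},
        {"__type": "File", "name": "b.jpg", "url": "url.url.com"},
    ],
})

# Expand each dictionary into its own columns, keeping the original index.
pic = pd.DataFrame(df["profilePicture"].tolist(), index=df.index)
pic = pic.rename(
    columns={"url": "pfPIC_url", "__type": "pfPIC_type", "name": "pfPIC_name"}
)

# Swap the nested column for the three flat ones.
result = df.drop(columns="profilePicture").join(pic)
print(result)
```

Rows where profilePicture is missing would need handling before the tolist() step, for example by replacing them with an empty dictionary.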