Normalize a complex nested JSON file

离开以前 2021-01-28 04:10

I'm trying to normalize the JSON file below into 4 tables: "content", "Modules", "Images", and everything else in another table.

{
    "id": "0000050


      
      
        
1 Answer

孤城傲影 (OP) 2021-01-28 04:38

You could use the flatten_json function described at https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10, as follows, and then pass the result to json_normalize:

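import json

import pandas as pd

# Load the nested JSON document from disk.
with open('test.json') as json_file:
    data = json.load(json_file)

def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated key.
            out[name[:-1]] = x

    flatten(y)
    return out

# Flatten one record, then build a single-row DataFrame (used as df below).
flat = flatten_json(data["content"][0])
df = pd.json_normalize(flat)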
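
To make the flattening step concrete, here is a minimal sketch on a made-up record. The `sample` dict is hypothetical (the real file is not shown in full); only its key names are borrowed from the column names that appear in the output further down.

```python
def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


# Hypothetical record, NOT the asker's real data.
sample = {
    "id": "B01",
    "template": {
        "module": [
            {"id": "module-11"},
            {"id": "module-6", "product": None},
        ]
    },
}

flat = flatten_json(sample)
print(flat)
# {'id': 'B01', 'template_module_0_id': 'module-11',
#  'template_module_1_id': 'module-6', 'template_module_1_product': None}
```

Note that list positions become part of the key (`module_0`, `module_1`), and that empty lists or dicts produce no key at all, so they vanish from the flattened result.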


This gives a single-row DataFrame with one column per flattened key. All that remains is to select the columns for each of the four categories you described.
The output is:

ID_0  content_revision  ... locale_data_locale locale_data_identified_by
0  B01     1580225050941  ...              en_US            MACHINE_DETECT


Then you select as follows, for instance for your module and image DataFrames:

module = df.loc[:,df.columns.str.contains("module")]
image = df.loc[:,df.columns.str.contains("image")]
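
The same idea can be extended to all four tables with boolean masks. This is only a sketch: the one-row `df` below is invented for illustration, and treating image columns as separate from module columns is an assumption about how you want to split them (in the selection above, image columns nested under a module also match "module").

```python
import pandas as pd

# Hypothetical flattened row, NOT the asker's real data.
df = pd.DataFrame([{
    "id": "B01",
    "content_revision": 1580225050941,
    "template_module_0_id": "module-11",
    "template_module_0_image_0_originalSrc": "a.png",
    "locale_data_locale": "en_US",
}])

# One boolean mask per category, built from the column names.
is_image = df.columns.str.contains("image")
is_module = df.columns.str.contains("module") & ~is_image
is_content = df.columns.str.startswith("content")

image = df.loc[:, is_image]
module = df.loc[:, is_module]
content = df.loc[:, is_content]
# Everything that matched none of the masks goes into the fourth table.
other = df.loc[:, ~(is_image | is_module | is_content)]

print(list(other.columns))
```

Building the "everything else" table from the negated masks guarantees that each column ends up in exactly one table.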


For instance, the result for module is:

template_module_0_id  ... template_module_1_product
0            module-11  ...                      None


Next, here is an example of transforming the module DataFrame. You only have two modules, so you can rename the columns and then concatenate:

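# Select each module's columns and drop only the module index from the names,
# so that nested indices such as image_0 or image_1 are left untouched.
module1 = module.loc[:, module.columns.str.contains("module_0_")].copy()
module1.columns = module1.columns.str.replace("module_0_", "module_", regex=False)
module2 = module.loc[:, module.columns.str.contains("module_1_")].copy()
module2.columns = module2.columns.str.replace("module_1_", "module_", regex=False)
# Stack the two modules as rows of one DataFrame.
modules = pd.concat([module1, module2])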
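
If the number of modules is not fixed, the rename-and-concat step can be written as a loop. The following is a sketch under the assumption that every module column contains `module_<index>_` in its name; the one-row `module` DataFrame is invented for illustration.

```python
import re

import pandas as pd

# Hypothetical wide DataFrame with two modules, NOT the asker's real data.
module = pd.DataFrame([{
    "template_module_0_id": "module-11",
    "template_module_0_image_0_originalSrc": "a.png",
    "template_module_1_id": "module-6",
    "template_module_1_product": None,
}])

# Collect the distinct module indices that appear in the column names.
pattern = re.compile(r"module_(\d+)_")
indices = sorted({int(pattern.search(c).group(1))
                  for c in module.columns if pattern.search(c)})

parts = []
for i in indices:
    prefix = f"module_{i}_"
    # Keep this module's columns, then strip its index from the names.
    part = module.loc[:, module.columns.str.contains(prefix, regex=False)].copy()
    part.columns = part.columns.str.replace(prefix, "module_", regex=False)
    parts.append(part)

# One row per module; columns a module lacks are filled with NaN.
modules = pd.concat(parts, ignore_index=True)
print(modules)
```

Matching on the full `module_<index>_` prefix, rather than on `_0` or `_1` alone, keeps nested indices such as `image_0` intact.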


And you get:

 template_module_id  ... template_module_image_7_originalSrc
0          module-11  ...                                 NaN
0           module-6  ...                                None


If you had many more elements, the other option would be to apply flatten_json and json_normalize directly to the nested element you want.
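
A minimal sketch of that second option: flatten each nested module on its own, so every module becomes one row without any renaming. The `record` dict and the `record["template"]["module"]` path are assumptions inferred from the column names above, not the asker's actual structure.

```python
import pandas as pd


def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


# Hypothetical record, NOT the asker's real data.
record = {
    "id": "B01",
    "template": {
        "module": [
            {"id": "module-11", "image": [{"originalSrc": "a.png"}]},
            {"id": "module-6", "image": [], "product": None},
        ]
    },
}

# Flatten each module separately: one dict, and therefore one row, per module.
rows = [flatten_json(m) for m in record["template"]["module"]]
modules = pd.DataFrame(rows)
print(modules)
```

Because each module is flattened from its own root, the column names carry no module index, and the approach scales to any number of modules.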
    
             
                                                        
            
            
              
                