Python: split dataframe columns into multiple rows


I have a dataframe like this:

--------------------------------------------------------------------
Product        ProductType     SKU                Size
-------


                      
1 answer

温柔的废话
2021-02-10 16:08

This approach is fragile, so use it with caution:

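Convert the Product column into lists that have the same length as the lists in the other columns (here, SKU). This will not work if the lists in SKU and Size have different lengths. Wrapping each value with lambda x: [x] rather than list keeps multi-character strings such as "T-shirt" intact instead of splitting them into characters.

df["Product"] = df["Product"].map(lambda x: [x]) * df["SKU"].map(len)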

Out[184]: 
                    SKU           Size       Product
0  [111, 222, 333, 444]  [XS, S, M, L]  [a, a, a, a]
1            [555, 666]         [M, L]        [b, b]
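
Since the whole approach depends on SKU and Size holding lists of equal length in every row, it may be worth checking that first. The following is only a minimal sketch; the sample data is hypothetical and simply mirrors the output shown above.

```python
import pandas as pd

# Hypothetical sample data mirroring the columns used in this answer.
df = pd.DataFrame({
    "Product": ["a", "b"],
    "SKU": [[111, 222, 333, 444], [555, 666]],
    "Size": [["XS", "S", "M", "L"], ["M", "L"]],
})

# Row-wise list lengths for each list column.
sku_len = df["SKU"].map(len)
size_len = df["Size"].map(len)

# Rows whose list lengths differ cannot be flattened consistently.
mismatched = df[sku_len != size_len]
lengths_match = mismatched.empty
print(lengths_match)
```

If mismatched is not empty, those rows need to be fixed or set aside before continuing.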


Sum each column (summing lists concatenates them), then pass the result to the DataFrame constructor via to_dict():

pd.DataFrame(df.sum().to_dict())
Out[185]: 
  Product  SKU Size
0       a  111   XS
1       a  222    S
2       a  333    M
3       a  444    L
4       b  555    M
5       b  666    L
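
If relying on sum() to concatenate lists feels too implicit, the same flattening can be sketched with explicit steps. This is not the method used in the answer, just an equivalent under the same equal-length assumption: Series.repeat repeats the scalar column, and itertools.chain flattens the list columns. The sample data is hypothetical.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: one scalar column and two list columns.
df = pd.DataFrame({
    "Product": ["a", "b"],
    "SKU": [[111, 222, 333, 444], [555, 666]],
    "Size": [["XS", "S", "M", "L"], ["M", "L"]],
})

# Number of output rows each input row should produce.
lengths = df["SKU"].map(len).to_numpy()

flat = pd.DataFrame({
    # Repeat each scalar once per list element.
    "Product": df["Product"].repeat(lengths).to_numpy(),
    # Concatenate the per-row lists into one flat list per column.
    "SKU": list(chain.from_iterable(df["SKU"])),
    "Size": list(chain.from_iterable(df["Size"])),
})
print(flat)
```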


Edit:

For several columns, you can define the columns to be repeated:

cols_to_be_repeated = ["Product", "ProductType"]


Save the rows that have None values in another dataframe:

na_df = df[pd.isnull(df["SKU"])].copy()


Drop the rows with None values from the original dataframe:

df.dropna(inplace=True)


Iterate over those columns:

for col in cols_to_be_repeated:
    df[col] = df[col].map(lambda x: [x]) * df["SKU"].map(len)


Then use the same approach and append the saved rows:

pd.concat([pd.DataFrame(df.sum().to_dict()), na_df])

        Product ProductType    SKU  Size
0       T-shirt         Top  111.0    XS
1       T-shirt         Top  222.0     S
2       T-shirt         Top  333.0     M
3       T-shirt         Top  444.0     L
4  Pant(Flared)     Bottoms  555.0     M
5  Pant(Flared)     Bottoms  666.0     L
2       Sweater         Top    NaN  None
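
As a possible alternative to the steps above, and assuming pandas 1.3 or later (where DataFrame.explode accepts a list of columns), the list columns can be expanded together in one call. This is a sketch rather than the approach from the answer; the sample data is hypothetical and modelled on the output shown. The scalar columns are repeated automatically, and the row with None values passes through as a single row.

```python
import pandas as pd

# Hypothetical sample data modelled on the output above.
df = pd.DataFrame({
    "Product": ["T-shirt", "Pant(Flared)", "Sweater"],
    "ProductType": ["Top", "Bottoms", "Top"],
    "SKU": [[111, 222, 333, 444], [555, 666], None],
    "Size": [["XS", "S", "M", "L"], ["M", "L"], None],
})

# Explode both list columns in lockstep; the lists in each row must
# have matching lengths, otherwise pandas raises a ValueError.
result = df.explode(["SKU", "Size"], ignore_index=True)
print(result)
```

Unlike the in-place steps, this leaves df untouched and returns a new dataframe.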


Because these steps modify df in place, it might be better to work on a copy of the original dataframe.
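
One way to do that, sketched here with hypothetical sample data and the made-up names missing and work, is to split the dataframe with a boolean mask and copy each part, so the original stays unchanged:

```python
import pandas as pd

# Hypothetical sample data with one row that has no SKU or Size.
df = pd.DataFrame({
    "Product": ["T-shirt", "Pant(Flared)", "Sweater"],
    "ProductType": ["Top", "Bottoms", "Top"],
    "SKU": [[111, 222, 333, 444], [555, 666], None],
    "Size": [["XS", "S", "M", "L"], ["M", "L"], None],
})

# True for rows whose SKU is missing.
missing = df["SKU"].isna()

# Independent copies: rows to set aside and rows to transform.
na_df = df[missing].copy()
work = df[~missing].copy()

# Replacing columns on the copy does not touch the original dataframe.
for col in ["Product", "ProductType"]:
    work[col] = work[col].map(lambda x: [x]) * work["SKU"].map(len)
```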
                                                                        
                                                        
            
            
              
                