Pandas pivot table ValueError: Index contains duplicate entries, cannot reshape

I have a dataframe as shown below (top 3 rows):

Sample_Name Sample_ID   Sample_Type IS  Component_Name  IS_Name Component_Group_Name    Outlier_Reasons Actua


                      
2 answers

时光说笑, 2021-01-03 04:08

You should be able to do this with pandas.pivot_table(), as described in the pandas documentation. Unlike pivot(), it aggregates rows that share the same index/column pair instead of raising the "duplicate entries" error.

With your dataframe stored as df, use the following code:

import pandas as pd
df = pd.read_table('table_from_which_to_read')

# One row per sample, one column per component
new_df = pd.pivot_table(df, index=['Sample_Name'], columns='Component_Name', values='Calculated_Concentration')


By default, pivot_table() takes the mean of the values that land in the same cell. If you want something other than the mean of the concentration values, change the aggfunc parameter.
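
To make the difference concrete, here is a minimal sketch on made-up data (the values below are hypothetical, not taken from the question's dataframe). It shows pivot() failing on a repeated Sample_Name/Component_Name pair, pivot_table() averaging it by default, and aggfunc switching the reduction:

```python
import pandas as pd

# Hypothetical toy data: the ("foo", "x") pair appears twice.
df = pd.DataFrame({
    "Sample_Name": ["foo", "foo", "foo", "bar"],
    "Component_Name": ["x", "x", "y", "x"],
    "Calculated_Concentration": [1.0, 3.0, 5.0, 7.0],
})

# pivot() cannot put two values into one cell, so it raises ValueError.
try:
    df.pivot(index="Sample_Name", columns="Component_Name",
             values="Calculated_Concentration")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

# pivot_table() aggregates the duplicates instead; the default is the mean.
mean_df = pd.pivot_table(df, index="Sample_Name", columns="Component_Name",
                         values="Calculated_Concentration")

# Pass aggfunc to choose a different reduction, for example the maximum.
max_df = pd.pivot_table(df, index="Sample_Name", columns="Component_Name",
                        values="Calculated_Concentration", aggfunc="max")
```

Here mean_df holds 2.0 for foo/x (the mean of 1.0 and 3.0) while max_df holds 3.0; combinations with no data, such as bar/y, come out as NaN.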

EDIT

Since you don't want to aggregate over the values, you can reshape the data with the DataFrame's set_index method, which is described in the pandas documentation.

import pandas as pd
df = pd.DataFrame({'NonUniqueLabel': ['Item1', 'Item1', 'Item1', 'Item2'],
                   'SemiUniqueValue': ['X', 'Y', 'Z', 'X'],
                   'Value': [1.0, 100, 5, None]})

# Move both key columns into a two-level (multi) index
new_df = df.set_index(['NonUniqueLabel', 'SemiUniqueValue'])


The resulting table has a multi-index and should match the layout you expect.
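
Since the original error comes from repeated key pairs, it may help to list them before reshaping. This is only a sketch: it reuses the toy frame from this answer and adds one hypothetical repeated row (Item1/X) so there is something to find.

```python
import pandas as pd

# Toy frame from the answer, plus one hypothetical repeated (Item1, X) row.
df = pd.DataFrame({
    "NonUniqueLabel": ["Item1", "Item1", "Item1", "Item2", "Item1"],
    "SemiUniqueValue": ["X", "Y", "Z", "X", "X"],
    "Value": [1.0, 100, 5, None, 2.0],
})

keys = ["NonUniqueLabel", "SemiUniqueValue"]

# keep=False marks every row of a repeated key pair, not only the later ones.
dupes = df[df.duplicated(subset=keys, keep=False)]

# set_index() itself accepts duplicates; the index is simply not unique.
new_df = df.set_index(keys)
is_unique = new_df.index.is_unique
```

If is_unique comes back False, a later unstack() or pivot() on those keys would hit the same "duplicate entries" error, so the rows in dupes are the ones to deduplicate or aggregate first.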
                                                                        
                                                        
            
            
              
                

悲&欢浪女, 2021-01-03 04:24

You can use groupby() and unstack() to get around the error you're seeing with pivot().  

Here's some example data, with a few edge cases added and some column values removed or substituted to keep it a minimal, complete, verifiable example (MCVE):

# df
      Sample_Name  Sample_ID     IS Component_Name Calculated_Concentration Outlier_Reasons
Index                                                                    
1             foo        NaN   True              x                      NaN             NaN
1             foo        NaN   True              y                      NaN             NaN
2             foo        NaN  False              z                125.92766             NaN
2             bar        NaN  False              x                     1.00             NaN
2             bar        NaN  False              y                     2.00             NaN
2             bar        NaN  False              z                      NaN             NaN

(df.groupby(['Sample_Name','Component_Name'])
   .Calculated_Concentration
   .first()
   .unstack()
)


Output:

Component_Name    x   y          z
Sample_Name                       
bar             1.0 2.0        NaN
foo             NaN NaN  125.92766
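
For anyone who wants to run this, here is a self-contained sketch that rebuilds only the three columns the reshape uses from the example data above (the other columns are left out as an assumption that they don't matter for this step):

```python
import numpy as np
import pandas as pd

# Only the columns needed for the reshape, copied from the example table.
df = pd.DataFrame({
    "Sample_Name": ["foo", "foo", "foo", "bar", "bar", "bar"],
    "Component_Name": ["x", "y", "z", "x", "y", "z"],
    "Calculated_Concentration": [np.nan, np.nan, 125.92766, 1.0, 2.0, np.nan],
})

# One group per (sample, component) pair; first() keeps the first non-null
# value in each group, and unstack() moves Component_Name into the columns.
wide = (
    df.groupby(["Sample_Name", "Component_Name"])["Calculated_Concentration"]
      .first()
      .unstack()
)
```

Keep in mind that first() silently drops any later values for a repeated pair. If the duplicates carry different measurements, an explicit aggregation such as mean() or max() may be the safer choice.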

                                                                        
                                                        
            
            
              
                