Pivot a pandas DataFrame to be the correct format: `DataError: No numeric types to aggregate`


Here is a pandas DataFrame I would like to manipulate:

import pandas as pd

data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "


                      
4 Answers

        
                         				            
            
           
            
                              
                
              
              
                
囚心锁ツ
2020-12-21 07:06
              
            
            
                                                                       
Use set_index and unstack:

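# Move both key columns into the index, then pivot 'labels' out to the columns
df = df.set_index(['grouping','labels']).unstack().rename_axis(None)
# Drop the outer 'count' level that unstack leaves on the columns
df.columns = df.columns.droplevel()
print(df)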


Output:

labels  A    B    C     D
item1   5    1    8  None
item2   3  731  189     9
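
To see why the droplevel step is needed, here is a minimal sketch. It assumes the sample data posted in another answer on this page, since the question's own data is cut off; the variable names `wide` and `levels_before` are only for illustration.

```python
import pandas as pd

# Sample data assumed for illustration (copied from another answer on this page).
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

# unstack() on the whole frame keeps 'count' as an outer column level.
wide = df.set_index(["grouping", "labels"]).unstack()
levels_before = wide.columns.nlevels

# Dropping that outer level leaves only the label names as columns.
wide.columns = wide.columns.droplevel(0)
wide = wide.rename_axis(None)  # clear the 'grouping' name on the index

print(levels_before, list(wide.columns))
print(wide)
```

Selecting the 'count' column before calling unstack() should avoid the extra level in the first place.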

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
生来不讨喜
2020-12-21 07:15
              
            
            
                                                                       
There are four idiomatic pandas ways to do this.


No duplicates among the grouping columns (no aggregation required):

  1. pivot
  2. set_index

Duplicates among the grouping columns (aggregation required):

  3. pivot_table
  4. groupby
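
One way to tell which case applies is to check for repeated key pairs first. This is a rough sketch on assumed sample data; the `extra` row is hypothetical and only there to show the duplicate case.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})
keys = ["grouping", "labels"]

# True if any (grouping, labels) pair occurs more than once.
has_duplicates = bool(df.duplicated(subset=keys).any())

# Hypothetical extra row that repeats the (item1, A) pair.
extra = pd.DataFrame({"grouping": ["item1"], "labels": ["A"], "count": [2]})
with_repeat = pd.concat([df, extra], ignore_index=True)
repeat_has_duplicates = bool(with_repeat.duplicated(subset=keys).any())

print(has_duplicates, repeat_has_duplicates)  # False True
```

If the check returns False, pivot or set_index is enough; otherwise an aggregating approach is needed.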



pivot  

df.pivot(index='grouping', columns='labels', values='count')


set_index  

df.set_index(['grouping', 'labels'])['count'].unstack()


pivot_table  

df.pivot_table('count', 'grouping', 'labels')


groupby  

df.groupby(['grouping', 'labels'])['count'].sum().unstack()


All yield

labels      A      B      C    D
grouping                        
item1     5.0    1.0    8.0  NaN
item2     3.0  731.0  189.0  9.0
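
As a quick check that the four approaches agree, the sketch below runs them side by side on assumed sample data. The `by_*` names are illustrative, and keyword arguments are used for pivot because recent pandas versions no longer accept them positionally.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})
keys = ["grouping", "labels"]

by_pivot = df.pivot(index="grouping", columns="labels", values="count")
by_set_index = df.set_index(keys)["count"].unstack()
by_pivot_table = df.pivot_table(values="count", index="grouping", columns="labels")
by_groupby = df.groupby(keys)["count"].sum().unstack()

# Compare every result against the plain pivot.
for name, result in [("set_index", by_set_index),
                     ("pivot_table", by_pivot_table),
                     ("groupby", by_groupby)]:
    print(name, by_pivot.equals(result))
```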


[Figure: timing]



With the groupby, set_index, or pivot_table approach, you can easily fill in missing values with fill_value=0:

df.pivot_table('count', 'grouping', 'labels', fill_value=0)

df.groupby(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)

df.set_index(['grouping', 'labels'])['count'].unstack(fill_value=0)


All yield

labels    A    B    C  D
grouping                
item1     5    1    8  0
item2     3  731  189  9
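
A small sketch of the filled variants on assumed sample data. The `filled_*` names are illustrative. The two unstack-based results stay integer-typed because no NaN is ever introduced; the exact dtype that pivot_table returns may depend on the pandas version, so only the values are compared here.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})
keys = ["grouping", "labels"]

filled_groupby = df.groupby(keys)["count"].sum().unstack(fill_value=0)
filled_set_index = df.set_index(keys)["count"].unstack(fill_value=0)
filled_pivot_table = df.pivot_table(
    values="count", index="grouping", columns="labels", fill_value=0
)

# The missing (item1, D) cell is now 0 instead of NaN.
print(filled_groupby.loc["item1", "D"])
# Compare values only, ignoring dtype differences.
print((filled_pivot_table.to_numpy() == filled_groupby.to_numpy()).all())
```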




Additional thoughts on groupby

We don't actually require any aggregation here. If we still want to use groupby, we can minimize the cost of the implicit aggregation by using a less expensive aggregator.

df.groupby(['grouping', 'labels'])['count'].max().unstack()


or 

df.groupby(['grouping', 'labels'])['count'].first().unstack()
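
The choice of aggregator only stops mattering when every key pair is unique. The sketch below illustrates this on assumed sample data; the `extra` row is hypothetical and added only to show how the results diverge once a pair repeats.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})
keys = ["grouping", "labels"]

# With unique key pairs, sum, max and first all return the single value.
grouped = df.groupby(keys)["count"]
via_sum = grouped.sum().unstack()
via_max = grouped.max().unstack()
via_first = grouped.first().unstack()
print(via_sum.equals(via_max), via_sum.equals(via_first))  # True True

# Hypothetical repeated (item1, A) pair: now the aggregators disagree.
extra = pd.DataFrame({"grouping": ["item1"], "labels": ["A"], "count": [2]})
repeated = pd.concat([df, extra], ignore_index=True).groupby(keys)["count"]
sum_cell = repeated.sum().unstack().loc["item1", "A"]      # 5 + 2
max_cell = repeated.max().unstack().loc["item1", "A"]      # larger of 5 and 2
first_cell = repeated.first().unstack().loc["item1", "A"]  # first seen value
print(sum_cell, max_cell, first_cell)
```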


[Figure: timing groupby]


                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
温柔的废话
2020-12-21 07:18
              
            
            
                                                                       
Try:

In [1]: import pandas as pd
   ...: 
   ...: data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
   ...:         "labels": ["A", "B", "C", "A", "B", "C", "D"],
   ...:         "count": [5, 1, 8, 3, 731, 189, 9]}
   ...: 
In [2]: df = pd.DataFrame(data)
In [3]: df.pivot_table(index="grouping",columns="labels")

Out[3]: 
             count              
    labels       A    B    C   D
    grouping                    
    item1        5    1    8 NaN
    item2        3  731  189   9

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
滥情空心
2020-12-21 07:19
              
            
            
                                                                       
You put labels in the index, but you want it in the columns:

>>> df.pivot_table(index='grouping', columns='labels')
         count                   
labels       A      B      C    D
grouping                         
item1      5.0    1.0    8.0  NaN
item2      3.0  731.0  189.0  9.0


Note that this makes the columns a MultiIndex.  If you don't want that, explicitly pass values: df.pivot_table(index='grouping', columns='labels', values='count').
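
The difference in column structure can be checked directly. This is a minimal sketch on assumed sample data; `nested` and `flat` are illustrative names.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

# Without values=, the value column name becomes an outer column level.
nested = df.pivot_table(index="grouping", columns="labels")
# With values='count', the columns are just the labels.
flat = df.pivot_table(index="grouping", columns="labels", values="count")

print(nested.columns.nlevels, flat.columns.nlevels)  # 2 1
print(nested.columns.tolist()[0], flat.columns.tolist()[0])
```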

Also, note that the kind of reshape you seem to be looking for is only possible without aggregation if each combination of grouping and labels has at most one value.  If any combination occurs more than once, you need to decide how to aggregate them (e.g., by summing the matching values).
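
Here is a rough sketch of what happens when a combination repeats. It uses assumed sample data plus a hypothetical extra row. A plain pivot refuses to reshape, while pivot_table with an explicit aggfunc (summing is just one possible choice) combines the repeated values.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

# Hypothetical extra row that repeats the (item1, A) combination.
extra = pd.DataFrame({"grouping": ["item1"], "labels": ["A"], "count": [2]})
dupes = pd.concat([df, extra], ignore_index=True)

# pivot cannot place two values in one cell, so it raises ValueError.
try:
    dupes.pivot(index="grouping", columns="labels", values="count")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table resolves the clash with the chosen aggregation.
summed = dupes.pivot_table(
    index="grouping", columns="labels", values="count", aggfunc="sum"
)
print(summed.loc["item1", "A"])
```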
                                                                        
                                                        
            
            
              
                