'Could not interpret input' error with Seaborn when plotting groupbys


再見小時候 2021-02-19 20:02

Say I have this dataframe

d = {'Path'   : ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail' : ['foo', 'bar', 'bar', 'fo

1 Answer

小鲜肉 (original poster) 2021-02-19 20:59

The exception occurs because Program becomes the index of the dataframes df_mean and df_count after your groupby operation, so seaborn can no longer find a column named Program.
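
A minimal sketch of how to confirm this diagnosis. The question's dataframe is truncated, so the data below is hypothetical (made-up values chosen to give the same means as the outputs further down); only the column names Program and Value come from the thread.

```python
import pandas as pd

# Hypothetical data: the original frame is truncated, so these values are assumptions.
df = pd.DataFrame({
    "Program": ["prog1", "prog1", "prog2", "prog2", "prog3", "prog3"],
    "Value": [10, 30, 35, 45, 40, 50],
})

# Default groupby (as_index=True) moves the grouping key into the index.
df_mean = df.groupby("Program")[["Value"]].mean()

print(df_mean.index.name)            # name of the index after grouping
print(list(df_mean.columns))         # remaining columns
print("Program" in df_mean.columns)  # whether seaborn could look it up as a column
```

If the last line prints False, passing x='Program' with data=df_mean gives seaborn nothing to look up, which is consistent with the error in the title.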

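If you want to plot df_mean, an easy solution is to add the index as a column. Note that factorplot was renamed to catplot in seaborn 0.9; passing kind='point' reproduces factorplot's old default.

In [7]:

df_mean['Program'] = df_mean.index

In [8]:

%matplotlib inline
import seaborn as sns
# catplot is the current name of factorplot; kind='point' was factorplot's default
sns.catplot(x='Program', y='Value', data=df_mean, kind='point')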


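However, you could more simply let seaborn do the calculation for you, since a point plot shows the mean of each group by default:

# catplot is the current name of factorplot (seaborn 0.9 and later)
sns.catplot(x='Program', y='Value', data=df, kind='point')


You'll obtain the same means, now with error bars computed from the raw data.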
Hope it helps.

EDIT after comments

Indeed you make a very good point about the parameter as_index; by default it is set to True, and in that case Program becomes part of the index, as in your question.

In [14]:

# DataFrame.sort was removed from pandas; sort_values is the replacement
df_mean = df.groupby('Program', as_index=True)[['Value']].mean().sort_values('Value', ascending=False)
df_mean

Out[14]:
         Value
Program
prog3       45
prog2       40
prog1       20


Just to be clear, this way Program is no longer a column; it becomes the index. The trick df_mean['Program'] = df_mean.index keeps the index as it is and adds a new column holding the index values, so Program now appears twice.

In [15]:

df_mean['Program'] = df_mean.index
df_mean

Out[15]:
         Value Program
Program
prog3       45   prog3
prog2       40   prog2
prog1       20   prog1
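
If the duplication is unwanted, one alternative not mentioned above is reset_index(), which moves the index into a regular column instead of copying it. The sketch below uses hypothetical data (the question's frame is truncated); only the column names come from the thread.

```python
import pandas as pd

# Hypothetical data: values are assumptions, chosen to match the means shown above.
df = pd.DataFrame({
    "Program": ["prog1", "prog1", "prog2", "prog2", "prog3", "prog3"],
    "Value": [10, 30, 35, 45, 40, 50],
})

df_mean = (
    df.groupby("Program")[["Value"]]
    .mean()
    .sort_values("Value", ascending=False)
    .reset_index()  # turns the 'Program' index into an ordinary column
)

print(list(df_mean.columns))        # columns after the reset
print(df_mean["Program"].tolist())  # programs in descending order of mean Value
```

After the reset, the frame has a fresh integer index and a single Program column, so it can be passed to seaborn as data with x='Program'.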


However, if you set as_index to False, Program stays a column and the result gets a new default integer index (shown out of order below because of the sort),

In [16]:

# DataFrame.sort was removed from pandas; sort_values is the replacement
df_mean = df.groupby('Program', as_index=False)['Value'].mean().sort_values('Value', ascending=False)
df_mean

Out[16]:
  Program  Value
2   prog3     45
1   prog2     40
0   prog1     20


This way you can pass it directly to seaborn. Still, you could plot df itself and get the same result.

Hope it helps.