'Could not interpret input' error with Seaborn when plotting groupbys

再見小時候 2021-02-19 20:02

Say I have this dataframe

d = {'Path'   : ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail' : ['foo', 'bar', 'bar', 'fo


        
1 Answer
    小鲜肉 (original poster)
    2021-02-19 20:59

    The reason for the exception you are getting is that Program becomes the index of the dataframes df_mean and df_count after your groupby operation, so seaborn can no longer find it among the columns.
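    To see this for yourself, a minimal sketch along these lines may help. The dataframe in the question is truncated, so the rows below are made up (hypothetical values chosen only so that the group means match the ones shown in this answer).

```python
import pandas as pd

# Hypothetical data: the question's dataframe is truncated, so these rows
# are invented to reproduce the group means 20, 40 and 45.
df = pd.DataFrame({
    "Program": ["prog1", "prog1", "prog2", "prog2", "prog3", "prog3"],
    "Value": [10, 30, 35, 45, 40, 50],
})

# Default as_index=True: the grouping key becomes the index.
df_mean = df.groupby("Program")[["Value"]].mean()

print(list(df_mean.columns))  # 'Program' is not among the columns
print(df_mean.index.name)     # it is the name of the index instead
```

    Since plotting functions look up x and y by column name, a name that only exists as the index is not found.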

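    If you wanted to get the factorplot from df_mean, an easy solution is to add the index as a column. (factorplot was renamed catplot in seaborn 0.9; on current versions, pass kind='point' to catplot to get the same kind of plot.)

    In [7]:
    
    df_mean['Program'] = df_mean.index
    
    In [8]:
    
    %matplotlib inline
    import seaborn as sns
    # sns.factorplot(...) on seaborn older than 0.9
    sns.catplot(x='Program', y='Value', data=df_mean, kind='point')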
    

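    However, you could even more simply let seaborn do the calculations for you, since a point plot shows the mean of each group by default:

    # sns.factorplot(...) on seaborn older than 0.9
    sns.catplot(x='Program', y='Value', data=df, kind='point')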
    

    You'll obtain the same result. Hope it helps.

    EDIT after comments

    Indeed you make a very good point about the parameter as_index; by default it is set to True, and in that case Program becomes part of the index, as in your question.

    In [14]:
    
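    # DataFrame.sort was removed in pandas 0.20; sort_values replaces it.
    # Selecting Value before mean() avoids averaging the non-numeric columns.
    df_mean = df.groupby('Program', as_index=True)[['Value']].mean().sort_values('Value', ascending=False)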
    df_mean
    
    Out[14]:
             Value
    Program
    prog3       45
    prog2       40
    prog1       20
    

    Just to be clear, this way Program is not a column anymore; it becomes the index. The trick df_mean['Program'] = df_mean.index keeps the index as it is and adds a new column holding the index values, so Program now appears twice.

    In [15]:
    
    df_mean['Program'] = df_mean.index
    df_mean
    
    Out[15]:
             Value Program
    Program
    prog3       45   prog3
    prog2       40   prog2
    prog1       20   prog1
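    If the duplication bothers you, pandas' reset_index() is another option: it moves the index into a regular column instead of copying it. This is only a sketch, using made-up rows (hypothetical, since the question's dataframe is truncated).

```python
import pandas as pd

# Hypothetical data, invented to reproduce the group means in this answer.
df = pd.DataFrame({
    "Program": ["prog1", "prog1", "prog2", "prog2", "prog3", "prog3"],
    "Value": [10, 30, 35, 45, 40, 50],
})

df_mean = df.groupby("Program")[["Value"]].mean()

# reset_index() turns the index into a column and installs a fresh
# 0..n-1 index, so 'Program' exists only once afterwards.
df_flat = df_mean.reset_index()

print(df_flat.columns.tolist())  # ['Program', 'Value']
```

    The resulting df_flat can be passed as data= in the same way as the dataframe with the copied column.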
    

    However, if you set as_index to False, you get Program as a column, plus a new auto-incrementing integer index:

    In [16]:
    
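    # DataFrame.sort was removed in pandas 0.20; sort_values replaces it.
    # Selecting Value before mean() avoids averaging the non-numeric columns.
    df_mean = df.groupby('Program', as_index=False)[['Value']].mean().sort_values('Value', ascending=False)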
    df_mean
    
    Out[16]:
      Program  Value
    2   prog3     45
    1   prog2     40
    0   prog1     20
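    The index reads 2, 1, 0 in that output because the integer labels are assigned when the groups are aggregated, before the sort reorders the rows. A small sketch of this, again on made-up rows (hypothetical, as the question's dataframe is truncated); the names df_sorted and df_renumbered are just for illustration:

```python
import pandas as pd

# Hypothetical data, invented to reproduce the group means in this answer.
df = pd.DataFrame({
    "Program": ["prog1", "prog1", "prog2", "prog2", "prog3", "prog3"],
    "Value": [10, 30, 35, 45, 40, 50],
})

df_sorted = (
    df.groupby("Program", as_index=False)[["Value"]]
      .mean()
      .sort_values("Value", ascending=False)
)
print(df_sorted.index.tolist())      # [2, 1, 0]: labels predate the sort

# Optional: renumber the rows; drop=True discards the old labels.
df_renumbered = df_sorted.reset_index(drop=True)
print(df_renumbered.index.tolist())  # [0, 1, 2]
```

    Renumbering is cosmetic here; the plot only uses the Program and Value columns.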
    

    This way you could feed it directly to seaborn. Still, you could use df and get the same result.

    Hope it helps.
