Pandas, Pivot table from 2 columns with values being a count of one of those columns

一整个雨季 2021-01-01 07:49

I have a pandas dataframe:

+---------------+-------------+
| Test_Category | Test_Result |
+---------------+-------------+
| Cat_1         | Pass        |
|          


        
2 Answers
  • 2021-01-01 08:33

    You could construct a new dataframe that uses the unique values of the two columns as its index and columns, then fill each cell with a count while looping over the rows with pandas' iterrows():

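    import pandas as pd

    # NaN never equals NaN, so give missing results an explicit label before comparing
    results = df['Test_Result'].fillna('nan')
    df_out = pd.DataFrame(index=df['Test_Category'].unique().tolist(), columns=results.unique().tolist())
    # Count the rows matching each (category, result) pair
    for index, row in df_out.iterrows():
        for col in df_out.columns:
            df_out.loc[index, col] = len(df[(df['Test_Category'] == index) & (results == col)])
    df_out = df_out.astype(int)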
    

    Output:

           Pass  nan  Fail
    Cat_1     1    1     0
    Cat_2     0    0     2
    Cat_3     2    1     1
    

    That said, a vectorised groupby() approach should be considerably faster than filtering the whole frame once per cell.
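A minimal sketch putting the loop above next to the groupby() alternative it mentions. The question's table is cut off, so the rows here are a hypothetical reconstruction chosen only so the counts match the outputs posted in the answers; `loop_counts` and `grouped_counts` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the question's table is truncated, so these are chosen
# only so that the counts match the outputs shown in the answers.
df = pd.DataFrame({
    'Test_Category': ['Cat_1', 'Cat_1', 'Cat_2', 'Cat_2',
                      'Cat_3', 'Cat_3', 'Cat_3', 'Cat_3'],
    'Test_Result': ['Pass', np.nan, 'Fail', 'Fail',
                    'Pass', 'Pass', 'Fail', np.nan],
})

# Give missing results an explicit label so they can be compared and grouped
results = df['Test_Result'].fillna('nan')

# Loop version: one boolean filter over the whole frame per (category, result) cell
loop_counts = pd.DataFrame(index=df['Test_Category'].unique().tolist(),
                           columns=results.unique().tolist())
for cat in loop_counts.index:
    for res in loop_counts.columns:
        mask = (df['Test_Category'] == cat) & (results == res)
        loop_counts.loc[cat, res] = int(mask.sum())
loop_counts = loop_counts.astype(int)

# Vectorised version: a single groupby pass, reshaped into the same layout
grouped_counts = (
    df.assign(Test_Result=results)
      .groupby(['Test_Category', 'Test_Result'])
      .size()
      .unstack(fill_value=0)
      .reindex(index=loop_counts.index, columns=loop_counts.columns)
)

print(loop_counts)
print((loop_counts.values == grouped_counts.values).all())  # True
```

groupby() sorts its keys, so the reindex() call is only there to line its result up with the loop's first-appearance order before comparing.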

  • 2021-01-01 08:35

    The problem here is that NaN values are excluded, so you need to use fillna together with crosstab:

    df1 = pd.crosstab(df['Test_Category'], df['Test_Result'].fillna('n/a'))
    print (df1)
    Test_Result    Fail  Pass  n/a
    Test_Category                 
    Cat_1             0     1    1
    Cat_2             2     0    0
    Cat_3             1     2    1
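To see the exclusion this answer refers to, here is a small sketch that runs crosstab with and without fillna on the same rows. The rows are a hypothetical reconstruction (the question's table is truncated) whose counts match the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the counts match the answers' outputs
df = pd.DataFrame({
    'Test_Category': ['Cat_1', 'Cat_1', 'Cat_2', 'Cat_2',
                      'Cat_3', 'Cat_3', 'Cat_3', 'Cat_3'],
    'Test_Result': ['Pass', np.nan, 'Fail', 'Fail',
                    'Pass', 'Pass', 'Fail', np.nan],
})

# Without fillna, rows whose result is NaN are silently left out of the table
raw = pd.crosstab(df['Test_Category'], df['Test_Result'])

# With fillna, missing results become a real 'n/a' column
filled = pd.crosstab(df['Test_Category'], df['Test_Result'].fillna('n/a'))

print(raw.columns.tolist())                   # ['Fail', 'Pass']
print(filled.columns.tolist())                # ['Fail', 'Pass', 'n/a']
print(raw.values.sum(), filled.values.sum())  # 6 8
```

The two rows with a missing result only show up in the counts once they have been given a label.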
    

    Or use GroupBy.size with unstack to reshape:

    df['Test_Result'] = df['Test_Result'].fillna('n/a')
    
    df1 = df.groupby(['Test_Category','Test_Result']).size().unstack()
    print (df1)
    Test_Result    Fail  Pass  n/a
    Test_Category                 
    Cat_1           NaN   1.0  1.0
    Cat_2           2.0   NaN  NaN
    Cat_3           1.0   2.0  1.0
    

    To replace the missing combinations (NaN) with 0, pass fill_value=0 to unstack:

    df1 = df.groupby(['Test_Category','Test_Result']).size().unstack(fill_value=0)
    print (df1)
    Test_Result    Fail  Pass  n/a
    Test_Category                 
    Cat_1             0     1    1
    Cat_2             2     0    0
    Cat_3             1     2    1
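A sketch of the intermediate step, using the same hypothetical rows (reconstructed, since the question's table is truncated): GroupBy.size returns a Series indexed only by the (category, result) pairs that actually occur, and unstack pivots the inner level into columns, which is where the NaN cells come from.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the counts match the answers' outputs
df = pd.DataFrame({
    'Test_Category': ['Cat_1', 'Cat_1', 'Cat_2', 'Cat_2',
                      'Cat_3', 'Cat_3', 'Cat_3', 'Cat_3'],
    'Test_Result': ['Pass', np.nan, 'Fail', 'Fail',
                    'Pass', 'Pass', 'Fail', np.nan],
})
df['Test_Result'] = df['Test_Result'].fillna('n/a')

# size() counts rows per (category, result) pair; absent pairs are simply missing
counts = df.groupby(['Test_Category', 'Test_Result']).size()
print(counts)

# unstack() moves the inner level (Test_Result) into columns; absent pairs become NaN
without_fill = counts.unstack()

# fill_value=0 fills those gaps during the reshape, keeping integer counts
table = counts.unstack(fill_value=0)
print(table)
```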
    

    Another solution with pivot_table:

    df1 = df.pivot_table(index='Test_Category', columns='Test_Result', aggfunc='size', fill_value=0)
    print (df1)
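A quick consistency check on hypothetical rows (reconstructed, since the question is truncated): with aggfunc='size' and fill_value=0, pivot_table should produce the same counts as the crosstab version.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the counts match the answers' outputs
df = pd.DataFrame({
    'Test_Category': ['Cat_1', 'Cat_1', 'Cat_2', 'Cat_2',
                      'Cat_3', 'Cat_3', 'Cat_3', 'Cat_3'],
    'Test_Result': ['Pass', np.nan, 'Fail', 'Fail',
                    'Pass', 'Pass', 'Fail', np.nan],
})
df['Test_Result'] = df['Test_Result'].fillna('n/a')

# aggfunc='size' counts rows per cell; fill_value=0 replaces empty cells
pivoted = df.pivot_table(index='Test_Category', columns='Test_Result',
                         aggfunc='size', fill_value=0)

crossed = pd.crosstab(df['Test_Category'], df['Test_Result'])

print(pivoted)
print(pivoted.to_dict() == crossed.to_dict())  # True
```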
    