Pandas pivot table for multiple columns at once

无人及你 2020-12-05 17:04

Let's say I have a DataFrame:

   nj  ptype  wd  wpt
0   2      1   2    1
1   3      2   1    2
2   1      1   3    1
3   2      2   3    3
4   3      1   2         

3 Answers
  • 2020-12-05 17:38

    Instead of doing it in one step, you can aggregate first and then pivot the result with unstack:

    (df.set_index('ptype')
     .groupby(level='ptype')
     # Count the values of nj, wd and wpt within each ptype group
     # (groupby + value_counts on every column).
     .apply(lambda g: g.apply(lambda s: s.value_counts()))
     .unstack(level=1)
     .fillna(0))
    
    #      nj             wd            wpt
    #       1    2    3    1    2    3    1    2    3
    #ptype                                  
    #1    1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
    #2    0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0
    

    Another option that avoids apply: stack the columns into one long Series, then count values per (ptype, column) group:

    (df.set_index('ptype').stack()
     .groupby(level=[0,1])
     .value_counts()
     .unstack(level=[1,2])
     .fillna(0)
     .sort_index(axis=1))
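
For anyone who wants to run the stack-based option end to end, here is a minimal sketch. The sample frame is rebuilt by hand, and the last wpt value (cut off in the question as posted) is assumed to be 2, since that is the value that reproduces the counts shown in the output above. The level name `column` and the final cast to int are additions for readability, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question. The last wpt value is missing in the
# post; 2 is an assumption that matches the counts shown in the answer.
df = pd.DataFrame({
    "nj":    [2, 3, 1, 2, 3],
    "ptype": [1, 2, 1, 2, 1],
    "wd":    [2, 1, 3, 3, 2],
    "wpt":   [1, 2, 1, 3, 2],
})

# Long format: one entry per (ptype, source column) pair.
stacked = df.set_index("ptype").stack()
stacked.index.names = ["ptype", "column"]

# Count each value inside every (ptype, column) group, then move the
# column name and the value into a two-level column header.
counts = (stacked.groupby(level=["ptype", "column"])
                 .value_counts()
                 .unstack(level=[1, 2])
                 .fillna(0)
                 .astype(int)
                 .sort_index(axis=1))
print(counts)
```

Combinations that never occur (for example nj == 1 with ptype == 2) come out of unstack as NaN, which is why fillna(0) is needed before the cast.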
    

    Naive timing on the sample data:

    Original solution:

    %%timeit
    nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
    wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
    wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count').loc[:, 'nj']
    out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
    out.columns.names = [None, None]
    # 100 loops, best of 3: 12 ms per loop
    

    Option one:

    %%timeit
    (df.set_index('ptype')
     .groupby(level='ptype')
     .apply(lambda g: g.apply(lambda s: s.value_counts()))
     .unstack(level=1)
     .fillna(0))
    # 100 loops, best of 3: 10.1 ms per loop
    

    Option two:

    %%timeit 
    (df.set_index('ptype').stack()
     .groupby(level=[0,1])
     .value_counts()
     .unstack(level=[1,2])
     .fillna(0)
     .sort_index(axis=1))
    # 100 loops, best of 3: 4.3 ms per loop
    
  • 2020-12-05 17:54

    A simpler option is pivot_table with a list of aggregate functions:

    employee.pivot_table(index='Title', values='Salary', aggfunc=['mean', 'median', 'min', 'max', 'std'], fill_value=0)
    

    Here several aggregate functions are applied to the Salary column, grouped by Title.
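
The answer does not show the employee frame, so the following is only a sketch with made-up data: the titles and salary figures are hypothetical, chosen so that every title has at least two rows. It illustrates the shape of the result when pivot_table receives a list of aggregate functions.

```python
import pandas as pd

# Hypothetical data: the employee frame is not shown in the post.
employee = pd.DataFrame({
    "Title":  ["Engineer", "Engineer", "Analyst", "Analyst", "Manager", "Manager"],
    "Salary": [100, 120, 80, 90, 150, 170],
})

# One row per Title; one column per (aggregate function, value column).
summary = employee.pivot_table(
    index="Title",
    values="Salary",
    aggfunc=["mean", "median", "min", "max", "std"],
    fill_value=0,
)
print(summary)
```

The columns form a two-level header with the function name on top and Salary below, so a single statistic can be picked out with a tuple such as summary[("mean", "Salary")].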

  • 2020-12-05 17:56

    Another solution using groupby and unstack.

    df2 = (pd.concat([df.groupby(['ptype', e])[e].count().unstack()
                      for e in ['nj', 'wd', 'wpt']], axis=1)
             .fillna(0)
             .astype(int))
    df2.columns = pd.MultiIndex.from_product([['nj', 'wd', 'wpt'], [1.0, 2.0, 3.0]])
    
    df2
    Out[207]: 
           nj          wd         wpt        
          1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0
    ptype                                    
    1       1   1   1   0   2   1   2   1   0
    2       0   1   1   1   0   1   0   1   1
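
A possible variation on this approach, sketched under assumptions: passing keys= to pd.concat builds the top column level directly, so the value labels do not have to be hard-coded with MultiIndex.from_product. The frame below is rebuilt by hand, and the last wpt value (cut off in the question) is assumed to be 2 to match the output above. With this all-integer frame the value labels come out as 1, 2, 3 rather than 1.0, 2.0, 3.0.

```python
import pandas as pd

# Sample frame from the question. The last wpt value is missing in the
# post; 2 is an assumption that matches the counts shown in the answer.
df = pd.DataFrame({
    "nj":    [2, 3, 1, 2, 3],
    "ptype": [1, 2, 1, 2, 1],
    "wd":    [2, 1, 3, 3, 2],
    "wpt":   [1, 2, 1, 3, 2],
})

cols = ["nj", "wd", "wpt"]

# One count table per column (ptype as rows, the column's values as
# headers), glued side by side; keys= labels each block with its name.
df2 = (pd.concat([df.groupby(["ptype", c])[c].count().unstack() for c in cols],
                 axis=1, keys=cols)
         .fillna(0)
         .astype(int))
print(df2)
```

This keeps working if one of the columns happens to contain only some of the values, a case where a fixed from_product header would no longer line up with the data.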
    