pandas: how to run a pivot with a multi-index?

轻奢々 2020-12-07 13:10

I would like to run a pivot on a pandas DataFrame, with the index being two columns, not one. For example, one field for the year, one for the month, an \'item\

3 Answers
  • 2020-12-07 13:59

    I believe if you include item in your MultiIndex, then you can just unstack:

    df.set_index(['year', 'month', 'item']).unstack(level=-1)
    

    This yields:

                    value      
    item       item 1 item 2
    year month              
    2004 1         21    277
         2         43    244
         3         12    262
         4         80    201
         5         22    287
         6         52    284
         7         90    249
         8         14    229
         9         52    205
         10        76    207
         11        88    259
         12        90    200
    

    It's a bit faster than using pivot_table, and about the same speed or slightly slower than using groupby.
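
    To make the unstack approach above reproducible, here is a minimal sketch on a small made-up frame. The sample values are hypothetical, not the asker's data. It assumes each (year, month, item) combination appears exactly once; with duplicates, unstack raises a ValueError and an aggregating approach is needed instead.

```python
import pandas as pd

# Hypothetical sample data: one row per (year, month, item) combination.
df = pd.DataFrame({
    "year": [2004, 2004, 2004, 2004],
    "month": [1, 1, 2, 2],
    "item": ["item 1", "item 2", "item 1", "item 2"],
    "value": [21, 277, 43, 244],
})

# Move all three key columns into the index, then pivot the innermost
# level ("item") out into the columns.
wide = df.set_index(["year", "month", "item"]).unstack(level=-1)
print(wide)
```

    The result keeps a two-level column header ('value' on top, the item names below), matching the layout shown above.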

  • 2020-12-07 14:11

    You can group and then unstack.

    >>> df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')
    item        item 1  item 2
    year month                
    2004 1          33     250
         2          44     224
         3          41     268
         4          29     232
         5          57     252
         6          61     255
         7          28     254
         8          15     229
         9          29     258
         10         49     207
         11         36     254
         12         23     209
    

    Or use pivot_table:

    >>> df.pivot_table(
    ...     values='value',
    ...     index=['year', 'month'],
    ...     columns='item',
    ...     aggfunc='sum')  # the string name needs no numpy import
    item        item 1  item 2
    year month                
    2004 1          33     250
         2          44     224
         3          41     268
         4          29     232
         5          57     252
         6          61     255
         7          28     254
         8          15     229
         9          29     258
         10         49     207
         11         36     254
         12         23     209
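
    As a quick check that the two approaches agree, here is a small sketch on made-up data with repeated (year, month, item) rows, so the sum actually aggregates something. The frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with duplicate (year, month, item) keys.
df = pd.DataFrame({
    "year": [2004] * 6,
    "month": [1, 1, 1, 2, 2, 2],
    "item": ["item 1", "item 1", "item 2", "item 1", "item 2", "item 2"],
    "value": [10, 23, 250, 44, 200, 24],
})

# Approach 1: aggregate with groupby, then move "item" into the columns.
grouped = df.groupby(["year", "month", "item"])["value"].sum().unstack("item")

# Approach 2: let pivot_table do the grouping and reshaping in one call.
pivoted = df.pivot_table(
    values="value", index=["year", "month"], columns="item", aggfunc="sum"
)

print(grouped)
print(pivoted)
```

    Both frames hold the same sums, indexed by (year, month) with one column per item.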
    
  • 2020-12-07 14:11

    Thanks to gmoutso's comment, you can use this:

    import pandas as pd

    def multiindex_pivot(df, index=None, columns=None, values=None):
        """Pivot on several index columns by packing them into one tuple key."""
        if index is None:
            # No index given: fall back to the frame's existing index levels.
            names = list(df.index.names)
            df = df.reset_index()
        else:
            names = index
        # Tuples are hashable, so each row's key can act as a single pivot index.
        tuples_index = [tuple(i) for i in df[names].values]
        df = df.assign(tuples_index=tuples_index)
        df = df.pivot(index="tuples_index", columns=columns, values=values)
        # The pivoted index holds the unique tuples; expand them into a MultiIndex.
        df.index = pd.MultiIndex.from_tuples(df.index, names=names)
        return df
    

    usage:

    df.pipe(multiindex_pivot, index=['idx_column1', 'idx_column2'], columns='foo', values='bar')
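
    On pandas 1.1.0 and later, DataFrame.pivot accepts a list of column names for index, so the helper may not be needed there. A minimal sketch, using a made-up frame (the column names and values are hypothetical) and assuming no duplicate (year, month, item) rows, since pivot does not aggregate:

```python
import pandas as pd

# Hypothetical sample data: one row per (year, month, item) combination.
df = pd.DataFrame({
    "year": [2004, 2004, 2004, 2004],
    "month": [1, 1, 2, 2],
    "item": ["item 1", "item 2", "item 1", "item 2"],
    "value": [21, 277, 43, 244],
})

# Requires pandas >= 1.1.0: index may be a list of column names.
wide = df.pivot(index=["year", "month"], columns="item", values="value")
print(wide)
```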
    

    If you want a flat column structure with columns of their intended types, chain this onto the result:

    (df
       .infer_objects()  # coerce to the intended column type
       .rename_axis(None, axis=1))  # flatten column headers
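
    To show what that cleanup does, here is a small sketch on a made-up pivoted frame. The data is hypothetical, and the final reset_index() call is an extra step beyond the snippet above, added on the assumption that you also want year and month back as ordinary columns.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "year": [2004, 2004, 2004, 2004],
    "month": [1, 1, 2, 2],
    "item": ["item 1", "item 2", "item 1", "item 2"],
    "value": [21, 277, 43, 244],
})

wide = df.pivot_table(
    values="value", index=["year", "month"], columns="item", aggfunc="sum"
)

flat = (wide
        .infer_objects()             # re-infer dtypes for any object columns
        .rename_axis(None, axis=1)   # drop the "item" label from the column axis
        .reset_index())              # extra step: turn year/month into columns
print(flat)
```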
    