How can I generalize my pandas data grouping to more than 3 dimensions?

盖世英雄少女心 2021-02-04 12:26

I'm using the excellent pandas package to deal with a large amount of varied meteorological diagnostic data and I'm quickly running out of dimensions as I stitch

1 answer
  •  傲寒 (original poster)
    2021-02-04 13:10

    I suggest using pandas.concat with its keys argument to glue together Series or DataFrames, which creates a MultiIndex in the columns:

    In [20]: data
    Out[20]: 
    {'a': 2012-04-16    0
    2012-04-17    1
    2012-04-18    2
    2012-04-19    3
    2012-04-20    4
    2012-04-21    5
    2012-04-22    6
    2012-04-23    7
    2012-04-24    8
    2012-04-25    9
    Freq: D,
     'b': 2012-04-16    0
    2012-04-17    1
    2012-04-18    2
    2012-04-19    3
    2012-04-20    4
    2012-04-21    5
    2012-04-22    6
    2012-04-23    7
    2012-04-24    8
    2012-04-25    9
    Freq: D,
     'c': 2012-04-16    0
    2012-04-17    1
    2012-04-18    2
    2012-04-19    3
    2012-04-20    4
    2012-04-21    5
    2012-04-22    6
    2012-04-23    7
    2012-04-24    8
    2012-04-25    9
    Freq: D}
    
    In [21]: df = pd.concat(data, axis=1, keys=['a', 'b', 'c'])
    
    In [22]: df
    Out[22]: 
                a  b  c
    2012-04-16  0  0  0
    2012-04-17  1  1  1
    2012-04-18  2  2  2
    2012-04-19  3  3  3
    2012-04-20  4  4  4
    2012-04-21  5  5  5
    2012-04-22  6  6  6
    2012-04-23  7  7  7
    2012-04-24  8  8  8
    2012-04-25  9  9  9
    
    In [23]: df2 = pd.concat([df, df], axis=1, keys=['group1', 'group2'])
    
    In [24]: df2
    Out[24]: 
                group1        group2      
                     a  b  c       a  b  c
    2012-04-16       0  0  0       0  0  0
    2012-04-17       1  1  1       1  1  1
    2012-04-18       2  2  2       2  2  2
    2012-04-19       3  3  3       3  3  3
    2012-04-20       4  4  4       4  4  4
    2012-04-21       5  5  5       5  5  5
    2012-04-22       6  6  6       6  6  6
    2012-04-23       7  7  7       7  7  7
    2012-04-24       8  8  8       8  8  8
    2012-04-25       9  9  9       9  9  9
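
For anyone who wants to reproduce the session above as a script, here is a minimal sketch. The construction of `data` is an assumption reconstructed from the printed Out[20] (three identical daily Series); the original post does not show how it was built.

```python
import pandas as pd

# Assumed reconstruction of `data`: three identical daily Series,
# matching what Out[20] prints. The original construction is not shown.
idx = pd.date_range("2012-04-16", periods=10, freq="D")
data = {name: pd.Series(range(10), index=idx) for name in ["a", "b", "c"]}

# With a dict, `keys` selects which entries to use and fixes their order.
df = pd.concat(data, axis=1, keys=["a", "b", "c"])

# Concatenating again with new keys adds an outer level to the columns.
df2 = pd.concat([df, df], axis=1, keys=["group1", "group2"])

print(df2.columns.nlevels)  # number of column levels after two concats
```

Each extra `pd.concat(..., axis=1, keys=...)` call wraps the existing columns in one more outer level, which is what stands in for an additional "dimension" here.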
    

    You can then select a whole group by its outer key:

    In [25]: df2['group2']
    Out[25]: 
                a  b  c
    2012-04-16  0  0  0
    2012-04-17  1  1  1
    2012-04-18  2  2  2
    2012-04-19  3  3  3
    2012-04-20  4  4  4
    2012-04-21  5  5  5
    2012-04-22  6  6  6
    2012-04-23  7  7  7
    2012-04-24  8  8  8
    2012-04-25  9  9  9
    

    or even take a cross-section on the inner level:

    In [27]: df2.xs('b', axis=1, level=1)
    Out[27]: 
                group1  group2
    2012-04-16       0       0
    2012-04-17       1       1
    2012-04-18       2       2
    2012-04-19       3       3
    2012-04-20       4       4
    2012-04-21       5       5
    2012-04-22       6       6
    2012-04-23       7       7
    2012-04-24       8       8
    2012-04-25       9       9
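
The two kinds of selection shown above can be put side by side in a short sketch. The level names `group` and `var` are my own labels, not part of the original answer; naming the levels is optional but lets `xs` refer to a level by name rather than by position.

```python
import pandas as pd

# Rebuild a frame shaped like df2 (assumed data, same as the printed output).
idx = pd.date_range("2012-04-16", periods=10, freq="D")
df = pd.DataFrame({name: range(10) for name in "abc"}, index=idx)
df2 = pd.concat([df, df], axis=1, keys=["group1", "group2"])

# Hypothetical level names, added only to make the selections readable.
df2.columns = df2.columns.set_names(["group", "var"])

by_group = df2["group2"]                    # outer level: columns a, b, c
by_var = df2.xs("b", axis=1, level="var")   # inner level: columns group1, group2
single = df2[("group1", "c")]               # full tuple key: one Series
```

Indexing with `[]` works from the outermost level inward, while `xs` is the tool for slicing on an inner level without reordering the columns first.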
    

    You can have arbitrarily many levels:

    In [29]: pd.concat([df2, df2], axis=1, keys=['tier1', 'tier2'])
    Out[29]: 
                 tier1                       tier2                    
                group1        group2        group1        group2      
                     a  b  c       a  b  c       a  b  c       a  b  c
    2012-04-16       0  0  0       0  0  0       0  0  0       0  0  0
    2012-04-17       1  1  1       1  1  1       1  1  1       1  1  1
    2012-04-18       2  2  2       2  2  2       2  2  2       2  2  2
    2012-04-19       3  3  3       3  3  3       3  3  3       3  3  3
    2012-04-20       4  4  4       4  4  4       4  4  4       4  4  4
    2012-04-21       5  5  5       5  5  5       5  5  5       5  5  5
    2012-04-22       6  6  6       6  6  6       6  6  6       6  6  6
    2012-04-23       7  7  7       7  7  7       7  7  7       7  7  7
    2012-04-24       8  8  8       8  8  8       8  8  8       8  8  8
    2012-04-25       9  9  9       9  9  9       9  9  9       9  9  9
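
If the data already lives in a nested dict, the repeated concat can be written once as a recursion. This is only a sketch: `concat_nested` is a hypothetical helper, not a pandas function, and it assumes every branch of the dict has the same depth and every leaf is a Series on a shared index.

```python
import pandas as pd

def concat_nested(tree):
    """Hypothetical helper: turn a nested dict of Series into one DataFrame
    whose column MultiIndex mirrors the dict nesting (uniform depth assumed)."""
    pieces = {
        key: concat_nested(val) if isinstance(val, dict) else val
        for key, val in tree.items()
    }
    # Dict keys become the new outermost column level.
    return pd.concat(pieces, axis=1)

idx = pd.date_range("2012-04-16", periods=10, freq="D")
tree = {
    tier: {
        group: {var: pd.Series(range(10), index=idx) for var in "abc"}
        for group in ("group1", "group2")
    }
    for tier in ("tier1", "tier2")
}

df3 = concat_nested(tree)
# Hypothetical level names, so levels can be addressed by name.
df3.columns = df3.columns.set_names(["tier", "group", "var"])

# Fix two levels at once; the remaining level ("group") stays as columns.
sub = df3.xs(("tier2", "b"), axis=1, level=["tier", "var"])
```

The same pattern extends to deeper nesting; each additional dict layer becomes one more level in the column MultiIndex.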
    
