How can I generalize my pandas data grouping to more than 3 dimensions?

后端未结

关注

 1  1303

I\'m using the excellent pandas package to deal with a large amount of varied meteorological diagnostic data and I\'m quickly running out of dimensions as I stitch

相关标签:

1条回答

傲寒

2021-02-04 13:10

I might suggest using pandas.concat along with its keys argument to glue together Series DataFrames to create a MultiIndex in the columns:

In [20]: data
Out[20]: 
{'a': 2012-04-16    0
2012-04-17    1
2012-04-18    2
2012-04-19    3
2012-04-20    4
2012-04-21    5
2012-04-22    6
2012-04-23    7
2012-04-24    8
2012-04-25    9
Freq: D,
 'b': 2012-04-16    0
2012-04-17    1
2012-04-18    2
2012-04-19    3
2012-04-20    4
2012-04-21    5
2012-04-22    6
2012-04-23    7
2012-04-24    8
2012-04-25    9
Freq: D,
 'c': 2012-04-16    0
2012-04-17    1
2012-04-18    2
2012-04-19    3
2012-04-20    4
2012-04-21    5
2012-04-22    6
2012-04-23    7
2012-04-24    8
2012-04-25    9
Freq: D}

In [21]: df = pd.concat(data, axis=1, keys=['a', 'b', 'c'])

In [22]: df
Out[22]: 
            a  b  c
2012-04-16  0  0  0
2012-04-17  1  1  1
2012-04-18  2  2  2
2012-04-19  3  3  3
2012-04-20  4  4  4
2012-04-21  5  5  5
2012-04-22  6  6  6
2012-04-23  7  7  7
2012-04-24  8  8  8
2012-04-25  9  9  9

In [23]: df2 = pd.concat([df, df], axis=1, keys=['group1', 'group2'])

In [24]: df2
Out[24]: 
            group1        group2      
                 a  b  c       a  b  c
2012-04-16       0  0  0       0  0  0
2012-04-17       1  1  1       1  1  1
2012-04-18       2  2  2       2  2  2
2012-04-19       3  3  3       3  3  3
2012-04-20       4  4  4       4  4  4
2012-04-21       5  5  5       5  5  5
2012-04-22       6  6  6       6  6  6
2012-04-23       7  7  7       7  7  7
2012-04-24       8  8  8       8  8  8
2012-04-25       9  9  9       9  9  9

You have then:

In [25]: df2['group2']
Out[25]: 
            a  b  c
2012-04-16  0  0  0
2012-04-17  1  1  1
2012-04-18  2  2  2
2012-04-19  3  3  3
2012-04-20  4  4  4
2012-04-21  5  5  5
2012-04-22  6  6  6
2012-04-23  7  7  7
2012-04-24  8  8  8
2012-04-25  9  9  9

or even

In [27]: df2.xs('b', axis=1, level=1)
Out[27]: 
            group1  group2
2012-04-16       0       0
2012-04-17       1       1
2012-04-18       2       2
2012-04-19       3       3
2012-04-20       4       4
2012-04-21       5       5
2012-04-22       6       6
2012-04-23       7       7
2012-04-24       8       8
2012-04-25       9       9

You can have arbitrarily many levels:

In [29]: pd.concat([df2, df2], axis=1, keys=['tier1', 'tier2'])
Out[29]: 
             tier1                       tier2                    
            group1        group2        group1        group2      
                 a  b  c       a  b  c       a  b  c       a  b  c
2012-04-16       0  0  0       0  0  0       0  0  0       0  0  0
2012-04-17       1  1  1       1  1  1       1  1  1       1  1  1
2012-04-18       2  2  2       2  2  2       2  2  2       2  2  2
2012-04-19       3  3  3       3  3  3       3  3  3       3  3  3
2012-04-20       4  4  4       4  4  4       4  4  4       4  4  4
2012-04-21       5  5  5       5  5  5       5  5  5       5  5  5
2012-04-22       6  6  6       6  6  6       6  6  6       6  6  6
2012-04-23       7  7  7       7  7  7       7  7  7       7  7  7
2012-04-24       8  8  8       8  8  8       8  8  8       8  8  8
2012-04-25       9  9  9       9  9  9       9  9  9       9  9  9

0 讨论(0)