Reshape a pandas DataFrame with multiple columns

Posted by 坚强是说给别人听的谎言 on 2021-02-10 18:21:45

Question


I have an issue reshaping a pandas DataFrame. It looks like this (the number of rows and columns can vary):

columns       col1        col2       col3       col4
Species                                                
sp1     218.000000  521.000000 533.000000 793.000000
sp1       0.105569    0.252300   0.258111   0.384019
sp1              2           2          2          3
sp2     225.000000  521.000000 540.000000 800.000000
sp2       0.107862    0.249760   0.258869   0.383509
sp2              2           2          2          3
sp3     217.000000  477.000000 512.000000 725.000000
sp3       0.112377    0.247022   0.265148   0.375453
sp3              1           1          3          3

The Species column is my index. I want to reshape it like this:

Species columns          c        f p
sp1        col1 218.000000 0.105569 2
sp1        col2 521.000000 0.252300 2
sp1        col3 533.000000 0.258111 2
sp1        col4 793.000000 0.384019 3
sp2
sp2
sp2
sp2
sp3                         etc
sp3
sp3
sp3

But I can't figure out how to do it.

The purpose is to then make a heatmap with the p.rect() function of bokeh, the x-axis being the columns c or f, the y-axis being the column Species. The size of the rectangle would be determined by the column p.

Thanks in advance.


Answer 1:


First create a MultiIndex whose second level labels each row c, f or p according to its position modulo 3, then reshape with stack and unstack:

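import numpy as np
# Label each row c, f or p by its position within the species' block of three rows
c = np.array(['c','f','p'])
df.index = [df.index, c[np.arange(len(df.index)) % 3]]
print(df)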
columns          col1        col2        col3        col4
Species                                                  
sp1     c  218.000000  521.000000  533.000000  793.000000
        f    0.105569    0.252300    0.258111    0.384019
        p    2.000000    2.000000    2.000000    3.000000
sp2     c  225.000000  521.000000  540.000000  800.000000
        f    0.107862    0.249760    0.258869    0.383509
        p    2.000000    2.000000    2.000000    3.000000
sp3     c  217.000000  477.000000  512.000000  725.000000
        f    0.112377    0.247022    0.265148    0.375453
        p    1.000000    1.000000    3.000000    3.000000

# stack() moves col1..col4 into the index; unstack(1) spreads the c/f/p level into columns
df = df.stack().unstack(1).reset_index()
print(df)
   Species columns      c         f    p
0      sp1    col1  218.0  0.105569  2.0
1      sp1    col2  521.0  0.252300  2.0
2      sp1    col3  533.0  0.258111  2.0
3      sp1    col4  793.0  0.384019  3.0
4      sp2    col1  225.0  0.107862  2.0
5      sp2    col2  521.0  0.249760  2.0
6      sp2    col3  540.0  0.258869  2.0
7      sp2    col4  800.0  0.383509  3.0
8      sp3    col1  217.0  0.112377  1.0
9      sp3    col2  477.0  0.247022  1.0
10     sp3    col3  512.0  0.265148  3.0
11     sp3    col4  725.0  0.375453  3.0
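
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same approach. It rebuilds the frame from the values shown in the question; the names kind and long_df are my own and not part of the original post, and it assumes every species has exactly three rows in the order c, f, p. Recent pandas versions may emit a FutureWarning from stack(), which should not affect the result here.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: three rows per species (c, f, p).
data = [
    [218.0, 521.0, 533.0, 793.0],
    [0.105569, 0.252300, 0.258111, 0.384019],
    [2, 2, 2, 3],
    [225.0, 521.0, 540.0, 800.0],
    [0.107862, 0.249760, 0.258869, 0.383509],
    [2, 2, 2, 3],
    [217.0, 477.0, 512.0, 725.0],
    [0.112377, 0.247022, 0.265148, 0.375453],
    [1, 1, 3, 3],
]
df = pd.DataFrame(
    data,
    index=pd.Index(["sp1"] * 3 + ["sp2"] * 3 + ["sp3"] * 3, name="Species"),
    columns=pd.Index(["col1", "col2", "col3", "col4"], name="columns"),
)

# Assumption: rows always come in blocks of three, ordered c, f, p.
labels = np.array(["c", "f", "p"])
kind = labels[np.arange(len(df)) % 3]  # "kind" is a hypothetical level name

# Add the labels as a second index level next to Species.
df.index = pd.MultiIndex.from_arrays([df.index, kind], names=["Species", "kind"])

# stack() moves col1..col4 into the index; unstack("kind") spreads c/f/p into columns.
long_df = df.stack().unstack("kind").reset_index()
long_df.columns.name = None  # drop the leftover "kind" label on the column axis

print(long_df)
```

Naming the index level and unstacking by name rather than by position (1) is only a readability choice; the reshaping itself is the same stack/unstack idea as above.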


Source: https://stackoverflow.com/questions/49173928/reshape-a-pandas-dataframe-with-multiple-columns
