Question
I have an issue reshaping a pandas DataFrame. It looks like this (the number of rows and columns can vary):
columns col1 col2 col3 col4
Species
sp1 218.000000 521.000000 533.000000 793.000000
sp1 0.105569 0.252300 0.258111 0.384019
sp1 2 2 2 3
sp2 225.000000 521.000000 540.000000 800.000000
sp2 0.107862 0.249760 0.258869 0.383509
sp2 2 2 2 3
sp3 217.000000 477.000000 512.000000 725.000000
sp3 0.112377 0.247022 0.265148 0.375453
sp3 1 1 3 3
The column Species is my index. I want to reshape it like this:
Species columns c f p
sp1 col1 218.000000 0.105569 2
sp1 col2 521.000000 0.252300 2
sp1 col3 533.000000 0.258111 2
sp1 col4 793.000000 0.384019 3
sp2
sp2
sp2
sp2
sp3 etc
sp3
sp3
sp3
But I can't figure out how to do it.
The purpose is to then make a heatmap with the p.rect() function of bokeh, the x-axis being the columns c or f, the y-axis being the column Species. The size of the rectangle would be determined by the column p.
Thanks in advance.
Answer 1:
First create a MultiIndex by taking each row's position modulo 3, then reshape with stack and unstack:
import numpy as np

# Label the rows of each species as c, f, p by cycling through the three names
c = np.array(['c','f','p'])
df.index = [df.index, c[np.arange(len(df.index)) % 3]]
print(df)
columns col1 col2 col3 col4
Species
sp1 c 218.000000 521.000000 533.000000 793.000000
f 0.105569 0.252300 0.258111 0.384019
p 2.000000 2.000000 2.000000 3.000000
sp2 c 225.000000 521.000000 540.000000 800.000000
f 0.107862 0.249760 0.258869 0.383509
p 2.000000 2.000000 2.000000 3.000000
sp3 c 217.000000 477.000000 512.000000 725.000000
f 0.112377 0.247022 0.265148 0.375453
p 1.000000 1.000000 3.000000 3.000000
# stack() moves col1..col4 into the index, unstack(1) turns the c/f/p level into columns
df = df.stack().unstack(1).reset_index()
print(df)
Species columns c f p
0 sp1 col1 218.0 0.105569 2.0
1 sp1 col2 521.0 0.252300 2.0
2 sp1 col3 533.0 0.258111 2.0
3 sp1 col4 793.0 0.384019 3.0
4 sp2 col1 225.0 0.107862 2.0
5 sp2 col2 521.0 0.249760 2.0
6 sp2 col3 540.0 0.258869 2.0
7 sp2 col4 800.0 0.383509 3.0
8 sp3 col1 217.0 0.112377 1.0
9 sp3 col2 477.0 0.247022 1.0
10 sp3 col3 512.0 0.265148 3.0
11 sp3 col4 725.0 0.375453 3.0
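For anyone who wants to run the answer end to end, here is a minimal self-contained sketch. It rebuilds the question's frame by hand, so the construction of `df` is an assumption about how the original data was stored: Species as a named index and "columns" as the name of the column index. The names `labels` and `long_df` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: three rows per species, in the order c, f, p.
data = [
    [218.0, 521.0, 533.0, 793.0],
    [0.105569, 0.252300, 0.258111, 0.384019],
    [2, 2, 2, 3],
    [225.0, 521.0, 540.0, 800.0],
    [0.107862, 0.249760, 0.258869, 0.383509],
    [2, 2, 2, 3],
    [217.0, 477.0, 512.0, 725.0],
    [0.112377, 0.247022, 0.265148, 0.375453],
    [1, 1, 3, 3],
]
index = pd.Index(np.repeat(["sp1", "sp2", "sp3"], 3), name="Species")
columns = pd.Index(["col1", "col2", "col3", "col4"], name="columns")
df = pd.DataFrame(data, index=index, columns=columns)

# Second index level: cycle through c, f, p using the row position modulo 3.
labels = np.array(["c", "f", "p"])
df.index = [df.index, labels[np.arange(len(df)) % 3]]

# stack() moves col1..col4 into the index; unstack(1) pivots the c/f/p level into columns.
long_df = df.stack().unstack(1).reset_index()
print(long_df)
```

Note that the modulo labelling relies on every species having exactly three rows, always in the order c, f, p. If a species has missing or reordered rows, the labels will be shifted.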
Source: https://stackoverflow.com/questions/49173928/reshape-a-pandas-dataframe-with-multiple-columns