问题
Let's say I have this data frame:
df = pd.DataFrame({'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14], 'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})
and this is what I'm trying to get:
col1 col2
l n l n
12 0 32 1
16 1 47 1
92 0 22 0
77 0 14 1
I've been playing around with set_index
and stack
/unstack
methods but with no success...
回答1:
import pandas as pd
df = pd.DataFrame(
{'n':[0 ,1 ,0 ,0 ,1 ,1 ,0 ,1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14],
'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})
df['index'] = df.groupby(['cols']).cumcount()
result = df.pivot(index='index', columns='cols')
print(result)
# l n
# cols col1 col2 col1 col2
# index
# 0 12 32 0 1
# 1 16 47 1 1
# 2 92 22 0 0
# 3 77 14 0 1
If you care about the order of the labels in the MultiIndex column, you could use stack and unstack to exactly reproduce result you posted:
result = result.stack(level=0).unstack(level=1)
print(result)
# cols col1 col2
# l n l n
# index
# 0 12 0 32 1
# 1 16 1 47 1
# 2 92 0 22 0
# 3 77 0 14 1
When looking for a solution it is often useful to think backwards.
Start with the desired DataFrame and ask yourself what operation might
result in the desired DataFrame. In this case, the operation that came to mind
was pd.pivot
. Then the question becomes, what DataFrame,
something
, is needed so that
desired = something.pivot(index='index', columns='cols')
By looking at other examples of pivot
in action, it became clear than something
had to equal
cols l n index
0 col1 12 0 0
1 col1 16 1 1
2 col1 92 0 2
3 col1 77 0 3
4 col2 32 1 0
5 col2 47 1 1
6 col2 22 0 2
7 col2 14 1 3
Then you see if you can find a way to massage df
into something
, or again
working backwards, massage something
into df
... From this point of view, in
this case, the missing link became apparent: something
has an index
column
that df
lacked.
回答2:
You can use a combination of DataFrame.groupby
, DataFrame.reset_index
and DataFrame.T
(transpose)
import pandas as pd
df = pd.DataFrame({'n':[0 ,1 ,0 ,0 ,1 ,1 ,0, 1],'l':[12 ,16 ,92, 77 ,32 ,47, 22, 14], 'cols':['col1','col1','col1','col1','col2','col2','col2','col2']})
print df.groupby('cols').apply(lambda x: x.reset_index(drop=True).drop('cols',axis=1).T).T
Output:
cols col1 col2
l n l n
0 12 0 32 1
1 16 1 47 1
2 92 0 22 0
3 77 0 14 1
Or you can use concat
:
print pd.concat([g.drop('cols',axis=1).reset_index(drop=True) for _,g in df.groupby('cols')],axis=1,keys=df['cols'].unique())
Output:
col1 col2
l n l n
0 12 0 32 1
1 16 1 47 1
2 92 0 22 0
3 77 0 14 1
Hope it helps, :)
来源:https://stackoverflow.com/questions/21637477/reshaping-data-frame-in-pandas