Backfilling columns by groups in Pandas

问题

I have a csv like

A,B,C,D
1,2,,
1,2,30,100
1,2,40,100
4,5,,
4,5,60,200
4,5,70,200
8,9,,

In row 1 and row 4 C value is missing (NaN). I want to take their value from row 2 and 5 respectively. (First occurrence of same A,B value).

If no matching row is found, just put 0 (like in last line) Expected op:

A,B,C,D
1,2,30,
1,2,30,100
1,2,40,100
4,5,60,
4,5,60,200
4,5,70,200
8,9,0,

using fillna I found bfill: use NEXT valid observation to fill gap but the NEXT observation has to be taken logically (looking at col A,B values) and not just the upcoming C column value

回答1:

You'll have to call df.groupby on A and B first and then apply the bfill function:

In [501]: df.C = df.groupby(['A', 'B']).apply(lambda x: x.C.bfill()).reset_index(drop=True)

In [502]: df
Out[502]: 
   A  B   C      D
0  1  2  30    NaN
1  1  2  30  100.0
2  1  2  40  100.0
3  4  5  60    NaN
4  4  5  60  200.0
5  4  5  70  200.0
6  8  9   0    NaN

You can also group and then call dfGroupBy.bfill directly (I think this would be faster):

In [508]: df.C = df.groupby(['A', 'B']).C.bfill().fillna(0).astype(int); df
Out[508]: 
   A  B   C      D
0  1  2  30    NaN
1  1  2  30  100.0
2  1  2  40  100.0
3  4  5  60    NaN
4  4  5  60  200.0
5  4  5  70  200.0
6  8  9   0    NaN

If you wish to get rid of NaNs in D, you could do:

df.D.fillna('', inplace=True)

来源：https://stackoverflow.com/questions/45837490/backfilling-columns-by-groups-in-pandas

标签

python

pandas

dataframe

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!