I have a datafarme which looks like as follows (there are more columns having been dropped off):
memberID shipping_country
264991
264991
For the following sample dataframe (I added a memberID
group that only contains ''
in the shipping_country
column):
memberID shipping_country
0 264991
1 264991 Canada
2 100 USA
3 5000
4 5000 UK
5 54
This should work for you, and also as the behavior that if a memberID
group only has empty string values (''
) in shipping_country
, those will be retained in the output df
:
df['shipping_country'] = df.replace('',np.nan).groupby('memberID')['shipping_country'].transform('first').fillna('')
Yields:
memberID shipping_country
0 264991 Canada
1 264991 Canada
2 100 USA
3 5000 UK
4 5000 UK
5 54
If you would like to leave the empty strings ''
as NaN
in the output df
, then just remove the fillna('')
, leaving:
df['shipping_country'] = df.replace('',np.nan).groupby('memberID')['shipping_country'].transform('first')
You can use chained groupby
s, one with forward fill and one with backfill:
# replace blank values with `NaN` first:
df['shipping_country'].replace('',pd.np.nan,inplace=True)
df.iloc[::-1].groupby('memberID').ffill().groupby('memberID').bfill()
memberID shipping_country
0 264991 Canada
1 264991 Canada
2 100 USA
3 5000 UK
4 5000 UK
This method will also allow a group made up of all NaN
to remain NaN
:
>>> df
memberID shipping_country
0 264991
1 264991 Canada
2 100 USA
3 5000
4 5000 UK
5 1
6 1
df['shipping_country'].replace('',pd.np.nan,inplace=True)
df.iloc[::-1].groupby('memberID').ffill().groupby('memberID').bfill()
memberID shipping_country
0 264991 Canada
1 264991 Canada
2 100 USA
3 5000 UK
4 5000 UK
5 1 NaN
6 1 NaN
You can use GroupBy
+ ffill
/ bfill
:
def filler(x):
return x.ffill().bfill()
res = df.groupby('memberID')['shipping_country'].apply(filler)
A custom function is necessary as there's no combined Pandas method to ffill
and bfill
sequentially.
This also caters for the situation where all values are NaN
for a specific memberID
; in this case they will remain NaN
.