I want to fill the missing values in a categorical column of a pandas DataFrame with the most frequent value within each group of another categorical column. For example,
import pandas as pd
impo
IIUC, you can fill each group with its own mode.
Data input
import pandas as pd
import numpy as np
data = {'type': ['softdrink', 'juice', 'softdrink', 'softdrink', 'juice', 'juice', 'softdrink'],
        'product': ['coca', np.nan, 'pepsi', 'pepsi', 'orange', 'grape', np.nan],
        'price': [25, 94, 57, 62, 70, 50, 60]}
df = pd.DataFrame(data)
Solution
# Select the column with ['product']; attribute access can clash with the product() method
df.groupby('type')['product'].transform(lambda x: x.fillna(x.mode()[0]))
Out[28]:
0 coca
1 grape
2 pepsi
3 pepsi
4 orange
5 grape
6 pepsi
Name: product, dtype: object
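Note that row 1 was filled with grape rather than orange. Both appear once among the juices, and Series.mode() returns all tied values in sorted order, so [0] takes the alphabetically first one. A minimal sketch of that behaviour (the series name juice is only for illustration and is not part of the original data frame):

```python
import numpy as np
import pandas as pd

# Hypothetical series mirroring the 'juice' group from the data above
juice = pd.Series([np.nan, 'orange', 'grape'])

modes = juice.mode()          # NaN is ignored by default (dropna=True)
tie_values = modes.tolist()   # all tied values, in sorted order
first_mode = modes[0]         # the value that fillna() ends up using
print(tie_values, first_mode)
```

If a different tie-break is wanted, pick another element of the mode result instead of [0].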
Assign the result back to get the new df
df['product'] = df.groupby('type')['product'].transform(lambda x: x.fillna(x.mode()[0]))
df
Out[40]:
   price product       type
0     25    coca  softdrink
1     94   grape      juice
2     57   pepsi  softdrink
3     62   pepsi  softdrink
4     70  orange      juice
5     50   grape      juice
6     60   pepsi  softdrink
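One caveat: if every product in a group is missing, x.mode() is empty and x.mode()[0] raises an error. A possible guard is sketched below, assuming it is acceptable to leave such groups unfilled. The helper name fill_with_group_mode and the demo frame with its 'tea' group are made up for illustration.

```python
import numpy as np
import pandas as pd

def fill_with_group_mode(s):
    """Fill NaNs in s with its most frequent value; return s unchanged if it has none."""
    modes = s.mode()
    if modes.empty:              # every value in this group is NaN
        return s
    return s.fillna(modes.iloc[0])

# Hypothetical data: the 'tea' group has no known product at all
demo = pd.DataFrame({
    'type': ['softdrink', 'softdrink', 'softdrink', 'tea'],
    'product': ['pepsi', 'pepsi', np.nan, np.nan],
})
demo['product'] = demo.groupby('type')['product'].transform(fill_with_group_mode)
print(demo)
```

The softdrink rows are all filled with pepsi, while the tea row stays NaN and can be handled separately, for example with a global fallback value.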