Filling missing values of categorical values based on other categorical values in pandas dataframe

暗喜 2021-02-06 18:22

I want to fill the missing values of one categorical column in a pandas DataFrame with the most frequent value of that column within each group of another categorical column. For example,

import pandas as pd
impo         


        
1 answer
  •  挽巷 (original poster)
    2021-02-06 19:14

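    If I understand correctly (IIUC), you want each missing product filled with the most frequent product of the same type.

    This can be done with groupby and mode.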


    Data input

    import pandas as pd
    import numpy as np
    data = {'type': ['softdrink', 'juice', 'softdrink', 'softdrink', 'juice', 'juice', 'softdrink'],
            'product': ['coca', np.nan, 'pepsi', 'pepsi', 'orange', 'grape', np.nan],
            'price': [25, 94, 57, 62, 70, 50, 60]}
    df = pd.DataFrame(data)
    

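    Solution

    # Select the column with ['product'] rather than .product, since product is also the name of a DataFrame method
    df.groupby('type')['product'].transform(lambda x: x.fillna(x.mode()[0]))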
    
    Out[28]: 
    0      coca
    1     grape
    2     pepsi
    3     pepsi
    4    orange
    5     grape
    6     pepsi
    Name: product, dtype: object
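
In the juice group, 'orange' and 'grape' each appear once, so the mode is tied. The short sketch below (the `juice` series is just the juice rows copied out for illustration) shows why row 1 ends up as 'grape': `Series.mode` ignores NaN by default and returns all tied values in sorted order, and `[0]` takes the first of them.

```python
import numpy as np
import pandas as pd

# The 'product' values of the juice rows from the example data
juice = pd.Series(['orange', 'grape', np.nan])

# mode() drops NaN by default and returns every tied value, sorted
modes = juice.mode()
print(modes.tolist())

# [0] picks the first tied value, which is the one used by fillna
first = modes[0]
print(first)
```

If a different tie-breaking rule is wanted (for example, the value seen first in the data), it has to be implemented explicitly instead of relying on `mode()[0]`.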
    

    Assign the result back to the dataframe

    df['product'] = df.groupby('type')['product'].transform(lambda x: x.fillna(x.mode()[0]))
    df
    Out[40]: 
       price product       type
    0     25    coca  softdrink
    1     94   grape      juice
    2     57   pepsi  softdrink
    3     62   pepsi  softdrink
    4     70  orange      juice
    5     50   grape      juice
    6     60   pepsi  softdrink
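
One caveat: `x.mode()[0]` raises an error when a group contains only missing values, because the mode is then empty. A minimal sketch of a more defensive variant follows. The helper name `fill_with_group_mode` and the extra 'water' row are hypothetical, added only to illustrate the edge case; they are not part of the original question.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(df, group_col, target_col):
    """Hypothetical helper: fill NaNs in target_col with the per-group mode."""
    def _fill(s):
        modes = s.mode()
        if modes.empty:
            # No non-missing value in this group, so there is nothing to fill with
            return s
        # On ties, mode() is sorted, so this takes the first value in sort order
        return s.fillna(modes.iloc[0])

    return df.groupby(group_col)[target_col].transform(_fill)


# Example data from the answer, plus one assumed 'water' row whose product is missing
df = pd.DataFrame({
    'type': ['softdrink', 'juice', 'softdrink', 'softdrink',
             'juice', 'juice', 'softdrink', 'water'],
    'product': ['coca', np.nan, 'pepsi', 'pepsi',
                'orange', 'grape', np.nan, np.nan],
    'price': [25, 94, 57, 62, 70, 50, 60, 10],
})

filled = fill_with_group_mode(df, 'type', 'product')
print(filled)
```

Here the 'water' row stays missing instead of raising, which leaves the choice of a fallback (a global mode, a placeholder label, or dropping the row) to the caller.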
    
