GroupBy pandas DataFrame and select most common value

梦谈多话 2020-11-22 07:59

I have a data frame with three string columns. I know that only one value in the third column is valid for each combination of the first two. To clean the data I have to

10 Answers
  •  悲&欢浪女
    2020-11-22 08:36

    If your data contains no NaN values, using collections.Counter is much faster than pd.Series.mode or pd.Series.value_counts().index[0]:

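    from collections import Counter

    def get_most_common(srs):
        # Count every value in the group and return the most frequent one.
        x = list(srs)
        my_counter = Counter(x)
        return my_counter.most_common(1)[0][0]

    # col is the column (or list of columns) to group by.
    df.groupby(col).agg(get_most_common)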
    

    This should work, but it gives wrong results when a group contains NaN values, because each NaN can be counted as a separate key.
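    One way around the NaN caveat is to drop missing values before counting. The following is a minimal sketch, not part of the original answer: the function name most_common_ignoring_nan, the sample frame, and its column names (country, city, short_name) are all made up for illustration, and returning NaN for an all-missing group is an assumption about the desired behavior.

```python
from collections import Counter

import numpy as np
import pandas as pd


def most_common_ignoring_nan(srs):
    # Drop NaN first so missing values never compete for the top count.
    values = srs.dropna().tolist()
    if not values:
        # Every value in the group was missing; report NaN for the group.
        return np.nan
    return Counter(values).most_common(1)[0][0]


# Hypothetical sample data: two key columns and one value column to clean.
df = pd.DataFrame({
    "country": ["US", "US", "US", "UK", "UK"],
    "city": ["NY", "NY", "NY", "London", "London"],
    "short_name": ["New", "New York", "New York", np.nan, "London"],
})

# Group on the two key columns and aggregate only the value column.
result = df.groupby(["country", "city"])["short_name"].agg(most_common_ignoring_nan)
```

    Here result is a Series indexed by the (country, city) pairs. Calling result.reset_index() turns it back into a regular three-column frame.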
