I have a data frame with three string columns. I know that only one value in the 3rd column is valid for every combination of the first two. To clean the data I have to
If you don't need to handle NaN values, using Counter is much faster than pd.Series.mode or pd.Series.value_counts().index[0]:
from collections import Counter

def get_most_common(srs):
    # Count every value in the group and return the most frequent one
    x = list(srs)
    my_counter = Counter(x)
    return my_counter.most_common(1)[0][0]
df.groupby(col).agg(get_most_common)
should work. Note that this gives misleading results when the column contains NaN values: NaN does not compare equal to itself, so each NaN may be counted as a separate value.
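As a rough sketch of how this might look end to end, the example below groups on two key columns and drops missing values before counting, which sidesteps the NaN problem. The column names "a", "b", "c" and the sample data are made up for illustration, and returning None for a group with no valid values is just one possible choice.

```python
from collections import Counter

import pandas as pd


def get_most_common(srs):
    # Drop missing values first so NaN never reaches the Counter.
    values = srs.dropna().tolist()
    if not values:
        return None  # assumption: a group with only missing values yields None
    return Counter(values).most_common(1)[0][0]


# Hypothetical data: "a" and "b" form the key, "c" is the column to clean.
df = pd.DataFrame({
    "a": ["x", "x", "x", "y", "y"],
    "b": ["1", "1", "1", "2", "2"],
    "c": ["good", "good", "typo", "ok", None],
})

# One most-common value of "c" per (a, b) combination.
most_common = df.groupby(["a", "b"])["c"].agg(get_most_common)
print(most_common)
```

If two values are tied within a group, Counter.most_common returns the one it encountered first, so the result depends on row order in that case.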