Calculate the mode of a PySpark DataFrame column?

再見小時候 2021-01-05 15:52

Ultimately, what I want is the mode of every column in the DataFrame. For other summary statistics, I see a couple of options: use DataFrame aggregation, or

4 Answers

鱼传尺愫 (OP)

2021-01-05 16:52

This line gives you the mode of column "col" in the Spark DataFrame df:

df.groupby("col").count().orderBy("count", ascending=False).first()[0]
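
The one-liner groups by the column, counts each group, sorts by count in descending order, and takes the first row. The plain-Python sketch below reproduces that logic on a list of values so the edge cases are easy to see; the function name `spark_style_mode` and its `drop_nulls` flag are hypothetical, not part of PySpark. Two things to watch for: Spark's groupBy keeps nulls as a group of their own, so the "mode" can come back as None, and Spark makes no guarantee about which of several tied groups sorts first. In Spark itself, nulls can be excluded by filtering before the groupby, e.g. `df.where(df["col"].isNotNull())`.

```python
from collections import Counter
from typing import Any, Hashable, Iterable, Optional


def spark_style_mode(values: Iterable[Hashable], drop_nulls: bool = False) -> Optional[Any]:
    """Plain-Python mimic of
    df.groupby("col").count().orderBy("count", ascending=False).first()[0].

    None is counted as a group of its own (as Spark's groupBy does) unless
    drop_nulls is True. Empty input returns None; in Spark, first() returns
    None on an empty DataFrame, so the trailing [0] would raise instead.
    """
    # "groupby + count": tally how often each value occurs.
    counts = Counter(v for v in values if not (drop_nulls and v is None))
    if not counts:
        return None
    # "orderBy count desc + first": take the value with the highest count.
    # Counter breaks ties by first appearance; Spark gives no guarantee
    # about which of several tied groups comes first.
    value, _ = counts.most_common(1)[0]
    return value


col = ["a", None, None, "b", None, "a"]
print(spark_style_mode(col))                   # None (the null group is largest)
print(spark_style_mode(col, drop_nulls=True))  # a
```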

For a list of the modes of all columns in df, use:

[df.groupby(i).count().orderBy("count", ascending=False).first()[0] for i in df.columns]
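
The comprehension runs one Spark action (groupby, count, sort, first) per column, so on a wide DataFrame it makes many passes over the data. For intuition, the same list of modes can be computed in a single pass by keeping one counter per column. The sketch below does that in plain Python over rows represented as dicts (for example, what `Row.asDict()` returns after collecting a small sample); `modes_all_columns` is a hypothetical helper, not a PySpark function.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def modes_all_columns(rows: Sequence[Dict[str, Any]], columns: Sequence[str]) -> List[Any]:
    """Return the most frequent value of each column, in column order.

    Makes a single pass over the rows, keeping one Counter per column.
    None is counted like any other value, matching Spark's groupBy.
    """
    counters = {c: Counter() for c in columns}
    for row in rows:
        for c in columns:
            counters[c][row.get(c)] += 1
    # An empty input yields None for every column.
    return [counters[c].most_common(1)[0][0] if counters[c] else None for c in columns]


rows = [
    {"color": "red", "size": 1},
    {"color": "blue", "size": 2},
    {"color": "red", "size": 2},
]
print(modes_all_columns(rows, ["color", "size"]))  # ['red', 2]
```

Collecting rows to the driver like this only makes sense for small data; on a large DataFrame, the per-column groupby from the answer keeps the counting distributed.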

To label each mode with the column it belongs to, build a 2D list:

[[i, df.groupby(i).count().orderBy("count", ascending=False).first()[0]] for i in df.columns]
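
A mode is not always unique: when several values tie for the highest count, the expression above returns only one of them, and which one is not guaranteed. If ties matter, a variant that labels each column and keeps every tied value may help. This is another plain-Python sketch over dict rows, with `labeled_modes` as a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def labeled_modes(rows: Sequence[Dict[str, Any]], columns: Sequence[str]) -> List[List[Any]]:
    """Return [[column, [modes...]], ...] with one entry per column.

    Every value whose count equals the column's maximum is kept, in order
    of first appearance, so ties are reported instead of silently dropped.
    """
    result = []
    for c in columns:
        counts = Counter(row.get(c) for row in rows)
        top = max(counts.values(), default=0)  # 0 when there are no rows
        result.append([c, [v for v, n in counts.items() if n == top]])
    return result


rows = [
    {"color": "red", "size": 1},
    {"color": "blue", "size": 2},
    {"color": "red", "size": 2},
    {"color": "blue", "size": 3},
]
print(labeled_modes(rows, ["color", "size"]))
# [['color', ['red', 'blue']], ['size', [2]]]
```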
