Calculate the mode of a PySpark DataFrame column?

再見小時候 2021-01-05 15:52

Ultimately what I want is the mode of a column, for all the columns in the DataFrame. For other summary statistics, I see a couple of options: use DataFrame aggregation, or

4 Answers
  •  鱼传尺愫
    2021-01-05 16:52

    This line will give you the mode of column "col" in a Spark DataFrame df:

    df.groupby("col").count().orderBy("count", ascending=False).first()[0]

    For a list of the modes of all columns in df, use:

    [df.groupby(i).count().orderBy("count", ascending=False).first()[0] for i in df.columns]

    To add names identifying which mode belongs to which column, build a 2D list:

    [[i,df.groupby(i).count().orderBy("count", ascending=False).first()[0]] for i in df.columns]
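    The groupby/count/orderBy chain above simply picks the most frequent value of each column. A minimal plain-Python sketch of the same idea, using collections.Counter on made-up sample rows (the "color" and "size" columns are assumptions for illustration):

    from collections import Counter

    # Sample rows standing in for a small DataFrame (hypothetical data).
    rows = [
        {"color": "red", "size": "S"},
        {"color": "red", "size": "M"},
        {"color": "blue", "size": "M"},
        {"color": "red", "size": "M"},
    ]

    # For each column, count the values and keep the most common one --
    # the same result the groupby(col).count().orderBy(...).first()[0]
    # pattern produces in Spark.
    modes = {col: Counter(r[col] for r in rows).most_common(1)[0][0]
             for col in rows[0]}
    print(modes)  # {'color': 'red', 'size': 'M'}

    Note that, like the Spark version, this returns an arbitrary winner when several values are tied for the highest count.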
