Ultimately what I want is the mode of a column, for all the columns in the DataFrame. For other summary statistics, I see a couple of options: use DataFrame aggregation, or
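This line gives you the mode of column "col" in the Spark DataFrame df. Note that nulls are counted as a value, and if several values tie for the highest count, which one comes back is arbitrary: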
df.groupby("col").count().orderBy("count", ascending=False).first()[0]
For a list of the modes of all columns in df, use:
[df.groupby(i).count().orderBy("count", ascending=False).first()[0] for i in df.columns]
To identify which mode belongs to which column, build a 2D list of [column, mode] pairs:
[[i,df.groupby(i).count().orderBy("count", ascending=False).first()[0]] for i in df.columns]
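If you would rather look modes up by column name, the same idea can be wrapped in a small helper that returns a dict. This is only a sketch: `column_modes` is a name I made up, and it relies on nothing beyond the `groupby`, `count`, `orderBy` and `first` calls already used above. It still launches one Spark job per column, so it is not faster than the list comprehensions, just easier to read.

```python
def column_modes(df):
    """Return {column name: most frequent value} for a Spark DataFrame.

    Hypothetical helper, not part of PySpark. It runs one
    groupby/count job per column, exactly like the one-liners above.
    """
    modes = {}
    for name in df.columns:
        top = (
            df.groupby(name)
            .count()
            .orderBy("count", ascending=False)
            .first()
        )
        # first() returns None when the DataFrame has no rows,
        # so guard before indexing into the Row.
        modes[name] = top[0] if top is not None else None
    return modes
```

Call it as `column_modes(df)` and index the result by column name, for example `column_modes(df)["col"]`.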