I am trying to calculate a percentile of a column in a DataFrame. I can't find any percentile_approx function among the Spark aggregation functions.
For e.g. in Hive we have perc
Since Spark 2.0 this has become easier. Simply use approxQuantile from DataFrameStatFunctions, like this:
df.stat.approxQuantile("Open_Rate",Array(0.25,0.50,0.75),0.0)
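The last argument is the relative error: 0.0 computes exact quantiles, which can be expensive on large data, so a small positive value is often a better trade-off. DataFrameStatFunctions also provides other useful statistics functions for DataFrames.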
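For context, here is a minimal self-contained sketch of the approxQuantile call. It assumes Spark 2.0+ on the classpath and a local master. The sample data, the object name `ApproxQuantileExample`, and the helper `quartilesOf` are made up for illustration; only the column name `Open_Rate` comes from the answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ApproxQuantileExample {

  // Hypothetical helper: returns the 25th, 50th and 75th percentiles of a numeric column.
  // relativeError = 0.0 asks for exact quantiles; raise it (e.g. 0.01) for large data.
  def quartilesOf(df: DataFrame, column: String, relativeError: Double = 0.0): Array[Double] =
    df.stat.approxQuantile(column, Array(0.25, 0.50, 0.75), relativeError)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would get its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("approx-quantile-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the real DataFrame.
    val df = Seq(0.10, 0.20, 0.30, 0.40, 0.50).toDF("Open_Rate")

    val quartiles = quartilesOf(df, "Open_Rate")
    println(quartiles.mkString(", "))

    spark.stop()
  }
}
```

The result is a plain Array[Double] with one entry per requested probability, in the same order as the probabilities passed in.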