I have a DataFrame generated as follow:
df.groupBy($\"Hour\", $\"Category\")
.agg(sum($\"value\") as \"TotalValue\")
.sort($\"Hour\".asc, $\"TotalValue\"
The pattern is group by keys => do something to each group e.g. reduce => return to dataframe
I thought the Dataframe abstraction is a bit cumbersome in this case so I used RDD functionality
val rdd: RDD[Row] = originalDf
.rdd
.groupBy(row => row.getAs[String]("grouping_row"))
.map(iterableTuple => {
iterableTuple._2.reduce(reduceFunction)
})
val productDf = sqlContext.createDataFrame(rdd, originalDf.schema)