How to select the first row of each group?

心在旅途 2020-11-21 05:49

I have a DataFrame generated as follows:

df.groupBy($"Hour", $"Category")
  .agg(sum($"value") as "TotalValue")
  .sort($"Hour".asc, $"TotalValue"


        
8 Answers
  •  慢半拍i (original poster)
    2020-11-21 06:53

    The pattern is: group by keys => do something to each group (e.g. reduce) => convert the result back to a DataFrame.

    I found the DataFrame abstraction a bit cumbersome in this case, so I used the RDD API instead:

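    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    // reduceFunction is a user-supplied (Row, Row) => Row that picks
    // which of two rows in the same group to keep.
    val rdd: RDD[Row] = originalDf
      .rdd
      .groupBy(row => row.getAs[String]("grouping_row"))
      .map { case (_, rows) => rows.reduce(reduceFunction) }

    // Rebuild a DataFrame from the surviving rows, reusing the original schema.
    val productDf = sqlContext.createDataFrame(rdd, originalDf.schema)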
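
The answer does not show what `reduceFunction` looks like, so here is a minimal sketch of the same group => reduce => collect pattern using plain Scala collections, which can be tried without a Spark cluster. Everything in it is an assumption made for illustration: the `Record` case class stands in for a row of the aggregated DataFrame, the sample data is made up, and this `reduceFunction` simply keeps the row with the larger `totalValue`.

```scala
// Hypothetical stand-in for one row of the aggregated DataFrame.
final case class Record(hour: Int, category: String, totalValue: Double)

object FirstRowPerGroup {
  // Assumed reducer: of two rows in the same group, keep the one
  // with the larger totalValue (the left one wins ties).
  def reduceFunction(a: Record, b: Record): Record =
    if (a.totalValue >= b.totalValue) a else b

  // group by key => reduce each group to a single row => collect the survivors
  def firstPerGroup(rows: Seq[Record]): Seq[Record] =
    rows
      .groupBy(_.hour)               // Map[Int, Seq[Record]]
      .values
      .map(_.reduce(reduceFunction)) // one Record per hour
      .toSeq
      .sortBy(_.hour)                // Map iteration order is unspecified, so sort

  def main(args: Array[String]): Unit = {
    // Made-up sample data.
    val rows = Seq(
      Record(0, "cat26", 30.9),
      Record(0, "cat13", 22.1),
      Record(1, "cat67", 28.5),
      Record(1, "cat4", 26.8)
    )
    firstPerGroup(rows).foreach(println)
  }
}
```

The RDD version above has the same shape: `groupBy` builds the groups, and `reduce` inside `map` collapses each group to one row. For a `Row`-based reducer the comparison would presumably read the field with something like `row.getAs[...]("TotalValue")`, using whatever type the column actually has.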
    
