How to select the first row of each group?

心在旅途 2020-11-21 05:49

I have a DataFrame generated as follows:

df.groupBy($"Hour", $"Category")
  .agg(sum($"value") as "TotalValue")
  .sort($"Hour".asc, $"TotalValue"


        
8 answers
  •  野性不改
    2020-11-21 06:29

    A nice way to do this with the DataFrame API is the argmax pattern: take the max of a struct that has the ordering column as its first field, like so:

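      // Outside spark-shell, also add `import spark.implicits._` for $"..." and toDF.
      import org.apache.spark.sql.functions.{max, struct}

      val df = Seq(
        (0,"cat26",30.9), (0,"cat13",22.1), (0,"cat95",19.6), (0,"cat105",1.3),
        (1,"cat67",28.5), (1,"cat4",26.8), (1,"cat13",12.6), (1,"cat23",5.3),
        (2,"cat56",39.6), (2,"cat40",29.7), (2,"cat187",27.9), (2,"cat68",9.8),
        (3,"cat8",35.6)).toDF("Hour", "Category", "TotalValue")

      // Pack the columns into a struct with TotalValue first, take the max per Hour,
      // then expand the winning struct back into columns.
      df.groupBy($"Hour")
        .agg(max(struct($"TotalValue", $"Category")).as("argmax"))
        .select($"Hour", $"argmax.*").show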
    
     +----+----------+--------+
     |Hour|TotalValue|Category|
     +----+----------+--------+
     |   1|      28.5|   cat67|
     |   3|      35.6|    cat8|
     |   2|      39.6|   cat56|
     |   0|      30.9|   cat26|
     +----+----------+--------+
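
    Why this works: Spark compares struct values field by field, left to right, so TotalValue decides the winner and Category only breaks ties. The same idea can be sketched in plain Scala collections, without Spark. This is only an illustration, not part of the answer above: the names ArgmaxSketch, Row, byValueThenCategory and topPerHour are made up for the example, and the data is a subset of the rows shown earlier.

```scala
// Plain-Scala sketch (no Spark) of the argmax idea: pick, per group,
// the row that is largest under a "value first, then category" ordering.
object ArgmaxSketch {
  // Hypothetical row type mirroring the Hour / Category / TotalValue columns.
  final case class Row(hour: Int, category: String, totalValue: Double)

  // Field-by-field comparison: totalValue first, category only on ties.
  // This mimics how a struct(TotalValue, Category) would be compared.
  val byValueThenCategory: Ordering[Row] = new Ordering[Row] {
    def compare(a: Row, b: Row): Int = {
      val c = java.lang.Double.compare(a.totalValue, b.totalValue)
      if (c != 0) c else a.category.compareTo(b.category)
    }
  }

  // For each hour, keep the single row that is maximal under the ordering.
  def topPerHour(rows: Seq[Row]): Map[Int, Row] =
    rows.groupBy(_.hour).map { case (hour, group) =>
      hour -> group.max(byValueThenCategory)
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Row(0, "cat26", 30.9), Row(0, "cat13", 22.1),
      Row(1, "cat67", 28.5), Row(1, "cat4", 26.8),
      Row(3, "cat8", 35.6)
    )
    // Print one winning row per hour, in hour order.
    topPerHour(rows).toSeq.sortBy(_._1).foreach { case (_, r) => println(r) }
  }
}
```

    One consequence of the field order: if two rows in a group share the same TotalValue, the one with the larger Category string wins, so put any tie-breaking column second in the struct.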
    
