I have a DataFrame generated as follows:
df.groupBy($\"Hour\", $\"Category\")
.agg(sum($\"value\") as \"TotalValue\")
.sort($\"Hour\".asc, $\"TotalValue\"
We can use the rank() window function (and then filter for rank = 1). rank just assigns a sequential number to every row within a group (in this case, the group would be the hour).
Here's an example (from https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-sql-functions.adoc#rank):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rank
import spark.implicits._  // enables the 'id column syntax

// nine rows split into three buckets by id % 3
val dataset = spark.range(9).withColumn("bucket", 'id % 3)
// rank rows within each bucket, ordered by id
val byBucket = Window.partitionBy('bucket).orderBy('id)
dataset.withColumn("rank", rank over byBucket).show
+---+------+----+
| id|bucket|rank|
+---+------+----+
| 0| 0| 1|
| 3| 0| 2|
| 6| 0| 3|
| 1| 1| 1|
| 4| 1| 2|
| 7| 1| 3|
| 2| 2| 1|
| 5| 2| 2|
| 8| 2| 3|
+---+------+----+
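Applied to your DataFrame, the same idea looks roughly like this. This is a minimal sketch, assuming your original df and a SparkSession named spark are in scope; byHourWindow and topPerHour are hypothetical names:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, sum}
import spark.implicits._  // enables the $"col" syntax

// partition by Hour, order by TotalValue descending so rank 1 is the maximum
val byHourWindow = Window.partitionBy($"Hour").orderBy($"TotalValue".desc)

val topPerHour = df
  .groupBy($"Hour", $"Category")
  .agg(sum($"value") as "TotalValue")
  .withColumn("rank", rank over byHourWindow)
  .where($"rank" === 1)  // keep only the top Category per Hour
  .drop("rank")

Note that rank assigns the same rank to ties, so an hour where two categories share the maximum TotalValue keeps both rows; if you want exactly one row per hour, row_number over the same window works the same way.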