I would like to access the min and max of a specific column from my dataframe, but I don't have the header of the column, just its number. How should I do this using Scala?
Hope this helps.
// toDF on an RDD of tuples needs the implicits from the SparkSession
import spark.implicits._

val sales = sc.parallelize(List(
  ("West",  "Apple",  2.0, 10),
  ("West",  "Apple",  3.0, 15),
  ("West",  "Orange", 5.0, 15),
  ("South", "Orange", 3.0, 9),
  ("South", "Orange", 6.0, 18),
  ("East",  "Milk",   5.0, 5)))
val salesDf = sales.toDF("store", "product", "amount", "quantity")
// registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView
salesDf.createOrReplaceTempView("sales")
val result = spark.sql("SELECT store, product, SUM(amount), MIN(amount), MAX(amount), SUM(quantity) FROM sales GROUP BY store, product")
result.show
//OR
import org.apache.spark.sql.functions.{min, max, sum}
salesDf.groupBy("store", "product").agg(min("amount"), max("amount"), sum("amount"), sum("quantity")).show
//output
+-----+-------+-----------+-----------+-----------+-------------+
|store|product|min(amount)|max(amount)|sum(amount)|sum(quantity)|
+-----+-------+-----------+-----------+-----------+-------------+
|South| Orange|        3.0|        6.0|        9.0|           27|
| West| Orange|        5.0|        5.0|        5.0|           15|
| East|   Milk|        5.0|        5.0|        5.0|            5|
| West|  Apple|        2.0|        3.0|        5.0|           25|
+-----+-------+-----------+-----------+-----------+-------------+
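Since the question asks for the column by its position rather than its name, one way is to look the name up through `df.columns`, which returns the headers as an array. A minimal sketch, assuming the `salesDf` built above and a 0-based index (here `2`, pointing at `amount`):

```scala
import org.apache.spark.sql.functions.{min, max}

// Resolve the unknown header from its 0-based position
val colIndex = 2
val colName  = salesDf.columns(colIndex)

// Aggregate over that column without ever typing its name
val minMax = salesDf.agg(min(colName).as("min"), max(colName).as("max"))
minMax.show

// If you need the values back in Scala, pull them out of the first Row
val row = minMax.first()
val (minVal, maxVal) = (row.getDouble(0), row.getDouble(1))
```

`columns(i)` just indexes the schema, so the same trick works with the `groupBy(...).agg(...)` version above if you want per-group min/max for an unnamed column.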