get min and max from a specific column scala spark dataframe

梦谈多话 2021-02-01 04:37

I would like to access the min and max of a specific column in my dataframe, but I don't have the header of the column, just its number. How should I do this using Scala?

7 Answers
  •  小蘑菇
     2021-02-01 04:58

    Hope this helps.

    import spark.implicits._                               // needed for .toDF on an RDD
    import org.apache.spark.sql.functions.{min, max, sum}

    val sales = sc.parallelize(List(
       ("West",  "Apple",  2.0, 10),
       ("West",  "Apple",  3.0, 15),
       ("West",  "Orange", 5.0, 15),
       ("South", "Orange", 3.0, 9),
       ("South", "Orange", 6.0, 18),
       ("East",  "Milk",   5.0, 5)))

    val salesDf = sales.toDF("store", "product", "amount", "quantity")

    // registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView
    salesDf.createOrReplaceTempView("sales")

    val result = spark.sql(
      "SELECT store, product, SUM(amount), MIN(amount), MAX(amount), SUM(quantity) FROM sales GROUP BY store, product")

    // OR the equivalent DataFrame API:

    salesDf.groupBy("store", "product")
      .agg(min("amount"), max("amount"), sum("amount"), sum("quantity"))
      .show
    
    
    //output
        +-----+-------+-----------+-----------+-----------+-------------+
        |store|product|min(amount)|max(amount)|sum(amount)|sum(quantity)|
        +-----+-------+-----------+-----------+-----------+-------------+
        |South| Orange|        3.0|        6.0|        9.0|           27|
        | West| Orange|        5.0|        5.0|        5.0|           15|
        | East|   Milk|        5.0|        5.0|        5.0|            5|
        | West|  Apple|        2.0|        3.0|        5.0|           25|
        +-----+-------+-----------+-----------+-----------+-------------+
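
    Since the question asked for the min and max of a column identified only by its position (no header), one way is to look the name up through `df.columns`, which returns the column names in order. A minimal sketch, reusing `salesDf` from above (the index 2 here is just an example, picking the "amount" column):

    // Resolve the column name from its positional index, then aggregate.
    val colIndex = 2                          // hypothetical: the column number you were given
    val colName  = salesDf.columns(colIndex)  // "amount" in this example
    salesDf.agg(min(colName), max(colName)).show

    This avoids hard-coding the header anywhere: only the index appears in your code, and `agg` receives the resolved name.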
    
