get min and max from a specific column scala spark dataframe

梦谈多话 2021-02-01 04:37

I would like to access the min and max of a specific column in my dataframe, but I don't have the header of the column, just its number. How should I do this using Scala?

7 Answers
  •  小蘑菇
     2021-02-01 04:58

    Hope this helps.

    import spark.implicits._                               // needed for .toDF on an RDD
    import org.apache.spark.sql.functions.{min, max, sum}

    val sales = sc.parallelize(List(
       ("West",  "Apple",  2.0, 10),
       ("West",  "Apple",  3.0, 15),
       ("West",  "Orange", 5.0, 15),
       ("South", "Orange", 3.0, 9),
       ("South", "Orange", 6.0, 18),
       ("East",  "Milk",   5.0, 5)))

    val salesDf = sales.toDF("store", "product", "amount", "quantity")

    // registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView
    salesDf.createOrReplaceTempView("sales")

    val result = spark.sql(
      "SELECT store, product, SUM(amount), MIN(amount), MAX(amount), SUM(quantity) FROM sales GROUP BY store, product")

    // OR the equivalent DataFrame API:

    salesDf.groupBy("store", "product")
      .agg(min("amount"), max("amount"), sum("amount"), sum("quantity"))
      .show
    
    
    //output
        +-----+-------+-----------+-----------+-----------+-------------+
        |store|product|min(amount)|max(amount)|sum(amount)|sum(quantity)|
        +-----+-------+-----------+-----------+-----------+-------------+
        |South| Orange|        3.0|        6.0|        9.0|           27|
        | West| Orange|        5.0|        5.0|        5.0|           15|
        | East|   Milk|        5.0|        5.0|        5.0|            5|
        | West|  Apple|        2.0|        3.0|        5.0|           25|
        +-----+-------+-----------+-----------+-----------+-------------+
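
    Since the question asked for the min and max of a column identified only by its position (no header), one way is to look the name up through `df.columns`, which returns the column names in order. A minimal sketch, reusing `salesDf` from above (the index 2 here is just an example, picking the "amount" column):

    // Resolve the column name from its positional index, then aggregate.
    val colIndex = 2                          // hypothetical: the column number you were given
    val colName  = salesDf.columns(colIndex)  // "amount" in this example
    salesDf.agg(min(colName), max(colName)).show

    This avoids hard-coding the header anywhere: only the index appears in your code, and `agg` receives the resolved name.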
    
