How to calculate sum and count in a single groupBy?


Based on the following DataFrame:

val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")

+---+-----+----+
| ID|Categ|Amnt|
+---+-----+----+
|  1|    A|  10|
|  2|    A|   5|
|  3|    B|  56|
+---+-----+----+

How can I compute both the sum of Amnt and the count of IDs in a single groupBy on Categ?
3 Answers
  • 2020-12-28 16:21

    I'm giving an example slightly different from yours.

    Multiple aggregate functions can be combined in a single agg call like this; adapt it to your columns accordingly:

    // In 1.3.x, in order for the grouping column "department" to show up,
    // it must be included explicitly as part of the agg function call.
    df.groupBy("department").agg($"department", max("age"), sum("expense"))
    
    // In 1.4+, grouping column "department" is included automatically.
    df.groupBy("department").agg(max("age"), sum("expense"))
    

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._
    
    val spark: SparkSession = SparkSession
      .builder.master("local")
      .appName("MyGroup")
      .getOrCreate()

    import spark.implicits._

    val client: DataFrame = spark.sparkContext.parallelize(
      Seq((1,"A",10),(2,"A",5),(3,"B",56))
    ).toDF("ID","Categ","Amnt")
    
    client.groupBy("Categ").agg(sum("Amnt"),count("ID")).show()
    

    +-----+---------+---------+
    |Categ|sum(Amnt)|count(ID)|
    +-----+---------+---------+
    |    B|       56|        1|
    |    A|       15|        2|
    +-----+---------+---------+
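
    If you want friendlier names than the generated sum(Amnt) and count(ID), you can alias each aggregate in the same call. A minimal sketch of the same example (the names totalAmnt and numIds are just illustrative):

    client.groupBy("Categ")
      .agg(sum("Amnt").as("totalAmnt"), count("ID").as("numIds"))
      .show()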
    
  • 2020-12-28 16:32

    There are multiple ways to apply aggregate functions in Spark:

    val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")
    

    1.

    val aggdf = client.groupBy('Categ).agg(Map("ID"->"count","Amnt"->"sum"))
    
    +-----+---------+---------+
    |Categ|count(ID)|sum(Amnt)|
    +-----+---------+---------+
    |B    |1        |56       |
    |A    |2        |15       |
    +-----+---------+---------+
    
    // Rename and sort as needed.
    aggdf.sort('Categ).withColumnRenamed("count(ID)","Count").withColumnRenamed("sum(Amnt)","sum")
    +-----+-----+---+
    |Categ|Count|sum|
    +-----+-----+---+
    |A    |2    |15 |
    |B    |1    |56 |
    +-----+-----+---+
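
    A more compact alternative, if you want to rename all the columns at once, is toDF (a sketch using the same aggdf):

    aggdf.sort('Categ).toDF("Categ", "Count", "sum")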
    

    2.

    import org.apache.spark.sql.functions._
    client.groupBy('Categ).agg(count("ID").as("count"),sum("Amnt").as("sum"))
    +-----+-----+---+
    |Categ|count|sum|
    +-----+-----+---+
    |B    |1    |56 |
    |A    |2    |15 |
    +-----+-----+---+
    

    3.

    import com.google.common.collect.ImmutableMap
    client.groupBy('Categ).agg(ImmutableMap.of("ID", "count", "Amnt", "sum"))
    +-----+---------+---------+
    |Categ|count(ID)|sum(Amnt)|
    +-----+---------+---------+
    |B    |1        |56       |
    |A    |2        |15       |
    +-----+---------+---------+
    // Rename the columns afterwards if needed.
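
    This variant works because agg also accepts a java.util.Map[String, String] of column-to-aggregate mappings, so any Java map will do, not just Guava's. A Guava-free sketch, assuming Scala 2.13 (use scala.collection.JavaConverters._ on 2.12):

    // Convert a Scala Map to a java.util.Map and pass it to agg
    import scala.jdk.CollectionConverters._
    client.groupBy('Categ).agg(Map("ID" -> "count", "Amnt" -> "sum").asJava)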
    

    4. If you prefer SQL, you can do this too:

    client.createOrReplaceTempView("df")

    val aggdf = spark.sql("select Categ, count(ID), sum(Amnt) from df group by Categ")
    aggdf.show()

    +-----+---------+---------+
    |Categ|count(ID)|sum(Amnt)|
    +-----+---------+---------+
    |    B|        1|       56|
    |    A|        2|       15|
    +-----+---------+---------+
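
    You can also alias the aggregates directly in the SQL to avoid the generated column names. The same query with illustrative aliases cnt and total:

    spark.sql("select Categ, count(ID) as cnt, sum(Amnt) as total from df group by Categ").show()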
    
  • 2020-12-28 16:38

    You can aggregate the given table like this:

    client.groupBy("Categ").agg(sum("Amnt"),count("ID")).show()
    
    +-----+---------+---------+
    |Categ|sum(Amnt)|count(ID)|
    +-----+---------+---------+
    |    A|       15|        2|
    |    B|       56|        1|
    +-----+---------+---------+
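
    Note that the row order of show() after a groupBy is not guaranteed (this output happens to list A first, while other answers list B first). If you need a stable order, add an explicit sort, for example:

    client.groupBy("Categ").agg(sum("Amnt"), count("ID")).orderBy("Categ").show()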
    