Hive: Sum over a specified group (HiveQL)

后端未结

关注

 6  984

I have a table:

key    product_code    cost
1      UK              20
1      US              10
1      EU              5
2      UK              3
2      EU


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  野的像风        
                
              
                            
                2021-02-04 14:55
              
            
            
                                                                       
The table above looked like

key    product_code    cost
1      UK              20
1      US              10
1      EU              5
2      UK              3
2      EU              6


The user wanted a tabel with the total costs like the following

key    product_code    cost     total_costs
1      UK              20       35
1      US              10       35
1      EU              5        35
2      UK              3        9
2      EU              6        9


Therefor we used the following query

SELECT key, product_code,
SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;


So far so good. 
I want a column more, counting the occurences of each country

key    product_code    cost     total_costs     occurences
1      UK              20       35              2
1      US              10       35              1
1      EU              5        35              2
2      UK              3        9               2
2      EU              6        9               2


Therefor I used the following query

SELECT key, product_code,
SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as total_costs
COUNT(product code) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as occurences
FROM test;


Sadly this is not working. I get an cryptic error. To exclude an error in my query I want to ask if I did something wrong.
Thanks
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉酒成梦        
                
              
                            
                2021-02-04 14:56
              
            
            
                                                                       
similar answer (if we use oracle emp table):

select deptno, ename, sal, sum(sal) over(partition by deptno) from emp;

output will be like below:

deptno  ename   sal sum_window_0
10  MILLER  1300    8750
10  KING    5000    8750
10  CLARK   2450    8750
20  SCOTT   3000    10875
20  FORD    3000    10875
20  ADAMS   1100    10875
20  JONES   2975    10875
20  SMITH   800     10875
30  BLAKE   2850    9400
30  MARTIN  1250    9400
30  ALLEN   1600    9400
30  WARD    1250    9400
30  TURNER  1500    9400
30  JAMES   950     9400

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  野性不改        
                
              
                            
                2021-02-04 15:00
              
            
            
                                                                       
Similar to @VB_ answer, use the BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING statement.

The HiveQL query is therefore:

SELECT key, product_code,
SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  死守一世寂寞        
                
              
                            
                2021-02-04 15:03
              
            
            
                                                                       
The analytics function sum gives cumulative sums. For example, if you did:

select key, product_code, cost, sum(cost) over (partition by key) as total_costs from test


then you would get:

key    product_code    cost     total_costs
1      UK              20       20
1      US              10       30
1      EU              5        35
2      UK              3        3
2      EU              6        9


which, it seems, is not what you want.

Instead, you should use the aggregation function sum, combined with a self join to accomplish this:

select test.key, test.product_code, test.cost, agg.total_cost
from (
  select key, sum(cost) as total_cost
  from test
  group by key
) agg
join test
on agg.key = test.key;

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  太阳男子        
                
              
                            
                2021-02-04 15:05
              
            
            
                                                                       
You could use BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to achieve that without a self join.

Code as below:

SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  抹茶落季        
                
              
                            
                2021-02-04 15:10
              
            
            
                                                                       
This query gives me perfect result

select key, product_code, cost, sum(cost) over (partition by key) as total_costs from zone;
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复