Hive: Sum over a specified group (HiveQL)

渐次进展 2021-02-04 14:49

I have a table:

key    product_code    cost
1      UK              20
1      US              10
1      EU              5
2      UK              3
2      EU              6


        
6 Answers
  • 2021-02-04 14:55

    The table above looked like this:

    key    product_code    cost
    1      UK              20
    1      US              10
    1      EU              5
    2      UK              3
    2      EU              6
    

    The user wanted a table with the total costs, like the following:

    key    product_code    cost     total_costs
    1      UK              20       35
    1      US              10       35
    1      EU              5        35
    2      UK              3        9
    2      EU              6        9
    

    Therefore we used the following query:

    SELECT key, product_code,
    SUM(cost) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM test;
    

    So far so good. Now I want one more column, counting the occurrences of each country:

    key    product_code    cost     total_costs     occurences
    1      UK              20       35              2
    1      US              10       35              1
    1      EU              5        35              2
    2      UK              3        9               2
    2      EU              6        9               2
    

    Therefore I used the following query:

    SELECT key, product_code,
    SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as total_costs
    COUNT(product code) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as occurences
    FROM test;
    

    Sadly, this is not working; I get a cryptic error. To rule out a mistake in my query, I want to ask whether I did something wrong. Thanks.
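
One possible reading of the error, offered as a guess rather than a diagnosis: the query above has no comma after `total_costs`, writes `product code` with a space, and sums `costs` although the table shows a column named `cost`. The expected counts (UK 2, US 1, EU 2) also look like counts per product_code over the whole table, not per key. The sketch below runs a corrected query against the sample data. It assumes Python's built-in sqlite3 module as a stand-in for Hive (SQLite 3.25 or later is needed for window functions); the table, column, and alias names are taken from the post.

```python
import sqlite3

# In-memory stand-in for the Hive table from the post (assumption: SQLite, not Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)],
)

# Comma added after total_costs, product_code spelled with an underscore,
# and the count partitioned by product_code instead of key.
QUERY = """
SELECT key, product_code, cost,
       SUM(cost) OVER (PARTITION BY key) AS total_costs,
       COUNT(product_code) OVER (PARTITION BY product_code) AS occurences
FROM test
ORDER BY key, cost DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample data, each row carries the total for its key and the number of rows that share its product_code.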

  • 2021-02-04 14:56

    A similar answer, using the Oracle emp table:

    select deptno, ename, sal, sum(sal) over(partition by deptno) from emp;

    The output will look like this:

    deptno  ename   sal     sum_window_0
    10      MILLER  1300    8750
    10      KING    5000    8750
    10      CLARK   2450    8750
    20      SCOTT   3000    10875
    20      FORD    3000    10875
    20      ADAMS   1100    10875
    20      JONES   2975    10875
    20      SMITH   800     10875
    30      BLAKE   2850    9400
    30      MARTIN  1250    9400
    30      ALLEN   1600    9400
    30      WARD    1250    9400
    30      TURNER  1500    9400
    30      JAMES   950     9400
    
  • 2021-02-04 15:00

    Similar to @VB_'s answer, use the BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause.

    The HiveQL query is therefore:

    SELECT key, product_code,
    SUM(cost) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM test;
    
  • 2021-02-04 15:03

    The analytic function sum gives cumulative sums when the window has an ORDER BY clause and no explicit frame. For example, if you did:

    select key, product_code, cost, sum(cost) over (partition by key order by cost desc) as total_costs from test
    

    then you would get:

    key    product_code    cost     total_costs
    1      UK              20       20
    1      US              10       30
    1      EU              5        35
    2      EU              6        6
    2      UK              3        9
    

    which, it seems, is not what you want.

    Instead, you should use the aggregation function sum, combined with a self join to accomplish this:

    select test.key, test.product_code, test.cost, agg.total_cost
    from (
      select key, sum(cost) as total_cost
      from test
      group by key
    ) agg
    join test
    on agg.key = test.key;
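
As a rough cross-check, the sketch below runs the self join next to a windowed sum that has no ORDER BY in its OVER clause and compares the two results. It assumes Python's built-in sqlite3 module as a stand-in for Hive (SQLite 3.25 or later for window functions); Hive's documentation describes the same default frame (the whole partition) when both ORDER BY and the frame are omitted, but the behaviour is only demonstrated here on SQLite.

```python
import sqlite3

# In-memory stand-in for the Hive table (assumption: SQLite, not Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)],
)

# Aggregate per key, then join the totals back onto every row.
SELF_JOIN = """
SELECT test.key, test.product_code, test.cost, agg.total_cost
FROM (SELECT key, SUM(cost) AS total_cost FROM test GROUP BY key) agg
JOIN test ON agg.key = test.key
ORDER BY test.key, test.cost DESC
"""

# Window without ORDER BY: the frame is the whole partition.
WINDOW_TOTAL = """
SELECT key, product_code, cost,
       SUM(cost) OVER (PARTITION BY key) AS total_costs
FROM test
ORDER BY key, cost DESC
"""

join_rows = conn.execute(SELF_JOIN).fetchall()
window_rows = conn.execute(WINDOW_TOTAL).fetchall()
print(join_rows == window_rows)
for row in window_rows:
    print(row)
```

On this data the two queries return the same rows, so the self join and the window form are interchangeable for a per-key total.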
    
  • 2021-02-04 15:05

    You could use BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to get a cumulative (running) sum without a self join.

    Code as below:

    SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM T;
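
To make the effect of the frame concrete, here is a small sketch that applies both frames discussed in this thread to the sample table from the question. It assumes Python's built-in sqlite3 module as a stand-in for Hive (SQLite 3.25 or later); the aliases `running_costs` and `total_costs` and the `ORDER BY cost DESC` ordering are chosen for illustration only.

```python
import sqlite3

# In-memory stand-in for the Hive table (assumption: SQLite, not Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)],
)

# Same partition and ordering, two different frames:
#   ... AND CURRENT ROW         -> running total within the key
#   ... AND UNBOUNDED FOLLOWING -> total over the whole key
QUERY = """
SELECT key, product_code, cost,
       SUM(cost) OVER (PARTITION BY key ORDER BY cost DESC
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_costs,
       SUM(cost) OVER (PARTITION BY key ORDER BY cost DESC
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total_costs
FROM test
ORDER BY key, cost DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The `running_costs` column grows row by row inside each key, while `total_costs` repeats the per-key total on every row.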
    
  • 2021-02-04 15:10

    This query gives me the correct result:

    select key, product_code, cost, sum(cost) over (partition by key) as total_costs from zone;
