Hive sort array column with respect to other array column in same table

轻奢々 2021-01-14 14:39

I have a table in Hive with two columns, col1 and col2, both of type array. The output is as shown below:

col1                


        
1 answer
  • 2021-01-14 15:38

    Explode both arrays, sort the exploded rows, then aggregate them back into arrays. Put the sort in the subquery, before collect_list, so that the rows reach collect_list already ordered by the col2 value:

    with your_data as (
      select array(1,2,3,4,5) as col1, array(0.43,0.01,0.45,0.22,0.001) as col2
    )

    select original_col1, original_col2, collect_list(c1_x) as new_col1, collect_list(c2_x) as new_col2
    from
    (
      -- explode both arrays with their positions and pair the elements that share an index
      select d.col1 as original_col1, d.col2 as original_col2, c1.x as c1_x, c2.x as c2_x, c1.i as c1_i
        from your_data d
             lateral view posexplode(col1) c1 as i, x
             lateral view posexplode(col2) c2 as i, x
       where c1.i = c2.i
      -- send all rows of one original array pair to the same reducer, ordered by the col2 value
      distribute by original_col1, original_col2
      sort by c2_x
    ) s
    group by original_col1, original_col2;
    

    Result:

    OK
    original_col1   original_col2                   new_col1        new_col2
    [1,2,3,4,5]     [0.43,0.01,0.45,0.22,0.001]     [5,2,4,1,3]     [0.001,0.01,0.22,0.43,0.45]
    Time taken: 34.642 seconds, Fetched: 1 row(s)
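
For readers who want to check the logic outside Hive, here is a minimal Python sketch of what the query above computes. It is only an illustration, not part of the Hive solution: the function name `sort_pair_by_second` is hypothetical, and the sketch assumes both arrays have the same length.

```python
def sort_pair_by_second(col1, col2):
    """Reorder col1 and col2 together so that col2 ends up ascending."""
    # Assumption for this sketch: both arrays have the same length.
    if len(col1) != len(col2):
        raise ValueError("col1 and col2 must have the same length")
    # Pair elements by position (the two posexplode calls joined on c1.i = c2.i).
    pairs = list(zip(col1, col2))
    # Order the pairs by the col2 value (the "sort by c2_x" step).
    pairs.sort(key=lambda pair: pair[1])
    # Gather each side back into a list (the two collect_list calls).
    new_col1 = [first for first, _ in pairs]
    new_col2 = [second for _, second in pairs]
    return new_col1, new_col2


new_col1, new_col2 = sort_pair_by_second(
    [1, 2, 3, 4, 5], [0.43, 0.01, 0.45, 0.22, 0.001]
)
print(new_col1)  # [5, 2, 4, 1, 3]
print(new_col2)  # [0.001, 0.01, 0.22, 0.43, 0.45]
```

The printed lists match new_col1 and new_col2 in the Hive result above.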
    

    Edit: A simplified version of the same script. You can do without the second posexplode by referencing the col2 element directly by position: d.col2[c1.i] as c2_x

    with your_data as (
      select array(1,2,3,4,5) as col1, array(0.43,0.01,0.45,0.22,0.001) as col2
    )

    select original_col1, original_col2, collect_list(c1_x) as new_col1, collect_list(c2_x) as new_col2
    from
    (
      -- explode col1 only; the matching col2 element is read by position with d.col2[c1.i]
      select d.col1 as original_col1, d.col2 as original_col2, c1.x as c1_x, d.col2[c1.i] as c2_x, c1.i as c1_i
        from your_data d
             lateral view posexplode(col1) c1 as i, x
      distribute by original_col1, original_col2
      sort by c2_x
    ) s
    group by original_col1, original_col2;
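
The simplified variant can be sketched the same way in Python, purely as an illustration: walk col1 once with its positions and look up the col2 element by index instead of exploding col2. The function name `sort_by_position_lookup` is hypothetical, and the sketch assumes col2 has at least as many elements as col1.

```python
def sort_by_position_lookup(col1, col2):
    """Reorder both arrays by col2, reading col2 by index instead of exploding it."""
    # Assumption for this sketch: every position in col1 also exists in col2.
    if len(col2) < len(col1):
        raise ValueError("col2 must have at least as many elements as col1")
    # One pass over col1 with positions (the single posexplode),
    # fetching the partner value by index (d.col2[c1.i]).
    rows = [(x, col2[i]) for i, x in enumerate(col1)]
    # Order the rows by the looked-up col2 value (the "sort by c2_x" step).
    rows.sort(key=lambda row: row[1])
    # Gather both sides back into lists (the two collect_list calls).
    return [x for x, _ in rows], [y for _, y in rows]


print(sort_by_position_lookup([1, 2, 3, 4, 5], [0.43, 0.01, 0.45, 0.22, 0.001]))
# ([5, 2, 4, 1, 3], [0.001, 0.01, 0.22, 0.43, 0.45])
```

For the sample data this gives the same result as pairing two exploded arrays on their positions; only the way the col2 value is fetched differs.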
    