How to get array/bag of elements from Hive group by operator?

遥遥无期 2021-01-01 12:35

I want to group by a given field and get the grouped values of another field in the output. Below is an example of what I am trying to achieve:

Imagine a table named 'sample_table'

2 Answers
  • 2021-01-01 13:26

    collect_set actually works as expected, since a set is by definition a collection of well-defined, distinct objects: each object occurs exactly once or not at all within a set.

  • 2021-01-01 13:36

    The built-in aggregate function collect_set (documented here) gets you almost what you want. It would actually work on your example input:

    SELECT F1, collect_set(F2)
    FROM sample_table
    GROUP BY F1
    

    Unfortunately, it also removes duplicate elements, and I imagine this isn't your desired behavior. I find it odd that collect_set exists but there is no version that keeps duplicates. Someone else apparently thought the same thing. It looks like the top two answers there will give you the UDAF you need.
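
    For what it's worth, later Hive releases (0.13.0 and up, as far as I know) ship a built-in collect_list that keeps duplicates, so a custom UDAF may not be needed there. A minimal sketch, assuming the same sample_table with columns F1 and F2 and a Hive version that has collect_list (the alias F2_values is made up for illustration):

    ```sql
    -- Assumes Hive 0.13.0+ where collect_list is available.
    -- collect_list keeps duplicate values; collect_set would drop them.
    SELECT F1, collect_list(F2) AS F2_values
    FROM sample_table
    GROUP BY F1;
    ```

    Neither function guarantees the order of elements in the resulting array, so don't rely on it matching the row order of the input.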
