Matched size random samples from hive table

情深已故 2021-01-15 04:19

I have a Hive table, activity, with columns userid, itemid, and rating, with possible ratings of 1 and 0, in which there ar

3 Answers
  •  野的像风
    2021-01-15 05:01

    If there are many classes (distinct rating values), you can use the following query to sample every class at once instead of writing one query per class:

    select userid, itemid, rating from
        (select userid, itemid, rating,
                -- random rank of each row within its rating class
                row_number() over (partition by rating order by rand()) as rn
         from activity
        ) a
    where rn <= x;
    

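    Replace x with the number of rows you want from each class. A class with fewer than x rows returns all of its rows, so for equal-sized samples x should be no larger than the size of the smallest class.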
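    Since the goal is matched sizes, x is usually the size of the smallest class. The query below is a sketch that computes that value inline instead of hard-coding it. It assumes a Hive version with windowing functions (0.11 or later) and the activity(userid, itemid, rating) schema from the question; the aliases ranked, sized, class_size and min_size are illustrative names, and it has not been tested against a specific cluster.

    ```sql
    -- Sketch: every rating class contributes as many rows as the smallest class.
    -- Assumes the activity(userid, itemid, rating) table from the question.
    select userid, itemid, rating
    from (
        select userid, itemid, rating, rn,
               -- size of the smallest class, repeated on every row
               min(class_size) over () as min_size
        from (
            select userid, itemid, rating,
                   -- random rank of each row within its rating class
                   row_number() over (partition by rating order by rand()) as rn,
                   -- number of rows in this row's rating class
                   count(*) over (partition by rating) as class_size
            from activity
        ) ranked
    ) sized
    where rn <= min_size;
    ```

    Every class has at least min_size rows, so each class ends up with exactly min_size rows in the result. Because rand() is unseeded, each run draws a different sample.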
