I have a Hive table activity with columns userid, itemid, and rating, with possible ratings of 1 and 0, in which there ar
If there are many classes, the following query samples from all of them at once, so you do not have to write a separate query for each class:
select * from
  (select userid, itemid, rating,
          row_number() over (partition by rating order by rand()) as rn
   from activity
  ) a
where rn <= x
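Here x is the number of rows you want from each class. Because the rows within each rating are numbered in random order, keeping rn <= x gives a random sample of at most x rows per class; a class with fewer than x rows is returned in full.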
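To try the idea without a Hive cluster, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. It is an illustration under assumptions, not Hive itself: SQLite (3.25 or newer, for window functions) uses random() where Hive uses rand(), and the helper name sample_per_class and the toy rows are made up for this example.

```python
import sqlite3

def sample_per_class(rows, x):
    """Return up to x randomly chosen rows for each rating value.

    Hypothetical helper: loads rows into a throwaway SQLite table named
    activity and applies the row_number() per-class sampling query.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table activity (userid integer, itemid integer, rating integer)"
    )
    conn.executemany("insert into activity values (?, ?, ?)", rows)
    query = """
        select userid, itemid, rating from
          (select userid, itemid, rating,
                  -- SQLite's random() stands in for Hive's rand()
                  row_number() over (partition by rating order by random()) as rn
           from activity
          ) a
        where rn <= ?
    """
    result = conn.execute(query, (x,)).fetchall()
    conn.close()
    return result

# Toy data: 8 rows with rating 0 and 2 rows with rating 1.
rows = [(u, u * 10, 0) for u in range(8)] + [(100, 1, 1), (101, 2, 1)]
sample = sample_per_class(rows, 3)

# Count how many sampled rows each class contributed.
counts = {}
for _, _, rating in sample:
    counts[rating] = counts.get(rating, 0) + 1
print(counts)
```

With x = 3, rating 0 contributes three rows and rating 1 contributes only its two available rows, which shows that x is a cap per class rather than a guaranteed count.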