Matched-size random samples from a Hive table

情深已故 2021-01-15 04:19

I have a Hive table, activity, with columns userid, itemid, and rating, where rating is either 1 or 0, in which there ar

3 Answers
  •  有刺的猬
    2021-01-15 04:46

    If you know in advance that the negatives (rating = 0) are the limiting factor, first get their exact count with a separate count query (call it N). You can then draw the entire sample in one query, with N hardcoded:

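    -- N = row count of the smaller class (rating = 0 here), found beforehand with
    --   select count(*) from activity where rating = 0;
    -- Hive's LIMIT needs a literal, so substitute the actual number for N.
    select * from
    (
      select * from
        (select * from activity where rating = 1 order by rand() limit N) positives
      union all
      select * from activity where rating = 0
    ) all_sample
    order by rand();  -- all_sample already has exactly 2N rows; this only shuffles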
    

    The final order by rand() only shuffles the combined rows, so you can drop it if the order of the sample does not matter to you.
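    To check the logic end to end without a Hive cluster, here is a minimal Python sketch that runs the same pattern against an in-memory SQLite table. It assumes SQLite's RANDOM() is an acceptable stand-in for Hive's rand(), and it generalizes the answer slightly by picking whichever rating class is smaller instead of assuming the negatives are. The function name matched_sample and the toy data are hypothetical.

```python
import sqlite3


def matched_sample(conn: sqlite3.Connection) -> list[tuple[int, int, int]]:
    """Return every row of the smaller rating class plus an equally sized
    random sample of the larger class, shuffled together."""
    cur = conn.cursor()
    # Step 1: count each class to find the limiting (smaller) one.
    counts = dict(cur.execute(
        "SELECT rating, COUNT(*) FROM activity GROUP BY rating").fetchall())
    minority = min(counts, key=counts.get)  # class with fewer rows
    majority = 1 - minority                 # ratings are only 0 or 1
    n = counts[minority]
    # Step 2: N random rows of the larger class UNION ALL every row of the
    # smaller class, then shuffle. RANDOM() stands in for Hive's rand().
    query = """
        SELECT * FROM (
            SELECT * FROM (
                SELECT userid, itemid, rating FROM activity
                WHERE rating = ? ORDER BY RANDOM() LIMIT ?
            )
            UNION ALL
            SELECT userid, itemid, rating FROM activity WHERE rating = ?
        ) ORDER BY RANDOM()
    """
    return cur.execute(query, (majority, n, minority)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity (userid INTEGER, itemid INTEGER, rating INTEGER)")
    # Hypothetical toy data: 90 positives (rating 1), 10 negatives (rating 0).
    rows = [(u, u % 7, 1 if u < 90 else 0) for u in range(100)]
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
    sample = matched_sample(conn)
    print(len(sample), sum(r[2] for r in sample))  # → 20 10
```

    Because SQLite accepts a bound parameter in LIMIT, N can be passed in directly here; in Hive it has to be hardcoded, or substituted into the query text before it runs.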
