Taking Sample in SQL Query

Question

I'm working on a problem that looks like this:

I have a table with many columns, but the main ones are the department ID and the employee ID:

Employee Ids    Department Ids
------------------------------
A                   1
B                   1
C                   1
D                   1
AA                  2
BB                  2
CC                  2
A1                  3
B1                  3
C1                  3
D1                  3

I want to write a SQL query that returns 2 sample employee IDs for each department ID, for example:

Employee Id  Dept Ids
B              1
C              1
AA             2
CC             2
D1             3
A1             3

Currently I am using this query:

select
   EmployeeId, DeptIds, count(*)
from 
   table_name
group by 1,2
sample 2

but it returns only two rows in total, not two per department.

Any help?

Answer 1:

If the number of departments is known and small, you could do stratified sampling:

select *
from table_name
sample
   when DeptIds = 1 then 2
   when DeptIds = 2 then 2
   when DeptIds = 3 then 2
end

Otherwise, use a combination of RANDOM and ROW_NUMBER, keeping the first two rows of each department after ordering by a random value:

select *
from
 (
   select EmployeeId, DeptIds, random(1,10000000) as rand
   from table_name
 ) as dt
qualify
   row_number()
   over (partition by DeptIds
         order by rand) <= 2
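
QUALIFY is not part of standard SQL and is not available in every dialect. As a rough sketch of the same idea without it, the row number can be computed in a derived table and filtered in an outer WHERE. The aliases rnd, rn and ranked are names chosen here for illustration, and random(1,10000000) is still the Teradata function, so another database would need its own random function in that spot:

```sql
-- Sketch: same per-department sampling, with an outer WHERE instead of QUALIFY.
select EmployeeId, DeptIds
from
 (
   select EmployeeId, DeptIds,
          -- number the rows of each department in random order
          row_number() over (partition by DeptIds order by rnd) as rn
   from
    (
      -- assign one random value per row (Teradata-specific function)
      select EmployeeId, DeptIds, random(1,10000000) as rnd
      from table_name
    ) as dt
 ) as ranked
where rn <= 2
```

Changing the 2 in the final condition changes the sample size per department; a department with fewer rows than that simply returns all of its rows.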

Source: https://stackoverflow.com/questions/26894976/taking-sample-in-sql-query

Tags

sql

teradata

sample

random-sample
