问题
I'm working on a problem which is something like this :
I have a table with many columns but major are DepartmentId
and EmployeeIds
Employee Ids Department Ids
------------------------------
A 1
B 1
C 1
D 1
AA 2
BB 2
CC 2
A1 3
B1 3
C1 3
D1 3
I want to write a SQL query such that I take out 2 sample EmployeeIds
for each DepartmentID
.
like
Employee Id Dept Ids
B 1
C 1
AA 2
CC 2
D1 3
A1 3
Currently I am writing the query,
select
EmployeeId, DeptIds, count(*)
from
table_name
group by 1,2
sample 2
but it gives me total two rows.
Any help?
回答1:
If the number of departments i know and small you could do a stratified sampling:
select *
from table_name
sample
when DeptIds = 1 then 2
when DeptIds = 2 then 2
when DeptIds = 3 then 2
end
Otherwise a combination of RANDOM and ROW_NUMBER:
select *
from
(
sel EmployeeId, DeptIds, random(1,10000000) as rand
from table_name
) as dt
qualify
row_number()
over (partition by DeptIds
order by rand) <= 2
来源:https://stackoverflow.com/questions/26894976/taking-sample-in-sql-query