Question
I have a table with a few columns (id, description, created (timestamp) and ipaddress). I have inserted 200 rows as dummy data. I need a way to pull 88 random rows with no duplicates from that table.
I have tried this:
create or replace function GetRandomCrazy88() returns setof varchar(255) as
'
select description
from task
left join tagassignment t on task.id = t.taskid
order by random()
limit 88;
' language 'sql';
But this returns duplicate rows.
I also tried this (it got a bit out of hand):
CREATE OR REPLACE FUNCTION GetRandomCrazy88(amount INTEGER)
RETURNS SETOF VARCHAR(255) AS
$$
DECLARE
tasklist INTEGER[] := '{}'::INTEGER[];
randomid INTEGER;
counter INTEGER := 0;
BEGIN
WHILE counter <= amount LOOP
SELECT CASE WHEN id = 0 THEN 1 ELSE id END INTO randomid
FROM ROUND(RANDOM() * (SELECT COUNT(*) - 1 FROM task)) AS id;
IF randomid = ANY(tasklist) OR ARRAY_LENGTH(tasklist, 1) IS NULL THEN
tasklist = array_append(tasklist, randomid);
counter := counter + 1;
ELSE
RAISE NOTICE 'DUPLICATE ID!!!';
END IF;
END LOOP;
RETURN QUERY SELECT description
FROM task t
WHERE t.id = ANY(tasklist);
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER;
It fails in the while loop: it never reaches the desired 88 numbers, because the if-statement can't add anything to the array while the array is empty with a NULL value.
Is there any way I can get exactly 88 random rows, without any duplicates?
Answer 1:
Here's a quick solution that you might like:
CREATE EXTENSION IF NOT EXISTS tsm_system_rows;
select * from task
tablesample system_rows (88);
For reference, TABLESAMPLE is in the docs for SELECT: https://www.postgresql.org/docs/current/sql-select.html
Here's quite a good write-up of the feature:
https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/
...and another piece on the general subject of random sampling by the same author:
https://www.2ndquadrant.com/en/blog/tablesample-and-other-methods-for-getting-random-tuples/
tsm_system_rows is one of two standard sampling extensions, documented here: https://www.postgresql.org/docs/current/tsm-system-rows.html
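If the extension is not available, a plain ORDER BY random() also works on a table this small. The duplicates in the question's first attempt most likely come from the LEFT JOIN on tagassignment, which repeats a task once per matching tag row; sampling from task alone returns each row at most once. Here is a minimal sketch, assuming the tags are not needed in the result. The function name get_random_tasks and its amount parameter are hypothetical, chosen so they do not clash with the functions in the question:

```sql
-- Sketch only: get_random_tasks and amount are illustrative names.
-- Sampling from task alone means no join can multiply the rows.
CREATE OR REPLACE FUNCTION get_random_tasks(amount integer)
RETURNS SETOF varchar(255) AS
$$
    SELECT description
    FROM task
    ORDER BY random()  -- shuffle every row of the table
    LIMIT amount;      -- keep the first `amount` rows of the shuffle
$$ LANGUAGE sql;

-- Usage:
SELECT * FROM get_random_tasks(88);
```

This sorts the whole table on every call, which is fine for 200 rows but gets expensive on large tables. Two rows can still show the same description if the description column itself is not unique.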
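Hey! I'm glad you asked this question. I tend to use the BERNOULLI method, which is built into SELECT out of the box, but it's based on a percentage of the table rather than a row count, so the percentage has to be high enough to yield at least 88 rows (1 percent of a 200-row table is only about 2 rows). I just tried this out and it works fine: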
select * from task
tablesample BERNOULLI (1)
limit 88
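One thing to watch with the query above: BERNOULLI decides independently for each row whether to include it, so the sample size varies from run to run, and LIMIT then keeps whichever sampled rows are scanned first. A hedged variant is to oversample and shuffle before trimming. The 80 percent figure below is an assumption picked for the 200-row table in the question, not a recommended value:

```sql
-- Sketch only: 80 is an assumed percentage for a ~200-row table.
-- Oversample, shuffle the sample, then trim to 88 rows.
SELECT *
FROM task TABLESAMPLE BERNOULLI (80)
ORDER BY random()  -- avoid favouring rows that are scanned first
LIMIT 88;
```

Even then the sample can occasionally come up short of 88 rows, so check the row count if exactly 88 is a hard requirement.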
Source: https://stackoverflow.com/questions/57454021/i-need-a-function-to-select-88-random-rows-from-a-table-without-duplicates