I need a function to select 88 random rows from a table (without duplicates)

Posted by 三世轮回 on 2019-12-13 17:24:54

Question


I have a table with a few columns (id, description, created (timestamp) and ipaddress). I have inserted 200 rows as dummy data. I need a way to pull 88 random rows with no duplicates from that table.

I have tried this:

create or replace function GetRandomCrazy88() returns setof varchar(255) as
'
    select description
    from task
             left join tagassignment t on task.id = t.taskid
    order by random()
    limit 88;
' language 'sql';

But this returns duplicate rows.

I also tried this (it got a bit out of hand):

CREATE OR REPLACE FUNCTION GetRandomCrazy88(amount INTEGER)
    RETURNS SETOF VARCHAR(255) AS
$$
DECLARE

    tasklist INTEGER[] := '{}'::INTEGER[];

    randomid INTEGER;
    counter INTEGER := 0;

BEGIN
    WHILE counter <= amount LOOP

        SELECT CASE WHEN id = 0 THEN 1 ELSE id END INTO randomid
        FROM ROUND(RANDOM() * (SELECT COUNT(*) - 1 FROM task)) AS id;

        IF randomid = ANY(tasklist) OR ARRAY_LENGTH(tasklist, 1) IS NULL THEN
            tasklist = array_append(tasklist, randomid);
            counter := counter + 1;
        ELSE
            RAISE NOTICE 'DUPLICATE ID!!!';
        END IF;
    END LOOP;

    RETURN QUERY SELECT description
    FROM task t
    WHERE t.id = ANY(tasklist);

END;
$$ LANGUAGE plpgsql
    SECURITY DEFINER;

It fails in the WHILE loop: it never collects the desired 88 IDs. As far as I can tell, nothing gets added to the array in the IF statement, because the array starts out empty and its length is NULL.
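
On a closer look, the IF condition appears to be inverted: apart from the first pass, an ID is appended only when it is already in the array, so unique IDs are the ones rejected as duplicates. The sketch below shows one way the loop could be corrected. It assumes that the id values in task run contiguously from 1 to COUNT(*), and the function name random_task_ids is hypothetical.

```sql
-- Sketch only: random_task_ids is a hypothetical name.
-- Assumes task.id values are contiguous from 1 to COUNT(*).
CREATE OR REPLACE FUNCTION random_task_ids(amount INTEGER)
    RETURNS INTEGER[] AS
$$
DECLARE
    tasklist INTEGER[] := '{}'::INTEGER[];
    randomid INTEGER;
    total    INTEGER;
BEGIN
    SELECT COUNT(*) INTO total FROM task;

    -- Without this guard the loop could never finish.
    IF amount > total THEN
        RAISE EXCEPTION 'requested % ids but task has only % rows', amount, total;
    END IF;

    -- cardinality() is 0 for an empty array, whereas ARRAY_LENGTH(..., 1) is NULL.
    WHILE cardinality(tasklist) < amount LOOP
        -- Uniform integer in 1..total.
        randomid := (1 + FLOOR(RANDOM() * total))::INTEGER;

        -- Append only ids that are NOT in the array yet.
        IF NOT (randomid = ANY(tasklist)) THEN
            tasklist := array_append(tasklist, randomid);
        END IF;
    END LOOP;

    RETURN tasklist;
END;
$$ LANGUAGE plpgsql;

-- Call the function once in FROM, then join back to the table.
SELECT t.description
FROM task t
JOIN unnest(random_task_ids(88)) AS r(id) ON t.id = r.id;
```

The loop condition uses < rather than <=, since counting from 0 with <= would collect one ID more than requested.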

Is there any way I can get exactly 88 random rows, without any duplicates?


Answer 1:


Here's a quick solution that you might like:

CREATE EXTENSION IF NOT EXISTS tsm_system_rows;

select * from task
tablesample system_rows (88);

For reference, TABLESAMPLE is in the docs for SELECT: https://www.postgresql.org/docs/current/sql-select.html

Here's quite a good write-up of the feature:

https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/

...and another piece on the general subject of random sampling by the same author:

https://www.2ndquadrant.com/en/blog/tablesample-and-other-methods-for-getting-random-tuples/

tsm_system_rows is one of two standard sampling extensions, documented here: https://www.postgresql.org/docs/current/tsm-system-rows.html

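Hey! I'm glad you asked this question. I tend to use the BERNOULLI method, which is built into SELECT out of the box, but it takes a percentage of the table rather than a row count. I just tried this out and it works fine, though note that BERNOULLI (1) samples only about 1% of the rows, so the LIMIT of 88 is reached only on a table with well over 8,800 rows; on the 200-row table from the question the percentage would have to be much higher: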

select * from task 
tablesample BERNOULLI (1)
limit 88
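
For a table as small as the 200-row one in the question, a plain ORDER BY random() without the join may already be enough. This is a different technique from TABLESAMPLE: it sorts the whole table, which is cheap at this size but slow on large tables. The sketch assumes the duplicates in the first attempt come from tasks that have more than one row in tagassignment; the function name get_random_tasks and the parameter amount are hypothetical.

```sql
-- Sketch only: get_random_tasks and amount are hypothetical names.
CREATE OR REPLACE FUNCTION get_random_tasks(amount INTEGER)
    RETURNS SETOF TEXT AS
$$
    -- No join here, so each task row can appear at most once.
    -- The cast keeps the result type in line with RETURNS SETOF TEXT
    -- whatever string type the description column actually has.
    SELECT description::TEXT
    FROM task
    ORDER BY random()
    LIMIT amount;
$$ LANGUAGE sql;

-- Returns 88 rows as long as task has at least 88 rows.
SELECT * FROM get_random_tasks(88);
```

Two tasks that happen to share the same description text would still show up as identical values; selecting the id alongside the description makes that visible.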


Source: https://stackoverflow.com/questions/57454021/i-need-a-function-to-select-88-random-rows-from-a-table-without-duplicates
