I want a random selection of rows in PostgreSQL. I tried this:
select * from table where random() < 0.01;
But some others recommend this:
select * from table order by random() limit 1000;
The one with the ORDER BY is going to be the slower one.
select * from table where random() < 0.01;
goes record by record and randomly decides whether to keep it or filter it out. This is going to be O(N) because it only needs to check each record once.
select * from table order by random() limit 1000;
is going to sort the entire table, then pick the first 1000. Aside from any voodoo magic behind the scenes, the order by is O(N * log N).
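To make the cost difference concrete, here is a rough Python sketch of what "order by random() limit k" amounts to. It is only an illustration, not how PostgreSQL executes the query; the function names and the in-memory list standing in for the table are my own assumptions. The second function shows the kind of "voodoo" a planner could apply when a limit is present: keep only the k smallest random keys in a heap instead of sorting everything.

```python
import heapq
import random

def sample_by_sort(rows, k, rng):
    # Give every row a random key, sort all keys, keep the first k: O(N log N).
    keyed = [(rng.random(), i) for i in range(len(rows))]
    keyed.sort()
    return [rows[i] for _, i in keyed[:k]]

def sample_by_heap(rows, k, rng):
    # Same random keys, but only the k smallest are tracked: O(N log k).
    keyed = ((rng.random(), i) for i in range(len(rows)))
    return [rows[i] for _, i in heapq.nsmallest(k, keyed)]

rows = list(range(10_000))          # hypothetical stand-in for the table
a = sample_by_sort(rows, 5, random.Random(1))
b = sample_by_heap(rows, 5, random.Random(1))
print(a == b)                       # same seed, same keys, same 5 rows
```

Either way every row still needs a random key, so the work never drops below O(N); the heap variant only removes the full sort.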
The downside to the random() < 0.01 one is that you'll get a variable number of output records.
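A quick way to see the variable row count is to simulate the filter. This is a sketch in Python rather than SQL, and the table size of 100,000 rows is an assumption picked for illustration. Each row is kept independently with probability p, so the number of rows returned follows a binomial distribution: about N * p on average, with a standard deviation of sqrt(N * p * (1 - p)), which is roughly 31 here.

```python
import random

def bernoulli_sample(rows, p, rng):
    # One pass over the data: keep each row independently with probability p.
    return [row for row in rows if rng.random() < p]

rows = range(100_000)               # hypothetical table of 100,000 rows
rng = random.Random(42)

# Run the same "query" five times and record how many rows come back.
sizes = [len(bernoulli_sample(rows, 0.01, rng)) for _ in range(5)]
print(sizes)                        # counts near 1000, typically all different
```

If you need exactly 1000 rows, one option is to oversample a little and trim the result, at the cost of a small chance of coming up short.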
Note, there is a better way to shuffle a set of data than sorting by random: the Fisher-Yates shuffle, which runs in O(N). Implementing the shuffle in SQL sounds like quite the challenge, though.
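For reference, outside of SQL the shuffle is only a few lines. The sketch below is in Python and assumes the data fits in memory as a list; the function names are mine. The second function is a partial Fisher-Yates: if you only want k random rows, you can stop after settling the first k positions, so it does k swaps after the initial copy.

```python
import random

def fisher_yates_shuffle(items, rng):
    # Shuffle in place: walk from the end, swapping each position
    # with a uniformly chosen position at or before it. O(N) swaps.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)    # 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items

def sample_k(items, k, rng):
    # Partial shuffle: only the first k positions need to be settled.
    pool = list(items)
    n = len(pool)
    for i in range(min(k, n)):
        j = rng.randrange(i, n)     # i <= j < n
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]

rng = random.Random(7)
print(fisher_yates_shuffle(list(range(10)), rng))
print(sample_k(range(1_000), 3, rng))
```

Unlike the random() < 0.01 filter, sample_k always returns exactly k rows (or all of them if there are fewer than k).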