randomizing large dataset

失恋的感觉 2021-01-15 14:58

I am trying to find a way to get a random selection from a large dataset.

We expect the set to grow to ~500K records, so it is important to find a way that keeps per

3 answers
  •  北恋 (original poster)
    2021-01-15 15:24

    You can do this efficiently, but you have to do it in two queries.

    First get a random offset scaled by the number of rows that match your 5% conditions:

    SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM MyTable WHERE ...conditions...))
    

    This returns an integer. Next, use the integer as an offset in a LIMIT expression:

    SELECT * FROM MyTable WHERE ...conditions... LIMIT 1 OFFSET ?
    

    Not every problem must be solved in a single SQL query.
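
As a rough illustration of the two-query approach above, here is a minimal Python sketch. It makes several assumptions that are not in the original answer: it uses the standard-library sqlite3 module as a stand-in for MySQL, the `id` and `active` columns and the `random_row` helper are hypothetical, and the random offset is drawn in application code with `random.randrange` rather than with `RAND()` in SQL. The `ORDER BY id` is also an addition, included only so the offset refers to a stable ordering.

```python
import random
import sqlite3


def random_row(conn, where_sql="1=1", params=()):
    """Return one random row matching the condition, or None if none match.

    where_sql is interpolated into the statement, so it must be trusted SQL;
    values belong in params.
    """
    # Query 1: count the rows that match the conditions.
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM MyTable WHERE {where_sql}", params
    ).fetchone()
    if count == 0:
        return None

    # Pick an integer offset in [0, count - 1].
    offset = random.randrange(count)

    # Query 2: fetch the single row at that offset.
    return conn.execute(
        f"SELECT * FROM MyTable WHERE {where_sql} ORDER BY id LIMIT 1 OFFSET ?",
        (*params, offset),
    ).fetchone()


# Hypothetical demo data: 1000 rows, half of them flagged active.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, active INTEGER)")
conn.executemany(
    "INSERT INTO MyTable (id, active) VALUES (?, ?)",
    [(i, i % 2) for i in range(1, 1001)],
)

print(random_row(conn, "active = ?", (1,)))
```

The offset is always strictly less than the row count, so the second query returns exactly one row whenever at least one row matches.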
