Select a random sample of results from a query result

Submitted by 徘徊边缘 on 2019-11-26 15:16:50

Question


This question asks about getting a random(ish) sample of records on SQL Server and the answer was to use TABLESAMPLE. Is there an equivalent in Oracle 10?

If there isn't, is there a standard way to get a random sample of results from a query? For example, how can one get 1,000 random rows from a query that would normally return millions?


Answer 1:


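SELECT  *
FROM    (
        -- sort the whole table in random order
        SELECT  *
        FROM    mytable
        ORDER BY
                dbms_random.value
        )
-- then keep only the first 1000 rows of that random ordering
WHERE rownum <= 1000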



Answer 2:


The SAMPLE clause returns a random sample of approximately the given percentage of the rows in a table.

For example, here we obtain 25% of the rows:

SELECT * FROM emp SAMPLE(25)

The following SQL uses an analytic function to take a random sample of a fixed number of rows for each distinct value of a column (the grouping is similar to a GROUP BY).

Here we sample 10 of each:

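SELECT * FROM (
    SELECT job, sal,
           ROW_NUMBER() OVER (
               -- number the rows of each job in random order;
               -- ordering by job itself would not randomize anything
               PARTITION BY job ORDER BY dbms_random.value
           ) SampleCount
    FROM emp
)
WHERE SampleCount <= 10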
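
To make the logic of the partitioned query concrete, here is a minimal Python sketch of the same idea: put the rows of each group in random order, number them, and keep those numbered at most k. The function name `sample_per_group` and the toy `emp` rows are made up for illustration; this only models what the SQL is meant to do and is not a replacement for it.

```python
import random
from collections import defaultdict

def sample_per_group(rows, key, k, rng=None):
    """Keep at most k randomly chosen rows for each distinct value of key."""
    rng = rng or random.Random()
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)      # PARTITION BY key
    result = []
    for members in groups.values():
        rng.shuffle(members)              # ORDER BY a random value
        result.extend(members[:k])        # row number <= k
    return result

# Hypothetical data: three jobs with 5, 2 and 1 employees.
emp = (
    [{"job": "CLERK", "sal": 1000 + i} for i in range(5)]
    + [{"job": "ANALYST", "sal": 3000 + i} for i in range(2)]
    + [{"job": "PRESIDENT", "sal": 5000}]
)
picked = sample_per_group(emp, "job", 2, random.Random(42))
```

Groups with fewer than k rows are returned whole, which matches the behaviour of the `SampleCount <= 10` filter.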



Answer 3:


This is not a perfect answer, but it will get much better performance.

SELECT  *
FROM    (
    SELECT  *
    FROM    mytable sample (0.01)
    ORDER BY
            dbms_random.value
    )
WHERE rownum <= 1000

SAMPLE gives you a percentage of your actual table, so if you really want 1000 rows you need to adjust that number to the size of the table. More often I just need an arbitrary number of rows anyway, so I don't limit my results. On my database with 2 million rows I get 2 seconds versus 60 seconds.

select * from mytable sample (0.01)
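
As a rough guide to adjusting that number: the expected row count is the table size times the percentage, so SAMPLE(0.01) on 2 million rows yields only about 200 rows on average. The helper below is a hypothetical sketch (the name `sample_percent` and the `oversample` parameter are not part of Oracle) that computes the percentage for a target row count. The clamp assumes Oracle's documented range for the sample percentage of 0.000001 up to, but not including, 100.

```python
def sample_percent(target_rows, total_rows, oversample=1.0):
    """Percentage for SAMPLE(...) so that the expected number of rows
    is target_rows * oversample."""
    if target_rows <= 0 or total_rows <= 0:
        raise ValueError("target_rows and total_rows must be positive")
    percent = 100.0 * target_rows * oversample / total_rows
    # Assumed valid range for the SAMPLE percentage: [0.000001, 100).
    return min(max(percent, 0.000001), 99.999999)

# About 1000 rows from 2 million: SAMPLE(0.05).
exact = sample_percent(1000, 2_000_000)
# Oversample by 50% so that rownum <= 1000 is very likely to be filled.
padded = sample_percent(1000, 2_000_000, oversample=1.5)
```

Because the sample size is itself random, oversampling a little and then trimming with `rownum <= 1000` is one way to get exactly 1000 rows most of the time.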



Answer 4:


SELECT * FROM TABLE_NAME SAMPLE(1)

This will give you only approximately 1% of the rows rather than exactly 1/100 of the number of observations. The likely reason is that Oracle generates a random flag for each observation that decides whether to include it in the sample. In such a process, the argument 1 (1%) plays the role of the probability that each observation is selected into the sample.

If this is true, the actual distribution of sample sizes will be binomial.
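
A small simulation illustrates the point. It is only a sketch of the hypothesised mechanism (each row kept independently with probability percent/100), not of Oracle's actual implementation, and the function name `bernoulli_sample_size` is made up for the example.

```python
import random

def bernoulli_sample_size(total_rows, percent, rng):
    """Count the rows kept when each row is kept independently
    with probability percent / 100."""
    p = percent / 100.0
    return sum(1 for _ in range(total_rows) if rng.random() < p)

rng = random.Random(0)
# Sample 1% of a 10,000-row table 200 times and record the sample sizes.
sizes = [bernoulli_sample_size(10_000, 1, rng) for _ in range(200)]
mean_size = sum(sizes) / len(sizes)
print(min(sizes), mean_size, max(sizes))
```

Under this model the sizes scatter around n * p = 100 with a standard deviation of sqrt(n * p * (1 - p)), roughly 10, which is why repeated runs of the same SAMPLE query return different row counts.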




Answer 5:


The SAMPLE clause is used to sample data in Oracle. You can try it like this:

SELECT * FROM TABLE_NAME SAMPLE(50);

Here 50 is the percentage of the table's rows to return. So if you want about 1000 rows from 100000, you can execute a query like:

SELECT * FROM TABLE_NAME SAMPLE(1);

Hope this can help you.




Answer 6:


I know this has already been answered, but seeing so many visits here, I'd like to add a version that uses the SAMPLE clause but still lets you filter the rows first:

with cte1 as (
    select *
    from t_your_table
    where your_column = 'ABC'
)
select * from cte1 sample (5)

Note, however, that the base select needs a ROWID column, which means it may not work for some views, for example.




Answer 7:


Something like this should work:

SELECT * 
FROM table_name
WHERE primary_key IN (SELECT primary_key 
                      FROM
                      (
                        SELECT primary_key, SYS.DBMS_RANDOM.RANDOM 
                        FROM table_name 
                        ORDER BY 2
                      )
                      WHERE rownum <= 10 );



Answer 8:


We were given an assignment to select only two records from the list of agents, i.e. 2 random records for each agent over the span of a week, etc. Below is what we came up with, and it works:

with summary as (
    -- Number the rows of each col2 group (each agent) in random order.
    Select col2,
           col4,
           col5,
           Row_Number() Over(Partition By col2 Order By Dbms_Random.Random) As rn
      From table1, table2
     Where table1.Id = table2.Id
)
-- Keep two random rows per group.
Select s.col2,
       s.col4,
       s.col5
  From summary s
 Where s.rn <= 2;


Source: https://stackoverflow.com/questions/733652/select-a-random-sample-of-results-from-a-query-result
