I have a table with about 1000 rows. I have to update a column ("X") in the table to 'Y' for n random rows. For this I can use the following query
I would use the ROWID:
UPDATE xyz SET x='Y' WHERE rowid IN (
    SELECT r FROM (
        SELECT ROWID r FROM xyz ORDER BY dbms_random.value
    ) RNDM WHERE rownum < n+1
)
The actual reason I would use ROWID isn't for efficiency though (it will still do a full table scan) - your SQL may not update the number of rows you want if column m isn't unique.
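To make the non-unique case concrete, here is a small sketch. The value-based update below is only an assumed shape for a query that matches on column m, and dup_demo is a made-up table for illustration.

```sql
-- Hypothetical demo table: every value of m appears exactly twice.
CREATE TABLE dup_demo (m NUMBER, x VARCHAR2(1));
INSERT INTO dup_demo
SELECT MOD(level, 5), 'N' FROM dual CONNECT BY level <= 10;
COMMIT;

-- Value-based update (assumed shape): picks 3 random rows, then matches on m.
UPDATE dup_demo SET x='Y' WHERE m IN (
    SELECT m FROM (
        SELECT m FROM dup_demo ORDER BY dbms_random.value
    ) WHERE rownum < 3+1
);
-- Each picked value of m matches two rows, so this should update 4 or 6 rows, not 3.
ROLLBACK;

-- ROWID-based update: matches exactly the rows that were picked.
UPDATE dup_demo SET x='Y' WHERE rowid IN (
    SELECT r FROM (
        SELECT ROWID r FROM dup_demo ORDER BY dbms_random.value
    ) WHERE rownum < 3+1
);
-- Should update exactly 3 rows.
ROLLBACK;
```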
With only 1000 rows, you shouldn't really be worried about efficiency (maybe with a hundred million rows). Without any index on this table, you're stuck doing a full table scan to select random records.
[EDIT:] "But what if there are 100,000 rows"
Well, that's still 3 orders of magnitude less than 100 million.
I ran the following:
create table xyz as select * from all_objects;
[created about 50,000 rows on my system - non-indexed, just like your table]
UPDATE xyz SET owner='Y' WHERE rowid IN (
    SELECT r FROM (
        SELECT ROWID r FROM xyz ORDER BY dbms_random.value
    ) RNDM WHERE rownum < 10000
)
This took approximately 1.5 seconds. Maybe it was 1 second, maybe up to 3 seconds (didn't formally time it, it just took about enough time to blink).
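If you want a number rather than a guess, one way is to let the client report elapsed time. This is a minimal sketch assuming you run it in SQL*Plus (or a compatible client) against the xyz test table created above; the ROLLBACK is only there so the test can be repeated.

```sql
-- SQL*Plus setting: print elapsed time after each statement.
SET TIMING ON

UPDATE xyz SET owner='Y' WHERE rowid IN (
    SELECT r FROM (
        SELECT ROWID r FROM xyz ORDER BY dbms_random.value
    ) RNDM WHERE rownum < 10000
);

-- Undo the test update so the table is unchanged for the next run.
ROLLBACK;
```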
You can improve performance by replacing the full table scan with a sample.
The first problem you run into is that you can't use SAMPLE in a DML subquery: ORA-30560: SAMPLE clause not allowed. But logically this is what is needed:
UPDATE xyz SET x='Y' WHERE rowid IN (
    SELECT r FROM (
        SELECT ROWID r FROM xyz sample(0.15) ORDER BY dbms_random.value
    ) RNDM WHERE rownum < 100/*n*/+1
)
You can get around this by using a collection to store the rowids, and then update the rows using the rowid collection. Normally breaking a query into separate parts and gluing them together with PL/SQL leads to horrible performance. But in this case you can still save a lot of time by significantly reducing the amount of data read.
declare
    type rowid_nt is table of rowid;
    rowids rowid_nt;
begin
    --Get the rowids
    SELECT r bulk collect into rowids
    FROM (
        SELECT ROWID r
        FROM xyz sample(0.15)
        ORDER BY dbms_random.value
    ) RNDM WHERE rownum < 100/*n*/+1;

    --update the table
    forall i in 1 .. rowids.count
        update xyz set x = 'Y'
        where rowid = rowids(i);
end;
/
I ran a simple test with 100,000 rows (on a table with only two columns), and N = 100. The original version took 0.85 seconds, @Gerrat's answer took 0.7 seconds, and the PL/SQL version took 0.015 seconds.
But that's only one scenario; I don't have enough information to say my answer will always be better. As N increases the sampling advantage is lost, and the writing will be more significant than the reading. If you have a very small amount of data, the PL/SQL context switching overhead in my answer may make it slower than @Gerrat's solution.
For performance issues, the size of the table in bytes is usually much more important than the size in rows. 1000 rows that use a terabyte of space are much larger than 100 million rows that only use a gigabyte.
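To check the size in bytes rather than rows, something like the following should work. It is a rough sketch assuming the table is named XYZ and lives in your own schema; it reports space allocated to the segment, which can be more than the space the rows actually use.

```sql
-- Allocated size of the table segment, in megabytes.
SELECT segment_name,
       ROUND(bytes / 1024 / 1024, 1) AS size_mb
FROM user_segments
WHERE segment_name = 'XYZ';
```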
Here are some problems to consider with my answer:
change, you'll need to use dynamic SQL to change the percent.
The following solution works just fine. It's performant and seems to be similar to sample()
create table t1 as
select level id, cast ('item'||level as varchar2(32)) item
from dual connect by level<=100000;
Table T1 created.
update t1 set item='*'||item
where exists (
    select rnd from (
        select dbms_random.value() rnd
        from t1
    ) t2 where t2.rowid = t1.rowid and rnd < 0.15
);
14,858 rows updated.
Elapsed: 00:00:00.717
Note that the alias rnd must be included in the select clause. Otherwise the optimizer rewrites the filter predicate (RND<0.1), and in that case dbms_random.value will be executed only once.
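One way to see which filter predicate the optimizer actually uses is to look at the execution plan. This is a sketch, assuming the t1 table from above and that you have access to a plan table; the "Predicate Information" section of the output is the part to read.

```sql
-- Generate the plan for the update without running it.
EXPLAIN PLAN FOR
update t1 set item='*'||item
where exists (
    select rnd from (
        select dbms_random.value() rnd
        from t1
    ) t2 where t2.rowid = t1.rowid and rnd < 0.15
);

-- Show the plan, including the filter predicates.
SELECT * FROM TABLE(dbms_xplan.display);
```

Repeating this with and without rnd in the select list lets you compare the two filter predicates directly.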
As mentioned in @JonHeller's answer, the best solution remains the PL/SQL code block, because it avoids the full table scan. Here is my suggestion:
create or replace type rowidListType is table of varchar(18);
/
create or replace procedure updateRandomly (prefix varchar2 := '*') is
    rowidList rowidListType;
begin
    -- collect the rowids of a ~15% sample
    select rowidtochar (rowid) bulk collect into rowidList
    from t1 sample(15);

    -- update only the sampled rows
    update t1 set item=prefix||item
    where exists (
        select 1 from table (rowidList) t2
        where chartorowid(t2.column_value) = t1.rowid
    );
    dbms_output.put_line ('updated '||sql%rowcount||' rows.');
end;
/
begin updateRandomly; end;
/
Elapsed: 00:00:00.293
updated 14892 rows.
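Since sampling only returns approximately the requested percentage, it can be worth checking how many rows were actually touched. This is a small sketch assuming the t1 table above, where updated rows are the ones whose item now starts with the '*' prefix; if you ran more than one of the updates, the count covers all of them together.

```sql
-- Count rows carrying the '*' prefix next to the total row count.
select count(case when item like '*%' then 1 end) as updated_rows,
       count(*) as total_rows
from t1;
```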