At work, I have a large table (some 3 million rows, like 40-50 columns). I sometimes need to empty some of the columns and fill them with new data. What I did not expect is
I would try what Tom Kyte suggested on large updates. When it comes to huge tables, it best to go like this : take a few rows, update them, take some more, update those etc. Don't try to issue an update on all the table. That's a killer move right from the start.
Basically create binary_integer indexed table, fetch 10 rows at a time, and update them.
Here is a piece of code that i have used of large tables with success. Because im lazy and its like 2AM now ill just copy paste it here and let you figure it out, but let me know if you need help :
DECLARE
TYPE BookingRecord IS RECORD (
bprice number,
bevent_id number,
book_id number
);
TYPE array is TABLE of BookingRecord index by binary_integer;
l_data array;
CURSOR c1 is
SELECT LVC_USD_PRICE_V2(ev.activity_version_id,ev.course_start_date,t.local_update_date,ev.currency,nvl(t.delegate_country,ev.sponsor_org_country),ev.price,ev.currency,t.ota_status,ev.location_type) x,
ev.title,
t.ota_booking_id
FROM ota_gsi_delegate_bookings_t@diseulprod t,
inted_parted_events_t@diseulprod ev
WHERE t.event_id = ev.event_id
and t.ota_booking_id =
BEGIN
open c1;
loop
fetch c1 bulk collect into l_data limit 20;
for i in 1..l_data.count
loop
update ou_inc_int_t_01
set price = l_data(i).bprice,
updated = 'Y'
where booking_id = l_data(i).book_id;
end loop;
exit when c1%notfound;
end loop;
close c1;
END;
what can also help speed up updates is to use alter table table1 nologging
so that the update won't generate redo logs. another possibility is to drop the column and re-add it. since this is a DDL operation it will generate neither redo nor undo.
Is column Y indexed? It could be that setting the column to null means Oracle has to delete from the index, rather than just update it. If that's the case, you could drop and rebuild it after updating the data.
EDIT:
Is it just column Y that exhibits the issue, or is it independent of the column being updated? Can you post the table definition, including constraints?
That's because it deletes from blocks that data.
And delete
is the hardest operation. If you can avoid a delete
, do it.
I recommend you to create another table with that column null(Create table as select
for example, or insert select
), and fill it(the column) with your procedure. Drop old table and then rename the new table with current name.
UPDATE:
Another important thing is that you should update the column as is, with new values. It is useless to set them null and after that refill them. If you do not have values for all rows, you can do the update like this:
udpate table1
set y = (select new_value from source where source.key = table1.key)
and will set to null those rows that does not exists in source.
Summary
I think updating to null is slower because Oracle (incorrectly) tries to take advantage of the way it stores nulls, causing it to frequently re-organize the rows in the block ("heap block compress"), creating a lot of extra UNDO and REDO.
What's so special about null?
From the Oracle Database Concepts:
"Nulls are stored in the database if they fall between columns with data values. In these cases they require 1 byte to store the length of the column (zero).
Trailing nulls in a row require no storage because a new row header signals that the remaining columns in the previous row are null. For example, if the last three columns of a table are null, no information is stored for those columns. In tables with many columns, the columns more likely to contain nulls should be defined last to conserve disk space."
Test
Benchmarking updates is very difficult because the true cost of an update cannot be measured just from the update statement. For example, log switches will not happen with every update, and delayed block cleanout will happen later. To accurately test an update, there should be multiple runs, objects should be recreated for each run, and the high and low values should be discarded.
For simplicity the script below does not throw out high and low results, and only tests a table with a single column. But the problem still occurs regardless of the number of columns, their data, and which column is updated.
I used the RunStats utility from http://www.oracle-developer.net/utilities.php to compare the resource consumption of updating-to-a-value with updating-to-a-null.
create table test1(col1 number);
BEGIN
dbms_output.enable(1000000);
runstats_pkg.rs_start;
for i in 1 .. 10 loop
execute immediate 'drop table test1 purge';
execute immediate 'create table test1 (col1 number)';
execute immediate 'insert /*+ append */ into test1 select 1 col1
from dual connect by level <= 100000';
commit;
execute immediate 'update test1 set col1 = 1';
commit;
end loop;
runstats_pkg.rs_pause;
runstats_pkg.rs_resume;
for i in 1 .. 10 loop
execute immediate 'drop table test1 purge';
execute immediate 'create table test1 (col1 number)';
execute immediate 'insert /*+ append */ into test1 select 1 col1
from dual connect by level <= 100000';
commit;
execute immediate 'update test1 set col1 = null';
commit;
end loop;
runstats_pkg.rs_stop();
END;
/
Result
There are dozens of differences, these are the four I think are most relevant:
Type Name Run1 Run2 Diff
----- ---------------------------- ------------ ------------ ------------
TIMER elapsed time (hsecs) 1,269 4,738 3,469
STAT heap block compress 1 2,028 2,027
STAT undo change vector size 55,855,008 181,387,456 125,532,448
STAT redo size 133,260,596 581,641,084 448,380,488
Solutions?
The only possible solution I can think of is to enable table compression. The trailing-null storage trick doesn't happen for compressed tables. So even though the "heap block compress" number gets even higher for Run2, from 2028 to 23208, I guess it doesn't actually do anything. The redo, undo, and elapsed time between the two runs is almost identical with table compression enabled.
However, there are lots of potential downsides to table compression. Updating to a null will run much faster, but every other update will run at least slightly slower.