Oracle DBA in KSA

June 16, 2008

Updating Millions of Rows (Merge vs. Bulk Collect)

Filed under: Oracle — ksadba @ 3:21 pm
Tags: , ,

For a 9.2.0.5 database, I have been asked to add few columns and update them with new values from another table. Base table contained 35 million rows.

To accomplish this, we researched 3 solutions: update via insert, merge statement and pl/sql with bulk collect

Update via Insert

With this method, you don’t execute any updates, instead you create a new table with the old and new data.

Here are the steps:

1. Create a new empty table with both old and new columns.
2. Insert data using append hint into the new table.
3. Recreate indexes on new table.
4. Gather statistics on new table.
5. Rename old table or drop.
6. Rename new table to old name.

Pros:

  • The fastest method to finish.
  • Not complicated, steps are straight forward.

Cons:

  • When executing steps 5 & 6 there is a sub-second where the table is not accessible. Even that tiny bit is not acceptable in our environment.
  • If you have materialized view log or CDC (change data capture), you would have to drop it and then recreate it.

We didn’t benchmark this method because it wouldn’t work in our environment due the sub-second unavailability of base table.

Merge Statement

Use Oracle built-in function merge introduced in 9i.

Pros:

  • Easiest to implement, just 1 SQL statement.

Cons:

  • Merge statement was very slow. It took 17 hours to complete which was unacceptable.

Bulk Collect

Using rowid to perform bulk updates on the base table.

Here are the steps:

1. Add/populate rowid of the base table to new table containing new data.
2. If possible, try to sort the new data in the same way like base table. In my case, we sorted the new data by the primary key. This will reduce multiple reads/writes of the same base table blocks.

Pros:

  • It took 74 minutes down from 17 hours

Cons:

  • Writing pl/sql was a bit of pain and must be adjusted and retested every time there is a change.
  • You might have to drop materialized view log and change data capture tables and then recreate them again after the operation finishes.

Here is sample code for bulk collect


DECLARE
-- source table cursor (only columns to be updated)

CURSOR base_table_cur IS
SELECT rowid, col1, col2
FROM   base_table;

type base_type  IS TABLE OF base_table_cur%ROWTYPE  INDEX BY PLS_INTEGER;

base_tab base_TYPE;

-- new data

CURSOR new_data_cur IS
SELECT row_id, new_col1, new_col2
FROM   new_data;

type new_data_type  IS TABLE OF new_data_cur%ROWTYPE  INDEX BY PLS_INTEGER;

new_data_tab new_data_TYPE;

type row_id_type  IS TABLE OF ROWID INDEX BY PLS_INTEGER;

row_id_tab row_id_TYPE;

TYPE rt_update_cols IS RECORD
(
new_col1 new_data.new_col1%type,
new_col2 new_data.new_col1 %type
);

TYPE update_cols_type IS TABLE OF rt_update_cols INDEX BY PLS_INTEGER;

update_cols_tab update_cols_type;

dml_errors EXCEPTION;

PRAGMA EXCEPTION_INIT(dml_errors, -24381);

BEGIN

open base_table_cur;
open new_data_cur;

loop

fetch base_table_cur bulk collect into base_tab limit 5000;
fetch new_data_cur bulk collect into new_data_tab limit 5000;

FOR i in base_tab.FIRST .. base_tab.LAST LOOP
  row_id_tab(i) := new_data_tab(i).row_id;
  update_cols_tab(i).col1 := new_data_tab(i).new_col1;
  update_cols_tab(i).col2 := new_data_tab(i).new_col2;
end loop;

FORALL i IN base_tab.FIRST .. base_tab.LAST SAVE EXCEPTIONS

update (select col1, col2
              from base_table )
set row = update_cols_tab(i)
where rowid = row_id_tab(i);

commit;

exit when base_tab.count < 5000;

end loop;

commit;

close base_table_cur;
close new_data_cur;

EXCEPTION

WHEN dml_errors THEN

FOR i IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP

dbms_output.put_line('Output your error here');

END LOOP;

END;
/

Conclusion

Use insert via update if sub-second of downtime is acceptable

Use merge if you have small set data

User pl/sql bulk collect if you have large set of data and even sub-second of downtime is not acceptable

Hazem Ameen
Senior Oracle DBA

4 Comments »

  1. Thanks, it was a very informative post and I’ve learned from it!

    Comment by Roni — June 17, 2008 @ 8:54 am | Reply

  2. What would happen if new_data table had just 100 record, against 35mlln records from base_table?
    Would this sentence still work when i = 101?
    FOR i in base_tab.FIRST .. base_tab.LAST LOOP
    row_id_tab(i) := new_data_tab(i).row_id;
    update_cols_tab(i).col1 :=new_data_tab
    (i).new_col1;
    update_cols_tab(i).col2 := new_data_tab
    (i).new_col2;
    end loop;

    Comment by Luis Eduardo Coca — July 21, 2008 @ 8:52 pm | Reply

  3. Luis,

    No, this will not work if new_data table has less rows than base_table. In my entry under bulk collect section, the first step is to create new_data table as a combination of base_table’s rowid and new data to be merged. When you finish this step, the number of rows in both tables will be the same.

    Hazem

    Comment by ksadba — July 22, 2008 @ 1:55 am | Reply

  4. Hi-

    Interesting post. I’m curious if you think I can use it for an update I’m performing. We have a fact table which has a Primary Key based on the values of two columns; the first column is called ACCT_TIME_KEY which is an an INTEGER in the format of (YYYYMMDD), all records for a given month will have the same value. The second column is named CAT_KEY, which is a concatenation of all the dimension keys for a given record. The CAT_KEY for a given record looks something like this “AC1442AL4BCK999BI198945BU263BL4C1K1C2K1CK1350DK5OU751PK1SK1ST3″.

    The fact table has a series of columns as follows (assume the record has an ACCT_TIME_KEY of 20081231):
    PT_AMT_ACTUAL <– Current Amount
    PT_AMT_ACTUAL_MAGO <– Amount for corresponding row with ACCT_TIME_KEY for 20081130
    PT_AMT_ACTUAL_2MAGO <– Amount for corresponding row with ACCT_TIME_KEY for 20081031

    So basically we are updating the table from itself as follows:
    UPDATE fact_table a
    SET a.pt_amt_actual_mago =
    SELECT SUM (b.pt_amt_actual)
    FROM fact_table b
    WHERE acct_time_key = 20081130
    AND a.cat_key = b.cat_key)
    WHERE a.acct_time_key = 20081231
    AND EXISTS (SELECT 1
    FROM fact_table c
    WHERE acct_time_key = 20081231
    AND a.cat_key = c.cat_key);

    Do you think we can make this work using bulk collect method?

    Comment by Mitchell Sacks — June 11, 2009 @ 7:15 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a comment

Blog at WordPress.com.