Updating Millions of Rows (Merge vs. Bulk Collect)

For a 9.2.0.5 database, I was asked to add a few columns and populate them with new values from another table. The base table contained 35 million rows.

To accomplish this, we evaluated three solutions: update via insert, a merge statement, and PL/SQL with bulk collect.

 

Update via Insert

With this method you don’t execute any updates at all; instead, you create a new table containing both the old and the new data.

Here are the steps (a SQL sketch follows the list):

  1. Create a new empty table with both the old and the new columns.
  2. Insert the data into the new table using the APPEND hint.
  3. Recreate the indexes on the new table.
  4. Gather statistics on the new table.
  5. Rename or drop the old table.
  6. Rename the new table to the old name.
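
For illustration, here is a hedged sketch of steps 1, 2, 5, and 6, assuming the new values come from a new_data table joined on a primary key pk_col, and that the new columns are NUMBERs (all names and types here are illustrative, not from the original system):

-- Step 1: empty copy of the base table with the new columns added
CREATE TABLE base_table_new AS
SELECT bt.*,
       CAST(NULL AS NUMBER) AS new_col1,
       CAST(NULL AS NUMBER) AS new_col2
FROM   base_table bt
WHERE  1 = 0;

-- Step 2: direct-path insert of old + new data
-- (outer join keeps base rows that have no matching new data)
INSERT /*+ APPEND */ INTO base_table_new
SELECT bt.*, nd.new_col1, nd.new_col2
FROM   base_table bt, new_data nd
WHERE  bt.pk_col = nd.pk_col(+);

COMMIT;

-- Steps 5-6: swap the tables
-- (recreate indexes and gather statistics before this point)
RENAME base_table TO base_table_old;
RENAME base_table_new TO base_table;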

Pros:

  • The fastest method overall.
  • Not complicated; the steps are straightforward.

Cons:

  • When executing steps 5 and 6 there is a sub-second window during which the table is not accessible. Even that tiny gap is not acceptable in our environment.
  • If you have a materialized view log or CDC (Change Data Capture) on the table, you have to drop it and recreate it afterwards.

We didn’t benchmark this method because it wouldn’t work in our environment due to the sub-second unavailability of the base table.

 

Merge Statement

Use Oracle’s merge statement, introduced in 9i.
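
For reference, a minimal sketch of such a statement, assuming the new values sit in a new_data table that shares a primary key pk_col with the base table (names are illustrative; note that on 9i both the WHEN MATCHED and WHEN NOT MATCHED clauses are mandatory):

MERGE INTO base_table bt
USING new_data nd
ON (bt.pk_col = nd.pk_col)
WHEN MATCHED THEN
  UPDATE SET bt.new_col1 = nd.new_col1,
             bt.new_col2 = nd.new_col2
WHEN NOT MATCHED THEN
  -- every base row should match; this branch exists only because 9i requires it
  INSERT (pk_col, new_col1, new_col2)
  VALUES (nd.pk_col, nd.new_col1, nd.new_col2);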

Pros:

  • Easiest to implement: just one SQL statement.

Cons:

  • The merge statement was very slow: it took 17 hours to complete, which was unacceptable.

 

Bulk Collect

Use the rowid to perform bulk updates on the base table.

Here are the steps:

  1. Add a rowid column to the new table containing the new data, and populate it with the rowid of the matching base table row (a sketch follows this list).
  2. If possible, sort the new data in the same order as the base table. In our case, we sorted the new data by the primary key. This reduces repeated reads and writes of the same base table blocks.
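
Here is a hedged sketch of step 1 as a single CTAS, assuming the incoming values live in a staging table source_data keyed by the same primary key pk_col (names are illustrative):

CREATE TABLE new_data AS
SELECT bt.rowid AS row_id,   -- rowid of the base table row to update
       sd.new_col1,
       sd.new_col2
FROM   base_table bt, source_data sd
WHERE  bt.pk_col = sd.pk_col
ORDER  BY bt.pk_col;         -- step 2: store rows in the base table's key order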

Pros:

  • It took 74 minutes, down from 17 hours.

Cons:

  • Writing the PL/SQL was a bit of a pain, and it must be adjusted and retested every time there is a change.
  • You might have to drop the materialized view log and Change Data Capture tables and recreate them after the operation finishes (a sketch follows this list).
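
If a materialized view log is in the way, the drop/recreate itself is short; a sketch, assuming a simple rowid-based log (the WITH clause must match whatever your fast-refresh materialized views actually need):

DROP MATERIALIZED VIEW LOG ON base_table;

-- ... run the bulk update here ...

CREATE MATERIALIZED VIEW LOG ON base_table WITH ROWID;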

Here is sample code for the bulk collect approach:

DECLARE
  -- base table cursor: rowid plus the columns to be updated
  CURSOR base_table_cur IS
    SELECT rowid AS row_id, col1, col2
    FROM   base_table;

  TYPE base_type IS TABLE OF base_table_cur%ROWTYPE INDEX BY PLS_INTEGER;

  base_tab base_type;

  -- new data: base table rowid plus the new values
  CURSOR new_data_cur IS
    SELECT row_id, new_col1, new_col2
    FROM   new_data;

  TYPE new_data_type IS TABLE OF new_data_cur%ROWTYPE INDEX BY PLS_INTEGER;

  new_data_tab new_data_type;

  TYPE row_id_type IS TABLE OF ROWID INDEX BY PLS_INTEGER;

  row_id_tab row_id_type;

  -- record holding only the columns being updated; its fields must line up
  -- positionally with the inline view in the FORALL update below (SET ROW)
  TYPE rt_update_cols IS RECORD
  (
    new_col1 new_data.new_col1%TYPE,
    new_col2 new_data.new_col2%TYPE
  );

  TYPE update_cols_type IS TABLE OF rt_update_cols INDEX BY PLS_INTEGER;

  update_cols_tab update_cols_type;

  dml_errors EXCEPTION;

  PRAGMA EXCEPTION_INIT(dml_errors, -24381);

BEGIN
  OPEN base_table_cur;
  OPEN new_data_cur;

  LOOP
    -- both tables have the same row count and order (see step 1 above)
    FETCH base_table_cur BULK COLLECT INTO base_tab     LIMIT 5000;
    FETCH new_data_cur   BULK COLLECT INTO new_data_tab LIMIT 5000;

    EXIT WHEN base_tab.COUNT = 0;

    FOR i IN base_tab.FIRST .. base_tab.LAST LOOP
      row_id_tab(i)               := new_data_tab(i).row_id;
      update_cols_tab(i).new_col1 := new_data_tab(i).new_col1;
      update_cols_tab(i).new_col2 := new_data_tab(i).new_col2;
    END LOOP;

    FORALL i IN base_tab.FIRST .. base_tab.LAST SAVE EXCEPTIONS
      UPDATE (SELECT col1, col2 FROM base_table)
      SET    ROW   = update_cols_tab(i)
      WHERE  rowid = row_id_tab(i);

    COMMIT;

    EXIT WHEN base_tab.COUNT < 5000;

  END LOOP;

  COMMIT;

  CLOSE base_table_cur;
  CLOSE new_data_cur;

EXCEPTION

  WHEN dml_errors THEN
    FOR i IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
      DBMS_OUTPUT.put_line(
        'ORA-' || SQL%BULK_EXCEPTIONS(i).ERROR_CODE ||
        ' at batch index ' || SQL%BULK_EXCEPTIONS(i).ERROR_INDEX);
    END LOOP;

END;
/

 

Conclusion

Use update via insert if a sub-second of downtime is acceptable.

Use merge if you have a small data set.

Use PL/SQL bulk collect if you have a large data set and even a sub-second of downtime is not acceptable.

Hazem Ameen
Senior Oracle DBA


19 thoughts on “Updating Millions of Rows (Merge vs. Bulk Collect)”

  1. What would happen if the new_data table had just 100 records, against 35 million records in base_table?
    Would this statement still work when i = 101?

    FOR i IN base_tab.FIRST .. base_tab.LAST LOOP
      row_id_tab(i) := new_data_tab(i).row_id;
      update_cols_tab(i).new_col1 := new_data_tab(i).new_col1;
      update_cols_tab(i).new_col2 := new_data_tab(i).new_col2;
    END LOOP;

  2. Luis,

No, this will not work if the new_data table has fewer rows than base_table. In my entry, under the bulk collect section, the first step is to create the new_data table as a combination of base_table’s rowids and the new data to be merged. When you finish this step, the number of rows in both tables will be the same.

    Hazem

  3. Hi-

Interesting post. I’m curious if you think I can use it for an update I’m performing. We have a fact table which has a primary key based on the values of two columns; the first column is called ACCT_TIME_KEY, which is an INTEGER in the format YYYYMMDD; all records for a given month have the same value. The second column is named CAT_KEY, which is a concatenation of all the dimension keys for a given record. The CAT_KEY for a given record looks something like this: “AC1442AL4BCK999BI198945BU263BL4C1K1C2K1CK1350DK5OU751PK1SK1ST3”.

    The fact table has a series of columns as follows (assume the record has an ACCT_TIME_KEY of 20081231):
    PT_AMT_ACTUAL        <-- Current amount
    PT_AMT_ACTUAL_MAGO   <-- Amount for corresponding row with ACCT_TIME_KEY 20081130
    PT_AMT_ACTUAL_2MAGO  <-- Amount for corresponding row with ACCT_TIME_KEY 20081031

    So basically we are updating the table from itself as follows:
    UPDATE fact_table a
    SET a.pt_amt_actual_mago =
        (SELECT SUM(b.pt_amt_actual)
         FROM   fact_table b
         WHERE  b.acct_time_key = 20081130
         AND    a.cat_key = b.cat_key)
    WHERE a.acct_time_key = 20081231
    AND EXISTS (SELECT 1
                FROM   fact_table c
                WHERE  c.acct_time_key = 20081231
                AND    a.cat_key = c.cat_key);

Do you think we can make this work using the bulk collect method?

  4. You don’t elaborate on the merge option much. Did you try merging in parallel?
    I just completed an optimization using merge: I took an update statement which ran over 12 hours, and ran it as 8 parallel merges in 15 minutes.
    Divide and conquer!! :p

  5. FYI: I was merging 17 million rows.
    So I either go to 16 parallel, or it takes 30 minutes. Either way, merge + parallel *should* be quicker than your loop-bulk-slow-by-slow logic (in theory anyway .. heh)

  6. (Using the same tables/names in the original example, I’d go for this):

    -- setup.sql
    CREATE TABLE ntile_work
    AS ( SELECT pk_col, new_col1, new_col2,
                NTILE(8) OVER (ORDER BY pk_col) grp
         FROM   new_data )
    /

    -- p.sql
    MERGE INTO base_table bt
    USING ( SELECT * FROM ntile_work
            WHERE  grp = &1 ) work
    ON ( work.pk_col = bt.pk_col )
    WHEN MATCHED THEN UPDATE
      SET bt.col1 = work.new_col1,
          bt.col2 = work.new_col2
    /

    -- p0.sql
    @p 1
    @p 2
    @p 3
    @p 4

    -- I ran from a Unix shell, so I spawned background processes using “&” … but that’s system dependent …

    -- Ideally, you wouldn’t need the above; you’d just let Oracle handle the parallel processing,
    -- but for whatever reason, it wasn’t spawning parallel slaves in my environment, and I didn’t
    -- feel like digging .. so I went the above route .. 🙂
    MERGE /*+ PARALLEL(bt, 8) */ INTO base_table bt
    USING ( SELECT * FROM ntile_work
            WHERE  grp = &1 ) work
    ON ( work.pk_col = bt.pk_col )
    WHEN MATCHED THEN UPDATE
      SET bt.col1 = work.new_col1,
          bt.col2 = work.new_col2
    /

Note that I prefer using the primary key column (or columns, if you have a composite key) as opposed to rowid, since the rowid may easily be different between the two tables; it’s not a good item to join two tables by.

  7. -- setup.sql
    CREATE TABLE ntile_work
    AS ( SELECT pk_col, new_col1, new_col2,
                NTILE(8) OVER (ORDER BY pk_col) grp
         FROM   new_data )
    /

    -- p.sql
    MERGE INTO base_table bt
    USING ( SELECT pk_col, new_col1, new_col2
            FROM   ntile_work
            WHERE  grp = &1 ) wt
    ON ( bt.pk_col = wt.pk_col )
    WHEN MATCHED THEN UPDATE
      SET bt.col1 = wt.new_col1,
          bt.col2 = wt.new_col2
    /

    -- Then just invoke p.sql 8 times:
    @p 1
    @p 2
    @p 3

    etc.

    -- I was running in Unix, so I launched background processes via & … but that’s just dependent on your OS.

    -- Ideally, you’d just let Oracle do the footwork and use the below. However, when I tried, it wasn’t firing the parallel,
    -- and I didn’t want to spend time digging into the issue, so I used the above work-around.

    MERGE /*+ PARALLEL (bt, 8) */ INTO base_table bt
    USING ( SELECT pk_col, new_col1, new_col2
            FROM   ntile_work ) wt
    ON ( bt.pk_col = wt.pk_col )
    WHEN MATCHED THEN UPDATE
      SET bt.col1 = wt.new_col1,
          bt.col2 = wt.new_col2
    /

  8. Sub-second downtime may be eliminated by using a public synonym:

    Hopefully you already did …

    CREATE PUBLIC SYNONYM my_OLD_tbl FOR my_schema.my_OLD_tbl;

    … and the readers and writers go through the synonym because they do not log on to my_schema nor do they use INSERT INTO my_schema.my_OLD_tbl ….;

    If they do INSERT INTO my_schema.my_OLD_tbl … then you can use a synonym belonging to my_schema.

    Now create the new table with old + new data and then …

    CREATE OR REPLACE PUBLIC SYNONYM my_OLD_tbl FOR my_schema.my_NEW_tbl;

I love indirection … it lets you change things behind the scenes without anyone noticing that anything happened, except that the performance is suddenly better or a new feature is suddenly available.

    The CREATE OR REPLACE PUBLIC SYNONYM can take a few seconds if the writers are hammering on the table.

    But then, with the update via insert taking a few hours, what’s a few seconds more?

  9. I guess I understood this example, but does it only update data? What about inserting data, like a regular merge does?

  10. Hi, I need to update millions of rows, but can someone explain to me how to convert my statement to bulk collect? I wrote the statement with merge.

  11. Can we write an IF clause inside the FORALL statement? I mean, I need to write an update and an insert under the same FORALL (I need to do some validation first).

    For example:

    FORALL i IN base_tab.FIRST .. base_tab.LAST SAVE EXCEPTIONS
      IF match THEN
        UPDATE …
      ELSE
        INSERT INTO the table …
      END IF;
