Oracle Scratchpad

July 28, 2008

Big Update

Filed under: Performance,Tuning — Jonathan Lewis @ 8:34 pm BST Jul 28,2008

A recent post on the OTN forum asked:

I was wondering is there any fast method for updating 8 million records out of 10 million table? For eg :
I am having a customer table of 10m records and columns are cust_id, cust_num and cust_name.
i need to update 8m records out of 10m customer table as follows.
update customer set cust_id=46 where cust_id=75;
The above statement will update 8m records. And cust_id is indexed.

If you read the replies to this post you will see that there are a number of variations on a theme that might be appropriate. Ultimately your final choice comes down to a trade-off between the size of the maintenance window, your desire to avoid complexity, and where your most serious bottlenecks are.

Option 1: Just update the column – probably very slow with lots of undo and redo, and with a massive impact on other processes because of their need to work hard to achieve read-consistency. But this is possibly the ONLY guaranteed correct update if you aren’t allowed to block access to the table for the duration. Given the particular nature of this update, you might want to rebuild (online) or coalesce the critical index afterwards

Option 2: Drop the index on cust_id, update the column, then rebuild the index. You need to be a little careful that you do rebuild the index; and any other sessions code may run into serious problems while the index is out of place – so this solution (like all those that follow) is really an “offline” solution.  This is likely to save you a lot of undo and redo (from updating the index – an index update requires roughly double the resources of a table update) and could save you quite a lot of random reads and writes as the index is updated. This is quite clean and simple, and could be pretty quick – especially if you have enough resources to do things like parallel updates and multiple concurrent index rebuilds.

Option 3: Do a CTAS (create table as select) with a decode to change the value of the cust_id as you make the copy. This could be a messy job if the table is in the middle of a number of referential integrity constraints. You also have the expense of rebuilding all the indexes after you’ve created the table – but you could do everything nologging (so long as you remember to do a backup of the relevant tablespaces afterwards). After the CTAS, you have to run through a series of renames (and / or drops). So this needs some care. Even in the simplest case (i.e. no referential integrity to worry about) you have the added expense of all the tablescans to create new indexes – though the impact of this could, perhaps, be minimised if you can build all the indexes at the same time.

Option 4: Create an empty copy of the table with its indexes in place, and run an insert with the /*+ append */ hint, possibly with the nologging option. The benefit of this is that Oracle optimises the index update when using /*+ append */. Instead of updating the index row by row as the data arrives, it accumulates all the key entries, then when the table is loaded, sorts each set of index entries in turn and does a type of array update into the index. This adds to the demand for memory, but minimises undo, redo, random I/O and index leaf block splits. (Oracle does this without the /*+ append */ hint for bitmap indexes – it is only b-tree indexes that require you to use the hint to enable this optimisation). The hint also eliminates undo for the table; and nologging can be used to eliminate redo for the table – but again, if you load the table using nologging then you need to backup the tablespace afterwards. Finally, of course, you don’t have to scan the table once for each index that you rebuild – all the data for all the indexes is accumulated (in memory) as the table is populated.

So you’ve got lots of choices – and if you’ve got a full size development or test system, you can even test the different strategies to see which one works best for your environment, your data, and your set of indexes.

Footnote:

Every now and again I need to point out that driving an update (or delete) through an index may result in a reduction of undo and redo – but it takes me ages to find the blog note I wrote about it because I never remember the title and always search for “big update” and get to this post. So here’s the link to “Tuning Updates”.

8 Comments »

  1. Well, I would consider going with “Option 3” or “Option 4” depending on

    * Referential integrity constraints,
    * Number of indexes,
    * and presence of database trigger

    Comment by Asif Momen — July 28, 2008 @ 10:03 pm BST Jul 28,2008 | Reply

  2. Jonathan,

    I think dbms_redefinition might work as well. Something like this:

    create table cust (cust_num number, constraint cust_pk primary key (cust_num), cust_id number, name varchar2(10));

    create index cust_id_idx on cust(cust_id);

    insert into cust values( 1, 1, ‘a’);
    insert into cust values( 2, 2, ‘b’);
    insert into cust values( 3, 1, ‘c’);
    insert into cust values( 4, 4, ‘d’);
    insert into cust values( 5, 1, ‘e’);

    select * From cust;

    create table cust_int (cust_num number, cust_id number, name varchar2(10));

    exec dbms_redefinition.start_redef_table(user,’cust’, ‘cust_int’, ‘cust_num cust_num, decode(cust_id, 1, 99, cust_id) cust_id, name name’);

    declare
    i pls_integer;
    begin
    dbms_redefinition.copy_table_dependents( user, ‘cust’, ‘cust_int’, copy_indexes=>dbms_redefinition.cons_orig_params, copy_triggers=>true, copy_constraints=>true, copy_privileges=>true, ignore_errors=>false, num_errors=>i);
    dbms_output.put_line(‘Errors: ‘ || i);
    end;
    /

    exec dbms_redefinition.finish_redef_table(user, ‘cust’, ‘cust_int’);

    select * From cust;

    select table_name, index_name from user_indexes;

    Comment by Jeff — July 29, 2008 @ 1:10 am BST Jul 29,2008 | Reply

  3. Hi Jeff,

    Even I thought of Online Table Redifinition but redifinition of a table with 10 Million records would require same amount of space plus the redo and undo……

    This will be beneficial when you cannot afford downtime.

    Regards

    Comment by Asif Momen — July 29, 2008 @ 4:30 pm BST Jul 29,2008 | Reply

  4. A variation on the CTAS theme might be a reverse partition exchange load perhaps.

    Create a staging table which is the same structure as this table which needs updating, but is partitioned into one range partition on some arbitrary column such that every row could only fit in this one partition. Then insert all the appropriately modified data (using a decode in the same way as the CTAS would do) into this partitioned staging table, build any indexes and then, exchange the target table for the partition of the staging table including indexes.

    It would mean you don’t need to have any drops/renames to go wrong…it’s effectively doing these at the dictionary level with the PEL approach.

    Comment by Jeff Moss — July 29, 2008 @ 5:15 pm BST Jul 29,2008 | Reply

  5. If I assume that the 10,000,000 record “customer” table is in fact some type of customer “fact” table, you could just tweak the customer “master / reference” table and switch the cust_id’s over…

    then again, it doesn’t look like the cust_id is in fact a PK for the “customer” table. Sometimes the description of the problem can be simplified so much that you lose any idea of what they REALLY want to do!

    Comment by Guy — July 30, 2008 @ 2:57 pm BST Jul 30,2008 | Reply

  6. Well, probably it’s not the most efficient way, but I think that bulk update could be used.

    This method have a little advantage, you can commit every “n” rows, so you can avoid some undo and redo problems.

    Clearly, this can be done only if you can commit in between.

    Comment by lascoltodelvenerdi — July 31, 2008 @ 7:26 am BST Jul 31,2008 | Reply

  7. Or you could just create a view with the decode statement and forget about copying the data?

    Comment by Lee — August 18, 2008 @ 7:33 am BST Aug 18,2008 | Reply

  8. Jeff,

    I can’t think why I forgot the dbms_redefinition package, it’s been around long enough. It would be roughly equivalent to mixing the costs of the ‘insert/append’ approach with the continuity of the big update.

    I like the “exchange partition” thing, by the way. Much nicer than messing around with renaming base objects. That’s another one I should have thought of.

    Asif,
    The cost of the dbms_redefinition approach would probably be similar to the insert/append type of strategy since the first big step in dbms_redefinition is the insert/append, followed by incremental steps that use the technology of materialized views and materialized view logs to data that arrived during the redefinition.

    lascoltodelvenerdi,
    Be careful of the ‘commit every N rows’. This can result in extra undo and redo because the queries for the “next N” can end up generating some surprising side-effects of “delayed block cleanout”.

    Lee,
    Your suggestion would certainly be worth considering – although you then have to consider the requirement for changing code to access data correctly, and creating function-based indexes so that (for the other customer IDs) your code could still use indexed access paths.

    Comment by Jonathan Lewis — August 19, 2008 @ 6:58 pm BST Aug 19,2008 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.