This one’s so odd I nearly posted it as a “Quiz Night” – but decided that it would be friendlier simply to demonstrate it. Here’s a simple script to create a couple of identical tables. It’s using my standard environment but, apart from fiddling with optimizer settings, I doubt if there’s any reason why you need to worry too much about getting the environment exactly right.
rem rem Script: aggr_subq_redo_bug.sql rem Author: Jonathan Lewis rem Dated: Dec 2011 rem create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 1000 -- > comment to avoid wordpress format issue ) select rownum id, lpad(rownum,10,'0') small_vc, rpad('x',100) padding from generator v1, generator v2 where rownum <= 1e4 -- > comment to avoid wordpress format issue ; create table t2 as select * from t1; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size 1' ); dbms_stats.gather_table_stats( ownname => user, tabname =>'T2', method_opt => 'for all columns size 1' ); end; / create index t2_i1 on t2(id);
All I’ve done is create a small amount of data, collect stats, and build one index. Now run the following (fairly similar) updates.
update t1 set small_vc = ( select max(small_vc) from t2 where t2.id = t1.id ) where mod(id,100) = 0 ; commit; update t1 set small_vc = ( select small_vc from t2 where t2.id = t1.id ) where mod(id,100) = 0 ; commit;
All I’m doing it an update of 100 rows through a correlated subquery. For consistency you might want to recreate the tables before running each of the updates, or perhaps run the test twice with the two updates in the opposite order the second time around. I’m not trying to do anything subtle or difficult to produce an anomaly, I’m just trying to demonstrate a strange difference between these two updates.
As you might expect, both updates (in my environment) operated through a simple subquery mechansim; Oracle didn’t attempt to transform them into some sort of flattened update join view, so the two execution plans were as follows:
Aggregate subquery plan: ----------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ----------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | | | 27 | | 1 | UPDATE | T1 | | | | |* 2 | TABLE ACCESS FULL | T1 | 100 | 1500 | 27 | | 3 | SORT AGGREGATE | | 1 | 15 | | | 4 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 15 | 2 | |* 5 | INDEX RANGE SCAN | T2_I1 | 1 | | 1 | ----------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(MOD("ID",100)=0) 5 - access("T2"."ID"=:B1) Simple query plan: ---------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | | | 27 | | 1 | UPDATE | T1 | | | | |* 2 | TABLE ACCESS FULL | T1 | 100 | 1500 | 27 | | 3 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 15 | 2 | |* 4 | INDEX RANGE SCAN | T2_I1 | 1 | | 1 | ---------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(MOD("ID",100)=0) 4 - access("T2"."ID"=:B1)
There really doesn’t seem to be much difference in the mechanics. For each of the 100 rows selected from table t1, we run a subquery doing a range scan against table t2, using the index t2_i1 to find corresponding rows. In every case there is only one row match: in one update I copy back the maximum value from the list that consists of of just one row, in the other update I copy back the value from every one of the matching rows I find (all one of them); luckily the second update only ever finds one matching row for each row in the driving table, so the update completes rather than crashing with error ORA-01427: single-row subquery returns more than one row.
So – would you expect the two updates to perform differently ? I wouldn’t; but they do; here are some of the session stats:
Aggregate subquery ------------------ Name Value ---- ----- db block gets 208 db block changes 407 redo entries 108 redo size 43,996 undo change vector size 14,944 Simple subquery --------------- Name Value ---- ----- db block gets 105 db block changes 203 redo entries 63 redo size 34,624 undo change vector size 11,248
Obviously there were many other statistics I could have reported – consistent gets, table scan rows gotten, and so on – but all the other statistics matched very closely. What you see here is just the small number of statistics that tell us that the amount of data change was different depending on whether on not you used a (redundant) aggregation to get the value you needed for the update. In passing, if you’re running 9i or lower, or if you’re running 10g and above with RAC, or supplemental logging enabled, or if you’re running this test in the SYS schema, you’ll see the number of redo entries at (or very close to) 202 and 101 respectively – the actual number of entries in my test is the effect of in-memory undo and private redo.
To find out what’s going on, the easiest thing to do is look at a symbolic dump of the redo log for this update which, stripped to a minimum, shows the following as the redo generated for one of the later rows updated in the batch of 100.
Aggregate subquery: ------------------- REDO RECORD - Thread:1 RBA: 0x0005c6.00000057.0030 LEN: 0x00c8 VLD: 0x01 CHANGE #1 TYP:0 CLS:30 AFN:2 DBA:0x00807017 OBJ:4294967295 SCN:0x0000.04f86220 SEQ: 2 OP:5.1 ktubu redo: slt: 30 rci: 99 opc: 11.1 objn: 98261 objd: 98261 tsn: 5 CHANGE #2 TYP:0 CLS: 1 AFN:5 DBA:0x01800130 OBJ:98261 SCN:0x0000.04f86192 SEQ: 2 OP:11.4 KDO Op code: LKR row dependencies Disabled REDO RECORD - Thread:1 RBA: 0x0005c6.00000057.00f8 LEN: 0x0100 VLD: 0x01 CHANGE #1 TYP:0 CLS:30 AFN:2 DBA:0x00807017 OBJ:4294967295 SCN:0x0000.04f86221 SEQ: 1 OP:5.1 ktubu redo: slt: 30 rci: 100 opc: 11.1 objn: 98261 objd: 98261 tsn: 5 CHANGE #2 TYP:0 CLS: 1 AFN:5 DBA:0x01800130 OBJ:98261 SCN:0x0000.04f86221 SEQ: 1 OP:11.5 KDO Op code: URP row dependencies Disabled Simple subquery: ---------------- REDO RECORD - Thread:1 RBA: 0x0005c7.00000044.0090 LEN: 0x0160 VLD: 0x01 CHANGE #1 TYP:0 CLS:26 AFN:2 DBA:0x00801610 OBJ:4294967295 SCN:0x0000.04f86288 SEQ: 1 OP:5.1 ktubu redo: slt: 36 rci: 42 opc: 11.1 objn: 98261 objd: 98261 tsn: 5 CHANGE #2 TYP:2 CLS: 1 AFN:5 DBA:0x0180012c OBJ:98261 SCN:0x0000.04f86224 SEQ: 1 OP:11.19 Array Update of 1 rows: KDO Op code: 21 row dependencies Disabled
When we update using the aggregate subquery Oracle generates two redo records per row: the first locks the row (op code 11.4), the second modifies it (op code 11.5). When we do the update through a simple subquery Oracle generates a single redo record per row, which is an “array update” (op code 11.19).
This anomaly may look familiar – but this time it’s nothing to do with the audit_trail, it just seems to be the way things work. At present I can’t think of any good reason for the difference, but maybe there’s something subtle I’m overlooking.
Update Mar 2021
Repeating this test recently on 19.3.0.0 and 12.2.0.1 I found that bothe update methods (simple update, and aggregate update) went through the double-action “lock/update” mechanism. A third option that I hadn’t previously published was to do the update through a simple updateable join view, and this didn’t use the “lock/update” method, so was about 40% more efficient that either of the correlated update methods. The cost of doing the update through a merge command remains as a future exercise.
Any trigger (statement or row level) will also cause the LKR + URP instead of Array update. But array update leads (in a large amount of undo) to restart with total rollback and LKR + URP/Array update (in different versions) in repeating.
Here is the test
And this is result from 11.2.0.1:
So here 22427 = 21(???) + 2406(Array update before restart (from trace)) + 10000(LCR after rollback) + 10000(Array update after locking).
Comment by Valentin Nikotin — December 19, 2011 @ 8:55 pm GMT Dec 19,2011 |
Valentin,
Thanks for the two observations – the one about the impact of triggers is one that I don’t think I’ve seen mentioned anywhere in the last 10 years or so, since Steve Adams first mentioned it. It’s definitely time for someone to take a closer look at it and expand on his short note on Trigger Performance.
The peculiarity of a long update (by tablescan, at least) resulting in a rollback and restart is one that appeared in 10.1, I think. It looks almost like an example of Tom Kyte’s “write consistency” problem – but I’ve never got around to looking at it closely. If you’ve written anything more about it (even if it’s in Russian), please feel free to supply a URL.
Comment by Jonathan Lewis — December 23, 2011 @ 8:21 am GMT Dec 23,2011 |
I have two post where discussed the problems of restart, caused by undo management and effect of this in 11g.
http://www.sql.ru/forum/actualthread.aspx?tid=833618
http://www.sql.ru/forum/actualthread.aspx?tid=843081
The certain mechanics of the restart isn’t important (I try to get dynamic snapshots for v$rollstat and dba_undo_extents). It’s only one important conclusion – the probability of restart tends to 1 with growth of undo.
Triggers in this context are only the method to disable “array update” with “heavy” restart for that.
And it is not the case of “write consistency” restart. It’s general code for restart that can be initialized by different way.
I know two type of the restart – for all statement and row-level. Undo-restart had row-level signs (without significant additional redo generation) until “array update” appeared in 11g.
Comment by Valentin Nikotin — December 24, 2011 @ 10:32 pm GMT Dec 24,2011 |
[…] Jonathan Lewis is demonstrating a correlation oddity in a friendly way. […]
Pingback by Log Buffer #251, A Carnival of the Vanities for DBAs | The Pythian Blog — December 23, 2011 @ 6:01 am GMT Dec 23,2011 |
If we add dummy “group by 1” into the subquery we will see “SORT GROUP BY NOSORT” in the plan and get array update as well:
Comment by Valentin Nikotin — December 25, 2011 @ 2:24 pm GMT Dec 25,2011 |
The simplest example is
And we can find in the trace that for the first update “array update” is used, while for the second – LKR+URP.
Comment by Valentin Nikotin — December 25, 2011 @ 2:41 pm GMT Dec 25,2011 |
Valentin,
Thanks for all the follow-ups and links. It’s fascinating how small variations in SQL can find different paths through the Oracle code for “no apparent reason”. I particularly like the way you’ve managed to strip the problem down to such a tiny example in the last post – have you tried sending it in as an SR to see what Oracle Corp. say about it ?
Comment by Jonathan Lewis — December 28, 2011 @ 7:20 pm GMT Dec 28,2011 |
About the last, I’ve not created SR yet, but I’ve asked the question. About the problem with restart – I know only internal bug number.
Comment by Valentin Nikotin — December 28, 2011 @ 8:10 pm GMT Dec 28,2011 |
[…] the side effects of the provlem. [Ed: a tweet from Timur Ahkmadeev has identified a conversation in the comments on an older post that could explain why I thought I’d written something about the […]
Pingback by Update restarts | Oracle Scratchpad — September 10, 2019 @ 2:39 pm BST Sep 10,2019 |