Oracle Scratchpad

September 8, 2019

Quiz Night

Filed under: 12c,Infrastructure,Oracle,redo — Jonathan Lewis @ 10:15 pm BST Sep 8,2019

Upgrades cause surprises – here’s a pair of results from a model that I constructed more than 15 years ago, and ran today on 12.2, then modified and ran again, then ran on 11.2.0.4, then on 12.1.0.2. It’s very simple: I just create a table, gather stats, then update every row.

rem
rem     Script:         update_nochange.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2019
rem
rem     Last tested 
rem             12.2.0.1 

create table t1
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                rownum <= 1e4  -- > comment to avoid wordpress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'x')             small_vc,
--      lpad(rownum,10,'0')             small_vc,
        'Y'                             flag
from
        generator       v1
where   rownum <= 1e3   -- > comment to avoid wordpress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

execute snap_my_stats.start_snap

update t1 set small_vc = upper(small_vc);

execute snap_my_stats.end_snap

The calls to package snap_my_stats are my little routines to calculate the change in the session activity stats (v$sesstat, joined to v$statname) due to the update. Here are a few of the results for the test using the code as it stands:


Name                                    Value
----                                    -----
redo entries                               36
redo size                             111,756
undo change vector size                53,220
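For anyone without the package, the idea behind snap_my_stats can be mimicked with a simple before/after diff. Here’s a minimal stand-in in Python (the real package is PL/SQL against v$sesstat; the absolute values below are invented, only the deltas match the table above):

```python
def diff_stats(before, after, names=("redo entries", "redo size",
                                     "undo change vector size")):
    """Minimal stand-in for snap_my_stats: report the change in a few
    session statistics between two snapshots (dicts of name -> value)."""
    return {name: after[name] - before[name] for name in names}

# Invented snapshot values; only the differences correspond to the test.
before = {"redo entries": 1_000, "redo size": 5_000_000,
          "undo change vector size": 2_000_000}
after  = {"redo entries": 1_036, "redo size": 5_111_756,
          "undo change vector size": 2_053_220}

print(diff_stats(before, after))
# {'redo entries': 36, 'redo size': 111756, 'undo change vector size': 53220}
```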

You’ll notice, however, that the CTAS has an option commented out to create the small_vc column using lpad(rownum,10,'0') rather than lpad(rownum,10,'x'). This is what the redo stats look like if I use '0' instead of 'x':


Name                                    Value
----                                    -----
redo entries                              909
redo size                             223,476
undo change vector size                68,256

What – they’re different?! (And it’s reproducible.)

Running the test on 12.1.0.2 or 11.2.0.4, both variants of the code produce the smaller number of redo entries (and bytes) and undo – it’s only 12.2.0.1 that shows a change. (I haven’t got around to testing 18.3 and 19.3 yet.)
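A back-of-envelope check on the figures above makes the difference in mechanism more visible: the average redo bytes per entry collapses in the '0' case, which is what you’d expect if many small single-row entries replace a few large multi-row entries. (A sketch using just the numbers reported above.)

```python
# Redo figures from the two 12.2.0.1 runs shown above:
# (redo entries, redo size in bytes)
runs = {
    "lpad(rownum,10,'x') - real change": (36, 111_756),
    "lpad(rownum,10,'0') - no change":   (909, 223_476),
}

for label, (entries, size) in runs.items():
    # A few large entries suggest array processing; many small entries
    # suggest row-by-row processing.
    print(f"{label}: {size / entries:,.0f} bytes per redo entry")
```

The 'x' run averages roughly 3,100 bytes per entry, the '0' run roughly 250.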

Tonight’s quiz:

Figure out what’s happening in 12.2.0.1 to give two different sets of undo and redo figures.

If that problem is too easy – extrapolate the test to more complex cases to see when the difference stops appearing, and see if you can find any cases where this new feature might cause existing applications to break.

I’ll supply the basic answer in 48 hours.

Update (a few hours early)

The question has been answered in the comments – it’s an optimisation introduced in 12.2 that attempts to reduce the amount of undo and redo by minimising the work done for “no change” updates to data.  In principle – but we don’t yet know the rules and limitations – if an update does not change the column values Oracle 12.2 will not “save the old values in an undo record and log the new values in a redo change vector”, it will simply lock the row, to produce a minimal redo change vector.
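The behaviour can be caricatured in a few lines of Python – this is my guess at the shape of the logic, not Oracle’s actual code, and the array size of 20 is simply the value observed in the redo dump below:

```python
def apply_update(rows, new_value_fn, array_size=20):
    """Toy model (an assumption, not Oracle's real algorithm) of the 12.2
    'no change' optimisation: unchanged rows are locked one at a time,
    changed rows are batched into array-update change vectors."""
    redo_entries = 0
    changed = []
    for row in rows:
        new = new_value_fn(row)
        if new == row:
            redo_entries += 1          # one "lock row piece" (11.4) per row
        else:
            changed.append(new)
    redo_entries += -(-len(changed) // array_size)  # ceil: 11.19 vectors
    return redo_entries

rows_x = [f"{i:x>10}" for i in range(1000)]   # like lpad(rownum,10,'x')
rows_0 = [f"{i:0>10}" for i in range(1000)]   # like lpad(rownum,10,'0')

print(apply_update(rows_x, str.upper))   # -> 50   (every row really changes)
print(apply_update(rows_0, str.upper))   # -> 1000 (every row locked singly)
```

The model exaggerates the change case (private redo packs many vectors into one large record), but it reproduces the row-by-row explosion for the no-change update.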

Unfortunately Oracle goes into “single row” mode to lock rows, while it can do “block-level” (i.e. multi-row/array) processing if it uses the “change” mechanism. Inevitably there will be cases where the 12.2 optimisation actually produces a worse result in terms of volume of redo, or contention for redo latches.

If we modify the code to dump the redo generated by the two different updates we can see more clearly what Oracle is doing:

alter session set tracefile_identifier = 'UPD';

column start_scn new_value m_start_scn
select to_char(current_scn,'999999999999999999999999') start_scn from v$database;

update t1 set small_vc = upper(small_vc);
commit;

column end_scn new_value m_end_scn
select to_char(current_scn,'999999999999999999999999') end_scn from v$database;

alter system dump redo scn min &m_start_scn scn max &m_end_scn;

Then, after running the test we can dump the list of redo op codes from the trace file:

First when we do the “no-change” update (with lots of repetitions deleted):

grep -n OP orcl12c_ora_21999_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

129:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
138:CHANGE #1  OP:11.4
147:CHANGE #2  OP:5.2
150:CHANGE #3  OP:11.4
159:CHANGE #4  OP:11.4
168:CHANGE #5  OP:11.4
177:CHANGE #6  OP:11.4
...
2458:CHANGE #189  OP:5.1
2474:CHANGE #190  OP:5.1
2490:CHANGE #191  OP:5.1
2506:CHANGE #192  OP:5.1
2525:CHANGE #1  OP:5.1
2541:CHANGE #2  OP:11.4
2553:CHANGE #1  OP:5.1
2569:CHANGE #2  OP:11.4
...
27833:CHANGE #1  OP:5.1
27849:CHANGE #2  OP:11.4
27861:CHANGE #1  OP:5.1
27877:CHANGE #2  OP:11.4
27889:CHANGE #1  OP:5.4

The dump starts with a large redo record (192 change vectors) that started life in the private redo buffer, and then switches to the standard “paired change vectors” in the public redo buffer. The 11.4 vectors are “lock row piece” records while the 5.1 vectors “generate undo”. Counting the 11.4 and 5.1 lines there are exactly 1,000 of each – every row has been individually locked.
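Instead of a grep/sed pipeline, the op codes can also be counted directly from the trace file with a few lines of Python (the trace file name in the example comes from my test run and will differ on your system):

```python
import re
from collections import Counter

def count_op_codes(trace_path):
    """Count redo change-vector op codes (e.g. 11.4, 5.1) in a
    trace file produced by 'alter system dump redo'."""
    ops = Counter()
    with open(trace_path) as f:
        for line in f:
            m = re.search(r"CHANGE #\d+.*\bOP:(\d+\.\d+)", line)
            if m:
                ops[m.group(1)] += 1
    return ops

# e.g. count_op_codes("orcl12c_ora_21999_UPD.trc")
# For the no-change update this should report roughly 1,000 each
# of op codes 11.4 and 5.1.
```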

Now for the “real change” update:

grep -n OP orcl12c_ora_22255_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

126:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
135:CHANGE #1  OP:11.19
281:CHANGE #2  OP:5.2
284:CHANGE #3  OP:11.19
430:CHANGE #4  OP:11.19
576:CHANGE #5  OP:11.19
...
5469:CHANGE #41  OP:5.1
5573:CHANGE #42  OP:5.1
5726:CHANGE #43  OP:5.1
5879:CHANGE #44  OP:5.1
6035:CHANGE #1  OP:5.1
6188:CHANGE #2  OP:11.19
6337:CHANGE #1  OP:5.1
6490:CHANGE #2  OP:11.19
...
15029:CHANGE #2  OP:11.19
15101:CHANGE #1  OP:5.1
15177:CHANGE #2  OP:11.19
15249:CHANGE #1  OP:5.4

It’s a much smaller trace file (ca. 15,249 lines compared to ca. 27,889 lines), and the table change vectors are 11.19 (“table array update”) rather than 11.4 (“table lock row piece”). Counting the op codes we get 52 each of the 11.19 and 5.1 vectors. If we want a little more information about those vectors we can do the following:


egrep -n -e "OP:" -e "Array Update" orcl12c_ora_22255_UPD.trc | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//' 

126:CHANGE #2 MEDIA RECOVERY MARKER  OP:17.28
135:CHANGE #1  OP:11.19
140:Array Update of 20 rows:
281:CHANGE #2  OP:5.2
284:CHANGE #3  OP:11.19
289:Array Update of 20 rows:
430:CHANGE #4  OP:11.19
435:Array Update of 20 rows:
576:CHANGE #5  OP:11.19
581:Array Update of 20 rows:
...
5469:CHANGE #41  OP:5.1
5481:Array Update of 13 rows:
5573:CHANGE #42  OP:5.1
5585:Array Update of 20 rows:
5726:CHANGE #43  OP:5.1
5738:Array Update of 20 rows:
5879:CHANGE #44  OP:5.1
5891:Array Update of 20 rows:
6035:CHANGE #1  OP:5.1
6047:Array Update of 20 rows:
6188:CHANGE #2  OP:11.19
6193:Array Update of 20 rows:
6337:CHANGE #1  OP:5.1
6349:Array Update of 20 rows:
...
14953:CHANGE #1  OP:5.1
14965:Array Update of 9 rows:
15029:CHANGE #2  OP:11.19
15034:Array Update of 9 rows:
15101:CHANGE #1  OP:5.1
15113:Array Update of 9 rows:
15177:CHANGE #2  OP:11.19
15182:Array Update of 9 rows:
15249:CHANGE #1  OP:5.4

As you can see, the 11.19 (table change) and 5.1 (undo) change vectors both report that they are structured as array updates. In most cases the array size is 20 rows, but there are a few cases where the array size is smaller. In this test I found one update with an array size of 13 rows and three updates with an array size of 9 rows.
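Those array sizes cross-check neatly against the 1,000 rows updated (assuming, as the dump suggests, that all the remaining vectors are full arrays of 20 rows):

```python
# 52 array-update vectors reported in the dump: one of 13 rows,
# three of 9 rows, and the rest full arrays of 20 rows.
full_arrays = 52 - 1 - 3                       # 48 full arrays
rows_covered = full_arrays * 20 + 1 * 13 + 3 * 9

print(rows_covered)   # -> 1000, accounting for every updated row
```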

Summary

Oracle has introduced an optimisation for “no change” updates in 12.2 that tries to avoid generating quite so much undo and redo. However, this may result in some cases where an “array update” change vector turns into several “single row lock” change vectors, so when you upgrade to 12.2 (or beyond) you may want to check any large update mechanism you run to see whether your system has benefited or been penalised to any significant extent by this change. The key indicator will be an increase in the value of the session/system stats “redo entries” and “redo size”.

9 Comments »

  1. Huh, interesting that I did the same test 2 years ago and I’ve only just noticed such an obvious difference on my own screenshots :) http://orasql.org/2017/11/19/triggers-and-redo-changes-on-12-2/
    Btw, simple script to find the difference: https://gist.github.com/xtender/d73064cfb8b9dde2032fe7639bc328d0 :)

    Comment by Sayan Malakshinov (@xtner) — September 8, 2019 @ 10:59 pm BST Sep 8,2019 | Reply

    • Sayan,

      The behaviour with triggers (and, implicitly, materialized views and advanced queueing) was the first area I thought of where there might be a problem with the change here.

      That’s a nice little trick to pick up the object id and SCNs you’re interested in to minimise the dump. I usually take the simplest approach – switch logfiles before the critical point, then dump the whole log file: https://jonathanlewis.wordpress.com/2018/11/27/dump-logfile/

      Comment by Jonathan Lewis — September 9, 2019 @ 9:55 am BST Sep 9,2019 | Reply

      • Hi Jonathan,

        I mean that I’d only noticed the difference between the “same values” output on different versions in my screenshots after your quiz:

        > I usually do take the simplest approach – switch logfiles before the critical point, then dump the whole log file
        Oh, yes, and my script contains 3 “alter system switch logfile” between the tests too.

        Comment by Sayan Malakshinov (@xtner) — September 9, 2019 @ 9:56 pm BST Sep 9,2019 | Reply

        • Sayan,

          It’s okay, I got the context, I just went off at a tangent to your blog notes.
          I’ve just rediscovered some tests I did on this last year which I had forgotten about, and noted that
          a) if there’s a before row trigger on the table that does a no-change change then you still get the undo/redo values for the column
          b) if you simply reference the “:old.nochange_change” column in a before row trigger then the optimisation doesn’t happen
          c) because materialized view logs use triggers and reference :old values, they disable the feature
          d) if you enable minimal supplemental logging the whole feature is turned off.

          So much I didn’t know I knew – and so much to retest just in case I got it wrong!

          Comment by Jonathan Lewis — September 10, 2019 @ 9:27 am BST Sep 10,2019

  2. Normally, Oracle updates multiple rows (‘x’ case) and, therefore, packs the redo information for multiple rows in a single redo entry. In contrast, when the value doesn’t change, the database seems to generate a redo entry per row (‘0’ case). This might be a part of the optimisation introduced in 12.2 to avoid generating redo information when the updated column doesn’t change. The paradox of higher redo size when the value didn’t change is caused by the “lock row piece” (op code 11.4) entry which is now generated for each row, as opposed to “update array” (op code 11.19) which is generated only once for many rows – the overhead outweighs the benefit in this case.

    Comment by Nenad Noveljic (@NenadNoveljic) — September 9, 2019 @ 9:53 am BST Sep 9,2019 | Reply

    • Nenad,

      You’ve just saved me the effort of writing up the answer – that’s exactly it. Oracle does a row by row “lock row piece” when the column doesn’t change but an array update when the column does change. It’s interesting to note that I commented on exactly this locking requirement more than 12 years ago. (And look how much time it took Oracle Corp. to get there – clearly a non-trivial change.)

      For an idea of what further problems might show up, you need only read the rest of that article to see what else I’ll want to check. (Unless someone else has done all the tests already, of course).

      Comment by Jonathan Lewis — September 9, 2019 @ 10:05 am BST Sep 9,2019 | Reply

      • Jonathan,

        In conclusion, the benefit is proportional to the total amount of changed information and inversely proportional to the number of updated rows. In other words, the optimization works best when a large column in a single row is updated. In contrast, the worst case is many updated rows with a single-byte column, like status varchar2(1) – a frequent real-world scenario. But knowing the size of the “lock row piece” entry, the size of the updated columns and having an estimate of the rows to update, it shouldn’t be difficult to calculate the break-even point – after which the optimization becomes harmful.

        Comment by Nenad Noveljic (@NenadNoveljic) — September 9, 2019 @ 10:20 am BST Sep 9,2019 | Reply
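        Nenad’s break-even argument can be sketched numerically – a toy model in which the per-row costs are illustrative guesses loosely inferred from the test above, not measured Oracle constants:

```python
def redo_per_row(changed_bytes, lock_overhead=220, base_overhead=60):
    """Toy break-even model (all constants are illustrative guesses):
    compare per-row redo for a no-change update (pure row lock) against
    per-row redo for logging old and new values in an array update."""
    no_change_cost = lock_overhead                   # per-row lock entry
    change_cost = base_overhead + 2 * changed_bytes  # undo + redo values
    return no_change_cost, change_cost

for size in (1, 10, 100, 1000):
    lock, log = redo_per_row(size)
    verdict = "optimisation helps" if lock < log else "optimisation hurts"
    print(f"{size:>5}-byte column: lock={lock}, log={log} -> {verdict}")
```

With these (assumed) costs the crossover sits around an 80-byte column: tiny status-flag columns lose, large columns win – exactly the shape of Nenad’s argument.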

        • Nenad,
          I think it may be more complicated than that.

          First of all, I’d guess it’s aimed at batch processes, on the assumption that spending a little extra CPU on OLTP processing of a handful of rows isn’t going to be noticeable to the typical end-user, and a little extra “wastage” won’t make a lot of difference.

          So the question then becomes – when does Oracle decide on the strategy, and how? What happens if you want to update 3 or 4 columns in a row, but one of the columns is often updated with no change? What if it’s a condition about the statement that makes it run in single row mode? What if it’s something about the first array-full of rows that triggers single-row mode? A very small saving for a small number of rows might trigger single row processing for a very large update where the saving is small and rare.

          Example:
          I changed the code above to include both lpad() options, using columns named small_vcX and small_vc0. An update setting both to upper(column) went into single row processing even though EVERY row was subject to real change. (Op code 11.5 – update row piece, with a redo change vector showing only one column change, and a 5.1 showing an undo record for one column change.)

          Comment by Jonathan Lewis — September 9, 2019 @ 12:55 pm BST Sep 9,2019

  3. […] update doesn’t actually change the column value (and in 12.2 there’s an interesting change to the Oracle code that comes into play in that case) but it will still generate redo and undo for […]

    Pingback by Update restarts | Oracle Scratchpad — September 10, 2019 @ 11:00 am BST Sep 10,2019 | Reply


Comments and related questions are welcome.
