Oracle Scratchpad

September 10, 2019

Update restarts

Filed under: Infrastructure,Oracle,redo,undo — Jonathan Lewis @ 11:00 am BST Sep 10,2019

Somewhere I think I’ve published a note about an anomaly that’s been bugging me since at least Oracle 10g – but if it is somewhere on the Internet it’s hiding itself very well and I can’t find it, though I have managed to find a few scripts on my laptop that make a casual reference to the side effects of the provlem. [Ed: a tweet from Timur Ahkmadeev has identified a conversation in the comments on an older post that could explain why I thought I’d written something about the issue.]

Anyway, I’ve decided to create some new code and write the article (all over again, maybe). The problem is a strange overhead that can appear when you do a simple but large update driving off a tablescan.

If you just want the conclusion and don’t feel like reading the detail, it’s this: a big update may roll itself back before it gets to the end of the tablescan and restart “invisibly”, doing a “select for update” (2nd tablescan, one row at a time) followed by the update (3rd tablescan, by array) massively increasing the redo and undo generated.

Here’s the script I’ve created to demonstrate the issue. As its header suggests the results I’ll be showing come from 12.1.0.2


rem
rem     Script:         update_restart.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Seo 2019
rem     Purpose:        
rem
rem     Last tested 
rem             12.1.0.2
rem

create table t1 
pctfree 80 
nologging
as
with g as (
        select rownum id 
        from dual 
        connect by level <= 500 -- > comment to avoid wordpress format issue
)
select
        rownum id, 
        lpad(rownum,10) v1
from    g,g
/

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

select  blocks, num_rows
from    user_tables 
where   table_name = 'T1'
/

I’ve created a table with 250,000 rows (500 * 500) and gathered stats on it (the table was 3,345 blocks) as  gathering stats tends to sort out any issues of block cleanout that might confuse the issue. Having done that I’m going to do an update of every row in the table:


alter system switch logfile;
execute snap_my_stats.start_snap

update t1 set id = id;

execute snap_my_stats.end_snap
execute dump_log

The update doesn’t actually change the column value (and in 12.2 there’s an interesting change to the Oracle code that comes into play in that case) but it will still generate redo and undo for every row in the table, and update ITL entries and do all the rest of the work we normally associate with an update.

Before the update I’ve issued a “switch logfile” and after the update I’ve called a procedure I wrote to do a symbolic dump of the current log file (“alter system dump logfile”). The calls to the snap_my_stats package allow me to see the changes in the session activity stats due to the update. A simple check of these statistics can often give you a good idea of what’s going on. So you might ask yourself what you might expect to see in the stats from this update. (As Isaac Asimov once said: The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’, but ‘That’s funny …’)

Having updated 250,000 rows by tablescan I think we might expect to see a tablescan of 250,000 rows; we might also expect to see (in the worst case) something like 250,000 redo entries although we might expect to see far fewer since Oracle can probably do array updates and create one redo change vector and one undo record per block changed rather than per row changed.

So here are some of the more interesting numbers that we actually see (though the specific results are not 100% reproducible):

Name                                                                     Value
----                                                                     -----
session logical reads                                                  270,494
db block gets                                                          263,181
consistent gets                                                          7,313
db block changes                                                       534,032
free buffer requested                                                    7,274
switch current to new buffer                                             3,534
redo entries                                                           265,128
redo size                                                           83,803,996
rollback changes - undo records applied                                  1,002
table scan rows gotten                                                 522,470
table scan blocks gotten                                                 6,881 
HSC Heap Segment Block Changes                                         264,135
Heap Segment Array Updates                                              14,135

A pair of figures that stand out as most surprising are the tablescan figures – I’ve got a table of 3,345 blocks and 250,000 rows, but my tablescans total 6,881 blocks and 522,370 rows – I seem to have scanned the table (slightly more than) twice! At the same time I’ve requested 7,274 free buffers (some of them probably for undo blocks) and “switched current to new buffer” 3,534 times – so I seem to have read every table block then cloned it in memory.

Side note: the “switched current to new buffer” is a common feature of a tablescan update and not particularly relevant to this article. I assume the intent is to “leave behind” a read-consistent copy for any other queries that might have started before the big update.

Another really strange statistic is the one that says “rollback changes – undo records applied” – I’ve used 1,002 undo records in a rollback operation of some sort. I haven’t show you the stats for “user rollbacks” or “transaction rollbacks” because, despite the undo records being applied, there were no rollbacks reported.

What about changes? We see 534,000 “db block changes” – and maybe it’s okay for that statistic to be roughly twice the number of rows we expect to change because for every row we change we have to generate some undo, so may have modify an undo block to create an undo record – but there are still 34,000 “db block changes” too many. The number of changes is fairly consistent with the number of “redo entries”, of course, allowing for a factor of 2 because redo entries often consist of a matching pair of changes, one for the “forward change” and one for the “undo block change”. Again, though, there a bit of an excess: the 34,000 extra “db block changes” seem to have resulted in an excess 15,000 “redo entries”. And we still have to ask: “Whatever happened to array processing ?”

In fact, we do see some array processing – we did 14,135 “Heap Segment Array Updates” which, at an average of about 18 rows per array would account for our 250,000 row table – but we also did 264,135 “HSC Heap Segment Block Changes”, and here’s a (“That’s funny …”) coincidence: 264,135 – 14,135 = 250,000: the number of rows in the table. Maybe that means something, maybe not. Another funny coincidence: 264,135 (HSC Heap Segment Block Changes) + 1,002 (rollback changes – undo records applied) = 265,137 which is remarkably close to the number of “redo entries”.

If, by now, you’re not beginning to suspect that something a little odd has happened you haven’t been reading enough Agatha Christie novels, or watching enough Sherlock Holmes (or House) videos.

I don’t always … dump trace files but when I do I like to go over the top. My update generated 83MB of redo so I made sure my log files were 100MB each, and when I dumped the current log file I got a trace file that was a little over 466MB and 11M lines. Clearly I had no intention of reading it all. A critical skill in reading ANY type of trace file is knowing in advance what you ought to be looking for and working out a painless way of finding it. (Practising by slugging your way through a very small example is a good starting point.)

In this case I started by using grep to extract all the lines containing “OP:”, stripping them down to just the “OP:” followed by the redo op code, and then sorting and summarising the use of op codes. Here (with some annotations) is what I got:

grep OP or32_ora_14652.trc | sed 's/.*OP/OP/' | sed 's/ .*$//' | sort | uniq -c | sort -n

      1 OP:11.3
      2 OP:10.4
      2 OP:11.5
      3 OP:23.1
      3 OP:4.1
      4 OP:14.4
      4 OP:22.5
      5 OP:13.22
      8 OP:11.2
      8 OP:22.2
     13 OP:10.2
     25 OP:5.4
    119 OP:5.11         Mark undo as applied during rollback
    883 OP:5.6          Applying undo - NB 883 + 119 = 1002
   3776 OP:5.2          Get undo segment header
  15139 OP:11.19        Array update of table
 250000 OP:11.4         Lock table row piece
 264190 OP:5.1          Update undo block

The interesting point here is that I seem to have locked every single row in the table separately, but I’ve also done array updates on the table, and I’ve done some rolling back. So I need to do a more detailed check to see when things happened. So let’s begin by examining a few thousand lines near the start of the trace:

head -10000 or32_ora_14652.trc | grep -n OP | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

68:CHANGE #1  OP:11.19
214:CHANGE #2  OP:5.2
217:CHANGE #3  OP:11.19
363:CHANGE #4  OP:11.19
509:CHANGE #5  OP:5.2
512:CHANGE #6  OP:11.19
658:CHANGE #7  OP:11.19
692:CHANGE #8  OP:11.19
838:CHANGE #9  OP:11.19
984:CHANGE #10  OP:11.19
1130:CHANGE #11  OP:11.19
1276:CHANGE #12  OP:11.19
1422:CHANGE #13  OP:11.19
1568:CHANGE #14  OP:11.19
1714:CHANGE #15  OP:5.2
1717:CHANGE #16  OP:11.19
1863:CHANGE #17  OP:11.19
2009:CHANGE #18  OP:11.19
2155:CHANGE #19  OP:11.19
2301:CHANGE #20  OP:11.19
2447:CHANGE #21  OP:11.19
2593:CHANGE #22  OP:11.19
2739:CHANGE #23  OP:11.19
2885:CHANGE #24  OP:5.2
2888:CHANGE #25  OP:11.19
3034:CHANGE #26  OP:11.19
3180:CHANGE #27  OP:5.1
3336:CHANGE #28  OP:5.1
3489:CHANGE #29  OP:5.1
3642:CHANGE #30  OP:5.1
3795:CHANGE #31  OP:5.1
3836:CHANGE #32  OP:5.1
3989:CHANGE #33  OP:5.1
4142:CHANGE #34  OP:5.1
4295:CHANGE #35  OP:5.1
4448:CHANGE #36  OP:5.1
4601:CHANGE #37  OP:5.1
4754:CHANGE #38  OP:5.1
4907:CHANGE #39  OP:5.1
5060:CHANGE #40  OP:5.1
5213:CHANGE #41  OP:5.1
5366:CHANGE #42  OP:5.1
5519:CHANGE #43  OP:5.1
5672:CHANGE #44  OP:5.1
5825:CHANGE #45  OP:5.1
5978:CHANGE #46  OP:5.1
6131:CHANGE #47  OP:5.1
6284:CHANGE #48  OP:5.1
6440:CHANGE #1  OP:5.1
6593:CHANGE #2  OP:11.19
6742:CHANGE #1  OP:5.1
6895:CHANGE #2  OP:11.19
7044:CHANGE #1  OP:5.1
7197:CHANGE #2  OP:11.19
7346:CHANGE #1  OP:5.1
7499:CHANGE #2  OP:11.19
7648:CHANGE #1  OP:5.1
7801:CHANGE #2  OP:11.19
7950:CHANGE #1  OP:5.2
7953:CHANGE #2  OP:5.1
8106:CHANGE #3  OP:11.19
8255:CHANGE #1  OP:5.1
8408:CHANGE #2  OP:11.19
8557:CHANGE #1  OP:5.1
8710:CHANGE #2  OP:11.19
8859:CHANGE #1  OP:5.1
9012:CHANGE #2  OP:11.19
9161:CHANGE #1  OP:5.1
9314:CHANGE #2  OP:11.19
9463:CHANGE #1  OP:5.1
9616:CHANGE #2  OP:11.19
9765:CHANGE #1  OP:5.1
9918:CHANGE #2  OP:11.19


Everything up to line 6284 (change# 48) is one big “private redo” record where we do several “11.19 (table array updates)” followed by a matching group of “(5.1 update undo block)”, before switching to the “public redo” mechanism used by long transactions where each redo record is a matched pair of “backward” and “forward” change vectors, i.e. one 5.1 with a matching 11.19.

So we start by doing array updates without doing any single row locking – so I want to know when the array updates stop. Here’s a small but continuous piece from the next grep I ran (which produced an output totalling 15,000 lines):

grep -n "OP:11.19" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//' 
...
446693:-OP:11.19
446736:-OP:11.19
446891:-OP:11.19
447045:-OP:11.19
447200:-OP:11.19
7461225:-OP:11.19    *** Big jump
7461527:-OP:11.19
7461829:-OP:11.19
7462131:-OP:11.19
7462321:-OP:11.19
7462511:-OP:11.19
...

The 11.19 opcodes were appearing roughly every 300 lines of trace, except they stopped after 2,004 occurrences then disappeared for about 7M lines before reappering and running evenly (ca. every 300 lines) to the end of the trace file. Clearly we need to know what happened somewhere around lines 447200 and 7461225. But before looking too closely, let’s find out when the “11.4 lock row piece” and “5.6 Apply undo” start to appear”

grep -n "OP:11.4" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//' 

447374:-OP:11.4
447405:-OP:11.4
447433:-OP:11.4
447461:-OP:11.4
447489:-OP:11.4
447517:-OP:11.4
...
... lots of evenly spread 11.4, every 28 lines
...
7460948:-OP:11.4
7460976:-OP:11.4
7461004:-OP:11.4
7461032:-OP:11.4
7461060:-OP:11.4

Note where the 11.4 opcodes begin and end – they fit exactly into the gap between the two sets of 11.19 opcodes. And where do the 5.6 op codes come into play ?

grep -n "OP:5.6" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//'

295919:-OP:5.6
296046:-OP:5.6
296201:-OP:5.6
296356:-OP:5.6
...
... a few hundred appearances of 5.6, common spaced by 155 lines 
...
446529:-OP:5.6
446684:-OP:5.6
446727:-OP:5.6
446882:-OP:5.6
447191:-OP:5.6

That was the entire set of “5.6 Apply undo” records – a short burst of activity which finishes just before the big gap in the 11.19 opcodes . This leads to the question: “does anything difference happen around the time that the 5.6 op codes start to appear?”  Time to look at a very small extract from a very big grep output:

grep -n "OP" or32_ora_14652.trc | sed 's/ CON_ID.*OP/:-OP/' | sed 's/ ENC.*$//' 
..
294895:CHANGE #1:-OP:5.1
295048:CHANGE #2:-OP:11.19

295197:CHANGE #1:-OP:5.1
295322:CHANGE #2:-OP:11.19

295443:CHANGE #1:-OP:5.1
295596:CHANGE #2:-OP:11.19

295745:CHANGE #1:-OP:22.2

295753:CHANGE #1:-OP:22.5

295760:CHANGE #1:-OP:14.4
295766:CHANGE #2:-OP:22.2

295773:CHANGE #1:-OP:11.19
295919:CHANGE #2:-OP:5.6

295928:CHANGE #1:-OP:11.19
296046:CHANGE #2:-OP:5.6

296055:CHANGE #1:-OP:11.19
296201:CHANGE #2:-OP:5.6

...

I’ve inserted blank lines between redo records in this tiny extract from around the point where the 5.6 opcodes appear. The first few records show Oracle doing the update, then there’s a little bit of space management (Op codes in the 22 and 14 layer) and then Oracle starts rolling back. (Note: whether updating or rolling back Oracle is applying “11.19 table array updates”, but the “5.6 undo applied” appear as change vector #2 in the redo record as we rollback, while the “5.1 create undo” appear as change vector #1 as we make forward changes).

If we count the number of 11.19 opcodes up to the point where the 11.4 opcodes appear we find that we’ve used 1,002 to update the table, then another 1,002 to rollback the update.

Summary

Without going into very fine detail examining individual redo change vectors, what we’ve seen is as follows:

  • On executing the very large update the session activity statistics show the number of rows accessed by tablescan was more than twice the expected number. This obvious oddity made me look more closely and notice the surprising “rollback changes – undo records applied” which just shouldn’t have been there at all, and this led me to more closely at the number of redo entries which averaged more than one per row when it should have been a much smaller number due to Oracle’s internal array update mechanism.
  • Extracting just the redo op codes from a dump of the redo log generated during this update we’ve extracted summary information that the work we’ve done includes: (a) locking every single row in the table separately, (b) performing a load of table array updates, (c) applying a significant number of undo records to rollback changes.
  • Digging in for just a little more detail we see (in order of activity):
    • A thousand array updates  (roughly 20,000 rows by tablescan – I did actually check a few redo records in detail, 20 rows was a common array size)
    • The thousand updates being rolled back (luckily a long time before we got near the end of the first tablescan)
    • 250,000 rows locked individually (250,000 rows by tablescan – the 2nd tablescan) (NB each LKR – lock row – undo record is 68 bytes of undo)
    • 13,000 array updates   (250,000 rows by tablescan – the third tablescan) – at roughly 100 bytes per row of “overhead” in the undo

I don’t know why this happens – and from memory I think it started happening in 10g. I have an idea it relates to changes in the global SCN as the update takes place and something (possibly related to space management updates) making the update think it has to go into “write consistent” mode as its “start SCN” gets too far away from the current SCN.

The bottom line on this, though, is that there may be cases where a large, tablescan-driven, update could actually do the tablescan three times, and generate far more undo and redo than it should. In theory I could have completed the update in about 13,000 pairs of (11.19, 5.1) change vectors. In practice this simple test produced far more work.

And it gets worse …

Update – prompted by a tweet from Tanel Poder

In one of my tests I had added the /+ monitor */ hint to see if the activity would be reported correctly. It didn’t look quite right – but what I hadn’t noticed was that the monitor reported the statement as executing 3 times (and, of course, the tablescan executing 3 times):

Global Information
------------------------------
 Status              :  DONE
 Instance ID         :  1
 Session             :  TEST_USER (14:13842)
 SQL ID              :  9a166bpd7455j
 SQL Execution ID    :  16777216
 Execution Started   :  09/10/2019 09:41:54
 First Refresh Time  :  09/10/2019 09:41:54
 Last Refresh Time   :  09/10/2019 09:42:00
 Duration            :  6s
 Module/Action       :  MyModule/MyAction
 Service             :  orcl
 Program             :  sqlplus@vbgeneric (TNS V1-V3)

Global Stats
===================================================================
| Elapsed |   Cpu   |    IO    |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
===================================================================
|    5.84 |    5.77 |     0.01 |     0.06 |     1M |   42 |  26MB |
===================================================================

SQL Plan Monitoring Details (Plan Hash Value=2927627013)
=========================================================================================================================================
| Id |      Operation       | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity | Activity Detail |
|    |                      |      | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |   (# samples)   |
=========================================================================================================================================
|  0 | UPDATE STATEMENT     |      |         |      |         4 |     +3 |     3 |        0 |      |       |          |                 |
|  1 |   UPDATE             | T1   |         |      |         6 |     +1 |     3 |        0 |      |       |   100.00 | Cpu (6)         |
|  2 |    TABLE ACCESS FULL | T1   |    250K |  432 |         4 |     +3 |     3 |     698K |   42 |  26MB |          |                 |
=========================================================================================================================================

As you see from Tanel’s tweet, he has a little script you can use to scan through the library cache to see if any (big) updates have suffered from this problem.

 

Addendum

I couldn’t quite manage to leave this problem alone, so I’ve run the test and dumped the redo log three more times and there’s an interesting coincidence. Every time I ran the test the switch from updating to rolling back had these 4 (annotated) redo opcodes separating them:


158766:CHANGE #1:-OP:22.2        ktfbhredo - File Space Header Redo
158774:CHANGE #1:-OP:22.5        ktfbbredo - File BitMap Block Redo:
158781:CHANGE #1:-OP:14.4        kteop redo - redo operation on extent map
158787:CHANGE #2:-OP:22.2        ktfbhredo - File Space Header Redo:

The group represents some allocation of space from a file to an object – so the question is: what object gets an extent map update. Here’s the full redo vchange vector for OP:14.4


REDO RECORD - Thread:1 RBA: 0x001cdd.000028a4.00d8 LEN: 0x00b0 VLD: 0x01 CON_UID: 0
SCN: 0x0b86.0db28b4c SUBSCN: 88 09/10/2019 15:06:13
CHANGE #1 CON_ID:0 TYP:0 CLS:29 AFN:2 DBA:0x00800688 OBJ:4294967295 SCN:0x0b86.0db28b4c SEQ:10 OP:14.4 ENC:0 RBL:0 FLG:0x0000
kteop redo - redo operation on extent map
   ADD: dba:0x802380 len:128 at offset:6
  ADDRET: offset:6 ctime:0
   SETSTAT: exts:9 blks:911 lastmap:0x0 mapcnt:0
   UPDXNT: extent:6 add:TRUE

The “ADD” is adding extent id 6, of 128 blocks, at file 2 block 9088. That happens to be the undo segment which my update statement is using. Further 14.4 change vectors that appear through the trace file take the size of the undo segment up to 35 extents, but in the (small number of ) tests I did it was always first 14.4 that seemed to trigger the rollback and restart.

Preliminary Hypothesis

When the transaction does sufficient work that it forces its undo segment to extend some feature of the resulting recursive transaction causes the transaction to rollback and restart.

This does fit with the observation that the number of rows updated before the rollback/restart occurs varies apparently at random. It also fits the observation that an update that doesn’t rollback and restart in 12.1 does rollback and restart in 12.2 because it gets a bit unlcky using the “single row locking” optimisation for “nochange update” optimisation.

2 Comments »

  1. Hi Jonathan,

    The underlying low-level cause of this issue is ORA-01551: extended rollback segment, pinned blocks released.
    I have written three posts on this subject and have an open SR with Oracle that moving along slowly ( SR 3-20662248811 : DML RESTART HAPPENING DESPITE ANY CONCURRENT SESSION):
    https://mvelikikh.blogspot.com/2019/08/that-old-restart-problem-strikes-back.html
    https://mvelikikh.blogspot.com/2019/08/that-old-restart-problem-strikes-back_2.html
    https://mvelikikh.blogspot.com/2019/08/that-old-restart-problem-strikes-back_9.html

    Regards,
    Mikhail.

    Comment by Mikhail Velikikh — September 10, 2019 @ 3:47 pm BST Sep 10,2019 | Reply

    • Mikhail,

      Thanks for the comment – it got stuck in moderation while I was writing up my further tests of the significance of op code 14.4, but it looks like you’ve already got an answer that confirms my preliminary hypothesis – so I’m about to read your three articles (and anything I can find related to the SR) to get a little more detail.

      Comment by Jonathan Lewis — September 10, 2019 @ 4:05 pm BST Sep 10,2019 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.