Oracle Scratchpad

April 21, 2017

Undo Understood

Filed under: Infrastructure,Oracle,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 10:45 am BST Apr 21,2017

It’s hard to understand all the ramifications of Oracle’s undo handling, and it’s not hard to find cases where the resulting effects are very confusing. A recent post on the OTN database forum resulted in one response insisting that the OP was obviously updating a table with frequent commits from one session while querying it from another, thereby generating a large number of undo reads in the querying session.

That’s a possible cause of the symptoms that had been described – although not the only possible cause, especially since the symptoms hadn’t been described completely. It’s actually possible to see this type of activity when there are no updates to, and no outstanding commits on, the target table. Unfortunately it’s quite hard to demonstrate this with a quick, simple script in recent versions of Oracle unless you do some insanely stupid things to make the problem appear – but I’m pretty good at doing “insanely stupid” with Oracle, so here we go; first some data creation:

rem
rem     Script:         undo_rec_apply_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2017
rem

create table t2(v1 varchar2(100));
insert into t2 values(rpad('x',100));
commit;

create table t1
nologging
pctfree 99 pctused 1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 8e4 -- > comment to bypass WordPress formatting issue
;

alter table t1 add constraint t1_pk primary key(id)
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

The t2 table is there as a target for a large number of updates from a session other than the one that’s going to display the problem. The t1 table has been defined and populated in a way that puts one row into each of 80,000 blocks (though, with ASSM and my specific tablespace definition of uniform 1MB extents, the total space allocation is about 80,400 blocks). I’ve got a primary key declaration that allows me to pick single rows/blocks from the table if I want to.
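
If you want to check the “one row per block” claim for yourself, a query along the following lines (using the standard dbms_rowid package) will do it; both figures should come back as 80,000:

rem
rem     Sanity check: count rows and distinct (file, block) addresses in t1
rem

select
        count(*)        row_count,
        count(
                distinct
                dbms_rowid.rowid_relative_fno(rowid) || '.' ||
                dbms_rowid.rowid_block_number(rowid)
        )               block_count
from
        t1
;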

At this point I’m going to do a lot of updates to the big (t1) table using a hugely inefficient strategy to emulate the type of thing that can happen on a very large table with lots of random updates and many indexes to maintain:


begin
        for i in 1..800 loop
                update t1 set v1 = upper(v1) where id = 100 * i;
                execute immediate 'alter system switch logfile';
                execute immediate 'alter system flush buffer_cache';
                commit;
                dbms_lock.sleep(0.01);
        end loop;
end;
/

set transaction read only;

I’m updating every 100th row/block in the table with single row commits, but before each commit I’m switching log files and flushing the buffer cache. Because of the calls to dbms_lock.sleep() the code will take something like 8 seconds to complete.

 

Warning: This is NOT an experiment to try on a production system, or even a development system if there are lots of busy developers or testers around – and if you’re running your dev/test in archivelog mode (which, for some of your systems, you should be doing) you’re going to end up with a lot of (small) archived redo logs. I have to do the log file switch to ensure that the updated blocks are unpinned so that they will be written to disc and flushed from the cache by the buffer cache flush. (This extreme approach would not have been necessary in earlier versions of Oracle but the clever developers at Oracle Corp. keep adding “damage limitation” touches to the code – which I then have to work around to cause damage with small test cases.) Because the block has been flushed from memory before the commit, the session will increment the session/instance activity statistic “commit cleanout failures: block lost” on each commit. By the time this loop has run to completion there will be 800 blocks from the table on disc needing a “delayed block cleanout”.
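
If you want to watch this happening you could check that statistic from the session running the loop with something like the following; run it before and after the loop and the difference should be close to 800:

rem
rem     Session-level value of the failed commit cleanout statistic
rem

select
        nam.name, stat.value
from
        v$mystat        stat,
        v$statname      nam
where
        nam.statistic# = stat.statistic#
and     nam.name = 'commit cleanout failures: block lost'
;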

Despite the extreme brute force I use in this loop, there is a further very important detail that has to be set before this test will work (at least in 11.2.0.4, which is what I’ve used in my test runs). I had to start the database with the hidden parameter _db_cache_pre_warm set to false. If I didn’t start the database with this feature disabled Oracle would notice that the buffer cache had a lot of empty space and would “pre-warm” the cache by loading a few thousand blocks from t1 as I updated one row – with the side effect that the update from the previous cycle of the loop would be cleaned out on the current cycle of the loop. If you do run this experiment, remember to reset the parameter and restart the instance when you’ve finished.
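
For completeness, this is the sort of thing I mean by starting the database with the feature disabled; treat it as a sketch only, and don’t set hidden parameters on any system that matters without checking with Oracle support first:

rem
rem     Sketch only: disable the pre-warm feature, restart, then put it back
rem

alter system set "_db_cache_pre_warm" = false scope = spfile;
shutdown immediate
startup

rem     ... run the test, then reset and restart ...

alter system reset "_db_cache_pre_warm" scope = spfile;
shutdown immediate
startup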

I’ve finished this chunk of code with a call to “set transaction read only” – this emulates the start of a long-running query: it captures a point in time (through the current SCN) and any queries that run in the session from now on have to be read-consistent with that point in time. After doing this I need to use a second session to do a bit of hard work – in my case the following:

execute snap_rollstats.start_snap

begin
        for i in 1..10000 loop
                update t2 set v1 = upper(v1);
                update t2 set v1 = lower(v1);
                commit;
        end loop;
end;
/

execute snap_rollstats.end_snap

The calls to the snap_rollstats package simply read v$rollstat and give me a report of the changes in the undo segment statistics over the period of the loop. I’ve executed 10,000 transactions in the interval, which was sufficient on my system to use each undo segment header at least 1,000 times and (since there are 34 transaction table slots in each undo segment header) overwrite each transaction table slot about 30 times. You can infer from these comments that I have only 10 undo segments active at the time; your system may have many more (check the number of rows in v$rollstat), so you may want to scale up that 10,000 loop count accordingly.
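
If you don’t have a copy of the snap_rollstats package, a simple before/after comparison of v$rollstat gives you the same raw information; something like the following, captured before and after the loop, is enough to see how hard each undo segment header has been worked:

rem
rem     One row per online undo segment: usn identifies the segment
rem

select
        usn, xacts, gets, writes, wraps
from
        v$rollstat
order by
        usn
;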

At this point, then, the only thing I’ve done since the start of my “long running query” is to update a different table from a different session. What happens when I do a simple count() from t1 that requires a full tablescan?

alter system flush buffer_cache;

execute snap_filestat.start_snap
execute snap_my_stats.start_snap

select count(v1) from t1;

execute snap_my_stats.end_snap
execute snap_filestat.end_snap

I’ve flushed the buffer cache to get rid of any buffered undo blocks – again an unreasonable thing to do in production but a valid way of emulating the aging out of undo blocks that would take place in a production system – and surrounded my count() with a couple of packaged calls to report the session activity stats and file I/O stats due to my query. (If you’re sharing your database then the file I/O stats will be affected by the activity of other users, of course, but in my case I had a private database.)
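
Again, if you don’t have the snap_filestat and snap_my_stats packages, the underlying information comes straight from the dynamic performance views (v$filestat for the file-level figures, v$mystat for the session figures); a minimal sketch for the file-level part, captured before and after the query so you can report the differences, would be something like:

rem
rem     File-level read statistics, one row per datafile
rem

select
        fs.file#, df.name, fs.phyrds, fs.phyblkrd, fs.singleblkrds, fs.readtim
from
        v$filestat      fs,
        v$datafile      df
where
        df.file# = fs.file#
order by
        fs.file#
;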

Here are the file stats:


--------------
Datafile Stats
--------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max
File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------    ---      ------      ------   ---------    ---
    1          17          17       1.000        .065          17        .065           0        .000      6           0           0        .000     15
/u01/app/oracle/oradata/TEST/datafile/o1_mf_system_938s4mr3_.dbf
    3         665         665       1.000        .020         665        .020           0        .000      6           0           0        .000     15
/u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         631      80,002     126.786        .000           2        .045         629        .000      6           0           0        .000     17
/u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__cz1w7tz1_.dbf

As expected I’ve done a number of multiblock reads of my data tablespace for a total of roughly 80,000 blocks read. What you may not have expected is that I’ve done 665 single block reads of the undo tablespace.

What have I been doing with all those undo blocks? Check the session activity stats (restricted here to a critical pair):


Session stats
-------------
Name                                                                     Value
----                                                                     -----
transaction tables consistent reads - undo records applied              10,014
transaction tables consistent read rollbacks                                10

We’ve been reading undo blocks so that we can create read-consistent copies of the 10 undo segment headers that were active in my instance. We haven’t (and you’ll have to trust me on this, I can’t show you the stats that aren’t there!) reported any “data blocks consistent reads – undo records applied”.

If you want to see a detailed explanation of what has happened you’ll need to read Oracle Core (UK source), chapter 3 (and possibly chapter 2 to warm yourself up for the topic). In outline the following type of thing happens:

  • Oracle gets to the first block updated in t1 and sees that there’s an ITL (interested transaction list) entry that hasn’t been marked as committed (we flushed the block from memory before the commit cleanout could take place so the relevant transaction is, apparently, still running and the row is still marked as locked).
  • Let’s say the ITL entry says the transaction was for undo segment 9, transaction table slot 11, sequence 999. Oracle reads the undo segment header block for undo segment 9 and checks transaction table slot 11, which is now at sequence 1032. Oracle can infer from this that the transaction that updated the table has committed – but can’t yet know whether it committed before or after the start of our “long running query”.
  • Somehow Oracle has to get slot 11 back to sequence 999 so that it can check the commit SCN recorded in the slot at that sequence number. This is where we see “undo records applied” to make the “transaction table read consistent”. It can do this because the undo segment header has a “transaction control” section in it that records some details of the most recent transaction started in that segment. When a transaction starts it updates this information, but saves the old version of the transaction control and the previous version of its transaction table slot in its first undo record, consequently Oracle can clone the undo segment header block, identify the most recent transaction, find its first undo record and apply the bit of the undo that unwinds the transaction table information. As it does so it has also wound the transaction control section backwards one step, so it can use that (older) version to go back another step … and so on, until it takes the cloned undo segment header so far back that it takes our transaction table slot back to sequence 999 – and the job is done, we can now check the actual commit SCN.  (Or, if we’re unlucky, we might receive an ORA-01555 before we get there).
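
As an aside, if you want to see the “undo segment / slot / sequence” triple that goes into an ITL entry, you can query v$transaction from (or about) a session that currently has an open transaction; the triple is reported as xidusn, xidslot and xidsqn:

rem
rem     One row per active transaction: xidusn/xidslot/xidsqn is the
rem     transaction id recorded in the ITL entry of each changed block
rem

select
        xidusn, xidslot, xidsqn, status, start_time
from
        v$transaction
;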

So – no changes to the t1 table during the query, but lots of undo records read because other tables had been changed by other sessions.

Footnote:

In my example the tablescan used direct path reads – so the blocks that went through delayed block cleanout were in private memory, which means they weren’t in the buffer cache and didn’t get written out to disc. When I flushed the buffer cache (again to emulate aging out of undo blocks etc.) and repeated the tablescan Oracle had to go through all that work of creating read consistent transaction tables all over again.
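
If you want to check whether your own tablescan used direct path reads (and is therefore exposed to this repetition) the session statistic “physical reads direct” is a convenient indicator, for example:

rem
rem     Direct path reads bypass the buffer cache and show up here
rem

select
        nam.name, stat.value
from
        v$mystat        stat,
        v$statname      nam
where
        nam.statistic# = stat.statistic#
and     nam.name = 'physical reads direct'
;

If the figure goes up by roughly the size of the table (about 80,000 blocks here) on each run of the tablescan then the reads were direct path, which is why the cleanout work isn’t preserved between runs.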

Footnote 2:

Despite the grand claim implied by the title I hope everyone realises that this is just one little detail of how undo handling can interfere with their expectations of performance in extreme cases.

 

4 Comments »

  1. […] instance has been up for about 60 hours – and 95% of the work it has done has been trying to find the commit times for transactions affecting blocks that are in need of delayed block cleanout. If you look at the […]

    Pingback by Ouch! | Oracle Scratchpad — June 7, 2019 @ 8:44 am BST Jun 7,2019

  2. Hi Jonathan, I have an issue with delayed block cleanout (in fact it’s not happening at all in my case – Oracle 19.11.0.0) causing a snapshot too old error. The table has only 1,200 rows and a few rows get modified and committed every night at 10pm. The report which runs the next morning has always been failing with ORA-01555, but a forced restart works. When I checked the disk copy of the block, the ITL entry still shows –U– (upper bound commit) with Lck 1 and the corresponding undo segment header transaction slot has been overwritten. I am trying to understand how the forced restart of the report works when the ITL entry is still showing the lock byte and the undo information is lost? Thanks.

    Comment by Tony — April 21, 2022 @ 10:44 pm BST Apr 21,2022

    • Tony,

      Sorry about the very long delay in replying.

      I can’t explain why the report fails on the first run, since it should be able to work out that the update must have committed before it started running.

      The restart succeeds because, by the time the ORA-01555 occurs, Oracle’s internal code has worked out an upper bound commit SCN (which, apparently, still hasn’t got far enough back to satisfy the first run of the report). Then, when the restart happens, its start SCN is higher than the upper bound SCN, so the report can say “I don’t know exactly when the commit happened, but it was in the past”.

      I can only think of two possibilities at present for the initial failure.
      a) The report is one of a long-running set that started before the update of the 1200 row table, and executed “set transaction read only” to make sure all the reports in the set were consistent with each other
      b) Some other process you don’t know about updated some other rows in the table at some time before the report started but didn’t commit for a very long time afterwards. You could try querying smon_scn_time with the SCNs that are flagged as upper bound to see what the most recent SCN was. The table holds one row every 5 minutes, so that’s a starting point for accuracy; beyond that there are up to 100 entries in a raw column that capture time/scn pairs every 3 seconds.

      For a 5 minute window you could start with:

      select  to_char(time_dp,'dd-Mon-yyyy hh24:mi:ss') scn_time
      from    smon_scn_time
      where   scn = (select max(scn) from smon_scn_time where scn <= {decimal scn})
      or      scn = (select min(scn) from smon_scn_time where scn >= {decimal scn})
      

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — May 18, 2022 @ 10:33 am BST May 18,2022

  3. […] jobs will sometimes need to do a lot of work to check read (and write) consistency, and to find “upper bound commit” SCNs: if you have a process executing a large number of transactions in a short period of time it […]

    Pingback by drop partition | Oracle Scratchpad — August 5, 2022 @ 8:24 pm BST Aug 5,2022



Comments and related questions are welcome.
