Oracle Scratchpad

OC 3 Transactions and Consistency

Addenda and Errata for Oracle Core Chapter 3 Transactions and Consistency

Back to Index

Addenda

p.28 Start and End of transaction, second paragraph: I’ve made a comment about a transaction ending with a redo change vector with OP code 5.4, and made later comments (e.g. p.30 Table 3-1 describing the transaction table) that a rollback actually ends with a commit. I haven’t demonstrated the similarity between commits and rollbacks, so here are a couple of dumps from a redo log file (9.2, where it’s easier to see). In both cases I’ve created a single table with a single row and column holding the value ‘AAAAAAAAAA’ and updated it to lower case – then issued either a commit or a rollback.
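For anyone who wants to reproduce the dumps, the steps are roughly as follows. This is only a sketch: the table name, column definition and logfile path are illustrative, and you need a suitably privileged account to dump the log file.

create table t1 (v1 varchar2(10));
insert into t1 values ('AAAAAAAAAA');
commit;

alter system switch logfile;            -- start with a nearly empty current log
update t1 set v1 = lower(v1);
commit;                                 -- or rollback; for the second test

select member from v$logfile
where  group# = (select group# from v$log where status = 'CURRENT');

alter system dump logfile '/u01/oradata/D920/redo01.log';    -- the member reported above

The dump appears in a trace file in user_dump_dest.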

First the commit. We have two redo records; the second is the “commit record” – i.e. OP code 5.4.
The first record has 4 change vectors:
Change #1: Start transaction
Change #2: Modify undo block
Change #3: Modify table block
Change #4: Transaction audit

REDO RECORD - Thread:1 RBA: 0x0003a8.00000002.0010 LEN: 0x01a0 VLD: 0x01
SCN: 0x0000.04b99d9e SUBSCN:  1 02/26/2012 11:16:59

CHANGE #1 TYP:0 CLS:19 AFN:2 DBA:0x00800019 SCN:0x0000.04b99d1c SEQ:  1 OP:5.2
ktudh redo: slt: 0x0006 sqn: 0x000021f4 flg: 0x0012 siz: 112 fbi: 0
            uba: 0x00800072.11af.36    pxid:  0x0000.000.00000000

CHANGE #2 TYP:0 CLS:20 AFN:2 DBA:0x00800072 SCN:0x0000.04b99d1b SEQ:  3 OP:5.1
ktudb redo: siz: 112 spc: 588 flg: 0x0012 seq: 0x11af rec: 0x36
            xid:  0x0002.006.000021f4
ktubl redo: slt: 6 rci: 0 opc: 11.1 objn: 48331 objd: 48331 tsn: 12
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x00800072.11af.2c
prev ctl max cmt scn:  0x0000.04b8d91d  prev tx cmt scn:  0x0000.04b8d920
KDO undo record:
KTB Redo
op: 0x03  ver: 0x01
op: Z
KDO Op code: URP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: 0
col  0: [10]  41 41 41 41 41 41 41 41 41 41

CHANGE #3 TYP:2 CLS: 1 AFN:11 DBA:0x02c0000a SCN:0x0000.04b99d95 SEQ:  1 OP:11.5
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0002.006.000021f4    uba: 0x00800072.11af.36
Block cleanout record, scn:  0x0000.04b99d9e ver: 0x01 opt: 0x02, entries follow...
  itli: 1  flg: 2  scn: 0x0000.04b99d95
KDO Op code: URP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 2 ckix: 0
ncol: 1 nnew: 1 size: 0
col  0: [10]  61 61 61 61 61 61 61 61 61 61

CHANGE #4 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:  0 OP:5.20
session number   = 9
serial  number   = 547
transaction name =

REDO RECORD - Thread:1 RBA: 0x0003a8.00000003.0010 LEN: 0x0054 VLD: 0x01
SCN: 0x0000.04b99da0 SUBSCN:  1 02/26/2012 11:17:01
CHANGE #1 TYP:0 CLS:19 AFN:2 DBA:0x00800019 SCN:0x0000.04b99d9e SEQ:  1 OP:5.4
ktucm redo: slt: 0x0006 sqn: 0x000021f4 srt: 0 sta: 9 flg: 0x2
ktucf redo: uba: 0x00800072.11af.36 ext: 1 spc: 474 fbi: 0

Then the rollback. We have three redo records; the third is the “commit record” – i.e. OP code 5.4.
The first record is as above (four change vectors).
The second record has two change vectors:
Change #1 – apply undo to the table
Change #2 – mark undo record as user applied

REDO RECORD - Thread:1 RBA: 0x0003a9.00000004.0010 LEN: 0x01a0 VLD: 0x01
SCN: 0x0000.04b99dd2 SUBSCN:  1 02/26/2012 11:18:20

CHANGE #1 TYP:0 CLS:27 AFN:2 DBA:0x00800059 SCN:0x0000.04b99dbb SEQ:  1 OP:5.2
ktudh redo: slt: 0x002f sqn: 0x000024f5 flg: 0x0012 siz: 112 fbi: 0
            uba: 0x00801b2e.0b17.30    pxid:  0x0000.000.00000000

CHANGE #2 TYP:0 CLS:28 AFN:2 DBA:0x00801b2e SCN:0x0000.04b99dba SEQ:  3 OP:5.1
ktudb redo: siz: 112 spc: 1216 flg: 0x0012 seq: 0x0b17 rec: 0x30
            xid:  0x0006.02f.000024f5
ktubl redo: slt: 47 rci: 0 opc: 11.1 objn: 48332 objd: 48332 tsn: 12
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x00801b2e.0b17.26
prev ctl max cmt scn:  0x0000.04b8ecb3  prev tx cmt scn:  0x0000.04b8ecb7
KDO undo record:
KTB Redo
op: 0x03  ver: 0x01
op: Z
KDO Op code: URP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: 0
col  0: [10]  41 41 41 41 41 41 41 41 41 41

CHANGE #3 TYP:2 CLS: 1 AFN:11 DBA:0x02c0000a SCN:0x0000.04b99dcd SEQ:  1 OP:11.5
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0006.02f.000024f5    uba: 0x00801b2e.0b17.30
Block cleanout record, scn:  0x0000.04b99dd2 ver: 0x01 opt: 0x02, entries follow...
  itli: 1  flg: 2  scn: 0x0000.04b99dcd
KDO Op code: URP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 2 ckix: 0
ncol: 1 nnew: 1 size: 0
col  0: [10]  61 61 61 61 61 61 61 61 61 61

CHANGE #4 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:  0 OP:5.20
session number   = 9
serial  number   = 549
transaction name =

REDO RECORD - Thread:1 RBA: 0x0003a9.00000005.0010 LEN: 0x00a4 VLD: 0x01
SCN: 0x0000.04b99dd3 SUBSCN:  1 02/26/2012 11:18:21

CHANGE #1 TYP:0 CLS: 1 AFN:11 DBA:0x02c0000a SCN:0x0000.04b99dd2 SEQ:  1 OP:11.5
KTB Redo
op: 0x03  ver: 0x01
op: Z
KDO Op code: URP row dependencies Disabled
  xtype: XR  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: 0
col  0: [10]  41 41 41 41 41 41 41 41 41 41

CHANGE #2 TYP:0 CLS:27 AFN:2 DBA:0x00800059 SCN:0x0000.04b99dd2 SEQ:  1 OP:5.11
ktubu redo: slt: 47 rci: 0 opc: 11.1 objn: 48332 objd: 48332 tsn: 12
Undo type:  Regular undo       Undo type:  User undo done    Begin trans    Last buffer split:  No
Tablespace Undo:  No
             0x00000000

REDO RECORD - Thread:1 RBA: 0x0003a9.00000005.00b4 LEN: 0x0044 VLD: 0x01
SCN: 0x0000.04b99dd4 SUBSCN:  1 02/26/2012 11:18:21
CHANGE #1 TYP:0 CLS:27 AFN:2 DBA:0x00800059 SCN:0x0000.04b99dd3 SEQ:  1 OP:5.4
ktucm redo: slt: 0x002f sqn: 0x000024f5 srt: 0 sta: 9 flg: 0x4
rolled back transaction

p.50 In the note, the script cleanout_3.sql demonstrates that direct path reads will not result in block cleanout. However, the relative sizes of the table and the buffer cache are critical. If the table is too small (or, conversely, the cache too large) newer versions of Oracle may run the query in parallel but still load the data into the buffer cache (11g – in-memory parallel query), which will result in block cleanout taking place. Make sure that you see the statistic “physical reads direct” with a value of about 500.
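A quick way to check is to query your session’s statistics immediately after the tablescan. This is just a sketch; the statistic names are as they appear in v$statname:

select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   ms.statistic# = sn.statistic#
and     sn.name in (
                'physical reads direct',
                'physical reads',
                'cleanout - number of ktugct calls'
        );

If ‘physical reads direct’ is still zero the scan went through the buffer cache, and the cleanout (and its redo) will have taken place.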

Errata

p.38 Last two lines: record 3 says “clear slot (row) 0x01 from the block…”; this should reference slot 0x00, not 0x01 (see comment 9 below).
p.40 Fifth line: 0x1f8d should be 0x1f8e
p.43 3rd and 4th lines from bottom of page: “the undo record that we eventually retrieved from the next transaction to use this ITL entry stored (C—,0,fsc).” The “fsc” should read “scn” – as shown in ITL entry 0x01 a few lines above the preceding Note section. (See comment 5 below)
p.49 2nd para (What’s happening …), fourth line: “consistent get – examination” should be “consistent gets - examination”.

3rd para from end, last line: “(kcmgrs, kcmgct)” should be “(kcmgrs, ktugct)”

p.51 2nd para (What’s happening …), fourth line: “consistent get – examination” should be “consistent gets - examination”.

Typos/Grammar/Style/Punctuation

p.28 Last note, line 6: “puts it into the redo log buffer, it’s just …” should be “puts it into the redo log. The creation of a commit record is just …”
p.34 Second note, line 4: “it’s quite possible that you’ve …” should be “It’s quite possible that you’re …”
p.36 Last line of last paragraph: “two rows of information” is missing a full-stop.
p.41 Diagram 3-3: The ITL for the block shows two XID entries which have been typed in incorrectly: 3.1843.9 and 5.1843.2. These look like {segment}.{sequence}.{slot}, when the order is actually {segment}.{slot}.{sequence}. (It’s also a little unlikely that two completely unrelated transactions should happen to get the same sequence number on their undo slots.) A quick sanity check using v$transaction is sketched after this list.
p.53 Last line of main text: “the uel value changes to 0x0017” is missing a full-stop.
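As a quick sanity check on the XID ordering mentioned in the note on Diagram 3-3: start a transaction and query v$transaction, which reports the three components separately (a sketch only):

select  xidusn   undo_seg,
        xidslot  undo_slot,
        xidsqn   wrap_seq
from    v$transaction;

The columns correspond to the undo segment, the slot in its transaction table, and the wrap (sequence) number, i.e. {segment}.{slot}.{sequence}.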

Back to Index

Comments

  1. Hi Jonathan,

    While going through chapter 3 (doing testing in parallel), I noticed that Oracle is dumping the block from disk in version 11.2.0. Did you notice this while writing the chapter? How does it affect following the concepts if we checkpoint prior to dumping the block?

    Thanks

    Comment by Antony — September 22, 2011 @ 6:51 pm GMT Sep 22,2011 | Reply

    • Antony,
      Yes, I had noticed that the behaviour of the dump command had changed. At one point I do mention that I’ve deliberately issued checkpoint calls to make sure that I get a copy of the results I want to explain. This does affect the timing of some of the actions – dbwr causes private redo to be applied before it writes the block – but doesn’t really change the concepts I’m trying to explain.
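      For reference, the sort of sequence involved is no more than this (a sketch; the file and block numbers are illustrative, and the output appears in a trace file in user_dump_dest):

      alter system checkpoint;                   -- dbwr writes the current blocks, applying any private redo first
      alter system dump datafile 5 block 131;    -- dump the (now current) on-disk copy of the block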

      Comment by Jonathan Lewis — September 25, 2011 @ 10:15 am GMT Sep 25,2011 | Reply

  2. Also, I don’t see the Flag getting updated with “U” every time after I commit.

    Itl entry before commit:

    Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
    0x01   0x000a.01b.0000080d  0x00c00c35.02e9.2b  ----    1  fsc 0x0000.00000000
    0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
    

    Itl entry after commit:

     Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
    0x01   0x000a.01b.0000080d  0x00c00c35.02e9.2b  ----    1  fsc 0x0000.00000000
    0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
    

    But x$ktuxe always shows the correct value..

    When the transaction is active..

         INDX KTUXESTA         KTUXECFL                      WRAP#       SCNW       SCNB   DBA_FILE  DBA_BLOCK        NUB
    --------- ---------------- ------------------------ ---------- ---------- ---------- ---------- ---------- ----------
           27 ACTIVE           NONE                           2061          0    2184727          3       3125          1
    

    after commit..

          INDX KTUXESTA         KTUXECFL                      WRAP#       SCNW       SCNB   DBA_FILE  DBA_BLOCK        NUB
    ---------- ---------------- ------------------------ ---------- ---------- ---------- ---------- ---------- ----------
           27 INACTIVE         NONE                           2061          0    2186715          3       3125          1
    

    Comment by Antony — September 22, 2011 @ 7:02 pm GMT Sep 22,2011 | Reply

    • Antony,

      Historically (roughly 7.3 and before, I think) the only change on commit was the change to the undo segment header block (which is the thing displayed in x$ktuxe). When commit cleanout was introduced it applied only to a limited number of the blocks affected by a transaction – it’s possible that you’ve modified more blocks than Oracle is prepared to modify on commit; the number is a percentage of the buffered blocks.

      (It’s always possible that 11.2 has modified the algorithm in a way that didn’t happen to show up in my tests but shows up in yours for some reason related to configuration.)
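      If you want to see how much cleanout your commit actually did, you can compare a few session statistics before and after the commit. Just a sketch; the names are as they appear in v$statname:

      select  sn.name, ms.value
      from    v$mystat ms, v$statname sn
      where   ms.statistic# = sn.statistic#
      and     (   sn.name like 'commit cleanout%'
               or sn.name like '%block cleanout%'
              );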

      Comment by Jonathan Lewis — September 25, 2011 @ 10:21 am GMT Sep 25,2011 | Reply

  3. Hi Jonathan,

    On FAST COMMIT the session won’t remove the LOCK info from the row (header) – it just updates the status to ‘U’ in the ITL entry. In that case how will the LOCK info be removed? Is it done by the next SELECT on that block? In that case it is nothing but ‘delayed commit cleanout’ again – right?
    It seems this bit is not mentioned in the BOOK – am I missing something here?

    Comment by berusadla — October 14, 2011 @ 3:56 pm GMT Oct 14,2011 | Reply

    • Berusadla,

      Pavol has given you some answers already, and I believe the answer to your questions is a little further on in the book, but:
      There are two different strategies: delayed block cleanout, and delayed logging block cleanout.

        Delayed block cleanout appears when NOTHING has changed on the commit and the next session to read the block cleans up the entire mess – even on a select.

        Delayed logging block cleanout appears when the fast commit has set the status to ‘U’ – and the next session to modify the block cleans up the rest of the mess and generates the redo that “should have” happened on the fast commit.

      There are some notes here that might also be worth reading.
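      A rough single-session sketch of the difference; the table t1 is illustrative, and you would watch “redo size” and the cleanout statistics (e.g. via v$mystat) around each step:

      -- Delayed block cleanout: the commit can't clean anything because the buffers are gone
      update t1 set v1 = lower(v1);
      alter system flush buffer_cache;    -- dirty buffers are written, then discarded
      commit;
      select count(*) from t1;            -- this SELECT generates redo as it cleans the blocks

      -- Delayed logging block cleanout: the commit cleanout sets the ITL flag to "U"
      update t1 set v1 = upper(v1);
      commit;                             -- commit cleanout happens, but logs nothing for it
      update t1 set v1 = lower(v1);       -- the next change to the block completes (and logs) the cleanout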

      Comment by Jonathan Lewis — November 13, 2011 @ 12:09 pm GMT Nov 13,2011 | Reply

  4. Berusadla,

    The --U- flag in a block dump means the “upper bound commit” flag is set. When a commit is issued, Oracle performs block cleanout on modified buffers in the buffer cache (there are some limitations, however, as Tom Kyte suggested several years ago). The block is cleaned, but the cleanout does not generate additional redo, and that’s the purpose of the U flag. The next SELECT of the block generates redo and clears the U flag. So it is not exactly delayed block cleanout (it does not have to check the transaction table in the relevant undo segment header, etc.). I think I have already discussed this behaviour with Jonathan, but I’m not able to find where.

    Regards
    Pavol Babel

    Comment by Pavol Babel — November 5, 2011 @ 8:34 pm GMT Nov 5,2011 | Reply

    • Moreover, Jonathan’s new book mentions this behaviour, of course. Unfortunately, I do not have much time for reading the excellent new Oracle Core book; I didn’t get further than Chapter 3, page 19, and Jonathan mentions commit cleanout on page 25.

      Regards
      Pavol Babel

      Comment by Pavol Babel — November 6, 2011 @ 11:41 pm GMT Nov 6,2011 | Reply

  5. hi Jonathan,

    I want to point out an error in the sentence below, near the start of the section “Consistent Doesn’t Mean Historic”:
    Notice, by the way, that ITL 0x01 is showing the effects of a commit clean out (–U-, 1, fsc) even though the undo record that we eventually retrieved from the next transaction to use this ITL entry stored (C—,0,fsc). <– here, at the end, it should read “to use this ITL entry stored (C—,0,scn)”

    as you explain below:
    "This is an example of delayed logging block clean out. The next transaction didn’t copy the information that was in the ITL entry, but rather stored the information that would have been in the ITL entry if Oracle had cleaned it out properly the first time around."

    Here is my test: below is the undo block dump section for the uba of the transaction that comes from session 3 – the op: L line shows the “C— 0 scn”.

    uba: 0x00800a58.092a.07 ctl max scn: 0x0000.008af284 prv tx scn: 0x0000.008af2aa
    txn start scn: scn: 0x0000.00000000 logon user: 56
     prev brb: 8391252 prev bcl: 0
    KDO undo record:
    KTB Redo
    op: 0x04  ver: 0x01
    op: L  itl: xid:  0x0006.010.00000691 uba: 0x008007c1.0a6b.1a
                          flg: C---    lkc:  0     scn: 0x0000.008af991
    KDO Op code: URP row dependencies Disabled
      xtype: XAxtype KDO_KDOM2 flags: 0x00000080  bdba: 0x010426d5  hdba: 0x010426d3
    itli: 1  ispac: 0  maxfr: 4858
    tabn: 0 slot: 2(0x2) flag: 0x2c lock: 0 ckix: 95
    ncol: 2 nnew: 1 size: 0
    Vector content:
    col  1: [ 2]  c1 04
    

    Comment by Sid — November 30, 2011 @ 3:22 pm GMT Nov 30,2011 | Reply

  6. Hi Jonathan,

    I found that delayed block cleanout did happen with the parallel query on 10.2.0.4 on OS X 10.6.6 when I tested core_cleanout_3.sql.
    I used dbms_xplan.display to make sure that the query would run parallel.
    The spool file shows that the first query generated redo, made 500 calls to ktugct, and did around 1,000 consistent reads;
    the subsequent query did not generate redo, made no calls to ktugct, and did around 500 consistent reads.
    So I’m pretty sure that the parallel query does the delayed block cleanout as well in this version.
    Below is the whole spool file:

    oe@CS10G> 
    oe@CS10G> alter system flush buffer_cache;
    
    System altered.
    
    oe@CS10G> commit;
    
    Commit complete.
    
    oe@CS10G> 
    oe@CS10G> execute snap_stats.start_snap
    
    PL/SQL procedure successfully completed.
    
    oe@CS10G> 
    oe@CS10G> select
      2  	     /*+ parallel(t1 2) full(t1) */
      3  	     count(*)
      4  from
      5  	     t1
      6  ;
    
      COUNT(*)
    ----------
           500
    
    oe@CS10G> 
    oe@CS10G> execute snap_stats.end_snap
    ---------------------------------
    System stats:-  02-Dec 01:22:41
    Interval:-      0 seconds
    ---------------------------------
    Name                                                                     Value
    ----                                                                     -----
    logons cumulative                                                            2
    opened cursors cumulative                                                    8
    user calls                                                                  17
    recursive calls                                                             32
    recursive cpu usage                                                          2
    session logical reads                                                    1,095
    CPU used when call started                                                   5
    CPU used by this session                                                     2
    DB time                                                                     32
    user I/O wait time                                                          20
    session uga memory                                                     121,360
    session uga memory max                                                 448,352
    messages sent                                                                1
    messages received                                                            1
    session pga memory                                                   1,186,992
    session pga memory max                                               1,121,456
    enqueue requests                                                            11
    enqueue conversions                                                          4
    enqueue releases                                                            11
    physical read total IO requests                                             82
    physical read total multi block requests                                    76
    physical read total bytes                                            4,161,536
    physical write total IO requests                                             1
    physical write total multi block requests                                    1
    physical write total bytes                                               1,024
    db block gets                                                                4
    db block gets from cache                                                     4
    consistent gets                                                          1,091
    consistent gets from cache                                               1,091
    consistent gets - examination                                              507
    physical reads                                                             508
    physical reads cache                                                       508
    physical read IO requests                                                   82
    physical read bytes                                                  4,161,536
    db block changes                                                           504
    redo synch writes                                                            1
    free buffer requested                                                      508
    commit cleanouts                                                             1
    commit cleanouts successfully completed                                      1
    physical reads cache prefetch                                              426
    shared hash latch upgrades - no wait                                         1
    calls to kcmgas                                                              1
    calls to get snapshot scn: kcmgss                                           88
    redo entries                                                               502
    redo size                                                               36,716
    redo wastage                                                               320
    redo writes                                                                  1
    redo blocks written                                                          2
    redo subscn max counts                                                     501
    undo change vector size                                                    208
    no work - consistent read gets                                               3
    cleanouts only - consistent read gets                                      500
    immediate (CR) block cleanout applications                                 500
    deferred (CURRENT) block cleanout applications                               1
    commit txn count during cleanout                                           500
    cleanout - number of ktugct calls                                          500
    table scans (long tables)                                                   26
    table scans (rowid ranges)                                                  26
    table scan rows gotten                                                     500
    table scan blocks gotten                                                   503
    table fetch by rowid                                                         2
    rows fetched via callback                                                    2
    index fetch by key                                                           3
    session cursor cache hits                                                    2
    queries parallelized                                                         1
    DFO trees parallelized                                                       1
    Parallel operations not downgraded                                           1
    PX local messages sent                                                      62
    PX local messages recv'd                                                    62
    buffer is not pinned count                                                   4
    workarea executions - optimal                                                1
    parse time elapsed                                                           5
    parse count (total)                                                          8
    parse count (hard)                                                           1
    execute count                                                                9
    bytes sent via SQL*Net to client                                         1,274
    bytes received via SQL*Net from client                                   1,462
    SQL*Net roundtrips to/from client                                            5
    sorts (memory)                                                               1
    sorts (rows)                                                                 4
    
    PL/SQL procedure successfully completed.
    
    oe@CS10G> 
    oe@CS10G> execute snap_stats.start_snap
    
    PL/SQL procedure successfully completed.
    
    oe@CS10G> 
    oe@CS10G> select
      2  	     /*+ parallel(t1 2) full(t1) */
      3  	     count(*)
      4  from
      5  	     t1
      6  ;
    
      COUNT(*)
    ----------
           500
    
    oe@CS10G> 
    oe@CS10G> execute snap_stats.end_snap
    ---------------------------------
    System stats:-  02-Dec 01:22:41
    Interval:-      0 seconds
    ---------------------------------
    Name                                                                     Value
    ----                                                                     -----
    logons cumulative                                                            2
    opened cursors cumulative                                                    6
    user calls                                                                  17
    recursive calls                                                             11
    recursive cpu usage                                                          1
    session logical reads                                                      590
    CPU used when call started                                                   2
    CPU used by this session                                                     1
    DB time                                                                      3
    session uga memory                                                      55,952
    session uga memory max                                                 448,352
    session pga memory                                                   1,121,456
    session pga memory max                                               1,121,456
    enqueue requests                                                             8
    enqueue conversions                                                          4
    enqueue releases                                                             8
    consistent gets                                                            590
    consistent gets from cache                                                 590
    consistent gets - examination                                                6
    calls to get snapshot scn: kcmgss                                           85
    no work - consistent read gets                                             503
    table scans (long tables)                                                   26
    table scans (rowid ranges)                                                  26
    table scan rows gotten                                                     500
    table scan blocks gotten                                                   503
    table fetch by rowid                                                         2
    rows fetched via callback                                                    2
    index fetch by key                                                           2
    session cursor cache hits                                                    3
    cursor authentications                                                       1
    queries parallelized                                                         1
    DFO trees parallelized                                                       1
    Parallel operations not downgraded                                           1
    PX local messages sent                                                      62
    PX local messages recv'd                                                    62
    buffer is not pinned count                                                   4
    workarea executions - optimal                                                1
    parse count (total)                                                          6
    execute count                                                                7
    bytes sent via SQL*Net to client                                         1,302
    bytes received via SQL*Net from client                                   1,462
    SQL*Net roundtrips to/from client                                            5
    sorts (memory)                                                               1
    sorts (rows)                                                                 4
    
    PL/SQL procedure successfully completed.
    
    oe@CS10G> 
    oe@CS10G> execute snap_stats.start_snap
    
    PL/SQL procedure successfully completed.
    
    oe@CS10G> 
    oe@CS10G> select
      2  	     /*+ parallel(t1 2) full(t1) */
      3  	     count(*)
      4  from
      5  	     t1
      6  ;
    
      COUNT(*)
    ----------
           500
    
    oe@CS10G> 
    oe@CS10G> execute snap_stats.end_snap
    ---------------------------------
    System stats:-  02-Dec 01:22:41
    Interval:-      0 seconds
    ---------------------------------
    Name                                                                     Value
    ----                                                                     -----
    logons cumulative                                                            2
    opened cursors cumulative                                                    6
    user calls                                                                  17
    recursive calls                                                             11
    session logical reads                                                      590
    DB time                                                                      3
    session uga memory                                                      55,952
    session uga memory max                                                 448,352
    session pga memory                                                   1,121,456
    session pga memory max                                               1,121,456
    enqueue requests                                                             8
    enqueue conversions                                                          4
    enqueue releases                                                             8
    consistent gets                                                            590
    consistent gets from cache                                                 590
    consistent gets - examination                                                6
    calls to get snapshot scn: kcmgss                                           85
    no work - consistent read gets                                             503
    table scans (long tables)                                                   26
    table scans (rowid ranges)                                                  26
    table scan rows gotten                                                     500
    table scan blocks gotten                                                   503
    table fetch by rowid                                                         2
    rows fetched via callback                                                    2
    index fetch by key                                                           2
    session cursor cache hits                                                    4
    queries parallelized                                                         1
    DFO trees parallelized                                                       1
    Parallel operations not downgraded                                           1
    PX local messages sent                                                      62
    PX local messages recv'd                                                    62
    buffer is not pinned count                                                   4
    workarea executions - optimal                                                1
    parse count (total)                                                          6
    execute count                                                                7
    bytes sent via SQL*Net to client                                         1,302
    bytes received via SQL*Net from client                                   1,462
    SQL*Net roundtrips to/from client                                            5
    sorts (memory)                                                               1
    sorts (rows)                                                                 4
    
    PL/SQL procedure successfully completed.
    
    oe@CS10G> 
    oe@CS10G> 
    oe@CS10G> spool off
    

    Comment by Sid — December 1, 2011 @ 5:35 pm GMT Dec 1,2011 | Reply

    • Sid,

      That surprised me, and took me a few minutes to work out.

      If you look at the first set of statistics you can see that the query did run parallel (DFO trees parallelized, Parallel operations not downgraded, PX local messages sent and recv’d). However, you didn’t do any ‘physical reads direct’ – so your parallel query went into the cache, which is why it was possible for Oracle to do the block cleanout.

      This will happen if the table is small compared to the buffer cache – and when I created the test I was using a small instance, probably with a buffer cache of less than 200MB. You could try repeating the test with a much smaller buffer cache, or you could modify the data generator to make the table at least 2% of the size of your cache. (The effect is dictated by the value of the parameter _small_table_threshold, which defaults to 2).
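      If you want to check where you stand before repeating the test, something like this shows the table size against the hidden threshold. A sketch only: the x$ views need a SYS connection, and the owner and table name are illustrative:

      select  i.ksppinm  parameter, v.ksppstvl  value
      from    x$ksppi i, x$ksppcv v
      where   i.indx = v.indx
      and     i.ksppinm = '_small_table_threshold';

      select  blocks
      from    dba_tables
      where   owner = 'OE'
      and     table_name = 'T1';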

      Comment by Jonathan Lewis — December 1, 2011 @ 7:19 pm GMT Dec 1,2011 | Reply

  7. Hi Jonathan,

    Thanks for pointing out the parameter _small_table_threshold.

    I set sga_target to 200M, with a buffer cache of around 108M, and the parallel query statistics are as expected. Except for one tricky bit: the first parallel query still generates 2 redo entries. After dumping the redo log, I found it starts a transaction to update a record in the table seq$, so I wonder whether something related to a sequence is going on.

    I copy the statistics for the first parallel run here, and the redo entry for the change on seq$ below. (The snap_* and dump* procedures in the source code are very convenient – they save me a lot of time, thanks for that. Is there any way to format code in the comments? I don’t want to bother you again and again.)

       COUNT(*)
    ----------
           500
    
    ---------------------------------
    System stats:-	02-Dec 18:52:20
    Interval:-	0 seconds
    ---------------------------------
    Name									 Value
    ----									 -----
    logons cumulative							     2
    opened cursors cumulative						    12
    user calls								    17
    recursive calls 							    98
    recursive cpu usage							     4
    session logical reads							 1,079
    CPU used when call started						    12
    CPU used by this session						     6
    DB time 								   275
    concurrency wait time							     5
    user I/O wait time							    21
    session uga memory							55,952
    session uga memory max						       727,200
    messages sent								     3
    messages received							     3
    session pga memory						     1,645,744
    session pga memory max						     1,645,744
    enqueue requests							    17
    enqueue conversions							     4
    enqueue releases							    19
    physical read total IO requests 					    96
    physical read total multi block requests				    88
    physical read total bytes					     4,186,112
    physical write total IO requests					     1
    physical write total multi block requests				     1
    physical write total bytes						 1,024
    db block gets								     4
    db block gets from cache						     4
    consistent gets 							 1,075
    consistent gets from cache						   572
    consistent gets - examination						   515
    consistent gets direct							   503
    physical reads								   511
    physical reads cache							     8
    physical reads direct							   503
    physical read IO requests						    96
    physical read bytes						     4,186,112
    db block changes							     4
    redo synch writes							     1
    free buffer requested							     8
    commit cleanouts							     1
    commit cleanouts successfully completed 				     1
    shared hash latch upgrades - no wait					     3
    calls to kcmgas 							     1
    calls to get snapshot scn: kcmgss					    66
    redo entries								     2
    redo size								   680
    redo wastage								   312
    redo writes								     1
    redo blocks written							     2
    redo write time 							    13
    redo subscn max counts							     1
    undo change vector size 						   208
    no work - consistent read gets						     5
    cleanouts only - consistent read gets					   500
    immediate (CR) block cleanout applications				   500
    deferred (CURRENT) block cleanout applications				     1
    commit txn count during cleanout					   500
    cleanout - number of ktugct calls					   500
    table scans (long tables)						    26
    table scans (rowid ranges)						    26
    table scans (direct read)						    26
    table scan rows gotten							   500
    table scan blocks gotten						   503
    table fetch by rowid							     6
    rows fetched via callback						     5
    index fetch by key							     6
    index scans kdiixs1							     1
    sql area evicted							    25
    session cursor cache hits						     3
    session cursor cache count						     1
    cursor authentications							     1
    queries parallelized							     1
    DFO trees parallelized							     1
    Parallel operations not downgraded					     1
    PX local messages sent							    62
    PX local messages recv'd						    62
    buffer is not pinned count						    12
    workarea executions - optimal						     1
    parse time cpu								     1
    parse time elapsed							    22
    parse count (total)							    11
    parse count (hard)							     3
    execute count								    13
    bytes sent via SQL*Net to client					 1,302
    bytes received via SQL*Net from client					 1,462
    SQL*Net roundtrips to/from client					     5
    sorts (memory)								     1
    sorts (rows)								     4
    
    PL/SQL procedure successfully completed.
    
    
    
    CHANGE #3 TYP:2 CLS: 1 AFN:1 DBA:0x004001f2 OBJ:68 SCN:0x0000.008cad93 SEQ:  1 OP:11.5
    KTB Redo
    op: 0x11  ver: 0x01
    op: F  xid:  0x0006.002.0000069c    uba: 0x00800b5d.0a6c.1e
    Block cleanout record, scn:  0x0000.008cb165 ver: 0x01 opt: 0x02, entries follow...
      itli: 1  flg: 2  scn: 0x0000.008cad90
      itli: 2  flg: 2  scn: 0x0000.008cad93
    KDO Op code: URP row dependencies Disabled
      xtype: XAxtype KDO_KDOM2 flags: 0x00000080  bdba: 0x004001f2  hdba: 0x004001f1
    itli: 1  ispac: 0  maxfr: 4863
    tabn: 0 slot: 9(0x9) flag: 0x2c lock: 1 ckix: 0
    ncol: 10 nnew: 9 size: 0
    Vector content:
    col  1: [ 2]  c1 02
    col  2: [ 2]  c1 02
    col  3: [ 5]  c4 05 1e 32 44
    col  4: [ 2]  c1 02
    col  5: [ 1]  80
    col  6: [ 1]  80
    col  7: [ 2]  c1 56
    col  8: [32]
     2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d
     2d 2d 2d 2d 2d 2d 2d
    col  9: [ 1]  80
    
    
    oe@CS10G> select owner, object_type, object_id, data_object_id, object_name, last_ddl_time
      2  from all_objects
      3  where object_id = &1
      4  /
    
    OWNER			       OBJECT_TYPE	    OBJECT_ID DATA_OBJECT_ID OBJECT_NAME		    LAST_DDL_TIME
    ------------------------------ ------------------- ---------- -------------- ------------------------------ -------------------
    SYS			       TABLE			   68		  68 SEQ$			    2009-04-02 11:11:31
    

    Comment by Sid — December 2, 2011 @ 11:38 am GMT Dec 2,2011 | Reply

  8. Sid,

    If you enabled SQL_trace for this test, you would have seen that the session executed the following statements just before running the parallel query:

    SELECT ORA_TQ_BASE$.NEXTVAL FROM DUAL
    
    update seq$ 
    set
    	increment$=:2, minvalue=:3, maxvalue=:4, cycle#=:5,
    	order$=:6, cache=:7, highwater=:8, audit$=:9, flags=:10 
    where 
    	obj#=:1
    

    This is getting a sequence value relating to the use of table queues (TQ) for parallel queries.

    I think I said a couple of times when reporting stats that I was ignoring “minor variations”, partly this was to avoid throwing in too many distractions as I told the story. Of course, it’s perfectly okay to use a medium like a blog to add in a few extra details around the edges.

    For formatting: ‘sourcecode’ and ‘/sourcecode’, but in square brackets rather than single quotes, deals with the fixed font. If you want to get rid of the numbering then gutter=”false” after the first sourcecode does that. However, the numbering is quite useful here, as it makes it easy for me to refer to specific lines if I need to.

    If you wanted to show off, by the way, you could have taken the DBA from line 103, and the slot from line 113, and got the sequence name by executing:

    select 
    	obj.name 
    from 
    	seq$	seq,
    	obj$	obj
    where 
    	seq.rowid = (
    		select
    			dbms_rowid.rowid_to_extended(
    				'000001f2.0009.0001',	-- block.rowslot.file
    				'SYS',
    				'SEQ$',
    				0
    			)
    		from dual
    	)
    and	obj.obj# = seq.obj#
    ;
    
    
    

    Comment by Jonathan Lewis — December 2, 2011 @ 1:01 pm GMT Dec 2,2011 | Reply

  9. Jonathan,

    At the end of the section “The Interested Transaction List”, the last sentence:
    finally, record 3 says “clear slot (row) 0x01 from the block and change ITL entry 1 to ‘no previous use.'”
    it should be slot (row) 0x00, as that is the first changed row.

    After finishing the first round of reading, I pick up the book and read randomly. It’s fun to find there is still so much to test and think about.

    *-----------------------------
    * Rec #0x29  slt: 0x2f  objn: 53227(0x0000cfeb)  objd: 53227  tblspc: 7(0x00000007)
    *       Layer:  11 (Row)   opc: 1   rci 0x00
    Undo type:  Regular undo    Begin trans    Last buffer split:  No
    Temp Object:  No
    Tablespace Undo:  No
    rdba: 0x00000000
    *-----------------------------
    uba: 0x00800275.087c.26 ctl max scn: 0x0000.0099d3a3 prv tx scn: 0x0000.0099d3eb
    txn start scn: scn: 0x0000.009a30d6 logon user: 56
     prev brb: 8389232 prev bcl: 0
    KDO undo record:
    KTB Redo
    op: 0x03  ver: 0x01
    op: Z
    KDO Op code: DRP row dependencies Disabled
      xtype: XA flags: 0x00000000  bdba: 0x0180018a  hdba: 0x01800189
    itli: 1  ispac: 0  maxfr: 4863
    tabn: 0 slot: 0(0x0)
    
    

    Comment by Sid — December 13, 2011 @ 4:26 pm GMT Dec 13,2011 | Reply

    • Sid,

      That’s another one for the Errata list, thank you.
      You’re doing a very good job of repeating my examples – allowing for differences between the content of 10g and 11g undo records, and different values for the block and record addresses, your trace dump is virtually identical to mine.

      Comment by Jonathan Lewis — December 14, 2011 @ 12:13 pm GMT Dec 14,2011 | Reply

  10. Jonathan,

    Perhaps a typo on p.51 Table 3-3 and p.52 Table 3-4;

    should be:

    Session 2 “…transactions against table t1”

    Comment by Merrill B Lamont Jr — December 20, 2011 @ 11:32 am GMT Dec 20,2011 | Reply

    • Merrill,

      No. Applying the transactions to an alternative table is a necessary part of these examples, because it demonstrates the fact that the amount of work you have to do to deal with “transaction table read consistency” can be affected by things going on in a completely different part of the database.

      Comment by Jonathan Lewis — December 23, 2011 @ 8:51 am GMT Dec 23,2011 | Reply

  11. Jonathan,
    A few unimportant typos in this chapter:
    Page 40: 5th line starting from the top: 0x1f8d I think should be 0x1f8e
    Page 38: 3rd line in the Note §: which records the SCN as at the most recent…. Either as or at
    Page 34: 4th line from the bottom: It’s quite possible you’ve using one of… you’ve been using

    There are other minors typos that I believe are not worth mentioning

    It’s fantastic to know that the step that completes the rollback is a commit!!! You mentioned this in Chapter 2, where you wrote that you would spend more time on it in Chapter 3. On what page of Chapter 3 have you shown this evidence?

    Comment by hourim — February 16, 2012 @ 12:44 pm GMT Feb 16,2012 | Reply

    • Mohamed,

      Thanks for pointing out the errors. I’ve added a couple of corrections to the “typos” list. The “as at”, however, is deliberate – though I have to admit that it is probably an indication of my slightly quaint and old-fashioned use of English. The phrase “as at {a certain point in time}” is valid use. (Though the one thing I certainly wouldn’t say is “as at the current moment in time” when I meant “at present”.)

      Feel free to mention other little typos in the comments. When I have time I’ll add them to the list above and delete the comments.

      I suspect that I never got around to including that evidence of the rollback ending with a commit in Chapter 3, though there is a comment on page 42 about redo change vectors with an Op Code of 5.4, and an explanatory note on page 126 of chapter 6. I’m just running up a little example now.

      Comment by Jonathan Lewis — February 26, 2012 @ 11:09 am GMT Feb 26,2012 | Reply

  12. Hi Sir,

    I’m on pages 48-49 reading ‘Delayed Block Cleanout’, and one thing I’m not able to understand is:

    You changed 500 blocks
    You cleaned the buffer cache with alter system
    You issued a commit

    You committed, but you also flushed all the blocks (I’m assuming undo blocks too) from the buffer cache. So how is your commit record generated? There were no undo blocks (undo segment header block) in the buffer cache whose transaction table could have been updated with the commit record. Correct?

    But later, on page 49, para 3, you wrote “Our session reads the correct undo segment header block to check the state of the transaction (it’s possible that it checks v$transaction before visiting the block, but that’s only a conjecture), at which point it finds that the transaction has committed” – but the undo blocks (undo segment header block) were not in memory, so how could the process have updated the transaction table to show whether the transaction was committed?

    I’m assuming that when the commit happens in this case, the undo block is read from disk, its transaction table slot is updated, and the commit record is generated.

    Please help me in clearing this thought.

    Regards
    Ranjit

    Comment by Ranjit Nagi — April 6, 2012 @ 12:07 pm GMT Apr 6,2012 | Reply

    • Ranjit,

      Your deduction is correct. If the undo segment header has been flushed from memory then it has to be read back from disc into memory when the session commits – it’s the only way that the transaction table slot could be updated.
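      A sketch of how you might watch this: find the file and block of the undo segment header your transaction is using (xidusn in v$transaction identifies the segment), then check whether it is still buffered after the flush:

      select  segment_name, file_id, block_id
      from    dba_rollback_segs
      where   segment_id = &undo_segment_number;    -- the xidusn from v$transaction

      select  status
      from    v$bh
      where   file#  = &file_id
      and     block# = &block_id;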

      Comment by Jonathan Lewis — April 7, 2012 @ 8:11 pm GMT Apr 7,2012 | Reply

  13. This question was originally posted on the home page for the book, before I copied it to the appropriate chapter:


    Jonathan,
    Greetings from France.
    I am reading the Oracle Core book I purchased a few days ago (with the 50% discount, thanks!).
    I have been working as an Oracle DBA in production for 7 years.
    I am trying to read your book carefully and I need to ask you something before going further.
    1) On page 42 (point 3), you explain how Oracle reverses changes from any uncommitted transaction.
    I was wondering whether this reversing is systematic, or whether it happens because a session has issued a “set transaction read only” command?
    In case someone commits immediately after the reversing, is Oracle obligated to “re-reverse” the changes?
    2) What do you mean by the target SCN on the same page (point 4)? Where is it kept at that particular time?
    Thank you very much
    I appreciate reading such an accurate book on the inner mechanisms of Oracle.
    It gives me another approach to the database and will surely help me understand other parts of the software better.

    Regards
    Jean-michel, Nemours, FRANCE

    Comment by Jonathan Lewis — May 24, 2012 @ 10:41 am GMT May 24,2012 | Reply

    • Jean-Michel.

      1) At this point you are trying to make all blocks consistent with committed data at the point in time where your query started. Consequently you do not want to see any data that is (a) currently uncommitted, or (b) committed by a transaction that started after your query started. So, in your copy of the block, you reverse out any uncommitted changes – and if the transaction subsequently commits, or makes further changes, that will happen to the original block, not your copy. The reversal is systematic.

      2) The target SCN is the SCN at which you started the query (or transaction if you’ve done a set transaction read only, or gone into isolation level serializable). You will have copied this from the central SCN variable in the SGA to your local memory – possibly to the local state object for the specific query.
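      For reference, the value that would be captured as the target SCN at any moment can be seen with either of the following (a sketch; dbms_flashback needs an explicit execute grant for non-DBA users):

      select dbms_flashback.get_system_change_number from dual;

      select current_scn from v$database;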

      Comment by Jonathan Lewis — May 24, 2012 @ 10:47 am GMT May 24,2012 | Reply

  14. Thank you Jonathan. That’s ok for me now.
    I am going to keep on reading.
    Jean-michel

    Comment by JEAN-MICHEL — June 1, 2012 @ 4:29 pm GMT Jun 1,2012 | Reply

  15. Hello,

    While reading chapter 3 I came across two questions which I believe I can now answer myself after reading Jonathan’s post “Clean it up”, which is referenced above in the comment from 12:09 pm UTC Nov 13,2011. I decided to post the questions (and my presumed answers) here with two ideas in mind. First to ask Jonathan to review my answers and give me feedback in case I misunderstood anything and second in case it may be useful for other readers who ask themselves the same questions.

    The crucial point I read in the post “Clean it up” and which I missed in the book is that the work left over by a commit cleanout is completed during a delayed logging block cleanout by the next modification of (not access to) the block. This is different from a delayed block cleanout (if no commit cleanout happened), which is performed by any subsequent access to the block.

    Now my questions were:

    1) In the sequence of events listed in paragraph “Creating Consistency”, why isn’t the ITL for the committed transaction 0x000a.00e.00001b93 cleaned out, as stipulated in point 2 on page 42?

    2) The note on page 46 says “..There’s also the point that the full cleanout would have to be logged…” – why doesn’t the commit cleanout (nevertheless a change of the block) have to be logged while the complete cleanout (which basically only differs from the commit cleanout by resetting the lock bytes to 0 for the corresponding rows) needs to be logged?

    I came to the following answers:

    1) The commit of transaction 0x000a.00e.00001b93 performed a commit cleanout on the block (flag is set to –U-, scn is set). Thus the cleanout will be completed during a delayed logging block cleanout during the next modification of the block. The scenario in the paragraph “Creating Consistency” does not, however, include such a subsequent modification (only a query reading the block). It might therefore be more exact to rephrase point 2 on page 42 as “Apply cleanout to any committed transactions that did not perform a commit cleanout on the block”.

    2) Commit cleanout does not need to be logged, because in case of a crash after the commit cleanout happened, the block will be restored to a state as if no commit cleanout had happened (as may be the case, e.g., if the buffer had been flushed from the cache before the commit). Thus a subsequent access to the block will do a delayed block cleanout and tidy up the mess.

    If a complete cleanout were not logged, however, imagine the following sequence of events:

    i) transaction T1 using ITL slot S1 has locked one row R1 in the block.

    ii) transaction T1 commits and does a complete unlogged cleanout, i.e. the following unlogged changes happen to the block
    – in slot S1 the flag is set to C—
    – the SCN is set in slot S1
    – the number of locked rows (lck) is set to 0 (from 1) in slot S1
    – the lock byte is set to 0 for row R1

    iii) a new transaction T2 reuses slot S1 and locks row R2 in the block

    iv) the system crashes (or somebody issues shutdown abort).
    Now, after restart of the instance, transaction T1 will be replayed on the block during recovery; in particular, row R1 will have its lock byte pointing to ITL slot S1. Because it was not logged, however, the cleanout of transaction T1 will not be replayed. Subsequently transaction T2 will be replayed on the block, overwriting slot S1 and setting the lock byte for row R2 to point to ITL slot S1. This leads to the number of locked rows in ITL slot S1 being 1, while rows R1 and R2 both have their lock bytes pointing to ITL slot S1. This is an inconsistency (unless R1=R2).

    I’d appreciate any corrections and/or comments on this.

    thank you, kind regards
    Martin

    Comment by Martin Maletinsky — September 5, 2012 @ 9:08 pm GMT Sep 5,2012 | Reply

    • Martin,

      I’ve edited a few blank lines into your comment as I think this makes it a little easier to read. I’ve also added in the URL to “clean it up”.

      It takes a little time to read through and work out the exact details – even though I wrote the original – so there may be a pause before my comment on question 2 appears, but your deduction relating to question 1 is correct. The copy of the block we are creating is NOT a current copy, so it doesn’t require any logging as we modify it.

      Your question also highlights a couple of tiny details in use of words and sequence of actions:

      a) I used the expression “applies cleanout” in my original description, and your addition is basically saying that this would only be needed in the cases where “commit cleanout” had not occurred. Technically my error was that I used the term “cleanout” at all – the only thing Oracle needs to acquire is the commit SCN for committed transactions. So my description (in the paragraph text) that its first step is to “find out if the transaction is still active” really ought to split hairs a little more and say “find out if the transaction is still active and acquire its commit SCN if it isn’t”.

      b) It’s possible that when Oracle reads this block and sees that ITL 1 has been subject to commit cleanout (status U) it does go and find the locked rows for that transaction in the block and finish the cleanout (without logging, of course); but this is redundant work so I doubt if it happens. Nevertheless it’s a point I could have made when trying to give a complete description of the process. There is a more important (and complex) detail associated with this U status, though, that doesn’t apply in this case. The U is the “upper bound commit” – it’s possible that at some point in time when the degree of concurrency was higher and some long-running transactions were active, another process did some work on this block to determine approximately when a transaction committed and got to the point where it said “the transaction committed before X, and that’s good enough, I don’t need to know exactly when.” In this case the read-consistency work related to finding an ITL in status U might be more complex.

      Comment by Jonathan Lewis — September 6, 2012 @ 8:55 am GMT Sep 6,2012 | Reply

      • Hello Jonathan,

        Thanks a lot for your comments. Do I understand you correctly that in a) you are saying the session reading the block will not perform any cleanout even if no commit cleanout happened before, i.e. the session will not modify the current block’s ITL and will not clear the affected rows’ lock bytes?

        My understanding so far was that if no commit cleanout happened, the next session reading this block will perform delayed block cleanout, i.e. write the commit SCN (or an upper bound thereof) into the ITL, update the flag of the ITL, reset the number of locked rows in the ITL to 0 and clear all row lock bytes in the block. I presumed that these modifications would happen on a current copy of the block, so that they are subsequently written to the data file, and that they would be logged.
        I understood this from your post “Clean it up”/ paragraph “Block Cleanout / Delayed Block Cleanout” where you write “…At some later point in time another session may read one of those blocks […] This session will read the commit SCN from the transaction table slot, tidy up the block’s ITL entry, and clear all the related lock bytes …”. I understood the same from your example in the book, paragraph “Delayed Block Cleanout” starting on page 48, where a query after committing without commit cleanout generates 500 db block changes and 500 redo entries (resulting from the cleanout of the 500 table blocks as I understood). Also the note on page 50 suggests, that unless there is a direct path read, the delayed cleanout has to be performed only once per block and transaction (by the first session reading that block after the transaction commits), which in my opinion means that the changes related to cleanout must happen on a current copy of the block.
        Based on this understanding I would rephrase the first two steps in the list on page 42 as follows:
        1. Check if there are any committed transactions in the ITL for which no commit cleanout happened and if so, perform delayed cleanout for these transactions
        2. If there are any uncommitted transactions or transactions with a commit SCN higher than the target SCN, clone the block into another buffer, using the clone as the target for the next three steps.

        In b) you write that the “–U-” might result from a previous upper bound estimate of the commit SCN. Although you don’t write that explicitly in the book, I understood from the available examples, that “–U-” in the ITL flag results from a commit cleanout, while “C-U-” is written into the ITL flag when some process determines an upper bound SCN for the transaction and writes it into the ITL.

        Finally I am not quite sure about your comment in b), that the row lock bytes might be cleared without this change being logged. Do you mean they might be cleared in a current copy of the block and, if so, wouldn’t the fact that this change isn’t logged lead to the recovery problem I described in my original post (example in question 2)? If you mean they are cleared in the read-consistent copy being created by the query, what would be the benefit of clearing the lock bytes there?

        Thank you
        kind regards
        Martin

        Comment by Martin Maletinsky — September 6, 2012 @ 1:17 pm GMT Sep 6,2012 | Reply

    • Martin,

      Your deductions relating to question 2 are also correct.

      In effect, the “commit cleanout” was a “friendly” extra added in to minimise RAC (or OPS as it then was) problems on delayed block cleanout. It’s the delayed block cleanout that is the final arbiter, and the one that HAS to be logged for the type of reason that your example demonstrates.

      Comment by Jonathan Lewis — September 6, 2012 @ 9:10 am GMT Sep 6,2012 | Reply

  16. Hello Jonathan,

    I have a question regarding the note on page 34 “When Oracle is about to reuse an undo block, it doesn’t care about the previous content, so it doesn’t bother to read it from disk before reusing it …”.

    I understood that the incarnation number (“seq: “) of the block is incremented every time the block is reused (newed). How can Oracle increment this number without reading the undo block from disk – does it store a mapping undo block -> current incarnation number somewhere in order to do so?

    thank you
    kind regards
    Martin

    Comment by Martin Maletinsky — September 6, 2012 @ 3:00 pm GMT Sep 6,2012 | Reply

    • Martin,

      That’s a very good question – and one that hadn’t crossed my mind.
      I can demonstrate the truth of the claim (or so I believe) – but I’ve never looked into how the mechanism can be made to work.

      A simple starting hypothesis is that part of the extent management information for the undo segment can hold the current sequence numbers for the entire segment – and since blocks from an extent are allocated in order we need only know that we have incremented the sequence number as we re-use the first block of the extent.

      This starting hypothesis leads to several more questions, though, such as: what happens (exactly) when one undo segment steals an extent from another, and what happens when the first segment steals it back – are there any scenarios where a block’s sequence number could appear to go backwards because it had been passed back and forth between a couple of segments? (Some problems are independent of how Oracle knows what the next sequence number should be – so there’s plenty of scope for anyone who wants to start investigating.)
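
      If anyone does want to start investigating, here’s a minimal sketch of how to find the undo block that your own transaction is currently writing to, so that you can dump it and watch its “seq:” value as the block gets reused (t1 is a hypothetical table with a numeric column n1, and the file/block numbers in the dump command are placeholders for the values the query returns):

      update t1 set n1 = n1 + 1 where rownum = 1;

      select  ubafil, ubablk, ubasqn, ubarec
      from    v$transaction
      where   addr = (
                      select  taddr
                      from    v$session
                      where   sid = sys_context('userenv','sid')
              );

      -- substitute the ubafil / ubablk values returned above
      alter system dump datafile 2 block 114;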

      Comment by Jonathan Lewis — September 6, 2012 @ 5:56 pm GMT Sep 6,2012 | Reply

      • Hello Jonathan,

        Do I understand correctly that the sequence numbers of blocks belonging to one undo segment (or to one extent?) differ at most by 1? One undo block might however be used by a long running transaction for an arbitrarily long time (without being newed) and thus keep a constant sequence number. This would then prevent all other blocks belonging to the same segment (or extent) from being re-used – wouldn’t that be a restriction with little benefit?

        If sequence numbers of an undo block could go backwards, wouldn’t that introduce a risk of wrong data being returned by a query? What I have in mind is that the sequence number of a block goes from N to M<N. By re-using (newing) the block repeatedly after that, the sequence number will eventually reach N again. At this time however, the block has a completely different content (and even belongs to a different undo segment).
        If there are any old references to the undo block (i.e. a uba recording sequence number N) dating from the time before the block was stolen from its original segment, then it might not be recognized that the original content has been overwritten (as the sequence number is N) and the new content might be used erroneously (e.g. to create a read consistent image of a block).

        kind regards
        Martin

        Comment by Martin Maletinsky — September 6, 2012 @ 8:06 pm GMT Sep 6,2012 | Reply

        • Martin,
          I’ve changed a couple of words in my comment to make the meaning clearer. The blocks in each EXTENT are used in order, so you need only know the current sequence number of each extent, and this information could be kept in the extent management section of the segment.

          A long running transaction DOES eventually stop all other blocks in the extent – and ultimately all other blocks in the segment – from being reused until the transaction commits. Before automatic undo management (which allows new segments to be created on demand) it used to be easy to lock up a system by executing – but not committing – a very small transaction and just leaving it uncommitted for a few hours.

          The point about sequence number appearing to go backwards was about hypothesis testing. If the sequence can appear to go backwards then the hypothesis probably has to be wrong for exactly the reason you describe. So having proposed the hypothesis I raised that question (as a thought experiment for other people to pursue) as one of the many questions you would have to address before you could claim it was a good hypothesis.

          Comment by Jonathan Lewis — September 6, 2012 @ 9:37 pm GMT Sep 6,2012

  17. Hello Jonathan,

    After finishing chapter 3, I tried to do the maths for the example in the paragraph “Transaction Table Rollback”. Assuming a regular usage (sequential or round robin) of the available UNDO segments I expected the number of undo records applied to the transaction table to be over 1600 rather than 1395. Unless my assumption of an even usage of the available UNDO segments is wrong, I probably slightly misunderstood something in the concept. Let me therefore sketch how I calculated that the number should be over 1600, so that you can tell me why I am wrong:

    Let’s denote by Slot1, Seg1 the slot and undo segment used by the initial transaction at time t1. Let SCN1 further denote the commit SCN of that initial transaction and SCN2 the target SCN for the final query, i.e. the SCN that was current at the moment t2 when session 1 set its isolation level to read only.
    When the 17’000 transactions executed by session 2 start, Slot1 in Seg1 is overwritten by the 341st of those transactions (10×34 available transaction table slots). At this moment SCN1 is written into the transaction control of Seg1. Assuming an even usage of the undo segments there are at most 9×34 (=306) further transactions before another transaction uses a slot in Seg1 again. At this moment (i.e. at the latest with the 648th transaction out of the 17’000 executed by session 2), SCN1 is overwritten in the transaction control of Seg1 with a value > SCN2 (because at this moment all slots in the transaction table of Seg1 have been used by one of the transactions executed by session 2). Because the transaction control of Seg1 now contains a value > SCN2, the final query will have to reverse the effects of this last transaction as well as of all subsequent transactions that will use slots in Seg1. There are still at least 16’352 transactions to go, out of which at least some 1600 will use slots in Seg1.
    This is why I expected to see 1600 undo records being applied to the transaction table of Seg1.

    Based on your comment (in the book and in your reply from September 6, 9:37) that undo segments may be created on demand, I suppose that might explain why fewer than 1/10 of the transactions end up using slots in Seg1, in case additional undo segments were created during the execution of the 17’000 transactions. Can you confirm that, or is there a flaw in my reasoning?

    Thank you
    kind regards
    Martin

    Comment by Martin Maletinsky — September 7, 2012 @ 10:26 am GMT Sep 7,2012 | Reply

    • Martin,

      Your basic argument is correct – and on earlier versions of Oracle you’d probably be able to predict the number (plus or minus 1, perhaps) because older versions tend to use manual undo management.

      However, with automatic undo management in place Oracle is allowed to take undo segments offline – which means it can’t do a round-robin through the undo segments otherwise every segment would be as busy as every other segment and there would be no candidates to go offline.

      Here’s an example from 11.2.0.3 of a snapshot of the rollback segment stats (v$rollstat) after doing 5,000 small updates (with commit) in a pl/sql loop:

      USN   Ex Size K  HWM K  Opt K      Writes     Gets
      ----  -- ------  -----  -----      ------     ----
         1   1     64      0      0       77160      853
         2   0      0      0      0       98694     1086
         3   0      0      0      0      101436     1116
         4   1     64      0      0       77152      853
         5   1     64      0      0       98404     1087
         6   0      0      0      0      101110     1114
         7   1   1024   1024      0       77134      853
         8   0      0      0      0       98598     1086
         9   2    128      0      0      100952     1118
        10   0      0      0      0       77430      852
      

      The “Gets” column (for a case like this) is basically going to be 2 x transactions (one get to start the transaction, one get to complete it). As you can see the gets sum to 10,000 (and a tiny bit) but the number of gets per undo segment ranges from about 850 to 1,120, which is a pretty large percentage spread and in exactly the right ballpark to explain the difference between the (17,000/10 – 34 = 1,666) that we might expect to see and the 1,395 that we actually got. (I got similar results, going as low as 840, on 10.2.0.3 and some really extreme variation on 9.2.0.8.)
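
      If you want to repeat the experiment, here’s a minimal sketch of the sort of test that produced the figures above – t1 and its column n1 are hypothetical names, and the query simply reports the v$rollstat columns shown in the listing:

      begin
              for i in 1..5000 loop
                      update t1 set n1 = n1 + 1 where rownum = 1;
                      commit;
              end loop;
      end;
      /

      select
              usn, extents, rssize/1024, hwmsize/1024, optsize/1024,
              writes, gets
      from    v$rollstat
      order by
              usn
      ;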

      Comment by Jonathan Lewis — September 7, 2012 @ 3:33 pm GMT Sep 7,2012 | Reply

  18. Hello Jonathan,

    I have some questions regarding the note on page 38, where you write there are two other SCNs (beside the ones recorded in the ITL entries), the “cleanout SCN” and the “last change SCN” as well as a sequence number for the last change SCN.
    1) In which situations is the “last change SCN” (and by consequence the sequence number) updated? Does it only track changes to the actual table data or to the meta data in the block as well – i.e. is it updated only when table rows are deleted/inserted/updated or for any physical changes to the block (e.g. application of undo, delayed block cleanout, commit cleanout, …)?
    2) What does Oracle use these three numbers for? I can imagine that the “last change SCN” might be used by DBWR to determine whether it needs to post LGWR before writing the block into the data file. Do you know any other use of the “last change SCN”, and do you know for what purpose the sequence number and the “cleanout SCN” are needed?

    Thank you
    kind regards
    Martin

    Comment by Martin Maletinsky — September 10, 2012 @ 7:14 pm GMT Sep 10,2012 | Reply

    • Martin,

      One of the difficulties of writing the book was deciding what not to write. It doesn’t matter how much I describe, or in how much detail – there will always be someone asking a few more questions. This isn’t made any easier by the fact that tiny details change with versions (or releases) of Oracle.

      The sequence number associated with the last change SCN is simply there so that ANY changes made to a block while the instance is at that SCN can be seen to be applied in the correct order. If the sequence rolls over the single-byte limit (255) the instance SCN increases. There is a special value for the sequence number (and I can never remember whether it’s 0 or 255 – I think it’s zero) which is used to indicate a corrupt block (ORA-01578). There have been changes in recent versions of Oracle (related in some way to private redo threads, and possibly not in all environments) that result in a block’s SCN changing much more frequently than it used to when you update it repeatedly.

      The Cleanout SCN probably helps Oracle to limit the amount of work it does during read-consistent checks; for example, a check to the CSC may be sufficient to allow Oracle to record a “consistent get – no work done”; but I’d have to start working through a number of scenarios to figure out why this could work and convince myself that this was the case.
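
      If you want to see these numbers for yourself, a block dump shows them near the top of the block; as far as I understand it, the cache-header line labelled “scn: … seq: …” holds the last change SCN and its sequence number, and the “csc:” field a few lines further down is the cleanout SCN. A minimal sketch (the file and block numbers are hypothetical):

      alter system dump datafile 11 block 10;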

      Comment by Jonathan Lewis — September 11, 2012 @ 9:09 am GMT Sep 11,2012 | Reply

  19. On commit cleanout and delayed block cleanout: committed buffers not written out by DBWR will undergo commit cleanout, and modified buffers which have been written out by DBWR will undergo delayed block cleanout by the next process which reads them.

    In both cases redo will be generated because of the db block change. Please correct me if I am wrong.

    Thanks,
    Prashnath

    Comment by Prashanth — September 20, 2012 @ 8:18 am GMT Sep 20,2012 | Reply

    • Prashnath,

      Sorry about the delay in answering this question. You’re heading in the right direction, but your comment isn’t quite right.
      It’s important to remember that when a buffer is written to disc it isn’t cleared from memory (unless dbwr has been called to write it because a session needs some free buffers). With this thought in mind:

      a) commit cleanout can occur if the block is in the buffer cache – whether or not it has been written to disc. In fact if I update the block, then something causes it to be flushed from memory, then you re-read it, and then I commit, I can still visit that (now back in memory) copy of the block and apply the commit cleanout.

      b) delayed block cleanout occurs if commit cleanout hasn’t occurred and the block has been flushed from memory (which means it must have been written to disc); and it occurs the next time someone reads the block into memory. One reason why commit cleanout might not have happened is that I didn’t commit until after the block had been flushed; another is that some other process was pinning the buffer at the moment that I committed and wanted to do a commit cleanout.
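
      As a rough illustration of case (b), here’s a minimal sketch of a test that forces delayed block cleanout – t1 and n1 are hypothetical names, and the statistic names listed are simply the ones I’d expect to move:

      update t1 set n1 = n1 + 1;
      alter system flush buffer_cache;        -- dirty blocks written and flushed while still "locked"
      commit;                                 -- too late for commit cleanout to visit those blocks

      -- from a second session:
      select count(*) from t1;                -- this read has to do the delayed block cleanout

      select  sn.name, ms.value
      from    v$mystat ms, v$statname sn
      where   sn.statistic# = ms.statistic#
      and     sn.name in (
                      'cleanout - number of ktugct calls',
                      'immediate (CR) block cleanout applications',
                      'redo entries'
              );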

      Comment by Jonathan Lewis — December 31, 2012 @ 11:18 am GMT Dec 31,2012 | Reply

  20. Jonathan,

    A customer’s application hit a performance issue due to transaction table rollbacks (TTR) after upgrading to 11.2.0.3 (there are some oddities to check in the particular query which was slowed down by TTR). TTR took me a few days/weeks to work out in 9iR2, 5 or 6 years ago (unfortunately a long time before you wrote Oracle Core).
    I would like to check those oddities in 11.2.0.3, but my attempts to simulate TTR in a simple way are failing in 11.2.0.3. Did you try to run your example in 11.2.0.3 (page 52, table 3-4)? The “transaction table consistent read rollback” statistic is hardly ever increased on my site.

    Regards
    Pavol Babel

    Comment by Pavol Babel — October 6, 2012 @ 11:15 pm GMT Oct 6,2012 | Reply

    • Oh, I made a silly mistake in my test case, so I can tell that your script core_cleanout_4.sql will work on 11.2.0.3 too. Regardless of the errors I made, I would like to ask a question. An Oracle session creates a private copy of the transaction table in the buffer cache (I guess it could keep it pinned), later applying undo records as needed. Does it keep the private version of the transaction table across subsequent calls?

      Interestingly, when Oracle performs a single-session direct path read on the table and the same query is repeated (at least in the same read-only transaction), it won’t perform TTR any more, although there are still plenty of data blocks with dirty ITL slots on disk (because the direct path read didn’t clean them out).

      Regards
      Pavol Babel

      Comment by Pavol Babel — October 7, 2012 @ 10:49 pm GMT Oct 7,2012 | Reply

  21. On Page 41 in Figure 3-3, in the ITL List: the order of both XID entries is shown as undo segment.undo sequence number.undo slot (3.1843.9). Should it be undo segment.undo slot.undo sequence number (3.9.1843)? Thanks

    Comment by Mike Kinkade — October 12, 2012 @ 2:28 pm GMT Oct 12,2012 | Reply

    • Mike,

      Correct – I’ve got the two columns the wrong way round.
      (It’s also extremely unlikely that two slots from different segments would be at exactly the same sequence number at the same time.)

      Comment by Jonathan Lewis — December 13, 2012 @ 1:43 pm GMT Dec 13,2012 | Reply

  22. Jonathan, could you explain the meaning of v$undostat columns in the context of this chapter?

    Comment by Wojciech — October 24, 2012 @ 2:17 pm GMT Oct 24,2012 | Reply

    • What’s not in the documentation that you think you need to know ?

      http://docs.oracle.com/cd/E11882_01/server.112/e25513/dynviews_3118.htm

      Comment by Jonathan Lewis — December 13, 2012 @ 1:57 pm GMT Dec 13,2012 | Reply

      • I think I am confused by the columns UNXPSTEALCNT, UNXPBLKRELCNT, UNXPBLKREUCNT and EXPSTEALCNT, EXPBLKRELCNT, EXPBLKREUCNT. What is this stealing of expired and non-expired blocks about? I haven’t found it in the chapter.

        And regarding ACTIVEBLKS, UNEXPIREDBLKS and EXPIREDBLKS, what exactly are active, unexpired and expired blocks in the context of transaction management?

        Comment by Wojciech — January 6, 2013 @ 12:35 pm GMT Jan 6,2013 | Reply

        • Wojciech,

          I don’t think v$undostat really fits in chapter 3 with transactions; it would be more appropriate in the context of chapter 2 (redo and undo), but I didn’t make any comments about how much space had to be allocated for keeping undo and how Oracle managed it.

          Here’s an outline that may help answer your questions. We can set (or Oracle can automatically adjust) an “undo retention” time with the intention that no undo block should be overwritten until the transaction(s) that generated the undo records in that undo block have been committed for at least that amount of time. Although this retention time is often spoken of in terms of keeping blocks, the actual granularity of the check is the extent level.

          Assume a session is changing a load of data in a single transaction and the undo_retention is 900 seconds. Assume our transaction is quite large and generates a lot of undo very quickly. At some point we happen to fill the last available block in the current extent and move on to the next extent in the chain that makes up our undo segment, and find that the commit SCN for the most recently committed undo record in that extent happened only 12 minutes ago. This is the scenario where most of these columns are relevant.

          We can’t immediately overwrite the next extent, so we have to grow the segment – which is easy if there is free space in the undo tablespace, but if there isn’t we first check to see if there are any other segments which have “expired” extents, i.e. extents with no undo records that were committed less than 900 seconds ago. If we find one (and I believe we pick the oldest such extent we can find) then we have “stolen” an “expired” extent (EXPSTEALCNT goes up by one) while the other segment has “released” “expired” blocks (EXPBLKRELCNT) – and we record the block count because undo extents can be of different sizes.

          If we find that no other segment has any expired blocks, we look for the extent with the oldest commit. This is an “unexpired” extent, and we steal it because we have to continue working — this gives us the UNXPSTEALCNT and UNXPBLKRELCNT. One bit of timing I’m not sure about (and would have to research) is whether we overwrite unexpired extents from our own segment before stealing extents from other segments – this overwriting of our own unexpired extents is the UNXPBLKREUCNT (counting blocks from the extent). Of course if we had had some expired extents in our own segment when we got to the end of the current extent we would simply have run the transaction on into the next (i.e. oldest) extent – and this would have shown up under EXPBLKREUCNT.

          As for ACTIVEBLKS, UNEXPIREDBLKS and EXPIREDBLKS, I haven’t looked closely at them – I’ve only glanced at the definition and guessed at what they mean and haven’t had to work out whether or not my guess is correct. I suspect that they are snapshots taken either at the beginning or end of the snapshot interval (10 minutes).

          Expired blocks are probably blocks in extents where the entire extent has been used and all transactions committed far enough in the past to be outside the retention time.

          Unexpired blocks are probably blocks in extents where the entire extent has been used but the extent has some transactions in it that committed in the recent past – i.e. less than “retention time” seconds ago.

          Active blocks are probably blocks in extents where there is a transaction that has used blocks in that extent and has not yet committed.
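
          If you want to watch these figures on your own system, a minimal sketch of a query that lists the columns discussed above (one row per 10-minute interval, most recent first) would be:

          select
                  begin_time, end_time,
                  activeblks, unexpiredblks, expiredblks,
                  unxpstealcnt, unxpblkrelcnt, unxpblkreucnt,
                  expstealcnt, expblkrelcnt, expblkreucnt
          from    v$undostat
          order by
                  begin_time desc
          ;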

          Comment by Jonathan Lewis — January 6, 2013 @ 3:34 pm GMT Jan 6,2013

  23. Hello Jonathan! Greetings from Russia!

    On page 38 it is said “Finally, you can see from the uba that if you go to record 5 of block 11,976 of file 6 (which should have sequence number 0x0543)”. My question is: where do “block 11,976” and “file 6” come from? If you look at the uba – 0x01802ec8.0543.05:

    05 – undo record number (comes from transaction table)
    0543 – sequence number

    0x01802ec8 converted to decimal gives us 25177800 (which is an SCN, I guess…). How do you manage to get that block and file number info?

    Thank you very much in advance!

    Comment by Roman Muschinskiy — April 30, 2013 @ 2:56 pm GMT Apr 30,2013 | Reply

    • … Answering my own question, apparently from Xid.. And 0x01802ec8 is data block address…

      Comment by Roman Muschinskiy — April 30, 2013 @ 5:28 pm GMT Apr 30,2013 | Reply

    • Roman,

      I only know where 11976 came from: if you open “calc” and use the programmer calculator, choose HEX and enter “2ec8”, then choose DEC to convert it to decimal, it will give you 11976.

      Moreover, you can find out the segment id from the xid and query dba_rollback_segs to find out the data file id for the segment id of your transaction.
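
      Alternatively – and this is just a sketch using the documented dbms_utility functions, not anything taken from the book – you can let Oracle do the arithmetic: the first component of the uba is a data block address, which splits into a relative file number and a block number:

      select
              dbms_utility.data_block_address_file(to_number('01802ec8','xxxxxxxx'))   file#,
              dbms_utility.data_block_address_block(to_number('01802ec8','xxxxxxxx'))  block#
      from    dual
      ;

      -- returns file# = 6 and block# = 11976, matching the figures on page 38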

      Comment by vipankumarsharma — October 1, 2013 @ 10:51 am GMT Oct 1,2013 | Reply

  24. Hi Jonathan,

    I have a few questions on read consistency.

    1. Where is the read-consistent block image created – in the buffer cache or in the PGA?

    2. While building the read-consistent block image, from where does Oracle rebuild the consistent block’s ITL information like Scn/Fsc?

    3. If the read-consistent block image is created in the buffer cache, then my question is:
    i) How are the changes which were made by some active transaction applied to the block again?

    Thanks in advance!

    Comment by vipankumarsharma — September 30, 2013 @ 10:35 am GMT Sep 30,2013 | Reply

  25. Thanks Sir! It really helps.

    Comment by vipankumarsharma — October 2, 2013 @ 12:27 pm GMT Oct 2,2013 | Reply

  26. Hi Jonathan,

    A few more questions on read consistency.

    1. How does Oracle know from the rollback segment header whether a transaction is committed or not?

    2. If another transaction is allocated the same slot in the rollback segment that was used by some transaction earlier, but the data block to which this slot belonged was not tidied up by any transaction,
    then how does Oracle protect the commit information from being overwritten by this new transaction?

    Regards,
    Vipan

    Comment by vipan — October 17, 2013 @ 7:22 am GMT Oct 17,2013 | Reply

    • Vipan,

      1) Page 28: “Similarly, when the transaction completes (typically through a commit; call), the session sets the state back to “free” (value 9)”
      2) Page 45: “Finally we come to the third linked list that runs through the undo records—the history of commit SCNs—and this is a topic that covers a lot of ground.”

      Comment by Jonathan Lewis — October 17, 2013 @ 8:29 am GMT Oct 17,2013 | Reply

  27. Hi Jonathan,

    From the undo segment block dump, I could not work out how Oracle rolls back the ITL entries in the data block.

    For example, the uba “0x00c0309e.0114.04” from the ITL pointed me to the following undo record.

    *—————————–
    * Rec #0x4 slt: 0x10 objn: 76940(0x00012c8c) objd: 76940 tblspc: 4(0x00000004)
    * Layer: 11 (Row) opc: 1 rci 0x03
    Undo type: Regular undo Last buffer split: No
    Temp Object: No
    Tablespace Undo: No
    rdba: 0x00000000
    *—————————–
    KDO undo record:
    KTB Redo
    op: 0x02 ver: 0x01
    compat bit: 4 (post-11) padding: 0
    op: C uba: 0x00c0309e.0114.03
    KDO Op code: DRP row dependencies Disabled
    xtype: XA flags: 0x00000000 bdba: 0x01000257 hdba: 0x01000252
    itli: 1 ispac: 0 maxfr: 4858
    tabn: 0 slot: 2(0x2)

    Now, from the above record, how would Oracle come to know which ITL entry it has to change?

    Thanks for your time.

    Regards,
    Vipan

    Comment by vipankumarsharma — November 15, 2013 @ 9:48 am GMT Nov 15,2013 | Reply

    • I think this is described in the book somewhere (check the meaning of “op: C”)

      This looks like it wasn’t the undo for the first change this transaction made to the block, so it hasn’t had to record the previous content of an ITL entry; however you can see the penultimate line of your dump showing “itli” – this undo relates to the transaction occupying ITL slot number 1.

      Comment by Jonathan Lewis — November 29, 2013 @ 9:11 am GMT Nov 29,2013 | Reply

  28. Dear Sir,

    The TRN TBL slot for a transaction in the undo segment header block is always updated by the commit, even if Oracle has to re-read the undo segment header block from disk… Is this right?

    Regards,
    Vipan

    Comment by vipankumarsharma — November 18, 2013 @ 6:32 am GMT Nov 18,2013 | Reply

    • Vipan,

      That is correct – it’s the only way that everyone else can know that the transaction has committed.
      It would be very rare for the undo segment header block not to be in memory, though; you’d probably have to “flush buffer_cache” to confirm my claim.
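
      A minimal sketch of the sort of test that could confirm it (t1 and n1 are hypothetical names): flush the buffer cache between the update and the commit so that the undo segment header block has to come back from disc before the commit can update its transaction table slot.

      update t1 set n1 = n1 + 1 where rownum = 1;
      alter system flush buffer_cache;
      commit;

      -- watching "physical reads" in v$mystat across the commit, or dumping the undo
      -- segment header afterwards, should show the slot being set back to state 9 (free)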

      Comment by Jonathan Lewis — November 29, 2013 @ 9:12 am GMT Nov 29,2013 | Reply

  29. Dear Sir,

    For SCN approximation, does Oracle compare the wrap#? If so, how does Oracle construct the wrap# from the old transaction table slot information stored in the first undo record of a transaction?

    Regards,
    Vipan

    Comment by vipankumarsharma — November 18, 2013 @ 8:49 am GMT Nov 18,2013 | Reply

    • Vipan,

      The SCN that Oracle stores is the complete SCN – i.e. the base# and the wrap#. Here’s an example of a transaction table slot where the wrap# is already large:

         0x07    9    0x00  0x010f  0xffff  0x0b86.07047451  0x008007c7  0x0000.000.00000000  0x00000001   0x00000000  1385716580
      

      and here’s a “first undo record of transaction” taken moments later, showing the saved ITL entry and the saved transaction slot entry with the full SCN

      uba: 0x00800018.0133.03 ctl max scn: 0x0b86.0704670a prv tx scn: 0x0b86.07046971
      txn start scn: scn: 0x0b86.070474bd logon user: 30
       prev brb: 8388626 prev bcl: 0
      KDO undo record:
      KTB Redo 
      op: 0x04  ver: 0x01  
      op: L  itl: xid:  0x0007.00c.000000c5 uba: 0x0080006e.00f0.13
                            flg: C---    lkc:  0     scn: 0x0b86.070474b9
      
      

      Comment by Jonathan Lewis — November 29, 2013 @ 9:23 am GMT Nov 29,2013 | Reply

  30. Hi Jonathan,
    p49 typo:
    “… at which point it finds that the transaction has committed and can see the commit time (kcmgrs, kcmgct).”
    s/b
    …. (kcmgrs, ktugct). [per your reference to this on the same page]
    Thanks, Merrill

    Comment by Merrill B Lamont Jr — December 4, 2013 @ 7:00 pm GMT Dec 4,2013 | Reply

