Oracle Scratchpad

January 3, 2011

Redo

Filed under: Infrastructure,redo — Jonathan Lewis @ 7:40 pm BST Jan 3,2011

A couple of days ago I published a link to some comments I had made on OTN about differences in redo generation between 10g and earlier versions of Oracle. This raised a few questions that suggested a follow-up (or perhaps “prequel”) note might be a little helpful. So I’ve created a simple SQL script to help demonstrate the differences and show how some of the pieces hang together.

The whole posting is a little lengthy, but a large fraction of it is just the continuous output from a couple of trace files from which I’ll be highlighting just a few lines.

drop table t1;

create table t1 (
	n1	number,
	n2	number,
	n3	number,
	n4	number
);

create index i1 on t1(n1);
create index i2 on t1(n2);

insert into t1 values (1,1,1,1);
commit;

alter system switch logfile;

insert into t1 values (2,2,2,2);
insert into t1 values (3,3,3,3);

commit;

execute dump_log

All I’ve done is create a simple table with two indexes. I insert a row and commit because the first insert of a row into a table results in all sorts of funny bits of recursive activity (mostly related space management) and I want to get that out of the way so that it doesn’t obscure the more typical activity. Then I “switch logfile”, do a little work, and use a little procedure I’ve written to dump the current log file. The log file switch is just a convenient way to keep the dump small – but there are strategies for being selective about the parts of the redo log file you dump.

The work I’ve done is just two inserts into the table – this means I expect to find something like the following bits of activity listed in the redo log file:

    start transaction
    modify block in table
    modify block in index 1
    modify block in index 2
    modify block in table
    modify block in index 1
    modify block in index 2
    end transaction

What I’m going to show you next is the redo log file dump from Oracle 9.2.0.8 and then the corresponding redo log file dump from 10.2.0.3.

In the 9.2.0.8 dump we’ll see the “traditional” sequence of redo generation, which follows very closely to the steps listed above and also shows that Oracle pairs a “redo change vector” for a table or index block with the matching “redo change vector” for an undo record that describes how to reverse the table (or index) change – an undo record, of course, is just a piece of information that gets stored in an undo block. In general, each pair of redo change vectors will be gathered into a single “redo record”.

In the 10.2.0.3 dump we’ll see that the same redo change vectors still exist but they’ve all been lumped into a single redo record, and their ordering in that record is completely different from the 9.2.0.8 ordering, demonstrating (or at least supporting) the point that the “private redo/in-memory undo” mechanism uses two separate buffers, one for redo change vectors related to table/index blocks (the private redo bit) and the other for redo change vectors related to undo records (the in-memory undo bit).

The key expressions to watch out for in the dump are: “REDO RECORD” and “CHANGE #” (I’ve inserted a few blank lines  into the trace file to give a little white space between records and vectors):

Redo log file dump from 9i

REDO RECORD - Thread:1 RBA: 0x000360.00000002.0010 LEN: 0x0170 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:23 AFN:2 DBA:0x00800039 SCN:0x0000.03ec99bf SEQ:  1 OP:5.2
ktudh redo: slt: 0x0016 sqn: 0x00001a69 flg: 0x0012 siz: 80 fbi: 0
            uba: 0x00800040.075d.26    pxid:  0x0000.000.00000000

CHANGE #2 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99be SEQ:  1 OP:5.1
ktudb redo: siz: 80 spc: 2708 flg: 0x0012 seq: 0x075d rec: 0x26
            xid:  0x0004.016.00001a69
ktubl redo: slt: 22 rci: 0 opc: 11.1 objn: 45648 objd: 45648 tsn: 12
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x00800040.075d.25
prev ctl max cmt scn:  0x0000.03ebf084  prev tx cmt scn:  0x0000.03ebf106
KDO undo record:
KTB Redo
op: 0x03  ver: 0x01
op: Z
KDO Op code: DRP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1)

CHANGE #3 TYP:2 CLS: 1 AFN:11 DBA:0x02c0000a SCN:0x0000.03ec99f2 SEQ:  1 OP:11.2
KTB Redo
op: 0x01  ver: 0x01
op: F  xid:  0x0004.016.00001a69    uba: 0x00800040.075d.26
KDO Op code: IRP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) size/delt: 15
fb: --H-FL-- lb: 0x2  cc: 4
null: ----
col  0: [ 2]  c1 03
col  1: [ 2]  c1 03
col  2: [ 2]  c1 03
col  3: [ 2]  c1 03

CHANGE #4 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:  0 OP:5.20
session number   = 9
serial  number   = 6
transaction name =

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000002.0180 LEN: 0x0108 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  1 OP:5.1
ktudb redo: siz: 96 spc: 2626 flg: 0x0022 seq: 0x075d rec: 0x27
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 38 opc: 10.22 objn: 45649 objd: 45649 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x000a.012.00002552 uba: 0x0080009c.09cb.0c
                      flg: C---    lkc:  0     scn: 0x0000.03ec99f2
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=96880916 indexid=0x2c00409 block=0x02c0040a
purge leaf row
key: (10):  02 c1 03 06 02 c0 00 0a 00 01

CHANGE #2 TYP:2 CLS: 1 AFN:11 DBA:0x02c0040a SCN:0x0000.03ec99f2 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0004.016.00001a69    uba: 0x00800040.075d.27
Block cleanout record, scn:  0x0000.03ec99f5 ver: 0x01 opt: 0x02, entries follow...
  itli: 2  flg: 2  scn: 0x0000.03ec99f2
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 14
insert key: (10):  02 c1 03 06 02 c0 00 0a 00 01

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000003.0098 LEN: 0x0108 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  2 OP:5.1
ktudb redo: siz: 96 spc: 2528 flg: 0x0022 seq: 0x075d rec: 0x28
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 39 opc: 10.22 objn: 45650 objd: 45650 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x000a.012.00002552 uba: 0x0080009c.09cb.0d
                      flg: C---    lkc:  0     scn: 0x0000.03ec99f2
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=96880916 indexid=0x2c00489 block=0x02c0048a
purge leaf row
key: (10):  02 c1 03 06 02 c0 00 0a 00 01

CHANGE #2 TYP:2 CLS: 1 AFN:11 DBA:0x02c0048a SCN:0x0000.03ec99f2 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0004.016.00001a69    uba: 0x00800040.075d.28
Block cleanout record, scn:  0x0000.03ec99f5 ver: 0x01 opt: 0x02, entries follow...
  itli: 2  flg: 2  scn: 0x0000.03ec99f2
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 14
insert key: (10):  02 c1 03 06 02 c0 00 0a 00 01

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000003.01a0 LEN: 0x00f4 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  3 OP:5.1
ktudb redo: siz: 64 spc: 2430 flg: 0x0022 seq: 0x075d rec: 0x29
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 40 opc: 11.1 objn: 45648 objd: 45648 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
KDO undo record:
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.26
KDO Op code: DRP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 2(0x2)

CHANGE #2 TYP:0 CLS: 1 AFN:11 DBA:0x02c0000a SCN:0x0000.03ec99f5 SEQ:  1 OP:11.2
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.29
KDO Op code: IRP row dependencies Disabled
  xtype: XA  bdba: 0x02c0000a  hdba: 0x02c00009
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 2(0x2) size/delt: 15
fb: --H-FL-- lb: 0x2  cc: 4
null: ----
col  0: [ 2]  c1 04
col  1: [ 2]  c1 04
col  2: [ 2]  c1 04
col  3: [ 2]  c1 04

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000004.00a4 LEN: 0x00c8 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  4 OP:5.1
ktudb redo: siz: 80 spc: 2364 flg: 0x0022 seq: 0x075d rec: 0x2a
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 41 opc: 10.22 objn: 45649 objd: 45649 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.27
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=96880916 indexid=0x2c00409 block=0x02c0040a
purge leaf row
key: (10):  02 c1 04 06 02 c0 00 0a 00 02

CHANGE #2 TYP:0 CLS: 1 AFN:11 DBA:0x02c0040a SCN:0x0000.03ec99f5 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.2a
REDO: SINGLE / -- / --
itl: 2, sno: 2, row size 14
insert key: (10):  02 c1 04 06 02 c0 00 0a 00 02

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000004.016c LEN: 0x00c8 VLD: 0x01
SCN: 0x0000.03ec99f5 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:24 AFN:2 DBA:0x00800040 SCN:0x0000.03ec99f5 SEQ:  5 OP:5.1
ktudb redo: siz: 80 spc: 2282 flg: 0x0022 seq: 0x075d rec: 0x2b
            xid:  0x0004.016.00001a69
ktubu redo: slt: 22 rci: 42 opc: 10.22 objn: 45650 objd: 45650 tsn: 12
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.28
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=96880916 indexid=0x2c00489 block=0x02c0048a
purge leaf row
key: (10):  02 c1 04 06 02 c0 00 0a 00 02

CHANGE #2 TYP:0 CLS: 1 AFN:11 DBA:0x02c0048a SCN:0x0000.03ec99f5 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x00800040.075d.2b
REDO: SINGLE / -- / --
itl: 2, sno: 2, row size 14
insert key: (10):  02 c1 04 06 02 c0 00 0a 00 02

--------------

REDO RECORD - Thread:1 RBA: 0x000360.00000005.0044 LEN: 0x0070 VLD: 0x01
SCN: 0x0000.03ec99f6 SUBSCN:  1 12/30/2010 14:36:06

CHANGE #1 TYP:0 CLS:23 AFN:2 DBA:0x00800039 SCN:0x0000.03ec99f5 SEQ:  1 OP:5.4
ktucm redo: slt: 0x0016 sqn: 0x00001a69 srt: 0 sta: 9 flg: 0x12
ktucf redo: uba: 0x00800040.075d.2b ext: 0 spc: 2200 fbi: 0

The dump from 9.2.0.8 consists of seven redo records – and most of those records consist of two redo change vectors. We’ll start by looking at the fourth redo record (starting at line 112) which is a typical “old-style” redo record. It consists of two redo change vectors.

In the chosen record, change #1 (line 115) describes how to change an undo block at address (DBA) 0x00800040 which is operating at sequence number (seq: ) 0x075d. The change is to insert a new undo record, with record number (rec: ) 0x29. If you look at lines 126 – 129 you see that this undo record is an instruction to Delete a Row Piece (DRP) from data block (bdba) 0x02c0000a; the row piece is identified as the pointed to by slot 2 of table 0 in the block’s row directory.

Change # 2 (line 131) describes the change needed to create that row piece (note the IRP – Insert Row Piece at line 135). Again we see references to the block, table number and slot number.

The next two redo records are very similar: they describe how to insert and delete a key entry for each of the two indexes that have to be maintained as the row is inserted into the table – again the redo change vector for the “undo” appears before the redo change vector for the “apply”.

The first three redo records form a similar group – the first one is about inserting/removing a table row, the second and third are about the insert/remove pairs for the two indexes – but the first record of the three is more complex, consisting of four redo change vectors rather than just two.

The first redo record happens to be the first record of the transaction – so it contains (for 9i and below) two extra redo change vectors. The standard pair for the table row appear as change #2 and change #3, but in change #1 we see the redo change vector describing a change to the undo segment header block (line 5: ktudh) as we acquire a “transaction table slot” (see my glossary, and “cleanout”) specifically slot 0x0016, which we stamp with sequence number 0x00001a69. The final change vector in this record (change #4 at line 41) is marked as a “Media Recovery Marker” – it’s actually a type of audit label – and the example here is the short form; there is a much longer version used for the first transaction of a session’s lifetime, which is also used after each log file switch that takes place in a session’s lifetime (and in 8i there was a bug that meant the full vector appeared for every transaction start – with could result in a nasty volume of excess redo).

The last redo record is the so-called “commit record”. It consists of just one change vector (line 209). This change vector describes the update to the undo segment header as we commit (ktucm) the transaction; the status (sta: ) of transaction table slot 0x0016 is set to 9 to show that the slot is now free, and the “free pool” list (ktucf) is updated to show that the last undo block we were using has some space available in it that could be used by the next transaction that acquires a slot in this undo segment.

So that’s the redo generated by 9i. The basic pattern is that a typical redo record consists of a pair of redo change vectors, the first describing an undo record and the second describing how to make the required data change.

An important scalability detail (or threat, rather) that goes with this implementation is that the session has to allocate memory in the redo log buffer and copy each redo record in turn so, for our table with two indexes, every row we insert requires us to acquire the redo allocation latch and redo copy latch three times.

Now we move on to see what shows up in the log file when we run our code on 10.2.0.3

Redo log file dump from 10g

REDO RECORD - Thread:1 RBA: 0x00034b.00000002.0010 LEN: 0x0638 VLD: 0x0d
SCN: 0x0000.00d60a85 SUBSCN:  1 12/30/2010 14:24:01

CHANGE #1 TYP:2 CLS: 1 AFN:6 DBA:0x01804d8a OBJ:70577 SCN:0x0000.00d60a81 SEQ:  5 OP:11.2
KTB Redo
op: 0x01  ver: 0x01
op: F  xid:  0x0008.025.00003632    uba: 0x008031d3.07f1.27
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01804d8a  hdba: 0x01804d89
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) size/delt: 15
fb: --H-FL-- lb: 0x2  cc: 4
null: ----
col  0: [ 2]  c1 03
col  1: [ 2]  c1 03
col  2: [ 2]  c1 03
col  3: [ 2]  c1 03

CHANGE #2 TYP:0 CLS:31 AFN:2 DBA:0x00800079 OBJ:4294967295 SCN:0x0000.00d60a5f SEQ:  1 OP:5.2
ktudh redo: slt: 0x0025 sqn: 0x00003632 flg: 0x0012 siz: 104 fbi: 0
            uba: 0x008031d3.07f1.27    pxid:  0x0000.000.00000000

CHANGE #3 TYP:2 CLS: 1 AFN:6 DBA:0x01804e0a OBJ:70578 SCN:0x0000.00d60a81 SEQ:  2 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0008.025.00003632    uba: 0x008031d3.07f1.28
Block cleanout record, scn:  0x0000.00d60a84 ver: 0x01 opt: 0x02, entries follow...
  itli: 2  flg: 2  scn: 0x0000.00d60a81
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 14
insert key: (10):  02 c1 03 06 01 80 4d 8a 00 01

CHANGE #4 TYP:2 CLS: 1 AFN:6 DBA:0x01804e8a OBJ:70579 SCN:0x0000.00d60a81 SEQ:  2 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x11  ver: 0x01
op: F  xid:  0x0008.025.00003632    uba: 0x008031d3.07f1.29
Block cleanout record, scn:  0x0000.00d60a84 ver: 0x01 opt: 0x02, entries follow...
  itli: 2  flg: 2  scn: 0x0000.00d60a81
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 14
insert key: (10):  02 c1 03 06 01 80 4d 8a 00 01

CHANGE #5 TYP:0 CLS: 1 AFN:6 DBA:0x01804d8a OBJ:70577 SCN:0x0000.00d60a85 SEQ:  1 OP:11.2
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.2a
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01804d8a  hdba: 0x01804d89
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 2(0x2) size/delt: 15
fb: --H-FL-- lb: 0x2  cc: 4
null: ----
col  0: [ 2]  c1 04
col  1: [ 2]  c1 04
col  2: [ 2]  c1 04
col  3: [ 2]  c1 04

CHANGE #6 TYP:0 CLS: 1 AFN:6 DBA:0x01804e0a OBJ:70578 SCN:0x0000.00d60a85 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.2b
REDO: SINGLE / -- / --
itl: 2, sno: 2, row size 14
insert key: (10):  02 c1 04 06 01 80 4d 8a 00 02

CHANGE #7 TYP:0 CLS: 1 AFN:6 DBA:0x01804e8a OBJ:70579 SCN:0x0000.00d60a85 SEQ:  1 OP:10.2
index redo (kdxlin):  insert leaf row
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.2c
REDO: SINGLE / -- / --
itl: 2, sno: 2, row size 14
insert key: (10):  02 c1 04 06 01 80 4d 8a 00 02

CHANGE #8 TYP:0 CLS:31 AFN:2 DBA:0x00800079 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  1 OP:5.4
ktucm redo: slt: 0x0025 sqn: 0x00003632 srt: 0 sta: 9 flg: 0x2
ktucf redo: uba: 0x008031d3.07f1.2c ext: 38 spc: 1718 fbi: 0

CHANGE #9 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a5d SEQ:  1 OP:5.1
ktudb redo: siz: 104 spc: 2250 flg: 0x0012 seq: 0x07f1 rec: 0x27
            xid:  0x0008.025.00003632
ktubl redo: slt: 37 rci: 0 opc: 11.1 objn: 70577 objd: 70577 tsn: 5
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x008031d3.07f1.26
prev ctl max cmt scn:  0x0000.00d5fa91  prev tx cmt scn:  0x0000.00d5fb64
txn start scn:  0x0000.00d60a81  logon user: 52  prev brb: 8401354  prev bcl: 0 KDO undo record:
KTB Redo
op: 0x03  ver: 0x01
op: Z
KDO Op code: DRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01804d8a  hdba: 0x01804d89
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1)

CHANGE #10 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  1 OP:5.1
ktudb redo: siz: 96 spc: 2144 flg: 0x0022 seq: 0x07f1 rec: 0x28
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 39 opc: 10.22 objn: 70578 objd: 70578 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0005.01e.000035e3 uba: 0x008034a5.083a.1d
                      flg: C---    lkc:  0     scn: 0x0000.00d60a81
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=0 indexid=0x1804e09 block=0x01804e0a
(kdxlpu): purge leaf row
key: (10):  02 c1 03 06 01 80 4d 8a 00 01

CHANGE #11 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  2 OP:5.1
ktudb redo: siz: 96 spc: 2046 flg: 0x0022 seq: 0x07f1 rec: 0x29
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 40 opc: 10.22 objn: 70579 objd: 70579 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0005.01e.000035e3 uba: 0x008034a5.083a.1e
                      flg: C---    lkc:  0     scn: 0x0000.00d60a81
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=0 indexid=0x1804e89 block=0x01804e8a
(kdxlpu): purge leaf row
key: (10):  02 c1 03 06 01 80 4d 8a 00 01

CHANGE #12 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  3 OP:5.1
ktudb redo: siz: 64 spc: 1948 flg: 0x0022 seq: 0x07f1 rec: 0x2a
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 41 opc: 11.1 objn: 70577 objd: 70577 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
KDO undo record:
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.27
KDO Op code: DRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01804d8a  hdba: 0x01804d89
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 2(0x2)

CHANGE #13 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  4 OP:5.1
ktudb redo: siz: 80 spc: 1882 flg: 0x0022 seq: 0x07f1 rec: 0x2b
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 42 opc: 10.22 objn: 70578 objd: 70578 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.28
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=0 indexid=0x1804e09 block=0x01804e0a
(kdxlpu): purge leaf row
key: (10):  02 c1 04 06 01 80 4d 8a 00 02

CHANGE #14 TYP:0 CLS:32 AFN:2 DBA:0x008031d3 OBJ:4294967295 SCN:0x0000.00d60a85 SEQ:  5 OP:5.1
ktudb redo: siz: 80 spc: 1800 flg: 0x0022 seq: 0x07f1 rec: 0x2c
            xid:  0x0008.025.00003632
ktubu redo: slt: 37 rci: 43 opc: 10.22 objn: 70579 objd: 70579 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x008031d3.07f1.29
Dump kdilk : itl=2, kdxlkflg=0x1 sdc=0 indexid=0x1804e89 block=0x01804e8a
(kdxlpu): purge leaf row
key: (10):  02 c1 04 06 01 80 4d 8a 00 02

The entire transaction is wrapped up into a single redo record with fourteen redo change vectors. (The 9i dump showed a total of fifteen redo change vectors – we’ve lost the “audit” vector in our 10g setup.) In order, the vectors describe how to…

    insert first row into table
    acquire transaction table slot (start transaction – slightly odd order)
    insert first row into first index
    insert first row into second index
    insert second row into table
    insert second row into first index
    insert second row into second index
    release transaction table slot (commit transaction)
    create undo record showing how to remove first row from table
    create undo record showing how to remove first row from first index
    create undo record showing how to remove first row from second index
    create undo record showing how to remove second row from table
    create unto record showing how to remove second row from first index
    create undo record showing how to remove second row from second index

It is reasonable to infer from this that Oracle is generating a list of “forward” changes (i.e. the private redo) in one buffer and a list of “reverse” changes (i.e. the in-memory undo) in another buffer – and simply concatenates the two lists on the commit.

On the plus side, if we’ve only got one redo record we only have to allocate and copy into the (public) redo buffer once – which reduces contention on a single important hot spot in the Oracle database; on the other hand we have to keep a number of data blocks pinned in memory until the commit, and we have to modify all those data blocks at the same time as we copy the two private buffers into the (public) redo log buffer, and this requirement introduces a different concurrency threat.

Because of this alternative threat there is a fairly tight limit on how much change a single transaction is allowed to make using this new mechanism . For 32-bit Oracle the buffers are about 64Kb each, and for 64-bit Oracle they are about 128KB – and if a session fills either of the two buffers it has acquired then Oracle flushes the contents to the public log buffer and the session then carries on using the public log buffer in the “traditional” way (while still keeping “ownership” of the pair of private buffers that it had allocated.)

A final thought – if you issue a rollback instead of a commit, the “commit record” (change #8 in the above) disappears from the redo record, and the redo stream gets a further six redo records showing how each of the previous undo records have been applied at the user’s request; these records are followed by one last redo record which describes the release of the transaction table slot i.e. a “commit record”.

25 Comments »

  1. This is one of the best posts I have ever read here, on you blog. Excelent!!!

    Comment by Pavol Babel — January 5, 2011 @ 11:36 pm BST Jan 5,2011 | Reply

  2. This post made me question my understanding of commit processing so I thought I’d ask a question/clarification.

    Here’s my understanding, for a synchronous commit – the default commit behaviour;

    1) A session initiates a commit
    2) A “commit marker” is created in the redo log buffer
    3) All redo in the log buffer generated up to and including this “commit marker” is written to disk by LGWR, and LGWR waits on log file parallel write until this write is complete
    4) The session initiating the commit must wait for LGWR to return, and this time is recorded against log file sync
    5) Other sessions may continue generating redo and therefore initiating writes to the log buffer while this first commit is waiting to complete
    6) Other sessions may initiate further commits while this first commit is waiting to complete, the time that these sessions waits is recorded against log file sync
    7) When the first commit has complete, LGWR then batches up and writes all the pending commits (and any associated redo) to the redo log

    Throughout all processing, the following holds:

    Database blocks (which includes changes to undo blocks) are never written to disk before the corresponding redo log that describes the latest changes has been written to disk.

    Questions:
    ————-

    • Is my understanding outlined above correct?
    • Is it true that it is possible for more than just the “commit marker” to be written to redo on commit? It has to be else, the recover mechanism cannot work.
    • Can redo generated after the “commit marker” has been created in step 2) also be written in the first commit?

    While reading your post I realised that I had always understood that a commit was the write of only the “commit marker”. I think this is faulty, but wanted to ask the question here, as I’m bound to receive some other clarifications as a result.

    Regards,

    Comment by Mathew Butler — January 7, 2011 @ 3:24 pm BST Jan 7,2011 | Reply

    • Mathew,

      That’s fairly accurate, but needs a few refinements to detail – and this is just for the “simple”, old-fashioned case.

      (1) and (2) – there is NO commit marker. I think the manuals (11.2) may still say that LGWR creates a “commit record”, but it’s not true. When a session issues a commit two things happen. First, it updates the transaction table slot in the unto segment header – and since this is a block change it is described by a redo change vector which gets written into the redo log buffer; second, it sends a “synch write” message to LGWR telling it to write the log buffer up to the point of that last change vector.

      (3) – (7) are then essentially correct, as is your comment about data blocks never being written before the associated redo records.

      Questions
      a) Basically correct
      b) Everything up to AT LEAST that “commit marker” has to be written
      c) Yes – LGWR always writes complete (redo) blocks, and multiple processes could be filling the log buffer simultaneously, so you could allocate space in the buffer that partly fills a redo block, then I could allocate space that starts where you’re planning to end and carries on for a couple more blocks, then you could fill your space and issue a commit, but LGWR can’t write your redo buffer until I have filled the bit of block AFTER your “commit marker”. (In this case you will see LGWR waiting on “LGWR wait for redo copy”). When I fill that block LGWR can write your transaction – so it’s also going to write a bit of my transaction. (It may actually write as much of the redo buffer up to the end of the last fully used block as this point.) At the opposite extreme, if no-one has previously allocated redo buffer space after your space then LGWR still writes up to your “commit marker” and on to the end of the block – and the extra space is reported as “redo wastage”.

      Comment by Jonathan Lewis — January 7, 2011 @ 8:06 pm BST Jan 7,2011 | Reply

      • Thanks for the response.

        You state

        “First, it updates the transaction table slot in the unto segment header – and since this is a block change it is described by a redo change vector which gets written into the redo log buffer; second, it sends a “synch write” message to LGWR telling it to write the log buffer up to the point of that last change vector.”

        The commit wouldn’t be deemed complete until the completion of the “synch write”, so a failure could conceivable happen immediately after the change to the undo segment header, but before any writes to the log buffer. In this case, recovery processing, based only on the contents of the redo log applies all redo (roll-forward) and then rolls back any uncommitted transactions.

        Is it during the rollback that the undo segment header would then be reset so that the transaction table slot contains the correct values?

        Also, at what point in the commit processing do all the datafile headers get updated with the new SCN? How is this multi file update achieved such that the aggregate changes are ATOMIC?

        Cheers,

        Mat

        Comment by Mathew Butler — January 11, 2011 @ 1:11 pm BST Jan 11,2011 | Reply

        • Mat,
          Just to correct a little detail about order: Oracle writes the change vectors into the log buffer before modifying the blocks. But it would still be possible for a crash to occur “on the commit” at the point where the log buffer and the data buffers (including undo segment header) had been modified but not written to disc.

          If this occurs the recovery (roll forward) will apply the last complete redo record that got into the log file. At this point the database opens and rollback can start. SMON will check the undo segment headers and find the transaction hadn’t committed and roll it back – and the last change applied by a rollback is an update to the undo segment header (following by a “redo synch write” call (i.e. a commit).

          The datafile headers are not updated in commit processing. I believe they are only updated (by CKPT, I think) when a log file checkpoint has completed. The datafile header need only indicate the redo log file from which recovery has to take place. (I’ve written a comment about this on OTN quite recently). On the other hand, CKPT updates the control files every three seconds (approx.) with the rolling checkpoint as DBWR writes dirty blocks from the tail of the checkpoint queue to disc – and this may act as a method of giving Oracle a more precise recovery point withing the control file in the event of a crash. (Here’s a note that lists the various checkpoints)

          Comment by Jonathan Lewis — January 11, 2011 @ 4:45 pm BST Jan 11,2011

  3. “…When a session issues a commit two things happen….”. I think you have ommited one another thing. When a session issues COMMIT, it can also update (clean) ITL slots of modifyed data blocks, which are still present in BUFFER CACHE to prevent dealyed block cleanout, and it can create more redo change vectors. Has something changed in this behavaviour since 9i?

    Comment by Pavol Babel — January 7, 2011 @ 10:53 pm BST Jan 7,2011 | Reply

    • Pavol,

      True, I probably should have mentioned “commit cleanout” but it slipped my mind at the time. Interestingly, although 10g doesn’t apply any changes to data blocks until the commit (for “short” transactions), we still seem to see the changed blocks being left in the “commit cleanout” state (with the “upper bound commit flag” set) rather than in the fully cleaned state (with the “commit SCN” set). Presumably sub-optimal code reuse is seen as a much safer option that risking a huge volume of rewrite to cater to tweak all possible performnace benefits from the newer mechanism.

      Commit cleanout doesn’t generate extra redo, though, which is why it is also described as “delayed logging block cleanout” (see also: http://jonathanlewis.wordpress.com/2009/06/16/clean-it-up/ ) nor does it generate any extra “db block gets” despite the need to get and pin current buffers in exclusive mode briefly (see http://jonathanlewis.wordpress.com/2009/06/12/consistent-gets-2/ for a brief mention).

      Comment by Jonathan Lewis — January 8, 2011 @ 10:48 am BST Jan 8,2011 | Reply

      • Jonathan,

        now I finally understand -U– flag in trace made by “alter system dump datafile nnn block …”. The commit clenaout and delayed block logging cleanout is excellent.
        I have nice bug related to commit cleanout, read only tablespaces and foregin keys. If you insert one row in child table and the master one is read only tablespace (or most common, some partitions of master table and it’s local partitioned indexes, too), it can fail on ORA-00372: file string cannot be modified at this time. When row is inserted into the child table, oracle checks presence of the master key in master table and it will try to make dealyed block logging cleanout on block in read only tablespace. I dind’t understand why was block in master table not fully cleaned (it was created by transaction modifying only one row in one table). The bug is reproducible on 9.2.0.8 and 10.2.0.4 (I didn’t try 11g), but it didn’t bother me so much to force logging an Service Request in MOS.

        Comment by Pavol Babel — January 8, 2011 @ 10:55 pm BST Jan 8,2011 | Reply

        • Pavol,

          Your bug looks like 1606869 (Jan 2001, 8.1.6 – “not a bug”), with a variation in 9095091 (Nov 2009) which is fixed in 12.1

          I’d guess you can reproduce your problem at will by:

            insert row into parent;
            insert related row into child;
            flush buffer cache
            commit;
            make parent tablespace read only

          It would be harder to do in 10g (or maybe it’s 11g) with the private redo because the blocks affected by private redo might be pinned in memory and therefore not subject to flushing. But if you can flush them before the commit a delayed block cleanout will take place on the next read.

          Comment by Jonathan Lewis — January 9, 2011 @ 4:27 pm BST Jan 9,2011

      • Jonathan,

        10g doesn’t apply any changes to data blocks until the commit (for “short” transactions),

        I have a question about this. How do other sessions know the changes made by uncommitted transactions when they do “db block get”, that is, how to guarantee the CUR block includes latest changes? This question puzzled me for a while.

        Thanks

        Comment by Haibin Sun — August 15, 2012 @ 6:28 pm BST Aug 15,2012 | Reply

        • Haibin Sun,

          There is a reference to this on page 16 of Oracle Core. The v$bh structure has a bitmap for various flags, including a bit for “has private redo”. If a session was to view a block that is flagged with private redo it forces the owning session to flush all its in-memory undo and private redo into public memory – which includes the steps to modify the buffered block. I think this event is recorded in x$ktiff under statistic “Contention flushes”.

          Comment by Jonathan Lewis — August 19, 2012 @ 4:34 pm BST Aug 19,2012

  4. Jonathan,

    I’ve checked my library and found following testcase (date 10-FEB-09):

    CREATE TABLESPACE ro DATAFILE '&datafile_name' SIZE 10 m;
    
    
    CREATE TABLE MASTER(ID NUMBER, txt VARCHAR2(20)) PARTITION BY RANGE(ID)
        (PARTITION p1 VALUES LESS THAN (1000) TABLESPACE ro,
         PARTITION p2 VALUES LESS THAN(2000)
        )
    /
    ALTER TABLE MASTER ADD CONSTRAINT master_pk PRIMARY KEY(ID) USING INDEX LOCAL
    /
    INSERT INTO MASTER
         VALUES (1, 'AAA')
    /
    COMMIT
    /
    ALTER TABLESPACE ro READ ONLY
    /
    
    CREATE TABLE CHILD(child_id NUMBER, ID NUMBER, txt VARCHAR2(30))
      PARTITION BY RANGE(child_id)
      (PARTITION p1 VALUES LESS THAN (1000) ,
       PARTITION p2 VALUES LESS THAN (2000) )
    /
    ALTER TABLE CHILD ADD CONSTRAINT child_pk PRIMARY KEY(child_id) USING INDEX LOCAL
    /
    ALTER TABLE CHILD ADD CONSTRAINT child_master_fk FOREIGN KEY(ID) REFERENCES MASTER(ID)
    /
    
    --WILL FAIL
    INSERT INTO CHILD
         VALUES (100, 1, 'CCC')
    /
    

    I have no problems reproduce at will on 10.2.0.5, i had run it again few minutes before I wrote this post. So the order of steps is a little bit different

    Comment by Pavol Babel — January 9, 2011 @ 10:22 pm BST Jan 9,2011 | Reply

    • Pavol,

      Thanks for that – the problem is even less subtle than I thought.
      I’ve just repeated it on 11.1.0.6 – with the same error; and it doesn’t have to be partitioned tables either.

      Comment by Jonathan Lewis — January 11, 2011 @ 4:30 pm BST Jan 11,2011 | Reply

      • Jonathan,

        you are right, partitioned tables don’t have to be used to reproduce this issue. My example used part tables because it is following real-life problem, that we faced. The master table is table of request_id of messages and child table are responses for them, referenced by FK on request_id. It was just an application bug, that somebedy requested response on message which had been created half year before (message was so old, that tablespace was switched to RO), it was thing which shuldn’t happen. But it took me pretty long time to find reason for unexpected ORA-00372 :-)

        Thank you for update on 11g and thanks one more time for clarifying commit cleanout. I’m really happy that I understand, why is DB block cleaned by “commit cleanout” not in fully cleaned state.

        Comment by Pavol Babel — January 11, 2011 @ 10:56 pm BST Jan 11,2011 | Reply

  5. Jonathan,
    Do I understand this correctly :
    1) In 9i, the Redo Change for the UNDO precedes the Redo Change for the actual DML (whether INSERT,UPDATE or DELETE) in the Redo Log.
    2) In 10g, the Redo Change for the DML now precedes the Redo Change for the UNDO ?
    3) In both versions, the UNDO and DML are within the same Redo Record.
    Hemant

    Comment by Hemant K Chitale — February 24, 2011 @ 9:10 am BST Feb 24,2011 | Reply

    • Hemant,

      1) Yes

      2) Yes – BUT it’s important to remember that this mechanism only applies to “small” transactions – as soon as a transaction fills its private redo thread or private undo pool it writes one big redo record and then falls back to the 9i mechanism. There are also a number of reasons why the switch from the new method to the old method can happen before either private memory area is full (e.g. the session has to execute a recursive transaction.)

      3) Yes

      Comment by Jonathan Lewis — February 24, 2011 @ 6:37 pm BST Feb 24,2011 | Reply

  6. [...] log file and find the relevant redo entries easily after the update. I won’t bore you with an extract from the log file dump, I’ll just point out that it showed me two redo records per row – op code 11.4 followed [...]

    Pingback by Audit Ouch! « Oracle Scratchpad — May 27, 2011 @ 5:39 pm BST May 27,2011 | Reply

  7. Thanks for good explaination but i want to know what is used of undo part which is associate with redo vector?means how this undo part is used by oracle

    Comment by Nilesh — May 28, 2011 @ 6:57 am BST May 28,2011 | Reply

    • Nilesh,

      Every block change is described in the redo – even the changes which are the changes to the undo blocks.
      On recovery (instance or media) every redo change vector that needs to be applied to the data files is applied – including the redo changes vectors that describe changes to the undo blocks – that’s how Oracle bring the database up to date on recovery, and the changes to the undo blocks mean that after (forward) recovery any uncommitted transactions that Oracle finds in the undo segments can be rolled back properly.

      Comment by Jonathan Lewis — June 2, 2011 @ 8:30 pm BST Jun 2,2011 | Reply

  8. Jonathan,

    A couple of general questions about undo. Firstly, I am intrigued that the undo change vectors for index entries seem to include the column data (e.g. line 130 in 10g example above). This seems to contradict my previous understanding that undo for inserts would be minimal. Indeed, for a large IOT this might be onerous. I can’t see why this data is needed, can you?

    The second question relates to the total size of the redo data. In the 10g example above, the length of the redo record is given on line 0 as 0x0638. 1592 bytes seems a lot of overhead; it is an average of over 100 bytes per change vector. I can’t work out where it is all going. Can you explain please?

    Comment by tonyhasler — June 20, 2011 @ 5:48 pm BST Jun 20,2011 | Reply

  9. Tony,
    A table row has an absolute location – its rowid which is the object id, file id, block id, row number in the row directory; so when you insert a row the undo operation need only be: “unlink” the row directory.

    An index entry doesn’t have an absolute location – the row directory is adjusted as each entry is inserted and deleted, and leaf blocks can split, moving index entries to a completely different block; so when you insert an index entry the only way you can be sure of finding it to undo the entry is to store the key value. (This also means that when you undo an index insert you have to walk through the root and branch blocks before you can get to the correct leaf block – which makes index undo application more expensive than table undo application.)

    I’ve never felt the urge to try and track down the function of every single byte in an undo change vector, so I can’t tell you what all the overheads are for. I’ve just picked out the important bits when I’ve needed to solve particular problems.

    Comment by Jonathan Lewis — June 20, 2011 @ 8:14 pm BST Jun 20,2011 | Reply

  10. Thanks Jonathan. Another “myth busted” (or at least a personal misunderstanding corrected :-))

    My second question was not specific to undo and more of an observation than a question: it just seems amazing that to insert two tiny rows with two tiny indexes you generate 1592 bytes of redo. On the other hand, it would seem appropriate to focus the redo design on keeping redo generation and recovery simple and efficient rather than optimising space usage on a circular file. So perhaps I shouldn’t be that surprised.

    Comment by tonyhasler — June 20, 2011 @ 10:49 pm BST Jun 20,2011 | Reply

    • Tony,
      I think it’s more of a “common oversight” than a myth or mythunderstanding.
      The manuals (and all the books that copied them) used to say something about needing very little undo to reverse an insert because you only had to mark the table entry as non-existent: however the focus was alway on the table, and never made any comment about the indexes – so I don’t think the average DBA thought about the effects on indexes.

      The same sort of oversight happens with the redo and undo, People don’t think about the cost of maintaining the indexes. How many times do you see complaints on OTN from people saying: “My batch job inserts 1,000,000 rows into a table – why is it taking 3 hours and doing 3.5M physical reads?”

      There’s probably a very good reason for all those “extra” bytes. One day I may look at them more closely.

      Comment by Jonathan Lewis — June 21, 2011 @ 5:49 am BST Jun 21,2011 | Reply

  11. hi,sir
    how to prove the data block and the undo block is updated when the session commit ? DUMP BUFFER?

    Comment by DBA 蜗牛 — December 27, 2011 @ 5:51 am BST Dec 27,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,014 other followers