Oracle Scratchpad

OC 2 Redo and Undo

Addenda and Errata for Oracle Core Chapter 2 Redo and Undo

Back to Index

Addenda

n/a Here’s a link to an article I wrote some time ago about undo segments growing to extreme sizes. It’s a useful example demonstrating an important feature of how undo segments work.
p.9 Paragraph before heading Debriefing, I state: “The most significant detail, for our purposes, is the DBA: (data block address) in line 1, which identifies block 0x0080009a: if you know your Oracle block numbers in hex, you’ll recognize that this is block 154 of data file 2.” I should, perhaps, have written a note in the appendix about how to work this out and directed the reader to the appendix at this point. Here, however, is a method for working out the necessary detail:

select
	dbms_utility.data_block_address_file(to_number('0080009a','XXXXXXXX'))  file_no,
	dbms_utility.data_block_address_block(to_number('0080009a','XXXXXXXX')) block_no
from
	dual
;

Informally, however, and only for a simple database with a small number (fewer than 64) of relatively small (fewer than 1M blocks) data files, you can often break a data block address into two parts based on the fact that the block component is the low 22 bits, with the file number mangled into the higher bits.

If you know that your data files are smaller than 1M blocks, then the last 5 hex digits of the data block address give the block number within the file (0xFFFFF = 1,048,575). If you know the number of data files is less than 64, then the first three hex digits divided by 4 (i.e. right-shifted twice) give the file number. In this example we get: block = 0x9a = 154, and file = 0x008 / 4 = 2.

It’s a quick and dirty algorithm for small test systems – but it breaks when the file number goes above 63 because of the encoding that Oracle introduced as the maximum number of data files increased in the change from v6 to v7 (and again from v7 to v8), and it gets messy when you have to allow for the top 2 bits of the block number when the number of blocks is so large that it runs into the hex digit that is shared with the file number.
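The bit-split described above (block number in the low 22 bits, file number in the higher bits) can be sketched in a few lines of Python. This is just an illustration of the arithmetic, not a substitute for the dbms_utility calls shown earlier:

```python
def decode_dba(dba):
    """Split a data block address into (relative file number, block number).

    Follows the layout described in the text: the block component is the
    low 22 bits; the (relative) file number sits in the higher bits.
    """
    block_no = dba & 0x3FFFFF   # mask off the low 22 bits
    file_no = dba >> 22         # shift away the block component
    return file_no, block_no

# The worked example from the text: DBA 0x0080009a
print(decode_dba(0x0080009A))   # -> (2, 154), i.e. block 154 of file 2
```

Unlike the quick-and-dirty hex-digit method, this version keeps working when the block number spills into the hex digit shared with the file number, because it splits on the exact 22-bit boundary.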

p.19 Last two lines: “The advantage of keeping undo information inside the database in “ordinary” data files is that the blocks are subject to …”. Removing a little ambiguity, this looks better as: “The advantage of keeping undo information inside the database in “ordinary” data files is that the undo blocks are subject to …”
p.21 Paragraph after Note, eighth line: “… the record pointed at by the irb: entry.” There is no previous explanation, or even mention, of the irb entry; this is an item that appears in the header section of an undo block (see page 32 for an extract from a block dump). The header section contains a count (cnt) of the number of records in the block, but also contains a pointer (irb – index to roll back?) to which record should be used first in the event of a rollback call being made to terminate, or clean up, a transaction. There’s a brief mention of the irb (and Christian Antognini’s correction to my initial understanding of its meaning) at this URL.

Errata

p.13 In the Sidebar on Isolation levels lines 1 and 4 read “select from t1” – this wasn’t intended as a code example, but technically you would have to select something if you don’t want the SQL to fail. In a new release I might put: “select {list of columns} from t1”

Typos/Grammar/Style/Punctuation

p.6 End of first paragraph: “there are” – style change to “the steps are:”
p.12 I didn’t give the table of ACID requirements a heading, so the editor has used the first requirement (atomicity) as if it were the heading.

Back to Index

40 Comments

  1. Sir, I am going through your new book (alpha version) – in the 2nd chapter, ‘Redo and Undo’, on page 14,
    while discussing ‘private redo’ and ‘in memory undo’, you said the following:

    1 Start transaction – acquire a matching pair of the private memory structures
    2 Flag each affected block as ‘has private redo’ (but don’t change the block)
    3 Write each undo change vector into the selected in-memory undo pool
    4 Write each redo change vector into the selected private redo thread
    5 End transaction – concatenate the two structures into a single redo change record
    6 Copy the redo change record into the redo log and apply the changes to the blocks

    I am a bit confused here .. in this case does the Server Process read each block 2 times ..
    First time – just to prepare the redo entries, and second time to make the change to the data/undo block ..

    My assumption is :
    Let us say :

    Update employee set sal=1000 where emp_id in (23,24) ;

    and let’s say there are 2 rows and they are in 2 blocks (one in each block)..

    In 10g –
    Server Process will take a first block (if not found in buffer cache, will get it from disk)-
    And will get a Undo block into buffer cache (if not found in buffer cache)
     Prepare a change vector for UNDO block in ‘ in memory undo’
     Prepare a change vector for DATA block in ‘private redo’
     Change the UNDO block
     Change the DATA block
    Server process will now take a second block and follow the above steps ..
    And Finally combine the above 2 redo entries for 2 blocks and copy to ‘public redo buffer’ and completes the transaction..

    Is it wrong ? here the Server process reads each block only one time ..

    Sorry sir – I am a bit new to ORACLE and trying seriously to learn the things in detail..

    and also can you please describe a bit more – what is meant by ‘matching pair of the private memory structures’ in this context? Sorry sir if this is not the correct forum to ask questions about your book ..

    Comment by berusadla — September 7, 2011 @ 3:20 pm BST Sep 7,2011 | Reply

    • berusadla,
      I moved your comment to the pages I’m setting up for Errata – but I haven’t managed to change the owner id on the comment to match the name.

      in this case does the Server Process reads each block 2 times

      You need to be careful how you describe details. To me “read” implies access to disks, so I try to use some other word such as “visit” to indicate that I mean accessing a block in the buffer.

      To answer your questions – particularly the point about two updates in the transaction – the sequence is:

      1. prepare redo vector for undo record 1 in “in memory undo”
      2. prepare redo vector for table change 1 in “private redo”
      3. prepare redo vector for undo record 2 in “in memory undo”
      4. prepare redo vector for table change 2 in “private redo”

      At this stage Oracle does not modify the table blocks or the undo blocks. (I have omitted any comments about the undo segment header block in the above; there will, though, be a redo vector for the transaction start put into the “private redo”.)

      On the commit, Oracle will generate a redo change vector for the update to the undo segment header block to show that the transaction is committed and put it into “private redo”, and then re-visit the three other blocks to apply the redo vectors (and then apply the change vector to the undo segment header block). So yes – Oracle does visit the blocks more than once, although it’s likely that it has pinned the blocks, making the second visit much cheaper than the first visit. (Before updating any blocks, though, Oracle will have copied the private redo, followed by the in-memory undo, into the public redo log buffer.)

      matching pair of the private memory structures

      This is just pointing out that the session needs to have one “in memory undo” and one “private redo” area to use this option.
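[Ed: the deferred-apply sequence described in this reply can be sketched as a toy model. The class and structure names below are invented purely for illustration – this is not Oracle code, just the shape of “prepare vectors privately, apply everything at commit”:]

```python
# Toy model of the "private redo / in-memory undo" sequence described
# above. All names and structures here are invented for illustration.

class Session:
    def __init__(self):
        # acquire a matching pair of private memory structures
        self.in_memory_undo = []   # undo change vectors (old values)
        self.private_redo = []     # forward change vectors (new values)
        self.blocks = {}           # buffered block copies: name -> value

    def update(self, block, new_value):
        # prepare both vectors, but do NOT touch the buffered block yet
        self.in_memory_undo.append((block, self.blocks[block]))
        self.private_redo.append((block, new_value))

    def commit(self, redo_log):
        # copy private redo, then in-memory undo, into the public log
        # buffer; then revisit the blocks to apply the forward vectors
        redo_log.extend(self.private_redo + self.in_memory_undo)
        for block, new_value in self.private_redo:
            self.blocks[block] = new_value
        self.in_memory_undo, self.private_redo = [], []

s = Session()
s.blocks = {"emp_23": 900, "emp_24": 950}
s.update("emp_23", 1000)
s.update("emp_24", 1000)
assert s.blocks["emp_23"] == 900    # block untouched before commit
log = []
s.commit(log)
assert s.blocks["emp_23"] == 1000   # change applied at commit time
```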

      Comment by Jonathan Lewis — September 7, 2011 @ 3:35 pm BST Sep 7,2011 | Reply

  2. Many thanks Jonathan for your reply but sorry – it has raised a couple of doubts again in my poor mind..

    1. You mentioned: “then re-visit the three other blocks to apply the redo vectors (and then apply the change vector to the undo segment header block) …”

    Changing (updating) a data block means – applying redo vector on that block ..??

    2. We change the data blocks after we commit?? I don’t understand here .. Can you please explain with an example if you have time, let us say:
    update emp set salary=1000 where emp_id in (23,24)
    and there are 2 rows , one row in each block ..
    (let us omit the redo for the undo segment header stuff)

    1.Get (read) the first datablock into buffer cache (if not found )
    2.Get the Undo block into buffer cache (if not found)
    3.Prepare a change vector for undo and put it in ‘in-memory undo’
    4.Prepare a change vector for data block and put in ‘private redo’

    5.Get the second block
    6.Prepare a change vector for undo and put it in ‘in-memory undo’
    7.Prepare a change vector for data block and put in ‘private redo’

    So the DATA blocks won’t be modified (updated with the new sal) until we COMMIT?

    am I misunderstanding here ?

    Sorry but many thanks for your help and time

    Comment by berusadla — September 7, 2011 @ 4:46 pm BST Sep 7,2011 | Reply

    • berusadla,

      Changing (updating) a data block means – applying redo vector on that block ..??

      That is correct (for normal processing). To change a data block Oracle applies a redo vector to that block.


      So the DATA blocks won’t be modified (updated with the new sal) until we COMMIT?
      am I misunderstanding here ?

      No, you are not misunderstanding – Oracle does not apply the changes you have requested until it starts processing the commit; call. Remember two things, though: I pointed out in the book that Oracle has flagged the buffered copies of the block as “has private redo” so other sessions can know that the blocks are subject to change that has not yet been made visible, and this strategy changes if the amount of change exceeds a specific limit.

      Comment by Jonathan Lewis — September 8, 2011 @ 7:24 am BST Sep 8,2011 | Reply

  3. sorry one final question ..

    say :

    update t set x=x+10 ;

    select * from t ; (in the same session)

    I could see the new values (though I did not commit) – where do these values come from??
    because the blocks are not yet changed .. and also you said ‘the instance generally doesn’t read redo vectors again .. just a ‘write-forget’ policy, except in some special cases like recovery etc’ ..

    Cheers

    Comment by berusadla — September 8, 2011 @ 7:56 am BST Sep 8,2011 | Reply

    • berusadla,

      That is a good question, and one that is not answered in the book; before I answer it, though, I’d like to pick a few lines from the chapter you’re reading.

      Your original question was based on Page 13.

      The comment you cited in this question came from page 10, and said:
      ” … once a redo record has gone into the redo log (buffer) we don’t (normally) expect the instance to re-read it … “

      Page 11 has the line:
      “Prior to 10g, Oracle would insert a redo record (typically consisting of just one pair of redo change vectors) into the redo log stream for each change a session made to a block”

      Later on the same page we see:
      “So a new mechanism known as “private redo / in-memory undo” appeared in 10g.”

      Then a possible answer to your question comes from page 16, which says
      ” … then continues using the public redo thread in the old way for the rest of the transaction. But there are other events that cause this switch prematurely.”

      The topic is not trivial, so you do need to be careful how you state things, and how you present questions. It is also important to make sure that you don’t connect different pieces of information in the wrong context. With that in mind, here’s a partial answer to the interesting question you asked.

      a) the session is NOT re-reading the log buffer – it’s not even one of the exceptions to the “generally” comment – because the private redo has not yet been copied into the log buffer.
      b) by checking the session statistics and the contents of x$bh, and doing a block dump from memory, and checking the imu flush statistics (x$ktiff, also on page 16) I can see that the private redo has not been applied to the buffer copy to get the correct answer.

      At this point I would work on the assumption that the session either has the block pinned (in share mode – but that’s not covered until chapter 6), or revisits the block and notes that there is private redo, and that the session is the owner of the private redo; at which point the session may build the correct result in private memory by using the private redo. (I’ve only just examined this on a 10.2 system I had at hand, so I may see more information in the statistics, or a change in behaviour, if I get to an 11.2 system).

      Comment by Jonathan Lewis — September 9, 2011 @ 10:24 am BST Sep 9,2011 | Reply

  4. Hi,

    On page 17, when you present the results of core_imu_01.sql, the value of each private memory area for the session is about 64KB. But you also state that on a 64-bit system it’s going to be 128KB each. I don’t get it! Does 64KB on a 64-bit system have a different meaning than on a 32-bit system?

    Thanks!

    Comment by talek — December 23, 2011 @ 3:45 pm GMT Dec 23,2011 | Reply

    • Talek,

      I have no idea why Oracle uses 64KB areas on 32-bit Oracle and 128KB on 64-bit Oracle. I could make a couple of guesses, for example:

      a) the number of areas available is transactions / 10, and they come from the SGA; since 32-bit Oracle allows a maximum of roughly 2GB for the SGA, taking 2 * 128KB * transactions/10 could be a fairly significant chunk of the available SGA.

      b) the way the memory is used seems to be much greedier than the final usage when the content goes into the log buffer and undo blocks; I think there may be lots of pointers in the in-memory areas, and if these are full-size pointers (rather than 16-bit offsets) you lose a lot more memory to the pointers when you switch from 32-bit to 64-bit.
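[Ed: guess (a) is easy to put numbers on. The transactions figure below is purely illustrative; only the 128KB area size, the transactions/10 count, and the roughly 2GB 32-bit SGA ceiling come from the comment above:]

```python
# Rough arithmetic behind guess (a): how much of a ~2GB (32-bit) SGA
# would paired 128KB private areas consume? transactions is illustrative.
transactions = 1000
areas = transactions // 10        # number of area pairs available
per_pair = 2 * 128 * 1024         # one redo + one undo area, 128KB each
total = areas * per_pair
sga = 2 * 1024 ** 3               # rough 2GB SGA ceiling on 32-bit
print(total // (1024 * 1024), "MB =",
      round(100 * total / sga, 1), "% of the SGA")   # -> 25 MB = 1.2 %
```

Even at only 1.2% for a modest transactions setting, doubling the area size doubles a fixed overhead that exists whether or not the areas are used, which is consistent with keeping them at 64KB on 32-bit systems.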

      Comment by Jonathan Lewis — December 23, 2011 @ 5:29 pm GMT Dec 23,2011 | Reply

      • AHA! Many thanks for your answer! My head is still spinning trying to figure out the whole concept of “undo and redo” but, little by little, I think the big picture becomes much clearer with every single AHA moment which happens reading the book.

        Comment by talek — December 23, 2011 @ 8:00 pm GMT Dec 23,2011 | Reply

  5. Got your long awaited book :)
    I just wanted to share some early experiences, while trying to crack the internals.
    Chapter 2, Redo and Undo – I use 11.2.0.1. I seem to only produce a dump with the syntax: alter system dump datafile ‘explicit_file_name’ min block max block, even if it is only one block.
    Then secondly there is some magic with alter session set tracefile_identifier. It seems I need to define it, and then subsequently I get 2 empty files with the identifier showing and one file without it where the .trc dump is contained.
    Don’t know if this is standard behaviour or just my Oracle version. Wonder if you or others had similar quirks.
    Then last: sometimes you write about a symbolic dump and then about a block dump. I am puzzled as to what the difference is, or what you mean by it.
    Thanks,
    F. de Vries

    Comment by Frank de Vries — January 3, 2012 @ 8:08 pm GMT Jan 3,2012 | Reply

    • Frank,

      Sorry you had to wait so long for the copy – it’s a side effect, alas, of being in Europe.

      Thanks for the feedback.

      I don’t have a copy of 11.2.0.1 available to see if it does anything odd with the datafile dump command; I only have access to 11.1.0.7 and 11.2.0.3 at present. Can you check v$datafile for the file with the name that works, and see if there’s anything odd about file# and rfile#?

      When you start dumping trace files in 11g, they come in pairs, one ending in .trc (the “traditional” trace file) and one ending in .trm, which is a metadata file for the trace file that lets the adrci program understand something about the content of the trace file. Can you do an ls -l, or dir, or whatever the o/s command is to show us the files, and a cut-n-paste from the connect onwards to show us what you did to get them. Maybe the effects (timing, in particular) vary a bit with the o/s.

      Comparing “block dump” with “symbolic dump”: technically some commands dump pure hex data with no interpretation, some commands give a lot of interpretative detail; the former tend to be called raw dump files, the latter symbolic. In recent versions of Oracle the dump of a data block actually includes both the raw and the symbolic components, and since all the other dumps I’ve mentioned in the book are symbolic dumps I didn’t really need to use the word symbolic.
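[Ed: the raw/symbolic distinction can be illustrated with a trivial, generic example – nothing Oracle-specific here, just the same four bytes shown first with no interpretation and then decoded and labelled:]

```python
import struct

# Four bytes that we choose to interpret as a little-endian 32-bit DBA.
raw = bytes.fromhex("9a008000")

# "Raw" presentation: pure hex, no interpretation.
print(" ".join(f"{b:02x}" for b in raw))        # -> 9a 00 80 00

# "Symbolic" presentation: the same bytes decoded and labelled,
# using the 22-bit file/block split described earlier on this page.
(dba,) = struct.unpack("<I", raw)
print(f"rdba: 0x{dba:08x} (file {dba >> 22}, block {dba & 0x3FFFFF})")
# -> rdba: 0x0080009a (file 2, block 154)
```

A recent block dump contains both presentations side by side, which is why the word “symbolic” rarely needs to be said out loud.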

      Comment by Jonathan Lewis — January 3, 2012 @ 8:21 pm GMT Jan 3,2012 | Reply

      • Hi Jonathan,
        Thx for clarifying the jargon.
        I had another look at those output files after your comment. (I defined tracefile_identifier to ‘oe’)

        02/01/2012 23:28 16.776.804 iti2_ora_2996.trc
        02/01/2012 23:29         80 iti2_ora_2996_oe_block.trm
        02/01/2012 23:29      3.920 iti2_ora_2996_oe_block.trc
        

        And what I see today is that those ‘empty’ trace files from the day before yesterday did get filled up a minute later. (I missed that one minute somehow.)
        Moreover, when I check the contents of the later file, iti2_ora_2996_oe_block.trc, it says it continues from the first: iti2_ora_2996.trc
        Bizarre?

        Anyway, it does confirm I dumped a single block with min and max (for t1). I had not expected this behaviour though.

        Trace file d:\oracle\dba\diag\rdbms\iti2\iti2\trace\iti2_ora_2996_oe_block.trc
         Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 – Production
         With the Partitioning, OLAP, Data Mining and Real Application Testing options
         Windows NT Version V6.1 Service Pack 1
         CPU : 4 – type 586, 2 Physical Cores
         Process Affinity : 0x0x00000000
         Memory (Avail/Total): Ph:944M/2037M, Ph+PgF:2107M/4075M, VA:1200M/2047M
         Instance name: iti2
         Redo thread mounted by this instance: 1
         Oracle process number: 25
         Windows thread id: 2996, image: ORACLE.EXE (SHAD)
         
        *** 2012-01-02 23:28:59.683
         *** SESSION ID:(74.3) 2012-01-02 23:28:59.683
         *** CLIENT ID:() 2012-01-02 23:28:59.683
         *** SERVICE NAME:(SYS$USERS) 2012-01-02 23:28:59.683
         *** MODULE NAME:(sqlplus.exe) 2012-01-02 23:28:59.683
         *** ACTION NAME:() 2012-01-02 23:28:59.683
         
        *** TRACE CONTINUED FROM FILE d:\oracle\dba\diag\rdbms\iti2\iti2\trace\iti2_ora_2996.trc ***
         
        Start dump data block from file D:\ORACLE\DBA\ORADATA\ITI2\USERS01.DBF minblk 536 maxblk 536
         V10 STYLE FILE HEADER:
         Compatibility Vsn = 186646528=0xb200000
         Db ID=1395921064=0x533410a8, Db Name='ITI2'
         Activation ID=0=0x0
         Control Seq=1787=0x6fb, File size=640=0x280
         File Number=4, Blksiz=8192, File Type=3 DATA
         Dump all the blocks in range:
         buffer tsn: 4 rdba: 0x01000218 (4/536)
         scn: 0x0000.00106d4d seq: 0x02 flg: 0x04 tail: 0x6d4d2002
         frmt: 0x02 chkval: 0xefd3 type: 0x20=FIRST LEVEL BITMAP BLOCK
         Hex dump of block: st=0, typ_found=1
         Dump of memory from 0x0E958200 to 0x0E95A200
         E958200 0000A220 01000218 00106D4D 04020000 [ .......Mm......]
         E958210 0000EFD3 00000000 00000000 00000000 [................]
         E958220 00000000 00000000 00000000 00000000 [................]
         Repeat 1 times
         E958240 00000000 00000000 00000000 00000004 [................]
         E958250 FFFFFFFF 00000002 00000003 00000008 [................]
         E958260 00010001 00000000 00000000 00000000 [................]
         E958270 00000000 00000006 00000000 00000000 [................]
         E958280 00000000 00000000 00000000 00000000 [................]
         E958290 01000219 00000000 00000000 00000006 [................]
         E9582A0 00000008 0100021E 00000000 00000000 [................]
         E9582B0 00000000 00000006 00000000 00000001 [................]
         E9582C0 00012353 000E5BB2 00000000 01000218 [S#...[..........]
         E9582D0 00000008 00000000 00000000 00000000 [................]
         E9582E0 00000000 00000000 00000000 00000000 [................]
         Repeat 9 times
         E958380 00000000 00000000 00000000 00111111 [................]
         E958390 00000000 00000000 00000000 00000000 [................]
         Repeat 485 times
         E95A1F0 00000000 00000000 00000000 6D4D2002 [............. Mm]
         Dump of First Level Bitmap Block
        ——————————–
        nbits : 4 nranges: 1 parent dba: 0x01000219 poffset: 0
         unformatted: 2 total: 8 first useful block: 3
         owning instance : 1
         instance ownership changed at
         Last successful Search
         Freeness Status: nf1 0 nf2 0 nf3 0 nf4 0 
        
        Extent Map Block Offset: 4294967295
         First free datablock : 6
         Bitmap block lock opcode 0
         Locker xid: : 0x0000.000.00000000
         Inc #: 0 Objd: 74579
         HWM Flag: HWM Set
         Highwater:: 0x0100021e ext#: 0 blk#: 6 ext size: 8
         #blocks in seg. hdr's freelists: 0
         #blocks below: 6
         mapblk 0x00000000 offset: 0
        ——————————————————–
        DBA Ranges :
        ——————————————————–
        0x01000218 Length: 8 Offset: 0 
        
        0:Metadata 1:Metadata 2:Metadata 3:FULL
         4:FULL 5:FULL 6:unformatted 7:unformatted
        ——————————————————–
         End dump data block from file D:\ORACLE\DBA\ORADATA\ITI2\USERS01.DBF minblk 536 maxblk 536
        

        Another side comment: I ran your pl/sql script c_dump_undo_block and that produced a single file with all the segments!
        I was surprised, because on the command line this syntax with just file_id and block_id does not produce any output files, no matter how often I try.

        I start to think there is some ‘setting’ in my sqlplus env that accounts for the multiple dumps shown above when running alter system dump ‘datafile’ min block max block, or

        another thought: maybe the laptop I use with Windows 7 ‘threads’ is not so flexible with alter session set tracefile_identifier?

        Happy to be Oracle-core enlightened,
        looking forward to your reply.

        Frank

        Comment by Frank de Vries — January 10, 2012 @ 2:17 pm GMT Jan 10,2012 | Reply

        • Frank,

          The delay in the files appearing doesn’t surprise me – that’s a Windows thing, the writes can be delayed until a buffer has been filled. The “continued in …” line at the top of the trace file is also typical behaviour when using the tracefile_identifier, although it suggests that some tracing had already been enabled before the alter command had been given.

          I still have no idea why the dump command with file numbers doesn’t work.

          Comment by Jonathan Lewis — January 26, 2012 @ 9:28 pm GMT Jan 26,2012

  6. Dear Jonathan,

    While I was busy doing dumps following your chapter in this wonderful book, I stumbled on something that surprised me.

    I created a small table and inserted some rows to prepare for a dump. Then I noticed that Oracle (in my situation, anyway) created the table with a minimum of 8 blocks. Or so it states in dba_segments and dba_extents. (I did more tests and created several more tables and yes, always 8 blocks.)

    However, when looking at dba_tables it says one block and no empty blocks. (True, this info can only be obtained after gathering stats, but that does not make it less valid, though?) I would have expected to see 7 empty blocks. Instead it says 0. Really puzzled here. Are we talking different blocks here?

    If it is 8 blocks (I reason that an extent by default has 8 blocks because the tablespace is locally managed – Oracle takes 8 blocks in one go, right?), then I presume the block_id relates to the first block of the extent. Am I right? Still, why? Why not a block_id range if it is 8?

    This suddenly spawns a lot of questions in my head:
    Q: Why the difference between blocks in dba_tables and dba_segments and dba_extents (where dba_segments and dba_extents show the same)
    Q: What if I wanted to dump the 8 blocks – do I just take the block_id and the next 7 blocks?
    Q: Then can I somehow tweak it so the extent consists of only one block and subsequently the table consists of one block?
    Q: Do I see, in the dump of the first block (which is what the examples in your book show, right?), evidence of the existence of the 7 other blocks?
    Q: Are the undo extents then also grouped in sets of 8 blocks?

    This simple example illustrates the block issue:

    SQL> select table_name , blocks, empty_blocks from dba_tables where table_name = 'L1';
    
    TABLE_NAME                         BLOCKS EMPTY_BLOCKS
    ------------------------------ ---------- ------------
    L1                                      1            0
    
    1 row selected.
    
    SQL> select segment_name , block_id, blocks from dba_extents where segment_name like 'L1';
    
    SEGMENT_NAME                                                                        BLOCK_ID     BLOCKS
    --------------------------------------------------------------------------------- ---------- ----------
    L1                                                                                     97200          8
    
    1 row selected
    
    SQL> select segment_name, blocks, initial_extent, extents, bytes from dba_segments where segment_name = 'L1';
    
    SEGMENT_NAME                                                                          BLOCKS INITIAL_EXTENT    EXTENTS      BYTES
    --------------------------------------------------------------------------------- ---------- -------------- ---------- ----------
    L1                                                                                         8          65536          1      65536
    
    1 row selected.
    
    SQL> desc l1;
     Name                                                                     Null?    Type
     ------------------------------------------------------------------------ -------- -------------------------------------------------
     X                                                                                 NUMBER(38)
    
    SQL> select * from l1;
    
             X
    ----------
             2
             2
             3
             4
             5
             6
    
    6  rows selected
    
    SQL>
    

    That shows it is really a small table, so one block should suffice.

    Comment by afvd (@afvdba) — February 1, 2012 @ 10:07 am GMT Feb 1,2012 | Reply

    • afvd,

      The minimum of 8 blocks (as recorded in dba_extents and dba_segments) is about space allocation in the tablespace. I’d guess that you have created a tablespace with segment_space_management set to “AUTO” – which uses 64KB as the size of the first few extents of a small object. When you collect stats about a table, though, the block count is about blocks below the “high water mark”. It is, for example, possible for Oracle to allocate 1MB (128 blocks at 8KB) of space in an extent but initially format only the first 16 blocks to accept data.

      I’m not sure what you mean by “the block_id is relating to the first block of the extent” – the block_id identifies the block’s position within the data file, not within the segment or extent.

      This suddenly spawns a lot of questions in my head:
      I often get the same symptom when reading things that other people have written – every observation leads to more questions.

      Q: Why the difference between blocks in dba_tables and dba_segments and dba_extents (where dba_segments and dba_extents show the same)
      See above

      Q: What if I wanted to dump the 8 blocks, I just take the block_id and the next 7 blocks.
      Within limits, this type of strategy would work, but you can’t just add an arbitrary number (rather than 7): you need to check which blocks belong to the segment by querying dba_extents.

      Q: Then can I somehow tweak it so the extent consists of only one block and subsequently the table consists of one block.
      The minimum size is two blocks (and then only if you are using dictionary managed tablespaces) since you have to have a segment header, and there are various other restrictions that make it necessary for the first extent to have more blocks.

      Q: Do I see in the dump of the first block (which is what the examples in your book show, right) evidence of the existence of the 7 other blocks.
      If you dumped the segment header block you could see some information about the rest of the blocks – including (for ASSM tablespaces) the number of second and first level bitmap blocks that precede the segment header block. Again, looking at dba_extents, you can find out whether the segment header block is really the first block of the first extent, and where it really is if it’s not the first block.

      Q: Are the undo extents then also grouped in sets of 8 blocks?
      An undo tablespace is defined with segment_space_management set to AUTO, so when an undo segment is created the first few extents follow the 64KB rule but, as with all AUTO segment space management tablespaces, as the segments grow the extent sizes are allowed to become larger (1MB, 8MB, 64MB …)
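[Ed: the growth pattern described above can be sketched as a tiny model. Only the sizes – 64KB for the first few extents, then 1MB, 8MB, 64MB – come from the reply; the counts of extents in each tier below are illustrative assumptions, not documented Oracle limits:]

```python
# Illustrative model of system-allocated extent sizing: small 64KB
# extents at first, stepping up through 1MB, 8MB, 64MB as the segment
# grows. The per-tier extent counts are assumptions for illustration.
KB, MB = 1024, 1024 * 1024
TIERS = [(16, 64 * KB), (63, 1 * MB), (120, 8 * MB), (None, 64 * MB)]

def extent_size(extent_number):
    """Return an assumed size for the nth extent of a growing segment."""
    seen = 0
    for count, size in TIERS:
        if count is None or extent_number < seen + count:
            return size
        seen += count

print(extent_size(0) // KB, "KB")    # first extent: 64 KB
print(extent_size(20) // MB, "MB")   # a later extent: 1 MB
```

The point of the model is only that extent size is a function of how far the segment has already grown, not of the table's definition.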

      Comment by Jonathan Lewis — February 26, 2012 @ 10:28 am GMT Feb 26,2012 | Reply

  7. Page 30 Table 2-1 ACID

    Atomicity – A transaction must be invisible or complete.

    Should this read as “indivisible” instead of “invisible”? Though invisible makes sense too.

    Another nitpick – The table is laid out with the Atomicity description as the header and the other 3 items as table items. Makes for a tiny bit of confusion.

    Comment by Rahul — February 3, 2012 @ 1:20 pm GMT Feb 3,2012 | Reply

    • I don’t think that “indivisible” makes any sense at all in this context. The work done by a transaction is not visible (invisible) outside that transaction (to other users) until the transaction completes (commits). Hence the two correct words: invisible or complete.

      Comment by Houri Mohamed — February 3, 2012 @ 1:36 pm GMT Feb 3,2012 | Reply

    • Rahul,

      Although “indivisible” is a good word to use in describing a transaction (as in “a transaction is an indivisible set of changes to the database”), my choice of the word “invisible” was deliberate. My intention was to make it clear (in a sound-bite) that other users of the system do not see any of the changes I make until they are allowed to see all the changes.

      Thanks for the note about the heading going wrong – I’ve added it to the errata page.

      Comment by Jonathan Lewis — February 26, 2012 @ 10:07 am GMT Feb 26,2012 | Reply

  8. Expecting ORA-01456 but unable to get it. Am I missing something?

    SQL*Plus: Release 11.2.0.1.0 Production on Thu Apr 19 17:43:48 2012
    
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    
    
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
    With the Partitioning, Oracle Label Security, OLAP, Data Mining,
    Oracle Database Vault and Real Application Testing options
    
    SQL> set transaction read only;
    
    Transaction set.
    
    SQL> select * from t1;
    
            ID         N1
    ---------- ----------
             1         10
    
    SQL> insert into t1 select * from t1;
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> select * from t1;
    
            ID         N1
    ---------- ----------
             1         10
             1         10
    
    SQL> insert into t2 select * from t1;
    
    2 rows created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL>
    

    Comment by raova — April 19, 2012 @ 4:52 pm BST Apr 19,2012 | Reply

  9. Sorry, please ignore this; I mixed up the sessions.

    Comment by raova — April 19, 2012 @ 4:54 pm BST Apr 19,2012 | Reply

  10. Here is the correct sequence.

    Session 1.
    SQL> set time on
    18:13:02 SQL>
    18:13:03 SQL> set transaction read only;
    Transaction set.
    18:13:10 SQL>
    18:13:11 SQL> select * from t1;
            ID         N1
    ---------- ----------
             1         10
    18:13:19 SQL>
    18:13:20 SQL>
    
    session 2.
    SQL> set time on
    18:13:35 SQL> insert into t1 select * from t1;
    1 row created.
    18:13:46 SQL>
    18:13:49 SQL> commit;
    Commit complete.
    18:13:51 SQL>
    18:13:52 SQL>
    
    session 1.
    18:14:05 SQL> select * from t1;
            ID         N1
    ---------- ----------
             1         10
             1         10
    18:14:12 SQL>
    18:14:14 SQL> insert into t2 select * from t1;
    2 rows created.
    18:14:23 SQL>
    18:14:25 SQL> commit;
    Commit complete.
    18:14:27 SQL>
    18:14:28 SQL>
    

    Comment by raova — April 19, 2012 @ 5:19 pm BST Apr 19,2012 | Reply

    • I don’t have a copy of 11.2.0.1 so I can’t check if this is a bug in that version, but unless you’ve managed to enable some sort of “autocommit” in your session, that’s not expected behaviour.
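
      For reference, here’s a sketch (from a different system, not your 11.2.0.1 instance) of the behaviour I would expect when a session tries DML inside a read-only transaction:

      SQL> set transaction read only;

      Transaction set.

      SQL> insert into t1 select * from t1;
      insert into t1 select * from t1
                  *
      ERROR at line 1:
      ORA-01456: may not perform insert/delete/update operation inside a READ ONLY transaction
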

      Comment by Jonathan Lewis — April 19, 2012 @ 8:18 pm BST Apr 19,2012 | Reply

  11. Thanks Jonathan.
    I have checked the default setting for autocommit and it is OFF.
    SQL> show all
    appinfo is OFF and set to “SQL*Plus”
    arraysize 15
    autocommit OFF
    autoprint OFF
    autorecovery OFF

    MOS Doc ID 8338582.8 points to Bug 8338582 in 11.1.0.7, but that is with index access and says it has been fixed in 11.2.0.1:
    Bug 8338582 “SET TRANSACTION READ ONLY may see wrong data for index access”.

    Comment by raova — April 20, 2012 @ 9:03 am BST Apr 20,2012 | Reply

  12. Hello Jonathan,

    I have three questions regarding the note on page 14: “…a session can read the online redo log files when it discovers the in-memory version of a block to be corrupt…”.
    1) What is the intended purpose of this mechanism, what kind of corruptions does this mechanism address, data corruptions caused by Oracle bugs, data corruptions caused by hardware (RAM) failures, any other data corruptions or a combination of those?
    2) At which points during the processing does Oracle check in-memory blocks for data corruption (every time a session visits them, before the DBWR writes them to disk, …)?
    3) Is there any metric (e.g. session statistic) showing how many times in-memory blocks were checked for corruption and/or how many times online redo logs had to be used in order to rectify such corruptions?

    Thanks a lot
    kind regards
    Martin

    Comment by Martin Maletinsky — September 1, 2012 @ 10:01 am BST Sep 1,2012 | Reply

  13. Martin,
    1) It’s aimed at blocks that are flagged as modified in the cache but which contain some sort of error. In principle I think this means that they would have to have been corrupted by a hardware fault or an Oracle code bug.

    2) There are various points where the block (or part thereof) may be checked. Some checks can be enabled or disabled by parameter settings (e.g. db_block_checksum); some are simply built in as implicit checks (e.g. code to update a row that finds that one row over-runs into an area of memory pointed at by another row index entry).

    3) I don’t really know. There are a couple of statistics about “db block corrupt” and “gc block corrupt” that might be relevant (one of the checks applies when blocks are sent across the RAC interconnect, another as they arrive). It’s also possible that some of the stats about block recovery and redo applied for recovery (which will be modified during instance recovery) will also be modified during block recovery.

    Possible statistics (may really be only for database startup)

    redo blocks read for recovery
    redo k-bytes read for recovery
    db corrupt blocks detected
    db corrupt blocks recovered
    gc blocks corrupt
    

    Possible wait events

    recovery buffer pinned
    recovery read
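
    If you want to watch for them, here’s a sketch of a query against the instance-level views (statistic and event names may vary slightly with version):

    select name, value
    from   v$sysstat
    where  name in (
            'redo blocks read for recovery',
            'redo k-bytes read for recovery',
            'db corrupt blocks detected',
            'db corrupt blocks recovered',
            'gc blocks corrupt'
    );

    select event, total_waits
    from   v$system_event
    where  event in ('recovery buffer pinned', 'recovery read');
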
    

    Comment by Jonathan Lewis — September 2, 2012 @ 9:46 pm BST Sep 2,2012 | Reply

  14. Jonathan, I would appreciate your clarification.

    Page 15: Note […] In 10g you get at least two public log buffers (redo threads) if you have more than one CPU.

    I thought that multiple threads exist only in RAC databases, where each instance has its own thread. How big are the public log buffers, and what do they have to do with the buffer defined by the log_buffer parameter? Is there any sort of redo record sorting when the log writer flushes the public log buffers to disk, or are they managed independently?

    How can I check how many public log buffers are used in my database? Is there any init parameter responsible for that? I see that log_parallelism disappeared in 10g.

    Comment by Wojciech — September 17, 2012 @ 6:24 pm BST Sep 17,2012 | Reply

    • Wojciech,

      The reference object for the buffers is x$kcrfstrand, and there’s a little script listing important addresses in Appendix D; I’ve just posted a variant on the script to show how you can get the sizes (and a little extra information) for your system. As a general principle, Oracle minimises the use of multiple log buffers (public and private). In the case of the private buffers this is done by picking the lowest numbered buffer that is available; in the case of the public buffers Oracle will only use more than one thread if there are indications of latch contention (on the public redo allocation latches).

      On a log writer flush, the log writer writes each of the public redo threads in turn before returning. Tony Hasler has an interesting post on this topic.

      In general you shouldn’t be setting the log_buffer parameter, but if you do you may find that the value you set is (roughly) divided equally among the public redo threads – but the behaviour is not very consistent.
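
      As a rough sketch (run as SYS – I’m quoting the x$kcrfstrand column names from memory, and they may change between versions), you can separate the public strands from the private ones because only the public strands have a non-zero last-buffer pointer:

      select
              indx, strand_size_kcrfa
      from
              x$kcrfstrand
      where
              last_buf_kcrfa != '00'        -- public redo strands only
      ;
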

      Comment by Jonathan Lewis — September 17, 2012 @ 8:41 pm BST Sep 17,2012 | Reply

  15. Hi Jonathan,

    In page 21, you wrote:

    “There is an important difference between read consistency and rolling back, of course. For read consistency we make a copy of the data block in memory and apply undo records to that block, and it’s a copy of the block that we can discard very rapidly once we’ve finished with it……”

    What do you mean by “and it’s a copy of the block that we can discard very rapidly”? I might not have understood this clearly. Does this mean that a new session cannot benefit from the exact read-consistent image of a block reconstructed by a preceding session, because the copy of the block used for that purpose has been rapidly discarded?

    Or is the work done on the copy of the block kept in the buffer cache (for reuse) while the copy of the block is thrown away?

    My question came up here after having participated in the following OTN thread:

    https://forums.oracle.com/forums/message.jspa?messageID=10712705#10712705

    Thanks in advance

    Mohamed Houri

    Comment by hourim — November 28, 2012 @ 10:38 am GMT Nov 28,2012 | Reply

    • Mohamed,

      The comment is simply pointing out that when we make read-consistent copies of a block it’s okay to discard them within a short period of time.
      They can be useful, and the way the copy is used may mean that it stays in memory for some time – but it’s not a copy that HAS to be protected for the sake of database correctness. (Other sessions may be able to use the CR copy we’ve created, either to read it directly, or to clone it and take their clone further back in time.)

      Comment by Jonathan Lewis — December 13, 2012 @ 7:01 pm GMT Dec 13,2012 | Reply

  16. hello Jonathan

    When we roll back a transaction we apply the undo records in reverse order, and the ITL entry held in each undo record replaces the ITL entry in the data block, finally giving the correct version of the block.
    The process of rolling back can also produce redo records and undo records – so what is its own ITL? (I think the ITL here is different from the ITL replaced by the undo record.) A block can have more than one ITL entry; when an undo record overwrites one ITL entry, does the rollback session use another ITL entry to point to its own new undo?

    thanks!!

    Comment by yushangfu — December 25, 2012 @ 3:42 am GMT Dec 25,2012 | Reply

    • “the process of rollback can also produce redo record and undo record,and what is its own itl”

      When you roll back a transaction, you will generate redo because you are making changes to data blocks (which includes changes to undo blocks) – but you do not generate new undo records; you are only marking existing records as “user applied”. Rolling back does not create a new transaction, so there is no need for further ITL entries to come into play.

      Comment by Jonathan Lewis — December 28, 2012 @ 10:58 am GMT Dec 28,2012 | Reply

  17. Hello Jonathan

    I have a couple of questions related to the private redo buffers and in-memory undo which you describe starting at page 14.

    1) From the figures on page 16 it seems, that a transaction hits the private redo allocation latch once (and presumably “reserves” the corresponding private redo buffer for exclusive usage until the transaction flushes its contents to a public redo buffer) while the transaction hits the in-memory undo latch once per change, which suggests that one in-memory undo pool may be used simultaneously by many transactions.
    a) Are my conclusions about the usage patterns correct? I have some doubts, as you state on page 16 “1. Start the transaction by acquiring a matching pair of the private memory structures”, which might be interpreted as “acquiring them exclusively” – however, if that is the case, why would the transaction have to hit the in-memory undo latch once for every change?
    b) If so, what justifies the different usage patterns? I.e. why are the private redo buffers used exclusively by a transaction while the in-memory undo pools are shared by concurrent transactions? It seems that the number of private redo buffers equals the number of in-memory undo pools (see http://www.teamycc.com/rmoug_2008_conference/PDF/Haisley.pdf, slide 20). To me this makes the different usage pattern / allocation scheme even more surprising.
    c) If my assumption on concurrent usage of the in-memory undo pools is correct, then upon commit a transaction has to search for its undo change vectors in the in-memory undo pool, as they may be mingled with undo change vectors from other transactions in the same pool. This seems like quite an overhead to me – are there any auxiliary structures mitigating that effect?

    2) What is the reason to apply the changes to the data and undo blocks only upon commit (rather than right after creation of the change vectors)? Are there any performance benefits from this approach, or is it just to ensure the write-ahead protocol is respected when DBWR writes data and undo blocks to disk (which I believe could also be achieved if DBWR just looked at the block’s “has private redo” flag)?

    3) Did you write any additional blog posts on in-memory undo / private redo, or do you know of any other resources describing the mechanism in more detail?

    thank you
    kind regards
    Martin

    Comment by Martin — March 14, 2013 @ 9:59 pm GMT Mar 14,2013 | Reply

  18. Hello Jonathan

    One in-memory undo pool goes with one private log strand.

    If we use the parameter “_in_memory_undo” to disable the use of the in-memory undo pools, will the private log strands still be used by sessions?

    Is there any parameter that can disable the use of the private log strands, or does the parameter “_in_memory_undo” disable both?

    thank you

    Comment by yushangfu — July 8, 2013 @ 3:39 am BST Jul 8,2013 | Reply

    • yushangfu,

      The single parameter seems to switch off both features simultaneously.

      Given the very tight link between the two pieces I don’t think it’s really surprising that there’s one on/off switch – though the choice of name doesn’t make it obvious that this is the case.
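
      As a sketch, you can check the current setting with the usual hidden-parameter query (run as SYS; the x$ column names quoted here are the commonly published ones and may vary with version):

      select
              n.ksppinm    parameter_name,
              v.ksppstvl   parameter_value
      from
              x$ksppi  n,
              x$ksppcv v
      where
              n.indx = v.indx
      and     n.ksppinm = '_in_memory_undo'
      ;
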

      Comment by Jonathan Lewis — July 10, 2013 @ 8:37 pm BST Jul 10,2013 | Reply

  19. Hi Jonathan,

    Thanks for your time.

    In chapter 2 page 18, you mentioned that there are “two types of redo allocation latch, one type covers the private redo threads, one type covers the public redo threads”.

    In this context, what do you mean by “thread”? When I check the LGWR process on Linux using ps -L, I don’t find any threads for the LGWR process.

    Regards,
    Vipan

    Comment by vipan — September 25, 2013 @ 1:36 pm BST Sep 25,2013 | Reply

    • Vipan,

      This is a case of a word having different meanings in different contexts. In the context of redo the word “thread” has nothing to do with the “lightweight threads” that can operate inside a single operating system process. Oracle has used the term for many years, particularly with RAC (or OPS as it was) in mind, to identify the link between an instance and the set of redo log files for that instance.

      It might have been wiser on my part to use the word “strand” in that sentence – especially since the X$ structure that refers to the redo that is in memory and not yet written to file is called x$kcrfstrand.

      Comment by Jonathan Lewis — September 27, 2013 @ 5:22 pm BST Sep 27,2013 | Reply

  20. Dear Jonathan,

    While discussing ‘private redo’ and ‘in-memory undo’, you mentioned the following:

    “Flag each affected block as ‘has private redo’ (but don’t change the block)”

    What is the need to flag an affected block as ‘has private redo’?

    Regards,
    Vipan

    Comment by vipankumarsharma — November 5, 2013 @ 6:42 am GMT Nov 5,2013 | Reply

    • Vipan,

      There may be several reasons that I haven’t thought of, but one reason could be that if the database writer needs to flush the block to disc, there may be some need for internal synchronisation that requires DBWR to force the redo to be applied (and the redo to be flushed) before the block is written.

      Comment by Jonathan Lewis — November 5, 2013 @ 7:36 am GMT Nov 5,2013 | Reply

  21. Hi,

    Just a small typo: the “core_imu_01.sql” file contains “core_demo_02.sql” as the script name in its header.

    Comment by talek — August 14, 2014 @ 7:09 pm BST Aug 14,2014 | Reply

