OC 6 Writing and Recovery

Addenda and Errata for Oracle Core Chapter 6 Writing and Recovery

Addenda

p.123	In the Note on this page I mention the option for setting the redo log block size in 11.2; Charles Hooper has written a related note that you might want to read.
p.126	In the side-bar with the title “Messages” I mention the log file sync timeout, and how this has changed from being fixed at one second up to 11.1, but being configurable from 11.2.0.1 with a new default value of 0.1 second (10 centiseconds). This point came up in a discussion I had recently about a RAC system that was showing a lot of “Block Lost” errors on just one of its two nodes; the link being that the other node was showing a very large number of log file sync timeouts. When a block is sent from one RAC node to another the sending node first has to flush the log buffer to disc, and if this takes too long the receiver will time out at 0.5 seconds and report a lost block.
p.143	In the Note on this page, I make some comments about the absolute and relative file numbers for a file. I failed to point out that the relative file number is “relative” to the tablespace. The numbering system is a bit strange, however, and this is for reasons of backward compatibility with Oracle version 7, which only allowed 1023 (or possibly 1022) datafiles in the database compared to the limit of 65,533 that appeared in version 8. In fact the limit for version 6 was only 63 (plus or minus 1), which is why it’s a good idea to use the functions supplied in the dbms_rowid package to translate data block addresses into (file number, block number) pairs rather than try to work out the convoluted way that Oracle mangled the bits to make version 7 backwards compatible with version 6.
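
For reading trace files away from the database, the current packing can be sketched in Python. This is only a minimal sketch: it assumes the usual split of a 32-bit data block address into a 10-bit relative file number and a 22-bit block number, the function name is made up for illustration, and inside the database the supplied packages remain the safer route. The two sample values are rdba values that appear in buffer header dumps quoted in the comments below, where they are reported as (4/131) and (4/14).

```python
# Assumption: a 32-bit data block address holds the relative file number
# in the top 10 bits and the block number in the low 22 bits.
FILE_BITS = 10
BLOCK_BITS = 22


def decode_dba(dba):
    """Split a data block address into (relative file number, block number)."""
    if not 0 <= dba < 1 << (FILE_BITS + BLOCK_BITS):
        raise ValueError("data block address must fit in 32 bits")
    file_no = dba >> BLOCK_BITS                 # top 10 bits
    block_no = dba & ((1 << BLOCK_BITS) - 1)    # low 22 bits
    return file_no, block_no


print(decode_dba(0x01000083))
print(decode_dba(0x0100000E))
```

A 10-bit file field can hold at most 1,023 non-zero values, which is consistent with the version 7 limit mentioned above.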

Errata

p.126	In the side-bar with the title “Messages” last para: “… where the start of current free space is …” should be “… where the end of current free space is …”.
p.129	The Note on this page references a couple of new instance statistics, but calls one of them “redo synch write time (usec)” – this should be “redo synch time (usec)” without the “write”.
p.140	Figure 6-5, right hand portion: there are four arrows pointing at the top buffer header. The two shorter arrows (i.e. the ones attached to the lower part of the buffer header) should be pointing away from the buffer header.
p.145	Second paragraph, last three lines (just above the heading “Checkpoints and Queues”): the text reads “…it can unlink them from WRITE_AUX, relink them to REPL_MAIN, and…”; the relink goes to REPL_AUX, not REPL_MAIN.
p.155	Last paragraph: “Any block logged in file 195 will be a version with a last change SCN less than 133279 – but it won’t necessarily be a version with a last SCN that matches our target; any block logged in file 194 will be a version with a last change SCN less than 132564, …” The start of this paragraph is nearly back to front, almost as if I’d decided the SCN reported for each file was the highest SCN in the file rather than the lowest. As Wojtek says in a comment below, the sentence should look more like: “Any block logged in file 195 will be a version with a last change SCN greater than or equal to 133279 – so newer than our target; any block logged in file 194 will be a version with a last change SCN greater than or equal to 132564, but less than 133279, …”
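
The corrected reading treats the SCN recorded for each file as the lowest SCN in that file, so each file covers a half-open range that ends where the next file starts. The following is a minimal Python sketch of that range lookup, using the two file numbers and SCNs from the quoted paragraph. The function name and the sample SCNs passed to it are invented for illustration, and the sketch says nothing about how Oracle itself searches the files.

```python
from bisect import bisect_right

# (file number, lowest SCN recorded for the file), in ascending SCN order.
# The two entries are the ones quoted in the erratum for p.155.
FILES = [(194, 132564), (195, 133279)]


def file_covering(scn, files=FILES):
    """Return the file whose SCN range [first, next first) contains scn."""
    first_scns = [first for _, first in files]
    pos = bisect_right(first_scns, scn) - 1
    if pos < 0:
        return None  # scn is older than anything in the listed files
    return files[pos][0]


print(file_covering(133000))  # hypothetical SCN between the two starting SCNs
print(file_covering(133279))  # exactly the starting SCN of file 195
print(file_covering(132000))  # hypothetical SCN before file 194 starts
```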

Typos/Grammar/Style/Punctuation

Comments (31)

Jonathan,

in the section Log Writer Writes, at the end of the note “MESSAGE”

It’s easy for the session to check, of course, because it knows where lgwr had to get to satisfy it’s write(the session’s buffer#) and it can see where the current start of free space <– it should be "the current end of free space"

After flushing the redo entries from the log buffer to the redo log files, lgwr advances the end of free space, so the foreground session can compare its buffer# with the pointer.

Comment by Sid — December 18, 2011 @ 7:57 am GMT Dec 18,2011 | Reply
- Sid,
  
  Thank you, you are correct, and I’ve added the comment to the Errata.
  
  Comment by Jonathan Lewis — December 19, 2011 @ 12:43 pm GMT Dec 19,2011 | Reply
Jonathan,

In the Note section on page 143, the book states the following, “… For testing purposes you can set event 10120 before adding a data file to the database—this makes Oracle create a file with different absolute and relative numbers.”

There really is not much information on the Internet about this event number, although I did find a small number of websites (for example http://www.adp-gmbh.ch/ora/tuning/diagnostic_events/list.html ) that suggested that the event number has an entirely different purpose:
“10120 CBO Disable index fast full scan”

A small test case:
```
ALTER SYSTEM SET EVENTS '10120 TRACE NAME CONTEXT FOREVER, LEVEL 1';
 
CREATE SMALLFILE TABLESPACE "TEST1" LOGGING DATAFILE 'C:\Oracle\OraData\OR1122P\TEST1.dbf' SIZE 5M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;
 
SELECT
  FILE#,
  RFILE#
FROM
  V$DATAFILE
WHERE
  NAME LIKE '%TEST1%';
 
FILE#     RFILE#
----- ----------
    9         10
```
The above test case suggests that the Note section in the book is correct. I just thought that I would mention the potential risk of confusion due to conflicting information found on the Internet.

Comment by Charles Hooper — December 24, 2011 @ 1:02 am GMT Dec 24,2011 | Reply
- Continuing the previous test case, turn off event 10120, and create another tablespace with a single datafile:
```
ALTER SYSTEM SET EVENTS '10120 TRACE NAME CONTEXT OFF';
 
CREATE SMALLFILE TABLESPACE "TEST2" LOGGING DATAFILE 'C:\Oracle\OraData\OR1122P\TEST2.dbf' SIZE 5M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;
 
SELECT
  FILE#,
  RFILE#
FROM
  V$DATAFILE
WHERE
  NAME LIKE '%TEST2%';
 
FILE#     RFILE#
----- ----------
   10         10
```
  Event 10120 clearly behaves as described in the book. Now my database has two datafiles with the same relative file number 10. Related to the Note’s comments, how would the user-facing code in Oracle Database that expects the relative file number react to the above situation – it seems as though one other key piece of information must be requested when the relative file number is required by user-facing code in Oracle Database.
  
  Referencing the documentation:
  http://docs.oracle.com/cd/E14072_01/server.112/e10595/dfiles001.htm
  “Absolute: Uniquely identifies a datafile in the database.
  Relative: Uniquely identifies a datafile within a tablespace.”
  
  Comment by Charles Hooper — December 24, 2011 @ 1:24 am GMT Dec 24,2011 | Reply
- Charles,
  
  Thanks for that. It prompted me to add a little note to the Addenda for the Appendix showing how to generate an up to date list of events in the 10,000 range since that’s one of the things I use from time to time when investigating problems.
  
  I think event 10120 changed its meaning in the 8.1 time scale; the “CBO Index fast full scan” may have been the v7 and 8.0 effect, but I no longer have copies of those versions around to check. It’s a useful reminder, of course, that the events are (a) undocumented and (b) subject to change with no notice.
  
  I’ve also pointed out in the addenda for this chapter your follow-up detail that “relative” file number is nominally “relative to tablespace” – although when you look closely the numbering isn’t anything that would match what most people would think of as “relative” since Oracle doesn’t start each tablespace at (relative) file number 1.
  
  There’s plenty of scope for error when working with file numbers – for example, rowids show relative file numbers, not absolute file numbers:
```
SQL> select rowid from t1;

ROWID
------------------
AAAYHMAAGAAAACKAAA

1 row selected.

SQL> select rowid from t2;

ROWID
------------------
AAAYHLAAGAAAAAZAAA

1 row selected.
```
  The format of the rowid is: {data_object_id[6]} {relative_file_number[3]} {block number[6]} {rownumber[3]} so, in this case, both rowids come from relative file AAG (which translates into relative file 6 – which I can count to since AAA corresponds to zero).
  
  Since I happen to know that the two tables are in different tablespaces Oracle has to have some way of working out that there are two different absolute file numbers involved – and it can do this through the data object id, which can be used to determine the tablespace number for the segment.
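  
  The translation of the rowid components can be sketched in Python. This is a minimal sketch that assumes the extended rowid display uses the base-64 alphabet A-Z, a-z, 0-9, +, / with the 6/3/6/3 character layout described above; the function names are made up for illustration, and in the database the dbms_rowid package does the same job.

```python
import string

# Base-64 alphabet assumed for the extended rowid display: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_value(text):
    """Convert a run of base-64 characters to an integer (A = 0)."""
    value = 0
    for ch in text:
        value = value * 64 + ALPHABET.index(ch)
    return value


def decode_rowid(rowid):
    """Split an 18-character extended rowid into its four components."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character extended rowid")
    return {
        "data_object_id": b64_value(rowid[0:6]),
        "relative_file": b64_value(rowid[6:9]),
        "block": b64_value(rowid[9:15]),
        "row": b64_value(rowid[15:18]),
    }


for rid in ("AAAYHMAAGAAAACKAAA", "AAAYHLAAGAAAAAZAAA"):
    print(rid, decode_rowid(rid))
```

  Both rowids decode to relative file 6 but to different data object ids, which is the detail that lets the two tables sit in different absolute files.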
  
  Comment by Jonathan Lewis — December 28, 2011 @ 7:10 pm GMT Dec 28,2011 | Reply
Here’s a question from Tony Hasler, originally posted under the main index page for Oracle Core – but copied and answered here.

I am not sure if this question belongs here or in chapter 6 but here goes.

I am thoroughly confused by the concept of multiple public redo strands. Suppose three dependent transactions occur in the order T1, T2, T3. Suppose the redo data for T1 and T3 are placed in one public redo strand and the redo data for transaction T2 gets placed in another. It seems that the redo log will eventually contain the data in the order T1, T3, T2 or T2, T1, T3 depending on which strand’s buffer is written out by LGWR first. This would seem to suggest that any recovery process would need to have some mechanism for looking ahead and reordering redo before applying it but I have never heard of such a thing. I am similarly confused about how dependent transactions in RAC are recovered.

Can you help me out?

Tony,
Your assumption about recovery having to “look ahead” and reorder redo before applying it is, I believe, correct; and your mention of RAC is extremely pertinent. When the question originally came up about how Oracle could handle recovery from parallel threads the first answer I heard was “it’s been doing it for years with RAC”.

There are a couple of public clues about the mechanism – there is a statistic called “redo ordering marks” which is a little suggestive, and there has been a KK lock type for ages that is used as the “Kick” lock when one RAC instance forces another instance to switch log files in order to keep the log file sequences close to each other. Beyond that I don’t have any details – I could do some hand-waving and produce a hypothesis, but with your previous experience in designing logging mechanisms you could probably do at least as well as I could.

The patents supplied by Timur may contain the information you need – but I haven’t read them yet.

Reply made by Timur Akhmadeev to Tony’s question at its original location:

Take a look at patents http://www.google.com/patents/US5974425 and http://www.google.com/patents/US7039773.

Comment by Jonathan Lewis — March 12, 2012 @ 8:10 am GMT Mar 12,2012 | Reply
- Jonathan,
  
  Timur’s references do, in fact, explain everything. I had found the patent for the previous OPS mechanism (with the default few second delay) but had searched in vain for a description of anything new. The records are indeed cached and sorted before application. Not only does this sort out the issues I raised but also allows a) records confirming the writing of a data block to disk to avoid any attempt to apply redo generated earlier and b) blocks to be updated in order optimising head movements. This algorithm isn’t rocket science but obviously only becomes practical with modern high memory systems. I haven’t read the patents in detail yet but do intend to. Seems like the contents would make for a good user group masterclass one day!
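  
  The idea of merging before applying can be illustrated with a toy Python sketch using the T1, T2, T3 scenario from the question. It is not the mechanism described in the patents: the record shape, the SCN values and the function name are all invented, and the sketch simply assumes that each strand is already in SCN order so that a merge recovers the overall order.

```python
import heapq

# Hypothetical redo records as (scn, transaction) tuples; SCN values are invented.
# T1 and T3 went into one strand, T2 into another.
strand_a = [(101, "T1"), (103, "T3")]
strand_b = [(102, "T2")]


def apply_order(*strands):
    """Merge strands that are each in SCN order into one SCN-ordered list."""
    return list(heapq.merge(*strands))


# Whichever strand was written to the log first, the merged order is by SCN.
print([txn for _, txn in apply_order(strand_a, strand_b)])
print([txn for _, txn in apply_order(strand_b, strand_a)])
```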
  
  Comment by tonyhasler — March 12, 2012 @ 10:41 am GMT Mar 12,2012 | Reply
- Hello,
  
  If I understand the term “dependent transactions” correctly, it implies that T1 commits before T2 makes any changes which depend on changes made by T1 and T2 commits before T3 makes any changes depending on changes made by T2. If this is correct, then I believe the scenario described by Tony Hasler would not be possible.
  If T1 commits before T2 makes any modification depending on T1 then all redo records generated by T1 will be written to the redo log files before any redo records are generated for the dependent modifications made by T2 and idem for T2/T3, no matter which redo strands are used by the transactions.
  Did I misunderstand something?
  
  Example:
  Assume transactions T1, T2, T3 which consecutively increment the value of one column for one row in a table i.e. “UPDATE T SET VAL=VAL+1 WHERE ID=:X;”. It is my understanding that this is what is meant by “dependent transactions” as mentioned in the scenario described by Tony Hasler.
  In this example T2 cannot lock the row in table T until T1 has committed. I would assume that T2 will not generate any redo for the increment of T.VAL until it has locked the corresponding row. Once T2 can lock the row, this implies that T1 has committed (or rolled back) and thus all redo records generated by T1 are in the redo log file, i.e. the redo records generated by T2 for the modification (increment) of the column value will be placed after the redo records generated by T1 in the redo log file.
  
  Thank you for clarification
  kind regards
  Martin
  
  Comment by Martin Maletinsky — March 14, 2013 @ 9:18 am GMT Mar 14,2013 | Reply
  - Martin,
    
    At the hypothetical level given in the manuals you’re correct. In fact Tony’s starting scenario is possible because of a time-lag that he and I have both blogged about elsewhere.
    
    When a session issues a commit (on a small transaction), the sequence of events is this:
    - copy private redo buffer to public log buffer
    - apply changes to data buffers
    - post message to lgwr to write
    - wait for callback

Hello Jonathan,

Thank you for your reply. So if I understand correctly, the scenario is one possible side effect of what you describe in the section “ACID Anomaly” starting on page 129? I am however surprised that Oracle tries to rectify this side effect (by reordering redo records before applying them) but does not address the root problem (ACID anomaly) itself. Are there maybe other scenarios which justify / necessitate the reordering of redo records during recovery?

kind regards
Martin

Comment by Martin — March 14, 2013 @ 7:49 pm GMT Mar 14,2013

Hello Jonathan,

On page 141 you describe the scenario where LGWR is stuck because it cannot overwrite a redo log that contains changes that were not yet written to the data files by DBWR. You mention “… occasionally it means you need a faster device for your redo logs”. Shouldn’t that say “…you need a faster device for your data files”? As I understand it, the redo logs are already written too fast in this scenario (relatively speaking), thus I would not expect a faster device for the redo logs to relieve the problem.

Thank you for clarification
kind regards
Martin

Comment by Martin — March 18, 2013 @ 9:20 pm GMT Mar 18,2013 | Reply

Hello Jonathan,

On page 139 you describe how a buffer is getting on the checkpoint queue when it becomes dirty. You also write that the buffer’s LRBA is set – I understand this as being the Redo Block Address of the change record describing the change that made the buffer dirty. However, at the moment the buffer becomes dirty, this change record has not necessarily been written to the redo logs – it may still be in one of the (potentially multiple) log buffers. How can Oracle at this moment determine the Redo Block Address this record is going to be written to? This becomes even more difficult when taking into account the possibility that redo from private redo buffers may yet have to be flushed into one of the public redo log buffers and may thus end up in the redo log files before or after the change record mentioned before depending on the public redo log buffer it is written to and the order in which LGWR will write the public redo logs to disk.
Could it be (this is just a guess) that for the purpose of checkpointing it is sufficient to have a lower bound of what I understood to be the LRBA, in which case it might be the Redo Block Address last written by LGWR at the moment the buffer became dirty?

thank you
kind regards
Martin

Comment by Martin — March 18, 2013 @ 9:37 pm GMT Mar 18,2013 | Reply

Hello Jonathan,

Starting on page 151 you describe the scenario that leads to the “log file switch (private strand flush incomplete)” message in the alert log. I think I got the big picture, but there are a couple of things I didn’t understand.

1) What does “triggers a log file switch prematurely” exactly mean in the context it appears on page 151 – does LGWR start writing into the new log file at this very moment or does it mean the processing you describe subsequently (DBWR flushing private redo …) is initiated?

2) After the log file switch is triggered prematurely, DBWR flushes private redo into the public buffers as you describe on page 152. How does the redo finally get into the (old) redo log file? Is it DBWR that (exceptionally) writes this data into the redo log files (as you write “the database writer gets it [the redo data] there [into the log file]”) or is it LGWR? However, if it was LGWR, wouldn’t the data end up in the new log file, as the log file switch has already happened (see question 1)?

3) When Oracle decides at moment t to switch from log file L1 to log file L2, I believe it must ensure that (i) everything that was in the private or public redo buffers at time t will go into log file L1 and that (ii) no redo generated after time t will go into log file L1. (i) is to ensure that all the redo records in each log file are between this file’s starting SCN and the following file’s starting SCN as you describe on page 151 (I don’t understand exactly why this is necessary, but it seems plausible that e.g. in some scenarios the code needs to be sure it got all redo records up to a certain SCN). (ii) I believe is necessary to ensure that L1 can be safely overwritten once the corresponding media recovery checkpoint completes (as DBWR will have written to disk the data blocks that were modified up to time t, but blocks modified more recently may still not be up to date in the datafiles).
Does that imply that Oracle will stop redo generation for all sessions (not only those using private redo) until the private strands are flushed into the public redo buffers? This might be necessary to prevent some redo generated before time t from ending up in log file L2 despite the space that was reserved in L1 at the premature log file switch (which would compromise (i)) and also to prevent new redo generated after time t to end up in L1 (which would compromise (ii)). Does Oracle even stop such redo generation until the LGWR has written the entire public redo buffers (including the flushed content from the private redo buffers) into log file L1 or are there other ways (i.e. telling the LGWR after which redo buffer entry it should start writing into the new log file L2)?

Thank you
kind regards
Martin

Comment by Martin — March 18, 2013 @ 10:44 pm GMT Mar 18,2013 | Reply

Jonathan,
(purely nitpick here)
Error in Errata above:

p.126 In the side-bar with the title “Messages” last para: “… where the start of current free space is …” should be “… where the end of current free space is …”.
should be:
p.126 In the side-bar with the title “Messages” last para: “… where the current start of free space is …” should be “… where the current end of free space is …”.

This is one of the least expensive books I have ever purchased, given what rich insights it provides!
Thanks, Merrill

Comment by Merrill B Lamont Jr. — December 15, 2013 @ 7:54 pm GMT Dec 15,2013 | Reply

Hi Jonathan,

Since there is a series of changes to undo blocks and data blocks during any transaction, at what point in time are the change vectors related to undo blocks and data blocks written to the log buffers?

Regards, Vipan

Comment by vipankumarsharma — August 30, 2014 @ 12:29 pm BST Aug 30,2014 | Reply

The answer to that question is in the book in the chapter on undo and redo.

Comment by Jonathan Lewis — September 1, 2014 @ 11:08 am BST Sep 1,2014 | Reply

To further elaborate on my question, do all the changes follow the write-ahead policy? I mean changes to the undo segment header at the start of the transaction and changes related to locking rows in the table.

Regards,
Vipan

Comment by vipankumarsharma — August 30, 2014 @ 1:06 pm BST Aug 30,2014 | Reply

Hi Jonathan,
i faced a weird buffer cache issue at a client site and re-read your chapter 6 for some DBWR details – in the course of that i may have discovered a (boundary) case that is not mentioned, or i may have missed that information somewhere. Hopefully you can bring some light into (my) darkness.

On page 139 you mentioned that the buffers are linked in order of RBA on checkpoint queue, but they can’t go on that queue as long as they are subject to private redo. So basically every buffer block that has been changed by a session that used less than round about 128 KB private redo (= 64-bit system) and has not committed yet, will stay dirty in buffer cache and will not be flushed to disk by DBWR by incremental checkpointing. These kind of buffer blocks may be covered by DBWR in case of space pressure (sub chapter “Database Writer and LRU” starting on page 143), but what about private redo and RBA in that case? On page 145 you wrote “Dbwr wakes up and scans through WRITE_MAIN, unlinking buffers from WRITE_MAIN and relinking them to WRITE_AUX. Remember that a changed buffer can’t be copied to disk until the related redo has been written to disk—it’s at this point that dbwr can compare RBAs, decide that a buffer shouldn’t go from WRITE_MAIN to WRITE_AUX, and post lgwr to write the redo.”

But we don’t have the RBA (for comparison) for that kind of buffer blocks yet as the redo is still private (block is flagged as “has private redo”) and so what happens next? Does the DBWR (or any other process?) post the corresponding foreground session(s) to copy the private redo to public so that the LGWR can flush that information? … or does DBWR skip all of these flagged (“affected by private redo”) blocks and they remain in buffer cache (and can not be flushed out at all) until the corresponding redo is made public on the foreground session’s intention.

Based on the observed behavior (weird buffer cache issue) i would assume that it is the latter case, but i was not able to track and prove that.

I hope you can enlighten me. Thank you very much.

Regards
Stefan

Comment by Stefan Koehler — December 28, 2014 @ 2:46 pm GMT Dec 28,2014 | Reply

Hi Jonathan,
i have done some further research and it seems (or rather, that is how i interpret it) that Oracle sets the LRBA in case of private redo as well. I have uploaded the whole reproducible test case on my web site ( http://tinyurl.com/l682mya ). As you can see Oracle sets a valid LRBA (0x6f.10c.0) for the CUR buffer header (0x733e4d88), even if the transaction (= 4 updates) was not committed and private redo was used (3908 bytes).

I am not quite sure what causes the difference between your described observation on page 139 and my test case, but maybe you have a clue and can point me in the right direction, if i have made a mistake or hit some special case with my demo.

Thank you.

Regards
Stefan

Comment by Stefan Koehler — December 28, 2014 @ 7:48 pm GMT Dec 28,2014 | Reply

Stefan,

I can think of four possibilities (at the moment)

a) I was wrong
b) Things have changed since I wrote that page
c) You’re updating by tablescan which does a special “switch current to new buffer” to do the update and follows a different code path
d) the LRBA is the current LGWR position when the change was made, not the RBA where the copy to log finally takes place

(a) is more likely than (b), of course, (c) is an option I may never have looked at.
I think you may have condemned me to a couple of days of careful testing (when I can find the time) to check what’s happening.

Comment by Jonathan Lewis — January 3, 2015 @ 1:25 pm GMT Jan 3,2015 | Reply

Hi Jonathan,
thank you for your reply. I don’t want you to waste your time on testing that LRBA stuff carefully, sorry if it looked like that. I rather hoped that you might already know where that LRBA difference comes from and could just write a few lines about it. I have done some further research on this topic and can contribute a few comments to your thoughts.

b) I have tested it on 10.2.0.5 and 11.2.0.3.6 with the exact same behavior
c) I also tested it with an index on column num and forced that index by a hint. It was always the exact same behavior (update by index and update by full table scan with “switch current to new buffer”)

However i found something interesting by running numerous test cases. The buffer (header) is flagged with private redo right after the update for a few seconds (we can verify that, if we are fast enough with dumping the buffer headers), but if we wait a few seconds (even if we do not commit in the meantime and private redo is still active/used) and re-dump the buffer headers we can see that a LRBA is assigned.

Here are the corresponding snippets for my Oracle 10g and 11g tests (with index on column num and that index was used by DML).

11.2.0.3.6
=========================================================================================================================================
      INDX REDO_USAGE
---------- ----------
	 0	 7260

CHAIN: 63277 LOC: 0x83cee870 HEAD: [0x73be0f48,0x73be0f48]
    BH (0x73be0e98) file#: 4 rdba: 0x01000083 (4/131) class: 1 ba: 0x738f2000
      set: 7 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 3,19
      dbwrid: 0 obj: 76017 objn: 76017 tsn: 4 afn: 4 hint: f
      hash: [0x83cee870,0x83cee870] lru: [0x73be10b0,0x73be0e50]
      lru-flags: debug_dump
      ckptq: [NULL] fileq: [NULL] objq: [0x8008b800,0x73be0e78] objaq: [0x8008b7f0,0x73be0e88]
      use: [NULL] wait: [NULL]
      st: XCURRENT md: NULL fpin: 'kdswh01: kdstgr' tch: 4 txn: 0x82c435b0
      flags: private
      LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]

… wait a few seconds and then re-dump again …

CHAIN: 63277 LOC: 0x83cee870 HEAD: [0x73be0f48,0x73be0f48]
    BH (0x73be0e98) file#: 4 rdba: 0x01000083 (4/131) class: 1 ba: 0x738f2000
      set: 7 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 33,19
      dbwrid: 0 obj: 76017 objn: 76017 tsn: 4 afn: 4 hint: f
      hash: [0x83cee870,0x83cee870] lru: [0x73be10b0,0x73be0e50]
      lru-flags: debug_dump
      obj-flags: object_ckpt_list
      ckptq: [0x71be1a78,0x71fe6548] fileq: [0x83e28800,0x83e28800] objq: [0x8008b810,0x8008b810] objaq: [0x8008b7f0,0x73be0e88]
      st: XCURRENT md: NULL fpin: 'kdswh01: kdstgr' tch: 6
      flags: buffer_dirty redo_since_read
      LRBA: [0x7c.68.0] LSCN: [0x0.15db18] HSCN: [0x0.15db18] HSUB: [1]

      INDX REDO_USAGE
---------- ----------
	 0	 7260
=========================================================================================================================================


10.2.0.5
=========================================================================================================================================
      INDX REDO_USAGE
---------- ----------
	 0	 8324

CHAIN: 86036 LOC: 0x7906ed70 HEAD: [70bf3b68,70bf3b68]
    BH (0x70bf3b68) file#: 4 rdba: 0x0100000e (4/14) class: 1 ba: 0x70aca000
      set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 7
      dbwrid: 0 obj: 54668 objn: 54668 tsn: 4 afn: 4
      hash: [7906ed70,7906ed70] lru: [70bf3cf8,70bf3ad8]
      lru-flags: debug_dump
      ckptq: [NULL] fileq: [NULL] objq: [70bf3d68,70bf3b48]
      use: [NULL] wait: [NULL]
      st: XCURRENT md: NULL tch: 2 txn: 0x78add0a8
      flags: private gotten_in_current_mode
      LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
      buffer tsn: 4 rdba: 0x0100000e (4/14)
      scn: 0x0000.0009be40 seq: 0x01 flg: 0x06 tail: 0xbe400601
      frmt: 0x02 chkval: 0xcc16 type: 0x06=trans data

… wait a few seconds and then re-dump again …

CHAIN: 86036 LOC: 0x7906ed70 HEAD: [70bf3b68,70bf3b68]
    BH (0x70bf3b68) file#: 4 rdba: 0x0100000e (4/14) class: 1 ba: 0x70aca000
      set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 7
      dbwrid: 0 obj: 54668 objn: 54668 tsn: 4 afn: 4
      hash: [7906ed70,7906ed70] lru: [70bf3cf8,70bf3ad8]
      lru-flags: debug_dump
      obj-flags: object_ckpt_list
      ckptq: [723f3ec8,70bf7828] fileq: [70bf3328,791a8968] objq: [767ba578,767ba578]
      st: XCURRENT md: NULL tch: 2
      flags: buffer_dirty gotten_in_current_mode redo_since_read
      LRBA: [0x26.46.0] HSCN: [0x0.9c237] HSUB: [1]
      buffer tsn: 4 rdba: 0x0100000e (4/14)
      scn: 0x0000.0009c237 seq: 0x03 flg: 0x00 tail: 0xc2370603
      frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

      INDX REDO_USAGE
---------- ----------
	 0	 8324
=========================================================================================================================================

Finally it seems like some mechanism/process assigns a LRBA to the buffer (header) after a few seconds. The LRBA (from block header information) and parallel redo log analysis seems to confirm your thoughts of point d.
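
For comparing the LRBA values in dumps like these, a small Python helper can turn the printed form into numbers. This is a minimal sketch, assuming that the three dot-separated fields are hexadecimal and represent log sequence number, block number within the log file, and byte offset within the block; the function name is made up for illustration.

```python
def parse_rba(text):
    """Parse an RBA printed as '[0xSEQ.BLOCK.OFFSET]' into three integers.

    Assumption: all three fields are hexadecimal
    (log sequence, block in log file, byte offset in block).
    """
    seq, block, offset = text.strip("[]").split(".")
    return int(seq, 16), int(block, 16), int(offset, 16)


# LRBA values taken from the 11.2.0.3.6 and 10.2.0.5 dumps above.
for lrba in ("[0x0.0.0]", "[0x7c.68.0]", "[0x26.46.0]"):
    print(lrba, parse_rba(lrba))
```

Because the result is a tuple ordered as (sequence, block, offset), two parsed RBAs can be compared directly with `<` to see which one comes earlier in the redo stream.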

Regards
Stefan

P.S.: I moved my initial test case to my public folder – new URL: http://tinyurl.com/nl9agda

Comment by Stefan Koehler — January 3, 2015 @ 5:05 pm GMT Jan 3,2015

Stefan,

My comment about testing was about my need to satisfy my curiosity, not a suggestion that you were trying to make me do something I didn’t want to do.

I am puzzled by the appearance of “switch current to new buffer” on an indexed update – I just re-ran a couple of tests on 11.2.0.4 and didn’t see that happening, but your tests may be sufficiently different from mine that they expose a case that I missed.

If you can isolate the work sufficiently, you could check x$ktiff immediately after an update, and then a few seconds later (corresponding to the lag in the buffer header change), to see if you can see any changes in the flush count – if there are then the category may give you some clue about what’s happening. Maybe (for example) the act of pinning it with the dump buffers call is sufficient to force the private redo to be flushed to the public buffer without the session recording that fact. You could try experiments along the lines of: “update, dump, update, commit” vs. “update, update, commit” on the same block to see if the dump results in any changes in the redo and IMU stats.

Comment by Jonathan Lewis — January 5, 2015 @ 10:01 am GMT Jan 5,2015

Hi Jonathan,

>> I am puzzled by the appearance of “switch current to new buffer” on an indexed update – I just re-ran a couple of tests on 11.2.0.4 and didn’t see that happening, but your tests may be sufficiently different from mine that they expose a case that I missed.

I meant that the LRBA assignment is always the same – in case of an update by full table scan (“switch current to new buffer”) and in case of an update by index (no “switch current to new buffer”). A “switch current to new buffer” does not occur in case of an update by index, but the LRBA is assigned as well – so this behavior is not caused by a different code path for “switch current to new buffer” or something like that (possibility C). Sorry, if my statement was not that clear. The snippets from comment (January 3, 2015 @ 5:05 pm GMT Jan 3,2015) are from an update by index and as you can see only one XCURRENT buffer is in the cache (no CR buffers).

Thanks for the hint with x$ktiff. I’ll keep you updated if i find something new and relevant to this topic. Thank you.

Regards
Stefan

Comment by Stefan Koehler — January 5, 2015 @ 10:35 am GMT Jan 5,2015

Hello Jonathan,
In Figure 6-2, where is the third location (flag) showing a busy write on the log buffer? I found only pointers 1 and 2.

Thank you,
Bunditj

Comment by bunditj — May 2, 2015 @ 1:01 pm BST May 2,2015 | Reply

Bunditj,

I didn’t show a location with the write flag on the picture. I didn’t think it would add any value to the diagram and might just make it look more cluttered.

Comment by Jonathan Lewis — May 5, 2015 @ 4:41 pm BST May 5,2015 | Reply

Hi Jonathan

A quick query. What if we have a situation when a buffer header is showing on WRITE_MAIN and dbwr is yet to pick it and push/ pin it on WRITE_AUX, and I fire a sql requiring the block held by the buffer ?
Will Oracle allow me to use the associated buffer after my session has found it on the CBC and if yes I guess a buffer header can jump from WRITE_MAIN to REPL_MAIN ?

Thanks and regards
Vineet Ranjan

Comment by Vineet Ranjan — June 13, 2016 @ 8:19 am BST Jun 13,2016 | Reply

Vineet,

It’s a bit of a shock to realise that the book is nearly five years old!

I thought I’d made some comment about this scenario, but re-reading what I wrote I can see that there are a couple of gaps in the description that raise questions like the one you’ve asked. If the block is on WRITE_MAIN and pinned by your query I think DBWR will skip it and not transfer it to WRITE_AUX. If you pin it exclusively when it’s on WRITE_AUX I’m not sure what happens – maybe you can’t because possibly dbwr has it pinned exclusive, maybe DBWR has to go into a buffer busy wait until you release the pin. Either way I don’t think a buffer can get from WRITE_MAIN to REPL_MAIN or REPL_AUX without passing through WRITE_AUX.

Comment by Jonathan Lewis — June 14, 2016 @ 9:51 am BST Jun 14,2016 | Reply

Dear Jonathan,

Sorry to bother you with perhaps a trivial issue, but I think something is wrong regarding Flashback Database on page 155 that I don’t find described here already. The book on page 155 (bottom) states:

“Any block logged in file 195 will be a version with a last change SCN less than 133279 – but it won’t necessarily be a version with a last SCN that matches our target; any block logged in file 194 will be a version with a last change SCN less than 132564, …”

I believe it should read more or less:

“Any block logged in file 195 will be a version with a last change SCN greater than or equal to 133279 – so newer than our target; any block logged in file 194 will be a version with a last change SCN greater than or equal to 132564, but less than 133279, …”

In the current form it does not make sense to me. It would be great if you could comment on this.

Thank you,
Wojtek

Comment by Wojtek — May 6, 2022 @ 11:50 am BST May 6,2022 | Reply

Wojtek,

Thanks for the question and apologies for not replying sooner. I’ll have to read a couple of pages from the book to get the context before I answer. I’ll follow up as soon as possible.

Regards
Jonathan Lewis

Update: That didn’t take long. The book is obviously wrong – it looks almost as if I was thinking of the recorded SCN being the HIGHEST SCN in the log file – and your “reversed” description is the correct one. Many Thanks, I’ll write that up into the Errata.

Comment by Jonathan Lewis — November 17, 2022 @ 12:41 pm GMT Nov 17,2022 | Reply

Many thanks Jonathan for replying, better late than never. It’s really nice that you are able to admit a mistake and that you created this errata to allow readers to get corrected information and even report errors. On this occasion I must also say that I really appreciate your book. Despite it being quite old, it is still a great source of knowledge that is nowhere else to be found. Thanks!

Comment by Wojtek — November 17, 2022 @ 4:57 pm GMT Nov 17,2022 | Reply

Wojtek,

Thanks for the comment, and I’m delighted to hear that you appreciate the book despite its age.

Regards
Jonathan Lewis

Comment by Jonathan Lewis — November 17, 2022 @ 6:40 pm GMT Nov 17,2022

Oracle Scratchpad