Oracle Scratchpad

December 31, 2010

Quiz Night

Filed under: Infrastructure — Jonathan Lewis @ 11:45 pm UTC Dec 31,2010

Many people in the UK like to greet the new year with some sort of party, and in this household the type of party we like includes old-fashioned party games (can you draw “Brief Encounters” in 10 seconds ?); so, for all those members of the Oracle community who haven’t prepared their party games this year, here’s a little quiz:

Which processes could report waits for “log file sequential read”, and why are they reading log files ?

    1.

    10.

Warning: the number of answers I have may, or may not, be the 10 suggested by the list.

Update Jan 5th: There were a couple of answers I wasn’t expecting (and would need to check), but some things are fairly easy to check. Here, for example, is a simple script that will identify sessions that regularly wait for ‘log file sequential read’.

select 
	ses.program
from 
	v$session_event sev,
	V$session ses
where
	sev.event = 'log file sequential read'
and	ses.sid = sev.sid
;

Answers 7th Jan:

The obvious one, of course, is ARCH (or ARCn as they tend to be) the archiver process(es). Archiving requires something to read the online redo and copy them to some other location. Here’s a brief extract from tracing an archive process – showing reads of 1MB (2048 x 512 bytes per block) -just after a log file switch:

WAIT #0: nam='log file sequential read' ela= 38410 log#=0 block#=1 blocks=2048 obj#=-1 tim=101515708285
WAIT #0: nam='Log archive I/O' ela= 183580 count=1 intr=256 timeout=-1 obj#=-1 tim=101515892589
WAIT #0: nam='log file sequential read' ela= 84409 log#=0 block#=2049 blocks=2048 obj#=-1 tim=101515977375
WAIT #0: nam='Log archive I/O' ela= 116641 count=1 intr=256 timeout=-1 obj#=-1 tim=101516094684
WAIT #0: nam='log file sequential read' ela= 31139 log#=0 block#=4097 blocks=2048 obj#=-1 tim=101516126157
WAIT #0: nam='Log archive I/O' ela= 122492 count=1 intr=256 timeout=-1 obj#=-1 tim=101516250399
WAIT #0: nam='log file sequential read' ela= 32197 log#=0 block#=6145 blocks=2048 obj#=-1 tim=101516282965
WAIT #0: nam='Log archive I/O' ela= 122779 count=1 intr=256 timeout=-1 obj#=-1 tim=101516552698
WAIT #0: nam='Log archive I/O' ela= 4 count=0 intr=32 timeout=2147483647 obj#=-1 tim=101516552996
WAIT #0: nam='Log archive I/O' ela= 25 count=1 intr=32 timeout=2147483647 obj#=-1 tim=101516553297

Slightly less obvious is LGWR, the log writer itself. Apart from reading the log files at (normal) startup to check that they’re there, LGWR reads the start of the current and next log files on every log switch. Here’s the trace file of LGWR on a log file switch where there are two files in each group:

WAIT #0: nam='log file sequential read' ela= 32788 log#=0 block#=1 blocks=1 obj#=-1 tim=100016453547
WAIT #0: nam='log file sequential read' ela= 18309 log#=1 block#=1 blocks=1 obj#=-1 tim=100016471921
WAIT #0: nam='log file single write' ela= 518 log#=0 block#=1 blocks=1 obj#=-1 tim=100016472654
WAIT #0: nam='log file single write' ela= 516 log#=1 block#=1 blocks=1 obj#=-1 tim=100016473225
Media recovery not enabled or manual archival only 0x10000
WAIT #0: nam='log file sequential read' ela= 90 log#=0 block#=1 blocks=1 obj#=-1 tim=100016473376
WAIT #0: nam='log file sequential read' ela= 73 log#=1 block#=1 blocks=1 obj#=-1 tim=100016473474
WAIT #0: nam='log file single write' ela= 477 log#=0 block#=1 blocks=1 obj#=-1 tim=100016473981
WAIT #0: nam='log file single write' ela= 856 log#=1 block#=1 blocks=1 obj#=-1 tim=100016474885

By playing around with files per group we can work out that the order of activity is:

    read the current
    write the next
    read the current
    write the current

Next up: any process which is involved in (asynchronous) Change Data Capture (CDC) or Streams will be reading the log files, as will any process running a Log Miner session.

A couple of people suggested the SMON and PX slaves could read the log file on database startup – but as another poster pointed out, it’s the startup process that reads the log files – and if parallel recovery is invoked it passes change vectors to the PX slaves, the slaves don’t read the log files directly. Apart from that, I don’t really know which processes read the log file – I don’t want to say “process X doesn’t” read the log file, because it may do at times that I haven’t considered. (And there are lots of processes to look into in 11g.)

However, there is one (generic) process that can read the log files – and that’s ANY process that has to handle data blocks. And that’s really why I wrote this note in the first place. If a process finds at any time that the buffer it’s looking at in memory is corrupt it may do real-time block recovery if it thinks the corruption has occurred between the moment the block was read from disk and the present. There are various flags and details in the buffer header that tell it that the block is dirty (differs from what’s on disk) and the redo block address (RBA) where the earliest known change to the in-memory block can be found in the log files.

So any process may, in conditions that shouldn’t happen often (we hope), identify an in-memory corrupt, read a block from disc, then go backwards in time in the redo logs and start walking the redo logs to find the redo needed to bring the block up to date. So if you see excessive amounts of “log file sequential read” coming from your foreground session, check what’s going on – it might be an indication of a hardware fault.

Thanks to anyone who participated – and if anyone runs the little script and finds some new examples of log file sequential reads, please add them to the list.

Update Jan 2011:
Thanks to Pavol Babel in comment 8 and his reply to comment 7 for adding MRP0 and RFS to the list of processes that are expected to read the log file.

23 Comments »

  1. Jonathan, I got the easy ones…
    1) FBDA – Total Recall – Flashback data archive reads redo to create “History” for a given transaction
    2) ARCH – During archive redo log processing
    3) SMON – Instance startup / recovery processing
    4) LNS – Data Guard LGWR SYNC processing, LNS on primary sends to RFS on standby
    5) ARCH – Data Guard all modes, archive gap detection/resolution
    Happy New Year!
    Leo

    Comment by Leo Glauber — January 1, 2011 @ 1:47 am UTC Jan 1,2011 | Reply

    • Leo,

      1) I don’t think FBDA will read redo logs, it checks undo in regular intervals
      3) I have never seen SMON waiting for “log file sequential read”. SMON probably triggers whole instance recovery process (and rollbacks uncommited transactions), but it will not read redo log files. The server processes (invoked by sqlplus) will always report “log file sequential read” during startup. I’m not sure why (I have only a theory).
      4) LGRW submits redo to LNS, but LNS does not directly read redo logs.

      Comment by Pavol Babel — January 4, 2011 @ 2:54 am UTC Jan 4,2011 | Reply

  2. I would like to add server processes & shared server processes (if configured).
    I remember reading an article (from Steve Adams ?) where the server processes reread a previously incompletely filled log block while adding more redo entries to it.

    regards
    srivenu

    Comment by srivenu — January 1, 2011 @ 6:50 am UTC Jan 1,2011 | Reply

    • Srivenu,

      Server processes (whether shared or dedicated) don’t write to the redo log file so they certainly don’t read it to “back-fill” incompletely filled blocks. I am confident that if you read anything to the contrary it didn’t come from Steve Adams.

      Comment by Jonathan Lewis — January 2, 2011 @ 8:03 pm UTC Jan 2,2011 | Reply

  3. Streams capture process
    Log miner

    Comment by Craig Simpson — January 1, 2011 @ 11:36 am UTC Jan 1,2011 | Reply

  4. One of those not so commonly known ones: LGWR *reads* (!) log files at instance startup (even with a clean shutdown, so no instance recovery) which are recorded as “log file sequential read” wait events for the LGWR background session I think

    Happy New Year!

    Randolf

    Comment by Randolf Geist — January 1, 2011 @ 1:12 pm UTC Jan 1,2011 | Reply

  5. Pnnn, for parallel instance and media recovery.

    Comment by Fladi — January 1, 2011 @ 1:24 pm UTC Jan 1,2011 | Reply

    • Parallel servers do not read log files when parallel recovery takes place. Just the query coordinator (server process which invoked parallel recovery) reads redo, parallel servers are just modifying data blocks.

      Comment by Pavol Babel — January 6, 2011 @ 1:13 pm UTC Jan 6,2011 | Reply

  6. I would like to add LGWR process for the same purpose as server processes when private redo is not being used.
    Srivenu

    Comment by srivenu — January 2, 2011 @ 11:56 am UTC Jan 2,2011 | Reply

    • Srivenu,

      See my earlier reply – even LGWR doesn’t “read and back-fill” log blocks that were incompletely used. If there is any free space in a block being written by LGWR then its wasted (and reported in the statistic “redo wastage”).

      Comment by Jonathan Lewis — January 2, 2011 @ 8:05 pm UTC Jan 2,2011 | Reply

  7. I’ve just added a little script that lets you check which sessions have been waiting on your systems for “log file sequential read”. The results might be a little surprising.

    Comment by Jonathan Lewis — January 5, 2011 @ 9:26 am UTC Jan 5,2011 | Reply

    • Jonathan,

      This was the very first query, which I have run on some of our production databases few hours after your new year’s post :-)

      Although we have no project using STREAMS, LOG MINING or LOGICAL STANDBY DATABASE (and I’m too lazy to create my own testing environment), Craig seems to be right. I would just add READER process used in LOGICAL STANDBY architecture. So we have three (but could be counted as one, becuase there is always some kind of log mining behind)

      I have seen following processes in my databases:
      4) LGWR – I wodner why LGWR has not been mentioned by anyone :) Yes, LGWR writes into REDO LOGS, but it also has to read few blocks from REDO LOGS during LOG SWITCH :)
      5) ARCH – clear to everyone
      6) dedicated server process – I’ve already mentioned it here . Surprisingly, when a database is recovered, redo log file seems to be read by server process (created on behalf sqlplus or rman, mostly ). That’s why are redo logs read during instance crash recovery by server process, too.
      SMON triggers crash recovery process (checking controlfile vs. datafile headers), but redo logs are read by dedicated server process, which is opening database.

      There are two more processes, which could also report “log file sequential read”, but I don’t have access to customers databases with sysdba privileges, so I’m not able to check it:
      7) MRP0 server process – reading STANDBY REDO LOG (but it is also reported as log file seq read)
      8) RFS processes on physical standby (which are actually server processes, at least in 9i) – writing to STDBY REDO LOGS and reading, too.

      Comment by Pavol Babel — January 5, 2011 @ 11:29 pm UTC Jan 5,2011 | Reply

  8. Jonathan,

    I have sent simple script to our customer to run on standby database (on which i don’t have sysdba) and the result showed, that both MRP0 and RFS processes can report LOG FILE SEQUENTIAL READ on phsyical standby database. So my expectations were correct.

    Real time block recover is really great point, I really didn’t expect such an answer.

    Comment by Pavol Babel — January 7, 2011 @ 10:40 pm UTC Jan 7,2011 | Reply

    • Pavol,

      Thanks for the update.

      It would be quite useful to build a catalogue of the “normal” behaviour of all such processes – i.e. do they (like LGWR) do just a couple of reads per file at specific events or do they (like ARCH) read the whole file.

      Comment by Jonathan Lewis — January 8, 2011 @ 10:30 am UTC Jan 8,2011 | Reply

      • Building a catgalogue of normal behaviour would be nice, but it would be really difficult job ;-)

        WE are running PHYSICAL STANDBY DATABSE in LGWR SYNC AFFIRM mode (i.e. in Maximum Availability Protection Level), so we are using STANDBY REDO LOG FILES (we are not using real time apply)
        – MRP0 is reading whole STANDBY REDO LOG FILES ( like ARCH).
        – RFS is writing to the STANDBY REDO LOG FILES (and sending back notifications do LNS preocess on the primary site in AFFIRM SYNC mode). So it’s behaviour regarding LOGS is very close to LGWR on primary database. It seems to read one block from the old STANDBY REDO LOG and one from the new one during log switch, (like LGWR).

        Comment by Pavol Babel — January 8, 2011 @ 11:08 pm UTC Jan 8,2011 | Reply

        • Pavol,

          Thanks for the follow-up.

          It may be hard to create a catlogue – but with a couple of thousand readers with different setups it becomes feasible.

          Comment by Jonathan Lewis — January 9, 2011 @ 4:11 pm UTC Jan 9,2011

  9. Jonathan,

    thank you for letting us know this interesting detail about real-time block recovery. I’ve searched the web & MOS a little bit but couldn’t find anything similar you’ve described. Do you have something like MOS Doc ID which contains more info on that topic (like when the feature was introduced, is it controllable behavior or not and how to determine if this ever happened to a block in a given database)?

    Comment by Timur Akhmadeev — January 8, 2011 @ 7:08 pm UTC Jan 8,2011 | Reply

    • Timur,

      I have to admit that I don’t know how I came to know this. I’ve searched through my document library (but it’s hard to think of the right keywords) and all I’ve found is a note in my “Optimising Oracle” seminar, written in 2000 that states that 8i will do this.

      Hunting through MOS I found document 32969.1 – which describes (amongst other things) “Logical Block Checking” and “cache recovery”. In the document this is described as the consequence of setting db_block_checking to true, leading to “soft-corruption” errors being flagged and subsequent ORA-01578 errors – but I think there are some logical checks that take place even when db_block_checking is not -set, with the same real-time cache recovery being attempted.

      I happened to write this blog note up because I recently hit a problem at a client site where several sessions seemed to go into a frenzied attempt to do cache recovery at the same time, going back several hours into the archived redo logs! We never got to the bottom of it, but it looked like a race condition appearing in the code to create a locally partitioned index online when row movement between partitions had been enabled and was happening at a fairly high rate.

      Comment by Jonathan Lewis — January 9, 2011 @ 4:08 pm UTC Jan 9,2011 | Reply

      • Jonathan,

        I think I’ve just seen something similar on an Oracle 11.2.0.2 instance running on the Linux x86-64:

        Wed Jan 26 03:35:48 2011
        Corrupt Block Found
                 TSN = 4, TSNAME = TS_NAME
                 RFN = 29, BLK = 1277711, RDBA = 122912527
                 OBJN = 95299, OBJD = 95846, OBJECT = OBJ_NAME, SUBOBJECT =
                 SEGMENT OWNER = SEG_OWNER, SEGMENT TYPE = Table Segment
        Wed Jan 26 03:36:14 2011
        Errors in file .../....trc  (incident=19408):
        ORA-00600: internal error code, arguments: [kdBlkCheckError], [29], [1277711], [18038], [], [], [], [], [], [], [], []
        Incident details in: ...
        Wed Jan 26 03:36:49 2011
        Use ADRCI or Support Workbench to package the incident.
        See Note 411.1 at My Oracle Support for error and packaging details.
        Doing block recovery for file 3 block 79091
        Resuming block recovery (PMON) for file 3 block 79091
        Block recovery from logseq 8670, block 125005 to scn 9324189340887
        Recovery of Online Redo Log: Thread 1 Group 5 Seq 8670 Reading mem 0
          Mem# 0: /.../redo05.log
        Block recovery completed at rba 8670.125263.16, scn 2170.4110308568
        Doing block recovery for file 29 block 1277711
        Resuming block recovery (PMON) for file 29 block 1277711
        Block recovery from logseq 8670, block 124942 to scn 9324189340887
        Recovery of Online Redo Log: Thread 1 Group 5 Seq 8670 Reading mem 0
          Mem# 0: /.../redo05.log
        Block recovery completed at rba 8670.125263.16, scn 2170.4110308568
        

        db_block_checking=false for this database, but still when running into ORA-00600 [kdBlkCheckError] Oracle tries to auto recover corrupted block.

        Comment by Timur Akhmadeev — February 4, 2011 @ 11:47 am UTC Feb 4,2011 | Reply

      • Here’s another example, possibly, of a similar event being raised in the OTN database forum – although it looks as if this one might only be the online redo log files being used to recover a couple of undo segment headers to derive transaction commit times.

        http://forums.oracle.com/forums/thread.jspa?threadID=2175710&tstart=0

        Comment by Jonathan Lewis — February 14, 2011 @ 4:01 pm UTC Feb 14,2011 | Reply

  10. I’m with Timur :-) . More information would be appreciated. I have to admitt, if i saw foreground process with many “log file sequential read”, I wouldn’t guess what’s going wrong… Was any trace generated in this situation?

    Comment by Pavol Babel — January 8, 2011 @ 9:16 pm UTC Jan 8,2011 | Reply

  11. NSAn processes will also report the ‘log file sequential read’ in a dataguard configuration.

    The Oracle documentation states the following about this process:
    “Ships redo from current online redo logs to remote standby destinations configured for ASYNC transport”
    which suggests the process is reading from the redo log file.

    On a production database during peek load I found the NSA processes to be responsible for more reads from the redo logs than the archiver processes.

    Comment by Martin Maletinsky — June 23, 2011 @ 1:43 pm UTC Jun 23,2011 | Reply

  12. Another case for reading log files: as part of dumping errorstack at level 3 the process can also dump some redo records and this inevitably produces ‘log file sequential read’ waits.

    Comment by Timur Akhmadeev — January 20, 2012 @ 8:47 am UTC Jan 20,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 871 other followers