Oracle Scratchpad

January 10, 2022

Control File Waits

Filed under: Infrastructure,LOBs,Oracle,Performance — Jonathan Lewis @ 2:10 pm GMT Jan 10,2022

Originally drafted 4th Aug 2011, this is a blog note that has been waiting a little over 10 years to be completed and published; and it’s probably worth doing it even now since the topic came up on the Oracle database forum just a couple of weeks ago.

The original title was CF Enqueues but I’ve combined it with notes from another draft called LOB write waits to get to the final title because a couple of small variations in my test script can produce all the annoying symptoms you may want to investigate in that area.

The forum note that triggered completion and publication was a complaint that’s common: “why is this so slow” but with a cause that comes up  less frequently: “BLOB operations and control file parallel writes”. The OP was running Oracle 19.9 but was suffering from a problem which (when I checked my archive and MOS) turned out to have a history (and a workaround) that started with 8.0.4. Their performance issue was due to their BLOB defintion which, with the storage information removed, was as follows:

LOB (CONTENT) STORE AS BASICFILE (
        ENABLE     STORAGE IN ROW
        CHUNK      32768
        PCTVERSION 10
        NOCACHE
        NOLOGGING
)

Note, particularly, the key words basicfile, nocache, and nologging. You should not be using basicfile LOBs in 19c, and both the nocache and nologging options can introduce problems of I/O and enqueue contention.

The impact of I/O and and enqueue contention depends, of course, on the nature of the work being done – you might be wasting effort or losing time but it might be a waste of resources that you’re not worred about and time that is effectively invisible to the end-user. For this user the task looked like an example of a “batch running as a pl/sql loop” repeatedly executing a statement similar to the following:


insert into table2(blob_column2)
select  blob_column1 
from    table1
where   id = :B1

A large fraction of the processing time was spent on “control file parallel write” (a consequence of nologging basicfile LOBs) and a further significant fraction was spent on “direct path write” (a consequence of nocache basicfile LOBs).

To understand the problem of the control file parallel write waits you need to know that every time a session makes a nologging change to a data file it also has to update an “unrecoverable SCN” field in the controlfile entry for that file. But if you’re inserting lots of individual LOBs then that turns into a lot of control file writes and the time spent on those writes, and the time spent waiting for other sessions to complete their writes and release the CF enqueue that might be protecting the control file from conflicting writes, can become significant.

As I said, this problem was first noted in 8.0.4, and an event was created to disable this control file update for LOBs (which were only of the basicfile variant at the time). The event was 10359 and the entry in $ORACLE_HOME/rdbms/mesg/oraus.msg for this event (which still has the same effect in 19.9) says:


10359, 00000, "turn off updates to control file for direct writes"
// *Cause:
// *Action:  Control files won't get updated for direct writes for  LOBs
//           when NOCACHE NOLOGGING is set. The only bad impact that it
//           can have is that if you are using the recovery manager,
//           it may affect a warning that says that the user should
//           back everything up. Now the recovery manager won't know
//           to tell you that the files that were updated with
//           unrecoverable events should be backed up.

Setting this event solved the OP’s performance problem.

It doesn’t really sound very nice, of course, setting an event that stops Oracle from telling you that “the files … should be backed up”, but the text does say the only bad impact is that it “may affect a warning” not that future attempts at recovery are going to fail. In fact there’s also MOS Doc ID: 1058851.6 Performance Degradation as a result of ‘enq: CF – contention (last updated March 2019 despite only having references to Oracle 8.0.4) which says (my emphasis):

If event 10359 is set to level 1, update of the control file with invalidation
redo is stopped.  This will not affect any recovery, but note that recovery
manager reports will be stale.

With event 10359 out of the way (but see footnote), I want to get back to my comment about switching to securefile LOBs. Here’s a very simple script I wrote a few years ago (and tweaked occasionally thereafter) to test the performance characteristics of LOBs:

rem
rem     Script: lob_write_waits.sql
rem     Author: Jonathan Lewis
rem     Dated:  Aug 2014
rem

drop table test_lobs;

@@setup

create table test_lobs (
        id              number(5),
        bytes           number(38),
        text_content    clob
)
lob (text_content)
store as 
--      securefile
        basicfile
        text_lob(
        disable storage in row
        nocache nologging
--
--      cache logging compress low                              --  0.26
--      cache reads logging compress low                        --  1.00
--      nocache logging compress low                            --  1.00
--
--      cache nologging compress low                            -- 14.00
--      cache reads nologging compress low                      --  1.00
--      nocache nologging compress low                          --  1.00
--
--      cache filesystem_like_logging compress low              -- 11.00
--      cache reads filesystem_like_logging compress low        --  1.00
--      nocache filesystem_like_logging compress low            --  1.00
)
;

insert into test_lobs values (1,1000, rpad('x',3000));
commit;

execute snap_my_stats.start_snap
execute snap_events.start_snap
execute snap_enqueues.start_snap

set serveroutput off
set feedback off

-- alter session set events '10359 trace name context forever, level 1';
-- alter session set db_unrecoverable_scn_tracking=false;
-- alter session set events '10046 trace name context forever, level 8';

declare
        c1 clob;
begin
        for r in 1..1000 loop
                insert into test_lobs values(r, 1000, lpad(r,1000,0));
                commit;
        end loop;
end;
/

commit;

alter session set events '10046 trace name context off';
alter session set events '10359 trace name context off';
alter session set db_unrecoverable_scn_tracking=true;

spool lob_write_waits

set serveroutput on
set feedback on

execute snap_enqueues.end_snap
execute snap_events.end_snap
execute snap_my_stats.end_snap

spool off

The script creates a table with a LOB column and inserts one row into it to avoid minor details of object creation and/or block formatting affecting the results I see from a loop test inserting 1000 rows one at a time with commit. You’ll notice that I’ve got lots of possible configurations for the LOB segment, including a choice between basicfile and securefile.

As it stands it will create a basicfile lob with the nocache nologging attributes (all the other options I have commented out are possible only for securefile LOBs thanks to the compress low clause.) Apart from the loop the script also allows a couple of parameters to be changed, or events to be set; and captures snapshots of sundry statistics.

If I allow the controlfile activity to take place, the following snippet shows the highlights of what my basicfile LOB does for me (or to me!) in 21c when the database is in ARCHIVELOG mode (which you shold be if you are trying to test the effects of any unrecoverable or nologging operation – the database is not in force_logging mode, by the way):

----------------------------------
System enqueues
Interval:-      6 seconds
----------------------------------
Type    Requests       Waits     Success      Failed    Wait m/s Reason
----    --------       -----     -------      ------    -------- ------
CF         2,006           0       2,006           0           0 contention
XR         1,000           0       1,000           0           0 database force logging
TM         1,002           0       1,002           0           0 contention
TX         1,031           0       1,031           0           0 contention

---------------------------------------------------------
Session Events
Interval:-      6 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs            Csec    Avg Csec    Max Csec
-----                                             -----   ---------            ----    --------    --------
direct path sync                                  1,000           0           125.32        .125           6
control file sequential read                      5,014           0             5.50        .001           0
control file parallel write                       4,000           0           385.75        .096           6
direct path write                                 1,000           0             3.02        .003           5

---------------------------------
Session stats
Interval:-  6 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
total cf enq hold time                                                       3,953
total number of cf enq holders                                               1,000
physical writes                                                              1,000
physical writes direct                                                       1,000
physical writes direct (lob)                                                 1,000
redo entries                                                                 2,638
redo size                                                                1,647,276
redo size for direct writes                                                102,460
KTFB alloc space (block)                                                 9,437,184
ASSM gsp:get free block                                                      1,014
lob writes                                                                   1,000

As you can see, there’s a significant penalty due to the CF (controlfile) enqueues and control file reads and writes; and if I simply set event 10359 to level 1 (or set db_unrecoverable_scn_tracking to false) the most significant change is that the CF and control file I/O disappear and the job takes about 2 seconds instead of 6.

There’s nothing particularly interesting about the session activity stats, but since the activity is nologging it’s interesting to note that the redo size is nowhere near the 8MB we might expect (8KB blocks x 1,000 LOBs) even though we can see that Oracle has had to allocate 9MB of space for (mostly) the LOB data segment. Each nologging/unrecoverable LOB data block has averaged about 100 bytes of redo for tracking unrecoverable blocks. (When I changed the LOB chunk to 32K this KTFB allocation increased to 32MB – but that shouldn’t be a surprise).

When I switch to securefile LOBs (with the nocache nologging options) the results look like this:

----------------------------------
System enqueues
Interval:-      2 seconds
----------------------------------
Type    Requests       Waits     Success      Failed    Wait m/s Reason
----    --------       -----     -------      ------    -------- ------
TM         1,001           0       1,001           0           0 contention
TX         2,073           0       2,073           0           0 contention
DW         1,057           0       1,057           0           0 contention

---------------------------------------------------------
Session Events
Interval:-      2 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs            Csec    Avg Csec    Max Csec
-----                                             -----   ---------            ----    --------    --------
direct path write                                 1,000           0            85.72        .086           4

---------------------------------
Session stats
Interval:-  2 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
redo entries                                                                 7,737
redo size                                                                3,594,280
KTFB alloc space (block)                                                 1,114,112
securefile allocation bytes                                              8,192,000
securefile allocation chunks                                                 1,000
securefile direct write bytes                                            8,192,000
securefile direct write ops                                                  1,000
securefile bytes non-transformed                                         2,000,000

If I use the compression low option at the same time the statistic about securefile bytes non-transformed drops to zero – not surprisingly – and I see 1,000 WG enqueues with reason “delete fso” and 3,000 WG enqueues with reason “lock fso”, but the timing is basically unchanged. The DW enqueue is the “In memory Dispenser” where the id1/id2 reference a tablespace and block address, and the WG enqueue is the “Write gather local enqueue” where the id1/id2 report the LOB ID (in two halves).

The most significant change, though, and the one that made me spend time on looking at results and traces etc. was what happened when I set the system to use “cache nologging” (or “cache filesystem_like_logging”) – the effect was really surprising; the report on enqueues didn’t show any significant differences (the same WG enqueues appeared if I enabled compression), but this is what happened with the session events:

---------------------------------------------------------
Session Events
Interval:-      12 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs            Csec    Avg Csec    Max Csec
-----                                             -----   ---------            ----    --------    --------
write complete waits                              1,000           0         1,007.02       1.007           5

For every LOB write there was a write complete wait that averaged close to one centisecond! That’s a fairly suspect number, looking more like a time-out from a self-timer than a wait that’s terminated by some sort of signal of an action completed. After thinking about what might be happening and browsing through the session activity stats I noticed the following clue:

---------------------------------
Session stats
Interval:-  12 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
messages sent                                                                2,000

That looks like two messages sent for each LOB inserted – which I confirmed by doing a few more tests with different numbers of LOBs. So who is received those messges (and not replying to them)? I re-ran the test taking a snapshot of the session stats for every session in the instance and extracted (using grep) the figures for just the “messages sent” and “messages received”. Here are the results, rearranged slightly:

SID:   283: - oracle - oracle@linux21.localdomain (W003)
messages sent                                                              4
SID:   293: - oracle - oracle@linux21.localdomain (W006)
messages sent                                                              3
SID:   294: - oracle - oracle@linux21.localdomain (W005)
messages sent                                                              4
SID:    20: - oracle - oracle@linux21.localdomain (W000)
messages sent                                                              8
SID:    33: - oracle - oracle@linux21.localdomain (W002)
messages sent                                                              2
SID:    44: - oracle - oracle@linux21.localdomain (W007)
messages sent                                                              3

SID:    51:TEST_USER - jonathan - sqlplus@linux21.localdomain (TNS V1-V3)
messages sent                                                          2,004    (2,000)
SID:   263: - oracle - oracle@linux21.localdomain (DBW0)
messages sent                                                          1,989    (2,000)
messages received                                                      3,021    (3,000)
SID:    11: - oracle - oracle@linux21.localdomain (LGWR)
messages sent                                                          1,021    (1,000)
messages received                                                      2,019    (2,000)

If we ignore the messages sent by Wnnn (which, I believe, go to DBW0) and allow for a few timing/rounding errors, we might decide to work with the figures in brackets in the above. With a few extra tests, changing “commit;” to “commit write wait;” or commenting out the “commit;” my best interpretation (i.e. guess) of these numbers came down to:

  • The user session sends 1,000 messages to DBWR (to write the nologging LOB) and 1,000 messages to LGWR to write the “ordinary” table redo (and the number 1,000 also show up in the “redo synch writes” stat and  “log file sync” waits if we use “commit write wait”)
  • LGWR receives 1,000 messages from the session, and 1,000 messages from DBW0 (which would relate to “LOB management” with redo OP code 26.4 and 13.53) and sends 1,000 completion messages to DBW0.
  • DBW0 receives 1,000 messages from the session and 1,000 messages from LGWR, and sends 1,000 “log LOB Maintenance” messages to LGWR.

And if you’ve been counting carefully you’ll have noticed that there are still 1,000 message sent and received by DBW0 that I haven’t been able to account for. It’s possibly a “double-counting” error; or maybe there’s another process involved in a dialogue with DBW0 that doesn’t record its end of the dialogue, or maybe it’s DBW0 sending messages to itself(!). At some point I may remember how to switch on message tracing and check exactly which messages are sent from where to where, but for the moment I’m going to put a line under the investigation and just highlight the critical effect.

Whatever’s going on, a key detail from the experiments above is that the session does not seem to get any sort of message telling it that the unlogged LOB is safely on disc, it just goes to sleep for 1/100 second – presumably checking the buffer status when it wakes up to decide whether or not the cached LOB has been written.

Footnote

You may have come across the parameter db_unrecoverable_scn_tracking which appeared some time around 11.2.0.4 with the description: “Track nologging SCN in controlfile”. If this parameter is set to true (the default value) at instance startup then you can use event 10359 to switch the feature on and off at the session (and possibly system) level. More elegantly, though, you can set the parameter at the session level instead of using the numeric event to address the problem:

alter session set db_unrecoverable_scn_tracking=false;

I would advise against setting the value to false at instance startup, or even through a call to “alter system”; there’s a comment in Doc ID 12360160.8 – ALTER SYSTEM / SESSION of DB_UNRECOVERABLE_SCN_TRACKING that if the parameter is set to false at startup it can’t be changed dynamically thereafter.  (I haven’t tested if this last detail is still true.)

tl;dr

You really shouldn’t be using basicfile LOBs with newer versions of Oracle, you need to convert them to securefile LOBs.

If you do use basicfile LOBs then you may need to set event 10359 (or adjust the db_unrecoverable_scn_tracking parameter) if you want them to be nologging LOBs.

Even with securefile LOBs the options “cache nologging” and “cache filesystem_like_logging” introduce a surprising I/O wait time of 1/100 second per LOB, so those are two options to avoid.  All other securefile variations seemed to behave nicely (for me) but the amount of undo and redo can vary considerably with option so even with a 1/100 second time penalty you might still choose to use one of these options.

 

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: