Originally drafted 4th Aug 2011, this is a blog note that has been waiting a little over 10 years to be completed and published; and it’s probably worth doing it even now since the topic came up on the Oracle database forum just a couple of weeks ago.
The original title was CF Enqueues but I’ve combined it with notes from another draft called LOB write waits to get to the final title because a couple of small variations in my test script can produce all the annoying symptoms you may want to investigate in that area.
The forum note that triggered completion and publication was a complaint that’s common: “why is this so slow” but with a cause that comes up less frequently: “BLOB operations and control file parallel writes”. The OP was running Oracle 19.9 but was suffering from a problem which (when I checked my archive and MOS) turned out to have a history (and a workaround) that started with 8.0.4. Their performance issue was due to their BLOB defintion which, with the storage information removed, was as follows:
LOB (CONTENT) STORE AS BASICFILE ( ENABLE STORAGE IN ROW CHUNK 32768 PCTVERSION 10 NOCACHE NOLOGGING )
Note, particularly, the key words basicfile, nocache, and nologging. You should not be using basicfile LOBs in 19c, and both the nocache and nologging options can introduce problems of I/O and enqueue contention.
The impact of I/O and and enqueue contention depends, of course, on the nature of the work being done – you might be wasting effort or losing time but it might be a waste of resources that you’re not worred about and time that is effectively invisible to the end-user. For this user the task looked like an example of a “batch running as a pl/sql loop” repeatedly executing a statement similar to the following:
insert into table2(blob_column2) select blob_column1 from table1 where id = :B1
A large fraction of the processing time was spent on “control file parallel write” (a consequence of nologging basicfile LOBs) and a further significant fraction was spent on “direct path write” (a consequence of nocache basicfile LOBs).
To understand the problem of the control file parallel write waits you need to know that every time a session makes a nologging change to a data file it also has to update an “unrecoverable SCN” field in the controlfile entry for that file. But if you’re inserting lots of individual LOBs then that turns into a lot of control file writes and the time spent on those writes, and the time spent waiting for other sessions to complete their writes and release the CF enqueue that might be protecting the control file from conflicting writes, can become significant.
As I said, this problem was first noted in 8.0.4, and an event was created to disable this control file update for LOBs (which were only of the basicfile variant at the time). The event was 10359 and the entry in $ORACLE_HOME/rdbms/mesg/oraus.msg for this event (which still has the same effect in 19.9) says:
10359, 00000, "turn off updates to control file for direct writes" // *Cause: // *Action: Control files won't get updated for direct writes for LOBs // when NOCACHE NOLOGGING is set. The only bad impact that it // can have is that if you are using the recovery manager, // it may affect a warning that says that the user should // back everything up. Now the recovery manager won't know // to tell you that the files that were updated with // unrecoverable events should be backed up.
Setting this event solved the OP’s performance problem.
It doesn’t really sound very nice, of course, setting an event that stops Oracle from telling you that “the files … should be backed up”, but the text does say the only bad impact is that it “may affect a warning” not that future attempts at recovery are going to fail. In fact there’s also MOS Doc ID: 1058851.6 Performance Degradation as a result of ‘enq: CF – contention (last updated March 2019 despite only having references to Oracle 8.0.4) which says (my emphasis):
If event 10359 is set to level 1, update of the control file with invalidation
redo is stopped. This will not affect any recovery, but note that recovery
manager reports will be stale.
With event 10359 out of the way (but see footnote), I want to get back to my comment about switching to securefile LOBs. Here’s a very simple script I wrote a few years ago (and tweaked occasionally thereafter) to test the performance characteristics of LOBs:
rem rem Script: lob_write_waits.sql rem Author: Jonathan Lewis rem Dated: Aug 2014 rem drop table test_lobs; @@setup create table test_lobs ( id number(5), bytes number(38), text_content clob ) lob (text_content) store as -- securefile basicfile text_lob( disable storage in row nocache nologging -- -- cache logging compress low -- 0.26 -- cache reads logging compress low -- 1.00 -- nocache logging compress low -- 1.00 -- -- cache nologging compress low -- 14.00 -- cache reads nologging compress low -- 1.00 -- nocache nologging compress low -- 1.00 -- -- cache filesystem_like_logging compress low -- 11.00 -- cache reads filesystem_like_logging compress low -- 1.00 -- nocache filesystem_like_logging compress low -- 1.00 ) ; insert into test_lobs values (1,1000, rpad('x',3000)); commit; execute snap_my_stats.start_snap execute snap_events.start_snap execute snap_enqueues.start_snap set serveroutput off set feedback off -- alter session set events '10359 trace name context forever, level 1'; -- alter session set db_unrecoverable_scn_tracking=false; -- alter session set events '10046 trace name context forever, level 8'; declare c1 clob; begin for r in 1..1000 loop insert into test_lobs values(r, 1000, lpad(r,1000,0)); commit; end loop; end; / commit; alter session set events '10046 trace name context off'; alter session set events '10359 trace name context off'; alter session set db_unrecoverable_scn_tracking=true; spool lob_write_waits set serveroutput on set feedback on execute snap_enqueues.end_snap execute snap_events.end_snap execute snap_my_stats.end_snap spool off
The script creates a table with a LOB column and inserts one row into it to avoid minor details of object creation and/or block formatting affecting the results I see from a loop test inserting 1000 rows one at a time with commit. You’ll notice that I’ve got lots of possible configurations for the LOB segment, including a choice between basicfile and securefile.
As it stands it will create a basicfile lob with the nocache nologging attributes (all the other options I have commented out are possible only for securefile LOBs thanks to the compress low clause.) Apart from the loop the script also allows a couple of parameters to be changed, or events to be set; and captures snapshots of sundry statistics.
If I allow the controlfile activity to take place, the following snippet shows the highlights of what my basicfile LOB does for me (or to me!) in 21c when the database is in ARCHIVELOG mode (which you shold be if you are trying to test the effects of any unrecoverable or nologging operation – the database is not in force_logging mode, by the way):
---------------------------------- System enqueues Interval:- 6 seconds ---------------------------------- Type Requests Waits Success Failed Wait m/s Reason ---- -------- ----- ------- ------ -------- ------ CF 2,006 0 2,006 0 0 contention XR 1,000 0 1,000 0 0 database force logging TM 1,002 0 1,002 0 0 contention TX 1,031 0 1,031 0 0 contention --------------------------------------------------------- Session Events Interval:- 6 seconds --------------------------------------------------------- Event Waits Time_outs Csec Avg Csec Max Csec ----- ----- --------- ---- -------- -------- direct path sync 1,000 0 125.32 .125 6 control file sequential read 5,014 0 5.50 .001 0 control file parallel write 4,000 0 385.75 .096 6 direct path write 1,000 0 3.02 .003 5 --------------------------------- Session stats Interval:- 6 seconds --------------------------------- Name Value ---- ----- total cf enq hold time 3,953 total number of cf enq holders 1,000 physical writes 1,000 physical writes direct 1,000 physical writes direct (lob) 1,000 redo entries 2,638 redo size 1,647,276 redo size for direct writes 102,460 KTFB alloc space (block) 9,437,184 ASSM gsp:get free block 1,014 lob writes 1,000
As you can see, there’s a significant penalty due to the CF (controlfile) enqueues and control file reads and writes; and if I simply set event 10359 to level 1 (or set db_unrecoverable_scn_tracking to false) the most significant change is that the CF and control file I/O disappear and the job takes about 2 seconds instead of 6.
There’s nothing particularly interesting about the session activity stats, but since the activity is nologging it’s interesting to note that the redo size is nowhere near the 8MB we might expect (8KB blocks x 1,000 LOBs) even though we can see that Oracle has had to allocate 9MB of space for (mostly) the LOB data segment. Each nologging/unrecoverable LOB data block has averaged about 100 bytes of redo for tracking unrecoverable blocks. (When I changed the LOB chunk to 32K this KTFB allocation increased to 32MB – but that shouldn’t be a surprise).
When I switch to securefile LOBs (with the nocache nologging options) the results look like this:
---------------------------------- System enqueues Interval:- 2 seconds ---------------------------------- Type Requests Waits Success Failed Wait m/s Reason ---- -------- ----- ------- ------ -------- ------ TM 1,001 0 1,001 0 0 contention TX 2,073 0 2,073 0 0 contention DW 1,057 0 1,057 0 0 contention --------------------------------------------------------- Session Events Interval:- 2 seconds --------------------------------------------------------- Event Waits Time_outs Csec Avg Csec Max Csec ----- ----- --------- ---- -------- -------- direct path write 1,000 0 85.72 .086 4 --------------------------------- Session stats Interval:- 2 seconds --------------------------------- Name Value ---- ----- redo entries 7,737 redo size 3,594,280 KTFB alloc space (block) 1,114,112 securefile allocation bytes 8,192,000 securefile allocation chunks 1,000 securefile direct write bytes 8,192,000 securefile direct write ops 1,000 securefile bytes non-transformed 2,000,000
If I use the compression low option at the same time the statistic about securefile bytes non-transformed drops to zero – not surprisingly – and I see 1,000 WG enqueues with reason “delete fso” and 3,000 WG enqueues with reason “lock fso”, but the timing is basically unchanged. The DW enqueue is the “In memory Dispenser” where the id1/id2 reference a tablespace and block address, and the WG enqueue is the “Write gather local enqueue” where the id1/id2 report the LOB ID (in two halves).
The most significant change, though, and the one that made me spend time on looking at results and traces etc. was what happened when I set the system to use “cache nologging” (or “cache filesystem_like_logging”) – the effect was really surprising; the report on enqueues didn’t show any significant differences (the same WG enqueues appeared if I enabled compression), but this is what happened with the session events:
--------------------------------------------------------- Session Events Interval:- 12 seconds --------------------------------------------------------- Event Waits Time_outs Csec Avg Csec Max Csec ----- ----- --------- ---- -------- -------- write complete waits 1,000 0 1,007.02 1.007 5
For every LOB write there was a write complete wait that averaged close to one centisecond! That’s a fairly suspect number, looking more like a time-out from a self-timer than a wait that’s terminated by some sort of signal of an action completed. After thinking about what might be happening and browsing through the session activity stats I noticed the following clue:
--------------------------------- Session stats Interval:- 12 seconds --------------------------------- Name Value ---- ----- messages sent 2,000
That looks like two messages sent for each LOB inserted – which I confirmed by doing a few more tests with different numbers of LOBs. So who is received those messges (and not replying to them)? I re-ran the test taking a snapshot of the session stats for every session in the instance and extracted (using grep) the figures for just the “messages sent” and “messages received”. Here are the results, rearranged slightly:
SID: 283: - oracle - oracle@linux21.localdomain (W003) messages sent 4 SID: 293: - oracle - oracle@linux21.localdomain (W006) messages sent 3 SID: 294: - oracle - oracle@linux21.localdomain (W005) messages sent 4 SID: 20: - oracle - oracle@linux21.localdomain (W000) messages sent 8 SID: 33: - oracle - oracle@linux21.localdomain (W002) messages sent 2 SID: 44: - oracle - oracle@linux21.localdomain (W007) messages sent 3 SID: 51:TEST_USER - jonathan - sqlplus@linux21.localdomain (TNS V1-V3) messages sent 2,004 (2,000) SID: 263: - oracle - oracle@linux21.localdomain (DBW0) messages sent 1,989 (2,000) messages received 3,021 (3,000) SID: 11: - oracle - oracle@linux21.localdomain (LGWR) messages sent 1,021 (1,000) messages received 2,019 (2,000)
If we ignore the messages sent by Wnnn (which, I believe, go to DBW0) and allow for a few timing/rounding errors, we might decide to work with the figures in brackets in the above. With a few extra tests, changing “commit;” to “commit write wait;” or commenting out the “commit;” my best interpretation (i.e. guess) of these numbers came down to:
- The user session sends 1,000 messages to DBWR (to write the nologging LOB) and 1,000 messages to LGWR to write the “ordinary” table redo (and the number 1,000 also show up in the “redo synch writes” stat and “log file sync” waits if we use “commit write wait”)
- LGWR receives 1,000 messages from the session, and 1,000 messages from DBW0 (which would relate to “LOB management” with redo OP code 26.4 and 13.53) and sends 1,000 completion messages to DBW0.
- DBW0 receives 1,000 messages from the session and 1,000 messages from LGWR, and sends 1,000 “log LOB Maintenance” messages to LGWR.
And if you’ve been counting carefully you’ll have noticed that there are still 1,000 message sent and received by DBW0 that I haven’t been able to account for. It’s possibly a “double-counting” error; or maybe there’s another process involved in a dialogue with DBW0 that doesn’t record its end of the dialogue, or maybe it’s DBW0 sending messages to itself(!). At some point I may remember how to switch on message tracing and check exactly which messages are sent from where to where, but for the moment I’m going to put a line under the investigation and just highlight the critical effect.
Whatever’s going on, a key detail from the experiments above is that the session does not seem to get any sort of message telling it that the unlogged LOB is safely on disc, it just goes to sleep for 1/100 second – presumably checking the buffer status when it wakes up to decide whether or not the cached LOB has been written.
Footnote
You may have come across the parameter db_unrecoverable_scn_tracking which appeared some time around 11.2.0.4 with the description: “Track nologging SCN in controlfile”. If this parameter is set to true (the default value) at instance startup then you can use event 10359 to switch the feature on and off at the session (and possibly system) level. More elegantly, though, you can set the parameter at the session level instead of using the numeric event to address the problem:
alter session set db_unrecoverable_scn_tracking=false;
I would advise against setting the value to false at instance startup, or even through a call to “alter system”; there’s a comment in Doc ID 12360160.8 – ALTER SYSTEM / SESSION of DB_UNRECOVERABLE_SCN_TRACKING that if the parameter is set to false at startup it can’t be changed dynamically thereafter. (I haven’t tested if this last detail is still true.)
tl;dr
You really shouldn’t be using basicfile LOBs with newer versions of Oracle, you need to convert them to securefile LOBs.
If you do use basicfile LOBs then you may need to set event 10359 (or adjust the db_unrecoverable_scn_tracking parameter) if you want them to be nologging LOBs.
Even with securefile LOBs the options “cache nologging” and “cache filesystem_like_logging” introduce a surprising I/O wait time of 1/100 second per LOB, so those are two options to avoid. All other securefile variations seemed to behave nicely (for me) but the amount of undo and redo can vary considerably with option so even with a 1/100 second time penalty you might still choose to use one of these options.
[…] Control File Waits (Jan 2022): Investingating where time can be lost on I/O with basicfile Lobs. […]
Pingback by LOB Catalogue | Oracle Scratchpad — July 15, 2022 @ 8:04 pm BST Jul 15,2022 |
[…] Control File Waits (Jan 2022): Investingating where time can be lost on I/O with basicfile Lobs. […]
Pingback by Performance catalogue | Oracle Scratchpad — July 15, 2022 @ 8:09 pm BST Jul 15,2022 |