I think that many DBAs are likely to be familiar with the event “free buffer waits” and recognise it as an indication that DBWR (the database writer) is not clearing the buffer cache of dirty blocks fast enough. But that’s not the only cause of free buffer waits.
If a session needs to load a block from disk or make a consistent read clone of an existing buffer, it needs to be able to empty another buffer and re-use it. But if all the buffers are either dirty or pinned the session has to wait for a buffer to become available – either when the database writer cleans some buffers (by copying them to disc) or when some pinned buffers are finally released. (See, for example this note from Steve Adams - written in Feb 2000 before 9i went on general release – for further comment). Given the size of buffer caches these days, and the limit on how many buffers a single session may pin, the “free buffer waits” is more likely to be a database writer problem than anything else.
However, I came across a surprise variation on this problem a few weeks ago (on a 126.96.36.199 system). On a totally repeatable test run of a (very busy) web-based application, my client was reporting excessive time lost on “free buffer waits” – even though he could see from a simple query against view v$bh (the buffer headers) that the db cache was half empty before, during, and after the test.
So for the next test run I ran my snapshot code against x$kcbwds (the “working data sets” object) to see how localised the waits were. The snapshot is (roughly) the following query:
select set_id, fbwait, wcwait, bbwait from x$kcbwds;
The free buffer waits (fbwait) were spread evenly across just 24 adjacent sets of the entire list – and a quick check of v$buffer_pool showed that those working data sets made up the recycle cache.
So, one step closer to the understanding the problem, I asked the client which objects had storage clauses that assigned them to the recycle cache. The answer was none. Naturally I queried dba_segments to check this – since something clearly was getting into the recycle cache – and sure enough, there really were no objects defined to use the recycle cache.
So I whipped up a little query to find out what was in the recycle cache – the SQL, execution plan, and a couple of comments follow. This isn’t a script that I would advise you to use on a production system because it’s just a quick hack I put together for a test system – and I haven’t worked out if (a) it’s actually completely correct and (b) there any consequential side-effects.
rem rem What objects and blocks are in the recycle pool rem select /*+ ordered use_hash(xbh) cardinality(wds 24) */ xbh.dbarfil, xbh.dbablk, xbh.obj, xbh.state from x$kcbwds wds, x$bh xbh where wds.set_id between ( select min(bp_lo_sid) from x$kcbwbpd where bp_name = 'RECYCLE' ) and ( select min(bp_hi_sid) from x$kcbwbpd where bp_name = 'RECYCLE' ) and xbh.set_ds = wds.addr / set doc off doc Expected execution plan - the hash join with x$bh as the probe table is probably the only sensible option Note - the calls to check x$kxbwdp are actually running as child operations to a "hidden" FILTER on x$kcbwds Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=44 Card=24 Bytes=1848) 1 0 HASH JOIN (Cost=44 Card=24 Bytes=1848) 2 1 FIXED TABLE (FULL) OF 'X$KCBWDS' (Cost=22 Card=24 Bytes=456) 3 2 SORT (AGGREGATE) 4 3 FIXED TABLE (FULL) OF 'X$KCBWBPD' (Cost=22 Card=1 Bytes=25) 5 2 SORT (AGGREGATE) 6 5 FIXED TABLE (FULL) OF 'X$KCBWBPD' (Cost=22 Card=1 Bytes=25) 7 1 FIXED TABLE (FULL) OF 'X$BH' (Cost=22 Card=100 Bytes=5800) #
The results of this query were very interesting – and I’ve reproduced a few lines from the output below. For reference, state=3 decodes to ‘CR’ (consistent read).
DBARFIL DBABLK OBJ STATE ---------- ---------- ---------- ---------- 49 52674 1020720 3 49 52674 1020720 3 49 52674 1020720 3 49 52675 1020720 3 49 52675 1020720 3 49 52675 1020720 3 49 52676 1020720 3 49 52676 1020720 3 49 52677 1020720 3 49 52680 1020720 3 49 52680 1020720 3 14 9 4294967295 3 14 9 4294967295 3 14 9 4294967295 3 14 9 4294967295 3 14 9 4294967295 3 14 1289 4294967295 3 14 1289 4294967295 3 14 1289 4294967295 3
Object 1020720 was the most popular data segment in the entire system – which just happened to be allocated to the keep cache, not the recycle cache. As you may have guessed from the block numbers, the blocks from the undo segments (identified by obj = 4294967295) were all segment header blocks. You’ll also notice that there are multiple CR blocks for nearly every block I’ve reported.
Somewhere I once wrote a note about Oracle’s intention to create CR copies in the recycle cache for objects assigned to the keep cache in order to protect the keep cache from excessive garbage. But at the time I pointed out that I had never seen this feature working. Finally, I have an example of this mechanism in action; and the trick even includes CR clones of undo segment header blocks (the things you’ve managed to generate when you see statistics about “transaction tables consistent read rollbacks” in v$sysstat).
My interpretation of the waits, then, was this: because the recycle cache was there it was being used as a dumping ground for CR blocks of a few special objects. But because the system was so busy (and the recycle cache so small) the system regularly got into a state where a session needed to create a CR block for one of the special objects at a moment when all the buffers in the recycle cache were still pinned (They’re CR blocks, so never dirty, hence never written). So the session waited for a buffer to be unpinned rather than create the CR copy in the keep or default caches.
It would have been nice to run further checks on this hypothesis – were there any CR copies of the critical objects in the keep or default caches, for example, and if so did their flags differ at all from those in the recycle cache. Unfortunately there was no time for further investigation. (Oracle can copy the current block to a new buffer and mark the older block as the CR copy, for example – an event recorded under statistic “switch current to new buffer”.)
The client had created a small recycle cache “just in case” and didn’t really need it, so we got rid of it and re-ran the test suite. Inevitably (well, almost inevitably) there were no free buffer waits recorded.
So, in a busy 9i environment, free buffer waits might be a sign of DBWR failing to keep up – but it’s just possible that they might be an indication of some sort of conflict between the recycle cache and volatile objects in the keep cache. It’s worth a quick check of the buffer pool statistics (v$buffer_pool_statistics for the ‘official’ view) to see which caches the waits appear in.
Update Mar 2011:
Following on from a question about the KEEP pool that appeared on OTN, I’ve repeated a few of my tests on 10.2.0.3 and 188.8.131.52 – and there have been some significant changes. In 10.2.0.3 (which is just the version I happened to test, the change might have come in much earlier, maybe even 10.1) Oracle is able to use the DEFAULT cache to create CR copies of blocks in the KEEP cache if you haven’t created a RECYCLE cache. In 184.108.40.206 the number of cases where the CR block go into the DEFAULT cache has increased.
However, when you update a block, Oracle may clone the block to create a new CURrent copy and re-label the previous copy as a CR copy – in this case you will still end up with CR blocks in the KEEP cache.