Oracle Scratchpad

March 16, 2011

buffer flush

Filed under: Infrastructure,Troubleshooting — Jonathan Lewis @ 6:34 pm BST Mar 16,2011

This is part 2 of an article on the KEEP cache. If you haven’t got here from part one you should read that first for an explanation of the STATE and CUR columns of the output.

Here’s a little code to demonstrate some of the problems with setting a KEEP cache – I’ve set up a 16MB cache, which gives me 1,996 buffers of 8KB in 10.2.0.3, and then created a table that doesn’t quite fill that cache. The table is 1,900 data blocks plus one block for the segment header (I’ve used freelist management to make the test as predictable as possible, and fixed the pctfree to get one row per block).

create table t1
pctfree 90
pctused 10
storage (buffer_pool keep)
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',1000)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1900
;
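
If you want to confirm that the table really is 1,900 data blocks (one row per block) before going any further, a quick sanity check is to count the distinct block numbers hiding in the rowids. This is just a convenience query, not part of the original test:

```sql
-- Sanity check: with pctfree 90 and 1,000-byte rows we expect
-- exactly one row per block, i.e. 1,900 distinct data blocks.
select
	count(distinct dbms_rowid.rowid_block_number(rowid))	data_blocks
from	t1
;
```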

alter system flush buffer_cache;

-- scan the table to load it into memory

select
	/*+ full(t1) */
	count(small_vc)
from	t1
where
	id > 0
;

-- check the content of x$bh from another connection

-- update every fifth row (380 in total)

update t1
	set small_vc = upper(small_vc)
where
	mod(id,5) = 0
;

-- check the content of x$bh from another connection

The query I ran under the SYS account was this:

select
	bpd.bp_blksz,
	bpd.bp_name,
	wds.set_id,
	bh.state,
	cur,
	ct
from
	(
	select
		set_ds,
		state,
		bitand(flag, power(2,13))	cur,
		count(*)  ct
	from
		x$bh
	group by
		set_ds,
		state,
		bitand(flag, power(2,13))
	)		bh,
	x$kcbwds	wds,
	x$kcbwbpd	bpd
where
	wds.addr = bh.set_ds
and	bpd.bp_lo_sid <= wds.set_id
and	bpd.bp_hi_sid >= wds.set_id
and	bpd.bp_size != 0
order by
	bpd.bp_blksz,
	bpd.bp_name,
	wds.set_id,
	bh.state,
	bh.cur
;

In my test case this produced two sets of figures, one for the DEFAULT cache and one for the KEEP cache, but I've only copied out the results from the KEEP cache: first after the initial tablescan, then after the update that affected 380 blocks:


  BP_BLKSZ BP_NAME     SET_ID STATE   CUR         CT
---------- ------- ---------- ----- ----- ----------
      8192 KEEP             1     0     0         95
                                  1     0       1901
           ******* **********             ----------
           sum                                  1996

  BP_BLKSZ BP_NAME     SET_ID STATE   CUR         CT
---------- ------- ---------- ----- ----- ----------
      8192 KEEP             1     1     0       1462
                                  1  8192        380
                                  3     0        323
           ******* **********             ----------
           sum                                  1996

In the first output you see the 1,901 buffers holding blocks from the table (1,900 data plus one segment header), with the remaining 95 buffers still “free” (state 0). The table blocks are all shown as XCUR (state 1, exclusive current).

In the second output you see 380 buffers holding blocks with state ‘XCUR’ with bit 13 of the flag column set, i.e. “gotten in current mode”. These are the 380 blocks that have been updated – but there are also 323 blocks shown as CR (state 3, “only valid for consistent read”). A detailed check of the file# and dbablk for these buffers shows that they are clones of (most of) the 380 blocks in the XCUR buffers.
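
The cross-check of file# and dbablk can be done with a query along the following lines – a sketch, again run as SYS, which lists the CR buffers that have an XCUR “current mode” partner for the same block (to be strict you would also restrict both sides to the KEEP cache working sets):

```sql
-- List the CR clones (state 3) whose block also appears in an
-- XCUR buffer gotten in current mode (state 1, bit 13 of flag set).
select
	cr.file#, cr.dbablk
from
	x$bh	cr
where
	cr.state = 3
and	exists (
		select	null
		from	x$bh	cur
		where	cur.state = 1
		and	bitand(cur.flag, power(2,13)) != 0
		and	cur.file#  = cr.file#
		and	cur.dbablk = cr.dbablk
	)
order by
	cr.file#, cr.dbablk
;
```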

Do a bit of arithmetic – we have 1,462 blocks left from the original tablescan, plus 380 blocks in current mode (323 of which also have CR clones in the cache), for a total of 1,842 blocks of the table – which means that 59 blocks from the table are no longer in the cache. As we clone blocks we can lose some of the blocks we want to KEEP.

Unfortunately for us, Oracle has not given any preferential treatment to buffers which hold blocks in the XCUR state – any buffer which reaches the end of the LRU chain and hasn’t been accessed since it was first loaded will be dumped so that the buffer can be used to create a clone (but see footnote). This means that a constant stream of inserts, updates, deletes, and queries could result in lots of clones being created in your KEEP cache, pushing out the data you want to keep.

If you want to size your KEEP cache to minimise this effect, you probably need to start by making it somewhat larger than the objects it is supposed to KEEP, and then checking to see how many clones you have in the cache – because that will give you an idea of how many extra buffers you need to stop the clones from pushing out the important data.
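
As a starting point for that check, you could count the CR clones per pool simply by filtering the earlier SYS query on state 3. A sketch:

```sql
-- Count CR clones (state 3) per buffer pool.
select
	bpd.bp_name,
	count(*)	clone_ct
from
	x$bh		bh,
	x$kcbwds	wds,
	x$kcbwbpd	bpd
where
	bh.state = 3
and	wds.addr = bh.set_ds
and	bpd.bp_lo_sid <= wds.set_id
and	bpd.bp_hi_sid >= wds.set_id
and	bpd.bp_size != 0
group by
	bpd.bp_name
;
```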

Upgrades:

When I wrote and ran the test cases in this note the client was running Oracle 10.2 – while writing up my notes I happened to run the test on 11.1.0.6 (still using freelists rather than ASSM) and got the following output from my scan of the KEEP cache:

  BP_BLKSZ BP_NAME     SET_ID STATE   CUR         CT
---------- ------- ---------- ----- ----- ----------
      8192 KEEP             1     1     0       1901
                                  3     0         91
           ******* **********             ----------
           sum                                  1992

Apart from the fact that you get slightly fewer buffers per granule in 11g (the x$bh structure has grown a little – and x$bh is a segmented array where each segment shares the granule with the buffers it points to), you can see that we only have 91 clones in the KEEP cache, and apparently we’ve managed to update our 380 blocks without changing their flag to “gotten in current mode”. Doing an update is, of course, just one way of making clones appear – but perhaps 11g will generally have more success in keeping current versions of blocks in memory than earlier versions.

There is, unfortunately, a very special feature to this test case – it’s using a single tablescan to update the table. So having said in part 1 that I was going to write a two-part article, I’ve got to this point, done a few more tests, and decided I need to write part three as well. Stay tuned.

Footnote: Technically there are a couple of circumstances where Oracle will bypass the buffer and walk along the LRU chain looking for another block – but I’ve excluded them from this demonstration.

2 Comments »

  1. The golden rule in Oracle has always been not to set a parameter if you don’t know the benefit. But up to now I have never found any way to assess the benefit of setting a KEEP cache. There are plenty of queries to be found on the net which list the tables whose ‘cache’ attribute you should set or you die on the spot.

    Nevertheless, I notice in all the OLTP systems I have ever looked into that small and medium tables are still doing gets and no physical reads, and if there are physical reads on these objects, increasing the buffer cache solves the issue. As for the big tables, they are too big to fit anyway. So I am reading your articles with interest but cannot find any clue as to why I should use a feature that, up to now, I could do without.

    My feeling is that KEEP is like FIRST_ROWS: something Oracle did, but in fact other parts of the engine circumvent the need. I reckon I might be wrong, but up to now the need for a KEEP cache has never come up, or I was not smart enough to detect it.

    Comment by Bernard Polarski — March 17, 2011 @ 9:06 am BST Mar 17,2011 | Reply

    • Bernard,

      I’m strongly inclined to agree with you about “other parts of Oracle circumventing the need”. (The cynic might, at this point, check the TPC-C disclosures to see which bit of the TPC-C data set Oracle put into the KEEP cache – and ask why.)

      There was an interesting defect with “smallish” tablescans in 8i and 9i that made it worth KEEPing tables as a short-term damage limitation option – but apart from that I don’t think there will be many people who really see a benefit from KEEP.

      The strategy I use is basically: check v$segstat (Segments by … in the AWR/Statspack) and IF I find a segment which is responsible for a large fraction of the physical I/O, and IF I can work out that this isn’t just some inefficient code or poor indexing, and IF a very large fraction of the object is already cached, and IF I can’t see an obvious threat to the rest of the data set by taking away some of the DEFAULT cache, then I will suggest a temporary switch of cache from DEFAULT to KEEP – and check what this does to the physical I/O of that segment and the general level of physical I/O on the whole system.
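
      For anyone wanting to script the first step of that check outside AWR/Statspack, a sketch of a ranking query against v$segment_statistics (the public view over v$segstat) might look like this:

      ```sql
      -- Top 10 segments by physical reads since instance startup.
      select	*
      from	(
      	select
      		owner,
      		object_name,
      		value	physical_reads
      	from	v$segment_statistics
      	where	statistic_name = 'physical reads'
      	order by
      		value desc
      	)
      where	rownum <= 10
      ;
      ```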

      Comment by Jonathan Lewis — March 20, 2011 @ 9:50 am BST Mar 20,2011 | Reply

