Oracle Scratchpad

March 14, 2011

Buffer States

Filed under: Infrastructure,Troubleshooting — Jonathan Lewis @ 6:22 pm GMT Mar 14,2011

Here’s a bit of geek stuff that I’ve been meaning to write up for nearly a year – to the day, more or less – and I’ve finally been prompted to finish the job off by the re-appearance on the OTN database forum of the standard “keep cache” question:

    Why isn’t Oracle keeping an object “properly” when it’s smaller than the db_keep_cache_size and it has been assigned to the buffer_pool keep ?

This is a two-part note – and in the first part I’m just going to run a query and talk about the results. The query is one that has to be run by SYS because it references a couple of x$ structures, and this particular version of the query was engineered specifically for a particular client.

select
        obj,
        state,
        bitand(bh.flag,power(2,13))     cur,
        count(*)        ct
from
        x$bh            bh,
        x$kcbwds        wds
where
        wds.addr = bh.set_ds
and     wds.set_id  between 1 and 24
group by
        obj,
        state,
        bitand(bh.flag,power(2,13))
order by
        obj,
        state,
        bitand(bh.flag,power(2,13))
;

You’ll notice I’m joining x$bh (the “buffer header” array) to x$kcbwds (the “working data set” array) where I’ve picked sets 1 to 24. On this particular system these were the sets for the KEEP cache. (If you want a generic query to isolate a particular cache then there’s an example here that identifies the RECYCLE cache by reference – but I wanted the query in this note to run as efficiently as possible against this production system, so I did a preliminary lookup against x$kcbwbpd and then used the literal set ids).

 

Here are a few lines from the resulting output:


       OBJ      STATE        CUR         CT
---------- ---------- ---------- ----------
     40158          1       8192          1

     40189          1          0      87233
                            8192     272789

                    3          0      69804
                            8192     393868

     40192          1          0         87
                            8192      12197

                    3          0      30763
                            8192       1994

...

    117291          1          0        498
                            8192       4419

                    3          0       3001
                            8192         15

    117294          1          0        243
                            8192       3544

                    3          0       1245
                            8192         23

4294967294          3          0          2
**********                       ----------
sum                                 1216072

Since we’re looking at x$ structures – which rarely have any official documentation – the rest of this note isn’t guaranteed to be correct – and things do change with version so I need to stress that this specific example comes from 9.2.0.6. This is what I think the results show:

The state column is instance-related and is essentially something that’s useful in a RAC enviroment. State 1 translates to ‘XCUR’ (exclusive current) which means that this instance has exclusive rights to the most recent version of the block; state 3 translates to ‘CR’ (only valid for consistent read).

Bit 13 of the flag column is set if the buffer has been “gotten in current mode”. (If you’re interested in the other bits there’s a page on my old website that might keep you entertained for a while – I haven’t yet updated it to 11g, though.)

The problem for the client was this – the total size of all the data segments in the KEEP cache was about 6GB and the total size of the KEEP cache was about 10GB, yet the database was still reporting a constant trickle of physical reads to the objects and, when the code to “re-load” the cache  was executed at a quiet period at the start of the day some 60,000 physical blocks had to be  read. With a 10GB cache for 6GB of data would you really expect to see this I/O ?

Take a look at the figures for object 40189:

There are 272,789 buffers for blocks that were “gotten in current mode” (bit 13 is set) and are also “exclusive current” (state 1) to the instance, but there are also 393,868 buffers that were originally “gotten in current mode” but are now “only valid for consistent read”.

Similarly there are 87,233 buffers for blocks that weren’t “gotten in current mode” but are “exclusive current” to the instance – in other words they are the most up to date version of the block but weren’t fetched with a “db block get”, and again there are 69,804 buffers holding blocks that were not “gotten in current mode” but which are now “only valid for consistent read”.

Buffers that are “only valid for consistent read” are buffers holding blocks that have been generated through one of Oracle’s mechanisms for creating CR (consistent read) clones. As you can see, then, a block that is a CR clone may still be flagged as “gotten in current mode”. In fact, in line with Oracle’s generally “lazy” approach to work you can even find (in some versions of Oracle, at least) CR clones that still have the “dirty” bit set in the flag, even though CR clones can never really be dirty and are NEVER written to disc.

Take another look at the buffer counts – this KEEP cache is sized at 1.2M buffers (10GB), but object 40189 alone has taken out 460,000 of those buffers (3.6GB) in block clones, and for this object there are more clones than originals (at 360,000, which happens to be just a few thousand blocks less than the size of the table). So, when you’re thinking about creating a KEEP cache, remember that you have to allow for block cloning – simply setting the db_keep_cache_size to something “a bit bigger” than the object you want to keep cached may not even be close to adequate.

Part 2 to follow soon.

9 Comments »

  1. That’s an interesting post. My question would be, therefore, in order to adequately size a keep cache, can you accurately predict the amount of block cloning that an object may be subject to? Is this a concurrency issue?

    Comment by Tony Sleight — March 15, 2011 @ 7:24 am GMT Mar 15,2011 | Reply

    • Tony,

      It’s probably very hard to predict. It is dependent to some extent on concurrency, but also on the nature of data usage. If, for example, it’s a product table with constant lookups, but most of the lookups are for a small fraction of the products the most of the table might end up with two copies of each block while a small part of the table might end up with six.

      On the other hand, if there’s a lot of update going on all over the table then nearly every block may end up with six copies (although that varies with version of Oracle and the way in which the update was applied.)

      I think it’s probably one of these things where you make a reasonable guess and then check if you’re in the right ball-park.

      Comment by Jonathan Lewis — March 15, 2011 @ 10:23 pm GMT Mar 15,2011 | Reply

      • “If, for example, it’s a product table with constant lookups, but most of the lookups are for a small fraction of the products the most of the table might end up with two copies of each block while a small part of the table might end up with six.”

        Question – if the product table is purely a lookup table (meaning no – or very rare updates) – then it would not have any CR blocks. If so, why would most of the table end up with 2 copies of each block?

        Comment by Naresh Bhandare — March 16, 2011 @ 5:22 am GMT Mar 16,2011 | Reply

        • Naresh,

          I think you’re probably right. I was trying to come up with something that just didn’t look like a small read-only look-up table (which probably wouldn’t need KEEP pool assignment anyway) and didn’t think the details through carefully. The table would also have to be subject to different update frequency as well.

          Comment by Jonathan Lewis — March 20, 2011 @ 9:38 am GMT Mar 20,2011

      • Thanks for the response, we were contemplating using the keep cache to hold a table, plus a couple of indexes to try and ensure these commonly accessed structures were always in memory. It seems from your research that the indexes in particular would be subject to a great deal of cloning and will require up to, say, twice the amount of space the object occupies currently. I shall have to run a test to identify how many block clones are likely to be produced accessing the indexes.

        Comment by Tony Sleight — March 17, 2011 @ 7:12 am GMT Mar 17,2011 | Reply

  2. […] is part 2 of an article on the KEEP cache. If you haven’t got here from part one you should read that first for an explanation of the STATE and CUR columns of the […]

    Pingback by buffer flush « Oracle Scratchpad — March 16, 2011 @ 8:31 pm GMT Mar 16,2011 | Reply

  3. […] up to six (and potentially many more) consistent read versions of the objects’ blocks (reference)?  The last two paragraphs of this recipe seem to indicate that it is the segments themselves […]

    Pingback by Reading Material On Order 2 « Charles Hooper's Oracle Notes — May 29, 2011 @ 10:44 pm BST May 29,2011 | Reply

  4. […] * The “clone” copies require each a buffer from the cache effectively reducing the number of different blocks that can be held in the buffer cache. They are also the reason why an object might require much more cache than its original segment size in order to stay completely in the cache. […]

    Pingback by Logical I/O – Evolution: Part 1 « Ukrainian Oracle User Group — July 8, 2011 @ 8:58 am BST Jul 8,2011 | Reply

  5. […] creates up to six (and potentially many more) consistent read versions of the objects’ blocks (reference)?  The last two paragraphs of this recipe seem to indicate that it is the segments themselves […]

    Pingback by Book Review: Oracle Database 11g Performance Tuning Recipes « Charles Hooper's Oracle Notes — September 10, 2011 @ 9:13 pm BST Sep 10,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a reply to Reading Material On Order 2 « Charles Hooper's Oracle Notes Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.