Oracle Scratchpad

September 26, 2007

Hit Ratios (4)

Filed under: Ratios,Troubleshooting — Jonathan Lewis @ 9:23 pm BST Sep 26,2007

From the Oracle-L mailing list in the last 48 hours:

I am working on tuning an app running against oracle 10.2.0.3 We have 48G on the server; my db_cache is 18G. When I look at the awr reports, I see db hit ratio being over 99% and a lot of waits for db sequential reads. Based on the SQL there are a lot of table reads based on the primary keys so that kind of waits is reasonable. But the question is if the hit ratio is that high , if we read mostly for the cache, why do we do that many reads. Is there an explanation for that?

To me this question sounds a bit like a joke in a geek’s comedy show. It’s very close to saying: “how can we be doing this much physical I/O, when the data buffer buffer hit ratio is so good”. (Or, as the economists’ joke goes: “I know it works in practice, but does it work in theory?”)

The answer is easy – all you need is 100 gets from the cache for every disk read and you’ve got a buffer cache hit ratio (BCHR) of 99%.  With a buffer cache at 18G it shouldn’t be too hard to find a few tables (such as small reference tables) that are always cached; and, given the size of the machine, it sounds as if you’re talking about hardware that’s been selected to service a “large application” that could easily have one or two tables that are much too big to experience any beneficial caching at all.

Assume you have just one very big table (a customer table, perhaps with 160 million rows) that is so large that its primary key index has a height of four but still fits comfortably in 18GB. Based on the example shown in this blog  note, 6GB of memory would be adequate for such a index if it had a 16 byte key and the index was operating at typical (70%) efficiency.

When you access this table randomly by primary key you probably end up close to the perfect pattern of “buffer gets on the index and one physical on the table” – and the read on the table also re-appears as a fifth buffer get – for a data buffer hit ratio of 80%. (You might even have assigned this table to the recycle pool to make sure that it didn’t get cached.)

From this base point, you don’t have to do a lot of work on “normal” size tables to make that “necessary” buffer cache hit ratio look a lot “healthier”.

In the telecoms industry, for example, if each customer access is followed by a few queries like “select the rate for the tariff which was the current tariff as at the customer’s last tariff change date” then those queries (small, multitable with subquery, perhaps)  could easily clock up a few dozen logical I/Os each. Say you perform 10 million customer operations per day, that’s 10 million physical reads which need to be matched by 1 billion buffer gets – my laptop can do that in about 66 minutes on just one of its CPUs.

If you are worried by the high volume of physical reads then go and look for the largest sources of the physical reads. (Statspack/AWR: “SQL ordered by Reads”)

If you are worried by the fact that the data buffer hit ratio is high when you have a necessarily high level of physical reads then go and look for the largest sources of logical I/Os. (Statspack.AWR: SQL ordered by Gets”)

So Oracle’s buffer cache hit ratio (BCHR) is irrelevant because (a) you’re worried about the scale of the physical I/O, or (b) you’re worried about the logical I/O because you know the hit ratio is “wrong”

So why are you wasting time looking at the hit ratio ?

[Further reading on ratios]

3 Comments »

  1. Well I think there are more “errors” in the above post.
    The OP doesn’t mention what’s to slow and what will be “fast enough”.
    My first approch would be to see a statspack report for about 15-20 minutes when the system is “too slow” and simply look at the top 5 sqls and top5 wait events (personally I perefere a statspack report over a awr report).
    As the op said, that most of the reads are done with PK access, the reason might be slow disks or a bad configured san switch.
    A simple 10046 trace would give as more information in the db file sequential read wait event.

    Comment by Wolfgang — September 30, 2007 @ 9:34 am BST Sep 30,2007 | Reply

  2. Wolfgang, The original comment wasn’t about performance, or how to do performance analysis.

    The OP simply wanted to understand why he could have a lot of db file sequential reads when his cache hit ratio was so “good” – and that’s what I wanted to explain.

    Comment by Jonathan Lewis — October 1, 2007 @ 10:35 am BST Oct 1,2007 | Reply

  3. [...] case is shown below. (It’s based on a simple script I wrote many years ago to demonstrate how pointless it was to depend on the buffer cache hit ratio as a source of meanigful information – subsequently hi-jacked by Connor McDonald for his [...]

    Pingback by CPU usage « Oracle Scratchpad — May 14, 2008 @ 6:30 am BST May 14,2008 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,014 other followers