From the Oracle-L mailing list in the last 48 hours:
I am working on tuning an app running against Oracle 10.2.0.3. We have 48GB on the server; my db_cache is 18GB. When I look at the AWR reports, I see the db hit ratio being over 99% and a lot of waits for db file sequential reads. Based on the SQL there are a lot of table reads based on the primary keys, so that kind of wait is reasonable. But the question is: if the hit ratio is that high, if we read mostly from the cache, why do we do that many reads? Is there an explanation for that?
To me this question sounds a bit like a joke in a geek’s comedy show. It’s very close to saying: “how can we be doing this much physical I/O, when the buffer cache hit ratio is so good?”. (Or, as the economists’ joke goes: “I know it works in practice, but does it work in theory?”)
The answer is easy – all you need is 100 gets from the cache for every disk read and you’ve got a buffer cache hit ratio (BCHR) of 99%. With a buffer cache at 18G it shouldn’t be too hard to find a few tables (such as small reference tables) that are always cached; and, given the size of the machine, it sounds as if you’re talking about hardware that’s been selected to service a “large application” that could easily have one or two tables that are much too big to experience any beneficial caching at all.
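As a sketch of that arithmetic, here is the standard hit-ratio formula applied to the “100 gets per disk read” case:

```python
def buffer_cache_hit_ratio(buffer_gets, physical_reads):
    """Fraction of block requests satisfied from the buffer cache:
    1 - (physical reads / logical reads)."""
    return 1 - physical_reads / buffer_gets

# 100 buffer gets for every disk read gives a 99% hit ratio
print(buffer_cache_hit_ratio(100, 1))  # 0.99
```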
Assume you have just one very big table (a customer table, perhaps with 160 million rows) that is so large that its primary key index has a height of four but still fits comfortably in 18GB. Based on the example shown in this blog note, 6GB of memory would be adequate for such an index if it had a 16 byte key and the index was operating at typical (70%) space efficiency.
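The sizing estimate can be sketched as follows. The ~25 bytes per index entry is my assumption (the 16-byte key plus rowid and row overhead); the 70% space usage is the typical figure quoted above:

```python
rows = 160_000_000
entry_bytes = 25    # assumed: 16-byte key + ~9 bytes of rowid and row overhead
efficiency = 0.70   # typical space usage of index leaf blocks

index_bytes = rows * entry_bytes / efficiency
print(f"{index_bytes / 1024**3:.1f} GB")  # prints 5.3 GB
```

So the whole index lands comfortably under the 6GB budget, which in turn fits easily inside an 18GB cache.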
When you access this table randomly by primary key you probably end up close to the perfect pattern of “four buffer gets on the index and one physical read on the table” – and the read on the table also re-appears as a fifth buffer get – for a buffer cache hit ratio of 80%. (You might even have assigned this table to the recycle pool to make sure that it didn’t get cached.)
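Counting the block visits for a single primary-key lookup makes the 80% figure obvious:

```python
index_height = 4            # root + branch + branch + leaf, all cached
index_gets = index_height   # four buffer gets on the index
table_gets = 1              # the table block is read from disk...
physical_reads = 1          # ...and that read also counts as a buffer get

buffer_gets = index_gets + table_gets         # 5 gets in total
hit_ratio = 1 - physical_reads / buffer_gets  # 1 - 1/5
print(buffer_gets, hit_ratio)  # 5 0.8
```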
From this base point, you don’t have to do a lot of work on “normal” size tables to make that “necessary” buffer cache hit ratio look a lot “healthier”.
In the telecoms industry, for example, if each customer access is followed by a few queries like “select the rate for the tariff which was the current tariff as at the customer’s last tariff change date” then those queries (small, multitable with subquery, perhaps) could easily clock up a few dozen logical I/Os each – say around 100 buffer gets per customer operation in total. If you perform 10 million customer operations per day, that’s 10 million physical reads matched by 1 billion buffer gets – my laptop can do that in about 66 minutes on just one of its CPUs.
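Putting the numbers together (the figure of roughly 100 buffer gets per operation, one of them a physical read, is the assumption made above):

```python
operations_per_day = 10_000_000
gets_per_operation = 100    # assumed: PK lookup plus a few small queries
reads_per_operation = 1     # one table block read from disk per operation

buffer_gets = operations_per_day * gets_per_operation      # 1 billion gets
physical_reads = operations_per_day * reads_per_operation  # 10 million reads
hit_ratio = 1 - physical_reads / buffer_gets
print(hit_ratio)  # 0.99
```

A 99% hit ratio, while the system still hammers the disks with 10 million single-block reads a day.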
If you are worried by the high volume of physical reads then go and look for the largest sources of the physical reads. (Statspack/AWR: “SQL ordered by Reads”)
If you are worried by the fact that the data buffer hit ratio is high when you have a necessarily high level of physical reads then go and look for the largest sources of logical I/Os. (Statspack/AWR: “SQL ordered by Gets”)
So Oracle’s buffer cache hit ratio (BCHR) is irrelevant either way: either (a) you’re worried about the scale of the physical I/O, or (b) you’re worried about the logical I/O because you know the hit ratio is “wrong”.
So why are you wasting time looking at the hit ratio?