Time for another of those little surprises that catch you out after the upgrade.
Take a look at this “Top N” from a standard AWR report, taken from an instance running 11.2.0.2:
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
direct path read                  3,464,056       6,593      2   33.5 User I/O
DB CPU                                            3,503          17.8
db flash cache single block ph    2,293,604       3,008      1   15.3 User I/O
db file sequential read             200,779       2,294     11   11.6 User I/O
enq: TC - contention                     82       1,571  19158    8.0 Other
The huge number of direct path reads wasn’t occurring when the system was running 10.2, there are no objects in the database declared parallel, and there is no parallel-hinted code – so the owner of this system couldn’t understand why it was spending so much time on direct path reads after the upgrade. Enough people, fortunately, have written about serial direct path reads (that’s 4 separate links) in 11g to give us a clue about what’s going on; so let’s take a look at the “Segments by” sections to see which objects are the target of most of the I/O.
I’ve printed up the sections on Direct Physical Reads (which reports number of blocks read) and Unoptimized Read Requests (since the Top N report shows a large volume of “db flash cache single block physical read” waits and “unoptimized” means it didn’t go to the flash – a.k.a. L2 – cache):
Segments by Direct Physical Reads   DB/Inst: xxxxxxx/xxxxxxx  Snaps: 1740-1741
-> Total Direct Physical Reads:      18,880,742
-> Captured Segments account for   94.1% of Total

           Tablespace                      Subobject  Obj.        Direct
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
ABCUSER    ABC_SMALL_ ACCESS_CONTROL_ALLOW            TABLE    8,657,735   45.85
ABCUSER    ABC_SMALL_ ................                TABLE    4,299,122   22.77
ABCUSER    ABC_XX     ................                TABLE    2,425,772   12.85
ABCUSER    ABC_SMALL_ ACCESS_CONTROL_DENY             TABLE    1,731,317    9.17
ABCUSER    ABC_XX     ................                TABLE      593,083    3.14
          -------------------------------------------------------------

Segments by UnOptimized Reads       DB/Inst: xxxxxxx/xxxxxxx  Snaps: 1740-1741
-> Total UnOptimized Read Requests:   3,830,498
-> Captured Segments account for   97.1% of Total

           Tablespace                      Subobject  Obj.   UnOptimized
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
ABCUSER    ABC_SMALL_ ACCESS_CONTROL_ALLOW            TABLE    1,731,497   45.20
ABCUSER    ABC_SMALL_ ACCESS_CONTROL_DENY             TABLE    1,731,418   45.20
ABCUSER    ABC_XX     ...............                 TABLE      127,933    3.34
ABCUSER    ABC_SMALL_ ...............                 TABLE       33,886     .88
ABCUSER    ABC_XX     ...............                 TABLE       18,357     .48
          -------------------------------------------------------------
If you’ve read the articles I linked to you know that Oracle can choose to do serial direct path reads for tablescans on tables that are “sufficiently large”; but before we check the table sizes in the data dictionary we can look at these two sets of figures and notice that we have done 1.7M read requests against each of the tables access_control_deny and access_control_allow, resulting in 1.7M blocks read from one table and 8.7M blocks read from the other. These tables are one and five blocks long respectively – and that’s not “sufficiently large” (anyway, would you expect a large table to be in a tablespace called ABC_SMALL?).
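If you want to cross-check the arithmetic, the block counts are easy to confirm from the data dictionary. This is just a sketch using the owner and table names from the AWR extract above:

```sql
-- Confirm the (tiny) segment sizes of the two access-control tables
select  owner, segment_name, blocks
from    dba_segments
where   owner = 'ABCUSER'
and     segment_name in ('ACCESS_CONTROL_ALLOW', 'ACCESS_CONTROL_DENY')
;
```

Dividing “Direct Reads” by “UnOptimized Reads” for each segment gives you the same answer: roughly five blocks per read request for one table, one block per request for the other.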
Having identified a couple of interesting objects I used dbms_metadata.get_ddl() to report the object definitions from the database and found that they had both been assigned to the KEEP buffer pool – and that was the upgrade problem. The DBA had decided to trust Oracle to do the right thing after the upgrade, so he had set the memory_target and eliminated all the other memory parameters, including the db_keep_cache_size. In earlier versions of Oracle, if you assigned an object to the keep cache when you didn’t have a keep cache defined, the object was simply cached in the default pool. In the client’s version of 11g, in the same circumstances, Oracle doesn’t try to cache the object at all when doing segment scans!
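For anyone who wants to repeat the check, a call along these lines (table and schema names taken from the AWR extract above) will show the BUFFER_POOL KEEP clause buried in the STORAGE section of the generated DDL:

```sql
set long 20000
set pagesize 0

select  dbms_metadata.get_ddl('TABLE', 'ACCESS_CONTROL_DENY', 'ABCUSER')
from    dual
;
```

Look for something like STORAGE( … BUFFER_POOL KEEP …) in the output.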
A quick check on the other objects subject to lots of direct path reads showed that they were all assigned to the keep cache, so I wrote a little PL/SQL block to run through all such objects and set their storage back to the default cache. The following day, over the same period, we recorded just 6,800 blocks read by direct path reads, the db file sequential read average time had dropped from 11 ms to 2 ms, and general system response time had improved noticeably.
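This isn’t the script I used – just a minimal sketch of the idea, restricted here to tables in the one schema. A production version would also need to consider indexes, LOBs, and (sub)partitions, which have their own buffer_pool settings:

```sql
begin
        -- Find every table in the schema still assigned to the KEEP pool
        for r in (
                select  owner, table_name
                from    dba_tables
                where   buffer_pool = 'KEEP'
                and     owner = 'ABCUSER'
        ) loop
                -- Move it back to the default buffer pool
                execute immediate
                        'alter table ' || r.owner || '.' || r.table_name ||
                        ' storage (buffer_pool default)';
        end loop;
end;
/
```

The alter is a dictionary-only change, so it’s cheap – but any blocks of the object already in the buffer cache stay where they are until they age out.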
Sound-bite section: If you have any objects associated with the keep cache, make sure you either create a keep cache on upgrade, or re-assign the objects to the default cache.
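Two quick sanity checks cover both halves of that sound-bite – whether a keep cache actually exists, and which objects still expect one (a sketch; extend the dictionary queries to indexes, partitions etc. as needed):

```sql
-- Is there a keep cache at all? (current_size = 0 means no)
select  component, current_size
from    v$sga_dynamic_components
where   component = 'KEEP buffer cache'
;

-- Which segments are assigned to it?
select  owner, table_name from dba_tables  where buffer_pool = 'KEEP';
select  owner, index_name from dba_indexes where buffer_pool = 'KEEP';
```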
That’s a good one. The first thought prompted by the “Top N” is to ask why anyone trusts automated features in the first release of an Oracle version?
But the next thought is: why were parameters changed during the upgrade in the first place? Just to give it a try?
Discarding established methods results in well-hidden issues which require ingenious minds like Jonathan’s. Which is good for him and for other DBAs, but really shatters confidence in Oracle…
Comment by laimis (lnd) — September 27, 2011 @ 6:42 am BST Sep 27,2011 |
We have to change parameters at some point – otherwise we don’t get all the “new, improved” bits.
The problems appear when your pre-launch testing doesn’t happen to pick up the issue.
This particular extreme case was during a materialized view refresh – and it’s easy for a job like that to not be run through a test suite (think of the number of times OTN has seen requests from people who discovered too late the side effects of the ‘atomic’ refresh kicking in on a process that had worked perfectly for years).
Comment by Jonathan Lewis — September 27, 2011 @ 8:21 am BST Sep 27,2011 |
Excuse my obtuseness… Oracle used serial direct path reads for these tables not because they were “sufficiently large” but because they were “not to be cached”? Would the behaviour be the same if we set “NOCACHE” on these tables?
Comment by Hemant K Chitale — September 27, 2011 @ 8:00 am BST Sep 27,2011 |
It seems as if Oracle used serial direct path read because the tables were assigned to a non-existent keep cache.
I didn’t check to see what would happen if they were defined as NOCACHE (or if they were assigned to a non-existent recycle cache).
I’ve got my laptop with me and a 13 hour journey ahead of me, so I might check that out – but I’m sure there are plenty of readers who will beat me to it before I land.
Comment by Jonathan Lewis — September 27, 2011 @ 8:18 am BST Sep 27,2011 |
Just landed and got to the hotel.
Strangely the problem doesn’t apply to the recycle cache, or if you use the default cache with the nocache option.
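For anyone who wants to repeat the experiment, a test along these lines should do it (a sketch, not the exact test – it assumes an 11.2 instance managed by memory_target with db_keep_cache_size unset):

```sql
-- A deliberately tiny table assigned to the (non-existent) keep pool
create table t_keep (n number) storage (buffer_pool keep);

insert into t_keep
select  rownum
from    dual
connect by level <= 1000
;
commit;

execute dbms_stats.gather_table_stats(user, 'T_KEEP')

alter system flush buffer_cache;

select /*+ full(t_keep) */ count(*) from t_keep;

-- Then check for "physical reads direct" in v$sesstat,
-- or enable a 10046 trace and look for "direct path read" waits.
-- Repeat with buffer_pool recycle, and with a nocache table, to compare.
```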
Comment by Jonathan Lewis — September 27, 2011 @ 10:53 pm BST Sep 27,2011 |