Oracle Scratchpad

September 26, 2011

Upgrade Argh

Filed under: Infrastructure,Oracle,Statspack,Troubleshooting,Upgrades — Jonathan Lewis @ 4:30 pm GMT Sep 26,2011

Time for another of those little surprises that catch you out after the upgrade.
Take a look at this “Top N” from a standard AWR report, from an instance running 11.2.0.2:

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
direct path read                  3,464,056       6,593      2   33.5 User I/O
DB CPU                                            3,503          17.8
db flash cache single block ph    2,293,604       3,008      1   15.3 User I/O
db file sequential read             200,779       2,294     11   11.6 User I/O
enq: TC - contention                     82       1,571  19158    8.0 Other


This wasn’t happening when the system was running 10.2, and there are no objects in the database declared parallel, and no parallel hinted code. Fortunately, enough people have written about serial direct path reads in 11g to give us a clue about what’s going on, so let’s take a look at the “Segments by” sections to see which objects are the target of most I/O. I’ve printed up the sections on Direct Physical Reads (which reports the number of blocks read) and UnOptimized Read Requests (and unoptimized means the read didn’t go to the flash – aka L2 – cache):


Segments by Direct Physical Reads   DB/Inst: xxxxxxx/xxxxxxx  Snaps: 1740-1741
-> Total Direct Physical Reads:      18,880,742
-> Captured Segments account for   94.1% of Total

           Tablespace                      Subobject  Obj.        Direct
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
ABCUSER    ABC_SMALL_ ACCESS_CONTROL_ALLOW            TABLE    8,657,735   45.85
ABCUSER    ABC_SMALL_ ................                TABLE    4,299,122   22.77
ABCUSER    ABC_XX     ................                TABLE    2,425,772   12.85
ABCUSER    ABC_SMALL_ ACCESS_CONTROL_DENY             TABLE    1,731,317    9.17
ABCUSER    ABC_XX     ................                TABLE      593,083    3.14
          -------------------------------------------------------------
Segments by UnOptimized Reads       DB/Inst: xxxxxxx/xxxxxxx  Snaps: 1740-1741
-> Total UnOptimized Read Requests:       3,830,498
-> Captured Segments account for   97.1% of Total

           Tablespace                      Subobject  Obj.   UnOptimized
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
ABCUSER    ABC_SMALL_ ACCESS_CONTROL_ALLOW            TABLE    1,731,497   45.20
ABCUSER    ABC_SMALL_ ACCESS_CONTROL_DENY             TABLE    1,731,418   45.20
ABCUSER    ABC_XX     ...............                 TABLE      127,933    3.34
ABCUSER    ABC_SMALL_ ...............                 TABLE       33,886     .88
ABCUSER    ABC_XX     ...............                 TABLE       18,357     .48
          -------------------------------------------------------------

Now, if you’ve read the articles you know that Oracle can choose to do serial direct path reads for tablescans on tables that are “sufficiently large”; but before we check the table sizes in the data dictionary we can look at these two sets of figures and notice that we have done 1.7M read requests against tables access_control_deny and access_control_allow, resulting in 1.7M blocks read from one table and 8.7M blocks read from the other. These tables are one and five blocks long respectively – and that’s not “sufficiently large”.
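The dictionary check itself is simple enough – something like the following (owner and table names taken from the report above; the blocks and num_rows figures depend on the statistics being reasonably current):

```sql
select  table_name, blocks, num_rows
from    dba_tables
where   owner = 'ABCUSER'
and     table_name in ('ACCESS_CONTROL_ALLOW','ACCESS_CONTROL_DENY')
;
```

If you don’t trust the statistics you can query dba_segments for the actual space allocation instead.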

Having identified a couple of interesting objects I used dbms_metadata.get_ddl() to get the object definitions from the database – and found that they had both been assigned to the KEEP buffer pool – and there was the upgrade problem. The DBA had decided to trust Oracle to do the right thing after the upgrade, so he had set the memory_target and eliminated all the other memory parameters, including the db_keep_cache_size. In earlier versions of Oracle, if you associate an object with the keep cache and there is no keep cache, the object is cached in the default pool. In 11g, in the same circumstances, Oracle doesn’t try to cache the object at all when doing segment scans!
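If you want to do the same check, the call looks like this – and the buffer pool assignment is also visible directly in dba_tables, which saves you reading through the full DDL:

```sql
set long 20000
select dbms_metadata.get_ddl('TABLE','ACCESS_CONTROL_ALLOW','ABCUSER') from dual;

-- or go straight to the relevant attribute:
select  owner, table_name, buffer_pool
from    dba_tables
where   buffer_pool = 'KEEP'
;
```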

A quick check on the other objects subject to lots of direct path reads showed that they were all assigned to the keep cache, so I wrote a PL/SQL block to run through all such objects and set their storage back to the default cache. The following day, over the same period, we recorded 6,800 blocks read by direct path reads, the db file sequential read average time had dropped from 11 ms to 2 ms, and the general system response time had improved noticeably.
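I haven’t reproduced the exact block here, but in outline it would have been something like this – note that this sketch handles tables only; in a real system you would have to cover indexes, partitions, LOBs and so on in the same way:

```sql
begin
        for r in (
                select  owner, table_name
                from    dba_tables
                where   buffer_pool = 'KEEP'
        ) loop
                execute immediate
                        'alter table "' || r.owner || '"."' || r.table_name ||
                        '" storage (buffer_pool default)';
        end loop;
end;
/
```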

Sound-bite section: If you have any objects associated with the keep cache, make sure you either create a keep cache on upgrade, or re-assign the objects to the default cache.
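In other words, one of the following – the 512M below is just a placeholder, of course; size the keep cache to suit the objects you intend to put in it:

```sql
-- either re-create the keep cache on the upgraded instance ...
alter system set db_keep_cache_size = 512M;

-- ... or re-assign each affected object to the default pool
alter table abcuser.access_control_allow storage (buffer_pool default);
```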

5 Comments »

  1. That’s a good one. The first thought on seeing that “Top N” is to ask why anyone trusts automated features in the first release of an Oracle version.
    But the next thought is: why were parameters changed during the upgrade in the first place? Just to give it a try?
    Discarding the established methods results in well-hidden issues which require ingenious minds like Jonathan’s. Which is good for him and for other DBAs, but it really shatters confidence in Oracle…

    Comment by laimis (lnd) — September 27, 2011 @ 6:42 am GMT Sep 27,2011 | Reply

    • We have to change parameters at some point – otherwise we don’t get all the “new, improved” bits.
      The problems appear when your pre-launch testing doesn’t happen to pick up the issue.
      This particular extreme case was during a materialized view refresh – and it’s easy for a job like that to not be run through a test suite (think of the number of times OTN has seen requests from people who discovered too late the side effects of the ‘atomic’ refresh kicking in on a process that had worked perfectly for years).

      Comment by Jonathan Lewis — September 27, 2011 @ 8:21 am GMT Sep 27,2011 | Reply

  2. Excuse my obtuseness… Oracle used serial direct path reads for these tables not because they were “sufficiently large” but because they were “not to be cached”? Would this behaviour be the same if we set NOCACHE for these tables?

    Comment by Hemant K Chitale — September 27, 2011 @ 8:00 am GMT Sep 27,2011 | Reply

    • It seems as if Oracle used serial direct path read because the tables were assigned to a non-existent keep cache.
      I didn’t check to see what would happen if they were defined as NOCACHE (or if they were assigned to a non-existent recycle cache).
      I’ve got my laptop with me and a 13 hour journey ahead of me, so I might check that out – but I’m sure there are plenty of readers who will beat me to it before I land.

      Comment by Jonathan Lewis — September 27, 2011 @ 8:18 am GMT Sep 27,2011 | Reply

      • Just landed and got to the hotel.
        Strangely the problem doesn’t apply to the recycle cache, or if you use the default cache with the nocache option.
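        For anyone who wants to repeat the test, a minimal sketch – the table name, row count and the all_objects source are arbitrary choices; run it in an instance with no db_keep_cache_size or db_recycle_cache_size set, then check v$sesstat or a 10046 trace for “direct path read” waits during the tablescan:

        ```sql
        -- hypothetical test case: a small table assigned to a (non-existent) keep pool
        create table t1 storage (buffer_pool keep)
        as
        select * from all_objects where rownum <= 1000;

        execute dbms_stats.gather_table_stats(user,'t1')

        alter session set events '10046 trace name context forever, level 8';
        select /*+ full(t1) */ count(*) from t1;

        -- repeat with storage (buffer_pool recycle), and with the nocache
        -- option, to compare the behaviour
        ```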

        Comment by Jonathan Lewis — September 27, 2011 @ 10:53 pm GMT Sep 27,2011 | Reply

