Oracle Scratchpad

February 2, 2017

Upgrades

Filed under: Oracle,Upgrades — Jonathan Lewis @ 1:38 pm GMT Feb 2,2017

This is a note I wrote a couple of years ago, but never published. Given the way it’s written I think it may have been the outline notes for a presentation that I was thinking about rather than an attempt to write a little essay. Since it covers a number of points that are worth considering and since I’ve just rediscovered it by accident I thought I’d publish it pretty much as is. Many of the examples of change are now quite old – the intent was to demonstrate how to be cautious rather than trying to supply every possible change that might impact your next upgrade.

We start with a few strategy points:

  • The effort worth spending to minimise the risk of performance-related surprises on an upgrade depends on how critical the system is.
  • The effort needed to minimise the risk of performance-related surprises on an upgrade depends on how complex the system is.
  • The more “rare” features and “cunning strategies” you use, and the greater the level of concurrent activity, the more likely you are to find surprising effects.

Typical causes of problems are:

  • New automatic jobs installed by Oracle, which might cause extra load during overnight batch tasks (see the autotask sketch after this list)
    • e.g. automatic stats collection (10g)
    • automatic evolution of baselines (12c)
    • automatic cleaning of partial global indexes (12cR2)
  • Changes to existing packaged procedures
    • e.g. the switch to atomic refresh of MVs (11g) (see the refresh sketch after this list)
      • changed the time taken to do the refresh itself and added a new load to the redo log activity
  • Automatic histograms (10g) (see the method_opt sketch after this list)
    • changed the time taken to collect stats
    • changed lots of execution plans as a side effect
    • changed the behaviour of cursor_sharing=similar as a side effect
      • which increased CPU and library cache loading
  • “Notable changes” in behaviour
    • e.g. the standard audit trail no longer updates existing aud$ rows; it always inserts new records
      • This had side effects on redo generation
      • This also allowed Oracle to drop an index, with side effects on typical DBA queries
    • Oracle also introduced a timestamp with time zone as the “audit timestamp”
      • This had a catastrophic side effect on a particular OEM “failed logins” query
  • New optimizer features appear
    • This can introduce lots of new plans – and a few may perform badly for your data set (see the optimizer_features_enable sketch after this list)
  • Optimizer bugs will be fixed
    • This can also introduce new plans – and a few may perform badly for your data set
  • New stats collection mechanisms
    • e.g. approximate_ndv with auto_sample_size (11g) (see the approximate NDV sketch after this list)
    • If introduced by default you could get unlucky
      • (see the histogram comment above, for example)
    • If not yet introduced by default you might want to test and adopt them as part of the upgrade
    • This may require changes to your current coding, and checks for bad changes in plans
  • Concurrency issues
    • Could introduce side effects in locking, latching, mutexes on busy systems
    • Very hard to test without knowing the system
    • e.g. partition exchange and query optimisation colliding (various changes in 11g)
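
The autotask sketch: from 11g onwards the automatic maintenance tasks are visible in the data dictionary, so you can see what is enabled and switch off anything that clashes with your batch window. This is only a minimal sketch; the client name quoted is the standard one for stats collection, but check the list on your own version:

    -- which automatic maintenance tasks are enabled (11g+)
    select client_name, status
    from   dba_autotask_client
    order by
           client_name
    ;

    -- disable one that clashes with the overnight batch
    begin
            dbms_auto_task_admin.disable(
                    client_name => 'auto optimizer stats collection',
                    operation   => null,
                    window_name => null
            );
    end;
    /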
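
The refresh sketch: if the switch to atomic refresh catches you out, the workaround is to ask for the old behaviour explicitly. The MV name below is hypothetical, and you should check the default value of atomic_refresh on your own version:

    -- a complete refresh with atomic_refresh => true is a delete
    -- plus insert in a single transaction (with the redo and undo
    -- that implies); the old behaviour was a truncate followed by
    -- insert /*+ append */
    begin
            dbms_mview.refresh(
                    list           => 'MY_MV',    -- hypothetical MV name
                    method         => 'C',        -- complete refresh
                    atomic_refresh => false       -- revert to old behaviour
            );
    end;
    /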
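
The method_opt sketch: if the automatic histogram default hurts you, you can put the old “no histograms” default back globally, or protect just the tables you know to be sensitive. This uses the 11g preference calls (10g used dbms_stats.set_param instead), and the schema and table names are hypothetical:

    -- no histograms by default (the old 9i behaviour)
    begin
            dbms_stats.set_global_prefs('METHOD_OPT', 'for all columns size 1');
    end;
    /

    -- or just for one sensitive table
    begin
            dbms_stats.set_table_prefs(
                    ownname => 'MY_SCHEMA',
                    tabname => 'MY_TABLE',
                    pname   => 'METHOD_OPT',
                    pvalue  => 'for all columns size 1'
            );
    end;
    /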
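
The optimizer_features_enable sketch: as a damage-limitation measure while you investigate a bad new plan you can pin a session, or a single statement, to the optimizer behaviour of the version you came from (most optimizer bug fixes are tied to the same mechanism through the fix controls). The version string and table name below are hypothetical examples:

    -- the whole session
    alter session set optimizer_features_enable = '11.2.0.4';

    -- or a single problem statement
    select  /*+ optimizer_features_enable('11.2.0.4') */
            count(*)
    from    my_table
    ;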
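
The approximate NDV sketch: in 11g the approximate_ndv mechanism is used only if you also let Oracle use auto_sample_size, so a test run might look like the following (the schema and table names are hypothetical):

    begin
            dbms_stats.set_global_prefs('APPROXIMATE_NDV', 'TRUE');
            dbms_stats.gather_table_stats(
                    ownname          => 'MY_SCHEMA',
                    tabname          => 'MY_TABLE',
                    estimate_percent => dbms_stats.auto_sample_size
            );
    end;
    /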

Background reading can prompt you to look at particular features of your application that might be at risk:

  • Read the “New Features” guide
  • Read the “Changes in this release” section of the various reference guides
  • Check MoS for upgrade notes for the version you are aiming at
  • Search MoS for any documents with the text: “Notable changes {in|of} behaviour”
  • Check MoS for bug fix notes for later releases (e.g. 12.1 if upgrading to 11.2)

Test as much as possible, and as realistically as possible. You could take advantage of flashback database to repeat large test suites starting from the same point in time with the same data set, but making changes to database structures or code strategies between tests. Proper concurrency tests are ultimately going to be the hardest things to do right – but you have to work on it (or understand Oracle’s architecture really well) if you are upgrading a highly concurrent system.
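
As a minimal sketch of the flashback loop (run from SQL*Plus as sysdba; the restore point name is a hypothetical example):

    -- mark the common starting point for the test suite
    create restore point before_test_run guarantee flashback database;

    -- run a test and collect the results, then rewind
    -- for the next variation
    shutdown immediate
    startup mount
    flashback database to restore point before_test_run;
    alter database open resetlogs;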

Ultimately, though, you probably won’t catch every significant change unless you’re really lucky, or impossibly thorough. Would you, for example, have noticed pre-upgrade that Oracle changed sys_context(‘userenv’,’sessionid’) from zero to 4294967295 for the sys schema, and realised the significance this would have for your application?
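
For reference, the check itself is a one-liner once you know you need to make it; knowing that you need it is the hard part:

    -- run as sys in the old and new versions and compare
    select sys_context('userenv','sessionid') from dual;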

8 Comments

  1. Thank you for sharing.

    Comment by Clifford Mathew — February 3, 2017 @ 12:06 pm GMT

  2. Another possible source of useful bugs: Check the required database patches for eBusiness Suite. Sometimes the EBS team requires database patches to fix bugs which nevertheless, for some reason, aren’t urgent enough for the database team to include in a patchset or PSU.

    Comment by Jason Bucata — February 4, 2017 @ 2:11 am GMT

  3. In terms of the optimizer, I like to think of it broadly as two scenarios

    1) you’re happy with your current performance/plans
    => So I’d SPM all of the critical parts of my apps, so things don’t change when I upgrade

    2) I’m unhappy with the current performance
    => I’d focus on the new optimizer stuff, and see where it helps me (and also take care where it hinders).

    I mention this because option (1) is often not considered.
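
    (Loading the current plan for each critical statement might be something like the following call, though the sql_id is a made-up example and you should check the details on your own release:)

        declare
                n pls_integer;
        begin
                n := dbms_spm.load_plans_from_cursor_cache(
                        sql_id => 'abcd1234efgh5'
                );
                dbms_output.put_line('plans loaded: ' || n);
        end;
        /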

    Comment by connormcdonald — February 5, 2017 @ 5:41 am GMT

    • Connor,

      I’m always a little cautious about (1).

      Once you have an SQL Plan Baseline for a statement Oracle will keep capturing new baselines for it even if “optimizer_capture_sql_plan_baselines” is set to false; and 12c will go into its automatic evolution attempt overnight, so you’ve introduced scope for lots of extra processing that didn’t exist before you populated the SPM. On top of that, of course, (a) very few people know which bits of their application are critical, (b) more commonly it’s the bits you didn’t think about stabilising that break on the upgrade, and (c) there are still cases where Oracle doesn’t reproduce the plan.

      On the plus side, it’s much less painful than following Oracle’s advice in the 8i/9i days of “create stored outlines for all your SQL before upgrading”. Nevertheless I still favour the approach of enabling event 10132 on the production system for a couple of days before the upgrade to capture all the hard-parsed SQL, with plans and the “outline” section, in trace files so that there’s a reference library of SQL and prior plans that can be searched if any statements start using a catastrophic plan. The preparative steps for that involve checking the statistic “parse count (hard)” over a few days to estimate the number of statements and plans that are likely to be dumped, and assuming that each one takes somewhere between 16KB and 32KB to dump.
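
      As a sketch, the capture and the sizing check need only the following (the exact event syntax can vary with version, so test it first):

          -- dump the optimizer trace (plan plus outline) on every hard parse
          alter system set events '10132 trace name context forever, level 1';

          -- ... and switch it off after a couple of days
          alter system set events '10132 trace name context off';

          -- estimate the likely trace volume from the hard parse rate
          select name, value
          from   v$sysstat
          where  name = 'parse count (hard)';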

      Comment by Jonathan Lewis — February 9, 2017 @ 8:55 am GMT

  4. What a helpful note, I’m going to share it with my team.

    Comment by himanshudex — February 5, 2017 @ 6:44 pm GMT

  5. Thank you for the share.

    Comment by fouedgray — February 14, 2017 @ 12:24 pm GMT

  6. Thank you.

    Comment by fouedgray — June 14, 2017 @ 12:42 pm GMT

