Oracle Scratchpad

February 2, 2017

Upgrades

Filed under: Uncategorized — Jonathan Lewis @ 1:38 pm GMT Feb 2,2017

This is a note I wrote a couple of years ago but never published. Given the way it’s written I think it may have been the outline notes for a presentation I was thinking about rather than an attempt to write a little essay. Since it covers a number of points that are worth considering, and since I’ve just rediscovered it by accident, I thought I’d publish it pretty much as is. Many of the examples of change are now quite old – the intent was to demonstrate how to be cautious, not to supply every possible change that might affect your next upgrade.

We start with a few preliminary observations:

  • The effort worth spending to minimise the risk of performance-related surprises on an upgrade depends on how critical the system is.
  • The effort needed to minimise the risk of performance-related surprises on an upgrade depends on how complex the system is.
  • The more “rare” features and “cunning strategies” you use, and the greater the level of concurrent activity, the more likely you are to find surprising effects.

Typical causes of problems are:

  • New automatic jobs installed by Oracle, which might cause extra load during overnight batch tasks
    • e.g. automatic stats collection (10g)
    • Automatic evolution of baselines (12c)
  • Changes to existing packaged procedures
    • e.g. switch to atomic refresh of MVs (11g)
      • changed the time taken to do the refresh itself and added a new load to the redo log activity
  • automatic histograms (10g)
    • changed the time taken to collect stats
    • changed lots of execution plans as a side effect
    • changed the behaviour of cursor_sharing=similar as a side effect
      • which increased CPU and library cache loading
  • “Notable changes” in behaviour
    • e.g. The standard audit trail no longer updates aud$ rows; it always inserts new records
      • This had side effects on redo generation
      • This also allowed Oracle to drop an index, with side effects on typical DBA queries
    • Oracle also introduced a timestamp with time zone as the “audit timestamp”
      • This had a catastrophic side effect on a particular OEM “failed logins” query
  • New optimizer features appear
    • This can introduce lots of new plans – and a few may perform badly for your data set
  • Optimizer bugs will be fixed
    • This can also introduce new plans – and a few may perform badly for your data set
  • New stats collection mechanisms
    • e.g. approximate_ndv with auto_sample_size (11g)
    • If introduced by default you could get unlucky
      • (See histogram comment above, for example).
    • If not introduced by default (now) you might want to test and adopt them on the upgrade (see the sketch after this list)
    • This may require changes to your current coding, and checks for bad changes in plans.
  • Concurrency issues
    • Could introduce side effects in locking, latching, mutexes on busy systems
    • Very hard to test without knowing the system
    • e.g. partition exchange and query optimisation colliding (various changes in 11g)
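
A couple of the points above translate directly into checks and settings you can script. Here’s a minimal sketch (11g syntax; dba_autotask_client, the APPROXIMATE_NDV preference, and optimizer_features_enable are all standard features, but the release numbers are just examples) showing how you might list the automatic maintenance tasks that will be enabled, switch on the approximate NDV mechanism explicitly for testing, and temporarily revert the optimizer to its previous-release behaviour while investigating a surprising plan:

-- Which automatic maintenance tasks are enabled (11g and later)?
select client_name, status from dba_autotask_client;

-- Enable the 11g approximate NDV mechanism explicitly; it applies
-- only when stats are gathered with estimate_percent = auto_sample_size.
begin
        dbms_stats.set_global_prefs('APPROXIMATE_NDV', 'TRUE');
end;
/

-- Temporary safety net while investigating new plans: make the
-- optimizer behave as it did in the release you upgraded from.
alter session set optimizer_features_enable = '10.2.0.4';

The optimizer_features_enable fallback is a stop-gap for investigation, not a long-term strategy – it simply buys time to find out which new feature or bug fix changed the plan.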

Some background reading can prompt you to look at particular features of your application that might be at risk:

  • Read the “New Features” guide
  • Read the “Changes in this release” section of the various reference guides
  • Check MoS for upgrade notes for the version you are aiming at
  • Search MoS for any documents with the text: “Notable changes {in|of} behaviour”
  • Check MoS for bug fix notes for later releases (e.g. 12.1 if upgrading to 11.2)

Test as much as possible, and as realistically as possible. You could take advantage of flashback database to repeat large test suites starting from the same point in time with the same data set, while making changes to database structures or code strategies between runs. Proper concurrency tests are ultimately going to be the hardest thing to do right – but you have to work on it (or understand Oracle’s architecture really well) if you are upgrading a highly concurrent system.
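
For anyone who hasn’t used it, a minimal sketch of that flashback cycle (the restore point name is invented) looks like this:

-- before the first test run: create a guaranteed restore point
create restore point before_test guarantee flashback database;

-- ... run the test suite, measure, adjust structures or code ...

-- rewind the whole database to the starting point, then repeat
shutdown immediate
startup mount
flashback database to restore point before_test;
alter database open resetlogs;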


October 31, 2016

So Long ACED

Filed under: Uncategorized — Jonathan Lewis @ 7:53 pm GMT Oct 31,2016

… and thanks for all the fish.

Today I removed myself from the OTN ACE program. This isn’t a reflection on anything to do with the ACE program – quite the reverse, in fact – it’s because they’re introducing steps to ensure that the ACE Directors can justify their titles. Unfortunately, as anyone who has gone through (e.g.) ISO 9001 certification can tell you, quality assurance tends to translate into paperwork and ticking boxes – and while I can always find time to write up some interesting feature of Oracle I really find it hard to prioritise time for filling in forms.

In the last 4 months I’ve failed to file my monthly list of relevant activities twice, failed to request funding for two of the international conferences I’ve spoken at, and failed to submit claims against the two for which I had requested and received funding approval – so there really was no hope of me being motivated to collect all the extra details that the new regime requires.

So, best wishes to the ACE program – I’m still happy to do what I’ve been doing for the last 25+ years, and I’ll still be meeting up with ACEDs, but I’ll just be wearing one label less as I do it.

September 20, 2016

Why Bother

Filed under: Uncategorized — Jonathan Lewis @ 11:16 am GMT Sep 20,2016

This note comes to you prompted by “Noons” in a recent twitter exchange.

In response to a complaint by Lukas Eder about having to educate people in 2016 that there is no (performance) difference between count(*) and count(1), Nuno asked me to blog about my claim that this non-difference is a good educational example on at least three different counts.

One thing I won’t count in the list of three is an actual demonstration of the non-difference. I would be surprised if anyone reading this blog needed a proof of the point, but if you’re interested here are a few examples from my past writings – including a couple where the problem becomes subtler and the “non-difference” stops happening:

You’ll notice that the earliest note I’ve published on the blog dates back 8 years or more; and one of the notes refers back to a FAQ, dated 2001, published on my old website. So why bother making a fuss about this non-difference when it ought to be common knowledge and really isn’t worth the effort?

  • There is a huge amount of garbage on the internet about how Oracle works and what you can do to make it work better. When you demonstrate to someone that claims can be backed up by proof, and that an idea they’ve picked up from the internet is wrong, it might make them pause for thought the next time they read some unjustified claim that really matters.
  • This is a delightfully simple example to use to demonstrate how you can get a little extra information from an execution plan that could help you understand the cause of a performance problem – see the sketch after this list. (I could make the same comment about being able to get something helpful from a 10053 trace file, but I won’t claim that as reason number 3.)
  • Starting even with this extremely simple example, which proves just one point, we can see what happens as we push the boundaries of what we know. Thinking about this problem teaches us how to come up with new questions and modified examples that enhance our understanding of the product.
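
To expand on the second point, here’s a minimal sketch (the table t1 is hypothetical): pull the plan for a count(1) query from memory with the projection information included, and the “Column Projection Information” section should report count(*), showing that Oracle transformed the expression before optimising the query:

set serveroutput off

select count(1) from t1;

select * from table(dbms_xplan.display_cursor(null, null, '+projection'));

(The set serveroutput off matters: with serveroutput on, SQL*Plus makes a call to dbms_output after each statement, and display_cursor(null, null) would report the plan for that call instead.)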

Frankly I find it extremely tedious how often I have to explain mechanisms that I first described in Practical Oracle 8i – it can be terribly depressing to see how ignorant people can be – but there are always new people starting their careers in Oracle, the manuals they look to are more concerned with describing how to use GUIs than with explaining the technology, and the old garbage on the internet won’t go away. New arrivals have little previous experience to help them tell the difference between fact and fiction until someone provides the demonstration – so we keep repeating the simple examples.

My favourite quote about the internet came by way of Terry Pratchett (in “The Truth”): “A lie can run round the world before the truth has got its boots on.”

You just have to keep hammering away at the truth.


November 25, 2015

Tech 15

Filed under: Uncategorized — Jonathan Lewis @ 1:00 pm GMT Nov 25,2015

Updated 27th Nov 2015:

Thanks for all the questions so far – there are plenty of useful topics coming out. At this rate I may have to pass some of these on to the round-table that Tony Hasler is running.

I’ve made a couple of comments in response, but generally I plan to avoid making any comments until after the panel.

Monday 7th Dec: 11:20 – 12:05

I’ve arranged a panel session on the Cost Based Optimizer for UKOUG Tech 2015, with Maria Colgan, Nigel Bayliss, and Chris Antognini joining me to answer (and maybe argue about) questions from the audience.

To keep things moving along, we aim to have a few questions available before the day and collect a few questions at the door as well as accepting questions from the floor. Martin Widlake and Neil Chandler will be acting as MCs, wielding the microphones, and collecting questions at the door.

So far we have a handful of questions – if you have a question you’d like to raise, add it to the comments below. Even if you can’t attend we may get to your question, and I may be able to write up a summary of questions and answers after the event.

Questions so far:

  • Why do queries that have been working fine for months suddenly change to a very slow plan?
  • Are smaller, simpler queries less likely to have a bad plan or change from a good plan to a bad one?
  • What is the down-side of using SQL Profiles to lock in plans?
  • What’s the best way to collect stats on a partitioned table where you add a daily / weekly / monthly partition?
  • Is there a good way to make sure you’ve collected stats for interval partitions when you don’t know they’ve been added?
  • Why do the manuals talk about “prefixed” local indexes (on partitioned tables) as if they have some special effect?
  • Why does a query with literals run faster than one with bind variables when stats are up to date on all the tables concerned?
  • For a single table an SQL Plan Directive will create extended statistics, but how does the optimizer resolve SPDs for group by or join queries?
  • I’ve heard that for group by and join queries SPDs always force dynamic sampling – is this true?
  • Will an SPD with dynamic sampling take precedence over, for example, columns with histograms?
  • What is the order of precedence the optimizer uses when resolving table cardinality?
  • Will 12.2 allow us to combine column groups and expressions in a single extended statistic?
