Oracle Scratchpad

September 26, 2008

Root Cause

Filed under: humour,Infrastructure,Troubleshooting — Jonathan Lewis @ 5:43 pm BST Sep 26,2008

There are a few expressions in the industry that irritate me – not necessarily for good reason but simply because they sound like the extremes of pretentiousness and marketing put together.  (Pretentious, moi !)

For example: “root cause analysis”, “holistic methods”  and so on – it’s all those little ways of saying “we’re doing just the same as everybody else but we’re trying to make it sound as if we’re doing something better.”

When offered fluff like this, I always like to restate it from the opposite perspective to see what impression it makes:

  • “We have a holistic approach to tuning” = “Other people will only look at a little bit of your system”.
  • “We focus on root cause analysis” = “Other people don’t try to find out what the problem really is”.

Put like this, the (lack of) added value and the attempted deception in statements of this type become that little bit clearer.

Nevertheless, I recently came across a wonderful piece of root cause analysis. It went like this:

  • Question: “Why have we started to see such a big increase in log file sync wait time ?”
  • Answer: “It’s the economy, stupid.”

I kid you not, that really was the root cause (although the pejorative is there only because I wanted to use the quotation attributed to Bill Clinton’s 1992 election campaign).

Here’s how it came about.

The housing market is slowing down (it’s the economy) and banks are less keen to offer mortgages (large scale loans based on property) to home-buyers[1]. This means they are also slower to process applications for mortgages.

So there’s this system which deals with mortgage applications that could have one of four possible states: new, processing, granted, rejected.  Suddenly there are a lot more applications stalled in the ‘processing’ state.

Since most of the office work during the day focuses on current (processing) applications, most of the SQL run during the day now addresses a larger working data set, which means it uses more CPU to run. When the CPU load goes up, the time taken for a log file sync round trip can increase, because both the log writer and the session waiting for it have to queue for CPU before the round trip can complete – even when the number of log file syncs and the volume of redo log generated don’t change.
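To put some arithmetic behind that claim, here is a minimal sketch in Python. It assumes you have two pairs of snapshots of cumulative wait figures for the log file sync event (a wait count and a total time waited); the WaitSnapshot structure, the function name and every number in it are invented for illustration and do not come from the system described here.

```python
from dataclasses import dataclass


@dataclass
class WaitSnapshot:
    """Cumulative figures for one wait event at one moment (hypothetical structure)."""
    total_waits: int       # cumulative number of waits so far
    time_waited_ms: float  # cumulative time waited so far, in milliseconds


def average_wait_ms(start: WaitSnapshot, end: WaitSnapshot) -> float:
    """Average time per wait over the interval between two snapshots."""
    waits = end.total_waits - start.total_waits
    if waits <= 0:
        return 0.0  # no waits in the interval, so no meaningful average
    return (end.time_waited_ms - start.time_waited_ms) / waits


# Invented figures: the same number of syncs in both intervals,
# but more time spent waiting once the CPU is busier.
quiet_start = WaitSnapshot(total_waits=100_000, time_waited_ms=200_000.0)
quiet_end = WaitSnapshot(total_waits=110_000, time_waited_ms=220_000.0)
busy_start = WaitSnapshot(total_waits=500_000, time_waited_ms=1_000_000.0)
busy_end = WaitSnapshot(total_waits=510_000, time_waited_ms=1_060_000.0)

quiet_avg = average_wait_ms(quiet_start, quiet_end)
busy_avg = average_wait_ms(busy_start, busy_end)
print(f"quiet: {quiet_avg} ms per sync, busy: {busy_avg} ms per sync")
```

The point of the sketch is only that the count of waits can stay flat while the average time per wait climbs, so looking at the number of log file syncs alone would not reveal the change.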

So – when the housing market slows down, the log file sync waits go up. That’s root cause analysis for you.

Footnote

This is not an accurate description of the system that prompted this little note; nevertheless, it is true that changes in the economy (or other external factors) may affect your database performance.

[1] – A recent story in The Times (UK, that is) pointed out that the banks seemed to be competing to see who could offer the worst possible deals; presumably on the basis that they had to offer the mortgages to maintain their presence in the market but didn’t want anyone to take them up.

11 Comments »

  1. Methinks you are a lover of Fawlty Towers, sir? Particularly episode 8 – The Psychiatrist.

    Comment by Jeff Moss — September 26, 2008 @ 8:58 pm BST Sep 26,2008 | Reply

  2. Jonathan, I love this example but you’ve beaten me to my punchline (well, one of them). In IT, we can’t always fix the root cause: it’s beyond our control, our budget or our time constraints. But the more we know about the cause of the problem, the more likely it is that we can correct it and stop it from recurring. Resolving the economic crunch would fix the problem in this example and a lot of others, but since that can’t be controlled, understanding the changing load on the database helps identify what is necessary to deal with the new conditions, and with that understanding the best option can be chosen.

    Many good DBAs already look for the ‘root cause’ even if they never use the phrase. I’d suggest that they try the terminology out on management; sometimes speaking in their language gets the point across better. (Damn, there goes the second punchline .. now what will I talk about?)

    Comment by Robyn — September 27, 2008 @ 3:50 am BST Sep 27,2008 | Reply

  3. Jeff,
    I’d forgotten where the line (Pretentious, moi!) came from: but it is one of the wittiest lines I have ever heard – just two words and perfect self-reference.

    Robyn,
    Sorry, I’ve just checked the UKOUG agenda and realised you’re doing a presentation on Root Cause Analysis. (It’s different when it’s technicians talking about it rather than salesmen ;) )

    Comment by Jonathan Lewis — September 28, 2008 @ 6:51 pm BST Sep 28,2008 | Reply

  4. no worries … I did some summer reading of the current crop of RCA books. Pretentious fluff is dead on for some of it :)

    Comment by Robyn — September 29, 2008 @ 2:51 am BST Sep 29,2008 | Reply

  5. Very interesting and, hell, it’s quite a hot and current topic – mortgages and bankruptcy!!! :)

    Comment by Dion Cho — September 29, 2008 @ 8:55 am BST Sep 29,2008 | Reply

  6. I find the expression du jour (what, pretentious, moi?) to be best practices.

    Nothing kills a legitimate “why?” faster than an invocation of the best industry practices.

    Comment by Gabe — September 29, 2008 @ 3:59 pm BST Sep 29,2008 | Reply

  7. Hi Jonathan,

    This post is really interesting. Can I reference it and translate it into Chinese on my blog?

    Thanks,
    Charlie

    Comment by Charlie Z — September 29, 2008 @ 9:43 pm BST Sep 29,2008 | Reply

  8. Charlie,

    Certainly. Thank you for asking for my permission.

    Comment by Jonathan Lewis — September 30, 2008 @ 7:02 am BST Sep 30,2008 | Reply

  9. […] other news, Jonathan Lewis talks about finding the root cause, in a different […]

    Pingback by Log Buffer #117: a Carnival of the Vanities for DBAs — October 3, 2008 @ 4:15 pm BST Oct 3,2008 | Reply

  10. I’m sitting in Copenhagen airport, waiting for a plane to take me home after the Miracle Oracle Open World event in Lalandia.

    Many of the great names in Oracle were speaking at this event – but the presentation that really stood out for me was the one by Robyn Sands on “Root Cause Analysis”.

    Many of the presentations at MOOW tend to be biased towards the in-depth technical stuff – but this one made the point that we MUST ask the right questions and behave the right way BEFORE we dive in with all the high-tech stuff to try fixing a problem.

    This may seem like an obvious message – but it’s amazing how rarely it gets mentioned, and Robyn put the message across very well.

    If you’re coming to the UKOUG annual conference this year, Robyn will be doing the same presentation there. It doesn’t matter whether you see yourself as a developer, DBA, or manager – go to it, and learn how to avoid wasting your most valuable resource .. your time.

    Comment by Jonathan Lewis — October 25, 2008 @ 1:24 pm BST Oct 25,2008 | Reply

  11. […] An Oracle database example of this is simply throwing hardware at a performance problem because a root cause analysis is perceived as requiring too much time and being too expensive (computer hardware costs are decreasing while at the same time IT labor costs are increasing).  Sure, replace the server with one having 4 times as many CPUs and 4 times as much memory – after all, hardware is cheap compared to the perceived cost of a root cause analysis (at least that is what it says on the news).  Forget that such a cheap upgrade will require 4 times as many Oracle Database CPU licenses, accompanied by 4 times as much for annual Oracle support/maintenance fees.  On second thought, maybe a root cause analysis is really a much better and less costly approach, no matter if the performance problem is caused by a change to daylight savings time, someone verbally abusing the SAN, an upgrade of the Oracle Database version, or something else. […]

    Pingback by Battling the Symptoms or Addressing the Root Cause « Charles Hooper's Oracle Notes — April 3, 2010 @ 4:26 pm BST Apr 3,2010 | Reply

