Oracle Scratchpad

September 26, 2008

Root Cause

Filed under: humour,Infrastructure,Troubleshooting — Jonathan Lewis @ 5:43 pm BST Sep 26,2008

There are a few expressions in the industry that irritate me - not necessarily for good reason but simply because they sound like the extremes of pretentiousness and marketing put together. (Pretentious, moi!)

For example: “root cause analysis”, “holistic methods” and so on – it’s all those little ways of saying “we’re doing just the same as everybody else, but we’re trying to sound as if we’re doing something better and more meaningful.”

When offered fluff like this, I always like to restate it from the opposite perspective to see what impression it makes:

“We have a holistic approach to tuning” = “Other people will only look at a little bit of your system”.

“We focus on root cause analysis” = “Other people don’t try to find out what the problem really is”.

Put like this, the (lack of) added value, and the hand-waving attempt at deception in statements of this type become that little bit clearer.

Nevertheless, I recently came across a brilliant piece of root cause analysis. It went like this:

Question: “Why have we started to see such a big increase in log file sync wait time?”

Answer: “It’s the economy, stupid.”

Really – that was the root cause (although the pejorative is only there because I wanted to use the quotation attributed to Bill Clinton’s 1992 election campaign).

Here’s how it came about.

The housing market is slowing down (it’s the economy), and banks are less keen to offer mortgages (large scale loans based on property) to home-buyers**. This means they are also slower to process applications for mortgages.

So there’s this system which deals with mortgage applications that could have one of four possible states: new, processing, granted, rejected.  Suddenly there are a lot more applications stalled in the ‘processing’ state.

Since most of the office work during the day focuses on current (processing) applications, most of the SQL run during the day now addresses a larger working data set, so it uses more CPU to run. When the CPU load goes up, the time taken for a log file sync round trip can increase even when the number of log file syncs and the volume of redo generated don’t change.
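To make that last step a little more concrete, here is a toy model of a single log file sync round trip. It is only a sketch under loud assumptions: the function name, the default timings, and the use of a simple M/M/1 queueing approximation for the time a process waits to get onto a CPU are all invented for illustration and are not measurements from any real system. The point is merely that the write time stays fixed while the wait for CPU grows as utilisation rises.

```python
def log_file_sync_ms(cpu_utilisation, write_ms=2.0, cpu_ms=0.1, handoffs=2):
    """Toy estimate of one log file sync round trip, in milliseconds.

    All parameters are hypothetical illustration values:
      cpu_utilisation -- fraction of CPU in use, 0 <= u < 1
      write_ms        -- time for the redo write itself (held constant)
      cpu_ms          -- CPU burst a process needs each time it is woken
      handoffs        -- wake-ups in the round trip (log writer, then the session)
    """
    if not 0.0 <= cpu_utilisation < 1.0:
        raise ValueError("cpu_utilisation must be in the range [0, 1)")
    # M/M/1 approximation (an assumption): mean time spent queueing for CPU
    # before each short burst of work grows as u / (1 - u).
    run_queue_wait_ms = cpu_ms * cpu_utilisation / (1.0 - cpu_utilisation)
    # Every wake-up pays for its own CPU burst plus the queueing delay.
    return write_ms + handoffs * (cpu_ms + run_queue_wait_ms)


if __name__ == "__main__":
    # Same write time, same number of syncs; only the CPU load changes.
    for u in (0.30, 0.60, 0.90, 0.98):
        print(f"CPU {u:.0%}: log file sync ~ {log_file_sync_ms(u):.2f} ms")
```

Nothing about the redo changes between the runs; the extra time comes entirely from waiting for CPU, which is the effect described above.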

So – when the housing market slows down, the log file sync waits go up. That’s root cause analysis for you.

Footnote – this is not a true and accurate description of the system that prompted this little note but it is true that changes in the economy may affect your database performance.

** A recent story in The Times (UK, that is) pointed out that the banks seemed to be competing to see who could offer the worst possible deals, presumably on the basis that they had to offer the mortgages to maintain their presence on the market, but didn’t want anyone to take them up.

11 Comments »

  1. Methinks you are a lover of Fawlty Towers, sir? Particularly episode 8 – The Psychiatrist.

    Comment by Jeff Moss — September 26, 2008 @ 8:58 pm BST

  2. Jonathan, I love this example but you’ve beaten me to my punchline (well, one of them). In IT, we can’t always fix the root cause: it’s beyond our control, our budget or our time constraints. But the more we know about the cause of the problem, the more likely we can correct it and stop it from recurring. Resolving the economic crunch would fix the problem in this example and a lot of others, but since that can’t be controlled, understanding the changing load on the database helps identify what is necessary to deal with the new conditions, and with that understanding the best option can be chosen.

    Many good DBAs already look for the ‘root cause’ even if they never use the phrase. I’d suggest that they try the terminology out on management; sometimes speaking in their language gets the point across better. (damn, there goes the second punchline .. now what will I talk about?)

    Comment by Robyn — September 27, 2008 @ 3:50 am BST

  3. Jeff,
    I’d forgotten where the line (Pretentious, moi!) came from: but it is one of the wittiest lines I have ever heard – just two words and perfect self-reference.

    Robyn,
    Sorry, I’ve just checked the UKOUG agenda and realised you’re doing a presentation on Root Cause Analysis. (It’s different when it’s technicians talking about it rather than salesmen ;) )

    Comment by Jonathan Lewis — September 28, 2008 @ 6:51 pm BST

  4. no worries … I did some summer reading of the current crop of RCA books. Pretentious fluff is dead on for some of it :)

    Comment by Robyn — September 29, 2008 @ 2:51 am BST

  5. Very interesting and, hell, it’s quite a hot and current topic – mortgages and bankruptcy!!! :)

    Comment by Dion Cho — September 29, 2008 @ 8:55 am BST

  6. I find the expression du jour (what, pretentious, moi?) to be best practices.

    Nothing kills a legitimate “why?” faster than an invocation of the best industry practices.

    Comment by Gabe — September 29, 2008 @ 3:59 pm BST

  7. Hi Jonathan,

    This post is really interesting. Can I reference it and translate it into Chinese on my blog?

    Thanks,
    Charlie

    Comment by Charlie Z — September 29, 2008 @ 9:43 pm BST

  8. Charlie,

    Certainly. Thank you for asking for my permission.

    Comment by Jonathan Lewis — September 30, 2008 @ 7:02 am BST

  9. [...] other news, Jonathan Lewis talks about finding the root cause, in a different [...]

    Pingback by Log Buffer #117: a Carnival of the Vanities for DBAs — October 3, 2008 @ 4:15 pm BST

  10. I’m sitting in Copenhagen airport, waiting for a plane to take me home after the Miracle Oracle Open World event in Lalandia.

    Many of the great names in Oracle were speaking at this event – but the presentation that really stood out for me was the one by Robyn Sands on “Root Cause Analysis”.

    Many of the presentations at MOOW tend to be biased towards the in-depth technical stuff – but this one made the point that we MUST ask the right questions and behave the right way BEFORE we dive in with all the high-tech stuff to try fixing a problem.

    This may seem like an obvious message – but it’s amazing how rarely it gets mentioned, and Robyn put the message across very well.

    If you’re coming to the UKOUG annual conference this year, Robyn will be doing the same presentation there. It doesn’t matter whether you see yourself as a developer, DBA, or manager – go to it, and learn how to avoid wasting your most valuable resource .. your time.

    Comment by Jonathan Lewis — October 25, 2008 @ 1:24 pm BST

  11. [...] An Oracle database example of this is simply throwing hardware at a performance problem because a root cause analysis is perceived as requiring too much time and being too expensive (computer hardware costs are decreasing while at the same time IT labor costs are increasing).  Sure, replace the server with one having 4 times as many CPUs and 4 times as much memory – after all, hardware is cheap compared to the perceived cost of a root cause analysis (at least that is what it says on the news).  Forget that such a cheap upgrade will require 4 times as many Oracle Database CPU licenses, accompanied by 4 times as much for annual Oracle support/maintenance fees.  On second thought, maybe a root cause analysis is really a much better and less costly approach, no matter if the performance problem is caused by a change to daylight savings time, someone verbally abusing the SAN, an upgrade of the Oracle Database version, or something else. [...]

    Pingback by Battling the Symptoms or Addressing the Root Cause « Charles Hooper's Oracle Notes — April 3, 2010 @ 4:26 pm BST

