Oracle Scratchpad

May 8, 2015

Cost

Filed under: CBO, Oracle — Jonathan Lewis @ 8:21 am GMT May 8, 2015

I’ve just been checking “Cost Based Oracle – Fundamentals” (Apress 2005) to see what I said on a particular topic, and I couldn’t resist quoting the following from the opening page of Chapter 1:

One of the commonest questions about the CBO on the Internet is: “What does the cost represent?” This is usually followed by comments like: “According to explain plan the cost of doing a hash join for this query is seven million and the cost of a nested loop is forty-two – but the hash join completes in three seconds and the nested loop takes 14 hours.”

The answer is simple: the cost represents (and has always represented) the optimizer’s best estimate of the time it will take to execute the statement. But how can this be true when people can see oddities like the hash join / nested loop join example above? The answer can usually be found in that good old acronym GIGO: Garbage In, Garbage Out.
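To put a number on “cost is time”: the cost is expressed in a slightly obscure unit – equivalent single-block reads – so multiplying it by the assumed duration of one single-block read (the system statistic sreadtim) gives the optimizer’s time estimate, which is roughly how the TIME column of an execution plan is derived. A minimal sketch, using invented figures rather than anything gathered from a real system:

```python
import math

# Assumption for illustration: cost is in units of equivalent
# single-block reads, and sreadtim (from system statistics) is the
# assumed duration of one single-block read, in milliseconds.
def estimated_seconds(cost, sreadtim_ms=12.0):
    """Elapsed time implied by a plan cost, rounded up to whole
    seconds in the style of the TIME column of an execution plan."""
    return math.ceil(cost * sreadtim_ms / 1000.0)

# The hash join costed at seven million implies an estimate of
# 84,000 seconds (about 23 hours), yet it ran in three seconds; the
# nested loop costed at 42 implies about one second, yet it ran for
# 14 hours. The arithmetic is a time estimate - it's the inputs
# that were garbage.
print(estimated_seconds(7_000_000))   # 84000
print(estimated_seconds(42))          # 1
```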

The CBO makes errors for six main reasons:

  • There are some inappropriate assumptions built into the cost model.
  • The relevant statistics about the data distribution are available, but misleading.
  • The relevant statistics about the data distribution are not available.
  • The performance characteristics of the hardware are not known.
  • The current workload is not known.
  • There are bugs in the code.

Still true – although there are more features and cunning bits where inappropriate assumptions and bugs can appear.

13 Comments »

  1. “….The performance characteristics of the hardware are not known…”
    So do you advocate gathering system stats?
    In contrast to Arup Nanda ( https://twitter.com/arupnanda/status/595286911058575361 )?

    Comment by Matthias Rogel — May 8, 2015 @ 10:18 am GMT

  2. I can’t get my head around this quote:
    “The answer is simple: the cost represents (and has always represented) the optimizer’s best estimate of the time it will take to execute the statement.”

    Because I always have in mind what Tom Kyte wrote on this (admittedly a long time ago):

    https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:313416745628
    “Don’t compare them, you cannot — they might as well be random numbers.” (2000)
    “THE COST HAS NOTHING WHATSOEVER TO DO WITH THE RUNTIME PERFORMANCE OF A QUERY.
    YOU CANNOT COMPARE THEM AND SAY ‘THIS WILL BE FASTER’.” (2002)
    “1) the cost is the thing that drives the CBO, it is what it optimizes towards. You and I looking
    at that cost? No, the optimizer – YES.” (2005)
    “You cannot compare the cost of two different queries with each other – practically speaking. As Jonathan himself wrote:
    Cost IS time – but only in theory.” (2011)

    If I follow your statement above then, assuming there is no garbage in, costs represent time and should therefore be comparable across different queries. But then they would not be random numbers.

    I still think of costs as a metric for the CBO to compare different possible execution plans of exactly the same query under the same conditions.

    Therefore, at the moment, I still find thinking of costs as time misleading.

    Am I missing the point entirely?

    Comment by chris — May 8, 2015 @ 10:31 am GMT

    • Chris,
      the execution plan actually contains the corresponding TIME information for a given COST value. Of course the results are frequently far from accurate – so I would not try to forecast the query execution time from it (and I think that’s Tom Kyte’s point). Randolf Geist wrote in http://oracle-randolf.blogspot.de/2011/07/cost-is-time-next-generation.html: “The cost estimate generated by the Cost-Based Optimizer (CBO) has always been a time estimate, although expressed in a slightly obscure unit, which is number of single block reads.” There you will also find links to other articles by Jonathan and Randolf in which the relation between cost and time is explained.

      Regards

      Martin

      Comment by Martin Preiss — May 8, 2015 @ 11:02 am GMT

    • Probably the most important bit to notice: “You cannot compare the cost *of two different queries* with each other”. Cost is only comparable between two plans for the same query.

      Comment by Jeffrey Kemp — May 9, 2015 @ 1:08 am GMT

      • Jeffrey,

        Apart from the link I supplied to Chris, there’s another way of viewing that comment.

        We OFTEN see people say things like: “Oracle chose a nested loop, but when I hinted a hash join it was 10 times quicker even though the cost was 1000 times bigger?”

        What does it mean to say that “cost is comparable between two plans for the same query” when the optimizer picks the wrong plan for a query by comparing the costs?

        Comment by Jonathan Lewis — May 9, 2015 @ 7:45 am GMT

        • Over the years I’ve surprised a number of developers by telling them that when they use a hint to get the optimiser to pick the “right” plan, I fully expect the calculated cost of the hinted query to be higher than that of the original unhinted query. After all, that’s the reason the CBO was picking the “wrong” plan in the first place.

          Comment by Jeffrey Kemp — May 9, 2015 @ 12:20 pm GMT

        • Jeffrey,

          I’ll second that.
          “Why did the cost go up when I hinted ?” — because it’s absolutely not supposed to go down.

          The irony of the optimizer cost is that the only time that anyone looks at it is when it’s wrong in some way; which is why it’s reasonably truthful to say “there’s no point in looking at the cost” – really it’s more accurate to say “if you NEED to look at it, there’s no point because it will probably be wrong”.

          Comment by Jonathan Lewis — May 9, 2015 @ 10:50 pm GMT

    • Chris,

      It’s not often that Tom is wrong, or puts things badly; most frequently he’s misinterpreted.

      In this case nearly everything he says is qualified:
      “… might as well be random ..” — is not the same as “are random”
      “… practically speaking …” — is not the same as “in principle”.

      The first all-caps sentence, though, is mostly wrong: the cost really is what the optimizer “thinks” the query time will be at runtime.

      The second all-caps statement could be corrected in two ways:

        a) You SHOULD not compare them and say “this will be faster”
        b) You CAN compare them and say “this OUGHT TO be faster BUT IT MIGHT NOT BE BECAUSE THERE ARE SO MANY DEFECTS AND LIMITATIONS IN THE OPTIMIZER”.

      As for your comment “You cannot compare the cost of two different queries with each other” – there’s an obvious fallacy in that statement.

      Comment by Jonathan Lewis — May 9, 2015 @ 7:41 am GMT

  3. Matthias: JL notably did not advocate “gathering system stats” but rather noted that the accuracy of the CBO is limited when “The performance characteristics of the hardware are not known.” The best values for system stats probably vary over time on most systems. Rough edge values that are good enough most of the time can be set in a variety of ways, including leaving the defaults alone when those are decent values for an actual system. Exactly when it is worthwhile to tinker with system stats is a complex topic, and the answer likely varies with changes in the CBO. Dynamically collecting system statistics exposes you to the risk of destabilizing efficient plans even as it might improve some others. Destabilization is something that Arup quite correctly avoids.

    Comment by rsiz — May 8, 2015 @ 12:39 pm GMT

  4. “if you NEED to look at it, there’s no point because it will probably be wrong”

    In my experience, when the cost is wrong the cardinality estimates in one or more steps are also wrong.
    If you can find which step of the plan has the wrong cardinality estimate, you can use it as a starting point for working on the query.

    But if all the cardinality estimates are correct (within ±10%) the cost can usually be trusted to compare plans of the same query.

    Comment by Pablo — May 11, 2015 @ 2:50 pm GMT

  5. Have there already been any ideas about choosing the plan with the lowest cost per row when different plans would return different numbers of rows (perhaps even for subqueries, or before aggregation operations)? Or about allowing for a margin of error, so that, depending on a parameter, a less reliable plan (one with more guesses) would have to be x times better (lower cost) than a more reliable one to be chosen for execution?

    Comment by Rainer Stenzel — May 12, 2015 @ 8:04 am GMT

    • Rainer,

      I don’t know what ideas may have been considered and discarded by the developers at Oracle Corp. so I can’t say anything about the “cost per row” type of idea – although it would be an error in the optimizer if two different internal transformations produced different cardinality estimates for the same query.

      There is at least one example of Oracle using a “reliability” effect – and I’ve written about it here: https://jonathanlewis.wordpress.com/2010/09/07/cbo-surprise-3/

      Comment by Jonathan Lewis — May 12, 2015 @ 8:50 am GMT

