I’ve just been checking “Cost Based Oracle – Fundamentals” (Apress 2005) to see what I said on a particular topic, and I couldn’t resist quoting the following from the opening page of Chapter 1:
One of the commonest questions about the CBO on the Internet is: “What does the cost represent?” This is usually followed by comments like: “According to explain plan the cost of doing a hash join for this query is seven million and the cost of a nested loop is forty-two – but the hash join completes in three seconds and the nested loop takes 14 hours.”
The answer is simple: the cost represents (and has always represented) the optimizer’s best estimate of the time it will take to execute the statement. But how can this be true when people can see oddities like the hash join / nested loop join example above? The answer can usually be found in that good old acronym GIGO: Garbage In, Garbage Out.
The CBO makes errors for six main reasons:
- There are some inappropriate assumptions built into the cost model.
- The relevant statistics about the data distribution are available, but misleading
- The relevant statistics about the data distribution are not available
- The performance characteristics of the hardware are not known
- The current workload is not known
- There are bugs in the code
Still true – although there are more features and cunning bits where inappropriate assumptions and bugs can appear.