Browsing through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index scans:
My current understanding is it is not valid to try to compare costs across different queries (even if you just alter it by adding a hint). In general a better plan will have a lower cost but you cannot rely on this metric. The metric is really for the CBO to choose between alternative plans for this specific query, not to compare plans generated for different queries.
Now I know that this is a statement that pretty much paraphrases something that Tom Kyte wrote on AskTom several years ago – but it’s wrong. As I’ve pointed out in the past, “Cost is Time”. The cost of a query represents the optimizer’s estimate of how long it will take that query to run – so it is perfectly valid to compare the cost of two queries to see which one the optimizer thinks will be faster but, thanks to limitations and defects in the optimizer it may not be entirely sensible to do so.
The point I want to address in this post though is the comment that “it’s valid to compare the cost of different plans, but not to compare the cost of two different queries”. Consider the following queries:
select t1.v1 from t1 where t1.id in ( select t2.id from t2 where t2.n1 = 15 ) ; select t1.v1 from t1 where exists ( select t2.id from t2 where t2.n1 = 15 and t2.id = t1.id ) ; select t1.v1 from ( select distinct t2.id from t2 where t2.n1 = 15 ) t2, t1 where t1.id = t2.id ; select v1.v1 from ( select distinct t1.rowid, t2.id, t1.v1 from t2, t1 where t2.n1 = 15 and t1.id = t2.id ) v1 ;
Tables t1 and t2 have the following definitions, so all four queries are logically equivalent:
Name Null? Type ----------------------- -------- ---------------- ID NOT NULL NUMBER N1 NOT NULL NUMBER V1 VARCHAR2(6) PADDING VARCHAR2(100)
According to the claim, it is not valid to compare the costs that the optimizer gives you for these four different queries – but they are the same query. In principle the optimizer might transform the IN to an EXISTS, it might simply unnest, it might unnest and merge – so when the optimizer is “comparing different costs for the same query”, it will also be “comparing costs of different queries”.
Cost IS time – but only in theory. The “trick” to sorting out optimization problems lies in recognising where the optimizer model is not right for your data, or the optimizer arithmetic is too simplistic or has a bug.

Cost…
As soon as I saw the title of Jonathan Lewis’ post, I had an inkling of what it might have to say and I wasn’t too far off the mark. Although I don’t disagree with a single statement of his post (I’ve read it a few times to make sure), I tend to ta…
Trackback by Doug's Oracle Blog — January 19, 2011 @ 12:00 am UTC Jan 19,2011 |
This is a very valid point, and makes the whole point about “comparing costs of the same / different query” moot, in particular with the introduction of Cost Based Query Transformation (CBQT) in 10g – because since then the optimizer effectively compares costs of “different”, namely transformed, queries all the time (I assume this is what you intended to express with “it will also be comparing costs of different queries”).
It also gives an important clue why this improvement not always produces desirable results, simply because the cost estimates might be wrong for various reasons.
Randolf
Comment by Randolf Geist — January 21, 2011 @ 11:23 am UTC Jan 21,2011 |