Browsing through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index scans:
My current understanding is it is not valid to try to compare costs across different queries (even if you just alter it by adding a hint). In general a better plan will have a lower cost but you cannot rely on this metric. The metric is really for the CBO to choose between alternative plans for this specific query, not to compare plans generated for different queries.
Now I know that this is a statement that pretty much paraphrases something that Tom Kyte wrote on AskTom several years ago – but it’s wrong. As I’ve pointed out in the past, “Cost is Time”. The cost of a query represents the optimizer’s estimate of how long it will take that query to run – so it is perfectly valid to compare the cost of two queries to see which one the optimizer thinks will be faster but, thanks to limitations and defects in the optimizer it may not be entirely sensible to do so.
The point I want to address in this post though is the comment that “it’s valid to compare the cost of different plans, but not to compare the cost of two different queries”. Consider the following queries:
select t1.v1 from t1 where t1.id in ( select t2.id from t2 where t2.n1 = 15 ) ; select t1.v1 from t1 where exists ( select t2.id from t2 where t2.n1 = 15 and t2.id = t1.id ) ; select t1.v1 from ( select distinct t2.id from t2 where t2.n1 = 15 ) t2, t1 where t1.id = t2.id ; select v1.v1 from ( select distinct t1.rowid, t2.id, t1.v1 from t2, t1 where t2.n1 = 15 and t1.id = t2.id ) v1 ;
Tables t1 and t2 have the following definitions, so all four queries are logically equivalent:
Name Null? Type ----------------------- -------- ---------------- ID NOT NULL NUMBER N1 NOT NULL NUMBER V1 VARCHAR2(6) PADDING VARCHAR2(100)
According to the claim, it is not valid to compare the costs that the optimizer gives you for these four different queries – but they are the same query. In principle the optimizer might transform the IN to an EXISTS, it might simply unnest, it might unnest and merge – so when the optimizer is “comparing different costs for the same query”, it will also be “comparing costs of different queries”.
Cost IS time – but only in theory. The “trick” to sorting out optimization problems lies in recognising where the optimizer model is not right for your data, or the optimizer arithmetic is too simplistic or has a bug.