A couple of days ago, I offered up the following SQL for review, asking you to consider how many strategies there might be for executing it. When I asked the question, I wasn’t actually asking about technical options – I just wanted to prompt people into thinking about what the query might mean, as the purpose of a query like this will have a big impact on the best choice of execution path.
select
	/*+ qb_name(main) */
	t1.v1
from
	t1, t3
where
	t1.n2 = 100
and	t3.n1 = t1.n1
and	t3.n2 = 100
and	exists (
		select	/*+ qb_name(sub2) */
			t2.id
		from	t2
		where	t2.n1 = 15
		and	t2.id = t1.id
	)
and	exists (
		select	/*+ qb_name(sub4) */
			t4.id
		from	t4
		where	t4.n1 = 15
		and	t4.id = t3.id
	)
;
Bear in mind that this specific query is simply intended as a framework, and the possibilities I will be describing are dependent on the version of the optimizer, the data, the constraints, the indexing, and variations in the nature of the predicates.
In particular, Oracle’s ability to unnest the subqueries can vary dramatically with version, as well as depending on the extra bits and pieces you might have included in a slightly more generic query; and I may choose to change exists to not exists, or an and to an or, for the purposes of discussion.
The number of possibilities, and the commentary that goes with them, is so great that I’m going to write a series of “chapters” about the query, and publish them as standalone pages, with entries on the main menu. This is the first such chapter.
Simple Joins and Subqueries
In the basic case, and ignoring the possibility of unnesting subqueries, the optimizer will first join the tables in the outer from clause, and postpone the subqueries until the end of the main query.
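As a purely illustrative sketch (this is not a plan captured from a real run of the query, and the access paths shown are arbitrary), a plan for this basic case would typically show the two-table join as the child of a FILTER operation, with the two subqueries appearing as the second and third children of that FILTER, executed after the join:

```
---------------------------------------------
| Id | Operation             | Name         |
---------------------------------------------
|  0 | SELECT STATEMENT      |              |
|* 1 |  FILTER               |              |
|* 2 |   HASH JOIN           |              |
|* 3 |    TABLE ACCESS FULL  | T1           |
|* 4 |    TABLE ACCESS FULL  | T3           |
|* 5 |   TABLE ACCESS FULL   | T2           |
|* 6 |   TABLE ACCESS FULL   | T4           |
---------------------------------------------
```

The key feature is the FILTER at operation 1: each row surviving the join may trigger an execution of one, or both, of the subqueries beneath it.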
This may, or may not, be a good thing. One of the critical strategies for optimising SQL is to ensure that you “eliminate early” – which often means that in the middle of a multi-table join you should pick the “next” table as the one that is the best compromise between an efficient access path and the effect it has on reducing the size of the current result set.
For example, if t1 was a table of order lines, then t3 might be a table of orders, and t2 (in the first subquery) might be a table of delivery notes which ( in this hypothetical design) relate to order lines rather than orders.
So, if you have identified a list of order lines, should you join to the orders table because that has a date check on the order that eliminates 90% of the order lines you have selected so far; or should you run the subquery against the delivery notes table because that eliminates 99% of the order lines ?
Clearly, it seems sensible to eliminate as much data as possible as early as possible by running the subquery first – but the join to orders may be much more efficient, and would leave you with far fewer executions of the (more expensive) subquery. You have to know the data and understand the query to make the best decision.
Of course you may decide that unnesting the subquery (hence finding delivery notes first and then joining to order lines) may be more efficient – on the other hand, there may be no efficient way to select a minimal set of delivery notes until after you have acquired some information from the order lines table.
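As a sketch of the unnesting option (hinting just the first subquery here, and leaving the optimizer to decide how it drives the resulting join), you could put the unnest hint inside the subquery:

```sql
select	/*+ qb_name(main) */
	t1.v1
from	t1, t3
where	t1.n2 = 100
and	t3.n1 = t1.n1
and	t3.n2 = 100
and	exists (
		select	/*+ qb_name(sub2) unnest */
			t2.id
		from	t2
		where	t2.n1 = 15
		and	t2.id = t1.id
	)
...
```

With the subquery unnested, t2 effectively becomes another table in the join order, which is what would allow Oracle to visit the delivery notes before the order lines if that were the cheaper strategy.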
Hinting
In cases like this Oracle gives you some strategic control. The hints push_subq, no_push_subq, unnest and no_unnest can be used to indicate the general strategy for the query.
In 10g, the push_subq hint should be placed inside the subqueries that you want to operate early – in prior versions of Oracle it went in the outer query. Watch out for this if you already use push_subq in 8i or 9i – after upgrading to 10g it may appear that Oracle is ignoring your hint.
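As a sketch of the 10g placement (pairing push_subq with no_unnest, since a subquery that gets unnested can’t be “pushed” as a subquery), each exists clause would carry its own hints:

```sql
and	exists (
		select	/*+ qb_name(sub2) push_subq no_unnest */
			t2.id
		from	t2
		where	t2.n1 = 15
		and	t2.id = t1.id
	)
```

In 8i and 9i the single push_subq hint would have gone in the outer query block instead, applying to all the subqueries at once.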
Hi Jonathan,
Can you please help me to understand the following :
Oracle Version :10.2.0.4.0
When I try running query1 it takes ages to finish compared to query2, which takes seconds.
Question 1: The COST of query2 is almost 350 times more. What is the reason behind this?
Question 2: The query returns 90 rows, but why is the cardinality different in the plans of the two queries? Shouldn’t the cardinality be the same as the number of rows returned?
I have used autotrace to generate the plan and the gem.ref_quotation table is analyzed.
Thanks,
Comment by Joe — December 14, 2009 @ 1:37 pm GMT
The two queries seem to be identical – is this intentional, did you simply change some parameters between the two runs ?
A couple of things first:
(a) there are articles on this blog pointing out that if you want to give someone an execution plan it should use dbms_xplan, and you have to supply the predicate section
(b) you haven’t helped me to help you by formatting your code to make it easy to read – see the “How to Comment” link near the top right hand side (so I had to edit it)
(c) this isn’t the autotrace output from 10.2.0.4 – are you using a 9i client ?
(d) this isn’t AskTom – I don’t do troubleshooting for people
(e) part of your answer may be related to an example I give in my chapter of “Expert Oracle Practices” (due out January)
Question 1: In theory, cost is time; but the optimizer is not perfect and there are many reasons why the optimizer’s calculated cost and the final run time of the query don’t seem to be related. Your first execution plan operates as a filter subquery, and it’s impossible to predict how many times the subquery will operate – but for the purposes of costing, Oracle assumes it will be once per row returned from the driving rowsource (hence the very high cost in your case). The cardinality of one may be a variation of a bug I’ve found in 10gR2 relating to aggregate subqueries that result in a cardinality of one after filtering.
Question 2: See end of question 1.
Comment by Jonathan Lewis — December 16, 2009 @ 10:30 am GMT