Join Ordering - pt1
A couple of days ago, I offered up the following SQL for review, asking you to consider how many strategies there might be for executing it. When I asked the question, I wasn’t actually asking about technical options - I just wanted to prompt people into thinking about what the query might mean, as the purpose of a query like this will have a big impact on the best choice of execution path.
select
/*+ qb_name(main) */
t1.v1
from
t1, t3
where
t1.n2 = 100
and t3.n1 = t1.n1
and t3.n2 = 100
and exists (
select
/*+ qb_name(sub2) */
t2.id
from t2
where t2.n1 = 15
and t2.id = t1.id
)
and exists (
select
/*+ qb_name(sub4) */
t4.id
from t4
where t4.n1 = 15
and t4.id = t3.id
)
;
Bear in mind that this specific query is simply intended as a framework, and the possibilities I will be describing are dependent on the version of the optimizer, the data, the constraints, the indexing, and variations in the nature of the predicates.
In particular, Oracle’s ability to unnest the subqueries can vary dramatically with version as well depending on the extra bits and pieces you might have included in a slightly more generic query; and I may choose to change exists to not exists, or an and to an or for the purposes of discussion.
The number of possibilities, and the commentary that goes with them is so great that I’m going to write a series of “chapters” about the query, and publish them as standalone pages, with entries on the main menu. This is the first such chapter.
Simple Joins and Subqueries
In the basic case, and ignoring the possibility of unnesting subqueries, the optimizer will first join the tables in the outer from clause, and postpone the subqueries until the end of the main query.
This may, or may not, be a good thing. One of the critical strategies for optimising SQL is to ensure that you “eliminate early” - which often means that in the middle of a multi-table join you should pick the “next” table as the one that is the best compromise between an efficient access path and the effect it has on reducing the size of the current result set.
For example, if t1 was a table of order lines, then t3 might be a table of orders, and t2 (in the first subquery) might be a table of delivery notes which ( in this hypothetical design) relate to order lines rather than orders.
So, if you have identified a list of order lines, should you join to the orders table because that has a date check on the order that eliminates 90% of the order lines you have selected so far; or should you run the subquery against the delivery notes table because that eliminates 99% of the order lines ?
Clearly, it seems sensible to eliminate as much data as possible as early as possible by running the subquery first - but the join to orders may be much more efficient, and would leave you with far fewer executions of the (more expensive) subquery. You have to know the data and understand the query to make the best decision.
Of course you may decide that unnesting the subquery (hence finding delivery notes first and then joining to order lines) may be more efficient - on the other hand, there may be no efficient way to select a minimal set of delivery notes until after you have acquired some information from the order lines table.
Hinting
In cases like this Oracle gives you some strategic control The hints push_subq, no_push_subq, unnest and no_unnest can be used to indicate the general strategy for the query.
In 10g, the push_subq hint should be placed inside the subqueries that you want to operate early - in prior versions of Oracle it went in the outer query. Watch out for this if you already use push_subq in 8i or 9i - after upgrading to 10g it may appear that Oracle is ignoring your hint.
[...] For the next chapter in the treatment of the query, go to this page. [...]
Pingback by Join Ordering - 1 « Oracle Scratchpad — March 5, 2007 @ 8:44 pm UTC Mar 5,2007