Join Ordering - pt2
In part one of this short series, I introduced a query structure for investigation, and made a few comments about subqueries and unnesting. In this follow-up, I plan to cover two points - ways of dealing with subqueries, and the general approach that Oracle will take with queries like the following:
select
/*+ qb_name(main) */
t1.v1
from
t1, t3
where
t1.n2 = 100
and t3.n1 = t1.n1
and t3.n2 = 100
and exists (
select
/*+ qb_name(sub2) */
t2.id
from t2
where t2.n1 = 15
and t2.id = t1.id
)
and exists (
select
/*+ qb_name(sub4) */
t4.id
from t4
where t4.n1 = 15
and t4.id = t3.id
)
;
There are four basic ways (that I know of) for Oracle to handle a subquery:
- unnesting to an in-line view
- semi-join/anti-join
- filter subquery
- (driving) access subquery
The following examples demonstrate the options:
Unnesting to an in-line view:
select small_vc
from t1
where n2 between 10 and 200
and exists (
select null
from t2
where t2.mod1 = t1.n1
and t2.n2 = 15
)
;
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 24 |
|* 1 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 19 | 2 |
| 2 | NESTED LOOPS | | 1 | 26 | 24 |
| 3 | SORT UNIQUE | | 1 | 7 | 2 |
| 4 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 7 | 2 |
|* 5 | INDEX RANGE SCAN | T2_N2 | 1 | | 1 |
|* 6 | INDEX RANGE SCAN | T1_PK | 1 | | 1 |
------------------------------------------------------------------------
In this example, we see table t2 from the subquery appearing in an aggregate view (lines 3 - 5) that drives a nested loop join.
Loosely speaking, the optimizer has chosen this path because the correlation between the subquery is using the mod1 column - for which there is no efficient access path - as the correlating column.
Semi-join / Anti-join
select small_vc
from t1
where n2 between 10 and 200
and exists (
select null
from t2
where t2.n2 = t1.n1
and t2.n2 = 15
)
;
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 23 | 3 |
| 1 | NESTED LOOPS SEMI | | 1 | 23 | 3 |
|* 2 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 19 | 2 |
|* 3 | INDEX RANGE SCAN | T1_PK | 1 | | 1 |
|* 4 | INDEX RANGE SCAN | T2_N2 | 1 | 4 | 1 |
------------------------------------------------------------------------
In a remarkably similar looking query - where, however, the correlating column is highly selective, not null, and has a convenient index based on it - we see that the nested loop is driven by table t1, using an index to probe table t2, stopping after one row if a match is found - the semi-join.
The anti-join appears in equivalent circumstances where you change the operator to “not exists”, and follows the same strategy of driving off the main table and attempting to join to the subquery table. In the case of the anti-join, though, we eliminate a driving row on the first match in the subquery table, rather than deciding to keep it as we do with the semi-join.
The semi-join and anti-join can also be used for “in” and “not in” subqueries, which can (often) be transformed to “exists” and “not exists” subqueries. (Note: “not in” is not the exact opposite to “in”, and there are cases where it cannot be transformed to “not exists”). See this post for a short note explaining the critical difference.
Filter Subquery
select small_vc
from t1
where n2 between 100 and 200
or exists (
select null
from t2
where t2.n1 = t1.n1
and t2.mod1 = 15
)
;
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 597 | 11343 | 28 |
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL | T1 | 10000 | 185K| 28 |
|* 3 | TABLE ACCESS BY INDEX ROWID | T2 | 1 | 7 | 2 |
|* 4 | INDEX RANGE SCAN | T2_PK | 1 | | 1 |
------------------------------------------------------------------------
Because of the “or” condition in this query the optimizer cannot unnest or otherwise transform the subquery. Consequently the execution path is driven by a tablescan on table t1, after which each row that doesn’t pass the simple condition is (notionally) subject to a check that entails running the subquery. This could be very inefficient, although scaler subquery caching often makes an enormous difference to the performance. If necessary you may be able to re-write this query as a union all of two disjunct queries. (For a discussion of this technique, see this note).
Access subquery
select small_vc
from min_max mm1
where mm1.id_parent = 100
and mm1.id_child = (
select max(mm2.id_child)
from min_max mm2
where mm2.id_parent = 100
)
;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 108 | 4 |
| 1 | TABLE ACCESS BY INDEX ROWID | MIN_MAX | 1 | 108 | 2 |
|* 2 | INDEX UNIQUE SCAN | MM_PK | 1 | | 1 |
| 3 | SORT AGGREGATE | | 1 | 8 | |
| 4 | FIRST ROW | | 10 | 80 | 2 |
|* 5 | INDEX RANGE SCAN (MIN/MAX)| MM_PK | 10 | 80 | 2 |
--------------------------------------------------------------------------
In some ways this is a cross between the in-line view and the filter subquery.
The subquery is executed first because it is guaranteed to produce just one row (thanks, in this case, to the definition of the primary key which is (id_parent, id_child)). The single value is then passed back to the outer query to act as an access predicate for a driving index.
Note: I do not know if this is appropriate terminology (from the perspective of Oracle Corp.) but there is a difference in the way Oracle handles subqueries that generate values for filter predicates compared to subqueries that generate values for access predicates, hence my choice of terms.
Note 2: You would normally expect to see the predicate in the subquery (mm2.id_parent = 100) written with a correlating column (mm2.id_parent = mm1.id_parent) rather than repeating the constant. But in 9i I found that using the repetition, for literals or for bind variables, gave the optimizer the option of using this (slightly more efficient) path rather then turning the subquery into an in-line view. The trick is probably not (usually) necessary in 10g.
Basic Strategy
Given these possibilities for handling subqueries, Oracle generally attempts to transform a query with multiple subqueries into a simple join using in-line views, anti-joins and semi-joins.
Filter subqueries, where they cannot be transformed, are postponed by default to execute at the end of the join. Access subqueries may execute early.
This means that queries like the example at the top of the page may be transformed into simple four-table joins - and in this specific case that is exactly what happens. If this transformation does appear the optimizer in 9i will consider no other alternatives.
In 10gR2 you may find some queries that the optimizer transforms to the simple join, and then recosts in other ways. This is a consequence of the “cost based query transformation” feature introduced in 10gR2; but there are still some transformations that the optimizer does without costing, and even describes in the 10053 trace file as “not requiring costing”. I’m not entirely convinced that this non-costed transformation is entirely appropriate - but we will be looking at that in later articles.
[...] Page pointer Filed under: Uncategorized — Jonathan Lewis @ 8:29 pm UTC Feb 22,2007 I have just published a new page, referenced from the main menu, to continue the investigation of a four-table query with two subqueries. The page is Join Ordering - pt2 [...]
Pingback by Page pointer « Oracle Scratchpad — February 22, 2007 @ 8:29 pm UTC Feb 22,2007
It would be terrific if you could transform this page into a section of a book of yours.
The style is perfect, a catalog of quick-to-read definitions each illustrated with a clear and simple example (the simplest possible) … my preferred format.
Obviously the topic is very interesting to me as well.
Comment by Alberto Dell'Era — February 24, 2007 @ 10:58 pm UTC Feb 24,2007