Oracle Scratchpad

July 24, 2015

Subquery Factoring (9)

Filed under: CBO,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 12:34 pm BST Jul 24,2015

Several years go (eight to be precise) I wrote a note suggesting that Oracle will not materialize a factored subquery unless it is used at least twice in the main query. I based this conclusion on a logical argument about the cost of creating and using a factored subquery and, at the time, I left it at that. A couple of years ago I came across an example where even with two uses of a factored subquery Oracle still didn’t materialize even though the cost of doing so would reduce the cost of the query – but I never got around to writing up the example, so here it is:

rem
rem     Script:         subq_factor_materialize.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2013
rem

create table t1
as
select
        object_id, data_object_id, created, object_name, rpad('x',1000) padding
from
        all_objects
where
        rownum <= 10000
;

exec dbms_stats.gather_table_stats(user,'T1')

explain plan for
with gen as (
        select /*+ materialize */ object_id, object_name from t1
)
select
        g1.object_name,
        g2.object_name
from
        gen g1,
        gen g2
where
        g2.object_id = g1.object_id
;

select * from table(dbms_xplan.display);

You’ll notice that my original table has very wide rows, but my factored subquery selects a “narrow” subset of those rows. My target is to have an example where doing a tablescan is very expensive but the temporary table holding the extracted data is much smaller and cheaper to scan.

I’ve included a materialize hint in the SQL above, but you need to run the code twice, once with, and once without the hint. Here are the two plans – unhinted first:


==============================
Unhinted - doesn't materialize
==============================

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   468K|   428   (2)| 00:00:03 |
|*  1 |  HASH JOIN         |      | 10000 |   468K|   428   (2)| 00:00:03 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
---------------------------------------------------------------------------

==================================
Hinted to materialize - lower cost
==================================

--------------------------------------------------------------------------------------------------------- 
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT           |                            | 10000 |   585K|   227   (2)| 00:00:02 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6664_9DAAEB7 |       |       |            |          | 
|   3 |    TABLE ACCESS FULL       | T1                         | 10000 |   234K|   214   (2)| 00:00:02 | 
|*  4 |   HASH JOIN                |                            | 10000 |   585K|    13   (8)| 00:00:01 | 
|   5 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 | 
|   7 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Clearly the optimizer isn’t considering the costs involved.

If I add the predicate “where object_id > 0” (which identifies ALL the rows in the table), materialization occurs unhinted (with the same costs reported as for the hinted plan above. My tentative conclusion is that the transformation is a heuristic one that follows the rule:

two or more appearances of the subquery and some indication of row selection in the subquery rowsource

(In fact if the rowsource is “select * from pipeline_function” the requirement for subsetting doesn’t seem to apply.)

The plans above came from 11.2.0.4 but I got the same result, with a slight difference in costs, in 12.1.0.2. It’s worth pointing out that despite Oracle apparently ignoring the costs when deciding whether or not to materialize, it still seems to report self-consistent values after materialization: the 227 for the plan above is the 214 for creating the temporary table plus the 13 for deriving the hash join of the two copies of the temporary table. (This is not always the case, though).

Update

Following on from Mark Farhham’s comment below, a simple test showed that adding the predicate “where 1 = 1” was sufficient to make the optimizer materialize the CTE without hinting!

10 Comments »

  1. Randolf has made similar observation:

    http://oracle-randolf.blogspot.co.uk/2014/10/heuristic-temp-table-transformation.html
    http://oracle-randolf.blogspot.co.uk/2015/05/heuristic-temp-table-transformation-2.html

    “The logic simply seems to be: If the CTE expression is referenced more than once AND the CTE expression contains at least some (filter or join) predicate then it will be materialized.”

    Comment by Dom Brooks — July 24, 2015 @ 1:20 pm BST Jul 24,2015 | Reply

  2. A Wolfgang trace would probably be interesting in the un-hinted, join only predicate case. Perhaps the materialized (even though 100%) subset is not considered, some bit being flipped even in the CBO row source exploration choices even though the stats’ low value should tell it everything is selected. Interesting that an always true predicate may trigger an optimizer consideration path in that unlike a hint it still lets the CBO choose minimum cost, so it shouldn’t require future testing and possible removal. I *think* that is qualitatively less optimizer influencing baggage than a hint. (By the way, would that be a useful new acronym: OIB ?)

    Comment by rsiz — July 24, 2015 @ 2:22 pm BST Jul 24,2015 | Reply

  3. Jonathan,

    I’ve covered this and some other aspects of the TEMP TABLE transformation in a small mini series some time ago:

    Part 1: When does materialization happen and some cases where the heuristic approach clearly is sub-optimal, also bad cardinality estimates caused by the materialization: http://oracle-randolf.blogspot.com/2014/10/heuristic-temp-table-transformation.html
    Part 2: Projection and materialization issues: http://oracle-randolf.blogspot.com/2015/05/heuristic-temp-table-transformation-2.html
    Part 3: More cardinality estimates issues – part 1: http://oracle-randolf.blogspot.com/2015/05/temp-table-transformation-cardinality.html
    Part 4: More cardinality estimates issues – part 2, plus Exadata issues regarding offloading: http://oracle-randolf.blogspot.com/2015/05/temp-table-transformation-cardinality_29.html

    Randolf

    Comment by Randolf Geist — July 24, 2015 @ 2:30 pm BST Jul 24,2015 | Reply

  4. Hi Jonathan,
    just in addition.

    > My tentative conclusion is that the transformation is a heuristic one that follows the rule “two or more appearances of the subquery and some indication of row selection in the subquery rowsource”

    Tony Hasler also described this transformation as a heuristic one in his book “Expert Oracle SQL: Optimization, Deployment, and Statistics” in more detail. ( http://www.apress.com/9781430259770 ): “When legal, factored subquery materialization is a heuristic optimization. If a subquery is referenced once then it will not be materialized, but if it is referenced twice or more it normally will be. (Occasionally an INDEX FAST FULL SCAN will not be materialized unless hinted even when referenced multiple times.)”

    Chapter 18 also includes several examples when and why to override this heuristic rule.

    Regards
    Stefan

    Comment by Stefan Koehler — July 24, 2015 @ 4:40 pm BST Jul 24,2015 | Reply

  5. […] prompted me to write my previous note about subquerying was an upgrade to 12c, and a check that a few critical queries would not do […]

    Pingback by Subquery Factoring (10) | Oracle Scratchpad — July 27, 2015 @ 1:26 pm BST Jul 27,2015 | Reply

  6. […] The exception to the materialization “rule of two” with several extra links […]

    Pingback by CTE Catalogue | Oracle Scratchpad — June 10, 2020 @ 6:46 pm BST Jun 10,2020 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.