Join Factorization

February 14, 2018

Join Factorization

Filed under: CBO,Oracle — Jonathan Lewis @ 3:38 pm GMT Feb 14,2018

This item is, by a roundabout route, a follow-up to yesterday’s note on a critical difference in cardinality estimates that appeared if you used the coalesce() function in its simplest form as a substitute for the nvl() function. Connor McDonald wrote a followup note about how using the nvl() function in a suitable predicate could lead to Oracle splitting a query into a union all (in version 12.2), which led me to go back to a note I’d written on the same topic about 10 years earlier where the precursor of this feature already existed but used concatenation instead of OR-expansion.

The script I’d used for my earlier article was actually one I’d written in February 2003 and tested fairly regularly since – which brings me to this article, because I finally tested my script against 12.2.0.1 to discover a very cute bit of optimisation.

The business of splitting a query into two parts can be used even when the queries are more complex and include joins; this doesn’t always happen automatically and sometimes has to be hinted (but that may be a costs/statistics thing) for example, from 12.1.0.2, a query and its execution plan:


select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

---------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |         |  1001 |   228K|    11   (0)| 00:00:01 |
|   1 |  CONCATENATION                          |         |       |       |            |          |
|*  2 |   FILTER                                |         |       |       |            |          |
|*  3 |    HASH JOIN                            |         |  1000 |   228K|     8   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                   | T2      |  1000 |   106K|     4   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                   | T1      |  1000 |   122K|     4   (0)| 00:00:01 |
|*  6 |   FILTER                                |         |       |       |            |          |
|   7 |    NESTED LOOPS                         |         |     1 |   234 |     3   (0)| 00:00:01 |
|   8 |     NESTED LOOPS                        |         |     1 |   234 |     3   (0)| 00:00:01 |
|   9 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1      |     1 |   125 |     2   (0)| 00:00:01 |
|* 10 |       INDEX RANGE SCAN                  | T1_IDX1 |     1 |       |     1   (0)| 00:00:01 |
|* 11 |      INDEX UNIQUE SCAN                  | T2_PK   |     1 |       |     0   (0)| 00:00:01 |
|  12 |     TABLE ACCESS BY INDEX ROWID         | T2      |     1 |   109 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(:V1 IS NULL)
   3 - access("T2"."N1"="T1"."N1")
   5 - filter("T1"."V1" IS NOT NULL)
   6 - filter(:V1 IS NOT NULL)
  10 - access("T1"."V1"=:V1)
  11 - access("T2"."N1"="T1"."N1")

You can see in this plan how Oracle has split the query into two queries combined through concatenation with filter operations at lines 2 (:v1 is null) and 6 (:v1 is not null) to allow the runtime engine to execute only the appropriate branch. You’ll also note that each branch can be optimised separately and in this case the two branches get dramatically different paths because of the enormous difference in the estimated volumes of data.

So let’s move up to 12.2.0.1 and see what happens to this query – but first I’m going to execute a cunning “alter session…” command which I’ll say more about later:


------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                 |  1001 |   180K|    11   (0)| 00:00:01 |
|   1 |  VIEW                                    | VW_ORE_F79C84EE |  1001 |   180K|    11   (0)| 00:00:01 |
|   2 |   UNION-ALL                              |                 |       |       |            |          |
|*  3 |    FILTER                                |                 |       |       |            |          |
|   4 |     NESTED LOOPS                         |                 |     1 |   234 |     3   (0)| 00:00:01 |
|   5 |      NESTED LOOPS                        |                 |     1 |   234 |     3   (0)| 00:00:01 |
|   6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1              |     1 |   125 |     2   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN                  | T1_IDX1         |     1 |       |     1   (0)| 00:00:01 |
|*  8 |       INDEX UNIQUE SCAN                  | T2_PK           |     1 |       |     0   (0)| 00:00:01 |
|   9 |      TABLE ACCESS BY INDEX ROWID         | T2              |     1 |   109 |     1   (0)| 00:00:01 |
|* 10 |    FILTER                                |                 |       |       |            |          |
|* 11 |     HASH JOIN                            |                 |  1000 |   228K|     8   (0)| 00:00:01 |
|  12 |      TABLE ACCESS FULL                   | T2              |  1000 |   106K|     4   (0)| 00:00:01 |
|* 13 |      TABLE ACCESS FULL                   | T1              |  1000 |   122K|     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(:V1 IS NOT NULL)
   7 - access("T1"."V1"=:V1)
   8 - access("T2"."N1"="T1"."N1")
  10 - filter(:V1 IS NULL)
  11 - access("T2"."N1"="T1"."N1")
  13 - filter("T1"."V1" IS NOT NULL)

There’s nothing terribly exciting about the change – except for the disappearance of the CONCATENATION operator and the appearance of the VIEW and UNION ALL operators to replace it (plus you’ll see that the two branches appear in the opposite order in the plan). But let’s try again, without doing that “alter session…”:


--------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                    |  1001 |   229K|    10   (0)| 00:00:01 |
|*  1 |  HASH JOIN                              |                    |  1001 |   229K|    10   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                     | T2                 |  1000 |   106K|     4   (0)| 00:00:01 |
|   3 |   VIEW                                  | VW_JF_SET$A2355C8B |  1001 |   123K|     6   (0)| 00:00:01 |
|   4 |    UNION-ALL                            |                    |       |       |            |          |
|*  5 |     FILTER                              |                    |       |       |            |          |
|*  6 |      TABLE ACCESS FULL                  | T1                 |  1000 |   122K|     4   (0)| 00:00:01 |
|*  7 |     FILTER                              |                    |       |       |            |          |
|   8 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1                 |     1 |   125 |     2   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                  | T1_IDX1            |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="ITEM_1")
   5 - filter(:V1 IS NULL)
   6 - filter("T1"."V1" IS NOT NULL)
   7 - filter(:V1 IS NOT NULL)
   9 - access("T1"."V1"=:V1)

The plan now shows a view which is a union all involving only table t1 in both its branches. The result set from the view is then used as the probe table of a hash join with t2. You’ll note that the name of the view is now VW_JF_SET$A2355C8B – that’s JF for “Join Factorization”, and the alter session I executed to get the first plan was to disable the feature: ‘alter session set “_optimizer_join_factorization”= false;’.

Join factorization can occur when the optimizer sees a union all view that includes some tables that are common to both (all) branches of the query, and finds that it can move those tables outside the union all view while getting the same end result at a lower cost. In this case it happens to be a nice example of how the optimizer can transform then transform again to get to the lowest cost plan.

It’s worth noting that Join Factorization has been around since 11.2.x.x, and Or Expansion (in the form of concatenation) has been around for even longer – but it’s not until 12.2 that nvl() transforms through the newer OR-expansion, which is what allows it to transform onwards through Join Factorization.

You’ll note, by the way, that with this plan we always do a full tablescan of t2 whereas if we stop after just OR-expansion the tablescan is just a potential threat that may never (or hardly ever) be realised. That’s a point to check if you find that the transformation starts to appear inappropriately on an upgrade. There is a hint to disable the feature for a query, but it’s not trivial to get it right so if you do need to block the feature the smart hint (or SQL Patch) would be “opt_param(‘_optimizer_join_factorization’ ‘false’)”.

Footnote:

If you want to run the experiments yourself, here’s the script I used to generate the data. It’s more complicated than it needs to be because I use the same tables in several different tests:

rem
rem     Script:         null_plan_122.sql
rem     Author:         Jonathan Lewis
rem     Dated:          February 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1        Join Factorization
rem             12.1.0.2        Concatenation
rem
rem

drop table t2;
drop table t1;

-- @@setup  -- various set commands etc.

create table t1 (
        n1              number(5),
        n2              number(5),
        v1              varchar2(10),
        v2              varchar2(10),
        v3              varchar2(10),
        v4              varchar2(10),
        v5              varchar2(10),
        padding         varchar2(100),
        constraint t1_pk primary key(n1)
);

insert into t1
select
        rownum,
        rownum,
        rownum,
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        rpad('x',100)
from all_objects
where
        rownum <= 1000 -- > comment to avoid WordPress format mess
;

create unique index t1_n2 on t1(n2);

create index t1_idx1 on t1(v1);
create index t1_idx2 on t1(v2,v1);
create index t1_idx3 on t1(v3,v2,v1);

create table t2 (
        n1              number(5),
        v1              varchar2(10),
        padding         varchar2(100),
        constraint t2_pk primary key(n1)
);

insert into t2
select
        rownum,
        rownum,
        rpad('x',100)
from all_objects
where
        rownum <= 1000 -- > comment to avoid WordPress format mess
;

create index t2_idx on t2(v1);

begin dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

variable n1 number
variable n2 number
variable v1 varchar2(10)
variable v2 varchar2(10)
variable v3 varchar2(10)

exec :n1 := null
exec :n2 := null
exec :v1 := null
exec :v2 := null
exec :v3 := null

spool null_plan_122

set autotrace traceonly explain

prompt  ============================================
prompt  One colx = nvl(:b1,colx) predicate with join
prompt  ============================================

select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

alter session set "_optimizer_join_factorization" = false;

select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

alter session set "_optimizer_join_factorization" = true;

set autotrace off

spool off

Comments (1)

1 Comment »

[…] Join Factorization – Jonathan Lewis […]

Pingback by Report query with optional parameters | Jeff Kemp on Oracle — February 15, 2018 @ 2:58 am GMT Feb 15,2018 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Scratchpad

February 14, 2018