This item is, by a roundabout route, a follow-up to yesterday’s note on a critical difference in cardinality estimates that appeared if you used the coalesce() function in its simplest form as a substitute for the nvl() function. Connor McDonald wrote a followup note about how using the nvl() function in a suitable predicate could lead to Oracle splitting a query into a union all (in version 12.2), which led me to go back to a note I’d written on the same topic about 10 years earlier where the precursor of this feature already existed but used concatenation instead of OR-expansion.
The script I’d used for my earlier article was actually one I’d written in February 2003 and tested fairly regularly since – which brings me to this article, because I finally tested my script against 12.2.0.1 to discover a very cute bit of optimisation.
The business of splitting a query into two parts can be used even when the queries are more complex and include joins; this doesn’t always happen automatically and sometimes has to be hinted (but that may be a costs/statistics thing) for example, from 12.1.0.2, a query and its execution plan:
select * from t1, t2 where t1.v1 = nvl(:v1,t1.v1) and t2.n1 = t1.n1 ; --------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1001 | 228K| 11 (0)| 00:00:01 | | 1 | CONCATENATION | | | | | | |* 2 | FILTER | | | | | | |* 3 | HASH JOIN | | 1000 | 228K| 8 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL | T2 | 1000 | 106K| 4 (0)| 00:00:01 | |* 5 | TABLE ACCESS FULL | T1 | 1000 | 122K| 4 (0)| 00:00:01 | |* 6 | FILTER | | | | | | | 7 | NESTED LOOPS | | 1 | 234 | 3 (0)| 00:00:01 | | 8 | NESTED LOOPS | | 1 | 234 | 3 (0)| 00:00:01 | | 9 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 125 | 2 (0)| 00:00:01 | |* 10 | INDEX RANGE SCAN | T1_IDX1 | 1 | | 1 (0)| 00:00:01 | |* 11 | INDEX UNIQUE SCAN | T2_PK | 1 | | 0 (0)| 00:00:01 | | 12 | TABLE ACCESS BY INDEX ROWID | T2 | 1 | 109 | 1 (0)| 00:00:01 | --------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(:V1 IS NULL) 3 - access("T2"."N1"="T1"."N1") 5 - filter("T1"."V1" IS NOT NULL) 6 - filter(:V1 IS NOT NULL) 10 - access("T1"."V1"=:V1) 11 - access("T2"."N1"="T1"."N1")
You can see in this plan how Oracle has split the query into two queries combined through concatenation with filter operations at lines 2 (:v1 is null) and 6 (:v1 is not null) to allow the runtime engine to execute only the appropriate branch. You’ll also note that each branch can be optimised separately and in this case the two branches get dramatically different paths because of the enormous difference in the estimated volumes of data.
So let’s move up to 12.2.0.1 and see what happens to this query – but first I’m going to execute a cunning “alter session…” command which I’ll say more about later:
------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1001 | 180K| 11 (0)| 00:00:01 | | 1 | VIEW | VW_ORE_F79C84EE | 1001 | 180K| 11 (0)| 00:00:01 | | 2 | UNION-ALL | | | | | | |* 3 | FILTER | | | | | | | 4 | NESTED LOOPS | | 1 | 234 | 3 (0)| 00:00:01 | | 5 | NESTED LOOPS | | 1 | 234 | 3 (0)| 00:00:01 | | 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 125 | 2 (0)| 00:00:01 | |* 7 | INDEX RANGE SCAN | T1_IDX1 | 1 | | 1 (0)| 00:00:01 | |* 8 | INDEX UNIQUE SCAN | T2_PK | 1 | | 0 (0)| 00:00:01 | | 9 | TABLE ACCESS BY INDEX ROWID | T2 | 1 | 109 | 1 (0)| 00:00:01 | |* 10 | FILTER | | | | | | |* 11 | HASH JOIN | | 1000 | 228K| 8 (0)| 00:00:01 | | 12 | TABLE ACCESS FULL | T2 | 1000 | 106K| 4 (0)| 00:00:01 | |* 13 | TABLE ACCESS FULL | T1 | 1000 | 122K| 4 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter(:V1 IS NOT NULL) 7 - access("T1"."V1"=:V1) 8 - access("T2"."N1"="T1"."N1") 10 - filter(:V1 IS NULL) 11 - access("T2"."N1"="T1"."N1") 13 - filter("T1"."V1" IS NOT NULL)
There’s nothing terribly exciting about the change – except for the disappearance of the CONCATENATION operator and the appearance of the VIEW and UNION ALL operators to replace it (plus you’ll see that the two branches appear in the opposite order in the plan). But let’s try again, without doing that “alter session…”:
-------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1001 | 229K| 10 (0)| 00:00:01 | |* 1 | HASH JOIN | | 1001 | 229K| 10 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | T2 | 1000 | 106K| 4 (0)| 00:00:01 | | 3 | VIEW | VW_JF_SET$A2355C8B | 1001 | 123K| 6 (0)| 00:00:01 | | 4 | UNION-ALL | | | | | | |* 5 | FILTER | | | | | | |* 6 | TABLE ACCESS FULL | T1 | 1000 | 122K| 4 (0)| 00:00:01 | |* 7 | FILTER | | | | | | | 8 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 125 | 2 (0)| 00:00:01 | |* 9 | INDEX RANGE SCAN | T1_IDX1 | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T2"."N1"="ITEM_1") 5 - filter(:V1 IS NULL) 6 - filter("T1"."V1" IS NOT NULL) 7 - filter(:V1 IS NOT NULL) 9 - access("T1"."V1"=:V1)
The plan now shows a view which is a union all involving only table t1 in both its branches. The result set from the view is then used as the probe table of a hash join with t2. You’ll note that the name of the view is now VW_JF_SET$A2355C8B – that’s JF for “Join Factorization”, and the alter session I executed to get the first plan was to disable the feature: ‘alter session set “_optimizer_join_factorization”= false;’.
Join factorization can occur when the optimizer sees a union all view that includes some tables that are common to both (all) branches of the query, and finds that it can move those tables outside the union all view while getting the same end result at a lower cost. In this case it happens to be a nice example of how the optimizer can transform then transform again to get to the lowest cost plan.
It’s worth noting that Join Factorization has been around since 11.2.x.x, and Or Expansion (in the form of concatenation) has been around for even longer – but it’s not until 12.2 that nvl() transforms through the newer OR-expansion, which is what allows it to transform onwards through Join Factorization.
You’ll note, by the way, that with this plan we always do a full tablescan of t2 whereas if we stop after just OR-expansion the tablescan is just a potential threat that may never (or hardly ever) be realised. That’s a point to check if you find that the transformation starts to appear inappropriately on an upgrade. There is a hint to disable the feature for a query, but it’s not trivial to get it right so if you do need to block the feature the smart hint (or SQL Patch) would be “opt_param(‘_optimizer_join_factorization’ ‘false’)”.
Footnote:
If you want to run the experiments yourself, here’s the script I used to generate the data. It’s more complicated than it needs to be because I use the same tables in several different tests:
rem rem Script: null_plan_122.sql rem Author: Jonathan Lewis rem Dated: February 2018 rem Purpose: rem rem Last tested rem 12.2.0.1 Join Factorization rem 12.1.0.2 Concatenation rem rem drop table t2; drop table t1; -- @@setup -- various set commands etc. create table t1 ( n1 number(5), n2 number(5), v1 varchar2(10), v2 varchar2(10), v3 varchar2(10), v4 varchar2(10), v5 varchar2(10), padding varchar2(100), constraint t1_pk primary key(n1) ); insert into t1 select rownum, rownum, rownum, trunc(100 * dbms_random.value), trunc(100 * dbms_random.value), trunc(100 * dbms_random.value), trunc(100 * dbms_random.value), rpad('x',100) from all_objects where rownum <= 1000 -- > comment to avoid WordPress format mess ; create unique index t1_n2 on t1(n2); create index t1_idx1 on t1(v1); create index t1_idx2 on t1(v2,v1); create index t1_idx3 on t1(v3,v2,v1); create table t2 ( n1 number(5), v1 varchar2(10), padding varchar2(100), constraint t2_pk primary key(n1) ); insert into t2 select rownum, rownum, rpad('x',100) from all_objects where rownum <= 1000 -- > comment to avoid WordPress format mess ; create index t2_idx on t2(v1); begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size 1' ); dbms_stats.gather_table_stats( ownname => user, tabname =>'T2', method_opt => 'for all columns size 1' ); end; / variable n1 number variable n2 number variable v1 varchar2(10) variable v2 varchar2(10) variable v3 varchar2(10) exec :n1 := null exec :n2 := null exec :v1 := null exec :v2 := null exec :v3 := null spool null_plan_122 set autotrace traceonly explain prompt ============================================ prompt One colx = nvl(:b1,colx) predicate with join prompt ============================================ select * from t1, t2 where t1.v1 = nvl(:v1,t1.v1) and t2.n1 = t1.n1 ; alter session set "_optimizer_join_factorization" = false; select * from t1, t2 where t1.v1 = nvl(:v1,t1.v1) and t2.n1 = t1.n1 ; alter session set "_optimizer_join_factorization" = true; set autotrace off spool off
[…] Join Factorization – Jonathan Lewis […]
Pingback by Report query with optional parameters | Jeff Kemp on Oracle — February 15, 2018 @ 2:58 am GMT Feb 15,2018 |