Oracle Scratchpad

Transitive Closure

I have mentioned transitive closure in the past, and the comments on this blog item go into some depth about some of the effects of transitive closure.  However, a recent incoming link has prompted me to point out a couple of further details about the appearance (or non-appearance) of transitive closure.

We start with three (virtually) identical tables, created and populated as follows:


rem
rem     Script:         transitive_loop.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2006
rem

create table t1 (
	col1	number,
	v1	varchar2(200)
);           

insert into t1
select
	mod(rownum,100),
	rpad('x',200)
from
	all_objects
where
	rownum <= 2000
;           

commit;

The t2 and t3 tables are identical, with the obvious changes to column names. As a first test, we run the following query with autotrace enabled (using 10gR2 to get the dbms_xplan version of the output):


select t1.*, t2.*, t3.*
from
	t1, t2, t3
where
	t2.col2 = t1.col1
and	t3.col3 = t2.col2
; 

Execution Plan
----------------------------------------------------------
Plan hash value: 1573120526
------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   800K|   464M|   220 |
|*  1 |  HASH JOIN          |      |   800K|   464M|   220 |
|   2 |   TABLE ACCESS FULL | T3   |  2000 |   396K|     7 |
|*  3 |   HASH JOIN         |      | 40000 |    15M|    25 |
|   4 |    TABLE ACCESS FULL| T1   |  2000 |   396K|     7 |
|   5 |    TABLE ACCESS FULL| T2   |  2000 |   396K|     7 |
------------------------------------------------------------          

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."COL3"="T2"."COL2")
   3 - access("T2"."COL2"="T1"."COL1")

Check the predicate list. Although transitive closure tells us that Oracle can take “col1 = col2 and col2 = {constant}” and infer that “col1 = {constant}”, it has not used our two predicates in the query above to infer that “t1.col1 = t3.col3”. Transitive closure is relevant only if there are some constants in the mixture. So let’s demonstrate this by adding one more predicate to our original query – change the where clause to the following and check the new execution plan:



where
	t1.col1 = 99
and	t2.col2 = t1.col1
and	t3.col3 = t2.col2
;     

Execution Plan
----------------------------------------------------------
Plan hash value: 1573120526
------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  8000 |  4757K|    24 |
|*  1 |  HASH JOIN          |      |  8000 |  4757K|    24 |
|*  2 |   TABLE ACCESS FULL | T3   |    20 |  4060 |     7 |
|*  3 |   HASH JOIN         |      |   400 |   158K|    16 |
|*  4 |    TABLE ACCESS FULL| T1   |    20 |  4060 |     7 |
|*  5 |    TABLE ACCESS FULL| T2   |    20 |  4060 |     7 |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."COL3"="T2"."COL2")
   2 - filter("T3"."COL3"=99)
   3 - access("T2"."COL2"="T1"."COL1")
   4 - filter("T1"."COL1"=99)
   5 - filter("T2"."COL2"=99)

The plan (and its plan_hash_value) hasn’t changed of course, though if we had had some indexes in place we might have seen them coming into play. But notice how adding the one predicate “col1 = 99” has allowed Oracle to generate two more predicates – for col2 and col3 respectively. Note also how the estimated number of rows has dropped from 800K to 8,000.

Let’s try a second test: since the optimizer did not generate the predicate “t1.col1 = t3.col3” let’s see what happens if we add it by hand.


where
	t2.col2 = t1.col1
and	t3.col3 = t2.col2
and	t1.col1 = t3.col3
;   

Execution Plan
----------------------------------------------------------
Plan hash value: 1573120526
------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  8000 |  4757K|   144 |
|*  1 |  HASH JOIN          |      |  8000 |  4757K|   144 |
|   2 |   TABLE ACCESS FULL | T3   |  2000 |   396K|     7 |
|*  3 |   HASH JOIN         |      | 40000 |    15M|    25 |
|   4 |    TABLE ACCESS FULL| T1   |  2000 |   396K|     7 |
|   5 |    TABLE ACCESS FULL| T2   |  2000 |   396K|     7 |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."COL3"="T2"."COL2" AND "T1"."COL1"="T3"."COL3")
   3 - access("T2"."COL2"="T1"."COL1")

The plan still hasn’t changed of course; but look what our extra predicate has done to the estimated number of rows: once again it has dropped from 800K to 8,000. Our extra predicate has introduced an extra selectivity factor of 1/100 (there are 100 distinct values for t1.col1 and t3.col3) to the formula.

You might see this as a potential advantage, of course. If you can exert some influence over the optimizer’s arithmetic by adding redundant (but logically correct) predicates, it gives you a tool to deal with some of the problems that you might meet while trouble-shooting awkward SQL statements.

However, this approach does introduce a risk: the current behaviour is a mistake – the predicate is redundant, and should not affect the selectivity. If you take advantage of this design error in the optimizer, one day you will have to pay the price when Oracle finally corrects the error and the arithmetic changes to behave as if the predicate did not exist. So if you do fix a problem by adding redundant predicates, make sure you document what you did and why, so that it’s easier for the next person to fix when it all goes wrong again.