## May 18, 2010

### double trouble

Filed under: Execution plans,Performance,Tuning — Jonathan Lewis @ 7:06 pm BST May 18,2010

In the latest Quiz Night, I asked how you could make a query more efficient by changing a two table join into a three table join – with the clue that my third table was a repeat of the first table. Gary Myers, in comment 4,  provided the type of answer I was looking for. Sometimes it is more efficient to get a small amount of data from a table on a first pass then go back and get the rest of the data on a second pass – especially if the first pass is an ‘index only’ operation.

I’ve created a little demonstration that gives you some idea of the approach:

```
create table t1
as
with generator as (
select	--+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum					id,
mod(rownum,100)				mod1,
trunc(dbms_random.value(0,10000))	random1,
from
generator	v1,
generator	v2
where
rownum <= 100000
;

create table t2
as
with generator as (
select	--+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum					id,
mod(rownum,100)				mod2,
trunc(dbms_random.value(0,10000))	random2,
from
generator	v1,
generator	v2
where
rownum <= 100000
;

create index t1_i1 on t1(mod1, random1);
create index t2_i1 on t2(mod2, random2);

```

This creates two tables of 100,000 (fairly short) rows. Note the mod columns which return 1,000 rows per value, and the random columns which return approximately 10 rows per value. When I give Oracle the following query, it overestimates the final result set and chooses what I know to be a relatively resource-intensive execution plan:

```select
from
t1,	t2
where
t1.mod1 = 50
and	t2.random2 = t1.random1
and	t2.mod2 = 50
;

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1045 |   138K|   388 |
|*  1 |  HASH JOIN         |      |  1045 |   138K|   388 |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 | 68000 |   193 |
|*  3 |   TABLE ACCESS FULL| T2   |  1000 | 68000 |   193 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."RANDOM2"="T1"."RANDOM1")
2 - filter("T1"."MOD1"=50)
3 - filter("T2"."MOD2"=50)

```

This plan is dictated largely by the fact that I have to collect quite a lot of data from both tables then eliminate a large fraction of the data I have collected. This pattern is the driver for what I am about to do: I know that I want a small volume of data eventually but if I have to go to the table at every step of the plan then I will have to do a lot of redundant work and carry a lot of redundant data at some point. Remember – it’s often the case that “visiting the table” is the expensive part of any query.

```select
/*+
use_nl(t3)
rowid(t3)
*/
from
t1,
t2,
t1	t3
where
t1.mod1 = 50
and	t2.random2 = t1.random1
and	t2.mod2 = 50
and	t3.rowid = t1.rowid
;

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |  1045 |   163K|  1244 |
|   1 |  NESTED LOOPS               |       |  1045 |   163K|  1244 |
|*  2 |   HASH JOIN                 |       |  1045 | 90915 |   199 |
|*  3 |    INDEX RANGE SCAN         | T1_I1 |  1000 | 19000 |     4 |
|*  4 |    TABLE ACCESS FULL        | T2    |  1000 | 68000 |   193 |
|   5 |   TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
---------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL\$1
3 - SEL\$1 / T1@SEL\$1
4 - SEL\$1 / T2@SEL\$1
5 - SEL\$1 / T3@SEL\$1

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T2"."RANDOM2"="T1"."RANDOM1")
3 - access("T1"."MOD1"=50)
4 - filter("T2"."MOD2"=50)

```

I don’t think the optimizer can generate a plan like this at present – but I may be wrong. I’ve reduced my workload by taking advantage of an existing index on table t1 to do a range scan that picks up only the columns that I need to join to t2. In this case the t2 access path was still a full tablescan – but even so I have reduced the workload against t1, and by the time I revisit it by rowid I will only be visiting the (relatively) small number of rows I really need.

(Left as an exercise to the reader: I could have written the query as a four part join, visiting both table segments by rowid for just those rows that I really needed; have a go, and check that you’ve got it right. Don’t forget that any references to the “non-index” columns that appear in the query have to be changed to reference the second occurrence of the table – note how I’ve changed t1.padding in my original query to t3.padding in the rewrite.)

Footnote:
If you think this type of path is a little odd take a look at the typical stucture of a nested loop join that appears under “nlj_batching” in 11g (this isnt the same t1 and t2 as above, by the way):

```select
/*+ ordered use_nl(t1) index(t1(n1)) */
t2.n1, t1.n2
from    t2,t1
where
t2.n2 = 45
and     t1.n1 = t2.n1;

------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |   225 |
|   1 |  NESTED LOOPS                |       |       |
|   2 |   NESTED LOOPS               |       |   225 |
|*  3 |    TABLE ACCESS FULL         | T2    |    15 |
|*  4 |    INDEX RANGE SCAN          | T1_I1 |    15 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1    |    15 |
------------------------------------------------------

```

Notice how Oracle can present a single join as two nested loops – one into the index and a second into the table. This is why I think there may be options within the optimizer to do my little trick automatically – if not now, then soon.

### Update June 2012

I’ve just had an exchange of email with an Oak Table member who has pointed me to US patent 8103658 (dated November 2009) – which looks like a remarkably good description of this technique. So maybe the method will become an automatic option for the optimizer some time in the next couple of years.

1. Jonathan,

thank you for this interesting note.

a. question
how do you define “more efficient” ?

b. note
I tried to reproduce your results on
11.2.0.1.0 – 64bit
Red Hat Enterprise Linux AS release 5.3
(system statistics on)
and I get

```11.2.0.1.0 > set autotr traceonly
11.2.0.1.0 > select
/*+
use_nl(t3)
rowid(t3)
*/
from
t1,
t2,
t1	t3
where
t1.mod1 = 50
and	t2.random2 = t1.random1
and	t2.mod2 = 50
and	t3.rowid = t1.ro  2  wid
;
3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
98 rows selected.

Elapsed: 00:00:00.04

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

---------------------------------------------------------------------------
| Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |  1045 |   151K|   259   (1)| 01:51:14 |
|*  1 |  HASH JOIN	   |	  |  1045 |   151K|   259   (1)| 01:51:14 |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 | 80000 |   129   (0)| 00:55:31 |
|*  3 |   TABLE ACCESS FULL| T2   |  1000 | 68000 |   129   (0)| 00:55:31 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T2"."RANDOM2"="T1"."RANDOM1")
2 - filter("T1"."MOD1"=50)
3 - filter("T2"."MOD2"=50)

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
2485  consistent gets
0  redo size
2464  bytes sent via SQL*Net to client
590  bytes received via SQL*Net from client
8  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
98  rows processed
```

Comment by Sokrates — May 18, 2010 @ 7:33 pm BST May 18,2010

• Sokrates,

(a): “more efficient” – generally I would interpret this as ‘gets the same result by doing less work’

(b): interesting – I wonder how clever 10g might get with the same type of strategy. It looks like 11.2 has manage to invoke “Join elimination”. I don’t have a copy of 11.2 at hand currently, but I’d check the 10053 trace for references to join elimination. If this is the case then you might want to check the effect of adding the NO_ELIMINATE_JOIN to the query (you may have to add a qb_name hint to the main query block (or make explicit reference to sel\$1) so that you have a named query block in the no_eliminate_join hint.

Comment by Jonathan Lewis — May 18, 2010 @ 11:34 pm BST May 18,2010

• Bingo !

10053 trace shows:
``` JE: Considering Join Elimination on query block SEL\$1 (#0) ************************* Join Elimination (JE) ************************* SQL:******* UNPARSED QUERY IS ******* SELECT /*+ LEADING ("T1" "T2" "T3") ROWID ("T3") USE_NL ("T3") */ "T3"."PADDING" "PADDING","T2"."PADDING" "PADDING" FROM "SOKRATES"."T1" "T1","SOKRATES"."T2" "T2","SOKRATES"."T1" "T3" WHERE "T1"."MOD1"=50 AND "T2"."RANDOM2"="T1"."RANDOM1" AND "T2"."MOD2"=50 AND "T3".ROWID="T1".ROWID JE: cfro: T1 objn:673336 col#:1001 dfro:T1 dcol#:1001 JE: cfro: T1 objn:673336 col#:1001 dfro:T1 dcol#:1001 Query block (0x2b6a8b3c6638) before join elimination: SQL:******* UNPARSED QUERY IS ******* SELECT /*+ LEADING ("T1" "T2" "T3") ROWID ("T3") USE_NL ("T3") */ "T3"."PADDING" "PADDING","T2"."PADDING" "PADDING" FROM "SOKRATES"."T1" "T1","SOKRATES"."T2" "T2","SOKRATES"."T1" "T3" WHERE "T1"."MOD1"=50 AND "T2"."RANDOM2"="T1"."RANDOM1" AND "T2"."MOD2"=50 AND "T1".ROWID="T3".ROWID JE: eliminate table: T1 (T3) JE: Replaced column: T3.PADDING with column: T1.PADDING Registered qb: SEL\$E9892BDE 0x8b3c6638 (JOIN REMOVED FROM QUERY BLOCK SEL\$1; SEL\$1; "T3"@"SEL\$1") --------------------- QUERY BLOCK SIGNATURE --------------------- signature (): qb_name=SEL\$E9892BDE nbfros=2 flg=0 fro(0): flg=0 objn=673336 hint_alias="T1"@"SEL\$1" fro(1): flg=0 objn=673337 hint_alias="T2"@"SEL\$1"```

``` ```

however, I don’t succeed in using NO_ELIMINATE_JOIN in order to let the Optimizer not eliminate the join.

Comment by Sokrates — May 19, 2010 @ 9:19 am BST May 19,2010

• Sokrates,

Did you try no_eliminate_join(t3) ? I failed to point out that you have to reference the alias of the table whose join should not be eliminated.

Comment by Jonathan Lewis — May 19, 2010 @ 9:52 am BST May 19,2010

2. you’re using hints anyway, so wouldn’t the following do the trick:

```select
/*+ INDEX (t1 t1_i1) INDEX(t2 t2_i1) */
from
t1,  t2
where
t1.mod1 = 50
and  t2.random2 = t1.random1
and  t2.mod2 = 50
;
```

Comment by Michael V — May 19, 2010 @ 8:57 am BST May 19,2010

• Michael,
If the query uses the two indexes in the way you suggest it has to visit all the rows that match mod1 = 50 before discarding about 90% of them after probing the t2 index. The intention is to avoid visiting the table for that 90%.

Don’t worry about trying to optimise this specific statement, by the way, it’s too trivial and too small to worry about – I’ve just used it to demonstrate the mechanism. The suggested exercise of turning it into a 4-table join is also not a realistic one, since a simple index hint to t2 will (in this case) allow Oracle to visit nothing more than the rows that are really needed and we don’t need to do anything clever to limit the number of block visits to t2.

Comment by Jonathan Lewis — May 19, 2010 @ 9:34 am BST May 19,2010

3. NO_ELIMINATE_JOIN worked in this form:

`no_eliminate_join(t1) no_eliminate_join(t3)`

Comment by Timur Akhmadeev — May 19, 2010 @ 10:17 am BST May 19,2010

• Timur,

Thanks.

An important trap in a case like this is to following human thinking rather than machine “thinking”. My first thought was “Oracle is eliminating t3”, but as your suggestion shows – the problem has symmetry, if you stop Oracle from eliminating t3 it can rewrite the query to eliminate t1.

Comment by Jonathan Lewis — May 23, 2010 @ 1:16 pm BST May 23,2010

4. you wrote
“Note the mod columns which return 1,000 rows per value, and the random columns which return approximately 100 rows per value.”

correct is:
the random columns which return approximately 10 rows per value.

``` select avg(co) from (select random1, count(*) co from t1 group by random1);```

``` ```

``` AVG(CO) ---------- 10 ```

and there are 10.000 distinct random values.

I assume it was a typo, when you replace “trunc(dbms_random.value(0,10000))” by “trunc(dbms_random.value(0,1000))”, your example still works.

However, I still don’t understand your statement “[Oracle] overestimates the final result set”
How do you see it overestimates the final result set ?

Comment by Sokrates — May 20, 2010 @ 9:42 am BST May 20,2010

• Sokrates,

Thanks for that – now corrected.
It was a typo in the original script, which I cloned for the second table. The rest of the posting is consistent with the script.
I saw it overestimating the result because I ran the query and got about 96 rows when the optimizer was predicting 1,000. (If you change the script, though, the actual result happens to come quite close to the prediction.)

Comment by Jonathan Lewis — May 23, 2010 @ 1:19 pm BST May 23,2010

5. This SQL is close to 4-table join, using “WITH” clause and materialize hint. I tested in a 11.2 database. In my testing, it did about 300 consistent gets, about 1400 less than 3-table join one. It only processed the rows that were in the final result set.

```with
v1 as (
select --+ materialize
rowid r1, mod1, random1
from t1
where mod1 = 50
),
v2 as (
select --+ materialize
rowid r2, mod2, random2
from t2
where mod2 = 50
),
v3 as (
select --+ materialize
v1.r1 r1, v2.r2 r2
from v1, v2
where v1.random1 = v2.random2
)
select --+ leading(v3 t1 v2) use_nl(v3) use_nl(t1)
from
t1, t2, v3
where
t1.rowid = v3.r1
and	t2.rowid = v3.r2
/
```

Comment by Bo — May 21, 2010 @ 12:03 am BST May 21,2010

• Bo,

Thanks for the example.

I really like the way you’ve used subquery factoring to introduce clarity to what you’re doing here – and your solution is valid. (The hint use_nl(v3) is presumably a typo for use_nl(t2) but on my machine the nested loop to t2 still appeared with the logical I/O you reported.)

A couple of general points, though. I would avoid using /*+ materialize */ unless really necessary since this will write and read to the temporary tablespace – the direct I/O may have a higher price than the saving you generate. Secondly, as a matter of personal taste – I would not have introduced the third subquery: the first two subqueries add clarity to the solution but, to me, the third one doesn’t add value. For my reference set of scripts, I’ve taken your idea and written a query that ends up joining v1, v2, t1, t2.

Comment by Jonathan Lewis — May 23, 2010 @ 1:27 pm BST May 23,2010

6. Hi Jonathan,
reading your interesting post I thought what would happen if I include an order as follows

```create table t1
as
SELECT * FROM (
with generator as (
select  --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum                  id,
mod(rownum,100)             mod1,
trunc(dbms_random.value(0,10000))   random1,
from
generator   v1,
generator   v2
where
rownum <= 100000)
ORDER BY MOD1, RANDOM1
;

create table t2
as
SELECT * FROM (
with generator as (
select  --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum                  id,
mod(rownum,100)             mod2,
trunc(dbms_random.value(0,10000))   random2,
from
generator   v1,
generator   v2
where
rownum <= 100000
)
ORDER BY  MOD2, RANDOM2
;
```

I'm expecting an improvement in the clustering factor of two indexes and that the data in the table are distributed in adjacent blocks.

I'm expecting that Oracle then uses the two indexes so efficient and that I should not recur to the use hints (A forcing more than a suggestion, which I do not like do unless you have a need), or use
this his valuable suggestions.

I'd like to have your confirmation on this consideration

thanks

Comment by Donatello Settembrino — May 21, 2010 @ 8:38 am BST May 21,2010

7. Donatello,

You are correct. If we have a system where the data we want is well placed, then we don’t have to play games to create “damage-limiting” access paths. But in the general case we usually find that if we physically arrange the data to suit one set of queries we have other queries that want it arranged differently.

Comment by Jonathan Lewis — May 23, 2010 @ 1:31 pm BST May 23,2010

8. Jonathan,
I missed the quiz night !.

Previously for these sql’s i used to use a subquery instead of adding the 3rd table.

This was the original query used by developers

```SELECT 	a.address1, a.address2, a.address3, b.cust_day_phone, b.DOB
CUSTOMER b,
CUST_CONTACT_REL c
WHERE 	a.cms_acct_nbr = b.cms_acct_nbr
AND	b.cms_acct_nbr = c.cms_acct_nbr
AND 	a.STATE = 'ABCD'
AND 	a.CITY = 'DEF'
AND	b.F_NAME = 'GHI'
```

After applying the predicates, i get 45093 rows from CUSTOMER table and 4214808 rows from ADDRESS table. After joining them i get 1018.

I created the following indexes

composite index on STATE, CITY & CMS_ACCT_NBR on ADDRESS table.
composite index on F_NAME, CMS_ACCT_NBR on CUSTOMER table.

To use hash join after a range scan on these indexes i used this subquery

```SELECT 	a.address1, a.address2, a.address3, b.cust_day_phone, b.DOB
FROM 	(
select	a1.cms_acct_nbr
CUSTOMER    b1
WHERE 	a1.STATE = 'AP'
AND   	b1.F_NAME = 'VEERA'
AND   	a1.cms_acct_nbr = b1.cms_acct_nbr
) iv,
CUSTOMER b,
CUST_CONTACT_REL c
WHERE 	iv.cms_acct_nbr = a.cms_acct_nbr
AND	a.cms_acct_nbr = b.cms_acct_nbr
AND	b.cms_acct_nbr = c.cms_acct_nbr
```

I missed the idea, i could have used the rowid’s

thanks & regards
srivenu

Comment by srivenu — May 28, 2010 @ 1:04 pm BST May 28,2010

• Srivenu,

I missed the idea, i could have used the rowid’s

The use of rowids is just a little bonus – you got the important bit, which is recognising that you can use an index as if it were a just another table.

Comment by Jonathan Lewis — May 29, 2010 @ 10:10 am BST May 29,2010

9. […] Manual Optimisation Filed under: Execution plans,Hints,Indexing,Tuning — Jonathan Lewis @ 6:00 pm UTC Oct 8,2010 Here’s an example of “creative SQL” that I wrote in response to a question on OTN about combining data from two indexes to optimise access to a table. It demonstrates the principle that you can treat an index as a special case of a table – allowing you to make a query go faster by referencing the same table more times. […]

Pingback by Manual Optimisation « Oracle Scratchpad — October 8, 2010 @ 6:04 pm BST Oct 8,2010

10. […] another example of referencing a table twice (or three times) in the query because multiple references allow you to define a better execution […]

Pingback by Index Join – 2 « Oracle Scratchpad — November 26, 2010 @ 6:39 pm BST Nov 26,2010

11. […] while ago I published a note explaining how it was possible to find queries which ran faster if you manually de-coupled the index and table accesses. Here’s a further example that came up in discussion on a client site recently. The query […]

Pingback by Star Transformation « Oracle Scratchpad — April 22, 2011 @ 6:16 pm BST Apr 22,2011

12. In the situation that you’ve described it this post it make sence to put the sorting before join by rowid.
I’ve changed distribution of the data (marked by — *) so that advantage will be particularly noticeable.
As I run the script on the 11.2.0.1 I will put “_optimizer_join_elimination_enabled” into the your variant of the query.
I have also provided the query with enhancement for both tables – the second query. I’ve copied the outline data into the hint.
And the third query is the same query but with ordering by rowid before the joining.
The results for the “consistent gets” are 2136 for first, 1365 for second, 34 for third.
Of course, the distribution is too specific, in the real situation the advantage may not be so large.

The idea is that Oracle have to jump between the few blocks because of after the HJ the rowids will be misordered.

Test script:

```drop table t1 purge;
drop table t2 purge;

create table t1
as
with generator as (
select  --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum id,
trunc(rownum / 1000) mod1, -- *
trunc(dbms_random.value(0,1000)) random1, -- *
from
generator  v1,
generator  v2
where
rownum <= 100000
;

create table t2
as
with generator as (
select  --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum id,
trunc(rownum / 1000) mod2, -- *
trunc(dbms_random.value(0,1000)) random2, -- *
from
generator  v1,
generator  v2
where
rownum <= 100000
;

create index t1_i1 on t1(mod1, random1);
create index t2_i1 on t2(mod2, random2);

begin
dbms_stats.gather_table_stats(user, 't1', estimate_percent => 100, method_opt => 'for all columns size 1');
dbms_stats.gather_table_stats(user, 't2', estimate_percent => 100, method_opt => 'for all columns size 1');
end;
/

alter session set "_optimizer_cost_model" = io;

set autot on

-- first

select /*+
opt_param('_optimizer_join_elimination_enabled' 'false')
use_nl(t3)
rowid(t3)
*/
from t1
, t2
, t1 t3
where t1.mod1 = 50
and t2.random2 = t1.random1
and t2.mod2 = 50
and t3.rowid = t1.rowid;

-- second

select  /*+
OUTLINE_LEAF(@"SEL\$3")
OUTLINE_LEAF(@"SEL\$2")
OUTLINE_LEAF(@"SEL\$1")
NO_ACCESS(@"SEL\$1" "TT"@"SEL\$1")
USE_NL(@"SEL\$1" "T4"@"SEL\$1")
NO_ACCESS(@"SEL\$2" "T"@"SEL\$2")
USE_NL(@"SEL\$2" "T3"@"SEL\$2")
*/
from
(
select
t.rid2
from
(
select
t1.rowid rid1,
t2.rowid rid2
from t1, t2
where t1.mod1 = 50
and t2.random2 = t1.random1
and t2.mod2 = 50
) t
, t1 t3
where  t3.rowid = t.rid1
) tt
, t2 t4
where tt.rid2 = t4.rowid;

-- third

select /*+
OPT_PARAM('_optimizer_order_by_elimination_enabled' 'false')
OUTLINE_LEAF(@"SEL\$3")
OUTLINE_LEAF(@"SEL\$2")
OUTLINE_LEAF(@"SEL\$1")
NO_ACCESS(@"SEL\$1" "TT"@"SEL\$1")
ROWID(@"SEL\$1" "T4"@"SEL\$1")
USE_NL(@"SEL\$1" "T4"@"SEL\$1")
NO_ACCESS(@"SEL\$2" "T"@"SEL\$2")
ROWID(@"SEL\$2" "T3"@"SEL\$2")
USE_NL(@"SEL\$2" "T3"@"SEL\$2")
INDEX(@"SEL\$3" "T1"@"SEL\$3" ("T1"."MOD1" "T1"."RANDOM1"))
INDEX(@"SEL\$3" "T2"@"SEL\$3" ("T2"."MOD2" "T2"."RANDOM2"))
USE_HASH(@"SEL\$3" "T2"@"SEL\$3")
*/
from
(
select
t.rid2
from
(
select
t1.rowid rid1,
t2.rowid rid2
from t1, t2
where t1.mod1 = 50
and t2.random2 = t1.random1
and t2.mod2 = 50
order by rid1
) t
, t1 t3
where t3.rowid = t.rid1
order by rid2
) tt
, t2 t4
where tt.rid2 = t4.rowid;```

Results:

```
SQL*Plus: Release 11.2.0.1.0 Production on Tue Jan 31 21:08:42 2012

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> drop table t1 purge;

Table dropped.

SQL> drop table t2 purge;

Table dropped.

SQL>
SQL> create table t1
2  as
3  with generator as (
4    select  --+ materialize
5      rownum id
6    from dual
7    connect by
8      rownum <= 10000
9  )
10  select
11    rownum id,
12    trunc(rownum / 1000) mod1, -- *
13    trunc(dbms_random.value(0,1000)) random1, -- *
16  from
17    generator  v1,
18    generator  v2
19  where
20    rownum <= 100000
21  ;

Table created.

SQL>
SQL> create table t2
2  as
3  with generator as (
4    select  --+ materialize
5      rownum id
6    from dual
7    connect by
8      rownum <= 10000
9  )
10  select
11    rownum id,
12    trunc(rownum / 1000) mod2, -- *
13    trunc(dbms_random.value(0,1000)) random2, -- *
16  from
17    generator  v1,
18    generator  v2
19  where
20    rownum <= 100000
21  ;

Table created.

SQL>
SQL> create index t1_i1 on t1(mod1, random1);

Index created.

SQL> create index t2_i1 on t2(mod2, random2);

Index created.

SQL>
SQL> begin
2    dbms_stats.gather_table_stats(user, 't1', estimate_percent => 100, method_opt => 'for all columns size 1');
3    dbms_stats.gather_table_stats(user, 't2', estimate_percent => 100, method_opt => 'for all columns size 1');
4  end;
5  /

PL/SQL procedure successfully completed.

SQL>
SQL> alter session set "_optimizer_cost_model" = io;

Session altered.

SQL>
SQL> set autot on
SQL>
SQL> -- first
SQL>
SQL> select /*+
2           opt_param('_optimizer_join_elimination_enabled' 'false')
4           use_nl(t3)
5           rowid(t3)
6         */
8  from t1
9     , t2
10     , t1 t3
11  where t1.mod1 = 50
12    and t2.random2 = t1.random1
13    and t2.mod2 = 50
14    and t3.rowid = t1.rowid;

CNT1       CNT2
---------- ----------
974        974

Execution Plan
----------------------------------------------------------
Plan hash value: 3542293915

----------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |   160 |  1753 |
|   1 |  SORT AGGREGATE              |       |     1 |   160 |       |
|   2 |   NESTED LOOPS               |       |  1554 |   242K|  1753 |
|*  3 |    HASH JOIN                 |       |  1554 |   132K|   199 |
|*  4 |     INDEX RANGE SCAN         | T1_I1 |   990 | 18810 |     4 |
|*  5 |     TABLE ACCESS FULL        | T2    |   990 | 67320 |   193 |
|   6 |    TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T2"."RANDOM2"="T1"."RANDOM1")
4 - access("T1"."MOD1"=50)
5 - filter("T2"."MOD2"=50)

Note
-----
- cpu costing is off (consider enabling it)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
2136  consistent gets
0  redo size
481  bytes sent via SQL*Net to client
416  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

SQL>
SQL> -- second
SQL>
SQL> select  /*+
2            OUTLINE_LEAF(@"SEL\$3")
3            OUTLINE_LEAF(@"SEL\$2")
4            OUTLINE_LEAF(@"SEL\$1")
5            NO_ACCESS(@"SEL\$1" "TT"@"SEL\$1")
7            USE_NL(@"SEL\$1" "T4"@"SEL\$1")
8            NO_ACCESS(@"SEL\$2" "T"@"SEL\$2")
10            USE_NL(@"SEL\$2" "T3"@"SEL\$2")
12          */
14  from
15  (
16    select
18          t.rid2
19    from
20    (
21      select
22             t1.rowid rid1,
23             t2.rowid rid2
24      from t1, t2
25      where t1.mod1 = 50
26        and t2.random2 = t1.random1
27        and t2.mod2 = 50
28    ) t
29    , t1 t3
30    where  t3.rowid = t.rid1
31  ) tt
32  , t2 t4
33  where tt.rid2 = t4.rowid;

CNT1       CNT2
---------- ----------
974        974

Execution Plan
----------------------------------------------------------
Plan hash value: 1174882150

------------------------------------------------------------------------
| Id  | Operation                      | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |     1 |   117 |  3118 |
|   1 |  SORT AGGREGATE                |       |     1 |   117 |       |
|   2 |   NESTED LOOPS                 |       |  1554 |   177K|  3118 |
|   3 |    VIEW                        |       |  1554 | 68376 |  1564 |
|   4 |     NESTED LOOPS               |       |  1554 |   147K|  1564 |
|   5 |      VIEW                      |       |  1554 | 37296 |    10 |
|*  6 |       HASH JOIN                |       |  1554 | 59052 |    10 |
|*  7 |        INDEX RANGE SCAN        | T1_I1 |   990 | 18810 |     4 |
|*  8 |        INDEX RANGE SCAN        | T2_I1 |   990 | 18810 |     4 |
|   9 |      TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
|  10 |    TABLE ACCESS BY USER ROWID  | T2    |     1 |    73 |     1 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("T2"."RANDOM2"="T1"."RANDOM1")
7 - access("T1"."MOD1"=50)
8 - access("T2"."MOD2"=50)

Note
-----
- cpu costing is off (consider enabling it)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
1365  consistent gets
0  redo size
481  bytes sent via SQL*Net to client
416  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

SQL>
SQL> -- third
SQL>
SQL> select /*+
2           OPT_PARAM('_optimizer_order_by_elimination_enabled' 'false')
3           OUTLINE_LEAF(@"SEL\$3")
4           OUTLINE_LEAF(@"SEL\$2")
5           OUTLINE_LEAF(@"SEL\$1")
6           NO_ACCESS(@"SEL\$1" "TT"@"SEL\$1")
7           ROWID(@"SEL\$1" "T4"@"SEL\$1")
9           USE_NL(@"SEL\$1" "T4"@"SEL\$1")
10           NO_ACCESS(@"SEL\$2" "T"@"SEL\$2")
11           ROWID(@"SEL\$2" "T3"@"SEL\$2")
13           USE_NL(@"SEL\$2" "T3"@"SEL\$2")
14           INDEX(@"SEL\$3" "T1"@"SEL\$3" ("T1"."MOD1" "T1"."RANDOM1"))
15           INDEX(@"SEL\$3" "T2"@"SEL\$3" ("T2"."MOD2" "T2"."RANDOM2"))
17          USE_HASH(@"SEL\$3" "T2"@"SEL\$3")
18         */
20  from
21  (
22    select
24          t.rid2
25    from (select
26                 t1.rowid rid1,
27                 t2.rowid rid2
28          from t1, t2
29          where t1.mod1 = 50
30            and t2.random2 = t1.random1
31            and t2.mod2 = 50
32          order by rid1) t
33       , t1 t3
34    where    t3.rowid = t.rid1
35    order by rid2
36  ) tt
37  , t2 t4
38  where tt.rid2 = t4.rowid;

CNT1       CNT2
---------- ----------
974        974

Execution Plan
----------------------------------------------------------
Plan hash value: 2600095256

-------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |       |     1 |   117 |  3210 |
|   1 |  SORT AGGREGATE                 |       |     1 |   117 |       |
|   2 |   NESTED LOOPS                  |       |  1554 |   177K|  3210 |
|   3 |    VIEW                         |       |  1554 | 68376 |  1656 |
|   4 |     SORT ORDER BY               |       |  1554 |   147K|  1656 |
|   5 |      NESTED LOOPS               |       |  1554 |   147K|  1603 |
|   6 |       VIEW                      |       |  1554 | 37296 |    49 |
|   7 |        SORT ORDER BY            |       |  1554 | 59052 |    49 |
|*  8 |         HASH JOIN               |       |  1554 | 59052 |    10 |
|*  9 |          INDEX RANGE SCAN       | T1_I1 |   990 | 18810 |     4 |
|* 10 |          INDEX RANGE SCAN       | T2_I1 |   990 | 18810 |     4 |
|  11 |       TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
|  12 |    TABLE ACCESS BY USER ROWID   | T2    |     1 |    73 |     1 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("T2"."RANDOM2"="T1"."RANDOM1")
9 - access("T1"."MOD1"=50)
10 - access("T2"."MOD2"=50)

Note
-----
- cpu costing is off (consider enabling it)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
34  consistent gets
0  redo size
481  bytes sent via SQL*Net to client
416  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
2  sorts (memory)
0  sorts (disk)
1  rows processed

SQL>```

Comment by Valentin Nikotin — January 31, 2012 @ 7:21 pm BST Jan 31,2012

13. […] my old “two-step” approach to visiting tables and indexes. Get the rowids you really need, and visit the table later. In this case, though, I’ve sorted […]

Pingback by Compression Units – 5 « Oracle Scratchpad — August 19, 2012 @ 6:03 pm BST Aug 19,2012

14. […] written several examples of how we can optimise SQL manually by rewriting it to introduce extra copies of some of the tables (typically in a fashion analogous to the optimizer’s mechanism for star […]

Pingback by Table Duplication | Oracle Scratchpad — January 3, 2015 @ 11:54 am BST Jan 3,2015

15. […] of course, is just applying two principles that I’ve discussed before: (a) be selective about using a table twice to reduce the workload, (b) aggregate early if you can reduce the scale of the […]

Pingback by Cartesian join | Oracle Scratchpad — April 15, 2015 @ 6:41 pm BST Apr 15,2015