Oracle Scratchpad

May 18, 2010

double trouble

Filed under: Execution plans,Performance,Tuning — Jonathan Lewis @ 7:06 pm BST May 18,2010

In the latest Quiz Night, I asked how you could make a query more efficient by changing a two table join into a three table join – with the clue that my third table was a repeat of the first table. Gary Myers, in comment 4,  provided the type of answer I was looking for. Sometimes it is more efficient to get a small amount of data from a table on a first pass then go back and get the rest of the data on a second pass – especially if the first pass is an ‘index only’ operation.

I’ve created a little demonstration that gives you some idea of the approach:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum					id,
	mod(rownum,100)				mod1,
	trunc(dbms_random.value(0,10000))	random1,
	lpad(rownum,10,'0')			small_vc,
	rpad('x',60)				padding
from
	generator	v1,
	generator	v2
where
	rownum <= 100000
;

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum					id,
	mod(rownum,100)				mod2,
	trunc(dbms_random.value(0,10000))	random2,
	lpad(rownum,10,'0')			small_vc,
	rpad('x',60)				padding
from
	generator	v1,
	generator	v2
where
	rownum <= 100000
;

create index t1_i1 on t1(mod1, random1);
create index t2_i1 on t2(mod2, random2);

This creates two tables of 100,000 (fairly short) rows. Note the mod columns which return 1,000 rows per value, and the random columns which return approximately 10 rows per value. When I give Oracle the following query, it overestimates the final result set and chooses what I know to be a relatively resource-intensive execution plan:

select
	t1.padding,
	t2.padding
from
	t1,	t2
where
	t1.mod1 = 50
and	t2.random2 = t1.random1
and	t2.mod2 = 50
;

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1045 |   138K|   388 |
|*  1 |  HASH JOIN         |      |  1045 |   138K|   388 |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 | 68000 |   193 |
|*  3 |   TABLE ACCESS FULL| T2   |  1000 | 68000 |   193 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."RANDOM2"="T1"."RANDOM1")
   2 - filter("T1"."MOD1"=50)
   3 - filter("T2"."MOD2"=50)

This plan is dictated largely by the fact that I have to collect quite a lot of data from both tables then eliminate a large fraction of the data I have collected. This pattern is the driver for what I am about to do: I know that I want a small volume of data eventually but if I have to go to the table at every step of the plan then I will have to do a lot of redundant work and carry a lot of redundant data at some point. Remember – it’s often the case that “visiting the table” is the expensive part of any query.

select
	/*+
		leading(t1 t2 t3)
		use_nl(t3)
		rowid(t3)
	*/
	t3.padding,
	t2.padding
from
	t1,
	t2,
	t1	t3
where
	t1.mod1 = 50
and	t2.random2 = t1.random1
and	t2.mod2 = 50
and	t3.rowid = t1.rowid
;

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |  1045 |   163K|  1244 |
|   1 |  NESTED LOOPS               |       |  1045 |   163K|  1244 |
|*  2 |   HASH JOIN                 |       |  1045 | 90915 |   199 |
|*  3 |    INDEX RANGE SCAN         | T1_I1 |  1000 | 19000 |     4 |
|*  4 |    TABLE ACCESS FULL        | T2    |  1000 | 68000 |   193 |
|   5 |   TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
---------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   3 - SEL$1 / T1@SEL$1
   4 - SEL$1 / T2@SEL$1
   5 - SEL$1 / T3@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."RANDOM2"="T1"."RANDOM1")
   3 - access("T1"."MOD1"=50)
   4 - filter("T2"."MOD2"=50)

I don’t think the optimizer can generate a plan like this at present – but I may be wrong. I’ve reduced my workload by taking advantage of an existing index on table t1 to do a range scan that picks up only the columns that I need to join to t2. In this case the t2 access path was still a full tablescan – but even so I have reduced the workload against t1, and by the time I revisit it by rowid I will only be visiting the (relatively) small number of rows I really need.

(Left as an exercise to the reader: I could have written the query as a four part join, visiting both table segments by rowid for just those rows that I really needed; have a go, and check that you’ve got it right. Don’t forget that any references to the “non-index” columns that appear in the query have to be changed to reference the second occurrence of the table – note how I’ve changed t1.padding in my original query to t3.padding in the rewrite.)

Footnote:
If you think this type of path is a little odd take a look at the typical stucture of a nested loop join that appears under “nlj_batching” in 11g (this isnt the same t1 and t2 as above, by the way):

select
        /*+ ordered use_nl(t1) index(t1(n1)) */
        t2.n1, t1.n2
from    t2,t1
where
        t2.n2 = 45
and     t1.n1 = t2.n1;

------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |   225 |
|   1 |  NESTED LOOPS                |       |       |
|   2 |   NESTED LOOPS               |       |   225 |
|*  3 |    TABLE ACCESS FULL         | T2    |    15 |
|*  4 |    INDEX RANGE SCAN          | T1_I1 |    15 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1    |    15 |
------------------------------------------------------

Notice how Oracle can present a single join as two nested loops – one into the index and a second into the table. This is why I think there may be options within the optimizer to do my little trick automatically – if not now, then soon.

Update June 2012

I’ve just had an exchange of email with an Oak Table member who has pointed me to US patent 8103658 (dated November 2009) – which looks like a remarkably good description of this technique. So maybe the method will become an automatic option for the optimizer some time in the next couple of years.

21 Comments »

  1. Jonathan,

    thank you for this interesting note.

    a. question
    how do you define “more efficient” ?

    b. note
    I tried to reproduce your results on
    11.2.0.1.0 – 64bit
    Red Hat Enterprise Linux AS release 5.3
    (system statistics on)
    and I get

    11.2.0.1.0 > set autotr traceonly
    11.2.0.1.0 > select
    	/*+
    		leading(t1 t2 t3)
    		use_nl(t3)
    		rowid(t3)
    	*/
    	t3.padding,
    	t2.padding
    from
    	t1,
    	t2,
    	t1	t3
    where
    	t1.mod1 = 50
    and	t2.random2 = t1.random1
    and	t2.mod2 = 50
    and	t3.rowid = t1.ro  2  wid
    ;
      3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18  
    98 rows selected.
    
    Elapsed: 00:00:00.04
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1838229974
    
    ---------------------------------------------------------------------------
    | Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |	  |  1045 |   151K|   259   (1)| 01:51:14 |
    |*  1 |  HASH JOIN	   |	  |  1045 |   151K|   259   (1)| 01:51:14 |
    |*  2 |   TABLE ACCESS FULL| T1   |  1000 | 80000 |   129   (0)| 00:55:31 |
    |*  3 |   TABLE ACCESS FULL| T2   |  1000 | 68000 |   129   (0)| 00:55:31 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("T2"."RANDOM2"="T1"."RANDOM1")
       2 - filter("T1"."MOD1"=50)
       3 - filter("T2"."MOD2"=50)
    
    
    Statistics
    ----------------------------------------------------------
    	  0  recursive calls
    	  0  db block gets
           2485  consistent gets
    	  0  physical reads
    	  0  redo size
           2464  bytes sent via SQL*Net to client
    	590  bytes received via SQL*Net from client
    	  8  SQL*Net roundtrips to/from client
    	  0  sorts (memory)
    	  0  sorts (disk)
    	 98  rows processed
    

    Comment by Sokrates — May 18, 2010 @ 7:33 pm BST May 18,2010 | Reply

    • Sokrates,

      (a): “more efficient” – generally I would interpret this as ‘gets the same result by doing less work’

      (b): interesting – I wonder how clever 10g might get with the same type of strategy. It looks like 11.2 has manage to invoke “Join elimination”. I don’t have a copy of 11.2 at hand currently, but I’d check the 10053 trace for references to join elimination. If this is the case then you might want to check the effect of adding the NO_ELIMINATE_JOIN to the query (you may have to add a qb_name hint to the main query block (or make explicit reference to sel$1) so that you have a named query block in the no_eliminate_join hint.

      Comment by Jonathan Lewis — May 18, 2010 @ 11:34 pm BST May 18,2010 | Reply

      • Bingo !

        10053 trace shows:

        JE: Considering Join Elimination on query block SEL$1 (#0)
        *************************
        Join Elimination (JE)
        *************************
        SQL:******* UNPARSED QUERY IS *******
        SELECT /*+ LEADING ("T1" "T2" "T3") ROWID ("T3") USE_NL ("T3") */ "T3"."PADDING" "PADDING","T2"."PADDING" "PADDING" FROM "SOKRATES"."T1" "T1","SOKRATES"."T2" "T2","SOKRATES"."T1" "T3" WHERE "T1"."MOD1"=50 AND "T2"."RANDOM2"="T1"."RANDOM1" AND "T2"."MOD2"=50 AND "T3".ROWID="T1".ROWID
        JE: cfro: T1 objn:673336 col#:1001 dfro:T1 dcol#:1001
        JE: cfro: T1 objn:673336 col#:1001 dfro:T1 dcol#:1001
        Query block (0x2b6a8b3c6638) before join elimination:
        SQL:******* UNPARSED QUERY IS *******
        SELECT /*+ LEADING ("T1" "T2" "T3") ROWID ("T3") USE_NL ("T3") */ "T3"."PADDING" "PADDING","T2"."PADDING" "PADDING" FROM "SOKRATES"."T1" "T1","SOKRATES"."T2" "T2","SOKRATES"."T1" "T3" WHERE "T1"."MOD1"=50 AND "T2"."RANDOM2"="T1"."RANDOM1" AND "T2"."MOD2"=50 AND "T1".ROWID="T3".ROWID
        JE: eliminate table: T1 (T3)
        JE: Replaced column: T3.PADDING with column: T1.PADDING
        Registered qb: SEL$E9892BDE 0x8b3c6638 (JOIN REMOVED FROM QUERY BLOCK SEL$1; SEL$1; "T3"@"SEL$1")
        ---------------------
        QUERY BLOCK SIGNATURE
        ---------------------
        signature (): qb_name=SEL$E9892BDE nbfros=2 flg=0
        fro(0): flg=0 objn=673336 hint_alias="T1"@"SEL$1"
        fro(1): flg=0 objn=673337 hint_alias="T2"@"SEL$1"

        however, I don’t succeed in using NO_ELIMINATE_JOIN in order to let the Optimizer not eliminate the join.

        Comment by Sokrates — May 19, 2010 @ 9:19 am BST May 19,2010 | Reply

        • Sokrates,

          Did you try no_eliminate_join(t3) ? I failed to point out that you have to reference the alias of the table whose join should not be eliminated.

          Comment by Jonathan Lewis — May 19, 2010 @ 9:52 am BST May 19,2010

  2. you’re using hints anyway, so wouldn’t the following do the trick:

    select
      /*+ INDEX (t1 t1_i1) INDEX(t2 t2_i1) */
      t1.padding,
      t2.padding
    from
      t1,  t2
    where
      t1.mod1 = 50
    and  t2.random2 = t1.random1
    and  t2.mod2 = 50
    ;
    

    Comment by Michael V — May 19, 2010 @ 8:57 am BST May 19,2010 | Reply

    • Michael,
      If the query uses the two indexes in the way you suggest it has to visit all the rows that match mod1 = 50 before discarding about 90% of them after probing the t2 index. The intention is to avoid visiting the table for that 90%.

      Don’t worry about trying to optimise this specific statement, by the way, it’s too trivial and too small to worry about – I’ve just used it to demonstrate the mechanism. The suggested exercise of turning it into a 4-table join is also not a realistic one, since a simple index hint to t2 will (in this case) allow Oracle to visit nothing more than the rows that are really needed and we don’t need to do anything clever to limit the number of block visits to t2.

      Comment by Jonathan Lewis — May 19, 2010 @ 9:34 am BST May 19,2010 | Reply

  3. NO_ELIMINATE_JOIN worked in this form:

    no_eliminate_join(t1) no_eliminate_join(t3)

    Comment by Timur Akhmadeev — May 19, 2010 @ 10:17 am BST May 19,2010 | Reply

    • Timur,

      Thanks.

      An important trap in a case like this is to following human thinking rather than machine “thinking”. My first thought was “Oracle is eliminating t3″, but as your suggestion shows – the problem has symmetry, if you stop Oracle from eliminating t3 it can rewrite the query to eliminate t1.

      Comment by Jonathan Lewis — May 23, 2010 @ 1:16 pm BST May 23,2010 | Reply

  4. you wrote
    “Note the mod columns which return 1,000 rows per value, and the random columns which return approximately 100 rows per value.”

    correct is:
    the random columns which return approximately 10 rows per value.


    select avg(co) from (select random1, count(*) co from t1 group by random1);

    AVG(CO)
    ----------
    10

    and there are 10.000 distinct random values.

    I assume it was a typo, when you replace “trunc(dbms_random.value(0,10000))” by “trunc(dbms_random.value(0,1000))”, your example still works.

    However, I still don’t understand your statement “[Oracle] overestimates the final result set”
    How do you see it overestimates the final result set ?

    Comment by Sokrates — May 20, 2010 @ 9:42 am BST May 20,2010 | Reply

    • Sokrates,

      Thanks for that – now corrected.
      It was a typo in the original script, which I cloned for the second table. The rest of the posting is consistent with the script.
      I saw it overestimating the result because I ran the query and got about 96 rows when the optimizer was predicting 1,000. (If you change the script, though, the actual result happens to come quite close to the prediction.)

      Comment by Jonathan Lewis — May 23, 2010 @ 1:19 pm BST May 23,2010 | Reply

  5. This SQL is close to 4-table join, using “WITH” clause and materialize hint. I tested in a 11.2 database. In my testing, it did about 300 consistent gets, about 1400 less than 3-table join one. It only processed the rows that were in the final result set.

    with
    v1 as (
    	select --+ materialize
    	rowid r1, mod1, random1
    	from t1
    	where mod1 = 50
    ),
    v2 as (
    	select --+ materialize
    	rowid r2, mod2, random2
    	from t2
    	where mod2 = 50
    ),
    v3 as (
    	select --+ materialize
    	v1.r1 r1, v2.r2 r2
    	from v1, v2
    	where v1.random1 = v2.random2
    )
    select --+ leading(v3 t1 v2) use_nl(v3) use_nl(t1)
    	t1.padding,
    	t2.padding
    from
    	t1, t2, v3
    where
    	t1.rowid = v3.r1
    and	t2.rowid = v3.r2
    /
    

    Comment by Bo — May 21, 2010 @ 12:03 am BST May 21,2010 | Reply

    • Bo,

      Thanks for the example.

      I really like the way you’ve used subquery factoring to introduce clarity to what you’re doing here – and your solution is valid. (The hint use_nl(v3) is presumably a typo for use_nl(t2) but on my machine the nested loop to t2 still appeared with the logical I/O you reported.)

      A couple of general points, though. I would avoid using /*+ materialize */ unless really necessary since this will write and read to the temporary tablespace – the direct I/O may have a higher price than the saving you generate. Secondly, as a matter of personal taste – I would not have introduced the third subquery: the first two subqueries add clarity to the solution but, to me, the third one doesn’t add value. For my reference set of scripts, I’ve taken your idea and written a query that ends up joining v1, v2, t1, t2.

      Comment by Jonathan Lewis — May 23, 2010 @ 1:27 pm BST May 23,2010 | Reply

  6. Hi Jonathan,
    reading your interesting post I thought what would happen if I include an order as follows

    create table t1
    	as
    	SELECT * FROM (
            with generator as (
    	    select  --+ materialize
    	        rownum id
    	    from dual
    	    connect by
    	        rownum <= 10000
    	)
    	select
    	    rownum                  id,
    	    mod(rownum,100)             mod1,
    	    trunc(dbms_random.value(0,10000))   random1,
    	    lpad(rownum,10,'0')         small_vc,
    	    rpad('x',60)                padding
    	from
    	    generator   v1,
    	    generator   v2
    	where
    	    rownum <= 100000)
    ORDER BY MOD1, RANDOM1
    	;
    	 
    	create table t2
    	as
    SELECT * FROM (
    	with generator as (
    	    select  --+ materialize
    	        rownum id
    	    from dual
    	    connect by
    	        rownum <= 10000
    	)
    	select
    	    rownum                  id,
    	    mod(rownum,100)             mod2,
    	    trunc(dbms_random.value(0,10000))   random2,
    	    lpad(rownum,10,'0')         small_vc,
    	    rpad('x',60)                padding
    	from
    	    generator   v1,
    	    generator   v2
    	where
    	    rownum <= 100000
    )
    ORDER BY  MOD2, RANDOM2
    	;
    

    I'm expecting an improvement in the clustering factor of two indexes and that the data in the table are distributed in adjacent blocks.

    I'm expecting that Oracle then uses the two indexes so efficient and that I should not recur to the use hints (A forcing more than a suggestion, which I do not like do unless you have a need), or use
    this his valuable suggestions.

    I'd like to have your confirmation on this consideration

    thanks

    Comment by Donatello Settembrino — May 21, 2010 @ 8:38 am BST May 21,2010 | Reply

  7. Donatello,

    You are correct. If we have a system where the data we want is well placed, then we don’t have to play games to create “damage-limiting” access paths. But in the general case we usually find that if we physically arrange the data to suit one set of queries we have other queries that want it arranged differently.

    Comment by Jonathan Lewis — May 23, 2010 @ 1:31 pm BST May 23,2010 | Reply

  8. Jonathan,
    I missed the quiz night !.

    Previously for these sql’s i used to use a subquery instead of adding the 3rd table.

    This was the original query used by developers

    SELECT 	a.address1, a.address2, a.address3, b.cust_day_phone, b.DOB
    FROM 	ADDRESS a, 
    	CUSTOMER b,
    	CUST_CONTACT_REL c
    WHERE 	a.cms_acct_nbr = b.cms_acct_nbr
    AND	b.cms_acct_nbr = c.cms_acct_nbr
    AND 	a.STATE = 'ABCD'
    AND 	a.CITY = 'DEF'
    AND	b.F_NAME = 'GHI' 
    AND 	c.address_type = 'B'
    

    After applying the predicates, i get 45093 rows from CUSTOMER table and 4214808 rows from ADDRESS table. After joining them i get 1018.

    I created the following indexes

    composite index on STATE, CITY & CMS_ACCT_NBR on ADDRESS table.
    composite index on F_NAME, CMS_ACCT_NBR on CUSTOMER table.

    To use hash join after a range scan on these indexes i used this subquery

    SELECT 	a.address1, a.address2, a.address3, b.cust_day_phone, b.DOB
    FROM 	(
          	select	a1.cms_acct_nbr
          	from 	ADDRESS     a1,
                	CUSTOMER    b1
          	WHERE 	a1.STATE = 'AP'          
          	AND   	a1.CITY = 'HYDERABAD'
          	AND   	b1.F_NAME = 'VEERA'  
          	AND   	a1.cms_acct_nbr = b1.cms_acct_nbr
          	) iv,
    	ADDRESS a, 
    	CUSTOMER b,
    	CUST_CONTACT_REL c
    WHERE 	iv.cms_acct_nbr = a.cms_acct_nbr
    AND	a.cms_acct_nbr = b.cms_acct_nbr
    AND	b.cms_acct_nbr = c.cms_acct_nbr
    AND 	c.address_type = 'B'
    

    I missed the idea, i could have used the rowid’s

    thanks & regards
    srivenu

    Comment by srivenu — May 28, 2010 @ 1:04 pm BST May 28,2010 | Reply

    • Srivenu,

      I missed the idea, i could have used the rowid’s

      The use of rowids is just a little bonus – you got the important bit, which is recognising that you can use an index as if it were a just another table.

      Comment by Jonathan Lewis — May 29, 2010 @ 10:10 am BST May 29,2010 | Reply

  9. [...] Manual Optimisation Filed under: Execution plans,Hints,Indexing,Tuning — Jonathan Lewis @ 6:00 pm UTC Oct 8,2010 Here’s an example of “creative SQL” that I wrote in response to a question on OTN about combining data from two indexes to optimise access to a table. It demonstrates the principle that you can treat an index as a special case of a table – allowing you to make a query go faster by referencing the same table more times. [...]

    Pingback by Manual Optimisation « Oracle Scratchpad — October 8, 2010 @ 6:04 pm BST Oct 8,2010 | Reply

  10. [...] another example of referencing a table twice (or three times) in the query because multiple references allow you to define a better execution [...]

    Pingback by Index Join – 2 « Oracle Scratchpad — November 26, 2010 @ 6:39 pm BST Nov 26,2010 | Reply

  11. [...] while ago I published a note explaining how it was possible to find queries which ran faster if you manually de-coupled the index and table accesses. Here’s a further example that came up in discussion on a client site recently. The query [...]

    Pingback by Star Transformation « Oracle Scratchpad — April 22, 2011 @ 6:16 pm BST Apr 22,2011 | Reply

  12. In the situation that you’ve described it this post it make sence to put the sorting before join by rowid.
    I’ve changed distribution of the data (marked by — *) so that advantage will be particularly noticeable.
    As I run the script on the 11.2.0.1 I will put “_optimizer_join_elimination_enabled” into the your variant of the query.
    I have also provided the query with enhancement for both tables – the second query. I’ve copied the outline data into the hint.
    And the third query is the same query but with ordering by rowid before the joining.
    The results for the “consistent gets” are 2136 for first, 1365 for second, 34 for third.
    Of course, the distribution is too specific, in the real situation the advantage may not be so large.

    The idea is that Oracle have to jump between the few blocks because of after the HJ the rowids will be misordered.

    Test script:

    drop table t1 purge;
    drop table t2 purge;
    
    create table t1
    as
    with generator as (
      select  --+ materialize
        rownum id
      from dual
      connect by
        rownum <= 10000
    )
    select
      rownum id,
      trunc(rownum / 1000) mod1, -- *
      trunc(dbms_random.value(0,1000)) random1, -- *
      lpad(rownum,10,'0') small_vc,
      rpad('x',60) padding
    from
      generator  v1,
      generator  v2
    where
      rownum <= 100000
    ;
    
    create table t2
    as
    with generator as (
      select  --+ materialize
        rownum id
      from dual
      connect by
        rownum <= 10000
    )
    select
      rownum id,
      trunc(rownum / 1000) mod2, -- *
      trunc(dbms_random.value(0,1000)) random2, -- *
      lpad(rownum,10,'0') small_vc,
      rpad('x',60) padding
    from
      generator  v1,
      generator  v2
    where
      rownum <= 100000
    ;
    
    create index t1_i1 on t1(mod1, random1);
    create index t2_i1 on t2(mod2, random2);
    
    begin
      dbms_stats.gather_table_stats(user, 't1', estimate_percent => 100, method_opt => 'for all columns size 1');
      dbms_stats.gather_table_stats(user, 't2', estimate_percent => 100, method_opt => 'for all columns size 1');
    end;
    /
    
    alter session set "_optimizer_cost_model" = io;
    
    set autot on
    
    -- first
    
    select /*+
             opt_param('_optimizer_join_elimination_enabled' 'false')
             leading(t1 t2 t3)
             use_nl(t3)
             rowid(t3)
           */
           count(t3.padding) cnt1, count(t2.padding) cnt2
    from t1
       , t2
       , t1 t3
    where t1.mod1 = 50
      and t2.random2 = t1.random1
      and t2.mod2 = 50
      and t3.rowid = t1.rowid;
    
    -- second
    
    select  /*+
              OUTLINE_LEAF(@"SEL$3")
              OUTLINE_LEAF(@"SEL$2")
              OUTLINE_LEAF(@"SEL$1")
              NO_ACCESS(@"SEL$1" "TT"@"SEL$1")
              LEADING(@"SEL$1" "TT"@"SEL$1" "T4"@"SEL$1")
              USE_NL(@"SEL$1" "T4"@"SEL$1")
              NO_ACCESS(@"SEL$2" "T"@"SEL$2")
              LEADING(@"SEL$2" "T"@"SEL$2" "T3"@"SEL$2")
              USE_NL(@"SEL$2" "T3"@"SEL$2")
              LEADING(@"SEL$3" "T1"@"SEL$3" "T2"@"SEL$3")
            */
            count(tt.padding) cnt1, count(t4.padding) cnt2
    from
    (
      select 
        t3.padding, 
        t.rid2
      from
      (
        select
          t1.rowid rid1,
          t2.rowid rid2
        from t1, t2
        where t1.mod1 = 50
          and t2.random2 = t1.random1
          and t2.mod2 = 50
      ) t
      , t1 t3
    where  t3.rowid = t.rid1
    ) tt
    , t2 t4
    where tt.rid2 = t4.rowid;
    
    -- third
    
    select /*+
             OPT_PARAM('_optimizer_order_by_elimination_enabled' 'false')
             OUTLINE_LEAF(@"SEL$3")
             OUTLINE_LEAF(@"SEL$2")
             OUTLINE_LEAF(@"SEL$1")
             NO_ACCESS(@"SEL$1" "TT"@"SEL$1")
             ROWID(@"SEL$1" "T4"@"SEL$1")
             LEADING(@"SEL$1" "TT"@"SEL$1" "T4"@"SEL$1")
             USE_NL(@"SEL$1" "T4"@"SEL$1")
             NO_ACCESS(@"SEL$2" "T"@"SEL$2")
             ROWID(@"SEL$2" "T3"@"SEL$2")
             LEADING(@"SEL$2" "T"@"SEL$2" "T3"@"SEL$2")
             USE_NL(@"SEL$2" "T3"@"SEL$2")
             INDEX(@"SEL$3" "T1"@"SEL$3" ("T1"."MOD1" "T1"."RANDOM1"))
             INDEX(@"SEL$3" "T2"@"SEL$3" ("T2"."MOD2" "T2"."RANDOM2"))
             LEADING(@"SEL$3" "T1"@"SEL$3" "T2"@"SEL$3")
             USE_HASH(@"SEL$3" "T2"@"SEL$3")
           */
           count(tt.padding) cnt1, count(t4.padding) cnt2
    from
    (
      select
        t3.padding,
        t.rid2
      from 
      (
        select
          t1.rowid rid1,
          t2.rowid rid2
        from t1, t2
        where t1.mod1 = 50
          and t2.random2 = t1.random1
          and t2.mod2 = 50
        order by rid1
      ) t
      , t1 t3
      where t3.rowid = t.rid1
      order by rid2
    ) tt
    , t2 t4
    where tt.rid2 = t4.rowid;

    Results:

    
    SQL*Plus: Release 11.2.0.1.0 Production on Tue Jan 31 21:08:42 2012
    
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    
    SQL> drop table t1 purge;
    
    Table dropped.
    
    SQL> drop table t2 purge;
    
    Table dropped.
    
    SQL>
    SQL> create table t1
    2  as
    3  with generator as (
    4    select  --+ materialize
    5      rownum id
    6    from dual
    7    connect by
    8      rownum <= 10000
    9  )
    10  select
    11    rownum id,
    12    trunc(rownum / 1000) mod1, -- *
    13    trunc(dbms_random.value(0,1000)) random1, -- *
    14    lpad(rownum,10,'0') small_vc,
    15    rpad('x',60) padding
    16  from
    17    generator  v1,
    18    generator  v2
    19  where
    20    rownum <= 100000
    21  ;
    
    Table created.
    
    SQL>
    SQL> create table t2
    2  as
    3  with generator as (
    4    select  --+ materialize
    5      rownum id
    6    from dual
    7    connect by
    8      rownum <= 10000
    9  )
    10  select
    11    rownum id,
    12    trunc(rownum / 1000) mod2, -- *
    13    trunc(dbms_random.value(0,1000)) random2, -- *
    14    lpad(rownum,10,'0') small_vc,
    15    rpad('x',60) padding
    16  from
    17    generator  v1,
    18    generator  v2
    19  where
    20    rownum <= 100000
    21  ;
    
    Table created.
    
    SQL>
    SQL> create index t1_i1 on t1(mod1, random1);
    
    Index created.
    
    SQL> create index t2_i1 on t2(mod2, random2);
    
    Index created.
    
    SQL>
    SQL> begin
    2    dbms_stats.gather_table_stats(user, 't1', estimate_percent => 100, method_opt => 'for all columns size 1');
    3    dbms_stats.gather_table_stats(user, 't2', estimate_percent => 100, method_opt => 'for all columns size 1');
    4  end;
    5  /
    
    PL/SQL procedure successfully completed.
    
    SQL>
    SQL> alter session set "_optimizer_cost_model" = io;
    
    Session altered.
    
    SQL>
    SQL> set autot on
    SQL>
    SQL> -- first
    SQL>
    SQL> select /*+
    2           opt_param('_optimizer_join_elimination_enabled' 'false')
    3           leading(t1 t2 t3)
    4           use_nl(t3)
    5           rowid(t3)
    6         */
    7         count(t3.padding) cnt1, count(t2.padding) cnt2
    8  from t1
    9     , t2
    10     , t1 t3
    11  where t1.mod1 = 50
    12    and t2.random2 = t1.random1
    13    and t2.mod2 = 50
    14    and t3.rowid = t1.rowid;
    
    CNT1       CNT2
    ---------- ----------
    974        974
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3542293915
    
    ----------------------------------------------------------------------
    | Id  | Operation                    | Name  | Rows  | Bytes | Cost  |
    ----------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |       |     1 |   160 |  1753 |
    |   1 |  SORT AGGREGATE              |       |     1 |   160 |       |
    |   2 |   NESTED LOOPS               |       |  1554 |   242K|  1753 |
    |*  3 |    HASH JOIN                 |       |  1554 |   132K|   199 |
    |*  4 |     INDEX RANGE SCAN         | T1_I1 |   990 | 18810 |     4 |
    |*  5 |     TABLE ACCESS FULL        | T2    |   990 | 67320 |   193 |
    |   6 |    TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
    ----------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
    3 - access("T2"."RANDOM2"="T1"."RANDOM1")
    4 - access("T1"."MOD1"=50)
    5 - filter("T2"."MOD2"=50)
    
    Note
    -----
    - cpu costing is off (consider enabling it)
    
    Statistics
    ----------------------------------------------------------
    1  recursive calls
    0  db block gets
    2136  consistent gets
    0  physical reads
    0  redo size
    481  bytes sent via SQL*Net to client
    416  bytes received via SQL*Net from client
    2  SQL*Net roundtrips to/from client
    0  sorts (memory)
    0  sorts (disk)
    1  rows processed
    
    SQL>
    SQL> -- second
    SQL>
    SQL> select  /*+
    2            OUTLINE_LEAF(@"SEL$3")
    3            OUTLINE_LEAF(@"SEL$2")
    4            OUTLINE_LEAF(@"SEL$1")
    5            NO_ACCESS(@"SEL$1" "TT"@"SEL$1")
    6            LEADING(@"SEL$1" "TT"@"SEL$1" "T4"@"SEL$1")
    7            USE_NL(@"SEL$1" "T4"@"SEL$1")
    8            NO_ACCESS(@"SEL$2" "T"@"SEL$2")
    9            LEADING(@"SEL$2" "T"@"SEL$2" "T3"@"SEL$2")
    10            USE_NL(@"SEL$2" "T3"@"SEL$2")
    11            LEADING(@"SEL$3" "T1"@"SEL$3" "T2"@"SEL$3")
    12          */
    13          count(tt.padding) cnt1, count(t4.padding) cnt2
    14  from
    15  (
    16    select
    17          t3.padding,
    18          t.rid2
    19    from
    20    (
    21      select
    22             t1.rowid rid1,
    23             t2.rowid rid2
    24      from t1, t2
    25      where t1.mod1 = 50
    26        and t2.random2 = t1.random1
    27        and t2.mod2 = 50
    28    ) t
    29    , t1 t3
    30    where  t3.rowid = t.rid1
    31  ) tt
    32  , t2 t4
    33  where tt.rid2 = t4.rowid;
    
    CNT1       CNT2
    ---------- ----------
    974        974
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1174882150
    
    ------------------------------------------------------------------------
    | Id  | Operation                      | Name  | Rows  | Bytes | Cost  |
    ------------------------------------------------------------------------
    |   0 | SELECT STATEMENT               |       |     1 |   117 |  3118 |
    |   1 |  SORT AGGREGATE                |       |     1 |   117 |       |
    |   2 |   NESTED LOOPS                 |       |  1554 |   177K|  3118 |
    |   3 |    VIEW                        |       |  1554 | 68376 |  1564 |
    |   4 |     NESTED LOOPS               |       |  1554 |   147K|  1564 |
    |   5 |      VIEW                      |       |  1554 | 37296 |    10 |
    |*  6 |       HASH JOIN                |       |  1554 | 59052 |    10 |
    |*  7 |        INDEX RANGE SCAN        | T1_I1 |   990 | 18810 |     4 |
    |*  8 |        INDEX RANGE SCAN        | T2_I1 |   990 | 18810 |     4 |
    |   9 |      TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
    |  10 |    TABLE ACCESS BY USER ROWID  | T2    |     1 |    73 |     1 |
    ------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
    6 - access("T2"."RANDOM2"="T1"."RANDOM1")
    7 - access("T1"."MOD1"=50)
    8 - access("T2"."MOD2"=50)
    
    Note
    -----
    - cpu costing is off (consider enabling it)
    
    Statistics
    ----------------------------------------------------------
    1  recursive calls
    0  db block gets
    1365  consistent gets
    0  physical reads
    0  redo size
    481  bytes sent via SQL*Net to client
    416  bytes received via SQL*Net from client
    2  SQL*Net roundtrips to/from client
    0  sorts (memory)
    0  sorts (disk)
    1  rows processed
    
    SQL>
    SQL> -- third
    SQL>
    SQL> select /*+
    2           OPT_PARAM('_optimizer_order_by_elimination_enabled' 'false')
    3           OUTLINE_LEAF(@"SEL$3")
    4           OUTLINE_LEAF(@"SEL$2")
    5           OUTLINE_LEAF(@"SEL$1")
    6           NO_ACCESS(@"SEL$1" "TT"@"SEL$1")
    7           ROWID(@"SEL$1" "T4"@"SEL$1")
    8           LEADING(@"SEL$1" "TT"@"SEL$1" "T4"@"SEL$1")
    9           USE_NL(@"SEL$1" "T4"@"SEL$1")
    10           NO_ACCESS(@"SEL$2" "T"@"SEL$2")
    11           ROWID(@"SEL$2" "T3"@"SEL$2")
    12           LEADING(@"SEL$2" "T"@"SEL$2" "T3"@"SEL$2")
    13           USE_NL(@"SEL$2" "T3"@"SEL$2")
    14           INDEX(@"SEL$3" "T1"@"SEL$3" ("T1"."MOD1" "T1"."RANDOM1"))
    15           INDEX(@"SEL$3" "T2"@"SEL$3" ("T2"."MOD2" "T2"."RANDOM2"))
    16           LEADING(@"SEL$3" "T1"@"SEL$3" "T2"@"SEL$3")
    17          USE_HASH(@"SEL$3" "T2"@"SEL$3")
    18         */
    19         count(tt.padding) cnt1, count(t4.padding) cnt2
    20  from
    21  (
    22    select
    23          t3.padding,
    24          t.rid2
    25    from (select
    26                 t1.rowid rid1,
    27                 t2.rowid rid2
    28          from t1, t2
    29          where t1.mod1 = 50
    30            and t2.random2 = t1.random1
    31            and t2.mod2 = 50
    32          order by rid1) t
    33       , t1 t3
    34    where    t3.rowid = t.rid1
    35    order by rid2
    36  ) tt
    37  , t2 t4
    38  where tt.rid2 = t4.rowid;
    
    CNT1       CNT2
    ---------- ----------
    974        974
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2600095256
    
    -------------------------------------------------------------------------
    | Id  | Operation                       | Name  | Rows  | Bytes | Cost  |
    -------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                |       |     1 |   117 |  3210 |
    |   1 |  SORT AGGREGATE                 |       |     1 |   117 |       |
    |   2 |   NESTED LOOPS                  |       |  1554 |   177K|  3210 |
    |   3 |    VIEW                         |       |  1554 | 68376 |  1656 |
    |   4 |     SORT ORDER BY               |       |  1554 |   147K|  1656 |
    |   5 |      NESTED LOOPS               |       |  1554 |   147K|  1603 |
    |   6 |       VIEW                      |       |  1554 | 37296 |    49 |
    |   7 |        SORT ORDER BY            |       |  1554 | 59052 |    49 |
    |*  8 |         HASH JOIN               |       |  1554 | 59052 |    10 |
    |*  9 |          INDEX RANGE SCAN       | T1_I1 |   990 | 18810 |     4 |
    |* 10 |          INDEX RANGE SCAN       | T2_I1 |   990 | 18810 |     4 |
    |  11 |       TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
    |  12 |    TABLE ACCESS BY USER ROWID   | T2    |     1 |    73 |     1 |
    -------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
    8 - access("T2"."RANDOM2"="T1"."RANDOM1")
    9 - access("T1"."MOD1"=50)
    10 - access("T2"."MOD2"=50)
    
    Note
    -----
    - cpu costing is off (consider enabling it)
    
    Statistics
    ----------------------------------------------------------
    1  recursive calls
    0  db block gets
    34  consistent gets
    0  physical reads
    0  redo size
    481  bytes sent via SQL*Net to client
    416  bytes received via SQL*Net from client
    2  SQL*Net roundtrips to/from client
    2  sorts (memory)
    0  sorts (disk)
    1  rows processed
    
    SQL>

    Comment by Valentin Nikotin — January 31, 2012 @ 7:21 pm BST Jan 31,2012 | Reply

  13. [...] my old “two-step” approach to visiting tables and indexes. Get the rowids you really need, and visit the table later. In this case, though, I’ve sorted [...]

    Pingback by Compression Units – 5 « Oracle Scratchpad — August 19, 2012 @ 6:03 pm BST Aug 19,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,514 other followers