Oracle Scratchpad

Join Ordering – pt1

[Go to part 2

A couple of days ago, I offered up the following SQL for review, asking you to consider how many strategies there might be for executing it. When I asked the question, I wasn’t actually asking about technical options – I just wanted to prompt people into thinking about what the query might mean, as the purpose of a query like this will have a big impact on the best choice of execution path.

select
	/*+ qb_name(main) */
	t1.v1
from
	t1, t3
where
	t1.n2 = 100
and	t3.n1 = t1.n1
and	t3.n2 = 100
and	exists (
		select
			/*+ qb_name(sub2) */
			t2.id
		from	t2
		where	t2.n1 = 15
		and	t2.id = t1.id
	)
and	exists (
		select
			/*+ qb_name(sub4) */
			t4.id
		from	t4
		where	t4.n1 = 15
		and	t4.id = t3.id
	)
;              

Bear in mind that this specific query is simply intended as a framework, and the possibilities I will be describing are dependent on the version of the optimizer, the data, the constraints, the indexing, and variations in the nature of the predicates.

In particular, Oracle’s ability to unnest the subqueries can vary dramatically with version as well depending on the extra bits and pieces you might have included in a slightly more generic query; and I may choose to change exists to not exists, or an and to an or for the purposes of discussion.

The number of possibilities, and the commentary that goes with them is so great that I’m going to write a series of “chapters” about the query, and publish them as standalone pages, with entries on the main menu. This is the first such chapter.

Simple Joins and Subqueries

In the basic case, and ignoring the possibility of unnesting subqueries, the optimizer will first join the tables in the outer from clause, and postpone the subqueries until the end of the main query.

This may, or may not, be a good thing. One of the critical strategies for optimising SQL is to ensure that you “eliminate early” – which often means that in the middle of a multi-table join you should pick the “next” table as the one that is the best compromise between an efficient access path and the effect it has on reducing the size of the current result set.

For example, if t1 was a table of order lines, then t3 might be a table of orders, and t2 (in the first subquery) might be a table of delivery notes which ( in this hypothetical design) relate to order lines rather than orders

So, if you have identified a list of order lines, should you join to the orders table because that has a date check on the order that eliminates 90% of the order lines you have selected so far; or should you run the subquery against the delivery notes table because that eliminates 99% of the order lines ?

Clearly, it seems sensible to eliminate as much data as possible as early as possible by running the subquery first – but the join to orders may be much more efficient, and would leave you with far fewer executions of the (more expensive) subquery. You  have to know the data and understand the query to make the best decision.

Of course you may decide that unnesting the subquery (hence finding delivery notes first and then joining to order lines) may be more efficient – on the other hand, there may be no efficient way to select a minimal set of delivery notes until after you have acquired some information from the order lines table.

Hinting

In  cases like this Oracle gives you some strategic control The hints push_subq, no_push_subq, unnest and no_unnest can be used to indicate the general strategy for the query.

In 10g, the push_subq hint should be placed inside the subqueries that you want to operate early – in prior versions of Oracle it went in the outer query. Watch out for this if you already use push_subq in 8i or 9i – after upgrading to 10g it may appear that Oracle is ignoring your hint.

[Go to Part 2]

2 Comments »

  1. Hi Jonathan,

    Can you please help me to understand the following :
    Oracle Version :10.2.0.4.0

    When I trying running the query1 it takes ages to finish compared to query 2 which takes seconds.

    Question 1 :The COST of query2 is almost 350 times more. What is the reason behind this ?
    Question 2: The query returns 90 rows. But why the cardinality is different in the plans of the queries ? Isn’t cardinality should be the same as the number of rows returned ?

    I have used autotrace to generate the plan and the gem.ref_quotation table is analyzed.

    Thanks,

    QUERY 1:
    
    select QUOTATION_INSTR_ID, quotation_perm_id from
    gem.ref_quotation where QUOTATION_INSTR_ID in
    (select QUOTATION_INSTR_ID
     from  gem.ref_quotation
    where  QUOTATION_END_DATE is null
    and QUOTATION_IS_PRIMARY = 1
    group by QUOTATION_INSTR_ID
    having count(*) > 1)
    order by QUOTATION_INSTR_ID;
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=355664 Card=1 Bytes=17)
       1    0   SORT (ORDER BY) (Cost=355664 Card=1 Bytes=17)
       2    1     FILTER
       3    2       TABLE ACCESS (FULL) OF 'REF_QUOTATION' (TABLE) (Cost=385 Card=58769 Bytes=999073)
       4    2       FILTER
       5    4         SORT (GROUP BY NOSORT) (Cost=7 Card=1 Bytes=10)
       6    5           TABLE ACCESS (BY INDEX ROWID) OF 'REF_QUOTATION' (TABLE) (Cost=7 Card=1 Bytes=10)
       7    6             INDEX (RANGE SCAN) OF 'QUOTATION_INSTR_ID' (INDEX) (Cost=3 Card=3)
    
    Query 2:
    select QUOTATION_INSTR_ID, quotation_perm_id from
    gem.ref_quotation where QUOTATION_INSTR_ID in
    (select QUOTATION_INSTR_ID
     from  gem.ref_quotation
    where  QUOTATION_END_DATE is null
    and QUOTATION_IS_PRIMARY = 1
    group by QUOTATION_INSTR_ID
    having count(*) > 1)
    order by QUOTATION_INSTR_ID;
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1019 Card=379 Bytes=8338)
       1    0   SORT (ORDER BY) (Cost=1019 Card=379 Bytes=8338)
       2    1     TABLE ACCESS (BY INDEX ROWID) OF 'REF_QUOTATION' (TABLE) (Cost=6 Card=3 Bytes=45)
       3    2       NESTED LOOPS (Cost=1018 Card=379 Bytes=8338)
       4    3         VIEW OF 'VW_NSO_1' (VIEW) (Cost=402 Card=2433 Bytes=17031)
       5    4           FILTER
       6    5             HASH (GROUP BY) (Cost=402 Card=122 Bytes=24330)
       7    6               TABLE ACCESS (FULL) OF 'REF_QUOTATION' (TABLE)(Cost=387 Card=58769 Bytes=587690)
       8    3         INDEX (RANGE SCAN) OF 'QUOTATION_INSTR_ID' (INDEX) (Cost=2 Card=3)
    

    Comment by Joe — December 14, 2009 @ 1:37 pm BST Dec 14,2009 | Reply

    • The two queries seem to be identical – is this intentional, did you simply change some parameters between the two runs ?

      A couple of things first:

        (a) there are articles on this blog pointing out that if you want to give someone an execution plan it should use dbms_xplan, and you have to supply the predicate section
        (b) you haven’t helped me to help you by formatting your code to make it easy to read – see the “How to Comment” link near the top right hand side (so I had to edit it)
        (c) this isn’t the autotrace output from 10.2.0.4 – are you using a 9i client ?
        (d) this isn’t AskTom – I don’t do troubleshooting for people
        (e) part of your answer may be related to an example I give in my chapter of “Expert Oracle Practices” (due out January)

      Question 1: In theory, cost is time; but the optimizer is not perfect and there are many reasons why the optimizer’s calculated cost and the final run time of the query don’t seem to be related. Your first execution plan operates as a filter subquery, and it’s impossible to predict how many times the subquery will operate – but for the purposes of costing, Oracle assumes it will be once per row returned from the driving rowsource (hence the very high cost in your case). The cardinality of one may be variation of a bug I’ve found in 10gR2 relating to aggregate subqueries that result in a cardinality of one after filtering.

      Question 2: See end of question 1.

      Comment by Jonathan Lewis — December 16, 2009 @ 10:30 am BST Dec 16,2009 | Reply


RSS feed for comments on this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,088 other followers