Oracle Scratchpad

April 27, 2011

Star Transformation – 2

Filed under: CBO,Oracle,subqueries,Tuning — Jonathan Lewis @ 6:13 pm BST Apr 27,2011

After publishing my note about emulating star transformations when there was something blocking the optimizer’s ability to do them automatically I ended up having an interesting email conversation with Mark Farnham about variations on the theme, concluding with a completely different (and possibly more intuitive) SQL statement to achieve the same aim.

You will recall that I started off with the following statement:

select
        ord.*
from
        products        prd,
        customers       cst,
        orders          ord
where
        prd.grp = 50
and     cst.grp = 50
and     ord.id_prd = prd.id
and     ord.id_cst = cst.id
;

Although this specific example is so simple that the optimizer could turn it into a star transformation automatically, I used it to demonstrate a “minimum work” version of the statement as follows (with the execution plan from 11.1.0.6):

select
        ord.*
from
        (
        select
                /*+
                        leading(prd ord)
                        use_nl(ord)
                        no_merge
                */
                ord.rowid       prid
        from
                products        prd,
                orders          ord
                where
                prd.grp = 50
        and     ord.id_prd = prd.id
                )       prid,
        (
        select
                /*+
                        leading(cst ord)
                        use_nl(ord)
                        no_merge
                */
                ord.rowid       crid
        from
                customers       cst,
                orders          ord
        where
                cst.grp = 50
        and     ord.id_cst = cst.id
        )       crid,
        orders  ord
where
        prid.prid = crid.crid
and     ord.rowid = prid.prid
;

--------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |    11 |  1650 |   570 |
|   1 |  NESTED LOOPS               |            |    11 |  1650 |   570 |
|*  2 |   HASH JOIN                 |            |    11 |   264 |   559 |
|   3 |    VIEW                     |            |  3361 | 40332 |   277 |
|   4 |     NESTED LOOPS            |            |  3361 | 87386 |   277 |
|*  5 |      TABLE ACCESS FULL      | CUSTOMERS  |    98 |   882 |    81 |
|*  6 |      INDEX RANGE SCAN       | ORD_CST_FK |    34 |   578 |     2 |
|   7 |    VIEW                     |            |  3390 | 40680 |   281 |
|   8 |     NESTED LOOPS            |            |  3390 | 88140 |   281 |
|*  9 |      TABLE ACCESS FULL      | PRODUCTS   |   100 |   900 |    81 |
|* 10 |      INDEX RANGE SCAN       | ORD_PRD_FK |    34 |   578 |     2 |
|  11 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PRID"."PRID"="CRID"."CRID")
   5 - filter("CST"."GRP"=50)
   6 - access("ORD"."ID_CST"="CST"."ID")
   9 - filter("PRD"."GRP"=50)
  10 - access("ORD"."ID_PRD"="PRD"."ID")

Go back to the original SQL statement, though. The select list contains columns from just the orders table, and there’s a fairly well-known suggestion (or possibly guideline) that:

“Tables that are not in the select list should not appear in the from clause, they should appear in subqueries in the where clause”

It’s not difficult to come up with examples where this strategy is a bad idea – but it’s often worth thinking through the consequences of trying to apply it. (Of course, you often find that after you’ve rewritten your SQL to follow the strategy the optimizer transforms it back into the join that you had been avoiding.)

Since the products and customers tables don’t appear in the select list, can I find a way of rewriting my statement with subqueries? The answer is yes. Here’s the SQL, with the execution plan I got.

rem
rem     Script:         star_emulation_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          April 2011
rem

select
        ord.*
from
        orders ord
where
        ord.rowid in (
                select
                        /*+
                                no_use_hash_aggregation
                        */
                        prid.prid
                from
                        (
                        select  /*+ no_merge */
                                ord.rowid       prid
                        from
                                products        prd,
                                orders          ord
                                where
                                prd.grp = 50
                        and     ord.id_prd = prd.id
                        )       prid,
                        (
                        select  /*+ no_merge */
                                ord.rowid       crid
                        from
                                customers       cst,
                                orders          ord
                        where
                                cst.grp = 50
                        and     ord.id_cst = cst.id
                        )       crid
                where
                        prid.prid = crid.crid
        )
;

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |   138 |   591   (1)| 00:00:08 |
|   1 |  NESTED LOOPS               |            |     1 |   138 |   591   (1)| 00:00:08 |
|   2 |   VIEW                      | VW_NSO_1   |    11 |   132 |   589   (1)| 00:00:08 |
|   3 |    SORT UNIQUE              |            |     1 |   264 |            |          |
|*  4 |     HASH JOIN               |            |    11 |   264 |   589   (1)| 00:00:08 |
|   5 |      VIEW                   |            |  3314 | 39768 |   294   (1)| 00:00:04 |
|   6 |       NESTED LOOPS          |            |  3314 | 86164 |   294   (1)| 00:00:04 |
|*  7 |        TABLE ACCESS FULL    | PRODUCTS   |   100 |   900 |    94   (2)| 00:00:02 |
|*  8 |        INDEX RANGE SCAN     | ORD_PRD_FK |    33 |   561 |     2   (0)| 00:00:01 |
|   9 |      VIEW                   |            |  3314 | 39768 |   294   (1)| 00:00:04 |
|  10 |       NESTED LOOPS          |            |  3314 | 86164 |   294   (1)| 00:00:04 |
|* 11 |        TABLE ACCESS FULL    | CUSTOMERS  |   100 |   900 |    94   (2)| 00:00:02 |
|* 12 |        INDEX RANGE SCAN     | ORD_CST_FK |    33 |   561 |     2   (0)| 00:00:01 |
|  13 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("PRID"."PRID"="CRID"."CRID")
   7 - filter("PRD"."GRP"=50)
   8 - access("ORD"."ID_PRD"="PRD"."ID")
  11 - filter("CST"."GRP"=50)
  12 - access("ORD"."ID_CST"="CST"."ID")

You’ll notice that this plan is remarkably similar to the plan I got from the joins with inline views although the new plan has one extra “sort unique” operation at line 3. The optimizer has transformed my query by unnesting the subqueries and turning them into inline views – adding in a “distinct” operation because that’s what happens when you turn an “IN subquery” into a join when there isn’t a suitable uniqueness constraint on the join column.

There are two reasons for adopting this subquery approach. There’s the (highly subjective) argument that the code is a little easier to understand in this form, especially if you then want to join the orders table onwards to other tables. There’s also an objective argument relating to the “sort unique”. Note that I included the hint “no_use_hash_aggregation” to stop Oracle from using a “hash unique” operation at this point. By forcing Oracle to sort for uniqueness I know that the rowids will appear in (at least) an “extent-based” order rather than a completely random order. It’s possible that walking the fact table in physical order will be a little more efficient than accessing it in a completely randomised order. The former might find multiple rows in a block or benefit from some form of O/S or SAN read-ahead; the latter is more likely to turn one row into one random physical read.

One final thought – I said that you might want to take this type of approach for queries where the optimizer can’t do a star transformation automatically. There’s one very important case where this is always true – when you’re running Standard Edition, which doesn’t support bitmap indexes. So if you’re running SE and want to do star transformations – you now know how.

Footnote

I haven’t spent much time testing the limits of variations of this code although I did spend 30 minutes extending the example to use three dimension tables. If you think that this strategy might solve a production problem, think about using the sample code to create and test a better model of your production requirement before rewriting the production code – and if you go to production, check very carefully that the resulting code does produce the same results.

Update (July 2020)

This may be lurking elsewhere on the blog, or it may be something I’ve only mentioned in presentations on this topic, but there are a couple of other coding strategies at the point where we generate the rowids we want to use to access the orders table. One which I think is particularly cute is to change the join between the “prid” and “crid” views in the previous code into an intersection – as follows:

select
        ord.*
from
        orders ord
where
        ord.rowid in (
                select  
                        rid
                from 
                        (
                        select  /*+ no_merge */
                                ord1.rowid      rid
                        from
                                products        prd,
                                orders          ord1
                        where
                                prd.grp = 50
                        and     ord1.id_prd = prd.id
                        intersect
                        select  /*+ no_merge */
                                ord2.rowid      rid
                        from
                                customers       cst,
                                orders          ord2
                        where
                                cst.grp = 50
                        and     ord2.id_cst = cst.id
                        )
        )
;

--------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |    33M|  4393M|  3987 |
|   1 |  NESTED LOOPS               |            |    33M|  4393M|  3987 |
|   2 |   VIEW                      |            |  3314 | 39768 |   674 |
|   3 |    INTERSECTION             |            |       |       |       |
|   4 |     SORT UNIQUE             |            |  3314 | 86164 |       |
|   5 |      NESTED LOOPS           |            |  3314 | 86164 |   284 |
|*  6 |       TABLE ACCESS FULL     | PRODUCTS   |   100 |   900 |    84 |
|*  7 |       INDEX RANGE SCAN      | ORD_PRD_FK |    33 |   561 |     2 |
|   8 |     SORT UNIQUE             |            |  3314 | 86164 |       |
|   9 |      NESTED LOOPS           |            |  3314 | 86164 |   284 |
|* 10 |       TABLE ACCESS FULL     | CUSTOMERS  |   100 |   900 |    84 |
|* 11 |       INDEX RANGE SCAN      | ORD_CST_FK |    33 |   561 |     2 |
|  12 |   TABLE ACCESS BY USER ROWID| ORDERS     | 10000 |  1240K|     1 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - filter("PRD"."GRP"=50)
   7 - access("ORD1"."ID_PRD"="PRD"."ID")
  10 - filter("CST"."GRP"=50)
  11 - access("ORD2"."ID_CST"="CST"."ID")

Apart from the fact that the code is slightly more compact there’s also the benefit that I don’t have to worry about forcing a sort (i.e. hinting to block hash aggregation) because the intersect necessarily does a sort unique on each of its inputs in turn (which, depending on data volumes, you might choose to avoid by going for the join option).

 

3 Comments »

  1. Hi Jonathan
    i am having of your book on CBO where in you mention “This book is the first in a series of three. Future volumes will cover important Oracle features, in
    particular partitioned tables, parallel execution, index organized tables, dynamic sampling,
    and query rewrite.”

    are you releasing them at all ?

    Regards

    Comment by kg — May 28, 2011 @ 7:38 am BST May 28,2011 | Reply

    • kg,

      The question has been raised a few times in the past. I’m having trouble writing volumes 2 and 3 – partly because the product keeps changing so rapidly . Since I wrote the first book, many more people have taken to investigating and publishing useful information on the optimizer, so the need for more volumes has decreased with time. I’m not sure that the outlines for the 2nd and 3rd volumes will ever be filled in.

      Comment by Jonathan Lewis — June 2, 2011 @ 8:33 pm BST Jun 2,2011 | Reply

  2. […] even in 11.2.0.2, it just doesn’t. Perhaps this is another opportunity for thinking about manual optimisation strategies – or perhaps ensuring that you’ve created the right set of […]

    Pingback by Mything 2 « Oracle Scratchpad — June 24, 2011 @ 5:52 pm BST Jun 24,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.