Oracle Scratchpad

April 27, 2011

Star Transformation – 2

Filed under: CBO,Oracle,subqueries,Tuning — Jonathan Lewis @ 6:13 pm BST Apr 27,2011

After publishing my note about emulating star transformations when there was something blocking the optimizer’s ability to do them automatically I ended up having an interesting email conversation with Mark Farnham about variations on the theme, concluding with a completely different (and possibly more intuitive) SQL statement to achieve the same aim.

You will recall that I started off with the following statement:

select
	ord.*
from
	products	prd,
	customers	cst,
	orders		ord
where
	prd.grp = 50
and	cst.grp = 50
and	ord.id_prd = prd.id
and	ord.id_cst = cst.id
;

Although this specific example is so simple that the optimizer could turn it into a star transformation automatically, I used it to demonstrate a “minimum work” version of the statement as follows (with the execution plan from 11.1.0.6):

select
	ord.*
from
	(
	select
		/*+
			leading(prd ord)
			use_nl(ord)
			no_merge
		*/
		ord.rowid 	prid
	from
		products	prd,
		orders		ord
		where
		prd.grp = 50
	and	ord.id_prd = prd.id
		)	prid,
	(
	select
		/*+
			leading(cst ord)
			use_nl(ord)
			no_merge
		*/
		ord.rowid 	crid
	from
		customers	cst,
		orders		ord
	where
		cst.grp = 50
	and	ord.id_cst = cst.id
	)	crid,
	orders	ord
where
	prid.prid = crid.crid
and	ord.rowid = prid.prid
;

--------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |    11 |  1650 |   570 |
|   1 |  NESTED LOOPS               |            |    11 |  1650 |   570 |
|*  2 |   HASH JOIN                 |            |    11 |   264 |   559 |
|   3 |    VIEW                     |            |  3361 | 40332 |   277 |
|   4 |     NESTED LOOPS            |            |  3361 | 87386 |   277 |
|*  5 |      TABLE ACCESS FULL      | CUSTOMERS  |    98 |   882 |    81 |
|*  6 |      INDEX RANGE SCAN       | ORD_CST_FK |    34 |   578 |     2 |
|   7 |    VIEW                     |            |  3390 | 40680 |   281 |
|   8 |     NESTED LOOPS            |            |  3390 | 88140 |   281 |
|*  9 |      TABLE ACCESS FULL      | PRODUCTS   |   100 |   900 |    81 |
|* 10 |      INDEX RANGE SCAN       | ORD_PRD_FK |    34 |   578 |     2 |
|  11 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PRID"."PRID"="CRID"."CRID")
   5 - filter("CST"."GRP"=50)
   6 - access("ORD"."ID_CST"="CST"."ID")
   9 - filter("PRD"."GRP"=50)
  10 - access("ORD"."ID_PRD"="PRD"."ID")

Go back to the original SQL statement, though. The select list contains columns from just the orders table, and there’s a fairly well-known suggestion (or possibly guideline) that “tables that are not in the select list should not appear in the from clause, they should appear in subqueries in the where clause”. It’s not difficult to come up with examples where this strategy is a bad idea – but it’s often worth thinking through the consequences of trying to apply it. (Of course, you often find that after you’ve rewritten your SQL to match the strategy the optimizer transforms it back into the join that you had been avoiding.)

Since the products and customers tables don’t appear in the select list, can I find a way of rewriting my statement with subqueries ? The answer is yes. Here’s the SQL, with the execution plan I got.

select
	ord.*
from
	orders ord
where
	ord.rowid in (
		select
			/*+
				no_use_hash_aggregation
			*/
			prid.prid
		from
			(
			select	/*+ no_merge */
				ord.rowid 	prid
			from
				products	prd,
				orders		ord
				where
				prd.grp = 50
			and	ord.id_prd = prd.id
			)	prid,
			(
			select	/*+ no_merge */
				ord.rowid 	crid
			from
				customers	cst,
				orders		ord
			where
				cst.grp = 50
			and	ord.id_cst = cst.id
			)	crid
		where
			prid.prid = crid.crid
	)
;

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |   138 |   591   (1)| 00:00:08 |
|   1 |  NESTED LOOPS               |            |     1 |   138 |   591   (1)| 00:00:08 |
|   2 |   VIEW                      | VW_NSO_1   |    11 |   132 |   589   (1)| 00:00:08 |
|   3 |    SORT UNIQUE              |            |     1 |   264 |            |          |
|*  4 |     HASH JOIN               |            |    11 |   264 |   589   (1)| 00:00:08 |
|   5 |      VIEW                   |            |  3314 | 39768 |   294   (1)| 00:00:04 |
|   6 |       NESTED LOOPS          |            |  3314 | 86164 |   294   (1)| 00:00:04 |
|*  7 |        TABLE ACCESS FULL    | PRODUCTS   |   100 |   900 |    94   (2)| 00:00:02 |
|*  8 |        INDEX RANGE SCAN     | ORD_PRD_FK |    33 |   561 |     2   (0)| 00:00:01 |
|   9 |      VIEW                   |            |  3314 | 39768 |   294   (1)| 00:00:04 |
|  10 |       NESTED LOOPS          |            |  3314 | 86164 |   294   (1)| 00:00:04 |
|* 11 |        TABLE ACCESS FULL    | CUSTOMERS  |   100 |   900 |    94   (2)| 00:00:02 |
|* 12 |        INDEX RANGE SCAN     | ORD_CST_FK |    33 |   561 |     2   (0)| 00:00:01 |
|  13 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("PRID"."PRID"="CRID"."CRID")
   7 - filter("PRD"."GRP"=50)
   8 - access("ORD"."ID_PRD"="PRD"."ID")
  11 - filter("CST"."GRP"=50)
  12 - access("ORD"."ID_CST"="CST"."ID")

You’ll notice that this plan is remarkably similar to the plan I got from the joins with inline views – although the new plan has one extra “sort unique” operation at line 3. The optimizer has transformed my query by unnesting the subqueries and turning them into inline views – adding in a “distinct” operation because that’s what happens when you turn an “IN subquery” into a join when there isn’t a suitable uniqueness constraint on the join column.

There are two reasons for adopting this subquery approach. There’s the (highly subjective) argument that the code is a little easier to understand in this form, especially if you then want to join the orders table onwards to other tables. There’s also an objective argument relating to the “sort unique”. Note that I included the hint “no_use_hash_aggregation” to stop Oracle from using a “hash unique” operation at this point. By forcing Oracle to sort for uniqueness I know that the rowids will appear in (at least) an “extent-based” order rather than a completely random order. It’s possible that walking the fact table in physical order will be a little more efficient than accessing it in a completely randomised order. The former might find multiple rows in a block or benefit from some form of O/S or SAN read-ahead; the latter is more likely to turn one row into one random physical read.

One final thought – I said that you might want to take this type of approach for queries where the optimizer can’t do a star transformation automatically. There’s one very important case where this is always true – when you’re running Standard Edition, which doesn’t support bitmap indexes. So if you’re running SE and want to do star transformations – you now know how.

Footnote: I haven’t spent much time testing the limits of variations of this code although I did spend 30 minutes extending the example to use three dimension tables. If you think that this strategy might solve a production problem, think about using the sample code to create and test a better model of your production requirement before rewriting the production code – and if you go to production, check very carefully that the resulting code does produce the same results.

3 Comments »

  1. Hi Jonathan
    i am having of your book on CBO where in you mention “This book is the first in a series of three. Future volumes will cover important Oracle features, in
    particular partitioned tables, parallel execution, index organized tables, dynamic sampling,
    and query rewrite.”

    are you releasing them at all ?

    Regards

    Comment by kg — May 28, 2011 @ 7:38 am BST May 28,2011 | Reply

    • kg,

      The question has been raised a few times in the past. I’m having trouble writing volumes 2 and 3 – partly because the product keeps changing so rapidly . Since I wrote the first book, many more people have taken to investigating and publishing useful information on the optimizer, so the need for more volumes has decreased with time. I’m not sure that the outlines for the 2nd and 3rd volumes will ever be filled in.

      Comment by Jonathan Lewis — June 2, 2011 @ 8:33 pm BST Jun 2,2011 | Reply

  2. [...] even in 11.2.0.2, it just doesn’t. Perhaps this is another opportunity for thinking about manual optimisation strategies – or perhaps ensuring that you’ve created the right set of [...]

    Pingback by Mything 2 « Oracle Scratchpad — June 24, 2011 @ 5:52 pm BST Jun 24,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,528 other followers