Oracle Scratchpad

July 16, 2012

ANSI Outer 2

Filed under: ANSI Standard,CBO,Execution plans,Oracle — Jonathan Lewis @ 4:55 pm BST Jul 16,2012

A comment on a recent post of mine pointed me to a question on the OTN SQL and PL/SQL Forum where someone had presented a well-written test case of an odd pattern of behaviour in ANSI SQL. I made a couple of brief comments on the thread, but thought it worth highlighting here as well. The scripts to create the required tables (plus a few extras) are all available on OTN. If you create only the four tables needed and all their indexes you will need about 1.3GB of space.

The core of the problem is this: there is a three table join which does a hash join involving an index fast full scan on a particular index; when you add a fourth table to the join this fast full scan turns into a full tablescan for no obvious reason. Here are the queries, with the plans that I got when running 10.2.0.3. (My final plan is slightly different from the plan shown on OTN – I have a right outer hash join to the last table where the OP had a nested loop outer – but the difference is not significant). The queries, with their execution plans, are below- the three table join first:

select
        episode.episode_id, episode.cross_ref_id, episode.date_required,
        product.number_required,
        request.site_id
from
        episode
left join
        request
on      request.cross_ref_id = episode.cross_ref_id
join
        product
on      product.episode_id = episode.episode_id
where
        episode.department_id = 2
and     product.status = 'I'
order by
        episode.date_required
;

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |             | 33333 |  1725K|       | 10399   (2)| 00:02:05 |
|   1 |  SORT ORDER BY         |             | 33333 |  1725K|  4216K| 10399   (2)| 00:02:05 |
|*  2 |   HASH JOIN OUTER      |             | 33333 |  1725K|  1632K|  9959   (2)| 00:02:00 |
|*  3 |    HASH JOIN           |             | 33333 |  1236K|       |   889   (3)| 00:00:11 |
|*  4 |     TABLE ACCESS FULL  | PRODUCT     | 33333 |   325K|       |   107   (4)| 00:00:02 |
|*  5 |     TABLE ACCESS FULL  | EPISODE     |   300K|  8216K|       |   778   (2)| 00:00:10 |
|   6 |    INDEX FAST FULL SCAN| IX4_REQUEST |  4001K|    57M|       |  3785   (2)| 00:00:46 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REQUEST"."CROSS_REF_ID"(+)="EPISODE"."CROSS_REF_ID")
   3 - access("PRODUCT"."EPISODE_ID"="EPISODE"."EPISODE_ID")
   4 - filter("PRODUCT"."STATUS"='I')
   5 - filter("EPISODE"."DEPARTMENT_ID"=2)

And now the four-table query – note that the only difference is that an outer join to product_sub_type (against product) has been added:

select
        episode.episode_id, episode.cross_ref_id, episode.date_required,
        product.number_required,
        request.site_id
from
        episode
left join
        request
on      request.cross_ref_id= episode.cross_ref_id
join
        product
on      product.episode_id= episode.episode_id
left join
        product_sub_type
on      product_sub_type.prod_sub_type_id= product.prod_sub_type_id
where
        episode.department_id = 2
and     product.status = 'I'
order by
        episode.date_required
;

------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                     | 33333 |  1920K|       | 35491   (1)| 00:07:06 |
|   1 |  SORT ORDER BY         |                     | 33333 |  1920K|  4456K| 35491   (1)| 00:07:06 |
|*  2 |   HASH JOIN RIGHT OUTER|                     | 33333 |  1920K|       | 35012   (1)| 00:07:01 |
|   3 |    INDEX FULL SCAN     | PK_PRODUCT_SUB_TYPE |    15 |    45 |       |     1   (0)| 00:00:01 |
|*  4 |    HASH JOIN OUTER     |                     | 33333 |  1822K|  1728K| 35010   (1)| 00:07:01 |
|*  5 |     HASH JOIN          |                     | 33333 |  1334K|       |   889   (3)| 00:00:11 |
|*  6 |      TABLE ACCESS FULL | PRODUCT             | 33333 |   423K|       |   107   (4)| 00:00:02 |
|*  7 |      TABLE ACCESS FULL | EPISODE             |   300K|  8216K|       |   778   (2)| 00:00:10 |
|   8 |     TABLE ACCESS FULL  | REQUEST             |  4001K|    57M|       | 28832   (1)| 00:05:46 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PRODUCT_SUB_TYPE"."PROD_SUB_TYPE_ID"(+)="PRODUCT"."PROD_SUB_TYPE_ID")
   4 - access("REQUEST"."CROSS_REF_ID"(+)="EPISODE"."CROSS_REF_ID")
   5 - access("PRODUCT"."EPISODE_ID"="EPISODE"."EPISODE_ID")
   6 - filter("PRODUCT"."STATUS"='I')
   7 - filter("EPISODE"."DEPARTMENT_ID"=2)

As you can see, lines 4 to 8 of the second plan mirror lines 2 to 6 of the first plan to generate a predicted 33,333 rows that will be used to probe the in-memory hash table created from index PK_PRODUCT_SUB_TYPE. Yet the first plan manages to get the necessary data from an index fast full scan of the (relatively small) index IX4_REQUEST while the second plan does a full tablescan of the (relatively large) table REQUEST. From the programmer’s perspective there is no apparent reason why this difference should appear – it’s an anomaly due (in some fashion I can’t explain) to the way in which Oracle transforms outer joins to lateral joins.

Here, then, are a couple more anomalies to go with this one. Generally speaking, the order of the tables in the FROM clause shouldn’t make any difference to the execution plan (though there are a couple of boundary conditions where it could matter “by accident”); but if you change the order of the FROM clause in this query you can get a different execution plan. If you want the index fast full scan to appear, just move the join to REQUESTS down to the end of the FROM clause:

from
        episode
join
        product
on      product.episode_id= episode.episode_id
left join
        product_sub_type
on      product_sub_type.prod_sub_type_id= product.prod_sub_type_id
left join
        request
on      request.cross_ref_id= episode.cross_ref_id

This happens (by accident) to be the way I would probably have written this query if I had started from scratch. I tend to follow two guidelines when messing about with ANSI and outer joins. The first is that I tend to postpone outer joins because I get so confused (in ANSI) by an outer join followed by an “inner” join; the second is that I tend to follow “chains” of joins, so in this case I might have written the code to highlight the chain episode -> product -> product_sub_type, rather than having the “break” in the middle where the request table interrupts the chain. (This type of thinking is all very subjective, of course, in terms of clarity of intent – sometimes there are more important considerations than visual clarity.)

The other anomaly is this – if you run the test on 11g (and I happened to have 11.2.0.3 to hand) the optimizer can do “join elimination”. The product_sub_type table in this query is not affecting the outcome – it’s an outer join by primary key, and the query doesn’t use any non-key columns. So in 11.2 I get a plan which eliminates the table from the four-table join, taking me down to the three-table join – and I still get a full table scan on REQUEST on that three table join. I think this helps to confirm the fact that the index-only path is dictated by some aspect of the lateral transformation that automatically blocks the use of the index-only path to REQUEST, and by the time that join elimination is invoked it’s too late and the index-only path can’t be re-introduced.

Digging in

Now I don’t often go digging into the 10053 trace file for a query – you can usually spot the critical problems by examining a few variatons of the possible execution plans that you can get by hinting the query; but I thought I’d take a little look at what Oracle was doing with the two variations of the four-table queries in 10g to see if that gave me any further idea.

One of the things you get in the 10053 trace is a selection of “unparsed query” lines, where Oracle shows you a version of something that looks a bit like the SQL that it’s probably going to be optimising. In the case of ANSI SQL with outer joins this often means you see some rewritten SQL with the lateral() operator, and that’s exactly what I saw when checked for these two queries; over the course of three or four steps Oracle showed me how it was unwrapping the outer joins and rewriting them as inline and lateral() views. Here are the two unparsed statements (the “good” one first, then the “bad” one):

Query block (08580A48) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT
	sq$_005.qcsj_00300_0 qcsj_00300,
	sq$_005.DEPARTMENT_ID_1 qcsj_00700,
	sq$_005.DATE_REQUIRED_2 DATE_REQUIRED,
	sq$_005.CROSS_REF_ID_3 qcsj_00702,
	sq$_005.qcsj_00500_4 qcsj_00500,
	sq$_005.qcsj_00301_5 qcsj_00301,
	sq$_005.STATUS_6 STATUS,sq$_005.
	NUMBER_REQUIRED_7 NUMBER_REQUIRED,
	sq$_005.qcsj_00501_8 qcsj_00501,
	sq$_009.DEPARTMENT_ID_2 qcsj_00701,
	sq$_009.SITE_ID_1 SITE_ID,
	sq$_009.CROSS_REF_ID_0 qcsj_00703
FROM  	(
	SELECT
		sq$_003.qcsj_00300_0 qcsj_00300_0,
		sq$_003.DEPARTMENT_ID_1 DEPARTMENT_ID_1,
		sq$_003.DATE_REQUIRED_2 DATE_REQUIRED_2,
		sq$_003.CROSS_REF_ID_3 CROSS_REF_ID_3,
		sq$_003.PROD_SUB_TYPE_ID_4 qcsj_00500_4,
		sq$_003.qcsj_00301_5 qcsj_00301_5,
		sq$_003.STATUS_6 STATUS_6,
		sq$_003.NUMBER_REQUIRED_7 NUMBER_REQUIRED_7,
		sq$_008.PROD_SUB_TYPE_ID_0 qcsj_00501_8
	FROM
		(
		SELECT
			EPISODE.EPISODE_ID qcsj_00300_0,
			EPISODE.DEPARTMENT_ID DEPARTMENT_ID_1,
			EPISODE.DATE_REQUIRED DATE_REQUIRED_2,
			EPISODE.CROSS_REF_ID CROSS_REF_ID_3,
			PRODUCT.PROD_SUB_TYPE_ID PROD_SUB_TYPE_ID_4,
			PRODUCT.EPISODE_ID qcsj_00301_5,
			PRODUCT.STATUS STATUS_6,PRODUCT.NUMBER_REQUIRED NUMBER_REQUIRED_7
		FROM
			TEST_USER.EPISODE EPISODE,
			TEST_USER.PRODUCT PRODUCT
		WHERE
			PRODUCT.EPISODE_ID=EPISODE.EPISODE_ID
		) sq$_003,
		LATERAL((
			SELECT
				PRODUCT_SUB_TYPE.PROD_SUB_TYPE_ID PROD_SUB_TYPE_ID_0
			FROM
				TEST_USER.PRODUCT_SUB_TYPE PRODUCT_SUB_TYPE
			WHERE
				sq$_003.PROD_SUB_TYPE_ID_4=PRODUCT_SUB_TYPE.PROD_SUB_TYPE_ID
		))(+) sq$_008
	) sq$_005,
	LATERAL((
		SELECT
			REQUEST.CROSS_REF_ID CROSS_REF_ID_0,
			REQUEST.SITE_ID SITE_ID_1,
			REQUEST.DEPARTMENT_ID DEPARTMENT_ID_2
		FROM
			TEST_USER.REQUEST REQUEST
		WHERE
			sq$_005.CROSS_REF_ID_3=REQUEST.CROSS_REF_ID
	))(+) sq$_009
;

Query block (085804DC) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT
	sq$_005.qcsj_00500_0 qcsj_00500,
	sq$_005.qcsj_00300_1 qcsj_00300,
	sq$_005.DATE_REQUIRED_2 DATE_REQUIRED,
	sq$_005.qcsj_00302_3 qcsj_00302,
	sq$_005.qcsj_00301_4 qcsj_00301,
	sq$_005.SITE_ID_5 SITE_ID,
	sq$_005.qcsj_00303_6 qcsj_00303,
	sq$_005.PROD_SUB_TYPE_ID_7 qcsj_00700,
	sq$_005.qcsj_00501_8 qcsj_00501,
	sq$_005.STATUS_9 STATUS,
	sq$_005.NUMBER_REQUIRED_10 NUMBER_REQUIRED,
	sq$_009.PROD_SUB_TYPE_ID_0 qcsj_00701
FROM  (
	SELECT
		sq$_003.EPISODE_ID_0 qcsj_00500_0,
		sq$_003.qcsj_00300_1 qcsj_00300_1,
		sq$_003.DATE_REQUIRED_2 DATE_REQUIRED_2,
		sq$_003.qcsj_00302_3 qcsj_00302_3,
		sq$_003.qcsj_00301_4 qcsj_00301_4,
		sq$_003.SITE_ID_5 SITE_ID_5,
		sq$_003.qcsj_00303_6 qcsj_00303_6,
		PRODUCT.PROD_SUB_TYPE_ID PROD_SUB_TYPE_ID_7,
		PRODUCT.EPISODE_ID qcsj_00501_8,
		PRODUCT.STATUS STATUS_9,
		PRODUCT.NUMBER_REQUIRED NUMBER_REQUIRED_10
	FROM	(
		SELECT
			EPISODE.EPISODE_ID	EPISODE_ID_0,
			EPISODE.DEPARTMENT_ID	qcsj_00300_1,
			EPISODE.DATE_REQUIRED	DATE_REQUIRED_2,
			EPISODE.CROSS_REF_ID	qcsj_00302_3,
			sq$_008.DEPARTMENT_ID_2	qcsj_00301_4,
			sq$_008.SITE_ID_1	SITE_ID_5,
			sq$_008.CROSS_REF_ID_0	qcsj_00303_6
		FROM
			TEST_USER.EPISODE EPISODE,
			LATERAL((
				SELECT
					REQUEST.CROSS_REF_ID	CROSS_REF_ID_0,
					REQUEST.SITE_ID		SITE_ID_1,
					REQUEST.DEPARTMENT_ID	DEPARTMENT_ID_2
				FROM
					TEST_USER.REQUEST REQUEST
				WHERE
					EPISODE.CROSS_REF_ID=REQUEST.CROSS_REF_ID
			))(+) sq$_008
		) sq$_003,
		TEST_USER.PRODUCT PRODUCT
	WHERE
		sq$_003.EPISODE_ID_0=PRODUCT.EPISODE_ID
	) sq$_005,
	LATERAL((
		SELECT
			PRODUCT_SUB_TYPE.PROD_SUB_TYPE_ID PROD_SUB_TYPE_ID_0
		FROM
			TEST_USER.PRODUCT_SUB_TYPE PRODUCT_SUB_TYPE
		WHERE
			sq$_005.PROD_SUB_TYPE_ID_7=PRODUCT_SUB_TYPE.PROD_SUB_TYPE_ID
	))(+) sq$_009

To make the texts more readable I’ve eliminated all the double-quote marks and shortened all the view and column aliases that Oracle uses; sq$_005.qcsj_00300_0 in the first line of the first query, for example, was actually “from$_subquery$_005″.”QCSJ_C000000000300000_0″.

You’ll notice in both cases in the inline views that Oracle selects various columns from the base tables that aren’t needed in the final query. In particular when you look at the lateral view on the REQUEST table you can see that the code selects the department_id column – and we might think that this is some sort of clue explaining why one of our plans ends up doing a tablescan instead of an index fast full scan. (You’ll need to set event 22829 – as shown in this posting – to check the plans.)

Both the queries above, when you generate their execution plans, show a full tablescan of the REQUEST table; however, if I delete just the reference to the department_id from the final select list in the good query (sq$_009.DEPARTMENT_ID_2 qcsj_00701) the plan shows an index fast full scan. On the other hand, if I delete EVERY single reference to the request.department_id from the bad plan, it still reports a full tablescan for the plan. So it’s not the projection of the department_id into the unparsed query that causes the problem. (I suspect the projection is a small error – perhaps brought on by the appearance of a column of the same name elsewhere in the query – it does seem to be redundant.)

Looking at the unparsed query the only guess I can make is that there is some rule coming into play because of the lateral view that appears AFTER the lateral view referencing the REQUEST table. For some reason (possibly erroneously) the presence of that lateral view forces Oracle through a code path that won’t accept allow it to use an index fast full scan unless at least one of the columns involved in the index is explicitly required (either through a constraint or through a predicate “is not null”) to be non-null. Adding one of “cross_ref_id is not null” or “site_id is not null” (and eliminating the final reference to the department_id) was sufficient to allow the optimizer to use an index fast full scan.

Conclusions:

  • Oracle still has some way to go to get ANSI SQL taking advantage of all the optimisations available to standard Oracle SQL.
  • If you get tablescans on outer joins when you think an index fast full scan is legal, try moving the outer join further down the FROM clause – if it’s legal – you may get lucky, but it’s possible you’re only allowed to be lucky once
  • Make sure you’re testing with realistic data on the right version of Oracle

5 Comments »

  1. Jonathan,

    Thanks for the additional investigation on this. I raised the original question on OTN, and did a short follow up post on my own blog. As you say it is definitely something about the way that Oracle transforms ANSI SQL syntax into its own native equivalent before then optimizing it and producing an execution plan.

    The real SQL is even more complex than the test case I gave – there are 7 tables being joined together. And the main problem is that most columns allow NULL values, which prohibits the use of certain indexes (but not all indexes). But the other tables are very small in size and have negligible access cost.

    There are also some manual rewrites I can do to the SQL to improve things, such as replacing an OUTER join by an INNER join when the WHERE clause applies a constraint to a column from the OUTER table – if the OUTER join generated a pseudo row of NULL values then it could not match the WHERE constraint of being equal to a real value. I’ve not had a chance to do a full rewrite of the SQL yet, because of the complexity of it all and other more pressing problems. But I now know what is going on within the query, and am ready to deal with it when the time arises.

    Thanks again for the additional information, and to the others that replied on the OTN forum.

    John

    Comment by John Brady — July 17, 2012 @ 8:12 am BST Jul 17,2012 | Reply

  2. [...] Continue reading… All Things Oracle [...]

    Pingback by ANSI Outer 2 – All Things Oracle — July 17, 2012 @ 5:43 pm BST Jul 17,2012 | Reply

  3. Hi Jonathan,

    I know what i am taking or asking is totally irrelevant to the above discussion but i have been working on one issue for quite some time so i though i could read you but i didn’t find way so i have chosen this plafform

    why db file sequential read wait event shows up if table is accessing full table scan. I know sometime oracle choose to retrieve some blocks(Like segment header blocks, undo blocks) but in my case the only wait event i can see in whole trace file is db file sequential read wait event and i can assure you that whole table is accessing through single block request that i have verified through owi views.

    In addition to this, i have used parallel hints to do the same but still both parallel process are again accessing single blocks instead of multiblocks.

    So i have started looking into columns structure and something like that and finanlly i have identified two columns in the select clause of my select stmt is making this strange behaviour. If i keep those two columns in the select clause then it choose single block request(db file sequential read) and if i remove those from select clause then it works the way it’s supposed to work.

    Sorry for posting this question in blog.

    Comment by jayasankar — July 19, 2012 @ 7:24 am BST Jul 19,2012 | Reply

    • Jayasankar,

      Two things about your comment (apart from the fact that this isn’t a forum):

      a) it’s not hard to find out how to reach me – my email address is on the home page for the blog, and if you didn’t think of looking there then a google search for “jonathan lewis email” returns my home page as the first hit, showing my email address and cell-phone number.

      b) you’ve previously posted this on the OTN database forum – I participate in the forum when I have time, so it’s particularly ill-mannered to post this on my blog when I’ve implicitly declined to take part in discussing your problem. It would have been slightly less rude to post a link to the original:
      https://forums.oracle.com/forums/thread.jspa?threadID=2415815&tstart=0

      As it stands to date: I think Hemant Chitale has made an interesting observation, and I would have dumped a couple of the blocks that you’ve reported as the targets of single block reads to see what’s in them.

      Comment by Jonathan Lewis — July 19, 2012 @ 11:33 am BST Jul 19,2012 | Reply

  4. Apologise i wasn’t knowing that you are part of otn database forum..to be honest i didn’t even try to find.

    Comment by jayasankar — July 19, 2012 @ 12:00 pm BST Jul 19,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,507 other followers