I’ve written before about the effects of subquery factoring (common table expressions – or CTEs) on the optimizer, and the way that the optimizer can “lose” some strategies when you start factoring out subquery expressions. Here’s another example I came across quite recently. It involved a join of about 15 tables so I’ve only extracted a few lines from the SQL and resulting execution plans.
We start with the original query, which had factored out an aggregate subquery then used it in place of an inline view:
with max_cust_comm as ( select ccm.order_id, max(ccm.comm_date) from customer_communications ccm group by ccm.order_id ) select ... from ... left join max_cust_comm mcc on mcc.order_id = ord.order_id ...
The execution path for this query included the following lines:
| 6 | HASH JOIN OUTER | | 1 | | 7 | NESTED LOOP | | 1 | ... | 41 | VIEW | | 798K| | 42 | HASH GROUP BY | | 798K| | 43 | TABLE ACCESS FULL | CUSTOMER_COMMUNICATIONS | 798K|
You can see that the optimizer has created a result set (VIEW) at line 41 by scanning the entire customer_communications table, for a total of about 800,000 rows, aggregating the data by order_id. This is not very efficient becauase (a) I happen to have a very useful index on the customer_communications table that contains all the data I need, and (b) there are just a few input rows where I need to find this max(comm_date).
Note, by the way, that we have an outer join into the aggregate view, so Oracle cannot use complex view merging (CVM) to avoid the need to create the result set – but it could, in principle, use join predicate pushdown (JPPD) to create a result once for each input value.
So now we change the code to copy the subquery back to the inline position:
select ... from ... left join ( select ccm.order_id, max(ccm.comm_date) from customer_communications ccm group by ccm.order_id ) mcc on mcc.order_id = ord.order_id ...
This, with no further intervention, is what happened to the plan.
| 6 | NESTED LOOP OUTER | | 1 | | 7 | NESTED LOOP | | 1 | ... | 43 | VIEW PUSHED PREDICATE | | 1 | | 44 | SORT AGGREGATE | | 1 | | 45 | FIRST ROW | | 1 | |* 46 | INDEX RANGE SCAN (MIN/MAX) | CST_CMM_IX2 | 1 |
Notice that the hash join outer has changed to a nested loop outer, and the full tablescan has changed to an optimal index-only access using the ideal (for this query) index range scan (min/max). Moving the inline view into a common table expression at the head of the SQL was enough to stop Oracle from investigating this option – even though the optimizer immediately moved it back inline.
So be careful the next time you think about using subquery factoring to make your queries look cleaner and simpler – it’s a good idea, and a nice piece of technology, but sometimes it will cause your execution plans to change for the worse.
Footnote: This error, and the error described in the previous note, may be related to (unpublished) bug 9850588