February 16, 2014

Recursive subquery factoring

Filed under: Hints,Ignoring Hints,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 6:11 pm BST Feb 16,2014

This is possibly my longest title to date – I try to keep them short enough to fit the right hand column of the blog without wrapping – but I couldn’t think of a good way to shorten it (Personally I prefer to use the expression CTE – common table expression – over “factored subquery” or “subquery factoring” or “with subquery”, and that would have achieved my goal, but might not have meant anything to most people.)

If you haven’t come across them before, recursive CTEs appeared in 11.2, are in the ANSI standard, and are (probably) viewed by Oracle as the strategic replacement for “connect by” queries. Here’s a simple (and silly) example:

with data(p) as (
select 1 p from dual
union all
select p + 1 from data where p < 100
)
select	p
from	data
where	rownum <= 10
;

P
----------
1
2
3
4
5
6
7
8
9
10

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 37253879

---------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |     2 |    26 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                             |      |       |       |            |          |
|   2 |   VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|   4 |     FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|*  5 |     RECURSIVE WITH PUMP                    |      |       |       |            |          |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=10)
5 - filter("P"<100)

A recursive CTE has three features that identify it. First, the query alias (“data” in this case) is followed by a list of column aliases; secondly the query includes a UNION ALL; and thirdly the second subquery in the UNION ALL references the query alias – i.e. it’s the recursive bit. There are other optional bits but I’m not planning to go into those – all I want to talk about is how to control the materialization (or not) of a recursive CTE through hinting.

The reason I wrote this note was because Jeff Jacobs, in his presentation on “Performance Anti-patterns” at RMOUG last week, raised the question of whether or not the /*+ materialize */ and /*+ inline */ hints worked with recursive CTEs and gave an example of a UNION ALL query where the CTE always materialized, no matter how you applied the /*+ inline */ hint. The CTE seemed to be following the basic guideline for CTEs – if you use it once in the main query it goes inline, if you use it more than once it will (almost invariably) materialize.

I’m always interested in examples where “the hint is ignored”, so I exchanged a couple of email messages with Jeff and he sent me an example (which I’ve simplified for this blog) of a query that demonstrated the issue; and I spent a little while thinking about it and decided that it simply wasn’t possible to hint the code the way we wanted to and it was just one of those cases where it takes a bit of time for new features to catch up and fit in to the standard framework. Here’s a simplified version of the query, with its execution plan:

with data(p) as (
select 1 p from dual
union all
select p + 1 from data where p < 100
)
select	p
from	data
where	rownum <= 10
union all
select	p
from	data
where	rownum <= 10
;

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                            |     4 |    52 |     4   (0)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION                 |                            |       |       |            |          |
|   2 |   LOAD AS SELECT                           | SYS_TEMP_0FD9D6608_7391CD7 |       |       |            |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|                            |       |       |            |          |
|   4 |     FAST DUAL                              |                            |     1 |       |     2   (0)| 00:00:01 |
|*  5 |     RECURSIVE WITH PUMP                    |                            |       |       |            |          |
|   6 |   UNION-ALL                                |                            |       |       |            |          |
|*  7 |    COUNT STOPKEY                           |                            |       |       |            |          |
|   8 |     VIEW                                   |                            |     2 |    26 |     2   (0)| 00:00:01 |
|   9 |      TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6608_7391CD7 |     2 |    12 |     2   (0)| 00:00:01 |
|* 10 |    COUNT STOPKEY                           |                            |       |       |            |          |
|  11 |     VIEW                                   |                            |     2 |    26 |     2   (0)| 00:00:01 |
|  12 |      TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6608_7391CD7 |     2 |    12 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("P"<100)
7 - filter(ROWNUM<=10)
10 - filter(ROWNUM<=10)

The following morning I woke up with one of those “overnight insights” where you seem to have worked out the answer in your sleep. To make a hint work you have to put it in the right query block, or you have to name the right query block in the main query block: in this case the right query block doesn’t exist in the text, and it’s not possible to figure out what the name of the right query block would be if it came into existence.

If you try putting the /*+ inline */ hint into the query after the select at line 2 above, you’ve put the hint into the first query block of a union all, NOT into the query block of the recursvie CTE.

Having identified the problem, the solution (or at least, a possible solution) was obvious – create the query block you need. This (with its execution plan from 11.2.0.4) is what worked:

with data(p) as (
select 1 p from dual
union all
select p + 1 from data where p < 100
),
data1 as (
select /*+ inline */ * from data
)
select	p
from	(
select * from data1 where rownum <= 10
union all
select * from data1 where rownum <= 10
)
;

-----------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |      |     4 |    52 |     8   (0)| 00:00:01 |
|   1 |  VIEW                                        |      |     4 |    52 |     8   (0)| 00:00:01 |
|   2 |   UNION-ALL                                  |      |       |       |            |          |
|*  3 |    COUNT STOPKEY                             |      |       |       |            |          |
|   4 |     VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|   6 |       FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|*  7 |       RECURSIVE WITH PUMP                    |      |       |       |            |          |
|*  8 |    COUNT STOPKEY                             |      |       |       |            |          |
|   9 |     VIEW                                     |      |     2 |    26 |     4   (0)| 00:00:01 |
|  10 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |            |          |
|  11 |       FAST DUAL                              |      |     1 |       |     2   (0)| 00:00:01 |
|* 12 |       RECURSIVE WITH PUMP                    |      |       |       |            |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(ROWNUM<=10)
7 - filter("P"<100)
8 - filter(ROWNUM<=10)
12 - filter("P"<100)

All I’ve done is create a second CTE that selects from the first CTE. This is now a simple select, so I can add a perfectly placed hint to it – in the hope that this would effectively require the dependent recursive CTE to be inlined inside it. It seems to be sufficient.

I haven’t tested the strategy exhaustively – so I can give you no guarantee that this has to work – unfortunately I did have another example that I applied the method to, and after several seconds of no response it crashed with an ORA-00600 error :( but that might have been a side effect of the nature of query (it included a couple of the optional extras) rather than a specific feature of inlining.)

February 14, 2014

12c Subquery Factoring

Filed under: 12c,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 11:44 am BST Feb 14,2014

From time to time I’ve posted a reminder that subquery factoring (“with subquery”) can give you changes in execution plans even if the subquery that you’ve taken out of line is written back inline by Oracle rather than being materialized. This can still happen in 12c – here’s a sample query in the two forms with the result sets and execution plans.  First, the “factored” version:

with e as (
select
deptno
from	emp
order by
deptno
)
select
deptno, count(*)
from
e
group by deptno
;

DEPTNO   COUNT(*)
---------- ----------
10          3
20          5
30          6

3 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1536940522

-------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     3 |     9 |     6 |
|   1 |  SORT GROUP BY NOSORT|      |     3 |     9 |     6 |
|   2 |   VIEW               |      |    14 |    42 |     6 |
|   3 |    SORT ORDER BY     |      |    14 |    42 |     6 |
|   4 |     TABLE ACCESS FULL| EMP  |    14 |    42 |     2 |
-------------------------------------------------------------

Note that the view has not been materialized (no “TEMP TABLE TRANSFORMATION”). There is a “sort group by” with NOSORT, and the data has been reported in sorted order as a side effect of the “sort order by” in line 3.

Now with the inline view manually written in place:

select   deptno, count(*)
from (
select
/*+ gather_plan_statistics */
deptno
from	emp
order by
deptno
) e
group by deptno
;

DEPTNO   COUNT(*)
---------- ----------
30          6
20          5
10          3

3 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4067220884

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     3 |     9 |     6 |
|   1 |  HASH GROUP BY     |      |     3 |     9 |     6 |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |    42 |     2 |
-----------------------------------------------------------

In principle, the optimizer should have rewritten the first query in the second form and optimized it – but there’s clearly something out of order in the optimisation. As you can see, the result is no longer in the same order (but we don’t expect any specific order, of course, since we didn’t end with an “order by” clause) and the aggregation step is a hash group by rather than a sort group by.

Footnote:

I know that I got this example off someone else, but all I’ve written into my script is a URL, and it’s the wrong URL – my clipboard buffer seems to lag a little sometimes when I’m copying from the host to a virtual machine. The URL I’ve got is to an article by Steven Pinker in the New Republic, and it had nothing to do with Oracle (though it was comparing science and the humanities).

If you’re the original source of the example, please let me know so that I can add an acknowledgement.

Test case

If you would like to reproduce the example, here’s the script – it’s a simple hack of the Scott emp/dept schema.

CREATE TABLE EMP (
EMPNO		NUMBER(4) NOT NULL,
ENAME		VARCHAR2(10),
JOB		VARCHAR2(9),
MGR		NUMBER(4),
HIREDATE	DATE,
SAL		NUMBER(7, 2),
COMM		NUMBER(7, 2),
DEPTNO		NUMBER(2)
);

insert into emp values
(7369, 'SMITH',  'CLERK',     7902,
to_date('17-DEC-1980', 'DD-MON-YYYY'),  800, NULL, 20);

insert into emp values
(7499, 'ALLEN',  'SALESMAN',  7698,
to_date('20-FEB-1981', 'DD-MON-YYYY'), 1600,  300, 30);

insert into emp values
(7521, 'WARD',   'SALESMAN',  7698,
to_date('22-FEB-1981', 'DD-MON-YYYY'), 1250,  500, 30);

insert into emp values
(7566, 'JONES',  'MANAGER',   7839,
to_date('2-APR-1981', 'DD-MON-YYYY'),  2975, NULL, 20);

insert into emp values
(7654, 'MARTIN', 'SALESMAN',  7698,
to_date('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);

insert into emp values
(7698, 'BLAKE',  'MANAGER',   7839,
to_date('1-MAY-1981', 'DD-MON-YYYY'),  2850, NULL, 30);

insert into emp values
(7782, 'CLARK',  'MANAGER',   7839,
to_date('9-JUN-1981', 'DD-MON-YYYY'),  2450, NULL, 10);

insert into emp values
(7788, 'SCOTT',  'ANALYST',   7566,
to_date('09-DEC-1982', 'DD-MON-YYYY'), 3000, NULL, 20);

insert into emp values
(7839, 'KING',   'PRESIDENT', NULL,
to_date('17-NOV-1981', 'DD-MON-YYYY'), 5000, NULL, 10);

insert into emp values
(7844, 'TURNER', 'SALESMAN',  7698,
to_date('8-SEP-1981', 'DD-MON-YYYY'),  1500,    0, 30);

insert into emp values
to_date('12-JAN-1983', 'DD-MON-YYYY'), 1100, NULL, 20);

insert into emp values
(7900, 'JAMES',  'CLERK',     7698,
to_date('3-DEC-1981', 'DD-MON-YYYY'),   950, NULL, 30);

insert into emp values
(7902, 'FORD',   'ANALYST',   7566,
to_date('3-DEC-1981', 'DD-MON-YYYY'),  3000, NULL, 20);

insert into emp values
(7934, 'MILLER', 'CLERK',     7782,
to_date('23-JAN-1982', 'DD-MON-YYYY'), 1300, NULL, 10);

CREATE TABLE DEPT(
DEPTNO	NUMBER(2),
DNAME	VARCHAR2(14),
LOC	VARCHAR2(13)
);

insert into dept values (10, 'ACCOUNTING', 'NEW YORK');
insert into dept values (20, 'RESEARCH',   'DALLAS');
insert into dept values (30, 'SALES',      'CHICAGO');
insert into dept values (40, 'OPERATIONS', 'BOSTON');

commit;

alter table emp add constraint emp_pk primary key(empno);
alter table dept add constraint dept_pk primary key(deptno);

August 6, 2013

12c subquery factoring

Filed under: 12c,Bugs,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 8:08 am BST Aug 6,2013

I’ve written a few notes about anomalies in subquery factoring (with subquery) in the past, principally making a fuss about the fact that moving an inline view into a “with subquery” can cause a plan to change even when the internal code moves the subquery back in line. With the arrival of 12c one of my first sets of tests was to rerun all the examples to see how many of them had been addressed. I hadn’t written about as many examples as I had thought, and some of them had been fixed before 12c, but here are few references to a couple of outstanding items that I thought worth a mention:

May 24, 2012

Subquery Factoring

Filed under: Execution plans,Hints,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 6:37 pm BST May 24,2012

I have a small collection of postings where I’ve described anomalies or limitations in subquery factoring (the “with subquery”, or Common Table Expression (CTE) to give it the official ANSI name). Here’s another example of Oracle’s code not behaving consistently. You may recognise the basic query from yesterday’s example of logical tuning – so I won’t reprint the code to generate the data sets. This examples in this note were created on 11.2.0.2 – we start with a simple query and its execution plan:
(more…)

February 14, 2012

Subquery Factoring

Filed under: Bugs,CBO,Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 5:59 pm BST Feb 14,2012

Here’s an interesting little conundrum about subquery factoring that hasn’t changed in the recent (11.2.0.3) patch for subquery factoring. It came to me from Jared Still (a fellow member of Oak Table Network) shortly after I’d made some comments about the patch. It’s an example based on the scott/tiger schema – which I’ve extracted from the script \$ORACLE_HOME/rdbms/admin/utlsampl.sql (though the relevant scripts may be demobld.sql or scott.sql, depending on version).
(more…)

February 1, 2012

Subquery Factoring

Filed under: CBO,Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 5:52 pm BST Feb 1,2012

It’s always worth browsing through the list of Oracle’s bug fixes each time a new release or patch comes out because it can give you clues about where to look for problems in your production release – and how to anticipate problems on the upgrade. This article is an example of a fix that I found while while looking at the note for 11.2.0.3 (MOS licence required for link) quite recently.

December 8, 2011

Test Data

Filed under: Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 6:31 pm BST Dec 8,2011

The UKOUG conference is over for another year – but it has left me with plenty to do and lots of things to investigate. Here’s just one little point that I picked up during one of the 10 minute “Oak Talks” that members of the Oak Table Network were doing in the lunch breaks.

There is a fairly well-known strategy for generating a list of numbers by using a “select from dual … connect by …” query, but I hadn’t realised that there were two ways of using it. The code I’ve usually used is this:

select
rownum id
from
dual
connect by
rownum <= 4000
;

September 13, 2010

Subquery Factoring (4)

Filed under: CBO,Oracle,Subquery Factoring,Troubleshooting,Tuning — Jonathan Lewis @ 6:38 pm BST Sep 13,2010

I’ve written before about the effects of subquery factoring (common table expressions – or CTEs) on the optimizer, and the way that the optimizer can “lose” some strategies when you start factoring out subquery expressions. Here’s another example I came across quite recently. It involved a join of about 15 tables so I’ve only extracted a few lines from the SQL and resulting execution plans.

We start with the original query, which had factored out an aggregate subquery then used it in place of an inline view:
(more…)

June 29, 2010

Subquery Factoring (3)

Filed under: CBO,Execution plans,Oracle,Performance,Subquery Factoring,Troubleshooting — Jonathan Lewis @ 6:28 pm BST Jun 29,2010

From time to time I’ve warned people that subquery factoring should be used with a little care if all you’re trying to do is make a query more readable by extracting parts of the SQL into “factored subqueries” (or Common Table Expressions – CTEs – if you want to use the ANSI term for them). In principle, for example, the following two queries should produce the same  execution plan:
(more…)

July 26, 2007

Subquery Factoring (2)

Filed under: Hints,Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 8:24 pm BST Jul 26,2007

I’ve written about subquery factoring a few times in the past and commented on the use of the /*+ materialize */ hint. Recently I had time to think about what it would take for the Cost Based Optimizer to decide to materialize a subquery without hinting.

I doubt if I have a complete answer yet, and I sometimes wonder if the optimizer code for handling subquery factoring is not yet complete, but my observations are as follows.

October 30, 2006

Subquery Factoring

Filed under: Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 7:52 am BST Oct 30,2006

A recent newsgroup question asked whether or not the “with subquery” clause – introduced as part of the select statement in 9i –  could also be used with the merge statement. The answer is yes, provided you remember that the merge statement  includes a select, and the subquery belongs to the select. The following, for example, is valid syntax:

merge
into old_data od
using	(
with m_subq as(
select
*
from new_data
where mod(id,50) = 0
)
select * from m_subq
) nd
on	(
od.id = nd.id
and	od.small_vc = nd.small_vc
)
when matched then