Oracle Scratchpad

July 29, 2015

Existence

Filed under: Execution plans,Oracle,subqueries,Subquery Factoring,Tuning — Jonathan Lewis @ 1:05 pm BST Jul 29,2015

A recent question on the OTN Database Forum asked:

I need to check if at least one record present in table before processing rest of the statements in my PL/SQL procedure. Is there an efficient way to achieve that considering that the table is having huge number of records like 10K.

I don’t think many readers of the forum would consider 10K to be a huge number of records; nevertheless it is a question that could reasonably be asked, and should prompt a little discssion.

First question to ask, of course is: how often do you do this and how important is it to be as efficient as possible. We don’t want to waste a couple of days of coding and testing to save five seconds every 24 hours. Some context is needed before charging into high-tech geek solution mode.

Next question is: what’s wrong with writing code that just does the job, and if it finds that the job is complete after zero rows then you haven’t wasted any effort. This seems reasonable in (say) a PL/SQL environment where we might discuss the following pair of strategies:


Option 1:
=========
-- execute a select statement to see in any rows exist

if (flag is set to show rows) then
    for r in (select all the rows) loop
        do something for each row
    end loop;
end if;

Option 2:
=========
for r in (select all the rows) loop
    do something for each row;
end loop;

If this is the type of activity you have to do then it does seem reasonable to question the sense of putting in an extra statement to see if there are any rows to process before processing them. But there is a possibly justification for doing this. The query to find just one row may produce a very efficient execution plan, while the query to find all the rows may have to do something much less efficient even when (eventually) it finds that there is no data. Think of the differences you often see between a first_rows_1 plan and an all_rows plan; think about how Oracle can use index-only access paths and table elimination – if you’re only checking for existence you may be able to produce a MUCH faster plan than you can for selecting the whole of the first row.

Next question, if you think that there is a performance benefit from the two-stage approach: is the performance gain worth the cost (and risk) of adding a near-duplicate statement to the code – that’s two statements that have to be maintained every time you make a change. Maybe it’s worth “wasting” a few seconds on every execution to avoid getting the wrong results (or an odd extra hour of programmer time) once every few months. Bear in mind, also, that the optimizer now has to optimize two statement instead of one – you may not notice the extra CPU usage in testing but perhaps in the live environment the execution benefit will be eroded by the optimization cost.

Next question, if you still think that the two-stage process is a good idea: will it result in an inconsistent database state ?! If you select and find a row, then run and find that there are no rows to process because something modified and “hid” the row you found on the first pass – what are you going to do. Will this make the program crash ? Will it produce an erroneous result on this run, or will a silent side effect be that the next run will produce the wrong results. (See Billy Verreynne’s comment on the original post). Should you set the session to “serializable” before you start the program, or maybe lock a critical table to make sure it can’t change.

So, assuming you’ve decided that some form of “check for existence then do the job” is both desirable and safe, what’s the most efficient strategy. Here’s one of the smarter solutions that miminises risk and effort (in this case using a pl/sql environment).


select  count(*)
into    m_counter
from    dual
where   exists ({your original driving select statement})
;

if m_counter = 0 then
    null;
else
    for c1 in {your original driving select statement} loop
        -- do whatever
    end loop;
end if;

The reason I describe this solution as smarter, with minimum risk and effort, is that (a) you use EXACTLY the same SQL statement in both locations so there should be no need to worry about making the same effective changes twice to two slightly different bits of SQL and (b) the optimizer will recognise the significance of the existence test and run in first_rows_1 mode with maximum join elimination and avoidance of redundant table visits. Here’s a little data set I can use to demonstrate the principle:


create table t1
as
select
        mod(rownum,200)         n1,     -- scattered data
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 10000
;

delete from t1 where n1 = 100;
commit;

create index t1_i1 on t1(n1);

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                cascade => true,
                method_opt => 'for all columns size 1'
        );
end;
/

It’s just a simple table with index, but the index isn’t very good for finding the data – it’s repetitive data widely scattered through the table: 10,000 rows with only 200 distinct values. But check what happens when you do the dual existence test – first we run our “driving” query to show the plan that the optimizer would choose for it, then we run with the existence test to show the different strategy the optimizer takes when the driving query is embedded:


alter session set statistics_level = all;

select  *
from    t1
where   n1 = 100
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

select  count(*)
from    dual
where   exists (
                select * from t1 where n1 = 100
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

Notice how I’ve enabled rowsource execution statistics and pulled the execution plans from memory with their execution statistics. Here they are:


select * from t1 where n1 = 100

-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |    38 (100)|      0 |00:00:00.01 |     274 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |     50 |    38   (3)|      0 |00:00:00.01 |     274 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=100)

select count(*) from dual where exists (   select * from t1 where n1 = 100  )

---------------------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |     3 (100)|      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |            |      1 |00:00:00.01 |       2 |
|*  2 |   FILTER           |       |      1 |        |            |      0 |00:00:00.01 |       2 |
|   3 |    FAST DUAL       |       |      0 |      1 |     2   (0)|      0 |00:00:00.01 |       0 |
|*  4 |    INDEX RANGE SCAN| T1_I1 |      1 |      2 |     1   (0)|      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NOT NULL)
   4 - access("N1"=100)

For the original query the optimizer did a full tablescan – that was the most efficient path. For the existence test the optimizer decided it didn’t need to visit the table for “*” and it would be quicker to use an index range scan to access the data and stop after one row. Note, in particular, that the scan of the dual table didn’t even start – in effect we’ve got all the benefits of a “select {minimum set of columns} where rownum = 1″ query, without having to work out what that minimum set of columns was.

But there’s an even more cunning option – remember that we didn’t scan dual when when there were no matching rows:


for c1 in (

        with driving as (
                select  /*+ inline */
                        *
                from    t1
        )
        select  /*+ track this */
                *
        from
                driving d1
        where
                n1 = 100
        and     exists (
                        select
                                *
                        from    driving d2
                        where   n1 = 100
                );
) loop

    -- do your thing

end loop;

In this specific case the subquery would automatically go inline, so the hint here is actually redundant; in general you’re likely to find the optimizer materializing your subquery and bypassing the cunning strategy if you don’t use the hint. (One of the cases where subquery factoring doesn’t automatically materialize is when you have no WHERE clause in the subquery.)

Here’s the execution plan pulled from memory (after running this SQL through an anonymous PL/SQL block):


SQL_ID  7cvfcv3zarbyg, child number 0
-------------------------------------
WITH DRIVING AS ( SELECT /*+ inline */ * FROM T1 ) SELECT /*+ track
this */ * FROM DRIVING D1 WHERE N1 = 100 AND EXISTS ( SELECT * FROM
DRIVING D2 WHERE N1 = 100 )

---------------------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |    39 (100)|      0 |00:00:00.01 |       2 |
|*  1 |  FILTER            |       |      1 |        |            |      0 |00:00:00.01 |       2 |
|*  2 |   TABLE ACCESS FULL| T1    |      0 |     50 |    38   (3)|      0 |00:00:00.01 |       0 |
|*  3 |   INDEX RANGE SCAN | T1_I1 |      1 |      2 |     1   (0)|      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - filter("T1"."N1"=100)
   3 - access("T1"."N1"=100)

You’ve got just one statement – and you’ve only got one version of the complicated text because you put it into a factored subquery; but the optimizer manages to use one access path for one instantiation of the text and a different one for the other. You get an efficient test for existence and only run the main query if some suitable data exists, and the whole thing is entirely read-consistent.

I have to say, though, I can’t quite make myself 100% enthusiastic about this code strategy – there’s just a nagging little doubt that the optimizer might come up with some insanely clever trick to try and transform the existence test into something that’s supposed to be faster but does a lot more work; but maybe that’s only likely to happen on an upgrade, which is when you’d be testing everything very carefully anyway (wouldn’t you) and you’ve got the “dual/exists” fallback position if necessary.

Footnote:

Does anyone remember the thing about reading execution plan “first child first” – this existence test is one of the interesting cases where it’s not the first child of a parent operation that runs first: it’s the case I call the “constant subquery”.

July 27, 2015

Subquery Factoring (10)

Filed under: Bugs,CBO,Oracle,Subquery Factoring,Troubleshooting — Jonathan Lewis @ 1:26 pm BST Jul 27,2015

What prompted me to write my previous note about subquerying was an upgrade to 12c, and a check that a few critical queries would not do something nasty on the upgrade. As ever it’s always interesting how many little oddities you can discover while looking closely as some little detail of how the optimizer works. Here’s an oddity that came up in the course of my playing around investigation in 12.1.0.2 – first some sample data:


create table t1
nologging
as
select * from all_objects;

create index t1_i1 on t1(owner) compress nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          =&gt; user,
                tabname          =&gt;'T1',
                method_opt       =&gt; 'for all columns size 1 for columns owner size 254'
        );
end;
/

The all_objects view is convenient as a tool for modelling what I wanted to do since it has a column with a small number of distinct values and an extreme skew across those values. Here’s a slightly weird query that shows an odd costing effect:


with v1 as (
        select /*+ inline */ owner from t1 where owner &gt; 'A'
)
select count(*) from v1 where owner = 'SYS'
union all
select count(*) from v1 where owner = 'SYSTEM'
;

Since the query uses the factored subquery twice and there’s a predicate on the subquery definition, I expect to see materialization – and that’s what happens (even though I’ve engineered the query so that materialization is more expensive than executing inline). Here are the two plans from 12.1.0.2 (the same pattern appears in 11.2.0.4, though the costs are a little less across the board):


=======================
Unhinted (materializes)
=======================

---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     2 |   132 |    25  (20)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D661B_876C2CB |       |       |            |          |
|*  3 |    INDEX FAST FULL SCAN    | T1_I1                      | 85084 |   498K|    21  (15)| 00:00:01 |
|   4 |   UNION-ALL                |                            |       |       |            |          |
|   5 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  6 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|   7 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
|   8 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  9 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

=============
Forced inline
=============

--------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     2 |    12 |    22  (14)| 00:00:01 |
|   1 |  UNION-ALL             |       |       |       |            |          |
|   2 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  3 |    INDEX FAST FULL SCAN| T1_I1 | 38784 |   227K|    21  (15)| 00:00:01 |
|   4 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  5 |    INDEX RANGE SCAN    | T1_I1 |   551 |  3306 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

I’m not surprised that the optimizer materialized the subquery – as I pointed out in my previous article, the choice seems to be rule-based (heuristic) rather than cost-based. What surprises me is that the cost for the default plan is not self-consistent – the optimizer seems to have lost the cost of generating the temporary table. The cost of the materialized query plan looks as if it ought to be 21 + 13 + 13 = 47. Even if the optimizer were smart enough to assume that the temporary table would be in the cache for the second scan (and therefore virtually free to access) we ought to see a cost of 21 + 13 = 34. As it is we have a cost of 25, which is 13 + 13 (or, if you check the 10053 trace file, 12.65 + 12.65, rounded).

Since the choice to materialize doesn’t seem to be cost-based (at present) this doesn’t really matter – but it’s always nice to see, and be able to understand, self-consistent figures in an execution plan.

Footnote

It is worth pointing out as a side note that materialization can actually be more expensive than running in-line, even for very simple examples. Subquery factoring seems to have become more robust and consistent over recent releases in terms of consistency of execution plans when the subqueries are put back inline, but you still need to think a little bit before rewriting a query for cosmetic (i.e. totally valid “readability”) reasons just to check whether the resulting query is going to produce an unexpected, and unexpectedly expensive, materialization.

July 24, 2015

Subquery Factoring (9)

Filed under: CBO,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 12:34 pm BST Jul 24,2015

Several years go (eight to be precise) I wrote a note suggesting that Oracle will not materialize a factored subquery unless it is used at least twice in the main query. I based this conclusion on a logical argument about the cost of creating and using a factored subquery and, at the time, I left it at that. A couple of years ago I came across an example where even with two uses of a factored subquery Oracle still didn’t materialize even though the cost of doing so would reduce the cost of the query – but I never got around to writing up the example, so here it is:


create table t1
as
select
        object_id, data_object_id, created, object_name, rpad('x',1000) padding
from
        all_objects
where
        rownum &lt;= 10000
;

exec dbms_stats.gather_table_stats(user,'T1')

explain plan for
with gen as (
        select /*+ materialize */ object_id, object_name from t1
)
select
        g1.object_name,
        g2.object_name
from
        gen g1,
        gen g2
where
        g2.object_id = g1.object_id
;

select * from table(dbms_xplan.display);

You’ll notice that my original table has very wide rows, but my factored subquery selects a “narrow” subset of those rows. My target is to have an example where doing a tablescan is very expensive but the temporary table holding the extracted data is much smaller and cheaper to scan.

I’ve included a materialize hint in the SQL above, but you need to run the code twice, once with, and once without the hint. Here are the two plans – unhinted first:


============================
Unhinted - won't materialize
============================

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   468K|   428   (2)| 00:00:03 |
|*  1 |  HASH JOIN         |      | 10000 |   468K|   428   (2)| 00:00:03 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
---------------------------------------------------------------------------

==================================
Hinted to materialize - lower cost
==================================

--------------------------------------------------------------------------------------------------------- 
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT           |                            | 10000 |   585K|   227   (2)| 00:00:02 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6664_9DAAEB7 |       |       |            |          | 
|   3 |    TABLE ACCESS FULL       | T1                         | 10000 |   234K|   214   (2)| 00:00:02 | 
|*  4 |   HASH JOIN                |                            | 10000 |   585K|    13   (8)| 00:00:01 | 
|   5 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 | 
|   7 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Clearly the optimizer isn’t considering the costs involved. If I add the predicate “where object_id > 0″ (which identifies ALL the rows in the table), materialization occurs unhinted (with the same costs reported as for the hinted plan above. My tentative conclusion is that the transformation is a heuristic one that follows the rule “two or more appearances of the subquery and some indication of row selection in the subquery rowsource”. (In fact if the rowsource is “select * from pipeline_function” the requirement for subsetting doesn’t seem to apply.)

The plans above came from 11.2.0.4 but I got the same result, with a slight difference in costs, in 12.1.0.2. It’s worth pointing out that despite Oracle apparently ignoring the costs when deciding whether or not to materialize, it still seems to report self-consistent values after materialization: the 227 for the plan above is the 214 for creating the temporary table plus the 13 for deriving the hash join of the two copies of the temporary table.

February 8, 2015

Functions & Subqueries

Filed under: Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 4:12 am BST Feb 8,2015

I think the “mini-series” is a really nice blogging concept – it can pull together a number of short articles to offer a much better learning experience for the reader than they could get from the random collection of sound-bites that so often typifies an internet search; so here’s my recommendation for this week’s mini-series: a set of articles by Sayan Malakshinov a couple of years ago comparing the behaviour of Deterministic Functions and Scalar Subquery Caching.

http://orasql.org/2013/02/10/deterministic-function-vs-scalar-subquery-caching-part-1/

http://orasql.org/2013/02/11/deterministic-function-vs-scalar-subquery-caching-part-2/

http://orasql.org/2013/03/13/deterministic-function-vs-scalar-subquery-caching-part-3/

Footnote:
Although I’ve labelled it as “this week’s” series, I wouldn’t want you to assume that I’ll be trying to find a new mini-series every week.

Footnote 2:
I had obviously expected to publish this note a long time ago – but must have forgotten about it. I was prompted to search my blog for “deterministic” very recently thanks to a recent note on the OTN database forum and discovered both this note and an incomplete note about improving the speed of creating function-based indexes by tweaking hidden parameters – which I might yet publish, although if you read all of Sayan’s articles you’ll find the solution anyway.

 

February 16, 2014

Recursive subquery factoring

Filed under: Hints,Ignoring Hints,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 6:11 pm BST Feb 16,2014

This is possibly my longest title to date – I try to keep them short enough to fit the right hand column of the blog without wrapping – but I couldn’t think of a good way to shorten it (Personally I prefer to use the expression CTE – common table expression – over “factored subquery” or “subquery factoring” or “with subquery”, and that would have achieved my goal, but might not have meant anything to most people.)

If you haven’t come across them before, recursive CTEs appeared in 11.2, are in the ANSI standard, and are (probably) viewed by Oracle as the strategic replacement for “connect by” queries. Here, to get things started, is a simple (and silly) example:

(more…)

August 6, 2013

12c subquery factoring

Filed under: 12c,Bugs,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 8:08 am BST Aug 6,2013

I’ve written a few notes about anomalies in subquery factoring (with subquery) in the past, principally making a fuss about the fact that moving an inline view into a “with subquery” can cause a plan to change even when the internal code moves the subquery back in line. With the arrival of 12c one of my first sets of tests was to rerun all the examples to see how many of them had been addressed. I hadn’t written about as many examples as I had thought, and some of them had been fixed before 12c, but here are few references to a couple of outstanding items that I thought worth a mention:

[Further reading on “subquery factoring”]

May 24, 2012

Subquery Factoring (8)

Filed under: Execution plans,Hints,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 6:37 pm BST May 24,2012

I have a small collection of postings where I’ve described anomalies or limitations in subquery factoring (the “with subquery”, or Common Table Expression (CTE) to give it the official ANSI name). Here’s another example of Oracle’s code not behaving consistently. You may recognise the basic query from yesterday’s example of logical tuning – so I won’t reprint the code to generate the data sets. This examples in this note were created on 11.2.0.2 – we start with a simple query and its execution plan:
(more…)

February 16, 2012

Subquery Factoring (7)

Filed under: Hints,Infrastructure,Oracle,Subquery Factoring,Tuning,Upgrades — Jonathan Lewis @ 5:03 pm BST Feb 16,2012

When I wrote a note last week about the fixes to the subquery factoring optimizer code in 11.2.0.3, I finished with a comment about having more to say on the test case if I materialized the subquery. Today’s the day to talk about it. As a reminder, here’s the query, but with the /*+ materialize */ hint in place:
(more…)

February 14, 2012

Subquery Factoring (6)

Filed under: Bugs,CBO,Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 5:59 pm BST Feb 14,2012

Here’s an interesting little conundrum about subquery factoring that hasn’t changed in the recent (11.2.0.3) patch for subquery factoring. It came to me from Jared Still (a fellow member of Oak Table Network) shortly after I’d made some comments about the patch. It’s an example based on the scott/tiger schema – which I’ve extracted from the script $ORACLE_HOME/rdbms/admin/utlsampl.sql (though the relevant scripts may be demobld.sql or scott.sql, depending on version).
(more…)

February 1, 2012

Subquery Factoring (5)

Filed under: CBO,Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 5:52 pm BST Feb 1,2012

It’s always worth browsing through the list of Oracle’s bug fixes each time a new release or patch comes out because it can give you clues about where to look for problems in your production release – and how to anticipate problems on the upgrade. This article is an example of a fix that I found while while looking at the note for 11.2.0.3 (MOS licence required for link) quite recently.

(more…)

December 8, 2011

Test Data

Filed under: Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 6:31 pm BST Dec 8,2011

The UKOUG conference is over for another year – but it has left me with plenty to do and lots of things to investigate. Here’s just one little point that I picked up during one of the 10 minute “Oak Talks” that members of the Oak Table Network were doing in the lunch breaks.

There is a fairly well-known strategy for generating a list of numbers by using a “select from dual … connect by …” query, but I hadn’t realised that there were two ways of using it. The code I’ve usually used is this:

select
        rownum id
from
        dual
connect by
        rownum <= 4000
;

(more…)

September 13, 2010

Subquery Factoring (4)

Filed under: CBO,Oracle,Subquery Factoring,Troubleshooting,Tuning — Jonathan Lewis @ 6:38 pm BST Sep 13,2010

I’ve written before about the effects of subquery factoring (common table expressions – or CTEs) on the optimizer, and the way that the optimizer can “lose” some strategies when you start factoring out subquery expressions. Here’s another example I came across quite recently. It involved a join of about 15 tables so I’ve only extracted a few lines from the SQL and resulting execution plans.

We start with the original query, which had factored out an aggregate subquery then used it in place of an inline view:
(more…)

June 29, 2010

Subquery Factoring (3)

Filed under: CBO,Execution plans,Oracle,Performance,Subquery Factoring,Troubleshooting — Jonathan Lewis @ 6:28 pm BST Jun 29,2010

From time to time I’ve warned people that subquery factoring should be used with a little care if all you’re trying to do is make a query more readable by extracting parts of the SQL into “factored subqueries” (or Common Table Expressions – CTEs – if you want to use the ANSI term for them). In principle, for example, the following two queries should produce the same  execution plan:
(more…)

July 26, 2007

Subquery Factoring (2)

Filed under: CBO,Hints,Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 8:24 pm BST Jul 26,2007

I’ve written about subquery factoring a few times in the past and commented on the use of the /*+ materialize */ hint. Recently I had time to think about what it would take for the Cost Based Optimizer to decide to materialize a subquery without hinting.

I doubt if I have a complete answer yet, and I sometimes wonder if the optimizer code for handling subquery factoring is not yet complete, but my observations are as follows.

(more…)

October 30, 2006

Subquery Factoring

Filed under: Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 7:52 am BST Oct 30,2006

A recent newsgroup question asked whether or not the “with subquery” clause – introduced as part of the select statement in 9i –  could also be used with the merge statement. The answer is yes, provided you remember that the merge statement  includes a select, and the subquery belongs to the select. The following, for example, is valid syntax:

merge
	into old_data od
using	(
		with m_subq as(
		select
			*
		from new_data
		where mod(id,50) = 0
		)
		select * from m_subq
	) nd
on	(
	 	od.id = nd.id
	 and	od.small_vc = nd.small_vc
	)
when matched then
	update set od.padding = nd.padding
when not matched then
	insert values ( nd.id, nd.small_vc, nd.padding )
;

Remember, before you get too excited by this option, that if the subquery is materialized, Oracle will create an “in memory metadata” global temporary table and write the result of the subquery to your temporary tablespace using direct path writes before reading it back with db file scattered reads.

[Further reading on “subquery factoring”]

 

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 5,216 other followers