Oracle Scratchpad

October 9, 2020

Inline Hint

Filed under: 18c,CBO,Execution plans,Hints,Oracle,subqueries,Subquery Factoring,Tuning — Jonathan Lewis @ 12:46 pm BST Oct 9,2020

If you’ve ever used subquery factoring (“with” subqueries or common table expressions (CTEs) as they are often called) then you’re probably aware of the (undocumented) hints /*+ materialize */ , which forces Oracle to create a local temporary table to hold the result of the subquery for subsequent use, and /*+ inline */, which forces the optimizer to copy the text of the subquery into the body of the query before starting the optimisation phase.

There’s a small, but important, enhancement to these hints that appeared in Oracle 18. Like so many other hints in Oracle they can now have a query block name as a “parameter”, so you can use them at the top level of your query. Here’s some code to demonstrate:

rem
rem     Script:         inline_hint.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             18.3.0.0
rem             12.2.0.1  -- hints don't have any effect
rem

create table t1
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
/

create index t1_i1 on t1(object_id);

create table t2
as
select  *
from    t1
/

create index t2_i1 on t2(object_id);

spool inline_hint.lst


explain plan for
with v1 as (
        select 
                /*+ 
                        qb_name(cte) 
                */ 
                object_id, object_type, object_name 
                from t2 gtt1
                where object_id is not null
)
select
        /*+
                qb_name(main)
                inline(@cte)
        */
        t1.object_id,
        t1.object_name
from
        t1
where
        exists (
                select
                        null
                from
                        v1      v1a
                where
                        v1a.object_id = t1.object_id
                and     v1a.object_type = 'TABLE'
        )
and     exists (
                select
                        null
                from
                        v1      v1b
                where
                        v1b.object_id = t1.object_id
                and     v1b.object_name like 'WRI%'
        )
and
        t1.object_id between 100 and 200
/

select * from table(dbms_xplan.display(format=>'alias'));

explain plan for
with v1 as (
        select 
                /*+ 
                        qb_name(cte) 
                */ 
                object_id, object_type, object_name 
                from t2 gtt1
                where object_id is not null
)
select
        /*+
                qb_name(main)
                materialize(@cte)
        */
        t1.object_id,
        t1.object_name
from
        t1
where
        exists (
                select
                        null
                from
                        v1      v1a
                where
                        v1a.object_id = t1.object_id
                and     v1a.object_type = 'TABLE'
        )
and
        t1.object_id between 100 and 200
/

select * from table(dbms_xplan.display(format=>'alias'));

The first of these two queries uses the factored subquery twice so, by default, it will create a “cursor duration memory” temporary table to hold the results of the subquery and then use that temporary table twice in the execution plan.

Conversely the second query uses the factored subquery just once, so the optimizer’s default action will be to copy the text into the body of the main query and optimize the whole thing as a single query block.

To reverse the default behaviour in versions of Oracle up to 12.2.0.1 (though later patch sets may include the 18c enhancements) you could add the /*+ inline */ or /*+ materialize */ hints respectively to the factored subqueries; but my demonstration you can see that I’ve given the factored subquery a query block name and added the relevant hint to the main query block passing in the query block name of the factored subquery – hence /*+ inline(@cte) */ and /*+ materialize(@cte) */.

Here – from 19.3 – are the resulting execution plans (with some cosmetic editing) – first the plan with the inline() hint.

------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |     1 |    63 |     9  (12)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                     |       |     1 |    63 |     9  (12)| 00:00:01 |
|   2 |   NESTED LOOPS                         |       |     1 |    50 |     7  (15)| 00:00:01 |
|   3 |    SORT UNIQUE                         |       |     1 |    25 |     4   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     1 |    25 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T2_I1 |    48 |       |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID BATCHED | T1    |     1 |    25 |     2   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN                   | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  8 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |     1 |    13 |     2   (0)| 00:00:01 |
|*  9 |    INDEX RANGE SCAN                    | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$06B48120
   4 - SEL$06B48120 / GTT1@CTE
   5 - SEL$06B48120 / GTT1@CTE
   6 - SEL$06B48120 / T1@MAIN
   7 - SEL$06B48120 / T1@MAIN
   8 - SEL$06B48120 / GTT1@CTE
   9 - SEL$06B48120 / GTT1@CTE

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("OBJECT_NAME" LIKE 'WRI%')
   5 - access("OBJECT_ID">=100 AND "OBJECT_ID"<=200)
   7 - access("OBJECT_ID"="T1"."OBJECT_ID")
       filter("T1"."OBJECT_ID"<=200 AND "T1"."OBJECT_ID">=100)
   8 - filter("OBJECT_TYPE"='TABLE')
   9 - access("OBJECT_ID"="T1"."OBJECT_ID")
       filter("OBJECT_ID"<=200 AND "OBJECT_ID">=100)

As you can see Oracle has copied the subquery text into the main body of the text and then optimized to produce a three-table join. One of the subqueries has been unnested into an aggregate view (operations 3,4,5), the other has been transformed into a semi-join.

In passing you’ll also notice that the optimizer has used transitive closure to add the range predicate on t1 to both occurrences of the t2 table.

And here’s the plan for the query with the single use of the subquery and materialize() hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |    48 |  2448 |    39   (8)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6611_F53A566 |       |       |            |          |
|   3 |    TABLE ACCESS FULL                     | T2                         | 10000 |   322K|    27   (8)| 00:00:01 |
|*  4 |   HASH JOIN SEMI                         |                            |    48 |  2448 |    13  (16)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED   | T1                         |    48 |  1200 |     4   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                     | T1_I1                      |    48 |       |     2   (0)| 00:00:01 |
|*  7 |    VIEW                                  |                            | 10000 |   253K|     8  (13)| 00:00:01 |
|   8 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6611_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$A3F38ADC
   2 - CTE
   3 - CTE          / GTT1@CTE
   5 - SEL$A3F38ADC / T1@MAIN
   6 - SEL$A3F38ADC / T1@MAIN
   7 - SEL$AA28F105 / V1A@SEL$1
   8 - SEL$AA28F105 / T1@SEL$AA28F105

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V1A"."OBJECT_ID"="T1"."OBJECT_ID")
   6 - access("T1"."OBJECT_ID">=100 AND "T1"."OBJECT_ID"<=200)
   7 - filter("V1A"."OBJECT_TYPE"='TABLE' AND "V1A"."OBJECT_ID">=100 AND "V1A"."OBJECT_ID"<=200)

In this plan the optimizer has created an in-memory temporary table and then used it in the existence subquery – which it has then transformed into a semi-join, so we have a query block with the name SEL$A3F38ADC; but we also see that the query block CTE still exists, labelling the operations that Oracle used to populate the temporary table.

It is an interesting (and irritating) detail that when we look at object aliases we see (operation 8) that Oracle has given the temporary table the alias of t1 – which is just a little confusing since I actually have a table called t1!

Next Steps

Being able to nominate a query block for the inline() and materialize() hints may be of great help in some cases (there’s a recent example on the Oracle Developer Forum (may need a MOS login) where it might make a huge difference to the performance of a particular query without requiring a rewrite of the SQL).

But there are a couple of details to investigate. First, I had a query block name built into my factored subquery – what happens if the author of the SQL didn’t include a query block name?

Before I’d added the inline() hint and query block names in the first example above this is what the plan looked like:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |    48 |  6240 |    48  (11)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6612_F53A566 |       |       |            |          |
|   3 |    TABLE ACCESS FULL                     | T2                         | 10000 |   322K|    27   (8)| 00:00:01 |
|*  4 |   HASH JOIN SEMI                         |                            |    48 |  6240 |    21  (15)| 00:00:01 |
|*  5 |    HASH JOIN SEMI                        |                            |    48 |  4992 |    13  (16)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED  | T1                         |    48 |  1200 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                    | T1_I1                      |    48 |       |     2   (0)| 00:00:01 |
|*  8 |     VIEW                                 |                            | 10000 |   771K|     8  (13)| 00:00:01 |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6612_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
|* 10 |    VIEW                                  |                            | 10000 |   253K|     8  (13)| 00:00:01 |
|  11 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6612_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$A317D234
   2 - SEL$1
   3 - SEL$1        / GTT1@SEL$1
   6 - SEL$A317D234 / T1@SEL$2
   7 - SEL$A317D234 / T1@SEL$2
   8 - SEL$D67CB2D2 / V1B@SEL$4
   9 - SEL$D67CB2D2 / T1@SEL$D67CB2D2
  10 - SEL$D67CB2D3 / V1A@SEL$3
  11 - SEL$D67CB2D3 / T1@SEL$D67CB2D3

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V1A"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - access("V1B"."OBJECT_ID"="T1"."OBJECT_ID")
   7 - access("T1"."OBJECT_ID">=100 AND "T1"."OBJECT_ID"<=200)
   8 - filter("V1B"."OBJECT_NAME" LIKE 'WRI%' AND "V1B"."OBJECT_ID">=100 AND "V1B"."OBJECT_ID"<=200)
  10 - filter("V1A"."OBJECT_TYPE"='TABLE' AND "V1A"."OBJECT_ID">=100 AND "V1A"."OBJECT_ID"<=200)

As you can see, the factored subquery (operations 2 and 3) has the query block name of sel$1 and the main query (operations 6 an 7 where the real t1 is used) has the query block name sel$2. So without giving the subquery a name I could have used the hint /*+ inline(@sel$1) */ in the main query block.

This takes us on to the second point that needs investigation. If you’ve looked at the example on the Oracle Developer Forum you will have seen that there’s an SQL statement that references a stored view and the factored subquery of interest is defined in the view. This means we might be able to edit the query that calls the view to include a hint referencing the query block inside the view – but then what do we do if we can’t edit the main query itself?

To be investigated (1) – would the inline() hint with nominated query block work if the factored subquery was inside a stored view that we were using in our query?

To be investigated(2) – if (1) works, could we achieve the same result by using an SQL Patch to attach the hint to the main query text without editing the main query?

Update (Oct 2020)

It turns out that I discovered this enhancement a few months ago while doing some experimentation with recursive subquery factoring.

Update Nov 2020

A blog note from Nenad Noveljic warns of a surprising ORA-07445 if you get too trigger-happy with the inline() and materialize() hints.

July 10, 2020

Recursive WITH upgrade

Filed under: ANSI Standard,CBO,Execution plans,Oracle,Subquery Factoring,Upgrades — Jonathan Lewis @ 4:19 pm BST Jul 10,2020

There’s a notable change in the way the optimizer does cost and cardinality calculations for recursive subquery factoring that may make some of your execution plans change – with a massive impact on performance – as you upgrade to any version of Oracle from 12.2.0.1 onwards. The problem appeared in a question on the Oracle Developer Community forum a little while ago, with a demonstration script to model the issue.

I’ve copied the script – with a little editing – and reproduced the change in execution plan described by the OP. Here’s my copy of the script, with the insert statements that generate the data (all 1,580 of them) removed.

rem
rem     Script:         recursive_with_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem
rem     Notes:
rem     https://community.oracle.com/thread/4338248
rem
rem     The upgrade to 12.2.0.1 made this query much slower (on 15,000 rows)
rem     Setting OFE to 12.1.0.1 is a first possible fix for the issue.
rem     The scale is too small to see much difference in this case
rem

drop table test_folder purge;

create table test_folder(
        fldr_key                number(16,0)            not null        enable,                 
        fldr_id                 varchar2(255 byte)      not null        enable,                 
        fldr_desc_tx            varchar2(255 byte),                     
        par_fldr_key            number(16,0),                   
        seus_key                number(16,0)            not null        enable,                 
        fldr_private_flg        varchar2(1 byte)        not null        enable,                 
        last_updt_dt            date                    not null        enable,                 
        last_upby_seus_key      number(16,0)            not null        enable,                 
        lock_seq_nbr            number(9,0) default 0   not null        enable,                 
        content_guid            raw(16),                
        constraint test_folder_pk primary key (fldr_key)                
)       
;              

-- list of insert statements

alter table test_folder add constraint test_folder_fk  
        foreign key (par_fldr_key) references test_folder(fldr_key)
;  
  
create or replace force editionable view test_folder_vw (fldr_key) as   
with rec_path(fldr_key)  as (
        select  tf.fldr_key  
        from    test_folder tf where tf.par_fldr_key is null  
        union all  
        select  tf.fldr_key  
        from    test_folder tf, rec_path  
        where   rec_path.fldr_key = tf.par_fldr_key
)  
select fldr_key  
from rec_path   
; 

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'TEST_FOLDER',
                method_opt  => 'for all columns size 1'
        );
end;
/


select * from test_folder_vw where fldr_key = -41;  

I’ve run the test 3 times. First in 12.2.0.1 with no tweaking; then in 12.2.0.1 with the hint /*+ optimizer_features_enable(‘12.1.0.2’) */ and finally in a genuine 12.1.0.2 environment. In all three cases I enabled rowsource execution stats (‘alter session set statistics_level = all’) and pulled the plans from memory – with the following results

First, the base result from 12.1.0.2

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.03 |     604 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |    801 |      1 |00:00:00.03 |     604 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.03 |     604 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      68 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |    640 |   1419 |00:00:00.02 |     536 |  1696K|  1696K| 1488K (0)|
|   5 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
|*  6 |     TABLE ACCESS FULL                     | TEST_FOLDER |      8 |   1419 |  11352 |00:00:00.01 |     536 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41))
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   6 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

Two points to note, in particular. First that the hash join has the recursive with pump as its first (build table) child and the table access full of test_folder as its second child (probe table); secondly that there is no value given for E-Rows for the recursive with pump.

Now the 12.2.0.1 plan:

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.01 |      47 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |   2524K|      1 |00:00:00.01 |      47 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.01 |      47 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      24 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |   2524K|   1419 |00:00:00.01 |      23 |  1743K|  1743K| 1632K (0)|
|   5 |     BUFFER SORT (REUSE)                   |             |      8 |        |  11352 |00:00:00.01 |      23 | 73728 | 73728 |          |
|*  6 |      TABLE ACCESS FULL                    | TEST_FOLDER |      1 |   1419 |   1419 |00:00:00.01 |      23 |       |       |          |
|   7 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41)
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   6 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

There are three changes to notice in this plan – which (for the OP) was much slower than the 12.1.0.2 plan. First, the order of the hash join has changed, the recursive with pump is now the second child (probe table) in the join (and again shows no value for E-Rows); secondly that Oracle has introduced an extra operation – the buffer sort (reuse) – populated by the table access full – as the build table; thirdly (presumably the point of buffer sort (reuse) operation) the number of buffer visits has dropped from a few hundred to a couple of dozen.

Finally let’s check what happens if we set the OFE (optimizer_features_enable) to 12.1.0.2 while running 12.2.0.1

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.01 |      47 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |    801 |      1 |00:00:00.01 |      47 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.01 |      47 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      24 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |    640 |   1419 |00:00:00.01 |      23 |  1797K|  1797K| 1573K (0)|
|   5 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
|   6 |     BUFFER SORT (REUSE)                   |             |      8 |        |  11352 |00:00:00.01 |      23 | 73728 | 73728 |          |
|*  7 |      TABLE ACCESS FULL                    | TEST_FOLDER |      1 |   1419 |   1419 |00:00:00.01 |      23 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41))
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   7 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

In these conditions the recursive with pump has gone back to being the build table (first child); but it’s worth noting that the 12.2 buffer sort (reuse) is still in place – saving us a few hundred buffer gets (and, for a bigger table, a number of disc reads possibly). Downgrading the optimizer_features_enable has given us the plan we needed, but we’ve got an example that shows that hacking the parameter isn’t a guarantee that we will get exactly the path we used to get in the older version.

The story so far.

It seems that we can address the performance problem that the OP had by setting the optimzer_feature_enable to the older version – possibly through a hint embedded in the SQL, perhaps through an SQL Baseline or SQL Patch. Maybe we’ll have to have a logon trigger that sets the parameter for particular users or, worst case scenario, maybe we’ll have to set the parameter at the system level. Given how undesirable the last option could be it would be nice to know exactly what is causing the change in plan.

As a basic clue – if the order of tables in a hash join reverses itself this usually means that the byte (not row) estimates have changed. The optimizer will use the table with the lower byte count as the build table in a hash join. So the recursive with pump – whose row and byte estimates don’t appear – must have produced larger numbers in 12.2.0.1.

A change in the 12.2 plan that I haven’t yet mentioned is the E-rows for the hash join; it’s gone up from 640 (12.1.0.2) to 2.5 million! So let’s repeat the tests with the CBO (10053) trace file enabled and see if we can find a number like 2524K appearing as a join estimate in the trace file. Having created the two trace files (in 12.2.0.1, one with the OFE set backwards) I executed the following grep command against the trace files:

grep -n "^Join Card - Rounded" orcl12c_ora_5524_ofe.trc
grep -n "^Join Card - Rounded" orcl12c_ora_5524_base.trc

I’d set the tracefile_identifier to ‘ofe’ and ‘base’ respectively for the 12.1.0.2 and 12.2.0.1 tests, and here are the results:

grep -n "^Join Card - Rounded" orcl12c_ora_5524_ofe.trc
1166:Join Card - Rounded: 640 Computed: 639.941176

grep -n "^Join Card - Rounded" orcl12c_ora_5524_base.trc
1195:Join Card - Rounded: 640 Computed: 639.941176
1391:Join Card - Rounded: 2544 Computed: 2543.865546
1576:Join Card - Rounded: 10112 Computed: 10111.865546
1737:Join Card - Rounded: 40193 Computed: 40193.075630
1898:Join Card - Rounded: 159759 Computed: 159758.731092
2059:Join Card - Rounded: 635008 Computed: 635008.462185
2220:Join Card - Rounded: 2524023 Computed: 2524023.394958
2269:Join Card - Rounded: 2524023 Computed: 2524023.394958

That’s an interesting clue. Something seems to be calculating a larger and larger value in the 12.2.0.1 trace, starting with the hash join cardinality that appeared in 12.1.0.2 had, growing by a factor of nearly 4 each time, and ending with the hash join cardinality we saw in the 12.2.0.1 plan.

Taking a closer look at the content of the 12.2.0.1 trace file it turned out that every stage in that escalation was Oracle recalculating the cost and cardinality of joining test_folder (the table) and rec_path (the “with” subquery) using the figures from the previous join calculation as the new base figures for rec_path. In effect the optimizer was calculating the cost of following the recursive subquery down to its 7th level of recursion.

Side note: in agreement with my comment about the smaller (in bytes) rowsource being used as the build table, the initial join order started as (test_folder, rec_path) in the first two iterations, but switched to (rec_path, test_folder) from the 3rd iteration onward.

So we’ve identified the mechanics that cause the change in plan; the question now is: why 7 iterations to the final cost? (Briefly I did a quick check to see how many circles of hell there were in Dante’s Inferno – but it’s 9 (or 10 depending how you count). A quick check of v$parameter (and the x$ tables for the hidden parameters) revealed the following:

Name                                     Value
------------------------------------ ---------
_recursive_with_branch_iterations            7

Setting this parameter to 1 in the session, or adding the hint /*+ opt_param(‘_recursive_with_branch_iterations’ 1) */ to the query resulted in the 12.1.0.2 plan appearing in 12.2.0.1 – and this is a much less intrusive way of getting the plan we’re interested in than setting the entire OFE back to 12.1.0.2. One might even set the parameter in the spfile (after receiving approval from Oracle Corp., of course) given how precisely targeted it is (and how it doesn’t switch off that nice little buffering trick.)

Summary

From 12.2 onward the optimizer does recursive recosting of recursive “with” subqueries. This means the cost and cardinality estimates of a plan may change and the impact may cause a significant change in performance – it certainly did for the OP.

The change seems to be driven by the hidden parameter _recursive_with_branch_iterations, which was introduced in 12.2.0.0 with a default value of 7. Setting this parameter to 1 reverts the optimizer to the pre-12.2 behaviour. If you run into a problem of recursive “with” subqueries changing plans and causing performance problems on an upgrade from pre-12.2 to a newer version of Oracle then it would be worth investigating this parameter as the least intrusive way of reverting back to the original plans.

Update (A few days later)

I’ve just been prompted to check MOS for any references to the hidden parameter – and discovered a note that was published in September 2018, updated in February 2019.  It’s amazing how easy it can be to find an answer on MOS when you already know what the answer is ! Document id 2443466.1 Oracle 12.2.0.1 CBO calculating high cost/CPU for queries with recursive sub-query.

This gives two workarounds to the problem of a change in cost in 12.2 – set the optimizer_features_enable to 12.1.0.2, or set the hidden parameter to 1. It references two bugs (one a duplicate of the other, both apparently unpublished):

  • Bug 23515289 : PERFORMANCE REGRESSION OBSERVED WITH RECURSIVE WITH SERIAL PLAN
  • Bug 24566985 : UPG: QUERY PERFORMANCE ON ALL_TSTZ_TABLES 160 TIMES SLOWER THAN 11.2.0.4

and the Permanent Fix for the problem is to install the patch for Bug 24566985 on 12.2.0.1

 

June 10, 2020

CTE Catalogue

Filed under: Oracle,Subquery Factoring — Jonathan Lewis @ 6:46 pm BST Jun 10,2020

An index of the articles I’ve written about Subquery Factoring (a.k.a. Common Table Expressions / CTEs, a.k.a. “with” subqueries). Articles are listed in reverse order of publication and each article is date stamped, but the list is not yet complete.

September 19, 2017

With Subquery()

Filed under: CBO,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 7:19 pm BST Sep 19,2017

Here’s a little oddity that came up recently on the OTN database forum – an example where a “with” subquery (common table expression / factored subquery) produced a different execution plan from the equivalent statement with the subquery moved to an inline view; tested in 12.1.0.2 and 12.2.0.1. Here are the two variations:


rem
rem     Script:         match_recognize_05.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2017
rem
rem     Last tested
rem             19.3.0.0
rem             18.1.0.0        via LiveSQL
rem             12.2.0.1
rem             12.1.0.2

with  tbl as (
          select 1 col1, 'a'  col2 from dual
union all select 2 , 'a' from dual
union all select 3 , 'b' from dual
union all select 4 , 'a' from dual
union all select 5 , 'a' from dual
union all select 6 , 'b' from dual
union all select 7 , 'b' from dual
),
lag_data as (
        select col1, col2, lag(col2) over (order by col1) col2a from tbl
)
select  col1, col2
from    lag_data
where   col2a is null or col2a  col2
order by col1
;

with  tbl as (
          select 1 col1, 'a'  col2 from dual
union all select 2 , 'a' from dual
union all select 3 , 'b' from dual
union all select 4 , 'a' from dual
union all select 5 , 'a' from dual
union all select 6 , 'b' from dual
union all select 7 , 'b' from dual
)
select  col1, col2
from    (
        select col1, col2, lag(col2) over (order by col1) col2a from tbl
        )
where   col2a is null or col2a  col2
order by col1
;

You’ll notice that there’s an explicit “order by” clause at the end of both queries. If you want the result set to appear in a specific order you should always specify the order and not assume that it will appear as a side effect; but in this case the ordering for the “order by” clause seems to match the ordering needed for the analytic function, so we might hope that the optimizer would take advantage of the analytic “window sort” and not bother with a “sort order by” clause. But here are the two plans – first with subquery factoring, then with the inline view:


-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    16 (100)|          |
|   1 |  SORT ORDER BY   |      |     7 |    56 |    16  (13)| 00:00:01 |
|*  2 |   VIEW           |      |     7 |    56 |    15   (7)| 00:00:01 |
|   3 |    WINDOW SORT   |      |     7 |    42 |    15   (7)| 00:00:01 |
|   4 |     VIEW         |      |     7 |    42 |    14   (0)| 00:00:01 |
|   5 |      UNION-ALL   |      |       |       |            |          |
|   6 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  11 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  12 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("COL2A" IS NULL OR "COL2A""COL2"

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    15 (100)|          |
|*  1 |  VIEW            |      |     7 |    56 |    15   (7)| 00:00:01 |
|   2 |   WINDOW SORT    |      |     7 |    42 |    15   (7)| 00:00:01 |
|   3 |    VIEW          |      |     7 |    42 |    14   (0)| 00:00:01 |
|   4 |     UNION-ALL    |      |       |       |            |          |
|   5 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|  11 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("COL2A" IS NULL OR "COL2A""COL2"))

The two plans are different, and the difference is an extra “sort order by” operation even though the optimizer has moved the subquery with the analtyic function inline so that in principle both statements are the technically the same and merely cosmetically different.

It’s been some time since I’ve noticed subquery factoring resulting in a change in plan when the expected effect is purely cosmetic. Interestingly, though, the “unparsed query” in the 10053 (CBO) trace file is the same for the two cases with the only difference being the name of a generated view:


SELECT
        "LAG_DATA"."COL1" "COL1","LAG_DATA"."COL2" "COL2"
FROM    (SELECT
                "TBL"."COL1" "COL1","TBL"."COL2" "COL2",
                DECODE(
                        COUNT(*) OVER ( ORDER BY "TBL"."COL1" ROWS  BETWEEN 1 PRECEDING  AND 1 PRECEDING ),
                        1, FIRST_VALUE("TBL"."COL2") OVER ( ORDER BY "TBL"."COL1" ROWS  BETWEEN 1 PRECEDING  AND 1 PRECEDING ),
                           NULL
                ) "COL2A"
        FROM    (
                            (SELECT 1 "COL1",'a' "COL2" FROM "SYS"."DUAL" "DUAL")
                 UNION ALL  (SELECT 2 "2",'a' "'A'" FROM "SYS"."DUAL" "DUAL")
                 UNION ALL  (SELECT 3 "3",'b' "'B'" FROM "SYS"."DUAL" "DUAL")
                 UNION ALL  (SELECT 4 "4",'a' "'A'" FROM "SYS"."DUAL" "DUAL")
                 UNION ALL  (SELECT 5 "5",'a' "'A'" FROM "SYS"."DUAL" "DUAL")
                 UNION ALL  (SELECT 6 "6",'b' "'B'" FROM "SYS"."DUAL" "DUAL")
                 UNION ALL  (SELECT 7 "7",'b' "'B'" FROM "SYS"."DUAL" "DUAL")
                ) "TBL"
        ) "LAG_DATA"
WHERE
        "LAG_DATA"."COL2A" IS NULL OR "LAG_DATA"."COL2A""LAG_DATA"."COL2"
ORDER BY "LAG_DATA"."COL1"

The above is the unparsed query for the query with two factored subqueries; the only difference in the unparsed query when I moved the analytic subquery inline was that the view name in the above text changed from “LAG_DATA” to “from$_subquery$_008”.

Footnote:

When I used a real table (with the same data) instead of a “union all” factored subquery for the driving data this anomaly disappeared. The union all is a convenient dirty trick for generating very small test data sets on OTN – it remains to be seen whether a more realistic example of multiple factored subqueries would still result in the optimizer losing an opportunity for eliminating a “sort order by” operation.

In passing – did you notice how the optimizer had managed to rewrite a “lag()” analytic function as a form of “first_value()” function with decode?

If you’re wondering about the choice of script name for this test – it’s because I created the script to see if I could produce a match_recognize() version of the requirement; I discovered the sort/order by anomaly as an accidental side-effect.

Update

The same anomaly appears in 18.1 (as tested on LiveSQL), although I have to say that I’m doing an “explain plan” on LiveSQL since it won’t yet let you pull a plan from the library cache so it’s possible that the actual plan and the predicted plan are not identical.

And now (March 2020) a quick test shows that nothing has changed in 19.3.

 

January 8, 2016

CTEs and Updates

Filed under: Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 1:01 pm GMT Jan 8,2016

An important target of trouble-shooting, particularly when addressing performance problems, is to minimise the time and effort you have to spend to get a “good enough” result. A recent question on the OTN database forum struck me as a good demonstration of following this strategy; the problem featured a correlated update that had to access a view 84 times to update a small table; but the view was a complex view (apparently non-mergeable) and the update took several hours to complete even though the view, when instantiated, held only 63 rows.

The OP told us that the query “select * from view” took 7 minutes to return those 63 rows, and wanted to know if we could find a nice way to perform the update in just a little more than that 7 minutes, rather than using the correlated update approach that seemed to take something in the ballpark of 7 minutes per row updated.

Of course the OP could have given us all the details of the view definition, all the table and index definitions, with stats etc. and asked us if we could make the update run faster – but that could lead to a long and frustrating period of experimentation and testing, and a solution that might increase the general maintenance costs of the system (because a subsequent modification to the view might then have to be echoed into the code that did the update). Setting a strictly limited target that clearly ought to be achievable is (if nothing else) a very good starting point for improving the current situation.

I don’t know (as at the time of writing) if the OP implemented the strategy I suggested, but from his description it looked as if it should have been simple to use subquery factoring with materialization to achieve the required result in the most elegant way possible (meaning, in this case, simple SQL and no change to any surrounding code).

The OP has responded to my suggestion with a comment that “it didn’t work”, but it appeared to me that they were looking at and mis-interpreting the output from a call to “Explain Plan” rather than testing the query and pulling the plan from memory – so I thought I’d build a simple model to demonstrate the principle and show you how you could confirm (beyond just checking the clock) that the strategy had worked.

We start with a table to update, a non-mergeable view, and two tables to make up the non-mergeable view:

rem
rem	Script:		cte_update.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Jan 2016
rem

create table t1
as
select
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 3000  -- > comment to avoid wordpress format issue
;


create table t2
as
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 3000     -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt => 'for all columns size 1'
        );
end;
/

create or replace view v1
as
select distinct
        t1.n1 t1n1, t1.n2 t1n2, t2.n2 t2n2
from
        t1, t2
where
        t1.n1 = t2.n1
;

create table t3
as
select * from v1
;

begin
        dbms_stats.gather_table_stats(
                user,
                't3',
                method_opt => 'for all columns size 1'
        );
end;
/

I’ve created the table t3 by copying the content of the view v1 and I’m going to update every row in t3 from v1; I gathered stats on t1 and t2 before creating the view and table simply to avoid the need for Oracle to do dynamic sampling as it created t3. Depending on your version of Oracle, of course, the stats collections might be redundant.

Having set the scene with the data, here’s the “original” code for doing the required update, followed by its execution plan (pulled from the memory of a 12.1.0.2 instance):


set serveroutput off
set linesize 180
set trimspool on

alter session set statistics_level = all;

spool cte_update

update t3
        set t2n2 = (
                select  v1.t2n2
                from    v1
                where   v1.t1n1 = t3.t1n1
                and     v1.t1n2 = t3.t1n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                         |       |      1 |        |      0 |00:00:01.22 |   46745 |       |       |          |
|   1 |  UPDATE                                  | T3    |      1 |        |      0 |00:00:01.22 |   46745 |       |       |          |
|   2 |   TABLE ACCESS FULL                      | T3    |      1 |    200 |    200 |00:00:00.01 |       3 |       |       |          |
|   3 |   VIEW                                   | V1    |    200 |      1 |    200 |00:00:01.22 |   46332 |       |       |          |
|   4 |    SORT UNIQUE                           |       |    200 |      1 |    200 |00:00:01.21 |   46332 |  2048 |  2048 | 2048  (0)|
|   5 |     NESTED LOOPS                         |       |    200 |      1 |  45000 |00:00:01.11 |   46332 |       |       |          |
|   6 |      NESTED LOOPS                        |       |    200 |      1 |  45000 |00:00:00.34 |    1332 |       |       |          |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    200 |      1 |   3000 |00:00:00.02 |     684 |       |       |          |
|*  8 |        INDEX RANGE SCAN                  | T1_I1 |    200 |     15 |   3000 |00:00:00.01 |     408 |       |       |          |
|*  9 |       INDEX RANGE SCAN                   | T2_I1 |   3000 |      1 |  45000 |00:00:00.11 |     648 |       |       |          |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T2    |  45000 |      1 |  45000 |00:00:00.31 |   45000 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - filter("T1"."N2"=:B1)
   8 - access("T1"."N1"=:B1)
   9 - access("T2"."N1"=:B1)
       filter("T1"."N1"="T2"."N1")

Points to note from this execution plan: the VIEW operation at line 3 has started 200 times (there are 200 rows in table t3, the subquery runs once per row); and a simple measure of work done is the 46,745 buffer visits (of which, I can tell you, roughly 400 are current block gets) reported under Buffers in the top line of the plan.

It’s an interesting detail that although Oracle has pushed the correlation predicates inside the view (as shown by the predicate section for operations 7,8 and 9) it doesn’t report the operation at line 3 as “VIEW PUSHED PREDICATE”. It would be nice to see the explicit announcement of predicate pushing here, but that seems to be an expression reserved for pushing join predicates into views – fortunately we always check the Predicate Information, don’t we!

Now let’s see what the SQL and plan look like if we want Oracle to instantiate the entire v1 result set and use that to update the t3 table.

update t3 
        set t2n2 = (
                with v0 as (
                        select
                                /*+ materialize */
                                t1n1, t1n2, t2n2
                        from v1
                )
                select
                        t2n2
                from
                        v0
                where   v0.t1n1 = t3.t1n1
                and     v0.t1n2 = t3.t1n2
        )
;

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT            |                            |      1 |        |      0 |00:00:00.19 |    1185 |      1 |      1 |       |       |          |
|   1 |  UPDATE                     | T3                         |      1 |        |      0 |00:00:00.19 |    1185 |      1 |      1 |       |       |          |
|   2 |   TABLE ACCESS FULL         | T3                         |      1 |    200 |    200 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|   3 |   TEMP TABLE TRANSFORMATION |                            |    200 |        |    200 |00:00:00.18 |     778 |      1 |      1 |       |       |          |
|   4 |    LOAD AS SELECT           |                            |      1 |        |      0 |00:00:00.01 |     171 |      0 |      1 |  1040K|  1040K|          |
|   5 |     VIEW                    | V1                         |      1 |  45000 |    200 |00:00:00.01 |     168 |      0 |      0 |       |       |          |
|   6 |      HASH UNIQUE            |                            |      1 |  45000 |    200 |00:00:00.01 |     168 |      0 |      0 |  1558K|  1558K| 3034K (0)|
|*  7 |       HASH JOIN             |                            |      1 |  45000 |  45000 |00:00:00.01 |     168 |      0 |      0 |  1969K|  1969K| 1642K (0)|
|   8 |        TABLE ACCESS FULL    | T1                         |      1 |   3000 |   3000 |00:00:00.01 |      84 |      0 |      0 |       |       |          |
|   9 |        TABLE ACCESS FULL    | T2                         |      1 |   3000 |   3000 |00:00:00.01 |      84 |      0 |      0 |       |       |          |
|* 10 |    VIEW                     |                            |    200 |  45000 |    200 |00:00:00.17 |     603 |      1 |      0 |       |       |          |
|  11 |     TABLE ACCESS FULL       | SYS_TEMP_0FD9D6618_911FB4C |    200 |  45000 |  40000 |00:00:00.08 |     603 |      1 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T1"."N1"="T2"."N1")
  10 - filter(("V0"."T1N1"=:B1 AND "V0"."T1N2"=:B2))

The headline figure to note is that 1,185 Buffer visits – clearly we’ve done something very different (and possibly cheaper and faster, even in this tiny demonstration). Looking at operation 3 we see the “TEMP TABLE TRANSFORMATION”, which tells us that we’ve materialized our factored subquery. There is scope, though, for a little ambiguity and uncertainty – the Starts column for this operation says we started it 200 times, once for each row in t3. We might worry that we’ve actually recreated the result and written it to disc 200 times even though we might then notice that lines 4 – 9 tell us that we loaded the temporary table just once (Starts = 1).

You could take my word for it that we didn’t “do” the temp table transformation 200 time, we merely used the result of the temp table transformation 200 times; but I wasn’t prepared to make this assumption until I had done a little more checking, so there’s no reason why you shouldn’t still be a little suspicious. Lines 4 – 9 do seem to tell us (consistently) that we only load the data once but there have been occasional bugs where counters have been reset to zero when they shouldn’t have been, so even though we see at operation 8 (for example) “1 full tablescan of t1 returning 3,000 rows after visiting 84 buffers” that may mean that Oracle counted the work once and “forgot” to count it the other 199 times.

It’s easy enough to do a quick cross-check. Take a snapshot of v$mystat joined to v$statname before and after runnning the query, and check the difference in buffer visits, tablescans, and tablescan rows gotten – if those figures are broadly consistent with the figures in the execution plan I think we can be reasonably confident that the plan is telling us the truth.

Here’s what I got for a few key figures:

Name                                       Value
----                                       -----
session logical reads                      1,472
db block gets                                412
consistent gets                            1,060
consistent gets from cache                 1,060
db block changes                             410
table scans (short tables)                   205
table scan rows gotten                    46,213
table scan blocks gotten                     366

There are a number of oddities – not to mention version and feature dependent variations – in the numbers and a couple of discrepancies introduced by the code I was using to take the snapshot, but the “table scan rows gotten” figure is particularly easy to see in the execution plan:

46,213 = 3000 (t1) + 3000 (t2) + 200 (t3) + 200 * 200 (temp table)

With a small error the number of “table scans (short tables)” is also consistent with the plan Starts – and that’s perhaps the most important indicator, we scan t1 and t2 just once, and the temp table result 200 times. If we were creating the temp table 200 times we’d have to have done over 400 table scans (200 each for t1 and t2).

I won’t go into the details of how to compare the session logical I/O to the total Buffer gets for the plan – but the figures are in the right ballpark as far as matching is concerned – if the plan was deceiving us about the number of times the temporary table was created (rather than used) the session stats would have to report a figure more like 33,600 (200 * (84 + 84)) consistent gets.

Conclusion

We have managed to reduce the workload from “one view rowsource generation per row” to “one view rowsource instantiation” with a very small change to the SQL. In the case of the OP this should result in a small, easily comprehensible, change in the SQL statement leading to a drop in run-time from several hours to seven minutes – and maybe that’s good enough for the present.

July 29, 2015

Existence

Filed under: Execution plans,Oracle,subqueries,Subquery Factoring,Tuning — Jonathan Lewis @ 1:05 pm BST Jul 29,2015

A recent question on the OTN Database Forum asked:

I need to check if at least one record present in table before processing rest of the statements in my PL/SQL procedure. Is there an efficient way to achieve that considering that the table is having huge number of records like 10K.

I don’t think many readers of the forum would consider 10K to be a huge number of records; nevertheless it is a question that could reasonably be asked, and should prompt a little discssion.

First question to ask, of course is: how often do you do this and how important is it to be as efficient as possible. We don’t want to waste a couple of days of coding and testing to save five seconds every 24 hours. Some context is needed before charging into high-tech geek solution mode.

Next question is: what’s wrong with writing code that just does the job, and if it finds that the job is complete after zero rows then you haven’t wasted any effort. This seems reasonable in (say) a PL/SQL environment where we might discuss the following pair of strategies:


Option 1:
=========
-- execute a select statement to see in any rows exist

if (flag is set to show rows) then
    for r in (select all the rows) loop
        do something for each row
    end loop;
end if;

Option 2:
=========
for r in (select all the rows) loop
    do something for each row;
end loop;

If this is the type of activity you have to do then it does seem reasonable to question the sense of putting in an extra statement to see if there are any rows to process before processing them. But there is a possibly justification for doing this. The query to find just one row may produce a very efficient execution plan, while the query to find all the rows may have to do something much less efficient even when (eventually) it finds that there is no data. Think of the differences you often see between a first_rows_1 plan and an all_rows plan; think about how Oracle can use index-only access paths and table elimination – if you’re only checking for existence you may be able to produce a MUCH faster plan than you can for selecting the whole of the first row.

Next question, if you think that there is a performance benefit from the two-stage approach: is the performance gain worth the cost (and risk) of adding a near-duplicate statement to the code – that’s two statements that have to be maintained every time you make a change. Maybe it’s worth “wasting” a few seconds on every execution to avoid getting the wrong results (or an odd extra hour of programmer time) once every few months. Bear in mind, also, that the optimizer now has to optimize two statement instead of one – you may not notice the extra CPU usage in testing but perhaps in the live environment the execution benefit will be eroded by the optimization cost.

Next question, if you still think that the two-stage process is a good idea: will it result in an inconsistent database state ?! If you select and find a row, then run and find that there are no rows to process because something modified and “hid” the row you found on the first pass – what are you going to do. Will this make the program crash ? Will it produce an erroneous result on this run, or will a silent side effect be that the next run will produce the wrong results. (See Billy Verreynne’s comment on the original post). Should you set the session to serializable before you start the program, or maybe lock a critical table to make sure it can’t change.

So, assuming you’ve decided that some form of “check for existence then do the job” is both desirable and safe, what’s the most efficient strategy. Here’s one of the smarter solutions that minimises risk and effort (in this case using a pl/sql environment).


select  count(*)
into    m_counter
from    dual
where   exists ({your original driving select statement})
;

if m_counter = 0 then
    null;
else
    for c1 in {your original driving select statement} loop
        -- do whatever
    end loop;
end if;

The reason I describe this solution as smarter, with minimum risk and effort, is that (a) you use EXACTLY the same SQL statement in both locations so there should be no need to worry about making the same effective changes twice to two slightly different bits of SQL and (b) the optimizer will recognise the significance of the existence test and run in first_rows_1 mode with maximum join elimination and avoidance of redundant table visits. Here’s a little data set I can use to demonstrate the principle:

rem
rem     Script:         existence_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2015
rem

create table t1
as
select
        mod(rownum,200)         n1,     -- scattered data
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 10000  -- > comment to avoid WordPress formatting problem
;

delete from t1 where n1 = 100;
commit;

create index t1_i1 on t1(n1);

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                cascade => true,
                method_opt => 'for all columns size 1'
        );
end;
/

It’s just a simple table with index, but the index isn’t very good for finding the data – it’s repetitive data widely scattered through the table: 10,000 rows with only 200 distinct values. But check what happens when you do the dual existence test – first we run our “driving” query to show the plan that the optimizer would choose for it, then we run with the existence test to show the different strategy the optimizer takes when the driving query is embedded:


alter session set statistics_level = all;

select  *
from    t1
where   n1 = 100
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

select  count(*)
from    dual
where   exists (
                select * from t1 where n1 = 100
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

Notice how I’ve enabled rowsource execution statistics and pulled the execution plans from memory with their execution statistics. Here they are:


select * from t1 where n1 = 100

-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |    38 (100)|      0 |00:00:00.01 |     274 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |     50 |    38   (3)|      0 |00:00:00.01 |     274 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=100)

select count(*) from dual where exists (   select * from t1 where n1 = 100  )

---------------------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |     3 (100)|      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE    |       |      1 |      1 |            |      1 |00:00:00.01 |       2 |
|*  2 |   FILTER           |       |      1 |        |            |      0 |00:00:00.01 |       2 |
|   3 |    FAST DUAL       |       |      0 |      1 |     2   (0)|      0 |00:00:00.01 |       0 |
|*  4 |    INDEX RANGE SCAN| T1_I1 |      1 |      2 |     1   (0)|      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NOT NULL)
   4 - access("N1"=100)

For the original query the optimizer did a full tablescan – that was the most efficient path. For the existence test the optimizer decided it didn’t need to visit the table for “*” and it would be quicker to use an index range scan to access the data and stop after one row. Note, in particular, that the scan of the dual table didn’t even start – in effect we’ve got all the benefits of a “select {minimum set of columns} where rownum = 1” query, without having to work out what that minimum set of columns was.

But there’s an even more cunning option – remember that we didn’t scan dual when there were no matching rows:


for c1 in (

        with driving as (
                select  /*+ inline */
                        *
                from    t1
        )
        select  /*+ track this */
                *
        from
                driving d1
        where
                n1 = 100
        and     exists (
                        select
                                *
                        from    driving d2
                        where   n1 = 100
                );
) loop

    -- do your thing

end loop;

In this specific case the factored subquery would automatically be copied inline so the hint here is actually redundant; in general you’re likely to find the optimizer materializing your subquery and bypassing the cunning strategy if you don’t use the hint. (This example is one of the special cases where subquery factoring doesn’t automatically materialize – there’s no where clause in the subquery.)

Here’s the execution plan pulled from memory (after running this SQL through an anonymous PL/SQL block):


SQL_ID  7cvfcv3zarbyg, child number 0
-------------------------------------
WITH DRIVING AS ( SELECT /*+ inline */ * FROM T1 ) SELECT /*+ track
this */ * FROM DRIVING D1 WHERE N1 = 100 AND EXISTS ( SELECT * FROM
DRIVING D2 WHERE N1 = 100 )

---------------------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |    39 (100)|      0 |00:00:00.01 |       2 |
|*  1 |  FILTER            |       |      1 |        |            |      0 |00:00:00.01 |       2 |
|*  2 |   TABLE ACCESS FULL| T1    |      0 |     50 |    38   (3)|      0 |00:00:00.01 |       0 |
|*  3 |   INDEX RANGE SCAN | T1_I1 |      1 |      2 |     1   (0)|      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - filter("T1"."N1"=100)
   3 - access("T1"."N1"=100)

You’ve got just one statement – and you’ve only got one version of the complicated text because you put it into a factored subquery; but the optimizer manages to use one access path for one instantiation of the text and a different one for the other. You get an efficient test for existence and only run the main query if some suitable data exists, and the whole thing is entirely read-consistent.

I have to say, though, I can’t quite make myself 100% enthusiastic about this code strategy – there’s just a nagging little doubt that the optimizer might come up with some insanely clever trick to try and transform the existence test into something that’s supposed to be faster but does a lot more work; but maybe that’s only likely to happen on an upgrade, which is when you’d be testing everything very carefully anyway (wouldn’t you) and you’ve got the “dual/exists” fallback position if necessary.

Footnote:

Does anyone remember the thing about reading execution plans “first child first” – this particular usage of an existence test is one of the interesting cases where it’s not the first child of a parent operation that runs first: it’s the case I often refer to as the “constant subquery”.

July 27, 2015

Subquery Factoring (10)

Filed under: Bugs,CBO,Oracle,Subquery Factoring,Troubleshooting — Jonathan Lewis @ 1:26 pm BST Jul 27,2015

What prompted me to write my previous note about subquerying was an upgrade to 12c, and a check that a few critical queries would not do something nasty on the upgrade. As ever it’s always interesting how many little oddities you can discover while looking closely as some little detail of how the optimizer works. Here’s an oddity that came up in the course of my investigation in 12.1.0.2 – first some sample data:

rem
rem     Script:         subq_factor_materialize_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2015
rem

create table t1
nologging
as
select * from all_objects;

create index t1_i1 on t1(owner) compress nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns owner size 254'
        );
end;
/

The all_objects view is convenient as a tool for modelling what I wanted to do since it has a column with a small number of distinct values and an extreme skew across those values. Here’s a slightly weird query that shows an odd costing effect:


with v1 as (
        select /*+ inline */ owner from t1 where owner > 'A'
)
select count(*) from v1 where owner = 'SYS'
union all
select count(*) from v1 where owner = 'SYSTEM'
;

Since the query uses the factored subquery twice and there’s a predicate on the subquery definition, I expect to see materialization as the default, and that’s what happened (even though I’ve engineered the query so that materialization is more expensive than executing inline). Here are the two plans from 12.1.0.2 (the same pattern appears in 11.2.0.4, though the costs are a little less across the board):


=======================
Unhinted (materializes)
=======================

---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     2 |   132 |    25  (20)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D661B_876C2CB |       |       |            |          |
|*  3 |    INDEX FAST FULL SCAN    | T1_I1                      | 85084 |   498K|    21  (15)| 00:00:01 |
|   4 |   UNION-ALL                |                            |       |       |            |          |
|   5 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  6 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|   7 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
|   8 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  9 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

=============
Forced inline
=============

--------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     2 |    12 |    22  (14)| 00:00:01 |
|   1 |  UNION-ALL             |       |       |       |            |          |
|   2 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  3 |    INDEX FAST FULL SCAN| T1_I1 | 38784 |   227K|    21  (15)| 00:00:01 |
|   4 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  5 |    INDEX RANGE SCAN    | T1_I1 |   551 |  3306 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

I’m not surprised that the optimizer materialized the subquery – as I pointed out in my previous article, the choice seems to be rule-based (heuristic) rather than cost-based. What surprises me is that the cost for the default (materialized) plan is not self-consistent – the optimizer seems to have lost the cost of generating the temporary table.

The cost of the materialized query plan looks as if it ought to be 21 + 13 + 13 = 47. Even if the optimizer had been coded to assume that the temporary table would be in the buffer cache for the second tablescan (and therefore virtually free to access) we ought to see a cost of 21 + 13 = 34. As it is we have a cost of 25, which is 13 + 13 (or, if you check the 10053 trace file, 12.65 + 12.65, rounded) i.e. the cost of the two tablescans with no cost assigned for the creation of the temporary table.

Since the choice to materialize doesn’t seem to be cost-based (at present) this doesn’t really matter – but it’s always nice to see, and be able to understand, self-consistent figures in an execution plan.

Footnote

It is worth pointing out as a side note that materialization can actually be more expensive than running in-line, even for very simple examples.

Subquery factoring seems to have become more robust and consistent over recent releases in terms of consistency of execution plans when the subqueries are put back inline, but you still need to think a little bit before rewriting a query for cosmetic (i.e. totally valid “readability”) reasons just to check whether the resulting query is going to produce an unexpected, and unexpectedly expensive, materialization.

Update (Jun 2020)

The costing anomaly is still present in 12.2.0.1 and 19.3.0.0

July 24, 2015

Subquery Factoring (9)

Filed under: CBO,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 12:34 pm BST Jul 24,2015

Several years go (eight to be precise) I wrote a note suggesting that Oracle will not materialize a factored subquery unless it is used at least twice in the main query. I based this conclusion on a logical argument about the cost of creating and using a factored subquery and, at the time, I left it at that. A couple of years ago I came across an example where even with two uses of a factored subquery Oracle still didn’t materialize even though the cost of doing so would reduce the cost of the query – but I never got around to writing up the example, so here it is:

rem
rem     Script:         subq_factor_materialize.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2013
rem

create table t1
as
select
        object_id, data_object_id, created, object_name, rpad('x',1000) padding
from
        all_objects
where
        rownum <= 10000
;

exec dbms_stats.gather_table_stats(user,'T1')

explain plan for
with gen as (
        select /*+ materialize */ object_id, object_name from t1
)
select
        g1.object_name,
        g2.object_name
from
        gen g1,
        gen g2
where
        g2.object_id = g1.object_id
;

select * from table(dbms_xplan.display);

You’ll notice that my original table has very wide rows, but my factored subquery selects a “narrow” subset of those rows. My target is to have an example where doing a tablescan is very expensive but the temporary table holding the extracted data is much smaller and cheaper to scan.

I’ve included a materialize hint in the SQL above, but you need to run the code twice, once with, and once without the hint. Here are the two plans – unhinted first:


==============================
Unhinted - doesn't materialize
==============================

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   468K|   428   (2)| 00:00:03 |
|*  1 |  HASH JOIN         |      | 10000 |   468K|   428   (2)| 00:00:03 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
---------------------------------------------------------------------------

==================================
Hinted to materialize - lower cost
==================================

--------------------------------------------------------------------------------------------------------- 
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT           |                            | 10000 |   585K|   227   (2)| 00:00:02 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6664_9DAAEB7 |       |       |            |          | 
|   3 |    TABLE ACCESS FULL       | T1                         | 10000 |   234K|   214   (2)| 00:00:02 | 
|*  4 |   HASH JOIN                |                            | 10000 |   585K|    13   (8)| 00:00:01 | 
|   5 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 | 
|   7 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Clearly the optimizer isn’t considering the costs involved.

If I add the predicate “where object_id > 0” (which identifies ALL the rows in the table), materialization occurs unhinted (with the same costs reported as for the hinted plan above. My tentative conclusion is that the transformation is a heuristic one that follows the rule:

two or more appearances of the subquery and some indication of row selection in the subquery rowsource

(In fact if the rowsource is “select * from pipeline_function” the requirement for subsetting doesn’t seem to apply.)

The plans above came from 11.2.0.4 but I got the same result, with a slight difference in costs, in 12.1.0.2. It’s worth pointing out that despite Oracle apparently ignoring the costs when deciding whether or not to materialize, it still seems to report self-consistent values after materialization: the 227 for the plan above is the 214 for creating the temporary table plus the 13 for deriving the hash join of the two copies of the temporary table. (This is not always the case, though).

Update

Following on from Mark Farhham’s comment below, a simple test showed that adding the predicate “where 1 = 1” was sufficient to make the optimizer materialize the CTE without hinting!

February 8, 2015

Functions & Subqueries

Filed under: Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 4:12 am GMT Feb 8,2015

I think the “mini-series” is a really nice blogging concept – it can pull together a number of short articles to offer a much better learning experience for the reader than they could get from the random collection of sound-bites that so often typifies an internet search; so here’s my recommendation for this week’s mini-series: a set of articles by Sayan Malakshinov a couple of years ago comparing the behaviour of Deterministic Functions and Scalar Subquery Caching.

Footnote:
Although I’ve labelled it as “this week’s” series, I wouldn’t want you to assume that I’ll be trying to find a new mini-series every week.

Footnote 2:
I had obviously expected to publish this note a long time ago – but must have forgotten about it. I was prompted to search my blog for “deterministic” very recently thanks to a recent note on the OTN database forum and discovered both this note and an incomplete note about improving the speed of creating function-based indexes by tweaking hidden parameters – which I might yet publish, although if you read all of Sayan’s articles you’ll find the solution anyway.

 

February 16, 2014

Recursive subquery factoring

Filed under: Hints,Ignoring Hints,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 6:11 pm GMT Feb 16,2014

Updated for 19c, May 2020

This is possibly my longest title to date – I try to keep them short enough to fit the right hand column of the blog without wrapping – but I couldn’t think of a good way to shorten it (Personally I prefer to use the expression CTE – common table expression – over “factored subquery” or “subquery factoring” or “with subquery”, and that would have achieved my goal, but might not have meant anything to most people.)

If you haven’t come across them before recursive CTEs appeared in 11.2, are in the ANSI standard, and are (probably) viewed by Oracle as the strategic replacement for “connect by” queries. Here, to get things started, is a simple (and silly) example:

(more…)

May 24, 2012

Subquery Factoring (8)

Filed under: Execution plans,Hints,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 6:37 pm BST May 24,2012

I have a growing collection of postings where I’ve described anomalies or limitations in subquery factoring (the “with subquery”, or Common Table Expression (CTE) to give it the ANSI name). Here’s another example of Oracle’s code not behaving consistently. You may recognise the basic query from yesterday’s example of logical tuning – so I won’t reprint the code to generate the data. The examples in this note were created on 11.2.0.2 – we start with a simple query and its execution plan:
(more…)

February 16, 2012

Subquery Factoring (7)

Filed under: Hints,Infrastructure,Oracle,Subquery Factoring,Tuning,Upgrades — Jonathan Lewis @ 5:03 pm GMT Feb 16,2012

When I wrote a note last week about the fixes to the subquery factoring optimizer code in 11.2.0.3, I finished with a comment about having more to say on the test case if I materialized the subquery. Today’s the day to talk about it. As a reminder, here’s the query, but with the /*+ materialize */ hint in place:
(more…)

February 14, 2012

Subquery Factoring (6)

Filed under: Bugs,CBO,Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 5:59 pm GMT Feb 14,2012

Here’s an interesting little conundrum about subquery factoring that hasn’t changed in the recent (11.2.0.3) patch for subquery factoring. It came to me from Jared Still (a fellow member of Oak Table Network) shortly after I’d made some comments about the patch. It’s an example based on the scott/tiger schema – which I’ve extracted from the script $ORACLE_HOME/rdbms/admin/utlsampl.sql (though the relevant scripts may be demobld.sql or scott.sql, depending on version).
(more…)

February 1, 2012

Subquery Factoring (5)

Filed under: CBO,Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 5:52 pm GMT Feb 1,2012

It’s always worth browsing through the list of Oracle’s bug fixes each time a new release or patch comes out because it can give you clues about where to look for problems in your production release – and how to anticipate problems on the upgrade. This article is an example of a fix that I found while while looking at the List of Bug Fixes for 11.2.0.3 (MOS licence required for link) quite recently.

(more…)

December 8, 2011

Test Data

Filed under: Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 6:31 pm GMT Dec 8,2011

The UKOUG conference is over for another year – but it has left me with plenty to do and lots of things to investigate. Here’s just one little point that I picked up during one of the 10 minute “Oak Talks” that members of the Oak Table Network were doing in the lunch breaks.

There is a fairly well-known strategy for generating a list of numbers by using a “select from dual … connect by …” query, but I hadn’t realised that there were two ways of using it. The code I’ve usually used is this:

select
        rownum id
from
        dual
connect by
        rownum <= 4000
;

(more…)

Next Page »

Website Powered by WordPress.com.