Oracle Scratchpad

March 12, 2024

Missing Partition

Filed under: CBO,Execution plans,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 12:20 pm GMT Mar 12,2024

Here’s a silly little detail about execution plans on (interval) partitioned tables that I hadn’t noticed until it showed up on this thread on a public Oracle forum: it’s an execution plan that claims that Oracle will be visiting a partition that clearly won’t be holding the data requested.

Here’s the starting section of a demonstration script – mostly by Solomon Yakobson with minor tweaks and additions from me:

rem
rem     Script:         non_existent_partition.sql
rem     Author:         Solomon Yakobson / Jonathan Lewis
rem     Dated:          Mar 2024
rem
rem     Last tested 
rem             19.11.0.0
rem

create table invoices_partitioned(
        invoice_no   number not null,
        invoice_date date   not null,
        comments     varchar2(500)
)
partition by range (invoice_date)
interval (interval '3' month)
(
        partition invoices_past values less than (date '2023-01-01')
);

insert into invoices_partitioned
select  level,
        date '2023-01-01' + numtoyminterval(3 * (level - 1),'month'),
        null
from  dual
connect by level <= 6
/

insert into invoices_partitioned select * from invoices_partitioned;
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/

commit
/

execute dbms_stats.gather_table_stats(user,'invoices_partitioned')

set linesize 156
column high_value format a80

select  partition_position, num_rows,
        partition_name,
        high_value
  from  user_tab_partitions
  where table_name = 'INVOICES_PARTITIONED'
  order by partition_position
/

alter table invoices_partitioned drop partition for (date'2023-09-01');
purge recyclebin;

select  partition_position, num_rows,
        partition_name,
        high_value
  from  user_tab_partitions
  where table_name = 'INVOICES_PARTITIONED'
  order by partition_position
/

The script creates an interval partitioned table, with an interval of 3 months, then inserts 131,072 rows per partition (the strange re-execution of “insert into x select * from x” was my lazy way of increasing the volume of data from the original one row per partition without having to think too carefully).

After creating the data we report the partition names and high values in order, then drop (and purge) the partition that should hold the value 1st Sept 2023 (which will be the partition with the high_value of 1st Oct 2023) and report the partition names and high values again so that you can see the “gap” in the high values and the adjustment to the partition_position values. Here are the “before” and “after” outputs:

PARTITION_POSITION   NUM_ROWS PARTITION_NAME         HIGH_VALUE
------------------ ---------- ---------------------- --------------------------------------------------------------------------------
                 1          0 INVOICES_PAST          TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 2     131072 SYS_P39375             TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 3     131072 SYS_P39376             TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 4     131072 SYS_P39377             TO_DATE(' 2023-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 5     131072 SYS_P39378             TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 6     131072 SYS_P39379             TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 7     131072 SYS_P39380             TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

7 rows selected.



PARTITION_POSITION   NUM_ROWS PARTITION_NAME         HIGH_VALUE
------------------ ---------- ---------------------- --------------------------------------------------------------------------------
                 1          0 INVOICES_PAST          TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 2     131072 SYS_P39375             TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 3     131072 SYS_P39376             TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 4     131072 SYS_P39378             TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 5     131072 SYS_P39379             TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 6     131072 SYS_P39380             TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

6 rows selected.

Now we check the execution plan for a query that would have accessed the partition we’ve just dropped:

explain plan for
select  *
  from  invoices_partitioned
  where invoice_date = date '2023-09-01';

select  *
  from  dbms_xplan.display();


PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
Plan hash value: 1148008570

---------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |   109K|  1173K|   104   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|                      |   109K|  1173K|   104   (1)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | INVOICES_PARTITIONED |   109K|  1173K|   104   (1)| 00:00:01 |     4 |     4 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INVOICE_DATE"=TO_DATE(' 2023-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The execution plan says it’s going to visit partition number 4 (pstart/pstop) – which we know definitely cannot be holding any relevant data. If this were an ordinary range-partitioned table – as opposed to interval partitioned – partition 4 would be the correct partition for 1st Sept 2023, of course, but it isn’t, so it feels like the pstart/pstop ought to say something like “non-existent” and all the numeric estimates should be zero.

A quick trick for making an interval partition appear without inserting data into it is to issue a “lock table … partition for () …” statement (see footnote to this blog note). I did wonder if the attempt to explain a plan that needed a non-existent partition had actually had the same effect of making Oracle create the partition, so I ran the query against user_tab_partitions again just to check that this hadn’t happened.
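
For reference, a minimal sketch of that trick against this table – the lock instantiates the partition that would cover the supplied date without inserting a row:

lock table invoices_partitioned partition for (date '2023-09-01') in share mode;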

So what’s going to happen at run-time: is this an example of “explain plan” telling us a story that’s not going to match what shows up in v$sql_plan (dbms_xplan.display_cursor)? Let’s run the query (with rowsource execution stats enabled) and find out:


set serveroutput off
alter session set statistics_level = all;
alter session set "_rowsource_execution_statistics"=true;

select  *
  from  invoices_partitioned
  where invoice_date = date '2023-09-01';


select  *
  from  table(dbms_xplan.display_cursor(format=>'allstats last partition'));


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID  d42kw12htubhn, child number 0
-------------------------------------
select  *   from  invoices_partitioned   where invoice_date = date
'2023-09-01'

Plan hash value: 1148008570

---------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |      1 |        |       |       |      0 |00:00:00.01 |
|   1 |  PARTITION RANGE SINGLE|                      |      1 |    109K|     4 |     4 |      0 |00:00:00.01 |
|*  2 |   TABLE ACCESS FULL    | INVOICES_PARTITIONED |      0 |    109K|     4 |     4 |      0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INVOICE_DATE"=TO_DATE(' 2023-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

It’s the same plan with the same “wrong” partition identified, and the same estimate for rows returned – but the access never actually happened: Starts = 0 on the table access full.

My hypothesis about this misleading reporting is that Oracle knows from the table definition everything about every partition that might eventually exist – the high_value for the “anchor” partition is known and the interval is known, so the appropriate partition number for any partition key value can be derived [ed: see comment #4 for details]. Then, at some point, a disconnect appears between the theoretical partition position and the set of physically instantiated partitions, so the optimizer gets the message “theoretically it’s in the 4th partition” and collects the stats from “partition_position = 4” to do the arithmetic and produce the plan. [ed: a later thought – the 109K estimate in this particularly simple example is based on {number of rows in table} / {number of distinct values in table}; in this case, presumably, 655,360 / 6 ≈ 109K]
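
To illustrate the first part of that hypothesis, here’s a minimal sketch of the arithmetic that could derive the theoretical partition position for 1st Sept 2023 from the anchor high_value and the 3-month interval (my arithmetic, not a description of Oracle’s internal code):

select
        1                                               -- the "anchor" partition (invoices_past)
      + floor(
                months_between(date '2023-09-01', date '2023-01-01') / 3
        )                                               -- complete 3-month intervals above the anchor high value
      + 1                                               -- the interval partition holding the target date
        as theoretical_position
from    dual
/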

Fortunately there’s some metadata somewhere that means the run-time engine doesn’t try to access the wrong partition, so this little glitch doesn’t really matter for this simple query – beyond its ability to cause a little confusion.

It’s possible, though, that this type of behaviour leaves the optimizer with another reason for getting the arithmetic wrong and picking the wrong path if you have a number of “missing” partitions in an interval partitioned table that you’re querying with a range-based predicate that crosses several (non-existent) partitions. So treat this as a warning/clue if you recognise that pattern in any of your partitioned tables.

November 20, 2023

gby_pushdown

Filed under: CBO,Execution plans,Hints,Oracle,Parallel Execution — Jonathan Lewis @ 6:42 pm GMT Nov 20,2023

This is a note that’s been awaiting completion for nearly 10 years. It’s about a feature (or, at least, a hint for the feature) that appeared in 10.2.0.5 to control some of the inter-process messaging that takes place in parallel execution.

It’s a very simple idea that can make a significant difference in CPU usage for large parallel queries – can you aggregate the raw data before distributing it between slave sets (minimising the volume of traffic), or should you simply distribute the data and aggregate late to avoid aggregating twice? The strategy of aggregating early is known as “group by pushdown”.

I’ll start with a script to create a simple data set and a trivial query with a parallel hint:

rem
rem     Script:         gby_pushdown.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem

set linesize 180
set pagesize 60
set trimspool on
set tab off

create table t1 
as 
select 
        * 
from 
        all_objects 
where 
        rownum <= 50000
/


alter session set statistics_level = all;

set serveroutput off

prompt  ==================================================
prompt  Default (for this data) pushdown and hash group by
prompt  ==================================================

set feedback only

select 
        /*+ 
                qb_name(main)
                parallel(t1 2) 
--              no_gby_pushdown(@main)
--              no_use_hash_gby_for_pushdown(@main)
        */ 
        owner, object_type, count(*) ct
from 
        t1 
group by 
        owner, object_type
order by
        owner, object_type
/

set feedback on
select * from table(dbms_xplan.display_cursor(format=>'outline allstats cost hint_report -predicate'));

In the absence of any hints (apart from the qb_name() and parallel() hints), the plan I get from the query is as follows:

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |    77 (100)|     96 |00:00:00.07 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     96 |00:00:00.07 |       5 |      0 | 11264 | 11264 |     1/0/0|
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |    276 |    77   (8)|     96 |00:00:00.01 |       0 |      0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |    276 |    77   (8)|    121 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |    276 |    77   (8)|    121 |00:00:00.04 |    1043 |    991 |  1079K|  1079K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.02 |    1043 |    991 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  50000 |    73   (3)|  50000 |00:00:00.01 |    1043 |    991 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      GBY_PUSHDOWN(@"MAIN")
      USE_HASH_GBY_FOR_PUSHDOWN(@"MAIN")
      END_OUTLINE_DATA
  */

You’ll notice in the Outline Data that Oracle has recorded the use_hash_gby_for_pushdown() hint and the gby_pushdown() hint. I’ll be repeating the query disabling each of these hints in turn – which is why the negative versions of the hints appear as comments in my original query.

If you look at operation 6 of the plan you can see that the optimizer has chosen to use a hash group by to aggregate the selected rows, reducing 50,000 rows to 121 rows. We could query v$pq_tqstat, or run the SQL Monitor report to get more detail about how much work each PX process did, but it’s fairly safe to assume that it was shared reasonably evenly between the two processes.
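
For reference, a minimal sketch of the v$pq_tqstat query – it has to be run from the same session, immediately after the parallel query completes:

select
        dfo_number, tq_id, server_type, instance, process, num_rows, bytes
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
/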

After aggregating their data the first layer of PX processes distributes the results by range (operation 5, PX Send Range) to the second layer of PX processes, which re-aggregate the much reduced data set. At this point Oracle chooses to aggregate by sorting (Sort Group By) as this will deal with the order by clause at the same time. (Note: you could tell Oracle to split the aggregation and ordering by adding a use_hash_aggregation hint to the query.)
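
For example, here’s a sketch of the query with that hint added, addressed (like the others) to my named query block – I haven’t shown the resulting plan:

select
        /*+
                qb_name(main)
                parallel(t1 2)
                use_hash_aggregation(@main)
        */
        owner, object_type, count(*) ct
from
        t1
group by
        owner, object_type
order by
        owner, object_type
/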

With my data set it’s fairly clear that it’s a good idea to do this “two stage” aggregation because the initial raw data is reduced by a very large factor by the first layer of PX processes before they forward the results – and the resulting reduction in inter-process messaging is likely to be a significant benefit.

There may be cases, of course, where some feature of the data pattern means that two-stage aggregation is a good idea, but aggregating by a sort is more efficient than an aggregation by hashing – a cardinality or clustering estimate might have persuaded the optimizer to make the wrong choice – so let’s see what happens to the plan if we enable the no_use_hash_gby_for_pushdown() hint:

Plan hash value: 3954212205

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |    77 (100)|     96 |00:00:00.14 |       5 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     96 |00:00:00.14 |       5 | 11264 | 11264 |     1/0/0|
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |    276 |    77   (8)|     96 |00:00:00.03 |       0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |    276 |    77   (8)|    120 |00:00:00.03 |       0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       SORT GROUP BY      |          |      2 |    276 |    77   (8)|    120 |00:00:00.13 |    1043 |  9216 |  9216 |     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.07 |    1043 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  50000 |    73   (3)|  50000 |00:00:00.02 |    1043 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

It doesn’t really make any difference in this very small test case, though the sorting does seem to have taken a fraction of a second more CPU. The key change is that operation 6 has become a Sort Group By.

There is an interesting detail to look out for, though, in the Outline Data and Hint Report:

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      GBY_PUSHDOWN(@"MAIN")
      END_OUTLINE_DATA
  */

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   1 -  MAIN
           -  qb_name(main)

   8 -  MAIN / T1@MAIN
           -  parallel(t1 2)

I added the hint /*+ no_use_hash_gby_for_pushdown(@main) */ to the query, and the hint has clearly worked; but it’s not reported in the Hint Report, and it doesn’t appear in the Outline Data. This suggests that if you created an SQL Baseline for this query with this hint in place the plan would not reproduce because the SQL Baseline would be missing the critical hint. (To be tested – left as an exercise for readers.)

The next step is to enable the no_gby_pushdown() hint. For the purposes of this note it’s the important one. It’s also the one that you are more likely to use, as it’s fairly obvious (if you know the data) when it’s a good idea to use it. (In some cases, of course, the optimizer might have avoided “group by pushdown” when it should have used it – in which case you’d be using the gby_pushdown() hint.) Here’s the plan when I block “group by pushdown”:

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |    77 (100)|     96 |00:00:00.08 |       5 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |     96 |00:00:00.08 |       5 |  6144 |  6144 |     1/0/0|
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT GROUP BY        |          |      2 |    276 |    77   (8)|     96 |00:00:00.11 |       0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE          |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.07 |       0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |  50000 |    73   (3)|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      4 |  50000 |    73   (3)|  50091 |00:00:00.02 |    1051 |       |       |          |
|*  7 |        TABLE ACCESS FULL| T1       |     28 |  50000 |    73   (3)|  50091 |00:00:00.01 |    1051 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      END_OUTLINE_DATA
  */

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3
---------------------------------------------------------------------------
   1 -  MAIN
           -  no_gby_pushdown(@main)
           -  qb_name(main)

   7 -  MAIN / T1@MAIN
           -  parallel(t1 2)

Key details to highlight here are:

  • There’s only one aggregation step, appearing at operation 3 after the PX SEND/RECEIVE.
  • 50,000 rows are passed from slave set 1 (operation 6) to slave set 2 (operation 4).
  • The no_gby_pushdown(@main) hint does appear in the Hint Report.
  • The no_gby_pushdown(@main) hint doesn’t appear in the Outline Data (which is, again, a little worrying).

Again, with this small data set, the change in plan isn’t going to make much difference to the end user, but you may find cases where one of the options is clearly the better choice for keeping the user sufficiently happy while saving machine resources.

Summary

When using parallel query the optimizer may choose to aggregate the data in two steps so that the current rowsource is reduced by one set of PX processes before it is passed to the next set of PX processes that does a second aggregation step. This tends to be a good idea if the first set of processes can reduce the size of the data set by a large factor and save a lot of machine resources on the subsequent inter-process communication; but it is a bad idea if the data doesn’t reduce and a large volume of data ends up being aggregated in both sets of PX processes.

You can over-ride the optimizer’s choice with the [no_]gby_pushdown() hint.

November 1, 2023

Push Group by

Filed under: Execution plans,Hints,Oracle,Transformations — Jonathan Lewis @ 10:40 am GMT Nov 1,2023

Jump to summary.

A new optimizer feature that appears in 23c (probably not 21c) is the ability to push group by clauses into union all set operations. This will happen unhinted, but can be hinted with the highly memorable [no_]push_gby_into_union_all() hint that appeared in 23.1.0.0 according to v$sql_hint, and the feature can be disabled by setting the (equally memorable) hidden parameter _optimizer_push_gby_into_union_all to false.

From a couple of simple experiments it looks as if the hint should be used to identify query blocks where you want an aggregation (group by) that appears “outside” a union all (inline) view to happen “inside” the view. Here’s a trivial demonstration that I’ve run on 23.3:

rem
rem     Script:         push_gby_ua.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem
rem     Last tested 
rem             23.3.0.0
rem

create table t1 
as 
select  *
from    all_Objects 
where   rownum <= 50000
;

set linesize 156
set pagesize 60
set trimspool on
set tab off
set serveroutput off

alter session set statistics_level = all;

select
        /*+
                -- qb_name(main)
                push_gby_into_union_all(@sel$2)
                no_push_gby_into_union_all(@sel$3)
        */
        owner, count(*)
from    (
        select /*  qb_name(u1) */ owner from t1 where owner = 'SYS'
        union all
        select /*  qb_name(u2) */ owner from t1 where owner = 'PUBLIC'
)       
group by owner
/

select * from table(dbms_xplan.display_cursor(format=>'allstats last cost outline alias hint_report qbregistry qbregistry_graph '));

All I’ve done here is create a table that copies 50,000 rows from the view all_objects, then execute a query that reports the number of objects for owners SYS and PUBLIC by selecting the two sets of objects separately and aggregating a union all of those sets.

For maximum visibility I’ve shown the positive and negative versions of the hint – the aggregation doesn’t have to apply to all the branches of the view and it’s not unknown for the optimizer to make the wrong choices if it hasn’t managed to produce a good cost estimate.

Here’s the execution plan (with some of the bits removed) that I got from 23.3 for this test:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |      1 |        |   295 (100)|      2 |00:00:00.03 |    2232 |   1114 |
|   1 |  HASH GROUP BY          |      |      1 |     15 |   295   (4)|      2 |00:00:00.03 |    2232 |   1114 |
|   2 |   VIEW                  |      |      1 |   3334 |   293   (3)|   9288 |00:00:00.03 |    2232 |   1114 |
|   3 |    UNION-ALL            |      |      1 |   3334 |   293   (3)|   9288 |00:00:00.03 |    2232 |   1114 |
|   4 |     SORT GROUP BY NOSORT|      |      1 |      1 |   147   (3)|      1 |00:00:00.02 |    1116 |   1114 |
|*  5 |      TABLE ACCESS FULL  | T1   |      1 |   3333 |   147   (3)|  39724 |00:00:00.01 |    1116 |   1114 |
|*  6 |     TABLE ACCESS FULL   | T1   |      1 |   3333 |   147   (3)|   9287 |00:00:00.01 |    1116 |      0 |
----------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('23.1.0')
      DB_VERSION('23.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$FC1F66D1")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SET$22FBD6DA")
      PUSH_GBY_INTO_UNION_ALL(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SET$1")
      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1" GROUP_BY)
      FULL(@"SEL$3" "T1"@"SEL$3")
      FULL(@"SEL$FC1F66D1" "T1"@"SEL$2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("OWNER"='SYS')
   6 - filter("OWNER"='PUBLIC')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   4 -  SEL$FC1F66D1
           -  push_gby_into_union_all(@sel$2)

   6 -  SEL$3
           -  no_push_gby_into_union_all(@sel$3)

It’s interesting to note that the Hint Report tells us that both my hints were valid (and used); but the Outline Data echoes only one of them (the “positive” push_gby_into_union_all). Because I’ve used the same table twice it’s not instantly clear that the optimizer has pushed the subquery that I had specified, but if you check the Predicate Information you can confirm that the SYS data has been aggregated inside the union all and the PUBLIC data has been passed up to the union all operator without aggregation. (In the absence of the hints both data sets would have been aggregated early.)

Here, in comparison, is the plan (slightly reduced, and with the qbregistry options removed) that I got from 19.11.0.0:

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |   265 (100)|      2 |00:00:00.09 |    1986 |    991 |       |       |          |
|   1 |  HASH GROUP BY       |      |      1 |     13 |   265   (4)|      2 |00:00:00.09 |    1986 |    991 |  1422K|  1422K|  653K (0)|
|   2 |   VIEW               |      |      1 |   7692 |   263   (3)|  48446 |00:00:00.07 |    1986 |    991 |       |       |          |
|   3 |    UNION-ALL         |      |      1 |        |            |  48446 |00:00:00.06 |    1986 |    991 |       |       |          |
|*  4 |     TABLE ACCESS FULL| T1   |      1 |   3846 |   131   (3)|  42034 |00:00:00.02 |     993 |    991 |       |       |          |
|*  5 |     TABLE ACCESS FULL| T1   |      1 |   3846 |   131   (3)|   6412 |00:00:00.01 |     993 |      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("OWNER"='SYS')
   5 - filter("OWNER"='PUBLIC')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         E -  push_gby_into_union_all

As you can see, 19.11 treats the hint as an error and both subqueries against t1 pass their rows up to the union all without aggregation. The 19.11 plan also gives you some idea of why it can be worth pushing the group by: 23.3 doesn’t report any memory used for either of the aggregation operations that take place, while the postponed (or, rather, unpushed) aggregation in 19.11 reports 1.4M of memory used. As a general principle we might expect several small aggregations to have a lower peak memory usage than one large aggregation. There’s also a CPU benefit when Oracle doesn’t have to push lots of rows up through a couple of operations.

In fact the absence of memory-related columns in the 23.3 plan is a little suspect and I may have to examine it further. It may simply be the case that the size of the “small allocation” that doesn’t get reported in earlier versions has been increased to (best guess) 1MB; it may be that dbms_xplan in 23c has got a little bug that omits that part of the report.

Summary

Oracle 23c has a new transformation that will probably help to reduce memory and CPU consumption when it comes into play. Queries that aggregate over union all views may change plans to push the aggregation into some or all of the separate subqueries inside the union.

The feature is cost-based but you can over-ride the optimizer’s choice of which subqueries should be aggregated early with the hint [no_]push_gby_into_union_all(@qbname). The feature can also be disabled completely by setting the hidden parameter _optimizer_push_gby_into_union_all to false.
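
For reference, a sketch of the two ways you might disable it – the double-quotes are needed because the parameter name starts with an underscore:

alter session set "_optimizer_push_gby_into_union_all" = false;

-- or, for a single statement:
-- select /*+ opt_param('_optimizer_push_gby_into_union_all','false') */ ...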

Addendum

It occurred to me that since the optimizer will transform an IN-list to a list of equalities with OR, and it’s also capable of using OR-expansion, there might be cases where an aggregate based on an IN-list could go through the two steps and then benefit from this new feature, for example:

select
        sts, count(*) ct
from    t1
where   sts in ('B','C')
group by
        sts
/

-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |       |       |     3 (100)|          |
|   1 |  SORT GROUP BY NOSORT|        |     2 |     4 |     3   (0)| 00:00:01 |
|   2 |   INLIST ITERATOR    |        |       |       |            |          |
|*  3 |    INDEX RANGE SCAN  | T1_I1A |   100 |   200 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("STS"='B' OR "STS"='C'))

Alas, no. Although we can see the rewrite of the IN-list, the optimizer doesn’t then use OR-expansion. And when I added the hint /*+ or_expand */ to try to push Oracle in the right direction the Hint Report told me:

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         U -  or_expand / No valid predicate for OR expansion

Maybe in the next release.

October 18, 2023

Remove Subquery

Filed under: CBO,Execution plans,Oracle,subqueries,Transformations — Jonathan Lewis @ 2:33 pm BST Oct 18,2023

This is a note that echoes a feature (dating back at least as far as 10.2.0.4) that I’ve described in a previous post on Simpletalk. I’m raising it here for three reasons:

  • first, it says something about optimising SQL by rewriting it
  • secondly, it advertises the hugely under-used feature of analytic functions
  • thirdly, I’ve not seen anyone in the various forums asking about a particular pattern of surprising execution plans that they couldn’t understand

This last reason makes me wonder whether there are lots of queries in the wild that need a small “cosmetic” change to allow the optimizer to transform them into something completely different.

My example is based on the SH demo schema – the query I’ll be demonstrating came (I think) from a presentation given by Jože Senegačnik about 15 years and 7 major versions ago (10gR2) – so I’ll start with a bit of SQL to recreate a couple of tables from that schema as it was a few years ago.

rem
rem     Script:         remove_aggr_subq.sql
rem     Author:         Joze Senegacnik / Jonathan Lewis
rem     Dated:          June 2008
rem

create table products(
        prod_id                 number(6,0)     not null,
        prod_name               varchar2(50)    not null,
        prod_desc               varchar2(4000)  not null,
        prod_subcategory        varchar2(50)    not null,
        prod_subcategory_id     number          not null,
        prod_subcategory_desc   varchar2(2000)  not null,
        prod_category           varchar2(50)    not null,
        prod_category_id        number          not null,
        prod_category_desc      varchar2(2000)  not null,
        prod_weight_class       number(3,0)     not null,
        prod_unit_of_measure    varchar2(20),
        prod_pack_size          varchar2(30)    not null,
        supplier_id             number(6,0)     not null,
        prod_status             varchar2(20)    not null,
        prod_list_price         number(8,2)     not null,
        prod_min_price          number(8,2)     not null,
        prod_total              varchar2(13)    not null,
        prod_total_id           number          not null,
        prod_src_id             number,
        prod_eff_from           date,
        prod_eff_to             date,
        prod_valid              varchar2(1),
        constraint products_pk primary key (prod_id)
   )
;

create table sales (
        prod_id         number          not null,
        cust_id         number          not null,
        time_id         date            not null,
        channel_id      number          not null,
        promo_id        number          not null,
        quantity_sold   number(10,2)    not null,
        amount_sold     number(10,2)    not null,
        constraint sales_product_fk foreign key (prod_id)
                references products (prod_id)
   ) 
;

It’s a long time since I loaded, or even looked at, the SH schema but I assume the key details that I need will still be the same. All I’ve got is a products table with a declared primary key of prod_id, and a sales table with a prod_id column declared as not null with a foreign key constraint to the products table. Both tables have not null declarations on most columns.

Imagine writing a query to report all sales where the quantity_sold is less than the average quantity_sold for the corresponding product. The “obvious” choice of SQL for this would be something like:

select
        /*+
                qb_name(main)
                dynamic_sampling(prd 0) 
                dynamic_sampling(sal1 0) 
                no_adaptive_plan
        */
        prd.prod_id, prd.prod_name, 
        sal1.time_id, sal1.quantity_sold
from
        products    prd,
        sales       sal1
where
        sal1.prod_id = prd.prod_id
and     sal1.quantity_sold < (
                        select
                                /*+ 
                                        qb_name(subq)
                                        dynamic_sampling(sal2 0)
                                */
                                avg(sal2.quantity_sold)
                        from    sales   sal2
                        where   sal2.prod_id = sal1.prod_id
                )
;

I’ve used hints to block dynamic sampling and adaptive plans, and I’ve used the qb_name() hint to name the two query blocks. The subquery calculates the average quantity_sold for the correlated prod_id and we’re probably assuming Oracle will execute the subquery for each row in the sales (sal1) table with savings from scalar subquery caching – especially if we were to create a “value-added” foreign key index of (prod_id, quantity_sold), sketched below.
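
Such an index (the name is my invention) starts with the foreign key column and appends quantity_sold so that the correlated subquery could be satisfied without visiting the table:

create index sales_prod_qty_i1 on sales(prod_id, quantity_sold);

Here’s the execution plan I got from the query in its current form: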

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     4 |   404 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN            |          |     4 |   404 |     7  (15)| 00:00:01 |
|*  2 |   HASH JOIN           |          |     4 |   244 |     5  (20)| 00:00:01 |
|   3 |    VIEW               | VW_SQ_1  |    82 |  2132 |     3  (34)| 00:00:01 |
|   4 |     HASH GROUP BY     |          |    82 |  2132 |     3  (34)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| SALES    |    82 |  2132 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL  | SALES    |    82 |  2870 |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS FULL   | PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   2 - access("ITEM_1"="SAL1"."PROD_ID")
       filter("SAL1"."QUANTITY_SOLD"<"AVG(SAL2.QUANTITY_SOLD)")

The optimizer has decided it would make sense to unnest the subquery, generate an aggregate rowsource of the sales data, then eliminate the unwanted sales rows through a hash join to this aggregate rowsource. In passing, take note particularly of the Predicate Information for operation 2 – it’s a reminder that hash joins apply only for equality predicates, so the check against the average quantity_sold has to take place as a filter predicate after Oracle has found the correct average by probing the build table.

This unnesting will be appropriate for many cases of subquery usage but we could block it and force Oracle to do something that looked more like our original “for each row” visualisation by adding the hint no_unnest(@subq) to the hints at the top of the query (note how we can address the hint to a specific query block). For clarity, the hint section would then read:
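
        /*+
                qb_name(main)
                dynamic_sampling(prd 0)
                dynamic_sampling(sal1 0)
                no_adaptive_plan
                no_unnest(@subq)
        */

The effect of this is to produce the following execution plan: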

--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     2 |   150 |    86   (0)| 00:00:01 |
|*  1 |  FILTER             |          |       |       |            |          |
|*  2 |   HASH JOIN         |          |    82 |  6150 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| SALES    |    82 |  2870 |     2   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE    |          |     1 |    26 |            |          |
|*  6 |    TABLE ACCESS FULL| SALES    |     1 |    26 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("SAL1"."QUANTITY_SOLD"< (SELECT /*+ NO_UNNEST QB_NAME
              ("SUBQ") */ SUM("SAL2"."QUANTITY_SOLD")/COUNT("SAL2"."QUANTITY_SOLD")
              FROM "SALES" "SAL2" WHERE "SAL2"."PROD_ID"=:B1))
   2 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   6 - filter("SAL2"."PROD_ID"=:B1)

Now we can see that Oracle is using the subquery as a “filter subquery” as we had imagined it. For each row surviving the simple hash join between products and sales Oracle will execute the subquery (unless the relevant information is already in the scalar subquery cache). A little detail that may surprise users who are less familiar with execution plans is the appearance of the bind variable (:B1) in the predicate for operation 6 – this is the optimizer reminding you that the correlating predicate in the subquery uses a value that will be unknown until run-time when it arrives (repeatedly with constantly changing values) from the main query block.

Again, we can understand that this pattern will probably be suitable for some circumstances, but we may want to control where in the plan the subquery is used – for some queries it might be more efficient to execute the subquery before we do the join. We can tell the optimizer to do this by adding the hint push_subq(@subq) after the no_unnest(@subq) hint, in my case producing the following plan:

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     4 |   300 |     6   (0)| 00:00:01 |
|*  1 |  HASH JOIN           |          |     4 |   300 |     4   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | SALES    |     4 |   140 |     2   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |          |     1 |    26 |            |          |
|*  4 |     TABLE ACCESS FULL| SALES    |     1 |    26 |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN
   2 - MAIN / SAL1@MAIN
   3 - SUBQ
   4 - SUBQ / SAL2@SUBQ
   5 - MAIN / PRD@MAIN

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   2 - filter("SAL1"."QUANTITY_SOLD"< (SELECT /*+ PUSH_SUBQ NO_UNNEST
              QB_NAME ("SUBQ") */ SUM("SAL2"."QUANTITY_SOLD")/COUNT("SAL2"."QUANTITY_SO
              LD") FROM "SALES" "SAL2" WHERE "SAL2"."PROD_ID"=:B1))
   4 - filter("SAL2"."PROD_ID"=:B1)

The first thing that stands out in this plan is that the sales table has become the build (first) table in the hash join and the products table has become the probe table. (In fact this is mainly because (a) there’s no data/statistics, (b) I’ve blocked dynamic sampling, and (c) Oracle has used a few of its standard guesses.)

Another point that stands out is the “staggered” position of operation 3. I’ve included the ‘alias’ format option for the execution plan so that you can see that operations 3 and 4 represent the original subq query block with no transformation. Filter predicates using subqueries often produce odd little “twists” in execution plans which would be hard to explain if you followed the basic “first child first” rule and forgot to check whether there were any separate query blocks that needed to be walked in isolation.

“Cosmetic” effects

I said the query I started with was the “obvious” choice. I didn’t offer any justification for “obvious”, but the query is an almost perfect translation of the condition “sales quantity less than the average sales quantity for the matching product”, and correlating on the prod_id from the sales row you’re looking at (i.e. sal1.prod_id) seems a highly intuitive choice.

However – prod_id is a foreign key to the products table, and the main query block includes the join predicate “sal1.prod_id = prd.prod_id” so, based on transitive closure, all the not null constraints, and the foreign key constraint, we should be happy to make the following, logically valid, minor edit to the original query (and it’s so minor you may miss it – the only change is the correlating column in the final where clause of the subquery):

select
        /*+
                qb_name(main)
                dynamic_sampling(prd 0) 
                dynamic_sampling(sal1 0) 
                no_adaptive_plan
        */
        prd.prod_id, prd.prod_name, 
        sal1.time_id, sal1.quantity_sold
from
        products prd, sales sal1
where
        sal1.prod_id = prd.prod_id
and     sal1.quantity_sold < (
                        select
                                /*+ 
                                        qb_name(subq)
                                        dynamic_sampling(sal2 0)
                                */
                                avg(sal2.quantity_sold)
                        from    sales   sal2
                        where   sal2.prod_id = prd.prod_id
                )
;

Here’s the resulting execution plan following this apparently trivial and logically irrelevant change:

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    82 |  6068 |     5  (20)| 00:00:01 |
|*  1 |  VIEW                | VW_WIF_1 |    82 |  6068 |     5  (20)| 00:00:01 |
|   2 |   WINDOW SORT        |          |    82 |  7134 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN         |          |    82 |  7134 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| SALES    |    82 |  3854 |     2   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("VW_COL_5" IS NOT NULL)
   3 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")

The plan has been reduced to a simple two table join, lost the correlated subquery, and has signs of an analytic (window) function being used somehow to handle the comparison with the average. This is a feature enabled by the (very old) parameter _remove_aggr_subquery, which defaults to true.

It’s not immediately visible what the optimizer is doing – the predicate vw_col_5 is not null clearly relates to the internal view vw_wif_1 (Windows Inline Function?), but we have to look at the appropriate “unparsed query” from the CBO/10053 trace to find out why that predicate gives us the right answer. This, with a lot of cosmetic editing, is the transformed query that Oracle finally optimised:

select 
        /*+ qb_name (main) */
        vw_wif_1.item_1         prod_id,
        vw_wif_1.item_2         prod_name,
        vw_wif_1.item_3         time_id,
        vw_wif_1.item_4         quantity_sold 
from    (
        select 
                prd.prod_id             item_1,
                prd.prod_name           item_2,
                sal1.time_id            item_3,
                sal1.quantity_sold      item_4,
                case 
                        when    sal1.quantity_sold < avg(sal1.quantity_sold) over ( partition by sal1.prod_id) 
                        then    sal1.rowid 
                end                     vw_col_5 
        from
                test_user.sales         sal1,
                test_user.products      prd 
        where 
                sal1.prod_id = prd.prod_id
        )       vw_wif_1 
where 
        vw_wif_1.vw_col_5 is not null
;

It’s interesting, and a little surprising, that the code uses a case expression to generate a rowid which is then tested for null. It seems to add complexity that a person would not introduce if you had asked them to do the rewrite by hand, but maybe it comes from a generic framework that addresses more subtle examples.

I said at the start of the note that I’ve not seen anyone asking about this “surprising” pattern of plan; that wasn’t quite true. I have found one question on the Oracle forums dated 2012 using version 10.2.0.4. It’s been possible for a long time – so possibly the reason questions are so rare is that the correct choice of correlating predicate is rarely made and the “intuitive” choice doesn’t allow the transformation to take place. (It’s also possible that it appears so rarely because so few systems make proper use of constraints.)

Most significantly, though, it’s been possible to write this type of windowing code by hand since version 8.1.6, but it’s only in the last few years that responses to questions about max(subquery), avg(subquery) etc. have been suggestions to rewrite with analytic functions. Maybe some of the answers should have been “change the correlating column”.
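
For comparison, here’s a sketch of the hand-written analytic rewrite – logically equivalent to the transformed query above, minus the rowid trick:

select
        prod_id, prod_name, time_id, quantity_sold
from    (
        select
                prd.prod_id, prd.prod_name,
                sal1.time_id, sal1.quantity_sold,
                avg(sal1.quantity_sold) over (partition by sal1.prod_id) avg_qty
        from
                products        prd,
                sales           sal1
        where
                sal1.prod_id = prd.prod_id
        )
where
        quantity_sold < avg_qty
/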

Further Observations

As with all optimisation strategies, it’s possible that Oracle will use this feature when it shouldn’t or fail to use it when it should. Unfortunately there is no hint like /*+ [no_]remove_aggr_subq(@subq) */ though aggregate subquery removal seems to be embedded with subquery unnesting, so the /*+ no_unnest() */ hint might be sufficient to block the feature; I don’t know what you can do, though, to force it to happen if you think it’s legal but isn’t happening – other than rewriting the query by hand, of course.

If the no_unnest() hint doesn’t work you could use the hint /*+ opt_param('_remove_aggr_subquery','false') */ to disable the feature completely for the duration of the query, and you may find that it even works as an SQL Patch.
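
A sketch of the SQL Patch approach – this assumes 19c, where dbms_sqldiag.create_sql_patch is a documented function, and uses a placeholder sql_id that you would have to replace with the sql_id of the problem statement:

declare
        v_patch_name    varchar2(128);
begin
        v_patch_name := dbms_sqldiag.create_sql_patch(
                sql_id          => '0123456789abc',     -- placeholder sql_id
                hint_text       => q'[opt_param('_remove_aggr_subquery','false')]',
                name            => 'block_remove_aggr_subq'
        );
end;
/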

If you’re interested in CBO trace files, the mnemonic (which doesn’t appear in the legend near the start of the trace) for the feature is “RSW” (maybe “remove subquery windowing”?). There aren’t many explicit strings relating to this mnemonic in the binary, but the following are visible:

RSW: Not valid for subquery removal %s (#%d)
RSW:  Valid for having clause subquery removal %s (#%d)
RSW:  Valid for correlated where clause SQ removal %s (#%d)
RSW:  Valid for uncorrelated where clause SQ removal %s (#%d)

My example falls into the category “Valid for correlated where clause SQ removal”, so there’s scope for someone else to discover what a query, its plan, and its unparsed query look like when a query falls into one of the other two categories.

Summary

For certain patterns of query involving aggregate subqueries as predicates the optimizer has been able to use a feature known as aggregate subquery removal to effect the intent of the subquery through the use of an analytical (window) function.

This feature has been available since (at least) 10.2.0.4 and you may already have several queries where the optimizer would use it if the correlating predicate was appropriate – in particular if the predicate is currently joining to the child end of a foreign key constraint then you need to test the effect of joining it to the parent end.

October 11, 2023

no_merge() #JoelKallmanday

Filed under: CBO,Hints,Ignoring Hints,Oracle,Transformations — Jonathan Lewis @ 9:10 am BST Oct 11,2023

This is a second follow-up to the video Connor McDonald produced a few days ago about the risks of depending on “current tricks” to control the order of predicate operation, in which he showed how some requirements for the pattern “check condition B only for rows which have already satisfied condition A” could be handled through a case expression.

The first follow-up note highlighted the problem of a significant change in the handling of CTEs (a.k.a. “with” subqueries); the point I want to discuss in this note is about the step where Connor tried to use a no_merge() hint in a failed attempt to isolate some activity inside an inline view and made the comment that: “no_merge isn’t explicitly designed for inline views of this type”.

Here’s the SQL to create the data, and a version of the query that uses an inline view rather than a CTE:

rem
rem     Script:         ignore_no_merge.sql
rem     Author:         Connor McDonald / Jonathan Lewis
rem     Dated:          Oct 2023
rem
rem
rem     Last tested 
rem             23.2.0.0
rem             19.11.0.0
rem

drop table t purge;

create table t as
select * from dba_objects
/

insert into t ( object_id, owner, object_name)
values (-1,'BAD','BAD')
/

commit;

select
        /*+ no_merge(@inline) */
        count(object_name)
from    (
        select  /*+ qb_name(inline) */
                *
        from    t
        where object_id > 0
        )       my_view
where   sqrt(object_id) between 1 and 10
/

I’ve used a query block name for my inline view, then referenced that name in a no_merge() hint in the main query block to ensure that there’s no possible ambiguity about where the hint should apply. When I run the query in either 19.11 or 23.2, Oracle (often – but not always, thanks to statistical effects) raises the error “ORA-01428: Argument -1 is out of range”, which rather suggests that the no_merge() has been ignored and that the optimizer has managed to transform the query into a single query block, combining the predicates rather than forcing the inline query block to eliminate the threat before the main query block evaluates the square root.

If you check the hidden parameters you will find a pair that go a long way back in Oracle’s history (at least 8i for the first of the pair and 10g for the second) relating to view merging:

_complex_view_merging    enable complex view merging 
_simple_view_merging     control simple view merging performed by the optimizer

For a very long time I’ve believed that the no_merge() hint was supposed to block complex view merging but not block simple view merging (and the view merging required for our query is an example of simple view merging). I’m no longer sure why I believed this; maybe it used to be true in the very early days, maybe I just assumed that complex and simple view merging worked differently. Either way, I don’t have any scripts that I might have used to test the idea.
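
If you wanted to test the idea today, a starting point might be a sketch like the following – using the opt_param() hint to disable simple view merging for just this query and checking whether the plan (and the ORA-01428) changes. I haven’t pursued this:

select
        /*+
                no_merge(@inline)
                opt_param('_simple_view_merging','false')
        */
        count(object_name)
from    (
        select  /*+ qb_name(inline) */
                *
        from    t
        where   object_id > 0
        )       my_view
where   sqrt(object_id) between 1 and 10
/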

In fact it’s quite simple to show that Oracle hasn’t ignored the hint – even though it’s easy to assume that it has because the run-time response doesn’t seem to have changed. But take a look at the execution plan for the query which, pulled from memory with the alias and outline format options, looks like this:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |   193 (100)|          |
|   1 |  SORT AGGREGATE     |      |     1 |    66 |            |          |
|   2 |   VIEW              |      |   184 | 12144 |   193   (5)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T    |   184 |  7360 |   193   (5)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - INLINE / MY_VIEW@SEL$1
   3 - INLINE / T@INLINE

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"INLINE")
      OUTLINE_LEAF(@"SEL$1")
      NO_ACCESS(@"SEL$1" "MY_VIEW"@"SEL$1")
      FULL(@"INLINE" "T"@"INLINE")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter((SQRT("T"."OBJECT_ID")>=1 AND SQRT("T"."OBJECT_ID")<=10
              AND "OBJECT_ID">0))

As you can see, the inline view has suffered some type of transformation effect that results in both the object_id and sqrt() predicates being applied during the tablescan – but the query block named inline still appears in the plan, and we still have an object called my_view.

Compare this with the plan we get if we don’t use the no_merge() hint:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   193 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    40 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   184 |  7360 |   193   (5)| 00:00:01 |
---------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$5843E819
   2 - SEL$5843E819 / T@INLINE

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$5843E819")
      MERGE(@"INLINE" >"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"INLINE")
      FULL(@"SEL$5843E819" "T"@"INLINE")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((SQRT("T"."OBJECT_ID")>=1 AND SQRT("T"."OBJECT_ID")<=10
              AND "OBJECT_ID">0))

In this case we can see the optimizer’s strategy in the merge(@inline>sel$1) hint in the Outline Data, a new query block name has appeared (derived from the (default) sel$1 combined with inline) and my inline view my_view has disappeared from the plan.

So how come the optimizer has managed to combine predicates from two different query blocks without merging those query blocks? The answer is in another hidden parameter: _optimizer_filter_pushdown, which defaults to true. The optimizer has pushed the sqrt() predicate down into the inline view. We can avoid the problem by setting the parameter to false, for example:

select
        /*+ 
                no_merge(@inline) 
                opt_param('_optimizer_filter_pushdown','false')
        */
        count(object_name)
...

select * from table(dbms_xplan.display_cursor(format=>'outline alias'))
/

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |   190 (100)|          |
|   1 |  SORT AGGREGATE     |      |     1 |    79 |            |          |
|*  2 |   VIEW              |      | 73570 |  5675K|   190   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T    | 73570 |  2873K|   190   (4)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - INLINE / MY_VIEW@SEL$1
   3 - INLINE / T@INLINE

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"INLINE")
      OUTLINE_LEAF(@"SEL$1")
      NO_ACCESS(@"SEL$1" "MY_VIEW"@"SEL$1")
      FULL(@"INLINE" "T"@"INLINE")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((SQRT("OBJECT_ID")>=1 AND SQRT("OBJECT_ID")<=10))
   3 - filter("OBJECT_ID">0)


The overall shape of the plan doesn’t change but we now see the sqrt() predicate being applied at operation 2 (after the table data has been safely filtered at operation 3). We can also see that the Rows estimate has changed dramatically – the 73,570 rows extracted from the table is reasonable, but the optimizer seems to have forgotten to make any allowance for the impact of the sqrt() predicates: it should have reported the same 184 rows that we see in the earlier plans – the normal 5% of 5% for “I haven’t got a clue what a range based on a function will do”.

For this example I’ve used the opt_param() hint to disable filter pushdown; you could, in principle, set the parameter at the session or system level. Whatever you do, the effect does not get captured in the Outline Data – so if you can’t rewrite the SQL (and if you could, you might opt for a safer strategy anyway) your best bet might be to create an SQL Patch to inject the hint.
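
As a sketch of the SQL Patch approach – assuming a recent release of 19c, where dbms_sqldiag.create_sql_patch is a documented function, and using a purely hypothetical sql_id – the call might look like this:

declare
        patch_name      varchar2(128);
begin
        patch_name := dbms_sqldiag.create_sql_patch(
                sql_id    => 'dyrhfgw5tm6gk',   -- hypothetical sql_id of the problem statement
                hint_text => q'{opt_param('_optimizer_filter_pushdown','false')}',
                name      => 'no_filter_pushdown'
        );
        dbms_output.put_line('Patch created: ' || patch_name);
end;
/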

Summary

When you use the no_merge() hint to block view merging you may get the impression that it hasn’t worked because the optimizer can still do filter predicate pushdown, so a predicate that you wanted to apply “outside” the view still gets applied “inside” the view. Don’t assume you know what the plan looks like just because the outcome of running the query hasn’t changed – check the actual execution plan and especially the Predicate Information. You may need to set _optimizer_filter_pushdown to false to achieve the “delayed” application of your predicate, possibly by means of an SQL Patch.

October 5, 2023

CTE Upgrade

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 4:17 pm BST Oct 5,2023

The “common table expression” (CTE) also known as “with clause” or “factored subquery” has been the target of an optimizer upgrade in recent versions of Oracle that may cause a problem for some people – including, possibly, a few of my former clients and readers who may have adopted a suggestion I made for working around particular performance problems.

It was a recent video by Connor McDonald that brought the change to my attention so I’m going to use parts of his script to demonstrate the effect. We start with a very simple table, and a requirement to avoid an Oracle error:

rem
rem     Script:         with_clause_pushdown.sql
rem     Author:         Connor McDonald / Jonathan Lewis
rem     Dated:          Oct 2023
rem     Purpose:        
rem
rem     Last tested 
rem             23.2.0.0        Pushdown 
rem             21.8.0.0        Pushdown (Connor)
rem             19.21.0.0       Pushdown (Connor)
rem             19.11.0.0       No pushdown
rem

drop table t purge;

create table t as
select * from dba_objects
/

insert into t ( object_id, owner, object_name)
values (-1,'BAD','BAD');
commit;

prompt  =====================================
prompt  The original query crashes (probably)
prompt  =====================================

select  count(object_name)
from    t
where
        object_id > 0
and     sqrt(object_id) between 1 and 10
/

I have a query that will fail if it tries to take the square root for the row where object_id = -1 (Oracle hasn’t implemented complex numbers, or even imaginary ones). But the query should be okay because I’ve got a predicate that filters out all the rows where the object_id is not positive. Try running the query, though, and you’ll (probably) find that Oracle responds with “ORA-01428: argument ‘-1’ is out of range”.

The optimizer decides the order in which it wants to apply the predicates, so you can’t guarantee that the order you need will be the order used. This used to be a problem that showed up fairly frequently on various Oracle forums at a time when applications were often guilty of storing numeric data in character columns and ran into problems with queries that had predicates like “character_column = numeric_constant”. These would sometimes fail because of an implicit to_number() being applied to a row where the column value was not numeric, resulting in the ORA-01722 conversion error. There used to be several ways to bypass this type of problem, none of them particularly desirable but all perfectly adequate as temporary (one hoped) workarounds. Here’s one such workaround:

with pos as (
        select /*+ materialize */ *
        from   t
        where  object_id > 0 
) 
select  count(object_name)
from    pos
where   sqrt(object_id) between 1 and 10
/

We put the “protective” predicate into a “with” subquery and materialize the subquery so that the call to sqrt() is applied to an (internal) global temporary table that holds only the “safe” rows. When Connor demonstrated this method the query crashed with the ORA-01428 (sqrt(-1) problem) that the original query had exhibited. This led Connor to extol the virtues of avoiding dirty tricks to fool the optimizer because an enhancement might appear that made the dirty trick fail, and then he demonstrated an alternative, elegant, restatement of the query that couldn’t be called a dirty trick.

As someone not employed by Oracle Corp. my response was irritation that the optimizer was doing something that (clearly) it shouldn’t, and to wonder whether it was deliberate or a bug that could be worked around. The first step, of course, was to repeat Connor’s test on my default setup of Oracle – which happened to be 19.11 – to find that the optimizer did exactly what I expected and produced the right answer rather than an error. Exchanging email with Connor I learned that he had tested on 19.21 and 21.8 – so something must have changed between 19.11 and 19.21. [Ed: see comment #4 below – reader reports show that the change appeared in 19.21]

I don’t have a 21c VM handy on my laptop but I do have the 23cFREE developer VM (not yet upgraded to 23.3), so I started that up, constructed the model, and started work on the (newly) problematic query. Here are the execution plans from both versions of Oracle – 19.11 first, followed by 23.2:

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                             |     1 |    79 |   759   (1)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                             |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665C_11934AEE |       |       |            |          |
|*  3 |    TABLE ACCESS FULL                     | T                           | 73563 |  9482K|   398   (1)| 00:00:01 |
|   4 |   SORT AGGREGATE                         |                             |     1 |    79 |            |          |
|*  5 |    VIEW                                  |                             | 73563 |  5675K|   361   (1)| 00:00:01 |
|   6 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D665C_11934AEE | 73563 |  9482K|   361   (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |     1 |    79 |   238   (5)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6681_1D6D63A |       |       |            |          |
|*  3 |    TABLE ACCESS FULL                     | T                          |   211 |  8229 |   236   (5)| 00:00:01 |
|   4 |   SORT AGGREGATE                         |                            |     1 |    79 |            |          |
|*  5 |    VIEW                                  |                            |   211 | 16669 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6681_1D6D63A |   211 |  8229 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

The shape of the plan doesn’t change in the upgrade – but 23c crashed with error ORA-01428 while Oracle 19c produced the correct, expected result. There’s a clue about why this happened in the plans, but there’s a terrible flaw in the way I reported the execution plans: where’s the Predicate Information?

Predicate Information (identified by operation id):    -- 19.11 version
---------------------------------------------------
   3 - filter("OBJECT_ID">0)
   5 - filter(SQRT("OBJECT_ID")>=1 AND SQRT("OBJECT_ID")<=10)

Predicate Information (identified by operation id):    -- 23c version
---------------------------------------------------
   3 - filter(SQRT("T"."OBJECT_ID")>=1 AND SQRT("T"."OBJECT_ID")<=10 AND "OBJECT_ID">0)
   5 - filter(SQRT("OBJECT_ID")>=1 AND SQRT("OBJECT_ID")<=10)

If you compare the filter() predicate information for operation 3 you can see that the optimizer in 23c has pushed the predicate from the main query block into the CTE’s query block (and, frankly, I’d call that a design error). The clue in the main body of the plan that this had happened was in the Rows column – 73,563 for 19.11 but only 211 for 23.2: the former looks about right for a copy of dba_objects, the latter is clearly much smaller than you might expect. If you’re wondering why the sqrt() predicate is repeated at operation 5, you’re not alone – I can’t think of a good reason why it’s there.

Workaround

Yes, I am going to tell you how to work around a (nominal) optimizer enhancement. This change may slip through unnoticed in testing and only be discovered, too late, in production, so some people may need a short term hack to deal with it. (Setting optimizer_features_enable to 19.1.0 didn’t make the problem go away on my system.)

Recognising that Oracle was invoking some form of filter predicate “pushdown” I searched the list of parameters, SQL hints, and fix controls for possible contenders. There don’t appear to be any (new) hints related to this behaviour, but there is a parameter _optimizer_wc_filter_pushdown in 23.2 that doesn’t exist in 19.11, and it defaults to true; the description of this parameter is “enable/disable with clause filter predicate pushdown”. Setting this parameter to false (alter session/system, or through an opt_param() hint) does what we need.
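
The syntax for the two options looks like this (a sketch – and remember this is a hidden parameter, so check with Oracle support before setting it in a production system):

alter session set "_optimizer_wc_filter_pushdown" = false;

-- or, for the specific statement:

with pos as (
        select /*+ materialize */ *
        from   t
        where  object_id > 0
)
select  /*+ opt_param('_optimizer_wc_filter_pushdown','false') */
        count(object_name)
from    pos
where   sqrt(object_id) between 1 and 10
/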

There are a couple of fix controls that also appear to be related, but I haven’t examined them yet:

30235754  QA control parameter for with clause filter predicate pushdown
29773708  with clause filter predicate pushdown

Again, these are not present in 19.11.

Summary

Somewhere in the 19c and 21c timelines an enhancement to the optimizer allows Oracle to minimise the size of materialized CTEs by pushing filter predicates from the query blocks that use the CTE into the CTE definition. This could result in some existing queries failing (probably with a “conversion error”) because they depended on the CTE applying some “limiting” predicates that stopped illegal values from reaching a later predicate.

If you are caught by this enhancement you can disable the feature by setting the parameter _optimizer_wc_filter_pushdown to false for the session, system, or specific query (you may be able to do the last through an SQL Patch). You ought to seek a strategic fix, however, and Connor’s video shows how you may be able to bypass the materialized CTE trick completely by using a case expression.

January 30, 2023

Lost Or-Expand

Filed under: 12c,19c,CBO,Oracle,Transformations,Upgrades — Jonathan Lewis @ 1:39 pm GMT Jan 30,2023

I’ve commented previously on the “new” cost-based Or-Expansion introduced in 12c to replace the “legacy” Concatenation transformation, and I’ve been re-running some of my concatenation scripts to see whether the most recent versions of the optimizer will use Or-expansion unhinted in places where I’ve previously had to use hints to force concatenation to appear.

The latest test has produced a surprising result – I’ve got an example where 19c and 21c will use concatenation when hinted with use_concat(), but will not obey the or_expand() hint on the grounds that there’s “No valid predicate for OR expansion”.

It’s worth knowing this could happen if you’re upgrading from 11g to 19c (as many people seem to be doing at present) as you may find that you have some statements that used to use concatenation unhinted but, because they can’t switch to or-expansion, will no longer use concatenation unless hinted to do so.

tl;dr (the rest of the note is just a demonstration.) When you upgrade from 11g to 19c (or later) you may find that some queries perform badly because they stop using the legacy “concatenation” operator but can’t be transformed by the new “cost-based Or Expand” operator, and need to be hinted with a use_concat() hint.

Here’s a statement I can use to demonstrate the effect – I’ll post the code to create the tables at the end of the note:

select  /*+ gather_plan_statistics */
        n1, n2, small_vc
from
        t1
where
        (n1 = 1 and n2 = 10000)
or      (n1 = 10000 and n2 = 1)
;

I’ve rigged the data so that there are 9,999 distinct values of n1 each with one row, and 10,001 rows with the value 10,000; and I’ve done the same with n2 – 9,999 distinct values with one row each and 10,001 rows with the value 10,000.

I’ve gathered stats that include histograms on n1 and n2 (separately) and I’ve created indexes on n1 and n2 (separately). As a result the ideal path for this query is to use the index on n1 to find rows for the first of the two compound predicates and use the index on n2 to find rows for the second of the predicates, which should be possible if the optimizer first transforms the query using OR-expansion.

You’ll notice I’ve included the hint to capture rowsource execution statistics, so I’ll be executing this query with various hints and reporting the actual execution plans and workload. Using 19.11.0.0 and 21.3.0.0 with no special parameter settings the execution plan that appeared used B-tree/bitmap conversion:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |    45 (100)|      2 |00:00:00.01 |      50 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |    45   (3)|      2 |00:00:00.01 |      50 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |      1 |        |            |      2 |00:00:00.01 |      48 |
|   3 |    BITMAP OR                        |       |      1 |        |            |      1 |00:00:00.01 |      48 |
|   4 |     BITMAP AND                      |       |      1 |        |            |      1 |00:00:00.01 |      24 |
|   5 |      BITMAP CONVERSION FROM ROWIDS  |       |      1 |        |            |      1 |00:00:00.01 |       2 |
|*  6 |       INDEX RANGE SCAN              | T1_N1 |      1 |        |     1   (0)|      1 |00:00:00.01 |       2 |
|   7 |      BITMAP CONVERSION FROM ROWIDS  |       |      1 |        |            |      1 |00:00:00.01 |      22 |
|*  8 |       INDEX RANGE SCAN              | T1_N2 |      1 |        |    21   (0)|  10001 |00:00:00.01 |      22 |
|   9 |     BITMAP AND                      |       |      1 |        |            |      1 |00:00:00.01 |      24 |
|  10 |      BITMAP CONVERSION FROM ROWIDS  |       |      1 |        |            |      1 |00:00:00.01 |       2 |
|* 11 |       INDEX RANGE SCAN              | T1_N2 |      1 |        |     1   (0)|      1 |00:00:00.01 |       2 |
|  12 |      BITMAP CONVERSION FROM ROWIDS  |       |      1 |        |            |      1 |00:00:00.01 |      22 |
|* 13 |       INDEX RANGE SCAN              | T1_N1 |      1 |        |    21   (0)|  10001 |00:00:00.01 |      22 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("N1"=1)
   8 - access("N2"=10000)
  11 - access("N2"=1)
  13 - access("N1"=10000)

This is a fairly clever plan but not what I wanted to test so I set the hidden parameter ‘_b_tree_bitmap_plans’ to false for all subsequent tests. With this block in place the plan changed to a full tablescan:

-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |    99 (100)|      2 |00:00:00.01 |     349 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |      1 |    99   (2)|      2 |00:00:00.01 |     349 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((("N1"=1 AND "N2"=10000) OR ("N1"=10000 AND "N2"=1)))


Definitely not what I wanted – so I added a hint telling the optimizer I wanted to see OR-expansion. The optimizer produced the same full tablescan! Since I had included the format option ‘hint_report’ in my call to dbms_xplan.display_cursor() I can show you the extra lines of output that explained why the optimizer “ignored” my hint:

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         U -  or_expand(@sel$1 (1) (2)) / No valid predicate for OR expansion

As you can see, the hint was not “N – unresolved” or “E – Syntax error”. It was recognised, syntactically correct, notionally applicable, but unused because the optimizer couldn’t see a way to use it (even though we can see an obvious way to use it).

Idle curiosity then prompted me to try the use_concat() hint, in the form: “use_concat(@sel$1 1)” – here’s the resulting execution plan:

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     4 (100)|      2 |00:00:00.01 |       7 |
|   1 |  CONCATENATION                       |       |      1 |        |            |      2 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|*  3 |    INDEX RANGE SCAN                  | T1_N2 |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       3 |
|*  4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|*  5 |    INDEX RANGE SCAN                  | T1_N1 |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"=10000)
   3 - access("N2"=1)
   4 - filter(("N2"=10000 AND (LNNVL("N2"=1) OR LNNVL("N1"=10000))))
   5 - access("N1"=1)

Exactly the plan I wanted to see from or_expand() – although the two branches appear in the reverse of the order I would have expected. So the new cost-based or-expansion says there’s no valid predicate available for expansion, but the old, deprecated, heuristic concatenation transformation manages to find a disjunct (OR) that can be expanded.
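
In effect, concatenation has done what you could do by hand – rewrite the statement as a union all in which the second branch uses lnnvl() to avoid reporting any row twice. Here’s a sketch based on the Predicate Information above:

select  n1, n2, small_vc
from    t1
where   n2 = 1
and     n1 = 10000
union all
select  n1, n2, small_vc
from    t1
where   n1 = 1
and     n2 = 10000
and     (lnnvl(n2 = 1) or lnnvl(n1 = 10000))
;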

Of course the next thing to do is look at the predicted cost and actual work (mostly buffer gets) that Oracle reported for each plan:

  • bitmap conversion: (cost 45, buffers 50)
  • full tablescan: (cost 99, buffers 349)
  • concatenation: (cost 4, buffers 7)

The predicted costs are actually fairly consistent with buffer gets (which, if I flushed the cache, would also be mostly disk reads). I had been fairly impressed that the optimizer picked bitmap conversion, but it would have been so much better if the optimizer could see that this (slightly complex) set of predicates included an opportunity for or-expansion.

Footnote 1

This query shows an example of disjunctive normal form (DNF), i.e. the where clause is a disjunct (OR) of conjuncts (ANDs). I understand that optimizers (in general) quite like this form, but there is another “nice” form which is CNF (conjunctive normal form), i.e. where the where clause is a conjunct (AND) of disjuncts (ORs). So, for entertainment, I rewrote the where clause in conjunctive normal form. You have to be a little careful when you play the “normal form” game – it’s quite easy to get it wrong – so here are the steps I took (using A, B, C, D instead of my 4 atomic predicates):

(A and B) or (C and D) ==
        (A or (C and D)) and (B or (C and D)) ==               -- distributing the (A and B)
        (A or C) and (A or D) and (B or C) and (B or D)        -- distributing the two occurrences of (C and D)

Here’s the resulting query and unhinted execution plan after substituting “n1 = 1” etc. back into the symbolic presentation (and it probably gives you some idea why I played safe by starting with A, B, C, D):

select  /*+ gather_plan_statistics */
        n1, n2, small_vc
from
        t1
where
        (n1 = 1 or n2 = 1) 
and     (n1 = 1 or n1 = 10000) 
and     (n2 = 10000 or n2 = 1)
and     (n2 = 10000 or n1 = 10000)
;

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                 |      1 |        |     4 (100)|      2 |00:00:00.01 |       7 |
|   1 |  VIEW                                 | VW_ORE_BA8ECEFB |      1 |      2 |     4   (0)|      2 |00:00:00.01 |       7 |
|   2 |   UNION-ALL                           |                 |      1 |        |            |      2 |00:00:00.01 |       7 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1              |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN                  | T1_N1           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       3 |
|*  5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1              |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|*  6 |     INDEX RANGE SCAN                  | T1_N2           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("N2"=10000)
   4 - access("N1"=1)
   5 - filter(("N1"=10000 AND LNNVL("N1"=1)))
   6 - access("N2"=1)

It’s the OR-expansion I wanted to see.

If I can do an algorithmic rewrite that produces the desired plan the optimizer can be coded to do the rewrite – so I think you can expect to see this limitation removed at some future point. This plan, however, did still depend on my disabling B-tree/bitmap conversion; when I enabled B-tree/bitmap conversion the optimizer used it to produce the following plan:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |     2 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |     2   (0)|      2 |00:00:00.01 |       6 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |      1 |        |            |      2 |00:00:00.01 |       4 |
|   3 |    BITMAP OR                        |       |      1 |        |            |      1 |00:00:00.01 |       4 |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |       |      1 |        |            |      1 |00:00:00.01 |       2 |
|*  5 |      INDEX RANGE SCAN               | T1_N1 |      1 |        |     1   (0)|      1 |00:00:00.01 |       2 |
|   6 |     BITMAP CONVERSION FROM ROWIDS   |       |      1 |        |            |      1 |00:00:00.01 |       2 |
|*  7 |      INDEX RANGE SCAN               | T1_N2 |      1 |        |     1   (0)|      1 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((INTERNAL_FUNCTION("N1") AND INTERNAL_FUNCTION("N2") AND ("N2"=10000 OR "N1"=10000)))
   5 - access("N1"=1)
   7 - access("N2"=1)

The thing to note in this case, though, is that the B-tree/bitmap conversion is logically the correct thing to choose when you compare the estimated cost and actual workload:

  • or-expansion: (cost 4, buffers 7)
  • bitmap conversion: (cost 2, buffers 6)

Footnote 2

Mohamed Houri wrote an article on Or-expansion a year ago explaining the possible settings for the hidden parameter “_optimizer_cbqt_or_expansion”, which can be off, on, linear, greedy or two_pass. I tried all the options to see if that would make any difference (apart from the obvious impact of “off”), but it didn’t.
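
For the record, the tests were along the lines of the following (a sketch – again, this is a hidden parameter):

alter session set "_optimizer_cbqt_or_expansion" = 'two_pass';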

Source code

If you want to do further experiments, here’s the script I used to generate the data:

rem
rem     Script:         concat_3b.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2008 / Jan 2023
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             21.3.0.0
rem

create table t1
as
with generator as (
        select
                rownum  id
        from    dual
        connect by level <= 10000
)
select
        rownum                  n1,
        10000                   n2,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1
;

insert /*+ append */ into t1
select
        n2, n1, small_vc, padding
from
        t1
;

commit;

create index t1_n1 on t1(n1);
create index t1_n2 on t1(n2);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns size 100 n1, n2'
        );
end;
/


November 15, 2022

opt_estimate 4a

Filed under: CBO,Execution plans,Hints,Oracle,Tuning — Jonathan Lewis @ 11:21 am GMT Nov 15,2022

I wrote a batch of notes about the opt_estimate() hint a couple of years ago, including one where I explained the option for using the hint to specify the number of rows in a query block. I’ve just come across a particular special case for that strategy that others might find a use for. It’s something to do when using the “select from dual … connect by” trick for multiplying rows.

Here’s a little data to model the idea – I’ve used the all_objects view to generate some “well-known” data since I want to add a tiny bit of complexity to the query while still making the data easy to understand. The results from this demonstration come from Oracle 21.3.0.0, and I’ve included the hint /*+ no_adaptive_plan */ to stop Oracle from getting too clever during optimisation.

rem
rem     Script:         opt_estimate_dual.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2022
rem
rem     Last tested 
rem             21.3.0.0
rem

create table tables_table as select * from all_objects where object_type = 'TABLE';
create table objects_table as select * from all_objects;

alter table objects_table add constraint ot_pk primary key(object_id);

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'tables_table',
                method_opt => 'for columns size 60 owner'
        );
end;
/

set serveroutput off

with driver as (
        select  /*+ materialize */
                tt.owner, tt.object_id, v1.rn
        from    tables_table tt,
                (
                select
                        /*+  opt_estimate(query_block scale_rows=10) */
                        rownum rn
                from    dual
                connect by
                        level <= 10
                ) v1
        where
                tt.owner = 'OUTLN'
)
select  /*+ no_adaptive_plan */
        dr.rn, dr.owner, dr.object_id,
        ot.object_id, ot.owner, ot.object_type, ot.object_name
from
        driver dr,
        objects_table   ot
where
        ot.object_id = dr.object_id
/

select * from table(dbms_xplan.display_cursor(format => 'hint_report'));


In my system tables_table holds 727 rows and objects_table holds 58383 rows. Three rows in tables_table correspond to tables owned by user ‘OUTLN’ which means I expect the driver CTE (common table expression / “with” subquery) to generate 30 rows and, given the join on unique id, the query to return 30 rows.

I’ve used the /*+ materialize */ hint to force Oracle to create an in-memory temporary table for the driver CTE, the /*+ no_adaptive_plan */ hint to stop Oracle from getting too clever during optimisation, and the critical /*+ opt_estimate() */ hint to help the optimizer understand the effect of my “connect by” on dual. Here’s the execution plan I get if I don’t include that last hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |    14 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6632_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |     3 |    78 |     9   (0)| 00:00:01 |
|   4 |     VIEW                                 |                            |     1 |    13 |     2   (0)| 00:00:01 |
|   5 |      COUNT                               |                            |       |       |            |          |
|   6 |       CONNECT BY WITHOUT FILTERING       |                            |       |       |            |          |
|   7 |        FAST DUAL                         |                            |     1 |       |     2   (0)| 00:00:01 |
|   8 |     BUFFER SORT                          |                            |     3 |    39 |     9   (0)| 00:00:01 |
|*  9 |      TABLE ACCESS FULL                   | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|  10 |   NESTED LOOPS                           |                            |     3 |   453 |     5   (0)| 00:00:01 |
|  11 |    NESTED LOOPS                          |                            |     3 |   453 |     5   (0)| 00:00:01 |
|  12 |     VIEW                                 |                            |     3 |   276 |     2   (0)| 00:00:01 |
|  13 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6632_31D19D4 |     3 |    78 |     2   (0)| 00:00:01 |
|* 14 |     INDEX UNIQUE SCAN                    | OT_PK                      |     1 |       |     0   (0)|          |
|  15 |    TABLE ACCESS BY INDEX ROWID           | OBJECTS_TABLE              |     1 |    59 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   9 - filter("TT"."OWNER"='OUTLN')
  14 - access("OT"."OBJECT_ID"="DR"."OBJECT_ID")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   0 -  STATEMENT
           -  no_adaptive_plan

   2 -  SEL$1
           -  materialize


I’ve highlighted operations 4 and 8 in the plan: operation 4 is the view of dual that has generated 10 rows – unfortunately the optimizer has only considered the stats of the dual table, and hasn’t factored in the effects of the “connect by with rownum”. Operation 8 shows us that the optimizer has (correctly, thanks to the histogram I requested) estimated 3 rows for the tablescan of tables_table. The result of these two estimates is that operation 3 reports an estimate of 3 ( = 3 * 1 ) rows to be used in probing objects_table.

This is the plan after enabling the /*+ opt_estimate() */ hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |    45 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6633_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |    30 |   780 |    13   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL                    | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|   5 |     BUFFER SORT                          |                            |    10 |   130 |     6   (0)| 00:00:01 |
|   6 |      VIEW                                |                            |    10 |   130 |     2   (0)| 00:00:01 |
|   7 |       COUNT                              |                            |       |       |            |          |
|   8 |        CONNECT BY WITHOUT FILTERING      |                            |       |       |            |          |
|   9 |         FAST DUAL                        |                            |     1 |       |     2   (0)| 00:00:01 |
|  10 |   NESTED LOOPS                           |                            |    30 |  4530 |    32   (0)| 00:00:01 |
|  11 |    NESTED LOOPS                          |                            |    30 |  4530 |    32   (0)| 00:00:01 |
|  12 |     VIEW                                 |                            |    30 |  2760 |     2   (0)| 00:00:01 |
|  13 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6633_31D19D4 |    30 |   780 |     2   (0)| 00:00:01 |
|* 14 |     INDEX UNIQUE SCAN                    | OT_PK                      |     1 |       |     0   (0)|          |
|  15 |    TABLE ACCESS BY INDEX ROWID           | OBJECTS_TABLE              |     1 |    59 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("TT"."OWNER"='OUTLN')
  14 - access("OT"."OBJECT_ID"="DR"."OBJECT_ID")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   0 -  STATEMENT
           -  no_adaptive_plan

   2 -  SEL$1
           -  materialize


There are three things that stand out in this report.

  • I’ve highlighted operations 4 and 6: operation 4 is the tablescan of tables_table that correctly estimates 3 rows; operation 6 is the view operation that now correctly estimates 10 rows.
  • With the correct estimate for the view the estimate for the join to objects_table is now correct and the join order for the merge join cartesian at operation 3 has been reversed.
  • The Hint Report tells us that the opt_estimate() hint is not (always) an optimizer hint! This is a real pain because when the opt_estimate() hints you’ve tried to use don’t appear to work it’s not easy to work out what you’ve done wrong.

To make a point, I can take the demo a little further by changing the /*+ opt_estimate() */ hint to scale_rows=120. Here’s the body of the resulting plan:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |   369 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D663A_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |   360 |  9360 |    13   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                    | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|   5 |     BUFFER SORT                          |                            |   120 |  1560 |     6   (0)| 00:00:01 |
|   6 |      VIEW                                |                            |   120 |  1560 |     2   (0)| 00:00:01 |
|   7 |       COUNT                              |                            |       |       |            |          |
|   8 |        CONNECT BY WITHOUT FILTERING      |                            |       |       |            |          |
|   9 |         FAST DUAL                        |                            |     1 |       |     2   (0)| 00:00:01 |
|  10 |   HASH JOIN                              |                            |   360 | 54360 |   356   (1)| 00:00:01 |
|  11 |    VIEW                                  |                            |   360 | 33120 |     2   (0)| 00:00:01 |
|  12 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D663A_31D19D4 |   360 |  9360 |     2   (0)| 00:00:01 |
|  13 |    TABLE ACCESS FULL                     | OBJECTS_TABLE              | 58383 |  3363K|   354   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

The earlier plans used a nested loop join into objects_table. In this plan we can see at operation 10 that the optimizer has selected a hash join because the larger row estimate for the CTE has increased the cost of the query beyond the inflection point between nested loop and hash joins.

Summary

If you need to use the “connect by” in an inline view then you may find that the optimizer gets a very bad estimate of the number of rows the view definition will generate and that an /*+ opt_estimate() */ hint in the view using the “scale_rows=nnn” option will produce better estimates of cardinality, hence a better plan.

Footnote

In this particular case, where I’ve used the dual table by itself in an inline view, I could have used the “rows=NNN” option to get the same effect.
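
Something like this sketch, replacing the scale_rows option with an absolute row count:

select
        /*+  opt_estimate(query_block rows=10) */
        rownum rn
from    dual
connect by
        level <= 10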

In any case I could have added a /*+ qb_name() */ hint to the inline view, and included a qualifying “@{queryblock}” in the /*+ opt_estimate() */ hint.
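
A sketch of that variation – the query block name gen is my own invention:

with driver as (
        select  /*+ materialize */
                tt.owner, tt.object_id, v1.rn
        from    tables_table tt,
                (
                select  /*+ qb_name(gen) */
                        rownum rn
                from    dual
                connect by
                        level <= 10
                ) v1
        where
                tt.owner = 'OUTLN'
)
select  /*+ no_adaptive_plan opt_estimate(@gen query_block scale_rows=10) */
        dr.rn, dr.owner, dr.object_id,
        ot.object_id, ot.owner, ot.object_type, ot.object_name
from
        driver dr,
        objects_table   ot
where
        ot.object_id = dr.object_id
/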

Using hints is hard, especially when they’re not documented. There is a lot more to learn about this hint: for example, telling the optimizer about the size of a rowsource doesn’t help if it’s going to use its estimate of distinct values in the next steps of the plan – a correction you’ve managed to inject at one stage may disappear in the very next optimizer calculation.

This catalogue lists more articles on the opt_estimate() hint and its relatives.

July 21, 2022

Pagination cost

Filed under: CBO,Execution plans,Oracle,Problem Solving — Jonathan Lewis @ 3:56 pm BST Jul 21,2022

There’s a thread on the MOSC database tuning forum (needs an account) at the moment asking why a “fetch first N” query to fetch next 41 rows with an offset of 8602 rows takes longer to run than the same query when the offset is zero rows. Here’s a possible answer with a little lesson in thinking about what’s going on.

Apart from gremlins in the system there are two possible reasons:

  • nothing has changed, but it takes longer to fetch 8643 rows in order and discard 8602 of them than it takes to fetch 41 rows in order and discard none
  • the optimizer has worked out that if it has to fetch 8643 rows then it ought to use a different plan but (as often happens) it was a bad idea to change the plan.

Here’s a little script to build some demo data.

rem
rem     Script:         fetch_first_offset.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table t1 
as
select 
        * 
from 
        all_objects
where   rownum <= 50000
order by 
        dbms_random.value
/

create index t1_i1 on t1(object_name);

alter session set statistics_level = all;
set serveroutput off
set feedback off

column owner format a20
column object_type format a12
column object_name format a32

All I’ve done is create a table of 50,000 rows with an order by clause that maximises the randomness of the data pattern so that the index on object_name will have a very high clustering_factor.
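
As a quick sanity check (not part of the original script) you can confirm the clustering_factor that the index ends up with:

select  index_name, num_rows, clustering_factor
from    user_indexes
where   table_name = 'T1'
/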

Here’s the first query I’m going to run, followed by the execution plan, pulled from memory with rowsource execution stats enabled. I’ve queried for the first 20 rows (offset 0 next 20) ordered by object_name:

select
        owner, object_type, object_name
from
        t1
order by
        object_name
offset 
        0 rows
fetch next 
        20 rows only
/

select * from table(dbms_xplan.display_cursor(format=>'+cost allstats last'));


SQL_ID  fmdb8vuxwkp99, child number 0
-------------------------------------
select  owner, object_type, object_name from  t1 order by  object_name
offset  0 rows fetch next  20 rows only

Plan hash value: 3254925009

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    23 (100)|     20 |00:00:00.01 |      25 |      2 |
|*  1 |  VIEW                         |       |      1 |     20 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |     20 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |  50000 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |     20 |     3   (0)|     20 |00:00:00.01 |       5 |      2 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("from$_subquery$_002"."rowlimit_$$_rownumber"<=20 AND
              "from$_subquery$_002"."rowlimit_$$_rownumber">0))
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "OBJECT_NAME")<=20)

As you can see from the execution plan Oracle has used an index full scan (because that will access the data in exactly the right order and allows the “nosort stopkey” on the “window (no)sort” operation). It has fetched (A-Rows) 20 rows and reported a cost of 23 – which basically corresponds to 3 block visits for the index and one block visit for each row from the table. In passing you’ll notice from the Predicate Information at operation 2 that Oracle has transformed our “fetch first” into an analytic query using row_number() over(). The phrase “syntactic sugar” seems appropriate.
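
Based on that Predicate Information, the transformed statement behaves roughly like the following rewrite (a sketch – internally Oracle uses generated names like from$_subquery$_002 and rowlimit_$$_rownumber):

select  owner, object_type, object_name
from    (
        select
                owner, object_type, object_name,
                row_number() over (order by object_name) rn
        from    t1
        )
where   rn > 0
and     rn <= 20
order by
        rn
/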

How do things change if we ask for the 2nd 20 rows (offset 20, next 20)? I’ll show only the output from dbms_xplan, including its slightly mangled SQL statement but dropping the Predicate Information:

select  owner, object_type, object_name from  t1 order by  object_name
offset  20 rows fetch next  20 rows only

Plan hash value: 3254925009

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    43 (100)|     20 |00:00:00.01 |      45 |
|*  1 |  VIEW                         |       |      1 |     40 |    43   (0)|     20 |00:00:00.01 |      45 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |     40 |    43   (0)|     40 |00:00:00.01 |      45 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |  49999 |    43   (0)|     40 |00:00:00.01 |      45 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |     40 |     3   (0)|     40 |00:00:00.01 |       5 |
--------------------------------------------------------------------------------------------------------------

As you can see, the optimizer has still decided to use the index full scan, and this time has fetched 40 rows and passed them up the plan until at operation 1 it discards the first 20 rows. The cost (43) is again related to 3 blocks for the index, 40 blocks for 40 rows from the (randomly distributed) table.

What would we see if we added a /*+ full(t1) */ hint to the query to force a tablescan to get the 2nd 20 rows?

select /*+ full(t1) */  owner, object_type, object_name from  t1 order
by  object_name offset  20 rows fetch next  20 rows only

Plan hash value: 2433988517

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |   947 (100)|     20 |00:00:00.01 |     996 |       |       |          |
|*  1 |  VIEW                    |      |      1 |     40 |   947   (1)|     20 |00:00:00.01 |     996 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  50000 |   947   (1)|     40 |00:00:00.01 |     996 | 15360 | 15360 |14336  (0)|
|   3 |    TABLE ACCESS FULL     | T1   |      1 |  50000 |   278   (1)|  50000 |00:00:00.01 |     996 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Oracle has obeyed the hint, and the tablescan has fetched all 50,000 rows from the table and sorted them. Fortunately the optimizer knows that it needs only the top 40 rows so it has been discarding rows as it sorts, hence the appearance of the “pushed rank” in the “window sort” at operation 2; we haven’t had to create a sorted list of all 50,000 rows before picking the top 40. Again, once we’ve got the top 40 we discard the top 20 to allow for the offset.

We note that the cost of the tablescan was 278 but the cost of the sort was really rather large, taking the total cost of this path to 947. So here’s a thought experiment – what’s likely to happen if we ask for an offset of 940 and next 20? Given the costs we’ve seen for the indexed access path the optimizer will calculate a cost of 3 (plus a bit, maybe) for the index and a cost of 960 for visiting the table, giving a total cost of about 963 – which should make the tablescan strategy the lower cost.

select  owner, object_type, object_name from  t1 order by  object_name
offset  940 rows fetch next  20 rows only

Plan hash value: 2433988517

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |   947 (100)|     20 |00:00:00.02 |     996 |       |       |          |
|*  1 |  VIEW                    |      |      1 |    960 |   947   (1)|     20 |00:00:00.02 |     996 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  50000 |   947   (1)|    960 |00:00:00.02 |     996 |   267K|   267K|  237K (0)|
|   3 |    TABLE ACCESS FULL     | T1   |      1 |  50000 |   278   (1)|  50000 |00:00:00.01 |     996 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

My guesswork about the cost seems to have been nearly right. Unhinted, with an offset of 940 (which you can see as the 960 rows fetched) the optimizer has decided that the tablescan path has a lower cost than the indexed access.

Of course we ought to check this by hinting the indexed access path and seeing what its cost is:

select  /*+ index(t1) */  owner, object_type, object_name from  t1
order by  object_name offset  940 rows fetch next  20 rows only

Plan hash value: 3254925009

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |   969 (100)|     20 |00:00:00.01 |     968 |      4 |
|*  1 |  VIEW                         |       |      1 |    960 |   969   (1)|     20 |00:00:00.01 |     968 |      4 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    960 |   969   (1)|    960 |00:00:00.01 |     968 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |    960 |   969   (1)|    960 |00:00:00.01 |     968 |      4 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |  50000 |     9   (0)|    960 |00:00:00.01 |       9 |      4 |
-----------------------------------------------------------------------------------------------------------------------

The cost of the indexed access path is 969 – that’s 960 for the randomly scattered table rows we need plus 9 for the index scan (because at 960 index entries we’re going to visit a few more index leaf blocks than the original 3).

Summing Up

I’ve demonstrated with a simple query using “offset N rows fetch M rows only” that the optimizer will calculate the cost of fetching “N + M” rows using whatever paths are available, then pick the lowest cost path.

As you might expect, the presence of a suitable index will encourage the optimizer to walk the index in order jumping randomly around the table to avoid the cost of acquiring all the relevant data and sorting it. So for “small” values of “offset + next” Oracle might choose an indexed access path with “window nosort stopkey”, but for “large” values of “offset + next” it might choose to do a full tablescan with “window sort pushed rank”.

The consequence of this is that – in the same way we see the optimizer switching between nested loops and hash joins at the wrong moment – we may see the optimizer switch from an indexed access path to a full tablescan either too soon, or too late.

Answering the question

Why did the query with an offset of 8602 take so much longer than the query with an offset of zero when the next was only 41 rows?

It may be that the optimizer stuck with an indexed access path and had to do physical reads of 8,643 blocks when it should have switched to a tablescan.

It may be that the optimizer switched to a tablescan and sort when it should have stuck with using an index on well-clustered, well-cached, data.

As so often happens, the first step to answering an SQL performance question is to look at the actual execution plans.
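For reference, the pattern used throughout this note to capture those run-time figures (rather than bare estimates) is the standard one:

set serveroutput off
alter session set statistics_level = all;

-- run the problem query, then immediately:

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));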

July 15, 2022

Index Wildcard

Filed under: CBO,Execution plans,Oracle,Problem Solving — Jonathan Lewis @ 11:51 am BST Jul 15,2022

This is a little case study on working around a problem caused by a 3rd-party application that couldn’t be changed quickly and easily. There’s a touch of irony in this case as the problem I’ll address is a side effect of the best possible workaround to a more generic design flaw in the application, moreover the side effect appears because of a special case optimisation mechanism introduced in 12.2.

A recent post on the MOS database forum (needs an account) came up with a slightly surprising problem – a query with a predicate “columnX like ‘%XYZ%’” would perform reasonably well if hinted to use a B-tree index on columnX (doing an index full scan) but performed badly when cursor_sharing was set to force because it switched to an index range scan which did a lot more work.

Modelling

Here’s a little script to build some data to demonstrate the issue:

rem
rem     Script:         index_wildcard.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0

create table t1
as
select  obj.*
from
        (select * from all_objects where rownum <= 50000) obj,
        (select rownum from dual connect by level <= 5) gen
/

create index t1_id on t1(object_name);

alter session set cursor_sharing = force;
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

set serveroutput off

All I’ve done is create a simple heap table with 250,000 rows and an index (which happens to be on a column declared as NOT NULL, but that detail is not important). Here’s my query:

select  /*+ index(t1(object_name)) cursor_sharing_exact */
        t1.*
from
        t1 t1
where
        object_name like '%XYZ%'
/

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

In normal circumstances the leading wildcard would mean that the optimizer would not be able to use an index driven plan (unless the query satisfied the requirements to allow an index fast full scan), so the default plan would have been a tablescan with a selectivity of 5% (estimated rows = 12,500).

Because I’ve hinted the use of an index I get the following plan from a call to dbms_xplan.display_cursor():

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  1q1p6fyubhafu, child number 0
-------------------------------------
select /*+ index(t1(object_name)) cursor_sharing_exact */  t1.* from
t1 t1 where  object_name like '%XYZ%'

Plan hash value: 2798063786

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.08 |    1718 |   1717 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.08 |    1718 |   1717 |
|*  2 |   INDEX FULL SCAN                   | T1_ID |      1 |  12500 |      0 |00:00:00.08 |    1718 |   1717 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("OBJECT_NAME" LIKE '%XYZ%')

Because I’ve set cursor_sharing to force I’ve included the hint /*+ cursor_sharing_exact */ in the query to make sure that Oracle sees the query with a literal rather than a system generated bind variable. The plan tells us that the optimizer has picked an index full scan, and the Predicate Information for operation 2 tells us that Oracle has checked our predicate for every single entry in the index, and the A-rows column tells us that it didn’t find any matching index entries. So the total work done was 1718 buffer gets and 250,000 index entries tested. (You’ll notice that the E-rows is still the standard 5% guess.)

Now let’s try again, allowing enforced cursor sharing to be applied (i.e. taking out the cursor_sharing_exact hint):

SQL_ID  cm4tjwuqxgbqg, child number 0
-------------------------------------
select /*+ index(t1(object_name)) */  t1.* from  t1 t1 where
object_name like :"SYS_B_0"

Plan hash value: 539998951

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.26 |     241K|
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.26 |     241K|
|*  2 |   INDEX RANGE SCAN                  | T1_ID |      1 |  12500 |    250K|00:00:00.04 |    1718 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_NAME" LIKE :SYS_B_0)
   2 - access("OBJECT_NAME" LIKE :SYS_B_0)

The SQL text shows us that the literal has been replaced by the bind variable :SYS_B_0. Operation 2 has changed from an index full scan to an index range scan – which “feels” as if “the range” should be the whole index and the query shouldn’t, therefore, do any more work than the index full scan plan. But the Buffers column now shows 241K buffer visits at operation 1, the table access! What’s gone wrong?

In the case of the OP the unexpected visit to the table blocks didn’t do just a few hundred thousand “unnecessary” buffer visits, it did 4 million buffer gets and 127,000 disk reads.

Look at the Predicate Information (always): the filter() that matched our original predicate is no longer being applied to the index, it’s only applied at the table. The only predicate applied to the index is an access() predicate – so Oracle takes the bind value and works out that the range scan should start at the beginning of the index and stop at the end of the index – that’s what an access() predicate does, it doesn’t check the actual values in the index entries once it has derived the start and stop locations.

Trouble-shooting

You will have noticed that in the initial code to set up the data I had a call to set event 10053 (the CBO trace event) – a fairly quick and easy option in this case since I’m looking at such a simple query. The critical details that showed up in the trace files were:

  • in both cases the optimizer reported “User hint to use this index” in the Base Statistical Information.
  • in the case of the “literal string” SQL the optimizer did not consider an index range scan.
  • in the case of the “forced bind” SQL the optimizer did not consider an index full scan.

This is worth knowing as it tells us that the switch between paths was not an “accident” of circumstances or statistics, it was a deliberate coding strategy; so we need to discover whether there is something we can do to change the strategy.

Time to check the hidden parameters and fix controls. This is what I found in a listing of the 19.11.0.0 system fix controls (the entry is listed as appearing in 12.2.x.x).
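A query something like the following against v$system_fix_control – a sketch, with the like predicate as just one way of narrowing the search – will reproduce the listing:

select  bugno, optimizer_feature_enable, sql_feature, description
from    v$system_fix_control
where   description like '%wildcard%'
order by
        bugno
/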

     BUGNO OPTIMIZE SQL_FEATURE                        DESCRIPTION
---------- -------- ---------------------------------- ----------------------------------------------------------------
  20289688 12.2.0.1 QKSFM_ACCESS_PATH_20289688         check for leading wildcard in LIKE with bind

Doesn’t that sound a little bit promising? Let’s turn the fix off by adding the hint /*+ opt_param(‘_fix_control’ ‘20289688:0’) */ to the problem query and see what we get:

SQL_ID  75cxtymdrh8r2, child number 0
-------------------------------------
select /*+   index(t1(object_name))   opt_param('_fix_control'
'20289688:0')  */  t1.* from  t1 t1 where  object_name like :"SYS_B_0"

Plan hash value: 539998951

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.08 |    1718 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |  12500 |      0 |00:00:00.08 |    1718 |
|*  2 |   INDEX RANGE SCAN                  | T1_ID |      1 |  12500 |      0 |00:00:00.08 |    1718 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_NAME" LIKE :SYS_B_0)
       filter("OBJECT_NAME" LIKE :SYS_B_0)

We still have an index range scan (with a range that is obviously the whole index), but our predicate is now being applied as a filter() in the index with a total of only 1,718 buffer gets even though the literal has been replaced by a system-generated bind variable.

Summary

The application is accepting a search value from the user interface and constructing “literal string” SQL that adds a leading and trailing wildcard to the value. As a result of this the optimizer will not consider using a particular index unless hinted, and does a very expensive tablescan.

The optimizer will accept and use an index() hint to produce a plan that is much more efficient than the tablescan even though it has to use an index full scan and apply a filter predicate on every index entry.

Unfortunately cursor_sharing has been set to force to deal with the general parsing overheads of applications that construct “literal string” SQL, and when the index() hint is applied to a bind-variable version of this query the optimizer applies an index range scan and postpones the filter() until it has acquired rows from the table.

A check of the system fix controls identified a fix that looked as if it might have some relevance to our situation, “leading wild card, bind variable, LIKE clause”, and a test with an opt_param() hint to switch off that optimizer “fix” had the effect we wanted – the optimizer still reported an index range scan but it also used our initial predicate as a filter() predicate on the index.

Technically, since the fix_control was introduced in 12.2.0.1, we could have got the same effect by setting optimizer_features_enable back to 12.1.0.2 – but that’s not a good idea when the production version is so much newer, although it might be an acceptable strategy when implemented as a hint for a few special-case queries.
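As a hint, that would look something like this (a sketch – I haven’t shown the resulting plan here):

select  /*+ index(t1(object_name)) optimizer_features_enable('12.1.0.2') */
        t1.*
from    t1 t1
where   object_name like '%XYZ%'
/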

After showing that the fix control works by hinting, the next step should be to create an SQL Patch (using dbms_sqldiag.create_sql_patch) attached to the (forced) bind-variable version of the query.
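Here’s a sketch of such a call – assuming the sql_id overload of the function (available from 12.2 onwards), with an invented patch name; the sql_id is the one reported for the forced-bind child cursor above:

set serveroutput on

declare
        v_patch_name    varchar2(128);
begin
        v_patch_name := dbms_sqldiag.create_sql_patch(
                sql_id    => 'cm4tjwuqxgbqg',
                hint_text => q'{opt_param('_fix_control' '20289688:0')}',
                name      => 'wildcard_fix_patch'       -- invented name
        );
        dbms_output.put_line('Created patch: ' || v_patch_name);
end;
/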

Update (a few hours later)

It didn’t take long to find that the fix_control and the optimizer_features_enable were not the only workarounds. There’s a hidden parameter “_optim_peek_user_binds” with the description “enable peeking of user binds” and a default value of “true”, so if you set this to false the optimizer can’t notice that the critical bind value starts with a wildcard, and the query behaves the way the OP wants. As with the other options you could set this as a system parameter or a session parameter, or as a hint (SQL patch etc.) /*+ opt_param(‘_optim_peek_user_binds’ ‘false’) */ – the last being the kindest for the system.

Update (21st July)

Bug number 33500777 (unpublished) looks like an exact match for the problem; the description reads:

FILTER FOR PREDICATE WITH LIKE AND BIND WITH LEADING WILDCARD IS NOT ALLOCATED ON INDEX SCAN AFTER FIX 20289688

May 31, 2022

Join View delete

Filed under: ANSI Standard,Bugs,CBO,Join Elimination,Oracle,Transformations — Jonathan Lewis @ 5:39 pm BST May 31,2022

I’ve written a couple of notes about deleting from join views and (ultimately) the fact that if you have multiple key-preserved tables in a join view then the first key-preserved table in the from clause is the one where the delete is applied. The topic came up on the Oracle developer forum again today, and closed with a funny little quirk.

Here’s a little data model followed by one query and two deletes that I’ve just run on Oracle 19.11.0.0 and 21.3.0.0. The query and first delete feature a join using “ANSI” syntax, the second delete translates the join into “traditional” Oracle syntax.

rem
rem     Script:         delete_join_2.sql       
rem     Dated:          May 2022
rem     Author:         J P Lewis
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem

create table t_parent (
        id      number 
                        constraint t_parent_pk primary key,
        val     varchar2(10)
);

create table t_child (
        id      number 
                        constraint t_child_fk references t_parent(id),
        val     varchar2(10)
);

alter table t_child add constraint t_child_pk primary key(id);

insert into t_parent values(1,rpad('parent',10));
insert into t_child  values(1,rpad('child', 10));

commit;

execute dbms_stats.gather_table_stats(user,'t_parent',cascade=>true)
execute dbms_stats.gather_table_stats(user,'t_child', cascade=>true)

set autotrace on explain

select t_child.* from t_parent join t_child on t_child.id = t_parent.id;

delete
        (select t_child.* from t_parent join t_child on t_child.id = t_parent.id);

delete (select t_child.* from t_parent, t_child where t_child.id = t_parent.id);

set autotrace off

The table definitions and content (with a little modification) come from the forum posting, and I have to admit that the “one-to-one” referential integrity from parent to child is a little unusual (though occasionally a very effective strategy). I’ve enabled autotrace to get the (probable) execution plans for the three statements, and in the script on my test machine I’ve also enabled the CBO (10053) trace to pick up a few extra details.
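For anyone following along, the trace was switched on and off with the usual commands (the same mechanism appears elsewhere in this note):

alter session set events '10053 trace name context forever';

-- run the select and the two deletes

alter session set events '10053 trace name context off';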

The query is there to show us that the join will return a single row, and the row content will be from the t_child table. What happens, though, when we try to use that query as an inline view to the delete?

It is an updateable join view – and both tables are, in fact, key-preserved thanks to the primary key constraints at both ends of the join. Here’s what we get (with the execution plans removed from the output):

        ID VAL
---------- ----------
         1 child

1 row selected.



delete
*
ERROR at line 1:
ORA-02292: integrity constraint (TEST_USER.T_CHILD_FK) violated - child record found


1 row deleted.

Line 5 tells us that we selected one row – as expected.

Line 11 tells us that we’ve raised an error because we’re trying to break the foreign key constraint on the t_child table – and that error shouldn’t be a surprise because the t_parent table appears first in the from clause, so that’s the table the optimizer wants to delete rows from.

However, line 15 tells us that if we change to “traditional” Oracle syntax the delete works – so which of the two delete statements is displaying buggy behaviour?

Execution plans

There’s a clue in the execution plans (when we get them – autotrace doesn’t give a plan for the delete that raised the error). So here are the plans for the select and the “traditional” delete, in that order:

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |    14 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T_CHILD |     1 |    14 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------



-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | DELETE STATEMENT |            |     1 |     3 |     1   (0)| 00:00:01 |
|   1 |  DELETE          | T_CHILD    |       |       |            |          |
|   2 |   INDEX FULL SCAN| T_CHILD_PK |     1 |     3 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

With assistance from the foreign key constraint, Join Elimination has occurred in both cases leaving t_child as the candidate for deletion. You’ll also note that the optimizer has recognised that while the view’s select list is “all columns” it only needs to optimize for “select rowid” to drive the delete, so produces an index-only access path to get the data it needs.

If we use explain plan, or check the 10053 trace file, for the ANSI delete that failed we’ll find the following:

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |             |     1 |     6 |     1   (0)| 00:00:01 |
|   1 |  DELETE             | T_PARENT    |       |       |            |          |
|   2 |   NESTED LOOPS      |             |     1 |     6 |     1   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN  | T_CHILD_PK  |     1 |     3 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| T_PARENT_PK |     1 |     3 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$4396EC5C
   3 - SEL$4396EC5C / T_CHILD@SEL$1
   4 - SEL$4396EC5C / T_PARENT@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T_CHILD"."ID"="T_PARENT"."ID")

There’s no Join Elimination, and Oracle is following the rule that “the first key-preserved table in the from clause should be the target of the delete”.

Where’s the bug?

This is a question that doesn’t have an obvious answer.

The traditional syntax gives you the result you probably wanted, but it’s deleting from the second table in the from clause of the join view – so is that a documentation bug or a coding bug? If the documentation is correct the statement should have raised the error that the ANSI syntax gave.

Conversely, is there an optimizer bug (or limitation) because table elimination could have been used in the ANSI case but wasn’t? You could say that the ANSI code is doing the right thing (following the documentation) by raising an error, and that the traditional syntax is doing the wrong thing.

If we search the CBO (10053) traces of the select and the ANSI join, looking for sections where the optimizer considers Join Elimination, the results are helpful:

grep -nT "^JE" or19_ora_30861_*.trc >temp.txt

With a little cosmetic editing here’s the output from the trace of the select statement:

or19_ora_30861_select.trc:  78  JE - Join Elimination
or19_ora_30861_select.trc: 807  JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
or19_ora_30861_select.trc: 813  JE:[V2] Query block (0x7fd3a152f5c0) before join elimination:
or19_ora_30861_select.trc: 816  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_select.trc: 817  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 818  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 819  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - predicate column refs.
or19_ora_30861_select.trc: 820  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using NTSID
or19_ora_30861_select.trc: 821  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 822  JE:[V2] Query block (0x7fd3a152f5c0) after join elimination:

or19_ora_30861_select.trc: 844  JE:[V2]   Considering Join Elimination on query block SEL$58A6D7F6 (#0)
or19_ora_30861_select.trc: 850  JE:[V2] Query block (0x7fd39c09ee60) before join elimination:
or19_ora_30861_select.trc: 853  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_select.trc: 854  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 855  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 856  JE:[V2]: Can eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 857  JE:[V2] Eliminate table: T_PARENT (T_PARENT)
or19_ora_30861_select.trc: 858  JE:[V2] Query block (0x7fd39c09ee60) after join elimination:

We can see that the optimizer has considered join elimination twice (lines 2 and 12), once for each of two query blocks (sel$1 – a query block embedded in the text – and sel$58a6d7f6, a query block name generated during some query transformation).

At lines 5 and 6 we can see that Oracle failed to take advantage of the referential integrity constraint to eliminate t_parent with the explanation “predicate column refs”; but at lines 16 and 17 we see that there’s a change of heart and the optimizer does take advantage of the referential integrity to eliminate t_parent. So let’s look at the text of the “query block before join elimination” in both cases and see what’s changed:

First pass
----------
SELECT 
        "T_PARENT"."ID"  "QCSJ_C000000000300000",
        "T_PARENT"."VAL" "QCSJ_C000000000300002",
        "T_CHILD"."ID"   "QCSJ_C000000000300001",
        "T_CHILD"."VAL"  "QCSJ_C000000000300003" 
FROM 
        "TEST_USER"."T_PARENT" "T_PARENT",
        "TEST_USER"."T_CHILD" "T_CHILD" 
WHERE 
        "T_CHILD"."ID"="T_PARENT"."ID"


Second Pass
-----------
SELECT 
        "T_CHILD"."ID" "ID",
        "T_CHILD"."VAL" "VAL" 
FROM 
        "TEST_USER"."T_PARENT" "T_PARENT",
        "TEST_USER"."T_CHILD" "T_CHILD" 
WHERE 
        "T_CHILD"."ID"="T_PARENT"."ID"

The first pass seems to have expanded the two tables in the initial JOIN, ignoring the limited select list, so join elimination is blocked by the presence of the (non-PK) column t_parent.val (aliased as qcsj_c000000000300002); but when we get to the 2nd pass the select list has been limited to the minimum necessary set of columns that would be needed by a pure select statement, and join elimination is possible.

The clue to how/why this has happened is in the few lines between the two passes; the trace file also reports: “CVM: Merging SPJ view SEL$1 (#0) into SEL$2 (#0)”, and when we look for the point where the optimizer registered query block sel$2 it turns out to have one item in its from clause, an object called from$_subquery$_003 (which tells us where the 3 came from in all those column aliases QCSJ_C0000000003nnnnn). In effect it looks like the optimizer started by rewriting the ANSI select in Oracle syntax as (with a little poetic licence for clarity):

select  child_id, child_val, parent_id
from    (
         select  t_child.*, t_parent.*
         from    t_child, t_parent
         where   t_child.id = t_parent.id
        ) from$_subquery$_003
/ 

So the first pass was the optimizer trying to do join elimination on the inner query block, and the second pass happened after the inner block was merged with the outer block; the resulting query block (called sel$58a6d7f6) allowed the join elimination to take place.

So what happened with the delete statement? Here’s the result of the call to grep:

or19_ora_30861_delete.trc:  90  JE - Join Elimination
or19_ora_30861_delete.trc: 837  JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
or19_ora_30861_delete.trc: 843  JE:[V2] Query block (0x7fd3a152f758) before join elimination:
or19_ora_30861_delete.trc: 846  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_delete.trc: 847  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_delete.trc: 848  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_delete.trc: 849  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - predicate column refs.
or19_ora_30861_delete.trc: 850  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using NTSID
or19_ora_30861_delete.trc: 851  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - no constr. based join pred(s)
or19_ora_30861_delete.trc: 852  JE:[V2] Query block (0x7fd3a152f758) after join elimination:

That’s the lot – there is no attempt to do join elimination on a query block called sel$58a6d7f6, though a few lines after this extract the trace file reports:

CVM:   Merging SPJ view SEL$1 (#0) into SEL$2 (#0)
Registered qb: SEL$58A6D7F6 0x9c0a3038 (VIEW MERGE SEL$2; SEL$1; SEL$2)

That’s exactly the same view merging that was present in the select – but the optimizer doesn’t attempt join elimination before moving on to report, a couple of lines later:

CVM:   Merging SPJ view SEL$58A6D7F6 (#0) into DEL$1 (#0)
Registered qb: SEL$4396EC5C 0x9c09dde0 (VIEW MERGE DEL$1; SEL$58A6D7F6; DEL$1)

It looks like this is a case of the optimizer missing a cycle in its transform/optimize loopback. The delete went through exactly the same process as the select, but failed to optimize the intermediate query block (which happened to be called from$_subquery$_004 in this case, being one layer deeper thanks to the extra level introduced by the delete).

Workaround

I tried two or three things to get the ANSI-style code to work without changing the table order in the from clause – extra layers of subqueries, explicit aliases – but Oracle kept coming back to a point where it had the full projection of all the columns in the two tables. (This reminded me of an old bug in ANSI select expansion that was fixed in 12cR2. It made me wonder if this was a code path where the same bug had been overlooked – it also made me wonder if there would be any problems if the two tables in the join totalled more than 1,000 columns – but that’s a test for another day.)

I did eventually find a fix (one that didn’t involve switching the table order in the inline view):

delete
        (select t_child.id from t_parent join t_child on t_child.id = t_parent.id);

Note that I now have only the primary key of t_child in the select list.

The section in the CBO trace file for Join Elimination looked like this:

JE:[V2] Query block (0x7f93996231f8) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T_PARENT"."ID" "QCSJ_C000000000400000","T_CHILD"."ID" "QCSJ_C000000000400001" FROM "TEST_USER"."T_PARENT" "T_PARENT","TEST_USER"."T_CHILD" "T_CHILD" WHERE "T_CHILD"."ID"="T_PARENT"."ID"
JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
JE:[V2]: Can eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
JE:[V2] Eliminate table: T_PARENT (T_PARENT)
JE:[V2] Replaced column: T_PARENT.ID with column: T_CHILD.ID
JE:[V2] Query block (0x7f93996231f8) after join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T_CHILD"."ID" "QCSJ_C000000000400000","T_CHILD"."ID" "QCSJ_C000000000400001" FROM "TEST_USER"."T_CHILD" "T_CHILD"
Registered qb: SEL$E703A888 0x996231f8 (JOIN REMOVED FROM QUERY BLOCK SEL$1; SEL$1; "T_PARENT"@"SEL$1")

You may decide that this is an obvious workaround, but it’s not particularly helpful. In the general case I would expect to see a stored view that joined the two tables, with the expectation that this type of delete was just one of many operations the view was used for. Rewriting a view to use the increasingly common ANSI syntax shouldn’t result in some statements crashing “for no apparent reason”.
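Had switching the table order been acceptable, the rule about the first key-preserved table suggests that this one-line variation (a sketch – I haven’t shown its plan) would also have worked:

delete
        (select t_child.* from t_child join t_parent on t_child.id = t_parent.id);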

The last laugh

Given that the select and delete both included a merge of sel$1 into sel$2 to produce a query block called sel$58a6d7f6 I wondered what would happen if I blocked the merge:

SQL>  delete  /*+  no_merge (@sel$1) */
  2          (select t_child.id from t_parent join t_child on t_child.id = t_parent.id);
	(select t_child.id from t_parent join t_child on t_child.id = t_parent.id)
        *
ERROR at line 2:
ORA-01752: cannot delete from view without exactly one key-preserved table

Isn’t that the error message that should have been coming out the entire time – according to the original documentation for updateable join views?

April 13, 2022

Adaptive Joins

Filed under: CBO,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 1:53 pm BST Apr 13,2022

There’s a question on the Oracle Forums at the moment about a query that’s taking a long time to parse. Even after being reduced to one prebuilt (currently remote) table with two non-correlated outer joins to it, the parse time is several hundred seconds. This seems fairly bizarre – I have seen some very long parse times from Oracle, but 10 minutes for 3 tables is well over the top; it did remind me, though, of a note I started a few years ago about a 4-table join taking 4 seconds to parse, so I thought I’d present the query, the plan, and a little chat on debugging. Here’s the query:

select
        /*+ parallel(t4,3) */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

I’m expecting a simple cascade of hash joins, with t1, t2 and t3 – the “small” tables – turning into “build” tables, then t4 – the “large” table – passing through each of them in turn until the penultimate rowsource is aggregated.

Here’s the execution plan – which looks pretty much as I expected it to – but there’s something wrong about it that isn’t visible in the output. Why is the query (plan) saying it took 0.07 seconds to complete (A-time), returning only 60 rows, when my SQL*Plus session didn’t return any data for 4 seconds?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   300 (100)|          |        |      |            |     60 |00:00:00.07 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR                |          |      1 |        |            |          |        |      |            |     60 |00:00:00.07 |       5 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)          | :TQ10004 |      0 |      1 |   300   (4)| 00:00:01 |  Q1,04 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    HASH GROUP BY               |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,04 | PCWP |            |     60 |00:00:00.01 |       0 |      0 |  1394K|  1394K|     3/0/0|
|   4 |     PX RECEIVE                 |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,04 | PCWP |            |    180 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND HASH              | :TQ10003 |      0 |      1 |   300   (4)| 00:00:01 |  Q1,03 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY            |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,03 | PCWP |            |    180 |00:00:00.14 |    6114 |   6018 |  1394K|  1394K|     3/0/0|
|*  7 |        HASH JOIN               |          |      3 |   8460 |   299   (4)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.14 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|   8 |         JOIN FILTER CREATE     | :BF0000  |      3 |     22 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     75 |00:00:00.02 |       0 |      0 |       |       |          |
|   9 |          PX RECEIVE            |          |      3 |     22 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     75 |00:00:00.02 |       0 |      0 |       |       |          |
|  10 |           PX SEND BROADCAST    | :TQ10000 |      0 |     22 |     2   (0)| 00:00:01 |  Q1,00 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  11 |            PX SELECTOR         |          |      3 |        |            |          |  Q1,00 | SCWC |            |     25 |00:00:00.01 |       3 |      0 |       |       |          |
|* 12 |             TABLE ACCESS FULL  | T3       |      1 |     22 |     2   (0)| 00:00:01 |  Q1,00 | SCWP |            |     25 |00:00:00.01 |       3 |      0 |       |       |          |
|* 13 |         HASH JOIN              |          |      3 |  27322 |   297   (4)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.10 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|  14 |          JOIN FILTER CREATE    | :BF0001  |      3 |     21 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     72 |00:00:00.01 |       0 |      0 |       |       |          |
|  15 |           PX RECEIVE           |          |      3 |     21 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     72 |00:00:00.01 |       0 |      0 |       |       |          |
|  16 |            PX SEND BROADCAST   | :TQ10001 |      0 |     21 |     2   (0)| 00:00:01 |  Q1,01 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  17 |             PX SELECTOR        |          |      3 |        |            |          |  Q1,01 | SCWC |            |     24 |00:00:00.01 |       3 |      0 |       |       |          |
|* 18 |              TABLE ACCESS FULL | T2       |      1 |     21 |     2   (0)| 00:00:01 |  Q1,01 | SCWP |            |     24 |00:00:00.01 |       3 |      0 |       |       |          |
|* 19 |          HASH JOIN             |          |      3 |  92953 |   294   (3)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.10 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|  20 |           JOIN FILTER CREATE   | :BF0002  |      3 |     19 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     63 |00:00:00.01 |       0 |      0 |       |       |          |
|  21 |            PX RECEIVE          |          |      3 |     19 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     63 |00:00:00.01 |       0 |      0 |       |       |          |
|  22 |             PX SEND BROADCAST  | :TQ10002 |      0 |     19 |     2   (0)| 00:00:01 |  Q1,02 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  23 |              PX SELECTOR       |          |      3 |        |            |          |  Q1,02 | SCWC |            |     21 |00:00:00.01 |       3 |      0 |       |       |          |
|* 24 |               TABLE ACCESS FULL| T1       |      1 |     19 |     2   (0)| 00:00:01 |  Q1,02 | SCWP |            |     21 |00:00:00.01 |       3 |      0 |       |       |          |
|  25 |           JOIN FILTER USE      | :BF0000  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  26 |            JOIN FILTER USE     | :BF0001  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  27 |             JOIN FILTER USE    | :BF0002  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  28 |              PX BLOCK ITERATOR |          |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWC |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|* 29 |               TABLE ACCESS FULL| T4       |     48 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.06 |    6114 |   6018 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
  12 - SEL$1 / T3@SEL$1
  18 - SEL$1 / T2@SEL$1
  24 - SEL$1 / T1@SEL$1
  29 - SEL$1 / T4@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      OPT_PARAM('_fix_control' '16923858:5')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "T4"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T3"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T4"@"SEL$1" "T2"@"SEL$1" "T3"@"SEL$1")
      USE_HASH(@"SEL$1" "T4"@"SEL$1")
      USE_HASH(@"SEL$1" "T2"@"SEL$1")
      USE_HASH(@"SEL$1" "T3"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T4"@"SEL$1" BROADCAST NONE)
      PX_JOIN_FILTER(@"SEL$1" "T4"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T2"@"SEL$1" NONE BROADCAST)
      PX_JOIN_FILTER(@"SEL$1" "T2"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T3"@"SEL$1" NONE BROADCAST)
      PX_JOIN_FILTER(@"SEL$1" "T3"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "T2"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "T3"@"SEL$1")
      GBY_PUSHDOWN(@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T3"."ID"="T4"."ID3")
  12 - filter((TO_NUMBER("T3"."SMALL_VC")=1 OR TO_NUMBER("T3"."SMALL_VC")=2 OR TO_NUMBER("T3"."SMALL_VC")=3 OR TO_NUMBER("T3"."SMALL_VC")=4 OR TO_NUMBER("T3"."SMALL_VC")=5))
  13 - access("T2"."ID"="T4"."ID2")
  18 - filter((TO_NUMBER("T2"."SMALL_VC")=1 OR TO_NUMBER("T2"."SMALL_VC")=2 OR TO_NUMBER("T2"."SMALL_VC")=3 OR TO_NUMBER("T2"."SMALL_VC")=4))
  19 - access("T1"."ID"="T4"."ID1")
  24 - filter((TO_NUMBER("T1"."SMALL_VC")=1 OR TO_NUMBER("T1"."SMALL_VC")=2 OR TO_NUMBER("T1"."SMALL_VC")=3))
  29 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER_LIST(SYS_OP_BLOOM_FILTER(:BF0002,"T4"."ID1"),SYS_OP_BLOOM_FILTER(:BF0001,"T4"."ID2"),SYS_OP_BLOOM_FILTER(:BF0000,"T4"."ID3")))


One possible trouble-shooting step is simply to re-run the query, taking a snapshot of the session activity stats (v$mystat) and the session events (v$session_event) to see what they tell you (if anything) – here’s the critical extract from the session stats:

Name                                      Value
----                                      -----
CPU used when call started                  423
CPU used by this session                    429
parse time cpu                              411
parse time elapsed                          419

Most of the time is parse time, spent on the CPU. (If this had been a much larger scale problem and had occurred in the recent past I might have looked at ASH (v$active_session_history) for any samples for the correct SQL_ID, and seen the problem revealed in the in_parse and in_hard_parse columns.)
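A sketch of the sort of ASH query I have in mind, using the sql_id that appears later in the trace extracts (and remembering that querying v$active_session_history requires the Diagnostics Pack licence):

select  sample_time, in_parse, in_hard_parse
from    v$active_session_history
where   sql_id = '7fjtvwktcmsgq'
order by
        sample_time
/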

So let’s enable event 10053 and run the query again – but since it’s “only” 4 seconds, let’s tweak the timer option to report any step that took longer than 0.1 seconds. The default timer setting is a whole second (10^6 microseconds), so we set the fix-control to 5 to get 0.1 seconds (10^5 microseconds).

alter session set "_fix_control"='16923858:5';
alter session set events '10053 trace name context forever';

-- run the query, find the trace file

 grep TIMER or19_ora_23370.trc 

Here’s the output from the call to grep: it looks like group by placement (GBP) is causing a problem.

TIMER:  GBP: costing SEL$1 cpu: 0.303 sec elapsed: 0.309 sec
TIMER: GBP: iteration (#1) SEL$1 cpu: 0.303 sec elapsed: 0.309 sec
TIMER:  GBP: costing SEL$1565E019 cpu: 0.293 sec elapsed: 0.298 sec
TIMER: GBP: iteration (#2) SEL$1565E019 cpu: 0.294 sec elapsed: 0.299 sec
TIMER:  GBP: costing SEL$23EAFE84 cpu: 0.528 sec elapsed: 0.533 sec
TIMER: GBP: iteration (#3) SEL$23EAFE84 cpu: 0.528 sec elapsed: 0.533 sec
TIMER:  GBP: costing SEL$B5D97CA0 cpu: 0.533 sec elapsed: 0.540 sec
TIMER: GBP: iteration (#4) SEL$B5D97CA0 cpu: 0.534 sec elapsed: 0.540 sec
TIMER:  GBP: costing SEL$6C9B46B6 cpu: 0.531 sec elapsed: 0.531 sec
TIMER: GBP: iteration (#5) SEL$6C9B46B6 cpu: 0.531 sec elapsed: 0.532 sec
TIMER:  GBP: costing SEL$ED1298E3 cpu: 0.522 sec elapsed: 0.523 sec
TIMER: GBP: iteration (#8) SEL$ED1298E3 cpu: 0.523 sec elapsed: 0.524 sec
TIMER:  GBP: costing SEL$5968095A cpu: 0.523 sec elapsed: 0.523 sec
TIMER: GBP: iteration (#9) SEL$5968095A cpu: 0.524 sec elapsed: 0.523 sec
TIMER:  GBP: costing SEL$4CA81688 cpu: 0.525 sec elapsed: 0.527 sec
TIMER: GBP: iteration (#12) SEL$4CA81688 cpu: 0.526 sec elapsed: 0.528 sec
TIMER: Group-By Placement SEL$1 cpu: 3.766 sec elapsed: 3.793 sec
TIMER: Cost-Based Transformations (Overall) SEL$1 cpu: 3.769 sec elapsed: 3.795 sec
TIMER: Access Path Analysis (Final) SEL$1 cpu: 0.288 sec elapsed: 0.289 sec
TIMER: SQL Optimization (Overall) SEL$1 cpu: 4.072 sec elapsed: 4.108 sec

If you check further up the page, though, you’ll see in the Outline Data that Oracle has not used group by placement so one quick hack we could try is to add the hint /*+ no_place_group_by(@sel$1) */ to the query just to see what happens – and here’s the effect on the parse time:

Name                                      Value
----                                      -----
parse time cpu                               33
parse time elapsed                           34

Problem solved – provided we can get the hint into the code (by hand, or SQL Patch, etc.)

Note: you’ve probably spotted the gby_pushdown() hint in the Outline Data. That’s not “group by placement”, it’s “group by pushdown”, which relates to aggregation in parallel execution and explains why you see hash group by appearing at both operations 3 and 6, viz: before sending and after receiving. The optimizer thinks that aggregating before sending is worth doing to reduce the size of the rowsource that has to be sent to the next layer of PX processing.

The question still remains: where did the time go? The trace file was fairly long (375,000 lines for the original, compared to 32,000 for the hinted) but a rapid scan seemed in order – and something very quickly caught my attention. It was pretty easy to spot because something big and nasty had happened 8 times.

The answer was in “Adaptive Plans”, which (mostly) get flagged with the label “AP:” in the 10053 trace file, for example:

AP: Computing costs for inflection point at min value 0.00
AP: Using binary search for inflection point search
AP: Costing Join for ADM inflection point at card 0.00
AP: Costing Join for ADM inflection point at card 0.00

You can see here that the optimizer is searching for an “inflection point”, that means it’s “thinking about” an adaptive join, and searching for the number of rows where a switch between a nested loop join and a hash join makes sense.

Notice particularly the comment about “using binary search”. After calculating the cost of the “best” join using the current estimates of cardinality for the incoming rowsource the optimizer starts calculating the possible costs (nested loop or hash, it ignores merge) for a set of alternative cardinalities until it finds the cardinality where the nested loop join and hash join have the same cost. That’s the number of rows that Oracle will use at run time to decide whether it should switch from its initial selection of join method to the alternative.

If the initial join selection was a hash join (i.e. the large volume strategy) Oracle will use an alternative cardinality of 1 to start its list of estimates – on the assumption that that would produce a nested loop plan – then keep doubling the estimate until the resulting plan switches to a hash join, then close in on the break point by halving and doubling the remaining gap between the NL estimate and the HJ estimate.

If the initial join selection was a nested loop join Oracle will use a “worst case scenario” for the incoming estimate (acting, broadly speaking, as if every filter at that point had had a selectivity of 1 – i.e. 100% of the available data), and start by halving the estimate. This is fine when the starting estimate is (for example) for the first table in the join order and the estimate is just a few million rows.

Here’s the first estimate in my case (you’ll need a wide screen, or very small print):

6772:AP: Computing costs for inflection point at max value 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00 

After you’ve halved that number about 1,000 times (it’s roughly 2^1024 – the largest value a double precision floating point number can hold – so it takes about 1,000 halvings) you’re down to values in the region of a few thousand. And when you have to cost a couple of joins every time you halve, and when you’ve gone through the whole process 8 times, that’s a lot of CPU.

In fact, with the no_place_group_by hint there was still one point where the optimizer did this adaptive join work – which probably accounts for most of the final 0.33 CPU seconds – but it didn’t show up in any separately reported timer events.

Of course the final little test of this parse time hypothesis is to add the hint /*+ no_adaptive_plan */ – so I did that, and the parse time was still about 0.3 seconds! Investigation of the 10053 trace showed that even with the hint in place the optimizer still went through that one huge binary chop – but when it had worked out the inflection point it printed the message:

AP: Adaptive plans bypassed for query block SEL$1 due to disabled by hint (no_adaptive_plan)
AP: Adaptive joins bypassed for query block SEL$1 due to adaptive plans disabled

According to the hint report the hint was valid, however, so that behaviour looks a little buggy. It then occurred to me that maybe I could have hinted /*+ no_adaptive_plan(@sel$1) */ – and that worked properly with the trace reporting:

AP: Checking validity for query block SEL$1, sqlid=7fjtvwktcmsgq
AP: Adaptive plans bypassed for query block SEL$1 due to disabled by hint (no_adaptive_plan)
AP: Adaptive joins bypassed for query block SEL$1 due to adaptive plans disabled

Had I not realised that this was possible I would have fallen back on the hint /*+ opt_param() */ to set the parameter optimizer_adaptive_plans to false for the query, in which case the following message (and no other “AP:” message) appeared near the start of the trace:

AP: Adaptive joins bypassed for query block SEL$1 due to disabled by adaptive join parameter

If you want to experiment on your own system here’s the script to create the data – the script name reflects the fact that I found this example by accident while working on something completely different:

rem
rem     Script:         12c_vector_transform_c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2015
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             12.1.0.2

drop table t1;
drop table t2;
drop table t3;
drop table t4;
purge recyclebin;

create table t1 
as
select
        rownum          id,
        to_char(mod(rownum,10)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t1 
        add constraint t1_pk primary key(id)
;

create table t2
as
select
        rownum          id,
        to_char(mod(rownum,12)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t2
        add constraint t2_pk primary key(id)
;

create table t3
as
select
        rownum          id,
        to_char(mod(rownum,14)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t3
        add constraint t3_pk primary key(id)
;

create table t4
nologging
as
select
        t1.id                   id1,
        t2.id                   id2,
        t3.id                   id3,
        rpad(rownum,10)         small_vc,
        rpad('x',100)           padding
from
        t1, t2, t3
;

spool 12c_vector_transform_c

set timing on
set linesize 255
set trimspool on
set pagesize 90
set arraysize 100
set serveroutput off

execute snap_my_stats.start_snap

alter session set statistics_level = all;

alter session set "_fix_control"='16923858:5';
alter session set events '10053 trace name context forever';

select
        /*+ 
--              opt_param('optimizer_adaptive_plans' 'false')
--              no_adaptive_plan
--              no_adaptive_plan(@sel$1)
--              no_place_group_by(@sel$1)
                parallel(t4,3)
        */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline allstats all note -bytes'));

alter session set events '10053 trace name context off';

set serveroutput on
execute snap_my_stats.end_snap

spool off

March 22, 2022

Upgrade Surprise

Filed under: 19c,Bugs,Oracle,Transformations,Upgrades — Jonathan Lewis @ 10:04 am GMT Mar 22,2022

Here’s a little surprise that showed up in the most recent (March 2022) article that I sent to Simpletalk for the series on transformations. I had been using 19c (19.11.0.0) to create and run my little demos but the editor had used 12.2.0.1 to check the examples and questioned a comment I had made about a “default plan”.

Here’s the query in question. I was using the emp and dept tables from the Scott schema to demonstrate a point about subquery execution:

rem
rem     Script:         unnest_demo_simpletalk_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem 

break on deptno skip 1

select
        /*+ 
                qb_name(main)
                gather_plan_statistics 
        */
        *
from    emp e1
where   e1.sal + nvl(e1.comm,0) > (
                select  /*+ qb_name(subq) */
                        avg(e2.sal + nvl(e2.comm,0))
                from    emp e2
                where   e2.deptno = e1.deptno
        )
order by
        e1.deptno, e1.empno
/

As you can see, I’ve used a correlated aggregate subquery to report all employees who earned more than the average for their department, where “earnings” is calculated as the sum of salary and commission.

Here’s the plan I got when I ran this query under 19c:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |      1 |      6 |00:00:00.01 |      24 |  2048 |  2048 | 2048  (0)|
|*  2 |   FILTER             |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   3 |    TABLE ACCESS FULL | EMP  |      1 |     14 |     14 |00:00:00.01 |       6 |       |       |          |
|   4 |    SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:00.01 |      18 |       |       |          |
|*  5 |     TABLE ACCESS FULL| EMP  |      3 |      5 |     14 |00:00:00.01 |      18 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("E1"."SAL"+NVL("E1"."COMM",0)>)
   5 - filter("E2"."DEPTNO"=:B1)

When my editor ran the query on 12.2.0.1, and when I started up an instance of 12.2.0.1 and ran the query, the plan looked like this:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|*  1 |  FILTER              |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|   2 |   SORT GROUP BY      |      |      1 |      4 |     14 |00:00:00.02 |      29 |      6 |  2048 |  2048 | 2048  (0)|
|*  3 |    HASH JOIN         |      |      1 |     70 |     70 |00:00:00.02 |      29 |      6 |  1922K|  1922K| 1053K (0)|
|   4 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      6 |       |       |          |
|   5 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9E18A930")
      MERGE(@"SEL$AA0D0E02" >"SEL$B4BE209F")
      OUTLINE(@"SEL$B4BE209F")
      UNNEST(@"SUBQ")
      OUTLINE(@"SEL$AA0D0E02")
      OUTLINE(@"SEL$D6166863")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      FULL(@"SEL$9E18A930" "E2"@"SUBQ")
      FULL(@"SEL$9E18A930" "E1"@"MAIN")
      LEADING(@"SEL$9E18A930" "E2"@"SUBQ" "E1"@"MAIN")
      USE_HASH(@"SEL$9E18A930" "E1"@"MAIN")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("E1"."SAL"+NVL("E1"."COMM",0)>SUM("E2"."SAL"+NVL("E2"."COMM",0))/COUNT("E2"."SAL"+NVL("E2"."COMM",0)))
   3 - access("E2"."DEPTNO"="E1"."DEPTNO")

(I’ve added in a request for the ‘outline’ in the display_cursor() format.) The correlated subquery has been unnested and the resulting inline view has then been subjected to complex view merging. It was only at this point that I realised that the 19c plan was a little surprising and not what I should have expected.
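
To make the transformation a little more concrete: after unnesting and complex view merging the optimizer is, in effect, optimizing a statement of the following shape. This is an illustrative rewrite of my original query, not text extracted from the optimizer:

select
        e1.*
from
        emp e1,
        emp e2
where
        e2.deptno = e1.deptno
group by
        e1.rowid, e1.empno, e1.ename, e1.job, e1.mgr,
        e1.hiredate, e1.sal, e1.comm, e1.deptno
having
        e1.sal + nvl(e1.comm,0) > avg(e2.sal + nvl(e2.comm,0))
order by
        e1.deptno, e1.empno
/

You can see how this maps onto the 12c plan above: the hash join between the two copies of emp, the sort group by (effectively grouping on the rowid of e1), and the final filter operation applying the having clause.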

After checking that the configuration and statistics (including the system stats) were the same on the two instances, I re-ran the query on 12c with the /*+ no_unnest */ hint to make it use the plan that 19c had produced, and found (as expected) that the filter subquery plan produced by 19c had a higher cost than the unnesting plan produced by 12c.

Next I re-ran the query on 19c with the /*+ unnest */ hint to make it use the plan that 12c had produced – but it didn’t! 19c “ignored” the hint and carried on using the filter subquery plan. It did, however, supply the following extra information when I added the ‘hint_report’ option to the display_cursor() format:

Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------

   1 -  MAIN
           -  qb_name(main)

   4 -  SUBQ
         U -  unnest / Failed basic validity checks
           -  qb_name(subq)

The code in 19c thinks that it would be illegal to unnest the subquery that 12c was unnesting (does this mean that some people may be getting wrong results in 12c?). So I checked the 10053 (CBO) trace file to see if there was any further information there that would “explain” the problem. This is what I found:

SU: Checking validity for Null Accepting Semi JoinUnnesting for query block MAIN(#1)
SU:   Checking validity of unnesting subquery SUBQ (#2)
SU:     SU bypassed: More than 1 column in connect condition.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

The reference to “Null accepting” looks a little suspect but prompted a few experiments (modifying the data to eliminate nulls, adding not null declarations to columns, simplifying the query etc.) that suggested that the problem was essentially that the optimizer did not want to unnest when the comparison was with the expression (sal + comm) regardless of the operator, and even when all the relevant columns had been populated, declared not null, and the nvl() function had been removed.

It doesn’t seem reasonable in this case, but possibly the block is a generic strategy to avoid errors in some more subtle cases, and perhaps the block will be refined and relaxed in future versions. (Or maybe it’s a bug that the wrong test is being called at this point – there was nothing in the query requiring “null acceptance” by the time I got through the last test.)

I did find a workaround that you could use to avoid any code rewrite:

alter table emp add nvl_earnings 
        invisible 
        generated always as (sal + nvl(comm,0)) 
        virtual
;

There seems to be no good reason why this should work – but it does. The subquery unnests, and the Predicate Information in the plan doesn’t give any clue that it’s using a virtual column.

Summary:

When you upgrade from 12c you may find queries involving correlated subqueries where the subquery is no longer unnested. This can have a significant impact on performance, and it may not be possible to bypass the problem unless you rewrite the query to do a manual unnest (sketched below), although I did find a virtual column workaround for my specific example. So far I’ve tested the query on 19.11.0.0 and 21.3.0.0; the behaviour is the same in both versions.
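
For reference, the sort of manual unnest I have in mind is a rewrite like the following – just a sketch using my demo tables, and avg_earnings is simply a name I’ve invented for the inline view’s aggregate column:

select
        e1.*
from
        emp e1,
        (
        select
                deptno, avg(sal + nvl(comm,0)) avg_earnings
        from
                emp
        group by
                deptno
        )       e2
where
        e2.deptno = e1.deptno
and     e1.sal + nvl(e1.comm,0) > e2.avg_earnings
order by
        e1.deptno, e1.empno
/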

Footnote:

After failing to find anything on MOS about the problem I emailed a draft of this note to Nigel Bayliss at Oracle – who did find a promising match on MOS.

The failure to unnest may be the consequence of the fix for bug 30593046: “A query having a scalar subquery returned a wrong result”. The fix was introduced in 19.9.0.0 but was too restrictive, leading to the creation of bug 33325981: “Query Loses Subquery Unnesting From Plan in 19.9 and Above”.

The fix for 33325981 was distributed in 19.13.0.0 and 21.4.0.0 (plus a couple of earlier RURs, with patches available for various versions back to 12.2.0.1). Unfortunately the newer fix still doesn’t go far enough in reducing the restrictions and my example still doesn’t unnest.

Make sure you check any code that depends on “expression-based” subquery unnesting before you upgrade to 19.9 or beyond, as the plan may change, which could make a difference to performance and create a need for a workaround.

March 2, 2022

Generated Predicates

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 11:25 am GMT Mar 2,2022

A question arrived on the MOS Community forum yesterday (needs an account if you want to see the original) that reported a couple of fragments of a CBO trace (10053) file:

----- Current SQL Statement for this session (sql_id=4c85twjpdg8g9) -----
select /*+ 123456 */ count(*) from gl_detail where prepareddatev='2022-01-22 15:00:00'

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "NC63"."GL_DETAIL" "GL_DETAIL" 
WHERE "GL_DETAIL"."PREPAREDDATEV"='2022-01-22 15:00:00' 
AND SUBSTR("GL_DETAIL"."PREPAREDDATEV",1,10)='2022-01-22'

The question was:

Why after transformations, oracle add condition SUBSTR("GL_DETAIL"."PREPAREDDATEV",1,10)='2022-01-22'

Mark Powell asked for the execution plan and information about indexes (normal and function-based) and histograms, as well as asking for the Oracle version. I asked about constraints and virtual columns and, in particular, the possibility of a virtual column being used as a partition key.

We didn’t get explicit answers to all our questions, but we did get “no constraints, no virtual columns, no partitioning”, and we also got the full 10053 trace file which, given the simplicity of the query, was mercifully short – a mere 95KB and 2,800 lines.

The key aid to reading 10053 trace files is knowing what you’re expecting to see before you start looking. And with a generated predicate there was likely to be something that would tell me about the “column” that caused the predicate to appear and the arithmetic that was the consequence of that predicate coming into existence. So I started with the section headed “SINGLE TABLE ACCESS PATH” where the cardinality estimate (for each individual table) would be calculated. This showed two columns being considered for the single table in the query:

  Column (#77): 
    NewDensity:0.000000, OldDensity:0.000035 BktCnt:75, PopBktCnt:11, PopValCnt:1, NDV:8314506
  Column (#77): PREPAREDDATEV(
 
  Column (#88): 
    NewDensity:0.000188, OldDensity:0.000583 BktCnt:75, PopBktCnt:11, PopValCnt:1, NDV:4551
  Column (#88): SYS_NC00088$(

Check the name of column #88 – sys_nc00088$ – that’s an internally generated virtual column which may well turn out to be associated with a function-based index, so let’s back up a bit to the “BASIC STATISTICAL INFORMATION” and (thirteen sets of) index stats for the table where we find:

  Index: I_GL_DETAIL_7  Col#: 88
    LVLS: 3  #LB: 433301  #DK: 4551  LB/K: 95.00  DB/K: 5922.00  CLUF: 26953639.00

The obvious first guess is that column #88 is the invisible virtual column underpinning an index that has been created on substr(prepareddatev,1,10) and here’s a quick and dirty test script to demonstrate that this could be the correct guess.

rem
rem     Script:         generated_predicates.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2022
rem 

create table t1 (v1 varchar2(20), v2 varchar2(1));
create index t1_i1 on t1(substr(v1,1,10));

select column_name, virtual_column, hidden_column from user_tab_cols where table_name = 'T1';
select * from user_ind_expressions where table_name = 'T1';

insert into t1 values('2022-03-02 09:01:00', 'x');
commit;

execute dbms_stats.gather_table_stats(user,'t1')

set autotrace traceonly explain

select /*+ full(t1) */  * from t1 where v1 = '2022-03-02 09:01:00';

set autotrace off

And here’s the output cut and pasted from an SQL*Plus session running 11.2.0.4 (which is the version the CBO trace file came from).

Table created.


Index created.


COLUMN_NAME          VIR HID
-------------------- --- ---
V1                   NO  NO
V2                   NO  NO
SYS_NC00003$         YES YES

3 rows selected.


INDEX_NAME           TABLE_NAME                COLUMN_EXPRESSION                        COLUMN_POSITION
-------------------- ------------------------- ---------------------------------------- ---------------
T1_I1                T1                        SUBSTR("V1",1,10)                                      1

1 row selected.


1 row created.


Commit complete.


PL/SQL procedure successfully completed.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    33 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("V1"='2022-03-02 09:01:00' AND
              SUBSTR("V1",1,10)='2022-03-02')

We see the “extra” predicate and a column with a name of the form sys_ncXXXXX$. The results from more recent versions of Oracle should be the same. I think there’s a pretty good chance that if the OP runs suitable queries against XXX_tab_cols and XXX_ind_expressions they’ll see similar results that explain the predicate that surprised them.
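
If I were the OP the obvious follow-up checks would look something like this – a sketch, assuming (from the unparsed query) that the table is in the NC63 schema and that you have access to the dba_ views:

select
        column_name, virtual_column, hidden_column
from
        dba_tab_cols
where
        owner = 'NC63'
and     table_name = 'GL_DETAIL'
and     column_name = 'SYS_NC00088$'
;

select
        index_name, column_expression, column_position
from
        dba_ind_expressions
where
        table_owner = 'NC63'
and     table_name = 'GL_DETAIL'
and     index_name = 'I_GL_DETAIL_7'
;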

Footnote

There are various notes on the blog about constraints and transitive closure generating extra predicates, and how the optimizer can use function-based indexes that have definitions that are “good enough” though not perfect matches for user-supplied predicates. This is just another little detail in how the optimizer tries to find as much helpful information as it can from the data dictionary. The earliest note I can find on my blog about this at present is about partition elimination and generated predicates – which prompted various comments about function-based indexes and predicate generation.

February 21, 2022

CTE Enhancement

Filed under: 12c,Oracle,Performance,Transformations — Jonathan Lewis @ 9:24 am GMT Feb 21,2022

For many years I’ve told people that when you materialize a CTE (common table expression / “with” subquery) the result set will be written to the temporary tablespace using direct path writes and will be read back using cached reads. This stopped being an accurate description in 12c.

There is a clue about this in the corresponding execution plans, and I’ll be pointing that out later. The key difference between earlier versions of Oracle and newer versions is that the GTT (global temporary table) that holds the materialized result set is not necessarily written to disc, and may even avoid allocating a temporary segment.

I started looking at this feature a couple of days ago after remembering that someone, somewhere, had mentioned some details about a temporary object being kept in the PGA rather than being written to disc if the size wasn’t too big. I couldn’t remember if this was GTTs or temporary LOBs (or something completely different) and I only had a vague memory that there was a critical size that might have been 256KB; so I started experimenting with materializing CTEs.

Here’s the script I used to establish a baseline in 11g. I took a fairly arbitrary starting guess that if there was a PGA/Temp split it might be around 64KB.

rem
rem     Script:         cte_writes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

alter session set events '10046 trace name context forever, level 8';

prompt  ==============================
prompt  First sample - just under 64KB
prompt  ==============================

set arraysize  35
set pagesize  35

set pause Waiting...
set pause on

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 63
)
select
        substr(v1,-10)
from
        g1
;

alter session set events '10046 trace name context off';

execute snap_my_stats.end_snap
execute snap_ts.end_snap

prompt  ==============================
prompt  Second sample - just over 64KB
prompt  ==============================

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 64
)
select
        substr(v1,-10)
from
        g1
;

execute snap_my_stats.end_snap
execute snap_ts.end_snap

set pause off

The calls to the package snap_my_stats are the usual ones I use (very old source at this Wayback URL) to report the session’s activity (v$mystat) between start and end snapshot, and the calls in the snap_ts package do the same for the I/O per tablespace, summing by tablespace across v$filestat and v$tempstat.
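
If you don’t have the snapshot packages installed, a crude substitute is to run something like the following immediately before and after each test query and do the subtraction by hand (the statistic names listed are the ones that turn out to matter in this test):

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'physical reads',
                'physical reads cache',
                'physical writes',
                'physical writes direct',
                'physical writes direct temporary tablespace'
        )
order by
        sn.name
;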

This script basically materializes and reports a GTT with a single column of 1,024 bytes, and in this case either 63 or 64 rows. You’ll note that I’ve built another assumption into the code that the CTE (if kept in memory) won’t follow all the details of a “real” table block, but will be a simple array with a tiny overhead per row – I’ve also assumed that the optimizer isn’t smart enough (or foolhardy enough) to push the substr() call inside the CTE.

I’ve set pause on and set both the arraysize and pagesize to a value less than the total number of rows I’ll be fetching so that I can check a couple of dynamic performance views (in particular v$sort_usage) from another session while the queries are running.
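
The check from the second session need be nothing more sophisticated than something like:

select
        username, tablespace, contents, segtype, blocks
from
        v$sort_usage
;

If the materialized CTE has been dumped to the temporary tablespace you should see a row appear for your test session (for this test the segtype should show as ‘DATA’).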

As a starting point, here are some critical values I’ve selected from the various outputs for just the 63 row case when running 11.2.0.4:

-------------
Session stats
-------------
Name                                                 Value
----                                                 -----
physical reads                                           9
physical reads cache                                     9
physical writes                                          9
physical writes direct                                   9
physical writes direct temporary tablespace              9

---------
T/S Stats 
---------
TS#        Reads      Blocks   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max Tablespace
----       -----      ------   ---------    ---      ------      ------   ---------    --- -------------------
   3           1           9        .000      1           1           9        .000      0 TEMP


Since I’ve enabled extended tracing at level 8 (waits) I can also show you the I/O waits reported for the reads and writes

WAIT #140185232180256: nam='direct path write temp' ela= 180 file number=201 first dba=35073 block cnt=9 obj#=-1 tim=1645178184267768
...
WAIT #140185232180256: nam='db file scattered read' ela= 183 file#=201 block#=35073 blocks=9 obj#=-40016369 tim=1645178184268342

A dump of the 9 blocks of the temporary file (the “file number=201” translates to tempfile 1 since I have db_files = 200) starting at block 35073 looks like an ordinary table with pctfree 10 (which is one of the annoying details of GTTs – you can’t adjust it), 3 ITL slots (which is normal for CTAS) and 7 rows per block.
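
If you want to reproduce the dump, the command I’d use is the (undocumented) block dump syntax – and I’m assuming here that your version lets you address a tempfile by its absolute file number (db_files + relative tempfile number, hence 201); the exact syntax may vary across versions:

alter system dump datafile 201 block min 35073 block max 35081;

The dump appears in the session’s trace file in the usual diagnostic trace directory.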

So, for 11g, we conclude that the effect of materializing a CTE is simply to create a GTT in the temporary tablespace, write it out using direct path writes, then read it back into the buffer cache using db file scattered reads. (You might want to confirm that this always happens, even if the CTE holds only one row.)

If you take advantage of the pause to issue “alter system flush buffer_cache” from another session you can also dump the segment header block (35072 in my case) to see that it’s a normal table segment header block – using freelist management, not ASSM because that’s the way temporary tablespaces have to be declared. The segment header block didn’t get written to disc in the normal course of the test.

12c Enhancement

This is the moment where the second query, and the pause that allows me to query v$sort_usage, becomes significant. When I started 12.2.0.1 with the 63 row query I saw:

  • No I/O on the temporary tablespace
  • No entry in v$sort_usage

To my great satisfaction the 64 row query did report I/O to the temporary tablespace (10 blocks this time – needing one extra block to cater for the 64th row) with v$sort_usage reporting a segment being created on my behalf. Obviously I re-ran the test a couple of times, flushing the buffer cache and shared pool, and connecting to a new session each time. The results were totally consistent: 63 rows => no GTT, 64 rows => GTT.

If you’re feeling a little suspicious at this point, bear with me.

This is the point where I switched to 19.11.0.0 – and both queries ran in memory with no sign of a GTT being created. Luckily I had cloned the query several times in the script, generating different pairs of row counts: 127/128, 255/256, 511/512, 1023/1024, and when I hit 1024 (and 1023) my session produced a GTT.

Somewhere between 512 and 1023 rows I was hitting a critical breakpoint – so I nearly started working through a binary chop to find the actual breakpoint; luckily, though, I had a little inspiration: if the overhead per row was 3 bytes (as it would be for a normal table column of more than 254 bytes) then 1023 rows would have an overhead of about 3KB – so I should test 1021 rows if I wanted to test a memory limit of just under 1MB.

Sure enough, at 1021 rows the GTT didn’t appear, at 1022 rows it did – time after time after time.
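
The arithmetic is satisfyingly tight: if you assume 1,024 bytes of data plus a 3-byte overhead per row, i.e. 1,027 bytes per row, then 1,021 rows need 1,048,567 bytes – just 9 bytes short of 1MB (1,048,576 bytes) – while 1,022 rows need 1,049,594 bytes, fractionally over the 1MB boundary.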

But …

My tests seemed to be totally repeatable. Again, I connected to a new session a few times, I flushed the buffer cache, I flushed the shared pool, I checked v$sort_usage. Consistently the results seemed to say:

  • 12.2 uses the PGA up to 64KB then dumps to a GTT
  • 19.11.0.0 uses the PGA up to 1MB then dumps to a GTT

Except – that night I had to shut down the two virtual machines because sometimes, for no obvious reason, I can’t hibernate my laptop while the VMs are running; and when I started everything up again the following morning and re-ran the tests as I wrote up the results, something had changed. In fact my 19.11 instance didn’t dump to a GTT until I had reached nearly 10MB of data, and my 12.2 instance wasn’t even dumping at 1MB; and I have no idea why a complete restart made such a difference.

After spending a little time trying to figure out what had changed – and I think it may have been that I’d been running the previous day’s tests after doing a lot of heavy work with temporary LOBs trying to pin down an anomaly with the handling of the temporary tablespace – I finally tried a google search using keywords that might be relevant and found this article that Keith Laker wrote about 5 years ago.

The feature is known as the In-memory “cursor-duration” temporary table. I mentioned a clue in the execution plans at the start of this note: materialization shows up with a “temp table transformation” operation followed, in 11g, by a child operation of “load as select”; but in 12.2 the child operation is “load as select (cursor duration memory)”. I really should have started my investigation by putting the entire text of that operation into a search engine.
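
Schematically, the fragment to look for near the top of the plan is something like this (the 11g version simply omits the “(cursor duration memory)” decoration):

| Id  | Operation                                 |
---------------------------------------------------
|   1 |  TEMP TABLE TRANSFORMATION                |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY) |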

Summary

(Basically the highlights from Keith’s article):

  • The “in-memory cursor-duration temporary table” change appeared in 12.2
  • It can be used in a number of transformations that the optimizer does
  • It’s not possible to force the use of the feature for a given query; it’s down to an internal algorithm
  • The mechanism uses memory that is “essentially” PGA
  • Despite the name this feature does not require you to licence the In-Memory option
  • If you’re still using an older version of Oracle this could be a good reason for upgrading, as it can reduce the I/O load, particularly for “analytic” types of query, at the cost of using extra memory.

All the work I had done trying to find a break-point where Oracle switched from using PGA to using a GTT had been a waste of time – and the apparently consistent results on the first day had been an “accident” dictated (possibly) by some PGA-related activity that had taken place before I started running my tests.

Footnotes and geeky things

Five years on from the publication date of Keith’s article we shouldn’t be surprised to see some changes. Keith notes that the mechanism will apply only to serial queries that do more than one pass over the table – but there are two points to raise there:

  • possibly the two-pass thing is because it usually takes two passes over a CTE before Oracle will materialize a CTE automatically; my example shows the in-memory effect on a single pass – but that was a forced materialization.
  • the restrictions on parallelism may have been relaxed by now – if you check for hidden parameters you will find: _in_memory_cdt_maxpx, default value 4, described as “Max Parallelizers allowed for IMCDT”.

Interestingly there are more “cdt” parameters in 12.2 than there are in 19.11, and there are clear indications of some changes in algorithm and mechanism:

12c parameters like '%cdt%'
Parameter                                  System Value         Description
------------------------------------------ -------------------- ---------------------------------
_cdt_shared_memory_limit                   0                    Shared Memory limit for CDT
_cdt_shared_memory_query_percent           20                   Per Query Percentage from CDT pool
_imcdt_use_mga                             ON                   MGA for In Memory CDT
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT

19c parameters like '%cdt%'
Parameter                                  System Value         Description
------------------------------------------ -------------------- ---------------------------------
_hcs_enable_in_mem_cdt_hint                FALSE                add hint opt_param('_in_memory_cdt', 'off')
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT
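
For reference, listings like the two above come from a query of roughly the following shape against the x$ structures – it has to run as SYS (or through a view created by SYS) since those structures aren’t exposed to ordinary users:

select
        i.ksppinm       parameter,
        v.ksppstvl      system_value,
        i.ksppdesc      description
from
        x$ksppi  i,
        x$ksppsv v
where
        v.indx = i.indx
and     i.ksppinm like '%cdt%'
order by
        i.ksppinm
;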

The parameter “_in_memory_cdt” can take the values ON, LIMITED, or OFF – which tells you that even if you can’t force a query to use in-memory CDTs you can (if you really want to) stop a query from using the feature. There are a few notes about this parameter and its significance to RAC and parallel execution (for 12.2) on MOS – if you have an account – Doc ID 2388236.1 What is _in_memory_cdt Parameter?
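
Given the hint text quoted in the 19c parameter listing above, it should be possible to stop an individual query from using the feature with the opt_param() hint. Here’s a sketch based on my earlier test:

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 63
)
select
        /*+ opt_param('_in_memory_cdt','off') */
        substr(v1,-10)
from
        g1
;

(As ever with hidden parameters: don’t do this in a production system without guidance from Oracle support.)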

The reference to MGA (the “managed global area”) in 12.2 is also quite interesting. This is documented as a Solaris feature using OSM to share memory between processes. For more general details you can review MOS Doc ID 2638904.1 MGA (Managed Global Area) Reference Note (again, only if you have an account).

The “new” oradebug mechanism shows (from 18c) a couple of relevant components under SQL compilation and execution that you could trace if you want to investigate further.

Components in library RDBMS:
--------------------------
  SQL_Compiler                 SQL Compiler ((null))
    ICDT_Compile               In Memory CDT Compilation (qks3t)
  SQL_Execution                SQL Execution (qer, qes, kx, qee)
    ICDT_Exec                  In Memory CDT Execution (qes3t, kxtt)
