Execution plans | Oracle Scratchpad

March 12, 2024

Missing Partition

Filed under: CBO,Execution plans,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 12:20 pm GMT Mar 12,2024
Tags: Oracle, Partitioning

Here’s a silly little detail about execution plans on (interval) partitioned tables that I hadn’t noticed until it showed up on this thread on a public Oracle forum: it’s an execution plan that claims that Oracle will be visiting a partition that clearly won’t be holding the data requested.

Here’s the starting section of a demonstration script – mostly by Solomon Yakobson with minor tweaks and additions from me:

rem
rem     Script:         non_existent_partition.sql
rem     Author:         Solomon Yakobson / Jonathan Lewis
rem     Dated:          Mar 2024
rem
rem     Last tested 
rem             19.11.0.0
rem

create table invoices_partitioned(
        invoice_no   number not null,
        invoice_date date   not null,
        comments     varchar2(500)
)
partition by range (invoice_date)
interval (interval '3' month)
(
        partition invoices_past values less than (date '2023-01-01')
);

insert into invoices_partitioned
select  level,
        date '2023-01-01' + numtoyminterval(3 * (level - 1),'month'),
        null
from  dual
connect by level <= 6
/

insert into invoices_partitioned select * from invoices_partitioned;
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/

commit
/

execute dbms_stats.gather_table_stats(user,'invoices_partitioned')

set linesize 156
column high_value format a80

select  partition_position, num_rows,
        partition_name,
        high_value
  from  user_tab_partitions
  where table_name = 'INVOICES_PARTITIONED'
  order by partition_position
/

alter table invoices_partitioned drop partition for (date'2023-09-01');
purge recyclebin;

select  partition_position, num_rows,
        partition_name,
        high_value
  from  user_tab_partitions
  where table_name = 'INVOICES_PARTITIONED'
  order by partition_position
/

The script creates an interval partitioned table, with an interval of 3 months, then inserts 131,072 rows per partition (the strange re-execution of “insert into x select from x” was my lazy way of increasing the volume of data from the original one row per partition without having to think too carefully.

After creating the data we report the partition names and high values in order, then drop (and purge) the partition that should hold the value 1st Sept 2023 (which will be the partition with the high_value of 1st Oct 2023) and report the partition names and high values again so that you can see the “gap” in the high values and the adjustment to the partition_position values. Here are the “before” and “after” outputs:

PARTITION_POSITION   NUM_ROWS PARTITION_NAME         HIGH_VALUE
------------------ ---------- ---------------------- --------------------------------------------------------------------------------
                 1          0 INVOICES_PAST          TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 2     131072 SYS_P39375             TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 3     131072 SYS_P39376             TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 4     131072 SYS_P39377             TO_DATE(' 2023-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 5     131072 SYS_P39378             TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 6     131072 SYS_P39379             TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 7     131072 SYS_P39380             TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

7 rows selected.



PARTITION_POSITION   NUM_ROWS PARTITION_NAME         HIGH_VALUE
------------------ ---------- ---------------------- --------------------------------------------------------------------------------
                 1          0 INVOICES_PAST          TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 2     131072 SYS_P39375             TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 3     131072 SYS_P39376             TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 4     131072 SYS_P39378             TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 5     131072 SYS_P39379             TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                 6     131072 SYS_P39380             TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

6 rows selected.

Now we check the execution plan for a query that would have accessed the partition we’ve just dropped:

explain plan for
select  *
  from  invoices_partitioned
  where invoice_date = date '2023-09-01';

select  *
  from  dbms_xplan.display();


PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
Plan hash value: 1148008570

---------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |   109K|  1173K|   104   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|                      |   109K|  1173K|   104   (1)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | INVOICES_PARTITIONED |   109K|  1173K|   104   (1)| 00:00:01 |     4 |     4 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INVOICE_DATE"=TO_DATE(' 2023-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The execution plans says it’s going to visit partition number 4 (pstart/pstop) – which we know definitely cannot be holding any relevant data. If this were an ordinary range-partitioned table – as opposed to interval partitioned – it would be the correct partition for 1st Sept 2024, of course, but it isn’t, so it feels like the pstart/pstop ought to say something like “non-existent” and all the numeric estimates should be zero.

A quick trick for making an interval partition appear without inserting data into it is to issue a “lock table … partition for () …” statement (See footnote to this blog note). I did wonder if the attempt to explain a plan that needed a non-existent partition had actually had the same effect of making Oracle create the partition, so I ran the query against user_tab_partitions again just to check that this hadn’t happend.

So what’s going to happen at run-time: is this an example of “explain plan” telling us a story that’s not going to match what shows up in v$sql_plan (dbms_xplan.display_cursor). Let’s run the query (with rowsource execution stats enabled) and find out:

set serveroutput off
alter session set statistics_level = all;
alter session set "_rowsource_execution_statistics"=true;

select  *
  from  invoices_partitioned
  where invoice_date = date '2023-09-01';


select  *
  from  table(dbms_xplan.display_cursor(format=>'allstats last partition'));


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID  d42kw12htubhn, child number 0
-------------------------------------
select  *   from  invoices_partitioned   where invoice_date = date
'2023-09-01'

Plan hash value: 1148008570

---------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |      1 |        |       |       |      0 |00:00:00.01 |
|   1 |  PARTITION RANGE SINGLE|                      |      1 |    109K|     4 |     4 |      0 |00:00:00.01 |
|*  2 |   TABLE ACCESS FULL    | INVOICES_PARTITIONED |      0 |    109K|     4 |     4 |      0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INVOICE_DATE"=TO_DATE(' 2023-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

It’s the same plan with the same “wrong” partition identified, and the same estimate for rows returned – but the access never actually happened: Starts = 0 on the table access full.

My hypothesis about this misleading reporting is that Oracle knows from the table definition everything about every partition that might eventually exist – the high_value for the “anchor” partition is known and the interval is known so the appropriate partition number for any partition key value can be derived [ed: see comment #4 for details]. Then, at some point, a disconnect appears between the theoretical partition position and the set of physically instantiated partitions, so the optimizer gets the message “theoretically it’s in the 4th partition” and collects the stats from “partition_position = 4” to do the arithmetic and produce the plan. [ed: a later thought – the 109K estimate in this particularly simple example is based on {number of rows in table}/ {number of distinct values in table}]

Fortunately there’s some metadata somewhere that means the run-time engine doesn’t try to access the wrong partition, so this little glitch doesn’t really matter for this simple query – beyond its ability to cause a little confusion.

It’s possible, though, that this type of behaviour leaves the optimizer with another reason for getting the arithmetic wrong and picking the wrong path if you have a number of “missing” partitions in an interval partitioned table that you’re querying with a range-based predicate that crosses several (non-existent) partitions. So treat this as a warning/clue if you recognise that pattern in any of your partitioned table.

Comments (8)

December 21, 2023

Descending Bug

Filed under: Bugs,descending indexes,Execution plans,Indexing,Oracle — Jonathan Lewis @ 3:12 pm GMT Dec 21,2023

This is another example of defects in the code to handle descending columns in indexes, with the added feature that the problem is addressed somewhere between 19.4 and 19.10 (it’s present in 19.3, gone in 19.11) – which means that if you upgrade to a recent RU of from some of earlier versions some of your production code may return rows in a different order. On the plus side, it will be the correct order rather than the previously incorrect order. It’s likely that if your code was exhibiting this bug you would have noticed it before the code got to production, so this note is more a confirmation than a realistic warning of a future threat.

The bug originally showed up in a thread on the Oracle developer forum more than a year ago but I was prompted to finish and publish this note after seeing an article on deadlocks by Frank Pachot where the behaviour of his demonstration code could vary with version of Oracle because of this bug.

Here’s some code to create a demonstration data set:

rem
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0       Right order
rem             19.3.0.0        Wrong order
rem             12.2.0.1        Wrong order
rem
rem     Notes
rem     From 12.1.0.2 to ca. 19.3(++?) the optimizer loses a "sort order by" 
rem     operation when a "descending" index meets an in-list iterator.
rem     

create table t1 
as
with generator as (
        select  rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum                                  id,
        substr(dbms_random.string('U',6),1,6)   v1,
        rpad('x',100,'x')                       padding
from
        generator
/

alter table t1 modify v1 not null;

update t1 set v1 = 'BRAVO'      where id = 5000;
update t1 set v1 = 'MIKE'       where id = 1000;
update t1 set v1 = 'YANKEE'     where id = 9000;

create index t1_i1 on t1(v1 desc);

I’ve created a table with a column generated as short random strings, then set three rows scattered through that table to specific values, and created an index on that column – but the index is defined with the column descending.

(Reminder: if all the columns in an index are declared as descending that all you’ve done is waste space and introduce an opportunity for the optimizer to go wrong – descending columns in indexes only add value if the index uses a combination of ascending and descending columns).

Here’s a simple query – with the results when executed from SQL*Plus in 12.2.0.1. Note, particularly, the order by clause, the order of the results, and the body of the execution plan:

set serveroutput off

select  v1, id
from    t1
where   v1 in (
                'MIKE',
                'YANKEE',
                'BRAVO'
        ) 
order by 
        v1
;

select * from table(dbms_xplan.display_cursor(format=>'outline'));


V1                               ID
------------------------ ----------
YANKEE                         9000
MIKE                           1000
BRAVO                          5000

3 rows selected.


SQL_ID  6mpvantc0m4ap, child number 0
-------------------------------------
select v1, id from t1 where v1 in (   'MIKE',   'YANKEE',   'BRAVO'  )
order by  v1

Plan hash value: 4226741654

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |       |       |    22 (100)|          |
|   1 |  INLIST ITERATOR              |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1    |     3 |    33 |    22   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN DESCENDING| T1_I1 |    40 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

The most important point, of course, is that the result set is in the wrong order. It’s interesting to note that there is no “sort order by” operation and that the index range scan is described as “descending”. A brief pause for thought suggests that if you do a descending range scan of a “descending index” then the results ought to come out in ascending order which might explain why the optimizer thought it could eliminate the sort operation. However that thought isn’t necessarily valid since the “inlist iterator” means Oracle should be executing “column = constant” once for each value in the list, which would make the ascending/descending nature of the index fairly irrelevant (for this “single-column” example).

When I created the same data set and ran the same query on 19.11.0.0 I got exactly the same execution plan, including matching Predicate Information and Outline Data (apart from the db_version and optimizer_features_enable values, of course), but the result set was in the right order. (It was still wrong in a test against 19.3, so the fix must have appeared somewhere in the 19.4 to 19.11 range.)

Workaround

In this example one of the ways to work around the problem (in 12.2) was to add the index() hint (which is equivalent to the index_rs_asc() hint) to the query, resulting in the following plan (again identical in 12c and 19c):

SQL_ID  6x3ajwf41x91x, child number 0
-------------------------------------
select  /*+ index(t1 t1_i1) */  v1, id from t1 where v1 in (   'MIKE',
 'YANKEE',   'BRAVO'  ) order by  v1

Plan hash value: 1337030419

-----------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |       |       |    23 (100)|          |
|   1 |  SORT ORDER BY                        |       |     3 |    33 |    23   (5)| 00:00:01 |
|   2 |   INLIST ITERATOR                     |       |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     3 |    33 |    22   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_I1 |    40 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

The “index range scan” operation is no longer “descending”, and we have a “sort order by” operation. You’ll note that, thanks to blocking sort operation the table access is now “batched”.

Best Guess

The way Oracle handles an IN-list is to start by reducing it to a sorted list of distinct items, before iterating through each item in turn. Then, if there is an order by clause that matches the order of the sorted in-list, and Oracle can walk the index in the right order then it can avoid a “sort order by” operation.

I’m guessing that there may be two separate optimizer strategies in the “descending columns” case that have collided and effectively cancelled each other out:

Hypothetical Strategy 1: If there is a “descending index” that can be range scanned for the data the in-list should be sorted in descending order before iterating. (There is a flaw in this suggestion – see below)
Hypothetical strategy 2: Because the query has an order by (ascending) clause the index scan should be in descending order to avoid a sort operation.

The flaw in the first suggestion is that the Predicate Information suggests that it’s not true. This is what you get in every case (though the operation number changes to 4 when the plan includes a “sort order by” operation):

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00004$"=HEXTORAW('BDADBEA9B0FF') OR
              "T1"."SYS_NC00004$"=HEXTORAW('B2B6B4BAFF') OR
              "T1"."SYS_NC00004$"=HEXTORAW('A6BEB1B4BABAFF')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00004$")='BRAVO' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00004$")='MIKE' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00004$")='YANKEE'))

As you can see the values appearing in the access() predicate are the one’s complements of BRAVO, MIKE and YANKEE in that order; in no case was the order reversed, and previous experience says that predicates are used in the order they appear in the Predicate Information.

On the other hand, it’s arguable that the three predicate values should have been reported (in some form) at the inlist iterator operation – so this may be a case where the simplest strategy for presenting the plan doesn’t match the actual activity of the plan.

Post script

If I change the unhinted query to “order by v1 desc” the rows are reported in ascending order in 12.2.0.1, but in the correct descending order in 19.11.

Comments (3)

November 20, 2023

gby_pushdown

Filed under: CBO,Execution plans,Hints,Oracle,Parallel Execution — Jonathan Lewis @ 6:42 pm GMT Nov 20,2023

This is a note that’s been awaiting completion for nearly 10 years. It’s about a feature (or, at least, a hint for the feature) that appeared in 10.2.0.5 to control some of the inter-process messaging that takes place in parallel execution.

It’s a very simple idea that can make a significant difference in CPU usage for large parallel queries – can you aggregate the raw data before distributing it between slave sets (minimising the volume of traffic) or should you simply distribute the data and aggregate late to avoid aggregating twice. The strategy of aggregating early is known as “group by pushdown”.

I’ll start with a script to create a simple data set and a trivial query with a parallel hint:

rem
rem     Script:         gby_pushdown.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem

set linesize 180
set pagesize 60
set trimspool on
set tab off

create table t1 
as 
select 
        * 
from 
        all_objects 
where 
        rownum <= 50000
/


alter session set statistics_level = all;

set serveroutput off

prompt  ==================================================
prompt  Default (for this data) pushdown and hash group by
prompt  ==================================================

set feedback only

select 
        /*+ 
                qb_name(main)
                parallel(t1 2) 
--              no_gby_pushdown(@main)
--              no_use_hash_gby_for_pushdown(@main)
        */ 
        owner, object_type, count(*) ct
from 
        t1 
group by 
        owner, object_type
order by
        owner, object_type
/

set feedback on
select * from table(dbms_xplan.display_cursor(format=>'outline allstats cost hint_report -predicate'));

In the absence of any hints (apart from the qb_name() and parallel() hints), the plan I get from the query is as follows:

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |    77 (100)|     96 |00:00:00.07 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     96 |00:00:00.07 |       5 |      0 | 11264 | 11264 |     1/0/0|
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |    276 |    77   (8)|     96 |00:00:00.01 |       0 |      0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |    276 |    77   (8)|    121 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |    276 |    77   (8)|    121 |00:00:00.04 |    1043 |    991 |  1079K|  1079K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.02 |    1043 |    991 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  50000 |    73   (3)|  50000 |00:00:00.01 |    1043 |    991 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      GBY_PUSHDOWN(@"MAIN")
      USE_HASH_GBY_FOR_PUSHDOWN(@"MAIN")
      END_OUTLINE_DATA
  */

You’ll notice in the Outline Data that Oracle has recorded the use_hash_gby_for_pushdown() hint and the gby_pushdown() hint. I’ll be repeating the query disabling each of these hints in turn – which is why the negative versions of the hints appear as comments in my original query.

If you look at operation 6 of the plan you can see that the optimizer has chosen to use a hash group by to aggregate the selected rows, reducing 50,000 rows to 121 rows. We could query v$pq_tqstat, or run the SQL Monitor report to get more detail about how much work each PX process did, but it’s fairly safe to assume that it was shared reasonably evenly between the two processes.

After aggregating their data the first layer of PX processes distributes the results by range (operation 5, PX Send Range) to the second layer of PX processes, which re-aggregate the much reduced data set. At this point Oracle chooses to aggregate by sorting (Sort Group By) as this will deal with the order by clause at the same time. (Note: you could tell Oracle to split the aggregation and ordering by adding a use_hash_aggregation hint to the query.)

With my data set it’s fairly clear that it’s a good idea to do this “two stage” aggregation because the initial raw data is reduced by a very large factor the first layer of PX processes before they forward the results – and the reduction in inter-process messaging is likely to be a good idea.

There may be cases, of course, where some feature of the data pattern means that two-stage aggregation is a good idea, but aggregating by a sort is more efficient than an aggregation by hashing – a cardinality or clustering estimate might have persuaded the optimizer to make the wrong choice – so let’s see what happens to the plan if we enable the no_use_hash_gby_for_pushdown() hint:

lan hash value: 3954212205

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |    77 (100)|     96 |00:00:00.14 |       5 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     96 |00:00:00.14 |       5 | 11264 | 11264 |     1/0/0|
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |    276 |    77   (8)|     96 |00:00:00.03 |       0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |    276 |    77   (8)|    120 |00:00:00.03 |       0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       SORT GROUP BY      |          |      2 |    276 |    77   (8)|    120 |00:00:00.13 |    1043 |  9216 |  9216 |     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.07 |    1043 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  50000 |    73   (3)|  50000 |00:00:00.02 |    1043 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

It doesn’t really make any difference in this very small test case, though the sorting does seem to have take a fraction of a second more CPU. The key change is that operation 6 has become a Sort Group By.

There is an interesting detail to look out for, though, in the Outline Data and Hint Report:

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      GBY_PUSHDOWN(@"MAIN")
      END_OUTLINE_DATA
  */

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   1 -  MAIN
           -  qb_name(main)

   8 -  MAIN / T1@MAIN
           -  parallel(t1 2)

I added the hint /*+ no_use_hash_gby_for_pushdown(@main) */ to the query, and the hint has clearly worked; but it’s not reported in the Hint Report, and it doesn’t appear in the Outline Data. This suggests that if you created an SQL Baseline for this query with this hint in place the plan would not reproduce because SQL Baseline would be missing the critical hint. (To be tested – left as an exercise to readers.)

The next step is to enable the no_gby_pushdown() hint. For the purposes of the note this is the important one. It’s also the one that you are more likely to use as it’s fairly obvious (if you know the data) when it’s a good idea to use it. (In some cases, of course, the optimizer might have avoided “group by pushdown” when it should have used it – in which case you’d be using the gby_pushdown() hint.) Here’s the plan when I block “group by pushdown”.

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |    77 (100)|     96 |00:00:00.08 |       5 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |     96 |00:00:00.08 |       5 |  6144 |  6144 |     1/0/0|
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT GROUP BY        |          |      2 |    276 |    77   (8)|     96 |00:00:00.11 |       0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE          |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.07 |       0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |  50000 |    73   (3)|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      4 |  50000 |    73   (3)|  50091 |00:00:00.02 |    1051 |       |       |          |
|*  7 |        TABLE ACCESS FULL| T1       |     28 |  50000 |    73   (3)|  50091 |00:00:00.01 |    1051 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      END_OUTLINE_DATA
  */

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3
---------------------------------------------------------------------------
   1 -  MAIN
           -  no_gby_pushdown(@main)
           -  qb_name(main)

   7 -  MAIN / T1@MAIN
           -  parallel(t1 2)

Key details to highlight here are:

There’s only one aggregation step, appearing at operation 3 after the PX SEND/RECEIVE
50,000 rows are passed from slave set 1 (operation 6) to slave set 2 (operation 4).
The no_gby_pushdown(@main) does appear in the Hint Report
The no_gby_pushdown(@main) doesn’t appear in the Outline Data (which is, again, a little worrying).

Again, with this small data set, the change in plan isn’t going to make much difference to the end user, but you may find cases where there is a best choice to keep the user sufficiently happy and save machine resources.

Summary

When using parallel query the optimizer may choose to aggregate the data in two steps so that the current rowsource is reduced by one set of PX processes before it is passed to the next set of PX processes that does a second aggregation step. This tends to be a good idea if the first set of processes can reduce the size of the data set by a large factor and save a lot of machine resources on the subsequence inter-process communication; but it is a bad idea if the data doesn’t reduce and a large volume of data ends up being aggregated in both sets of PX processes.

You can over-ride the optimizer’s choice with the [no_]gby_pushdown() hint.

Comments (4)

November 6, 2023

Swap_Join_Inputs

Filed under: Column groups,Hints,Joins,Oracle,Statistics — Jonathan Lewis @ 2:47 pm GMT Nov 6,2023

This is just a short note (I hope) prompted by a conversation on the Oracle-L list server. A query from a 3rd party application was doing a very large full tablescan with hash join when it should have been doing a high precision index driven nested loop join, and the poor choice of plan was due an optimizer defect when handling column groups (fixed in 23c) when one or more of the columns involved is always, or nearly always, null.

As a follow-up the owner of the problem asked what hints should go into an SQL Patch to make the optimizer use the nested loop. There’s a simple guideline that usually works for this type of “wrong join” problem: report the “Outline Data” from the current execution plan; find the relevant join hint(s) (in this case a use_hash() hint and a full() hint), change those join hint(s) (e.g. use_nl(), index()), and write the entire edited outline data into an SQL Patch watching out for a recently fixed defect in the SQL patch code.

There are, however, various refinements that add complexity to this strategy, as you can appreciate from a note I wrote some years ago about how to hint a hash join properly. This note is an example of handling one such refinement.

The query was a little complex, and the optimizer had unnested a subquery that consisted of a union all of 4 branches, and one of those branches had contributed a very large cardinality estimate to the total for the view, so the optimizer had chosen a hash join between the unnested subquery and a large table. Stripped to a bare minimum that part of the plan looked like this:

HASH JOIN
      VIEW                   VW_NSO_1
      TABLE ACCESS FULL      PO_LINE_LOCATIONS_ALL

A quick scan of the Outline Data found the join hint (converted to lower case, with quotes removed): use_hash(@sel$ac90cd92 vw_nso_1@sel$ac90cd92), and an over-hasty response might be to convert the use_hash to a use_nl and leave it at that – except for three possible warnings:

we wanted to see a nested loop into po_line_locations_all, so the suggested use_nl() hint would be looping into the wrong “table”
the plan shows the view vw_nso_1 as the build table, while the reported hint is about vw_nso_1 being the second table in the join order
there are further references to vw_nso_1 (and to po_line_locations_all) in the outline data

Here’s the complete set of original hints that might have been relevant to this particular part of the plan:

full(@sel$ac90cd92 po_line_locations_all@sel$2)
no_access(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
leading(@sel$ac90cd92 po_line_locations_all@sel$2 vw_nso_1@sel$ac90cd92)
use_hash(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
swap_join_inputs(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)

index_rs_asc(@sel$b584fdd1 po_line_locations_all@sel$16 (......))
batch_table_access_by_rowid(@sel$b584fdd1 po_line_locations_all@sel$16)

index_rs_asc(@sel$5ed1c707 po_line_locations_all@sel$13 (......))
batch_table_access_by_rowid(@sel$5ed1c707 po_line_locations_all@sel$13)

index_rs_asc(@sel$2f35337b po_line_locations_all@sel$10 (......))
batch_table_access_by_rowid(@sel$2f35337b po_line_locations_all@sel$10)

index_rs_asc(@sel$1b7d9ae9 po_line_locations_all@sel$7 (......))

This is where knowing about the “Query Block / Object Alias” section the execution plans is important – I’ve split the list into several parts based on the query block (@sel$xxxxxxxxxx) they are aimed at, and it’s only the first 5 we need to worry about.

Conveniently this part of the plan is a self-contained query block (@sel$ac90cd92) and we can see why we have an apparent contradiction between vw_nso_1 being the second table in the join order while being the build table: it’s second because of the leading() hint which dictates the join order, but it becomes the build table, hence appearing to be the first table in the join order, because of the swap_join_inputs() hint.

What we want is a join order where vw_nso_1 really is the first table in the join order, followed by a nested loop join into po_line_locations_all, using an index (not the full tablescan that the current hints dictate). It would probably be a good idea to get rid of the redundant no_swap_join_inputs() hints at the same time because that hint applies only to hash joins. So I think we need to replace the 5 hints above with the following 4 hints:

no_access(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
leading(@sel$ac90cd92 vw_nso_1@sel$ac90cd92 po_line_locations_all@sel$2)
use_nl(@sel$ac90cd92 po_line_locations_all@sel$2)
index(@sel$ac90cd92 po_line_locations_all@sel$2(line_location_id))

The index hint references column line_location_id because that’s the access predicate used in original hash join and I’m assuming that there is an index that starts with that column. It’s always a bit hit and miss with hinting and it might have been sufficient (as a first attempt) to use the index hint without trying to reference a specific index, and there might be good reasons for adding more columns to the list, or simple naming the index rather than describing it.

It’s quite likely that if this change in the hints is sufficient the resulting Outline Data would look a little different anyway; in particular the index() hint that I’ve suggested might get expanded to index_rs_asc(), and there might be a batch_table_access_by_rowid() added. Basically you do test runs until you get the result you want and then use the resulting Outline Data for the patch (although, occasionally, you still find that the Outline Data doesn’t reproduce the plan that it came from).

Frivolous Footnote

There were 75 hints totalling 3,191 bytes in the original Outline Data. If the text gets too long and messy for you to cope with when you create the patch you can probably remove all the double quotes, all the table names from the fully qualified column names in indexes, all the outline() and outline_leaf() hints, all the opt_param() hints that reflect system level parameter changes and the begin/end_outline_data hints and the ignore_optim_embedded_hints hint. You could also change long index descriptions to index_names and, if you’re like me, change it all to lower case anyway because I hate reading in capitals – and if you do change it all to lower case you have to remove the quotes. When I did all this to the original outline data the result was 1,350 bytes for 30 hints.

Comments (3)

November 1, 2023

Push Group by

Filed under: Execution plans,Hints,Oracle,Transformations — Jonathan Lewis @ 10:40 am GMT Nov 1,2023

Jump to summary.

A new optimizer feature that appears in 23c (probably not 21c) was the ability to push group by clauses into union all set operations. This will happen unhinted, but can be hinted with the highly memorable [no_]push_gby_into_union_all() hint that appeared in 23.1.0.0 according to v$sql_hint. and the feature can be disabled by setting the (equally memorable) hidden parameter _optimizer_push_gby_into_union_all to false.

From a couple of simple experiments it looks as if the hint should be used to identify query blocks where you want an aggregation (group by) that appears “outside” a union all (inline) view to happen “inside” the view. Here’s a trivial demonstration that I’ve run on 23.3

rem
rem     Script:         push_gby_ua.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem
rem     Last tested 
rem             23.3.0.0
rem

create table t1 
as 
select  *
from    all_Objects 
where   rownum <= 50000
;

set linesize 156
set pagesize 60
set trimspool on
set tab off
set serveroutput off

alter session set statistics_level = all;

select
        /*+
                -- qb_name(main)
                push_gby_into_union_all(@sel$2)
                no_push_gby_into_union_all(@sel$3)
        */
        owner, count(*)
from    (
        select /*  qb_name(u1) */ owner from t1 where owner = 'SYS'
        union all
        select /*  qb_name(u2) */ owner from t1 where owner = 'PUBLIC'
)       
group by owner
/

select * from table(dbms_xplan.display_cursor(format=>'allstats last cost outline alias hint_report qbregistry qbregistry_graph '));

All I’ve done here is create a table that copies 50,000 rows from the view all_objects, then executed a query that reports the number of objects for owners SYS and PUBLIC by selecting the two sets of objects separately and aggregating a union all of those sets.

For maximum visibility I’ve shown the positive and negative versions of the hint – the aggregation doesn’t have to apply to all the branches of the view and it’s not unknown for the optimizer to make the wrong choices if it hasn’t managed to produce a good cost estimate.

I had to know that the two subqueries inside the view would be named sel$2 and sel$3 by default, and I’ve used these names in the hints. In general it can be quite hard to work out the internal names of query blocks and you’ll note that I had used the qb_name() hint for the two subqueries (and the main query block) but then turned those hints into comments because the qbregistry_graph format I’d used in my call to dbms_xplan can’t cope with user-supplied query block names (yet). [Correction: I must have done something wrong when I was using the qb_name hints. I’ve re-run the tests and the graph comes out correctly]

Here’s the execution plan (with some of the bits removed) that I got from 23.3 for this test:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |      1 |        |   295 (100)|      2 |00:00:00.03 |    2232 |   1114 |
|   1 |  HASH GROUP BY          |      |      1 |     15 |   295   (4)|      2 |00:00:00.03 |    2232 |   1114 |
|   2 |   VIEW                  |      |      1 |   3334 |   293   (3)|   9288 |00:00:00.03 |    2232 |   1114 |
|   3 |    UNION-ALL            |      |      1 |   3334 |   293   (3)|   9288 |00:00:00.03 |    2232 |   1114 |
|   4 |     SORT GROUP BY NOSORT|      |      1 |      1 |   147   (3)|      1 |00:00:00.02 |    1116 |   1114 |
|*  5 |      TABLE ACCESS FULL  | T1   |      1 |   3333 |   147   (3)|  39724 |00:00:00.01 |    1116 |   1114 |
|*  6 |     TABLE ACCESS FULL   | T1   |      1 |   3333 |   147   (3)|   9287 |00:00:00.01 |    1116 |      0 |
----------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('23.1.0')
      DB_VERSION('23.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$FC1F66D1")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SET$22FBD6DA")
      PUSH_GBY_INTO_UNION_ALL(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SET$1")
      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1" GROUP_BY)
      FULL(@"SEL$3" "T1"@"SEL$3")
      FULL(@"SEL$FC1F66D1" "T1"@"SEL$2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("OWNER"='SYS')
   6 - filter("OWNER"='PUBLIC')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   4 -  SEL$FC1F66D1
           -  push_gby_into_union_all(@sel$2)

   6 -  SEL$3
           -  no_push_gby_into_union_all(@sel$3)

It’s interesting to note that the Hint Report tells us that both my hints were valid (and used); but the Ouline Data echoes only one of them (the “positive” push_gby_into_union_all). Because I’ve used the same table twice it’s not instantly clear that the optimizer has pushed the subquery that I had specified but if you check the Predicate Information you can confirm that the SYS data has been aggregated inside the union all and the PUBLIC data has been passed up to the union all operator without aggregation. (In the absence of the hints both data sets would have been aggregated early.)

Here, in comparison, is the plan (slightly reduced, and with the qbregistry options removed) that I got from 19.11.0.0

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |   265 (100)|      2 |00:00:00.09 |    1986 |    991 |       |       |          |
|   1 |  HASH GROUP BY       |      |      1 |     13 |   265   (4)|      2 |00:00:00.09 |    1986 |    991 |  1422K|  1422K|  653K (0)|
|   2 |   VIEW               |      |      1 |   7692 |   263   (3)|  48446 |00:00:00.07 |    1986 |    991 |       |       |          |
|   3 |    UNION-ALL         |      |      1 |        |            |  48446 |00:00:00.06 |    1986 |    991 |       |       |          |
|*  4 |     TABLE ACCESS FULL| T1   |      1 |   3846 |   131   (3)|  42034 |00:00:00.02 |     993 |    991 |       |       |          |
|*  5 |     TABLE ACCESS FULL| T1   |      1 |   3846 |   131   (3)|   6412 |00:00:00.01 |     993 |      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("OWNER"='SYS')
   5 - filter("OWNER"='PUBLIC')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         E -  push_gby_into_union_all

As you can see, 19.11 treats the hint as an error and both subqueries against t1 pass their rows up to the union all without aggregation. The 19.11 plan also gives you some idea of why it can be worth pushing the group by: 23.3 doesn’t report any memory used for either of the aggregation operations that take place while the postponed (or, rather, unpushed) aggregation in 19.11 reports 1.4M of memory used. As a general principle we might expect several small aggregations have a lower peak of memory usage than one large aggregation. There’s also a CPU benefit when Oracle doesn’t have to push lots of rows up through a couple of operations.

In fact the absence of memory-related columns in the 23.3 plan is a little suspect and I may have to examine it further. It may simply be the case that the size of the “small allocation” that doesn’t get reported in earlier versions has been increased to (best guess) 1MB; it may be that dbms_xplan in 23c has got a little bug that omits that part of the report.

Summary

Oracle 23c has a new transformation that will probably help to reduce memory and CPU consumption when it comes into play. Queries that aggregate over union all views may change plans to push the aggregation into some or all of the separate subqueries inside the union.

The feature is cost-based but you can over-ride the optimizer’s choice of which subqueries should be aggregated early with the hint [no_]push_gby_into_union_all(@qbname). The feature can also be disabled completely by setting the hidden parameter _optimizer_push_gby_into_union_all to false.

Addendum

It occurred to me that the optimizer will transform an IN-list to a list of equalities with OR, and it’s also capable of using OR-expansion then there might be cases where an aggregate based on an IN-list could go through the two steps and then benefit from this new feature, for example:

select
        sts, count(*) ct
from    t1
where   sts in ('B','C')
group by
        sts
/

-------------------------------------------------------------------------------
| Id  | Operation	     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |	      |       |       |     3 (100)|	      |
|   1 |  SORT GROUP BY NOSORT|	      |     2 |     4 |     3	(0)| 00:00:01 |
|   2 |   INLIST ITERATOR    |	      |       |       | 	   |	      |
|*  3 |    INDEX RANGE SCAN  | T1_I1A |   100 |   200 |     3	(0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("STS"='B' OR "STS"='C'))

Alas, no. Although we can see the rewrite of the IN-list the optimizer doesn’t then use OR-expansion. And when I added the hint /*+ or_expand */ to try to push Oracle into the right direction the Hint Report told me:

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -	SEL$1
	 U -  or_expand / No valid predicate for OR expansion

Maybe in the next release.

Comments (3)

October 27, 2023

SQL_trace 23

Filed under: Execution plans,Oracle,Problem Solving,trace files,Troubleshooting — Jonathan Lewis @ 12:46 pm BST Oct 27,2023

That’s not the 23rd article I’ve written on sql_trace, it’s just that there’s a little “catch-up” detail about sql_trace that I recently discovered in 23.2 (though it might also be working in 21c or even in recent versions of 19c).

Although level 4 and 8 (binds and waits) of the SQL trace event are commonly known, and the more modern style of calls to sql_trace with the added plan_stat options are fairly well known it’s only in 23c (possibly 21c) that all the plan_stat options get implemented in dbms_session and dbms_monitor. From some time back in the 11g timeline we’ve been able to execute calls like:

alter system set events '
        sql_trace[SQL:9tz4qu4rj9rdp]
        bind=false,
        wait=true,
        plan_stat=all_executions
'
;

Taking advantage of the “oradebug doc” command (which can only be run by a highly privileged user), you could discover the available options for the plan_stat:

SQL> oradebug doc event name sql_trace

sql_trace: event for sql trace

Usage
-------
sql_trace
   wait       < false | true >,
   bind       < false | true >,
   plan_stat  < never | first_execution | all_executions | adaptive >,
   level      <ub4>

The default value for plan_stat is first_execution which is usually adequate, but if you were trying to find out why a particular query sometimes runs very slowly compared to usual you might want to set it to all_executions. If the query executes extremely frequently, though, and produces a fairly length execution plan you might decide to set the value to adaptive (see end notes) which ought to limit the frequency with which the execution plan is dumped into the trace file.

Note: the threat of very long plans for very quick executions of an extremely popular statement comes from two directions – the extra “execution” time to get the plan written to the trace file, and the total size of all the files dumped to the trace directory. Depending on your requirements you might be able to use the “UTS trace” setup to limit the volume kept on disc.

If you check the packages dbms_session and dbms_monitor you will find the following procedures:

QL> desc dbms_session
...
PROCEDURE SESSION_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT
...

SQL> desc dbms_monitor
...
PROCEDURE CLIENT_ID_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 CLIENT_ID                      VARCHAR2                IN
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE DATABASE_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 INSTANCE_NAME                  VARCHAR2                IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE SERV_MOD_ACT_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SERVICE_NAME                   VARCHAR2                IN
 MODULE_NAME                    VARCHAR2                IN     DEFAULT
 ACTION_NAME                    VARCHAR2                IN     DEFAULT
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 INSTANCE_NAME                  VARCHAR2                IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE SESSION_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSION_ID                     BINARY_INTEGER          IN     DEFAULT
 SERIAL_NUM                     BINARY_INTEGER          IN     DEFAULT
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT
...

Chris Antognini wrote an article some years ago listing the “10046 trace” levels and the relationships between the available numeric levels and the named parameter format (adaptive = 64). In the note he pointed out that adaptive was not available as a plan_stat value in dbms_session and dbms_monitor, and that Enhancement Request #14499199 had been raised to correct this omission. Since then I’ve run a little script from time to time to check if the ER has been fulfilled:

olumn  sid     new_value my_sid
column  serial# new_value my_serial

select  sid, serial#
from    v$session       ses
where   ses.sid = (
                select  ms.sid
                from    v$mystat ms
                where   rownum = 1
        )
;

execute dbms_monitor.session_trace_enable( &my_sid, &my_serial, waits=>false, binds=>false, plan_stat=>'adaptive')

execute dbms_session.session_trace_enable(waits=>false, binds=>false, plan_stat=>'adaptive');

execute dbms_monitor.session_trace_disable( &my_sid, &my_serial)
execute dbms_session.session_trace_disable;

This depends on the user having suitable privileges on the packages and on a couple of dynamic performance views, and all that happens is that it succeeds (or fails) to enable tracing. Here’s the output from 19.11.0.0 of the two calls to enable:

BEGIN dbms_monitor.session_trace_enable(         40,      56799, waits=>false, binds=>false, plan_stat=>'adaptive'); END;

*
ERROR at line 1:
ORA-30449: syntax error in parameter
ORA-06512: at "SYS.DBMS_MONITOR", line 123
ORA-06512: at line 1


BEGIN dbms_session.session_trace_enable(waits=>false, binds=>false, plan_stat=>'adaptive'); END;

*
ERROR at line 1:
ORA-30449: syntax error in parameter
ORA-06512: at "SYS.DBMS_SESSION", line 343
ORA-06512: at line 1

Then on the upgrade to 23.2.0.0, the boring but satisfying:

PL/SQL procedure successfully completed.

PL/SQL procedure successfully completed.

Readers with versions in the gap between 19.11.0.0 and 23.2.0.0 are invited to run a check and report the results in the comments to narrow down when the enhancement became available. [Update: see comments #1 and #2 – versions up to 19.21 show the same error; and comment #5 shows the same error in 21.3]

plan_stat=adaptive

There are a couple of notes on MOS about the meaning/effect of this setting. They’re not 100% consistent with each other.

Doc ID 21154.1 says:

This dumps the STAT information if a SQL took more than about 1 minute thereby giving information for the more expensive SQLs and for different executions of such SQLs.

Doc ID 8328200 (relating to Bug 8328200) says:

“adaptive” mode dumps STAT lines in SQL trace for every minute of dbtime per shared cursor.

As you can see you can read the two statements as trying to say the same thing, but there is some ambiguity in both statements, and some need for clarification of terms. So I’ve done a couple of simple experiments – basically running a PL/SQL loop that executes the same statement 120 times, where each execution takes a little less 3 seconds to complete.

Cutting the top and tail from each trace file left me with 120 “FETCH” lines; a few of these lines were immediately followed by STAT lines, most were followed by CLOSE lines with no STAT lines reported for the execution. Here are the first few lines of the results from a trace file generated by 23.2 from a call to: “grep -A+1 FETCH {tracefile}”:

FETCH #140175454862744:c=2737978,e=2814244,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36476287935
STAT #140175454862744 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=277822 pr=0 pw=0 str=1 time=2814245 dr=0 dw=0 us)'
--
FETCH #140175454862744:c=2758633,e=2885235,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36479173822
CLOSE #140175454862744:c=2,e=1,dep=1,type=3,tim=36479173972
--

Piping this into “grep -B+1 STAT” extracted just the FETCH/STAT pairs, which I could then pass through “grep FETCH” to get the output that I wanted to see: the “tim=” values for only those FETCH calls that were followed by STAT lines:

grep -A+1 FETCH {tracefile}  | grep -B+1 STAT  |  grep FETCH
 
FETCH #140175454862744:c=2737978,e=2814244,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36476287935
FETCH #140175454862744:c=2716296,e=2782323,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36541290925
FETCH #140175454862744:c=2804165,e=2916326,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36610338064
FETCH #140175454862744:c=2677000,e=2737197,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36676533249
FETCH #140175454862744:c=2722436,e=2796966,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36744048403
FETCH #140175454862744:c=2725066,e=2792661,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36811536449

If you do the arithmetic with the tim= values you’ll find that the time between the executions that reported their execution plan in STAT lines varies from about 65 seconds to 69 seconds. The results from 19.11 were similar, varying from 64.9 seconds to 67.9 seconds. So the dumping STAT lines is dependent on spending (approximately) one minute of dbtime in execution (as both MOS Documents indicate) but it’s once every minute, and it’s time that can be accumulated over many executions of the statement. Part of the deviation from one minute comes, of course, from the fact the STAT lines are dumped only as the cursor closes.

The other detail that needs investigation is the interpretation of the clause “per shared cursor”. If I ran my test from two separate sessions concurrently would I be accumulating “db minutes” twice as fast and dumping STAT lines twice as frequently? The answer was yes – providing the two sessions acquired the same child cursor in v$sql.

To demonstrate this, I ran two concurrent copies of the PL/SQL loop, starting one session about 65 seconds after the other. My first trace file showed its second set of STAT lines after about 63 seconds then roughly every 33 seconds, and the second trace file (starting about one minute late) showed its second set of STAT lines after about 33 seconds, repeating that interval between dumps until nearly the end of its run when the last gap (after the first session had finished) stretched to 50 seconds.

Is it good enough?

Maybe, maybe not. It gives you a sample of execution plans for queries that accumulate “a lot” of time while using a particular child cursor, and that might be sufficient for some purposes.

If you have a frequently executed query that usually takes a couple of seconds but occasionally takes 90 seconds then that 90 second execution will show up in the trace file, either because it’s a different plan, and gets dumped on the first execution, or because it will have increased the shared execution time by more than 60 seconds.

If you have a query that executes extremely frequently and takes a couple of hundredths of a second each time but occasionally takes 5 seconds the adaptive option may not help. As with the 90 second/change of plan case you’ll see the plan; but all it does if the plan doesn’t change is improve your chances of seeing the stats of that slow execution plan – it has to take place in that little window where its execution time just tips the running total over the next 60 second limit.

Comments (7)

October 18, 2023

Remove Subquery

Filed under: CBO,Execution plans,Oracle,subqueries,Transformations — Jonathan Lewis @ 2:33 pm BST Oct 18,2023

This is a note that echoes a feature (dating back at least as far as 10.2.0.4) that I’ve described in a previous post on Simpletalk. I’m raising it here for three reasons

first it says something about optimising SQL by rewriting it
secondly it advertises the hugely under-used feature of analytic functions
thirdly I’ve not seen anyone in the various forums asking about a particular pattern of surprising execution plans that they couldn’t understand

This last reason makes me wonder whether there are lots of queries in the wild that need a small “cosmetic” change to allow the optimizer to transform them into something completely different.

My example is based on the SH demo schema – the query I’ll be demonstrating came (I think) from a presentation given by Jože Senegačnik about 15 years and 7 major versions ago (10gR2) – so I’ll start with a bit of text to recreate a couple of tables from that schema as it was a few years ago.

rem
rem     Script:         remove_aggr_subq.sql
rem     Author:         Joze Senegacnik / Jonathan Lewis
rem     Dated:          June 2008
rem

create table products(
        prod_id                 number(6,0)     not null,
        prod_name               varchar2(50)    not null,
        prod_desc               varchar2(4000)  not null,
        prod_subcategory        varchar2(50)    not null,
        prod_subcategory_id     number          not null,
        prod_subcategory_desc   varchar2(2000)  not null,
        prod_category           varchar2(50)    not null,
        prod_category_id        number          not null,
        prod_category_desc      varchar2(2000)  not null,
        prod_weight_class       number(3,0)     not null,
        prod_unit_of_measure    varchar2(20),
        prod_pack_size          varchar2(30)    not null,
        supplier_id             number(6,0)     not null,
        prod_status             varchar2(20)    not null,
        prod_list_price         number(8,2)     not null,
        prod_min_price          number(8,2)     not null,
        prod_total              varchar2(13)    not null,
        prod_total_id           number          not null,
        prod_src_id             number,
        prod_eff_from           date,
        prod_eff_to             date,
        prod_valid              varchar2(1),
        constraint products_pk primary key (prod_id)
   )
;

create table sales (
        prod_id         number          not null,
        cust_id         number          not null,
        time_id         date            not null,
        channel_id      number          not null,
        promo_id        number          not null,
        quantity_sold   number(10,2)    not null,
        amount_sold     number(10,2)    not null,
        constraint sales_product_fk foreign key (prod_id)
                references products (prod_id)
   ) 
;

It’s a long time since I loaded, or even looked at, the SH schema but I assume the key details that I need will still be the same. All I’ve got is a products table with a declared primary key of prod_id, and a sales table with a prod_id column declared as not null with a foreign key constraint to the products table. Both tables have not null declarations on most columns.

Imagine writing a query to report all sales where the quantity_sold is less than the average quantity_sold for the corresponding product. The “obvious” choice of SQL for this would be something like:

select
        /*+
                qb_name(main)
                dynamic_sampling(prd 0) 
                dynamic_sampling(sal1 0) 
                no_adaptive_plan
        */
        prd.prod_id, prd.prod_name, 
        sal1.time_id, sal1.quantity_sold
from
        products    prd,
        sales       sal1
where
        sal1.prod_id = prd.prod_id
and     sal1.quantity_sold < (
                        select
                                /*+ 
                                        qb_name(subq)
                                        dynamic_sampling(sal2 0)
                                */
                                avg(sal2.quantity_sold)
                        from    sales   sal2
                        where   sal2.prod_id = sal1.prod_id
                )
;

.I’ve used hints to block dynamic sampling and adaptive plans, and I’ve used the qb_name() hint to name the two query blocks. The subquery calculates the average quantity_sold for the correlated prod_id and we’re probably assuming Oracle will execute the subquery for each row in the sales (sal1) table with savings from scalar subquery caching – especially if we were to create a “value-added” foreign key index of (prod_id, quantity_sold)). Here’s the execution plan I got from the query in its current form:

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     4 |   404 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN            |          |     4 |   404 |     7  (15)| 00:00:01 |
|*  2 |   HASH JOIN           |          |     4 |   244 |     5  (20)| 00:00:01 |
|   3 |    VIEW               | VW_SQ_1  |    82 |  2132 |     3  (34)| 00:00:01 |
|   4 |     HASH GROUP BY     |          |    82 |  2132 |     3  (34)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| SALES    |    82 |  2132 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL  | SALES    |    82 |  2870 |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS FULL   | PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   2 - access("ITEM_1"="SAL1"."PROD_ID")
       filter("SAL1"."QUANTITY_SOLD"<"AVG(SAL2.QUANTITY_SOLD)")

The optimizer has decided it would make sense to unnest the subquery, generate an aggregate rowsource of the sales data, then eliminate the unwanted sales rows through a hash join to this aggregate rowsource. In passing, take note particularly of the Predicate Information for operation 2 – its’ a reminder that hash joins apply only for equality predicates so the check against average quantity_sold has to take place as a filter predicate after Oracle has found the correct average by probing the build table.

This unnesting will be appropriate for many cases of subquery usage but we could block it and force Oracle to do something that looked more like our original “for each row” visualisation by adding the hint no_unnest(@subq) to the hints at the top of the query (note how we can address the hint to a specific query block). The effect of this is to produce the following execution plan:

--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     2 |   150 |    86   (0)| 00:00:01 |
|*  1 |  FILTER             |          |       |       |            |          |
|*  2 |   HASH JOIN         |          |    82 |  6150 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| SALES    |    82 |  2870 |     2   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE    |          |     1 |    26 |            |          |
|*  6 |    TABLE ACCESS FULL| SALES    |     1 |    26 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("SAL1"."QUANTITY_SOLD"< (SELECT /*+ NO_UNNEST QB_NAME
              ("SUBQ") */ SUM("SAL2"."QUANTITY_SOLD")/COUNT("SAL2"."QUANTITY_SOLD")
              FROM "SALES" "SAL2" WHERE "SAL2"."PROD_ID"=:B1))
   2 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   6 - filter("SAL2"."PROD_ID"=:B1)

Now we can see that Oracle is using the subquery as a “filter subquery” as we had imagined it. For each row surviving the simple hash join between products and sales Oracle will execute the subquery (unless the relevant information is already in the scalar subquery cache). A little detail that may surprise users who are less familiar with execution plans is the appearance of the bind variable (:B1) in the predicate for operation 6 – this is the optimizer reminding you that the correlating predicate in the subquery uses a value that will be unknown until run-time when it arrives (repeatedly with constantly changing values) from the main query block.

Again, we can understand that this pattern will probably be suitable for some circumstances, but we may want to control where in the plan the subquery is used – for some queries it might be more efficient to execute the subquery before we do the join. We can tell the optimizer to do this by adding the hint push_subq(@subq) after the no_unnest(@subq) hint, in my case producing the following plan:

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     4 |   300 |     6   (0)| 00:00:01 |
|*  1 |  HASH JOIN           |          |     4 |   300 |     4   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | SALES    |     4 |   140 |     2   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |          |     1 |    26 |            |          |
|*  4 |     TABLE ACCESS FULL| SALES    |     1 |    26 |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN
   2 - MAIN / SAL1@MAIN
   3 - SUBQ
   4 - SUBQ / SAL2@SUBQ
   5 - MAIN / PRD@MAIN

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   2 - filter("SAL1"."QUANTITY_SOLD"< (SELECT /*+ PUSH_SUBQ NO_UNNEST
              QB_NAME ("SUBQ") */ SUM("SAL2"."QUANTITY_SOLD")/COUNT("SAL2"."QUANTITY_SO
              LD") FROM "SALES" "SAL2" WHERE "SAL2"."PROD_ID"=:B1))
   4 - filter("SAL2"."PROD_ID"=:B1)

The first thing that stands out in this plan is that the sales table has become the build (first) table in the hash join and the products table has become the probe table. (In fact this is mainly because (a) there’s no data/statistics, (b) I’ve blocked dynamic sampling, and (c) Oracle has used a few of its standard guesses.)

Another point that stands out is the “staggered” position of operation 3. I’ve included the ‘alias’ format option for the execution plan so that you can see that operations 3 and 4 represent the original subq query block with no transformation. Filter predicates using subqueries often produce odd little “twists” in execution plans which would be hard to explain if you followed the basic “first child first” rule and forgot to check whether there were any separate query blocks that needed to be walked in isolation.

“Cosmetic” effects

I said the query I started with was the “obvious” choice. I didn’t offer any justification for “obvious”, but the query shows an almost a perfect translation of the condition “sales quantity greater than the average sales quantity for the matching product”, and correlating on the prod_id from the sales row you’re looking at (i.e. sal1.prod_id) seems a highly intuitive choice.

However – prod_id is a foreign key to the products table, and the main query block includes the demand/predicate “sal1.prod_id = prd.prod_id” so, based on transitive closure, all the not null constraints, and the foreign key constraint, we should be happy to make the following, logically valid, minor edit to the original query (and it’s so minor I’ve highlighted the critical line in case you miss it):

select
        /*+
                qb_name(main)
                dynamic_sampling(prd 0) 
                dynamic_sampling(sal1 0) 
                no_adaptive_plan
        */
        prd.prod_id, prd.prod_name, 
        sal1.time_id, sal1.quantity_sold
from
        products prd, sales sal1
where
        sal1.prod_id = prd.prod_id
and     sal1.quantity_sold < (
                        select
                                /*+ 
                                        qb_name(subq)
                                        dynamic_sampling(sal2 0)
                                */
                                avg(sal2.quantity_sold)
                        from    sales   sal2
                        where   sal2.prod_id = prd.prod_id
                )
;

Here’s the resulting execution plan following this apparently trivial and logically irrelevant change:

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    82 |  6068 |     5  (20)| 00:00:01 |
|*  1 |  VIEW                | VW_WIF_1 |    82 |  6068 |     5  (20)| 00:00:01 |
|   2 |   WINDOW SORT        |          |    82 |  7134 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN         |          |    82 |  7134 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| SALES    |    82 |  3854 |     2   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("VW_COL_5" IS NOT NULL)
   3 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")

The plan been reduced to a simple two table join, lost the correlated subquery, and has signs of an analytic (window) function being used somehow to handle the comparison with the average. This is a feature enabled by the (very old) parameter _remove_aggr_subquery which defaults to true.

It’s not immediately visible what the optimizer is doing – the predicate vw_col_5 is not null clearly relates to the internal view vw_wif_1 (Windows Inline Function?), but we have to look at the appropriate “unparsed query” from the CBO/10053 trace to find out why that predicate gives us the right answer. This, with a lot of cosmetic editing, is the transformed query that Oracle finally optimised:

select 
        /*+ qb_name (main) */
        vw_wif_1.item_1         prod_id,
        vw_wif_1.item_2         prod_name,
        vw_wif_1.item_3         time_id,
        vw_wif_1.item_4         quantity_sold 
from    (
        select 
                prd.prod_id             item_1,
                prd.prod_name           item_2,
                sal1.time_id            item_3,
                sal1.quantity_sold      item_4,
                case 
                        when    sal1.quantity_sold < avg(sal1.quantity_sold) over ( partition by sal1.prod_id) 
                        then    sal1.rowid 
                end                     vw_col_5 
        from
                test_user.sales         sal1,
                test_user.products      prd 
        where 
                sal1.prod_id = prd.prod_id
        )       vw_wif_1 
where 
        vw_wif_1.vw_col_5 is not null
;

It’s interesting, and a little surprising, that the code uses a case clause to generate a rowid which is then tested for null. It seems to add complexity that a person would not introduce if you had asked them to do the rewrite by hand, but maybe it comes from a generic framework that addresses more subtle examples.

I said at the start of the note that I’ve not seen anyone asking about this “surprising” pattern of plan; that wasn’t quite true. I have found one question on the Oracle forums dated 2012 using version 10.2.0.4. It’s been possible for a long time – so possibly the fact that questions are so rare is that the correct choice of correlating predicate is rarely made and the “intuitive” choice doesn’t allow the transformation to take place. (It’s also possible that it appears so rarely because so few systems make proper use of constraints.)

Most significantly, though, it’s been possible to write this type of windowing code by hand since version 8.1.6, but it’s only in the last few years that responses to questions about max(subquery), avg(subquery) etc. have been suggestions to rewrite with analytic functions. Maybe some of the answers should have been “change the correlating column”.

Further Observations

As with all optimisation strategies, it’s possible that Oracle will use this feature when it shouldn’t or fail to use it when it should. Unfortunately there is no hint like /*+ [no_]remove_aggr_subq(@subq) */ though aggregate subquery removal seems to be embedded with subquery unnesting, so the /*+ no_unnest() */ hint might be sufficient to block the feature; I don’t know what you can do, though, to force it to happen if you think it’s legal but isn’t happening – other than rewriting the query by hand, of course.

If the no_unnest() hint doesn’t work you could use the hint: /*+ opt_param(‘_remove_aggr_subquery’,’false’) */ to disable the feature complete for the duration of the query, and you may find that that even works as an SQL Patch.

If you’re interested in CBO trace files, the mnemonic (which doesn’t appear in the legend near the start of the trace) for the feature is “RSW” (maybe “remove subquery windowing”?). There aren’t many explicit strings relating to this mnemonic in the binary, but the following are visible:

RSW: Not valid for subquery removal %s (#%d)
RSW:  Valid for having clause subquery removal %s (#%d)
RSW:  Valid for correlated where clause SQ removal %s (#%d)
RSW:  Valid for uncorrelated where clause SQ removal %s (#%d)

My example falls into the category: “Valid for correlated where clause SQ removal”, so there’s scope for someone else to discover what a query, it’s plan, and its unparsed query look like when a query falls into one of the other two categories.

Summary

For certain patterns of query involving aggregate subqueries as predicates the optimizer has been able to use a feature known as aggregate subquery removal to effect the intent of the subquery through the use of an analytical (window) function.

This feature has been available since (at least) 10.2.0.4 and you may already have several queries where the optimizer would use it if the correlating predicate was appropriate – in particular if the predicate is currently joining to the child end of a foreign key constraint then you need to test the effect of joining it to the parent end.

Comments (2)

October 5, 2023

CTE Upgrade

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 4:17 pm BST Oct 5,2023

The “common table expression” (CTE) also known as “with clause” or “factored subquery” has been the target of an optimizer upgrade in recent versions of Oracle that may cause a problem for some people – including, possibly, a few of my former clients and readers who may have adopted a suggestion I made for working around particular performance problems.

It was a recent video by Connor McDonald that brought the change to my attention so I’m going to use parts of his script to demonstrate the effect. We start with a very simple table, and a requirement to avoid an Oracle error:

rem
rem     Script:         with_clause_pushdown.sql
rem     Author:         Connor McDonld / Jonathan Lewis
rem     Dated:          Oct 2023
rem     Purpose:        
rem
rem     Last tested 
rem             23.2.0.0        Pushdown 
rem             21.8.0.0        Pushdown (Connor)
rem             19.21.0.0       Pushdown (Connor)
rem             19.11.0.0       No pushdown
rem

drop table t purge;

create table t as
select * from dba_objects
/

insert into t ( object_id, owner, object_name)
values (-1,'BAD','BAD');
commit;

prompt  =====================================
prompt  The original query crashes (probably)
prompt  =====================================

select  count(object_name)
from    t
where
        object_id > 0
and     sqrt(object_id) between 1 and 10
/

I have a query that will fail if it tries to take the square root for the row where object_id = -1 (Oracle hasn’t implemented complex numbers, or even imaginary ones). But the query should be okay because I’ve got a predicate that filters out all the rows where the object_id is not positive. Try running the query, though, and you’ll (probably) find that Oracle responds with “ORA-01428: argument ‘-1’ is out of range”.

The optimizer decides the order it wants to apply the predicates so you can’t guarantee that the order you need will be the order used. This used to be a problem that showed up fairly frequently on various Oracle forums at a time when applications were often guilty of storing numeric data in character columns and ran into problems with queries that had predicates like “character_column = numeric_constant” These would sometimes fail because of an implicit to_number() being applied in a row where the column value was not numeric, resulting in the ORA-01722 conversion error. There used to be several ways to bypass this type of problem, none of them particularly desirable but all perfectly adequate as temporary (one hoped) workarounds. Here’s one such workaround:

with pos as (
        select /*+ materialize */ *
        from   t
        where  object_id > 0 
) 
select  count(object_name)
from    pos
where   sqrt(object_id) between 1 and 10
/

We put the “protective” predicate into a “with” subquery and materialize the subquery so that the call to sqrt() is applied to an (internal) global temporary table that holds only the “safe” rows. When Connor demonstrated this method the query crashed with the ORA-01428 (sqrt(-1) problem) that the original query had exhibited. This led Connor to extol the virtues of avoiding dirty tricks to fool the optimizer because an enhancement might appear that made the dirty trick fail, and then he demonstrated an alternative, elegant, restatement of the query that couldn’t be called a dirty trick.

As someone not employed by Oracle Corp. my response was irritation that the optimizer was doing something that (clearly) it shouldn’t, and to wonder whether it was deliberate or a bug that could be worked around. The first step, of course, was to repeat Connor’s test on my default setup of Oracle – which happened to be 19.11 – to find that the optimizer did exactly what I expected and produced the right answer rather than an error. Exchanging email with Connor I learned that he had tested on 19.21 and 21.8 – so something must have changed between 19.11 and 19.21. [Ed: see comment #4 below – reader reports show that the change appeared in 19.21]

I don’t have a 21c VM handy on my laptop but I do have the 23cFREE developer VM (not yet upgraded to 23.3), so I started that up, constructed the model, and started work on the (newly) problematic query. Here are the execution plan from both versions of Oracle, 19.11 first followed by 23.2

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                             |     1 |    79 |   759   (1)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                             |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665C_11934AEE |       |       |            |          |
|*  3 |    TABLE ACCESS FULL                     | T                           | 73563 |  9482K|   398   (1)| 00:00:01 |
|   4 |   SORT AGGREGATE                         |                             |     1 |    79 |            |          |
|*  5 |    VIEW                                  |                             | 73563 |  5675K|   361   (1)| 00:00:01 |
|   6 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D665C_11934AEE | 73563 |  9482K|   361   (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |     1 |    79 |   238   (5)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6681_1D6D63A |       |       |            |          |
|*  3 |    TABLE ACCESS FULL                     | T                          |   211 |  8229 |   236   (5)| 00:00:01 |
|   4 |   SORT AGGREGATE                         |                            |     1 |    79 |            |          |
|*  5 |    VIEW                                  |                            |   211 | 16669 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6681_1D6D63A |   211 |  8229 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

The shape of the plan doesn’t change in the upgrade – but 23c crashed with error ORA-01428 while Oracle 19c produced the ~~correct~~ expected result. There’s a clue about why this happened in the plans, but there’s a terrible flaw in the in the way I reported the execution plans: where’s the predicate information!

Predicate Information (identified by operation id):    -- 19.11 version
---------------------------------------------------
   3 - filter("OBJECT_ID">0)
   5 - filter(SQRT("OBJECT_ID")>=1 AND SQRT("OBJECT_ID")<=10)

Predicate Information (identified by operation id):    -- 23c version
---------------------------------------------------
   3 - filter(SQRT("T"."OBJECT_ID")>=1 AND SQRT("T"."OBJECT_ID")<=10 AND "OBJECT_ID">0)
   5 - filter(SQRT("OBJECT_ID")>=1 AND SQRT("OBJECT_ID")<=10)

If you compare the filter() predicate information for operation 3 you can see that the optimizer in 23c has pushed the predicate from the main query block into the CTE’s query block( and, frankly, I’d call that a design error). The clue in the main body of the plan that this had happened was in the Rows column – 73,563 for 19.11 but only 211 for 23.2: the former looks about right for a copy of dba_objects, the latter is clearly much smaller that you might expect. If you’re wondering why the sqrt() predicate is repeated at operation 5, you’re not alone – I can’t think of a good reason why that’s there.

Workaround

Yes, I am going to tell you how to work around a (nominal) optimizer enhancement. This change may slip through unnoticed in testing and only be discovered, too late, in production, so some people may need a short term hack to deal with it. (Setting optimizer_features_enable to 19.1.0 didn’t make the problem go away on my system.)

Recognising that Oracle was invoking some form of filter predicate “pushdown” I searched the list of parameters, SQL Hints, and fix controls, for possibly contenders. There don’t appear to be any (new) hints related to this behaviour, but there is a parameter _optimizer_wc_filter_pushdown in 23.2 that doesn’t exist in 19.11, and it defaults to true; the description of this parameter is “enable/disable with clause filter predicate pushdown”. Setting this parameter to false (alter session/system or through an opt_param() hint) does what we need.

There are a couple of fix controls that also appear to be related, but I haven’t examined them yet:

30235754  QA control parameter for with clause filter predicate pushdown
29773708  with clause filter predicate pushdown

Again, these are not present in 19.11.

Summary

Somewhere in the 19c and 21c timelines an enhancement to the optimizer allows Oracle to minimise the size of materialized CTEs by pushing filter predicates from the query blocks that use the CTE into the CTE definition. This could result in some existing queries failing (probably with a “conversion error”) because they depended on the CTE applying some “limiting” predicates that stopped illegal values from reaching a later predicate.

If you are caught by this enhancements you can disable the feature by setting the parameter _optimizer_wc_filter_pushdown to false for the session, system, or specific query (you may be able to do the last through an SQL Patch). You ought to seek a strategic fix, however, and Connor’s video shows how you may be able to bypass the materialized CTE trick completely by using a CASE/END expression.

Comments (17)

September 5, 2023

Case Study

Filed under: Bugs,Hints,Joins,Oracle — Jonathan Lewis @ 4:14 pm BST Sep 5,2023

A recent post on the Oracle SQL and PL/SQL forum posted the following query with a complaint that it produced the wrong results:

select  count(*) 
from    all_synonyms
left join 
        all_objects b 
on      (b.owner,object_name)=(select table_owner,table_name from dual)
;

This caused a little confusion to start with since the opening complaint was that the query sometimes produced the wrong results and sometimes produced Oracle error ORA-01799: a column may not be outer-joined to a subquery. The variation in behaviour appears, of course, because the restriction applied to older versions of Oracle but had been lifted by 12c.

The other point of confusion is that there’s no easy way to tell from the output that the result is wrong – especially when the count was in the thousands. This point was addressed by the OP supplying an image of the first rows (and a subset of the columns) for the query when “count(*)” was changed to just “*” (and switched from the “all_” views to the “dba_” views: the output for the columns dba_objects.owner and dba_objects.object_name reported the same pair of values for every row – in the case of the OP this was “SYS” and “DUAL”.

Investigation

You could attack this problem from several directions – you could just edit the query to avoid the problem (the structure is a little curious); you could investigate the 10053 trace file to see what the optimizer is doing every step of the way, or you could try to simplify the model and see if the problem still appears. Since I like to keep things simple (to start with, at least) I created table my_synonyms as a copy of the data in dba_synonyms, and my_objects as a copy of the data in dba_objects and created an index on my_objects(owner, object_name). So my query turned into:

select
        *
from    
        my_synonyms   syn
left join 
        my_objects    obj
on 
        (obj.owner, obj.object_name) = (
                select /*+ qb_name(dual_bug) */ syn.table_owner, syn.table_name from dual
        )
/

You’ll notice that I’ve given both tables a “meaningful” alias and used the aliases for every column, and I’ve also added a query block name (qb_name) to the subquery against dual – because I’d assumed that the subquery probably played a key role in messing the optimizer up (I was wrong) and I wanted to be able to track it easily.

The query produced the wrong results. In my case every row reported obj.object owner and obj.name as “SYS” / “PRINT_TABLE” – selecting 11,615 rows. The basic execution plan – pulled from memory using dbms_xplan.display_cursor() – looked like this:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       | 46548 (100)|          |
|   1 |  MERGE JOIN OUTER                      |                 | 11615 |  6261K| 46548   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL                    | MY_SYNONYMS     | 11615 |   805K|    18   (6)| 00:00:01 |
|   3 |   BUFFER SORT                          |                 |     1 |   481 | 46530   (1)| 00:00:02 |
|   4 |    VIEW                                | VW_LAT_881C048D |     1 |   481 |     4   (0)| 00:00:01 |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| MY_OBJECTS      |     1 |   132 |     4   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | OBJ_I1          |     1 |       |     3   (0)| 00:00:01 |
|   7 |       FAST DUAL                        |                 |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJ"."OWNER"= AND "OBJ"."OBJECT_NAME"=)

The optimizer had transformed the query into a form using a lateral view – and there have been bugs with lateral views in the past, particularly relating to decorrelation – so maybe the problem was there and a few tests with various hints or fix_controls, or optimizer parameter settings to disable certain features might identify the source of the problem; but before doing that I thought I’d just get a small result set to check the results and execution plan in detail by adding a predicate “rownum <= 12”.

After re-executing with rowsource execution stats enabled, and report the plan with Query Block Names and Object Aliases, Outline Data, and Column Projection Information I observed two critical changes:

The query produced results that looked as if they were likely to be correct – I no longer had a repeating, incorrect, object owner and name on every row.
The plan had changed (see following).

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |      1 |        |     12 |00:00:00.01 |      24 |
|*  1 |  COUNT STOPKEY                         |                 |      1 |        |     12 |00:00:00.01 |      24 |
|   2 |   NESTED LOOPS OUTER                   |                 |      1 |     12 |     12 |00:00:00.01 |      24 |
|   3 |    TABLE ACCESS FULL                   | MY_SYNONYMS     |      1 |     12 |      7 |00:00:00.01 |       3 |
|   4 |    VIEW                                | VW_LAT_881C048D |      7 |      1 |     12 |00:00:00.01 |      21 |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| MY_OBJECTS      |      7 |      1 |     12 |00:00:00.01 |      21 |
|*  6 |      INDEX RANGE SCAN                  | OBJ_I1          |      7 |      1 |     12 |00:00:00.01 |      17 |
|   7 |       FAST DUAL                        |                 |      7 |      1 |      7 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------------------------

There’s a “count stopkey” operation to handle the “rownum <= 12” predicate, of course; but most significantly the “merge join outer” operation has changed to a “nested loop outer” operation.

Repeating the test but with “rownum <= 1200” I still got results that looked correct, and still got a “nested loop outer”. A third attempt, with “rownum <= “30000” switched back to wrong results and a “merge join outer” (with a “count stopkey”). So – initial conclusion – it’s the implementation of the merge join that has gone wrong.

Resolution

If the only problem is the choice of join mechanism we can patch the code (by hand or with an sql_patch) to bypass the problem. Given the simple pattern that produces the problem I should be able to find two critical hints in the plan’s Outline Data, one which is a leading() hint that dictates the order (my_synonyms, vw_lat_xxxxxxxx) – though the user will presumably have a different name generated for their lateral view – and a use_merge() or use_nl() hint that dictates the mechanism to use to join from my_synonyms to vw_lat_xxxxxxxx.

In both plans the Outline Data showed me the following (cosmetically adjusted) leading() hint:

leading(@sel$d9e17d64 syn@sel$1 vw_lat_6b6b5ecb@sel$6b6b5ecb)

This was then followed by a use_nl() hint (again cosmetically adjusted) for the correct results:

use_nl(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)

and by a use_merge_cartesian() hint for the erroneous results (so it’s possibly just the “cartesian” bit that’s going wrong.:

use_merge_cartesian(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)

So, as a final test, I edited the query to include the leading() and use_nl() hints, and ran it.

select  /*+
                leading(@sel$d9e17d64 syn@sel$1 vw_lat_6b6b5ecb@sel$6b6b5ecb)
                use_nl(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)
        */
        *
from    
        my_synonyms syn
left join 
        my_objects obj
on 
        (obj.owner, obj.object_name) = (
                select /*+ qb_name(dual_bug) */ syn.table_owner, syn.table_name from dual
        )
/

The problem of the repeating “SYS” / “PRINT_TABLE” disappeared, and the total number of rows selected increased from 11,615 to 12,369 (mostly due to package and package bodies having the same owner and name and so doubling up a number of rows from the dba_synonyms table).

Summary

It’s almost always a good idea to simplify a messy problem.

Always check the execution plan, including Predicate Information, Outline Data and Query Block / Object Alias details.

Although I didn’t use the results in this case, executing with rowsource execution statistics enabled can be very helpful.

This case focused on a detail that looked as if a “newish” feature might be guilty; but the simple model suggested it’s an implementation detail of an old feature (possibly reached because the optimizer has applied a new feature). We can work around it very easily, and we can build a very simple test case to raise with Oracle in an SR.

Comments (5)

June 26, 2023

Case Study

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 11:32 am BST Jun 26,2023

Here’s an example of reading an execution plan that appeared on one of the Oracle forums recently. It’s a simple example, made harder by the initial (costmetic) presentation of the problem. Essentially the owner has said:

Here’s a query that’s taking too long.
Here’s a bit of the query that seems to be the slow bit
Here’s the live execution plan with rowsource execution stats enabled.

The query hadn’t arrived in an easily readable format – possibly because the OP didn’t know how to get a fixed font code display in the forum – and to make things harder the select list included several scalar subqueries and the text used the “commas at the start of line” convention. (Moreover it wasn’t immediately obvious that the first few lines of SQL shown were the first few lines of the problematic scalar subquery and not the start of the entire query.)

The OP had, however, supplied a link to a fixed format text of the execution plan, including the Query Block / Alias information, the Outline information (which showed they were running 19c with optimizer features set to 11.2.0.4), the Predicate information, and the Column Projection information.

I’ve extracted the key details from this output showing just the operations that claim to be responsible for most of the time of the execution. As you might guess, this is the sub-plan for the in-line scalar subquery that the OP had identified as the problem part, and in a follow-up posting had commented that with this subquery in place the query took one hour but with the query removed it “completed faster”.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name                          | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  25 |  SORT UNIQUE                                  |                               |   8149 |      1 |    46 |   145   (2)| 00:00:01 |   1414 |00:19:44.48 |      91M|    176K|  2048 |  2048 |     2/0/0|
|  26 |   WINDOW SORT                                 |                               |   8149 |      1 |    46 |   145   (2)| 00:00:01 |  37811 |00:19:44.47 |      91M|    176K|   214K|   214K|     2/0/0|
|  27 |    NESTED LOOPS                               |                               |   8149 |        |       |            |          |  37811 |00:19:58.14 |      91M|    176K|       |       |          |
|  28 |     NESTED LOOPS                              |                               |   8149 |      1 |    46 |   143   (0)| 00:00:01 |     17M|00:14:44.21 |      73M|  90450 |       |       |          |
|  29 |      TABLE ACCESS BY INDEX ROWID              | OE_ORDER_LINES_ALL            |   8149 |      1 |    31 |   139   (0)| 00:00:01 |     17M|00:10:17.69 |      19M|   8604 |       |       |          |
|* 30 |       INDEX RANGE SCAN                        | XXOM_OE_ORDER_LINES_N100      |   8149 |    142 |       |     4   (0)| 00:00:01 |     17M|00:00:20.96 |     100K|   6570 |       |       |          |
|* 31 |      INDEX RANGE SCAN                         | XXOM_3LP_SYM_ORA_ORDER_N11    |     17M|      1 |       |     3   (0)| 00:00:01 |     17M|00:03:45.52 |      54M|  81846 |       |       |          |
|* 32 |     TABLE ACCESS BY INDEX ROWID               | XXOM_3LP_SYM_ORA_ORDER_LINES  |     17M|      1 |    15 |     4   (0)| 00:00:01 |  37811 |00:05:59.72 |      19M|  86126 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  30 - access("OL"."INVENTORY_ITEM_ID"=:B1 AND "OL"."SHIP_FROM_ORG_ID"=:B2)
  31 - access("ML"."LINE_ID"="OL"."LINE_ID" AND "ML"."ORG_ID"="OL"."ORG_ID")
  32 - filter("MODEL_STRING" IS NOT NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
  25 - (#keys=1) FIRST_VALUE("MODEL_STRING") OVER ( PARTITION BY "OL"."INVENTORY_ITEM_ID" ORDER BY "OL"."CREATION_DATE" RANGE  BETWEEN  UNBOUNDED  PRECEDING  AND  CURRENT ROW )[240]

  26 - (#keys=3) "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "MODEL_STRING"[VARCHAR2,240], "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "ML"."LINE_ID"[NUMBER,22],
       "ML"."ORG_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22], "ML".ROWID[ROWID,10], FIRST_VALUE("MODEL_STRING") OVER ( PARTITION BY "OL"."INVENTORY_ITEM_ID" ORDER BY
       "OL"."CREATION_DATE" RANGE  BETWEEN  UNBOUNDED  PRECEDING  AND  CURRENT ROW )[240]

  27 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22],
       "ML".ROWID[ROWID,10], "ML"."ORG_ID"[NUMBER,22], "MODEL_STRING"[VARCHAR2,240], "ML"."LINE_ID"[NUMBER,22]
  28 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22],
       "ML".ROWID[ROWID,10], "ML"."LINE_ID"[NUMBER,22], "ML"."ORG_ID"[NUMBER,22]
  29 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22]
  30 - "OL".ROWID[ROWID,10], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22]
  31 - "ML".ROWID[ROWID,10], "ML"."LINE_ID"[NUMBER,22], "ML"."ORG_ID"[NUMBER,22]
  32 - "ML".ROWID[ROWID,10], "MODEL_STRING"[VARCHAR2,240]

As I said in my reply, you have to be careful of two traps with execution plans with row source execution stats (gather_plan_statistics) enabled – first that enabling the feature can add a lot of overhead and be very misleading if the query includes a large number of very short operations, secondly that the workload and time information doesn’t always appear exactly where you feel it ought to be.

In this case we’ve got the OP saying the query completed in an hour while the report suggests that this part of the plan took 19 and three-quarter hours to complete! It seems likely that the discrepancy is a reporting error due to the side effect of the 17 million starts of operations 31 and 32, with a granularity error far in excess of the actual run-time.

If you want to check the rest of the execution plan, the driving body of the query (operations 123 to 175) has a few steps that report a few seconds with a total time of 26.90 seconds, and none of the other scalar subquery subplans report more that a few seconds. So it seems fairly clear that the necessary strategy for improving the performance of the whole thing is to find a way of improving the performance of this scalar subquery.

Note: addressing just the scalar subquery “in a vacuum” may be possible and may be sufficient. If this doesn’t work than the next – and more complex – strategy would be to take a wider view of the query and see if there were some way to get the optimizer to treat the scalar subquery differently – e.g. passing more access/filter predicates into the subquery, or finding a way of unnesting the subquery so that a brute force mechanism could be used once rather than the basic nested loop being used 8,149 times.

Analysis of subquery

I’ve shown the extracted plan for the critical subquery further up the page (which you can open from here in a new tab), here’s the in-line scalar subquery (with some cosmetic editing) that appeared in the original posting:

(
select distinct 
        first_value(model_string) over (
                partition by ol.inventory_item_id 
                order by ol.creation_date
        )
from
        xxom.xxom_3lp_sym_ora_order_lines       ml,
        apps.oe_order_lines_all                 ol
where 
        ol.inventory_item_id = pl.item_id                       -- po_lines_all
and     ol.ship_from_org_id  = pll.ship_to_organization_id      -- po_line_locations_all
and     ml.line_id           = ol.line_id
and     ml.org_id            = ol.org_id
and     model_string is not null
)       model_string,

I’ve highlighted the lines with the predicates that correlate the subquery back to the main query. You’ll note that there are two tables driving the subquery which means, in theory, that the join order and join method of those tables in the main query could (thanks to the way that scalar subquery caching is implemented) affect the number of times the subquery was called . For the purposes of this note I’m not going to worry about that detail, though.

The plan shows a simple nested loop, followed by a window sort (for the first_value() call), then a sort unique (for the distinct). We can see from the starts column that the subquery is called 8,149 times, and something that stands out quite sharply is that the index probe and table access for the second table in the nested loop (xxom_slp_sym_ora_order_lines) happens roughly 17 million times, i.e. for each row selected from oe_order_lines_all there’s an average of roughly 2,000 rows in identified and returned from xxom_slp_sym_ora_order_lines. Any tiny operation happening millions of times is likely to introduce some problematic side-effects, so it’s clearly worth having a look at this area of the plan (the fact that the values reported in the A-Time column are extreme just makes it stand out even more).

You have to be very cautious about the A-Time column when enabling rowsource execution statistics, it’s notorious for introducing double-counting and measurement error. For my own purposes I would ignore the silly figures at this point, but for the purposes of the case study I’ll make the suggestion that the 19 hours 58 minutes at operation 27 (nested loop) is the 14 hours and 44 minutes at operation 26 plus the 5 hours and 59 minutes at operation 32 (effectively double counting); then (leaving you to find the numbers and operations) 14:44 is 10:17 + 3:45 (double counting (again) with granularity errors); then 10:17 is double counting (yet again) 3:45 and 5:59; finally those last two figures have a measurement/granularity overhead that is larger than the actual time spent anyway. So ignore the exact details of the timing (once it’s given you a pointer), and look at the work done an output achieved.

Critically, we access xxom_3lp_sym_ora_order_lines 17 million times by rowid at operation 32 to return just 37,811 rows, and we also note that this 37,811 rows is the (8,149 versions of the) result set that is sorted for the first_value() function and sorted then again for the uniqueness operation. So a basic optimization strategy would be to see if we can reduce that 17 million to 37,811 earlier in the plan. Possibly there’s a different index that we could use that would allow us either to visit only the table rows we need (dropping the table access starts down to 37,811) or even to avoid visiting the table at all.

Option 1

Aiming for the simplest option – what do we do at the table: we have a filter predicate filter(“MODEL_STRING” IS NOT NULL), and the projection information says we’re only interested in “ML”.ROWID[ROWID,10], “MODEL_STRING”[VARCHAR2,240]. To sum up – if we can add model_string as a column of the index currently used then we could eliminate operation 32 entirely. The Outline Data (which I haven’t reproduced) tells us that the index in question has three columns (line_id, org_id, assembly_type), so we would have to decide whether to include the model_string as column 3 or column 4 – which we can’t do at this point because we have no information about how this index might otherwise be used.

Option 2

Adding a column to the index may eliminate some 17M function falls, 86K physical reads 19M buffer gets, but we still probe the index 17M times, taking 81K physical reads and 54M buffer gets; we also introduce the possibility of other queries performing less well because of the impact of the size and placement of the extra column – so we could consider a better approach: can we introduce a function-based index, or index on virtual columns, that would be much safer and also much smaller, and then modify the code to join through this index?

We access the table 17M times and find that the model_string is null in all but a few thousand cases. Does this give us a true picture of the whole table, or is it just a co-incidence in this case? If this is indicative of the whole table then an index defined as follows could be very effective:

(
        case when model_string is null then null else line_id end,
        case when model_string is null then null else org_id end,
	model_string
)

(For all recent versions of Oracle I would prefer to create two invisible, virtual columns, then create the index on the virtual columns and model_string. I would also investigate the meaning of the line_id and org_id columns since there may be scope of basic index compression on one or both these columns, which might dictate a better order for the first two columns).

With this definition a row with a null model_string will not appear in the index then, with a corresponding modification to the SQL, the work of probing the index would be reduced to a minimum – though it would still have to happen 17M times!

Option 3

It’s possible that the performance will be good enough after eliminating the probe of the second table and the corresponding nested loop operation, and reducing the work needed for the index probe; but it may be necessary to do far more. The fact that the available information suggests the model_string is often null and results in almost all the 17M probes returning no data prompts what might be the next (in terms of complexity) strategy to investigate – would it be possible to visit the xxom_3lp_sym_ora_order_lines table first and the oe_order_lines_all table second and only (we hope) a few thousand times.

Option 4

It doesn’t look as if a simple rewrite would be possible, but it’s very hard to tell given the number of tables in the query and the number of scalar subqueries in the select list and my ignorance of what all the tables represent and what the code is supposed to do, so I’m not going to think about that.

Instead I’ll go back to a point that I made at the start of this note: system is 19c but the optimizer features have been set back to 11.2.0.4. In 12.2.0.1 (possibly 12.1.0.1) Oracle introduced code to unnest scalar subqueries in the select list and it’s possible that an internal rewrite might manage to do something sensible with this subquery.

If it hadn’t been done already I’d try testing this query with the default optimizer features. If that was a disaster I’d repeat the exercise but start adding /*+ no_unnest */ hints to the scalar subqueries – maybe all but the critical one as a first test. I’d probably have to use some of the hints from the Outline Data of the 11g plan to disable some of the more esoteric 19c features.

If I didn’t get anywhere with the default 19c features I’d also do a few experiments with optimizer_features_enable set to 12.2.0.1 (which, I think, is the earliest version that supported scalar subquery unnesting – though it might have been 12.1.x.x).

Conclusion

A detailed execution plan, with runtime statistics generated through the gather_plan_statistics hint gives a very clear indication of where most of the query time is (apparently) spent. When a very small operation executes a very large number of times the effect of granularity errors often exaggerate the times so a little caution is always needed – especially where nested loops are concerned.

In the interests of keeping any changes as simple as possible a nice strategy is to ask two questions

can I make this operation happen less frequently
can I make this operation less resource intensive each time it happens

In our case, adding a column to an existing index could make one of the critical operations (table access by rowid) disappear completely; then, in a refinement of that localised approach, creating a function-based index (or index on virtual columns) might also make the remaining operation (index range scan) much less expensive.

Beyond that it was not immediately obvious how a simple rewrite could make the whole query more efficient, but allowing the optimizer to use some of its newer features might allow it to produce an internal rewrite that could eliminate a “row by row” process of calling scalar subqueries with embedded nested loops into a “bulk” process that unnested the subquery and allowed Oracle to find the relevant data for all the rows in a single operation before combining with a hash join.

Footnote

There is a little anomaly with the query that may have had some impact on the optimizer’s choice of execution path, but I didn’t want to get stuck into a detail that might be completely irrelevant when looking at a plan that flags up such a huge timing skew. The oddity appears in the Predicate information at operation 141 of the execution plan:

141 - filter(("PH"."ORG_ID"=9082 AND ((-1)=:BUYER OR "PH"."AGENT_ID"=:BUYER) AND (SYS_OP_C2C(COALESCE("PH"."CLOSED_CODE",'OPEN'))=U'OPEN' OR
SYS_OP_C2C(COALESCE("PH"."CLOSED_CODE",'OPEN'))=U'CLOSED')))

The sys_op_c2c() function calls tells us that there must be some “national language” activity going on somewhere in this query, and looking back at the full text of the query we can see this in the predicates:

AND COALESCE(ph.closed_code, 'OPEN') IN (N'OPEN',N'CLOSED')
AND PLL.SHIP_TO_ORGANIZATION_ID IN (N'12389',N'12390',N'12469',N'9083',N'12549')

If you’re wondering why I haven’t said anything about any sys_op_c2c() function calls appearing elsewhere in the predicate information, it’s because they don’t. Here (at operation 152) is the ship_to_organization_id predicate showing the character literals being coerced to numeric values:

 152 - access("MP"."ORGANIZATION_ID"="PLL"."SHIP_TO_ORGANIZATION_ID")
       filter(("MP"."ORGANIZATION_ID"=9083 OR "MP"."ORGANIZATION_ID"=12389 OR "MP"."ORGANIZATION_ID"=12390 OR "MP"."ORGANIZATION_ID"=12469 OR "MP"."ORGANIZATION_ID"=12549))

Character-set conversions and storing numbers as characters are mechanisms that make it hard for the optimizer to produce a reasonable cardinality estimate – and bad estimates usually produce bad plans.

Comments (1)

January 6, 2023

Case Study

Filed under: Execution plans,Oracle — Jonathan Lewis @ 4:34 pm GMT Jan 6,2023

Here’s a query that appeared on the MOS “SQL Performance” forum (needs a support account) with the request: “Can someone please assist me to optimize this query?”

It looks like a nice simple query (though looks can be deceiving) so I thought I’d use it as another example on how to think about tuning SQL.

SELECT 
        MSI.SEGMENT1, OL.SHIP_FROM_ORG_ID, 
        OL.ORDERED_QUANTITY SOLD_QTY, 
        OL.UNIT_SELLING_PRICE SELLING_PRICE
FROM 
        OE_ORDER_HEADERS_ALL   OH, 
        OE_ORDER_LINES_ALL     OL, 
        MTL_SYSTEM_ITEMS_B     MSI
WHERE 
        OH.HEADER_ID         = OL.HEADER_ID
AND     OL.SHIP_FROM_ORG_ID  = MSI.ORGANIZATION_ID
AND     OL.INVENTORY_ITEM_ID = MSI.INVENTORY_ITEM_ID
AND     NVL(oh.source_document_type_id,0) <> 10
AND     SUBSTR(SEGMENT1,4,3) = 'FIF'
AND     UPPER(OL.FLOW_STATUS_CODE) NOT IN ('ENTERED','CANCELLED')

I really dislike (and actually have trouble) reading blocks of text in upper case, so before I do anything else here’s the same code converted to (mostly) lower case, with one minor adjustment:

select
        msi.segment1, 
        ol.ship_from_org_id, 
        ol.ordered_quantity     sold_qty,
        ol.unit_selling_price   selling_price
from 
        oe_order_headers_all   oh, 
        oe_order_lines_all     ol, 
        mtl_system_items_b     msi
where 
        oh.header_id         = ol.header_id
and     ol.ship_from_org_id  = msi.organization_id
and     ol.inventory_item_id = msi.inventory_item_id
and     nvl(oh.source_document_type_id,0) <> 10
and     substr(msi.segment1,4,3) = 'FIF'
and     upper(ol.flow_status_code) not in ('ENTERED','CANCELLED')

I’ve highlighted line 15 to pick out the minor adjustment: I’ve added a table alias to the expression substr(segment1,4,3). Every column reference should include its table alias. The query was pretty good in following this guideline and it was fairly easy to work out the appropriate alias here because we can also see msi.segment1 in the select list and if there had been a column called segment1 in one of the other tables Oracle would have reported error “ORA-00918: column ambiguously defined“.

We’ve been told that the query is a “sales query” against Oracle EBS R12, and we’ve been given an execution plan – though the plan, unfortunmetly, is an image from one of the non-Oracle GUIs:

Conveniently the plan tells us that the three tables in the query really are tables and not views that hide complex subqueries; it also gives us some idea of the size of two of the tables (very big). What it doesn’t give us is any information about how and where the original predicates have been used.

Things we don’t know

What the query means (in business terms)
How long it takes the query to run
How long it probably should take the query to run
How fast the OP would like the query to run
How often the query will be run
How many rows from each table have to participate in deriving the result
How many rows and blocks there are in each table
How many distinct values there are for each of the columns in the where clause.
Whether there are any virtual columns (or extended stats) on the table
Whether there are any (numerically) skewed data distribution patterns
What physical patterns there might be in the rows identified by the predicates.
What the definitions of any available indexes are (including “function-based”)
Whether or not we’re allowed to change the code, or whether we have to “hint” it somehow.
Whether this is the only set of literal values that would appear or whether it’s generated code that allows many variations to appear.
Whether this is a case where a production query using bind variables is being tested with one known set of values.
Which version of Oracle and optimizer_features_enable / hacked optimizer parameters

Basic threats

Every single predicate that compares a column with a literal hides the column inside a function call – which means the optimizer may be “losing” important statistical information

Two of the three literal-based predicates are “negative”, i.e. “not equal” and “not in”, which is another detail that can mess up the optimizer’s arithmetic (though the specific effects may vary with version and the column statistics).

Observations and Guesswork

The optimizer cardinality estimate for upper(ol.flow_status_code) not in (‘ENTERED’,’CANCELLED’) is 76,240 with a tablescan cost of 989,658. That looks like a very small fraction of a very large table. But it’s possible that this is the standard estimate for “unknown value not in (list of 2 items)” which would be 5% of 5% in 19c. Certainly 76,240 * 20 * 20 = 30M sounds like a suitable number of rows for a table with a tablescan cost close to a million. Possibly a better cardinality estimate would change the plan.

The name flow_status_code suggests a column that would have a small number of distinct values with an enormous data skew. If we had extended stats, or a virtual column, on upper(flow_status_code) with a histogram in place the optimizer might pick a completely different path. If the estimate became very small it might be able to choose nested loops and an indexed access path all the way through the query; if the estimate became very large it might decide to use a different join order.

The optimizer cardinality estimate for nvl(oh.source_document_type_id,0) <> 10 is 6.65M with a tablescan cost of 95,823. The selectivity for this predicate is derived as 1 – selectivity(nvl(oh.source_document_type_id,0)= 10), which is derived as 1 – selectivity( oh.source_document_type_id = 10).

At this point I could launch into all sorts of speculation about the column: the number of distinct values, the number of nulls, the existence (or not) of a histogram, a comparison between the blocks and cardinality of this “order headers” table and the estimates made above for the “order lines” table – but trying to cover all the options would be a long and tangled document, so I’ll just show you one example that might be a valid model this predicate:

t1 is a table with 100,000 rows
data_type_id is a numeric column with 80 distinct values
there are 1,000 rows in t1 where data_type_id is null
I’ve gathered stats just before executing a couple of queries

Here are the two queries, each followed by the critical line from its execution plan:

SQL> select count(*) from t1 where nvl(data_type_id,0) <> 50;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|*  2 |   TABLE ACCESS FULL| T1   | 98763 |   289K|   553   (2)| 00:00:01 |
---------------------------------------------------------------------------


SQL> select count(*) from t1 where data_type_id = 50;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|*  2 |   TABLE ACCESS FULL| T1   |  1238 |  3714 |   551   (1)| 00:00:01 |
---------------------------------------------------------------------------

A quick arithmetic check: add the two row estimates: 98,763 + 1,238 = 99,001. Allowing for rounding errors in the arithmetic that’s the 99,000 rows where data_type_id is not null.

Unfortunately the actual count from the first query is 2,793 – the optimizer’s estimate is out by a factor of more than 35; and the actual count from the second query is 97,207 – the optimizer’s estimate is out (in the opposite direction) by a factor of more than 78.

The statistics are “up to date” – except 50 is an extremely common value for data_type_id and I haven’t created a histogram on the column.

So perhaps the original query is asking for the “rare” order headers but there’s no histogram to give the optimizer any clue that 10 (the unwanted value) is an extremely common value for the document_type_id. Would a suitable histogram allow the optimizer to produce a much lower cardinality estimate and a significant change in the execution plan?

It may be significant that the table order in the from clause, and the predicate order in the where clause suggests that the original programmer may have been aiming for an execution plan that started at the oe_order_headers_all table.

Finally we come to the predicate substr(msi.segment1,4,3) = ‘FIF’. I don’t know much about EBS but I can guess what the table names oe_order_headers_all and oe_order_lines_all represent; on the other hand I can’t think what is really behind the name mtl_system_items_b, especially when the column names that join it to the order lines table make it look as if it should be called something more like oe_products_all; and there’s no way to guess what the significance of a little chunk of a column called segment1 might be.

The only thing we can infer from the execution plan is that it must be a fairly huge table since the optimizer has chosen to probe it through a nested looped 76,240 times with a total incremental cost of 152,551 (= 1,244,546 – 1,091,995) rather than doing a tablescan and hash join.

The cardinality and cost estimates show a couple of interesting details. First, there’s a clear error in the cardinality estimate of the nested loop join since but the number of rows produced by the join is (very slightly) larger than the number of rows supplied to it by the hash join, even though it’s accessing the table by a unique scan of a unique index.

Secondly it’s worth commenting on the cost of the indexed access, which might seem low at only 2 when I’ve claimed it’s a “fairly huge” table which would presumably require traversing an index with a blevel of 2 (root, branch, leaf) before visiting the required table block – so a cost of 4 for the table visit would seem reasonable. In fact 4 is what we’d see for a non-unique index (on this data) and a query for “id = {constant}”; the optimizer has substracted one for the uniqueness, and one because it’s a nested loop join.

If we know that there are only a very few rows where substr(segment_1,4,3) = ‘FIF’, and if we think that this is a good starting table for the optimizer then we need to create a virtual column (or extended stats) and might need to support that with a histogram. Even then, of course, it might not be a good table to have first in the join order.

Joins

So far we’ve only been considering ways to improve the optimizer’s cardinality estimates in the hope that better information would give it a better execution plan. We believe that all three tables are very large and suspect that if the actual volume of relevant data is small we can get a better path that uses nested loops from beginning to end. But if we want that path to be efficient we’re going to need suitable indexes, including a precision index into the first table in the join.

What we need to do now is consider the resources that might be needed to ensure we have the “perfect” indexes for the optimum path, compared with the excess resources that would be used if we could find a “good enough” path.

Looking at the query, my feeling is that there are two possible paths that might work reasonably well (assuming the required volume of data is sufficiently small):

oe_order_headers_all -> or_order_lines_all -> mtl_system_items_b
oe_order_lines_all -> or_order_headers_all -> mtl_system_items_b

I suspect (in the absence of any supplied information) that a path starting with mtl_system_items_b will do too much random I/O into the order lines tables. (It’s a standard example of the problem displayed by pairings like: products -> order_lines and customers -> orders the order lines for any given product are likely to be be scattered widely across the table, as are the orders for any particular customer. Starting with mtl_system_items_b might do a lot of random I/O before discarding a lot of unwanted order lines.

Looking at the oe_order_lines_all table it strikes me that only a small number of rows will be newly entered or cancelled, and most of them will be in states like “completed”, “picked” invoiced”, etc, so the predicate on flow_status_code is probably not one that will eliminate a lot of data, so I’m just going to talk about the options for getting into the oe_order_headers_all table. Assuming the predicate “nvl(oh.source_document_type_id,0) <> 10” does identify a “small enough” number of rows then we probably have the a good enough index (the foreign key index – which probably should be created with a degree of compression and will probably also be the primary key) into oe_order_lines_all, and we know we have a unique index from there into mtl_system_items_b.

So how do we access that small number of rows as efficiently as possible with a minimum of overhead. We’ve set up a histogram on source_document_type_id so that the optimizer gets a better idea of the number of rows – but that still leaves us with a tablescan of a huge table unless we create a tailored index. Here’s an example of the type of approach we can take – based on a table t2 that has 50,000 rows of which almost all have a document_type of ‘XX’, which we want to ignore, and a few rows where the document_type is null which have to be included in the ones we keep:

SQL> execute dbms_stats.gather_table_stats(user,'t2',method_opt =>'for all columns size 1 for columns size 254 document_type')

SQL> select count(*) from t2 where nvl(document_type,'YY') <> 'XX';

  COUNT(*)
----------
       300

Execution Plan
----------------------------------------------------------
Plan hash value: 3321871023

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   278   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T2   |   300 |  1200 |   278   (2)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("DOCUMENT_TYPE",'YY')<>'XX')

In keeping with the original supplied code I have a predicate which translates NULLs into a value that isn’t the ‘XX’ that we want to exclude. With the histogram in place the optimizer expresion has actually got exactly the right estimate.

So here’s a possible index definition that will allow us to create a very small index that identifies exactly those rows as efficiently as possible:

SQL> create index t2_i1 on t2 (case when nvl(document_type,'YY') <> 'XX' then 1 end);

SQL> execute dbms_stats.gather_table_stats(user,'t2',method_opt =>'for all hidden columns')

SQL> select count(*) from t2 where case when nvl(document_type,'YY') <> 'XX' then 1 end = 1;

  COUNT(*)
----------
       300

Execution Plan
----------------------------------------------------------
Plan hash value: 3503408237

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     2 |     1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |       |     1 |     2 |            |          |
|*  2 |   INDEX RANGE SCAN| T2_I1 |   300 |   600 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(CASE  WHEN NVL("DOCUMENT_TYPE",'YY')<>'XX' THEN 1 END =1)

Two things to note; first, after I’ve created the “function-based” index I’ve gathered stats on “all hidden columns”. This is one way of gathering stats on the system-generated, invisible, virtual column that supports the index, and is a step that is often missed until the stats are gathered overnight by the automatic stats collection job at some random point in the future. Secondly I’ve had to change the query so that the predicate I use is an exact match for the index definition; this is why it’s often nicer to create a virtual column for the expression and index the virtual column – and to keep 3rd party or legacy code safe it’s often a good idea to declare such columns invisible.

Finally, just to show the efficiency of this strategy, here are a couple of the interesting stats about the index:

SQL> select table_name, index_name, num_rows , leaf_blocks from user_indexes where table_name = 'T2';

TABLE_NAME                INDEX_NAME             NUM_ROWS LEAF_BLOCKS
------------------------- -------------------- ---------- -----------
T2                        T2_I1                       300           1

The table has 50,000 rows, but the index has only 300 entries (that fit in a single leaf block) as the expression is NULL for the other 49,700. As an important bonus it’s fairly safe to assume that there won’t be any code in the system that will decide to use this strangely defined index when it should be using some other index.

Conclusion

We started with a query that was “too slow”. It contained some predicates that would hide any useful statistical information from the optimizer. In two of the three cases we could give the optimizer some useful statistics by creating virtual columns or extended stats on the expressions; and in the third case the structure of the expression was a special “nvl()” case that could simply have been missing a histogram on the underlying column.

Some of the information in the execution plan gave us clues about the scale of the tables – including the table which was accessed by a unique index – but on-site DBAs wouldn’t need to make guesses about some of the numbers I came up with, they could simply query the data dictionary or, for some details, query the data directly.

Reviewing table and column names, and making some assumptions about some of the data distributions (again details that could be extracted from the actual data), I picked a path that would probably be suitable if the required volume of data was relatively small, then demonstrated how we could add in an efficient index that would make this query as efficient as needed without requiring much overhead in index maintenance and without introducing the risk of other queries changing execution plans to use this new index.

Comments (6)

November 15, 2022

opt_estimate 4a

Filed under: CBO,Execution plans,Hints,Oracle,Tuning — Jonathan Lewis @ 11:21 am GMT Nov 15,2022

I wrote a batch of notes about the opt_estimate() hint a couple of years ago, including one where I explained the option for using the hint to specify the number of rows in a query block. I’ve just come across a particular special case for that strategy that others might find a use for. It’s something to do when using the “select from dual … connect by” trick for multiplying rows.

Here’s a little data to model the idea – I’ve used the all_tables view to generate some “well-known” data since I want to add a tiny bit of complexity to the query while still leaving it easy to understand the index. The results from this demonstration come from Oracle 21.3.0.0, and I’ve included the hint /*+ no_adaptive_plan */ to stop Oracle from getting too clever during optimisation.

rem
rem     Script:         opt_estimate_dual.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2022
rem
rem     Last tested 
rem             21.3.0.0
rem

create table tables_table as select * from all_objects where object_type = 'TABLE';
create table objects_table as select * from all_objects;

alter table objects_table add constraint ot_pk primary key(object_id);

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'tables_table',
                method_opt => 'for columns size 60 owner'
        );
end;
/

set serveroutput off

with driver as (
        select  /*+ materialize */
                tt.owner, tt.object_id, v1.rn
        from    tables_table tt,
                (
                select
                        /*+  opt_estimate(query_block scale_rows=10) */
                        rownum rn
                from    dual
                connect by
                        level <= 10
                ) v1
        where
                tt.owner = 'OUTLN'
)
select  /*+ no_adaptive_plan */
        dr.rn, dr.owner, dr.object_id,
        ot.object_id, ot.owner, ot.object_type, ot.object_name
from
        driver dr,
        objects_table   ot
where
        ot.object_id = dr.object_id
/

select * from table(dbms_xplan.display_cursor(format => 'hint_report'));

In my system tables_table holds 727 rows and objects_table holds 58383 rows. Three rows in tables_table correspond to tables owned by user ‘OUTLN’ which means I expect the driver CTE (common table expression / “with” subquery) to generate 30 rows and, given the join on unique id, the query to return 30 rows.

I’ve used the /*+ materialize */ hint to force Oracle to create an in-memory temporary table for the driver CTE, the /*+ no_adaptive_plan */ hint to stop Oracle from getting too clever during optimisation, and the critical /*+ opt_estimate() */ hint to help the optimizer understand the effect of my “connect by” on dual. Here’s the execution plan I get if I don’t include that last hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |    14 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6632_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |     3 |    78 |     9   (0)| 00:00:01 |
|   4 |     VIEW                                 |                            |     1 |    13 |     2   (0)| 00:00:01 |
|   5 |      COUNT                               |                            |       |       |            |          |
|   6 |       CONNECT BY WITHOUT FILTERING       |                            |       |       |            |          |
|   7 |        FAST DUAL                         |                            |     1 |       |     2   (0)| 00:00:01 |
|   8 |     BUFFER SORT                          |                            |     3 |    39 |     9   (0)| 00:00:01 |
|*  9 |      TABLE ACCESS FULL                   | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|  10 |   NESTED LOOPS                           |                            |     3 |   453 |     5   (0)| 00:00:01 |
|  11 |    NESTED LOOPS                          |                            |     3 |   453 |     5   (0)| 00:00:01 |
|  12 |     VIEW                                 |                            |     3 |   276 |     2   (0)| 00:00:01 |
|  13 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6632_31D19D4 |     3 |    78 |     2   (0)| 00:00:01 |
|* 14 |     INDEX UNIQUE SCAN                    | OT_PK                      |     1 |       |     0   (0)|          |
|  15 |    TABLE ACCESS BY INDEX ROWID           | OBJECTS_TABLE              |     1 |    59 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   9 - filter("TT"."OWNER"='OUTLN')
  14 - access("OT"."OBJECT_ID"="DR"."OBJECT_ID")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   0 -  STATEMENT
           -  no_adaptive_plan

   2 -  SEL$1
           -  materialize

I’ve highlighted operations 4 and 8 in the plan: operation 4 is the view of dual that has generated 10 rows – unfortunately the optimizer has only considered the stats of the dual table, and hasn’t factored in the effects of the “connect by with rownum”. Operation 8 shows us that the optimizer has (correctly, thanks to the histogram I requested) estimated 3 rows for the tablescan of tables_table. The result of these two estimates is that operation 3 reports an estimate of 3 ( = 3 * 1 ) rows to be used in probing objects_table.

This is the plan after enabling the /*+ opt_estimate() */ hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |    45 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6633_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |    30 |   780 |    13   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL                    | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|   5 |     BUFFER SORT                          |                            |    10 |   130 |     6   (0)| 00:00:01 |
|   6 |      VIEW                                |                            |    10 |   130 |     2   (0)| 00:00:01 |
|   7 |       COUNT                              |                            |       |       |            |          |
|   8 |        CONNECT BY WITHOUT FILTERING      |                            |       |       |            |          |
|   9 |         FAST DUAL                        |                            |     1 |       |     2   (0)| 00:00:01 |
|  10 |   NESTED LOOPS                           |                            |    30 |  4530 |    32   (0)| 00:00:01 |
|  11 |    NESTED LOOPS                          |                            |    30 |  4530 |    32   (0)| 00:00:01 |
|  12 |     VIEW                                 |                            |    30 |  2760 |     2   (0)| 00:00:01 |
|  13 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6633_31D19D4 |    30 |   780 |     2   (0)| 00:00:01 |
|* 14 |     INDEX UNIQUE SCAN                    | OT_PK                      |     1 |       |     0   (0)|          |
|  15 |    TABLE ACCESS BY INDEX ROWID           | OBJECTS_TABLE              |     1 |    59 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("TT"."OWNER"='OUTLN')
  14 - access("OT"."OBJECT_ID"="DR"."OBJECT_ID")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   0 -  STATEMENT
           -  no_adaptive_plan

   2 -  SEL$1
           -  materialize

There are three things that stand out in this report.

I’ve highlighted operations 4 and 6: operation 4 is the tablescan of tables_table that correctly estimates 3 rows; operation 6 is the view operation that now correctly estimates 10 rows.
With the correct estimate for the view the estimate for the join to objects_table is now correct and the join order for the merge join cartesian at operation 3 has been reversed.
The Hint Report tells us that the opt_estimate() hint is not (always) an optimizer hint! This is a real pain because when the opt_estimate() hints you’ve tried to use don’t appear to work it’s not easy to work out what you’ve done wrong.

To make a point, I can take the demo a little further by changing the /*+ opt_estimate() */ hint to scale_rows=120. Here’s the body of the resulting plan:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |   369 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D663A_31D19D4 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN                  |                            |   360 |  9360 |    13   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                    | TABLES_TABLE               |     3 |    39 |     7   (0)| 00:00:01 |
|   5 |     BUFFER SORT                          |                            |   120 |  1560 |     6   (0)| 00:00:01 |
|   6 |      VIEW                                |                            |   120 |  1560 |     2   (0)| 00:00:01 |
|   7 |       COUNT                              |                            |       |       |            |          |
|   8 |        CONNECT BY WITHOUT FILTERING      |                            |       |       |            |          |
|   9 |         FAST DUAL                        |                            |     1 |       |     2   (0)| 00:00:01 |
|  10 |   HASH JOIN                              |                            |   360 | 54360 |   356   (1)| 00:00:01 |
|  11 |    VIEW                                  |                            |   360 | 33120 |     2   (0)| 00:00:01 |
|  12 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D663A_31D19D4 |   360 |  9360 |     2   (0)| 00:00:01 |
|  13 |    TABLE ACCESS FULL                     | OBJECTS_TABLE              | 58383 |  3363K|   354   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

The earlier plans used a nested loop join into objects_table. In this plan we can see at operation 10 that the optimizer has selected a hash join because the larger row estimate for the CTE has increased the cost of the query beyond the inflection point between nested loop and hash joins.

Summary

If you need to use the “connect by” in an inline view then you may find that the optimizer gets a very bad estimate of the number of rows the view definition will generate and that an /*+ opt_estimate() */ hint in the view using the “scale_rows=nnn” option will produce better estimates of cardinality, hence a better plan.

Footnote

In this particular case where I’ve used the dual table by itself in an inline view I could have used the “rows=NNN” option to get the same effect.

In any case I could have added a /*+ qb_name() */ hint to the inline view, and included a qualifying “@{queryblock}” in the /*+ opt_estimate() */ hint.

Using hints is hard, especially when they’re not documented. There is a lot more to learn about this hint: for example, telling the optimizer about the size of a rowsource doesn’t help if it’s going to use its estimate of distinct values in the next steps of the plan – a correction you’ve managed to inject at one stage may disappear in the very next optimizer calculation.

This catalogue lists more articles on the opt_estimate() hint and its relatives.

Comments (3)

September 29, 2022

Case Study

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 6:27 pm BST Sep 29,2022

A recent question on the Oracle Developer Community forum asked for help with a statement that was taking a long time to run. The thread included the results from a trace file that had been passed through tkprof so we have the query and the actual execution plan with the Row Source Operation details.

Here’s the query – extracted from the tkprof output:

SELECT DISTINCT
       pll.po_line_id,
       ploc.line_location_id,
       (SELECT ptl.line_type
          FROM apps.po_line_types_tl ptl
         WHERE ptl.line_type_id = pll.line_type_id AND ptl.LANGUAGE = 'US')
           "Line_Type",
       ploc.quantity_accepted,
       NULL
           release_approved_date,
       NULL
           release_date,
       NULL
           release_hold_flag,
       NULL
           release_type,
       DECODE (ploc.po_release_id, NULL, NULL, ploc.quantity)
           released_quantity,
       (SELECT items.preprocessing_lead_time
          FROM apps.mtl_system_items_b items
         WHERE     items.inventory_item_id = pll.item_id
               AND items.organization_id = ploc.SHIP_TO_ORGANIZATION_ID)
           "PreProcessing_LT",
       (SELECT items.full_lead_time
          FROM apps.mtl_system_items_b items
         WHERE     items.inventory_item_id = pll.item_id
               AND items.organization_id = ploc.SHIP_TO_ORGANIZATION_ID)
           "Processing_LT",
       (SELECT items.postprocessing_lead_time
          FROM apps.mtl_system_items_b items
         WHERE     items.inventory_item_id = pll.item_id
               AND items.organization_id = ploc.SHIP_TO_ORGANIZATION_ID)
           "PostProcessing_LT",
       ploc.firm_status_lookup_code,
       NVL (
           (SELECT pla.promised_date
              FROM apps.po_line_locations_archive_all pla
             WHERE     pla.po_header_id = pha.po_header_id
                   AND pla.po_line_id = pll.po_line_id
                   AND pla.line_location_id = ploc.line_location_id
                   AND pla.revision_num =
                       (SELECT MIN (revision_num)
                          FROM apps.po_line_locations_archive_all plla2
                         WHERE     plla2.promised_date IS NOT NULL
                               AND plla2.line_location_id =
                                   ploc.line_location_id)),
           ploc.promised_date)
           "Original_Promise_Date",
       (SELECT items.long_description
          FROM apps.mtl_system_items_tl items
         WHERE     items.inventory_item_id = pll.item_id
               AND items.organization_id IN
                       (SELECT fin.inventory_organization_id
                          FROM apps.financials_system_params_all fin
                         WHERE fin.org_id = pha.org_id)
               AND items.LANGUAGE = 'US')
           "Item_Long_Description",
       NVL (ploc.approved_flag, 'N')
           approved_code,
       pvs.country
           "Supplier_Site_Country",
       pll.note_to_vendor,
         NVL (ploc.quantity, 0)
       - NVL (ploc.quantity_cancelled, 0)
       - NVL (ploc.quantity_received, 0) * ploc.price_override
           "Shipment_Amount",
       ploc.attribute4
           "PO_Ship_Date",
       (SELECT meaning
          FROM apps.fnd_lookup_values
         WHERE     lookup_type = 'SHIP_METHOD'
               AND lookup_code = ploc.attribute9
               AND language = 'US')
           "Ship_Method",
       (SELECT prla.note_to_receiver
          FROM apps.po_req_distributions_all  prda
               INNER JOIN apps.po_requisition_lines_all prla
                   ON prda.requisition_line_id = prla.requisition_line_id
         WHERE prda.distribution_id = pdi.req_distribution_id)
           "Note_To_Receiver",
       DECODE (pha.USER_HOLD_FLAG, 'Y', 'Y', pll.USER_HOLD_FLAG)
           "Hold_Flag",
       (SELECT ABC_CLASS_NAME
          FROM APPS.MTL_ABC_ASSIGNMENT_GROUPS  ASG
               INNER JOIN APPS.MTL_ABC_ASSIGNMENTS ASSI
                   ON ASG.ASSIGNMENT_GROUP_ID = ASSI.ASSIGNMENT_GROUP_ID
               INNER JOIN APPS.MTL_ABC_CLASSES classes
                   ON ASSI.ABC_CLASS_ID = classes.ABC_CLASS_ID
         WHERE     ASG.organization_id = ploc.SHIP_TO_ORGANIZATION_ID
               AND ASG.ASSIGNMENT_GROUP_NAME = 'MIN ABC Assignment'
               AND ASSI.inventory_item_id = pll.item_id)
           ABCClass,
       (SELECT CONCATENATED_SEGMENTS AS charge_accountsfrom
          FROM apps.gl_code_combinations_kfv gcc
         WHERE gcc.code_combination_id = pdi.code_combination_id)
           AS charge_accounts
  FROM apps.po_headers_all         pha,
       apps.po_lines_all           pll,
       apps.po_line_locations_all  ploc,
       apps.po_distributions_all   pdi,
       apps.per_all_people_f       papf,
       apps.AP_SUPPLIERS           pv,
       apps.AP_SUPPLIER_SITES_ALL  pvs,
       apps.AP_SUPPLIER_CONTACTS   pvc,
       apps.ap_terms               apt,
       apps.po_lookup_codes        plc1,
       apps.po_lookup_codes        plc2,
       apps.hr_locations           hlv_line_ship_to,
       apps.hr_locations           hlv_ship_to,
       apps.hr_locations           hlv_bill_to,
       apps.hr_organization_units  hou,
       apps.hr_locations_no_join   loc,
       apps.hr_locations_all_tl    hrl1,
       apps.hr_locations_all_tl    hrl2
 WHERE     1 = 1
       AND pll.po_header_id(+) = pha.po_header_id
       AND ploc.po_line_id(+) = pll.po_line_id
       AND pdi.line_location_id(+) = ploc.line_location_id
       AND ploc.shipment_type IN ('STANDARD', 'PLANNED')
       AND papf.person_id(+) = pha.agent_id
       AND TRUNC (SYSDATE) BETWEEN papf.effective_start_date
                               AND papf.effective_end_date
       AND papf.employee_number IS NOT NULL
       AND pv.vendor_id(+) = pha.vendor_id
       AND pvs.vendor_site_id(+) = pha.vendor_site_id
       AND pvc.vendor_contact_id(+) = pha.vendor_contact_id
       AND apt.term_id(+) = pha.terms_id
       AND plc1.lookup_code(+) = pha.fob_lookup_code
       AND plc1.lookup_type(+) = 'FOB'
       AND plc2.lookup_code(+) = pha.freight_terms_lookup_code
       AND plc2.lookup_type(+) = 'FREIGHT TERMS'
       AND hlv_line_ship_to.location_id(+) = ploc.ship_to_location_id
       AND hlv_ship_to.location_id(+) = pha.ship_to_location_id
       AND hlv_bill_to.location_id(+) = pha.bill_to_location_id
       AND hou.organization_id = pha.org_id
       AND hou.location_id = loc.location_id(+)
       AND hrl1.location_id(+) = pha.ship_to_location_id
       AND hrl1.LANGUAGE(+) = 'US'
       AND hrl2.location_id(+) = pha.bill_to_location_id
       AND hrl2.LANGUAGE(+) = 'US'
       AND hou.organization_id IN (2763)
       AND NVL (pha.closed_code, 'OPEN') IN ('OPEN', 'CLOSED')
       AND NVL (pll.closed_code, 'OPEN') IN ('OPEN', 'CLOSED')
       AND NVL (ploc.cancel_flag, 'N') = 'N'
       AND pha.authorization_status IN
               ('APPROVED', 'REQUIRES REAPPROVAL', 'IN PROCESS')

As you can see there are 10 inline scalar subqueries (highlighted) in the query with a select distinct to finish off the processing of an 18 table join. That’s a lot of scalar subqueries so it’s worth asking whether the code should be rewritten to use joins (though in newer vesions of Oracle some of the subqueries might be transformed to outer joins anyway – but the OP is using 11.2.0.4). We also know that a distinct is sometimes a hint that the code has a logic error that has been “fixed” by eliminating duplicates.

Ignoring those points, let’s consider the execution plan from the tkprof output which (with a tiny bit of extra formatting) is as follows:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.46       1.75          0          3          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    50346    279.02    1059.39     179103   30146895          0      755164
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    50348    279.49    1061.14     179103   30146898          0      755164

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 678  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         9          9          9  TABLE ACCESS BY INDEX ROWID PO_LINE_TYPES_TL (cr=20 pr=0 pw=0 time=680 us cost=2 size=32 card=1)
         9          9          9   INDEX UNIQUE SCAN PO_LINE_TYPES_TL_U1 (cr=11 pr=0 pw=0 time=323 us cost=1 size=0 card=1)(object id 63682480)

    576365     576365     576365  TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267756 pr=28 pw=0 time=22598079 us cost=4 size=13 card=1)
    576365     576365     576365   INDEX UNIQUE SCAN MTL_SYSTEM_ITEMS_B_U1 (cr=1720936 pr=0 pw=0 time=4644552 us cost=3 size=0 card=1)(object id 42812859)

    576365     576365     576365  TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267747 pr=0 pw=0 time=2442479 us cost=4 size=13 card=1)
    576365     576365     576365   INDEX UNIQUE SCAN MTL_SYSTEM_ITEMS_B_U1 (cr=1720936 pr=0 pw=0 time=1238342 us cost=3 size=0 card=1)(object id 42812859)

    576365     576365     576365  TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_B (cr=2267743 pr=0 pw=0 time=2029190 us cost=4 size=14 card=1)
    576365     576365     576365   INDEX UNIQUE SCAN MTL_SYSTEM_ITEMS_B_U1 (cr=1720932 pr=0 pw=0 time=967729 us cost=3 size=0 card=1)(object id 42812859)

    672743     672743     672743  TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=5507736 pr=163043 pw=0 time=535914552 us cost=3 size=27 card=1)
    672743     672743     672743   INDEX UNIQUE SCAN PO_LINE_LOCATIONS_ARCHIVE_U1 (cr=4560824 pr=163043 pw=0 time=533161038 us cost=2 size=0 card=1)(object id 42811947)
    755121     755121     755121    SORT AGGREGATE (cr=3540960 pr=163043 pw=0 time=530079821 us)
   1040963    1040963    1040963     TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=3540960 pr=163043 pw=0 time=534243973 us cost=5 size=15 card=1)
   1776649    1776649    1776649      INDEX RANGE SCAN PO_LINE_LOCATIONS_ARCHIVE_U1 (cr=1123074 pr=6392 pw=0 time=37128373 us cost=3 size=0 card=2)(object id 42811947)

    587486     587486     587486  TABLE ACCESS BY INDEX ROWID MTL_SYSTEM_ITEMS_TL (cr=3436629 pr=3564 pw=0 time=64125044 us cost=5 size=34 card=1)
    587486     587486     587486   INDEX RANGE SCAN MTL_SYSTEM_ITEMS_TL_U1 (cr=2852930 pr=869 pw=0 time=45628505 us cost=4 size=0 card=1)(object id 136492495)
         1          1          1    TABLE ACCESS BY INDEX ROWID FINANCIALS_SYSTEM_PARAMS_ALL (cr=645351 pr=0 pw=0 time=5743158 us cost=2 size=10 card=1)
    322268     322268     322268     INDEX SKIP SCAN FINANCIALS_SYSTEM_PARAMS_U1 (cr=323083 pr=0 pw=0 time=5104895 us cost=1 size=0 card=1)(object id 42770563)

        10         10         10  TABLE ACCESS BY INDEX ROWID FND_LOOKUP_VALUES (cr=51 pr=1 pw=0 time=3620 us cost=5 size=60 card=1)
        20         20         20   INDEX RANGE SCAN FND_LOOKUP_VALUES_X99 (cr=31 pr=1 pw=0 time=2133 us cost=4 size=0 card=1)(object id 42759866)

    634276     634276     634276  NESTED LOOPS  (cr=3540930 pr=5535 pw=0 time=181518759 us cost=5 size=28 card=1)
    634276     634276     634276   TABLE ACCESS BY INDEX ROWID PO_REQ_DISTRIBUTIONS_ALL (cr=1631471 pr=5253 pw=0 time=65405333 us cost=3 size=12 card=1)
    634276     634276     634276    INDEX UNIQUE SCAN PO_REQ_DISTRIBUTIONS_U1 (cr=994522 pr=5252 pw=0 time=31023194 us cost=2 size=0 card=1)(object id 42788583)
    634276     634276     634276   TABLE ACCESS BY INDEX ROWID PO_REQUISITION_LINES_ALL (cr=1909459 pr=282 pw=0 time=115275921 us cost=2 size=16 card=1)
    634276     634276     634276    INDEX UNIQUE SCAN PO_REQUISITION_LINES_U1 (cr=944449 pr=268 pw=0 time=12285440 us cost=1 size=0 card=1)(object id 42789681)

    511989     511989     511989  NESTED LOOPS  (cr=3533763 pr=6 pw=0 time=8999321 us cost=5 size=55 card=1)
    511989     511989     511989   NESTED LOOPS  (cr=2850293 pr=6 pw=0 time=7086027 us cost=4 size=45 card=1)
    576055     576055     576055    TABLE ACCESS BY INDEX ROWID MTL_ABC_ASSIGNMENT_GROUPS (cr=612378 pr=0 pw=0 time=2002832 us cost=2 size=29 card=1)
    576055     576055     576055     INDEX UNIQUE SCAN MTL_ABC_ASSIGNMENT_GROUPS_U2 (cr=36323 pr=0 pw=0 time=951307 us cost=1 size=0 card=1)(object id 42783622)
    511989     511989     511989    TABLE ACCESS BY INDEX ROWID MTL_ABC_ASSIGNMENTS (cr=2237915 pr=6 pw=0 time=4672006 us cost=3 size=16 card=1)
    511989     511989     511989     INDEX UNIQUE SCAN MTL_ABC_ASSIGNMENTS_U1 (cr=1551490 pr=4 pw=0 time=2533524 us cost=2 size=0 card=1)(object id 42757737)
    511989     511989     511989   TABLE ACCESS BY INDEX ROWID MTL_ABC_CLASSES (cr=683470 pr=0 pw=0 time=1488045 us cost=1 size=10 card=1)
    511989     511989     511989    INDEX UNIQUE SCAN MTL_ABC_CLASSES_U1 (cr=171481 pr=0 pw=0 time=693745 us cost=0 size=0 card=1)(object id 42789694)

     13320      13320      13320  TABLE ACCESS BY INDEX ROWID GL_CODE_COMBINATIONS (cr=34801 pr=0 pw=0 time=802675 us cost=3 size=49 card=1)
     13320      13320      13320   INDEX UNIQUE SCAN GL_CODE_COMBINATIONS_U1 (cr=21481 pr=0 pw=0 time=397344 us cost=2 size=0 card=1)(object id 42775044)


    755164     755164     755164  HASH UNIQUE (cr=30147018 pr=179103 pw=0 time=1058922684 us cost=749257 size=197349453 card=482517)
    768890     768890     768890   HASH JOIN  (cr=7289842 pr=6926 pw=0 time=244582512 us cost=696202 size=197349453 card=482517)
    140451     140451     140451    TABLE ACCESS FULL PER_ALL_PEOPLE_F (cr=38207 pr=0 pw=0 time=313692 us cost=18484 size=13278261 card=428331)
    768890     768890     768890    NESTED LOOPS OUTER (cr=7251635 pr=6926 pw=0 time=242897348 us cost=672652 size=30016980 card=79410)
    755121     755121     755121     NESTED LOOPS OUTER (cr=5538283 pr=6031 pw=0 time=154841427 us cost=443987 size=28382903 card=78623)
    755121     755121     755121      NESTED LOOPS OUTER (cr=5508916 pr=6031 pw=0 time=153523676 us cost=443982 size=18184959 card=51809)
    755121     755121     755121       NESTED LOOPS OUTER (cr=5386279 pr=6031 pw=0 time=151985656 us cost=443978 size=11642422 card=34142)
    755121     755121     755121        NESTED LOOPS  (cr=5090949 pr=6031 pw=0 time=139220421 us cost=375644 size=11574138 card=34142)
    792959     792959     792959         NESTED LOOPS  (cr=1747964 pr=134 pw=0 time=64597738 us cost=109035 size=19934760 card=73560)
    254919     254919     254919          HASH JOIN OUTER (cr=315780 pr=6 pw=0 time=14811187 us cost=29121 size=5413350 card=22650)
    254919     254919     254919           NESTED LOOPS OUTER (cr=286919 pr=0 pw=0 time=12395919 us cost=13792 size=5209500 card=22650)
    254919     254919     254919            HASH JOIN RIGHT OUTER (cr=107134 pr=0 pw=0 time=12153146 us cost=13790 size=3868572 card=17426)
      3834       3834       3834             VIEW  HR_LOCATIONS (cr=3913 pr=0 pw=0 time=15826 us cost=125 size=360 card=60)
      3834       3834       3834              NESTED LOOPS  (cr=3913 pr=0 pw=0 time=15055 us cost=125 size=1080 card=60)
      3834       3834       3834               TABLE ACCESS FULL HR_LOCATIONS_ALL (cr=262 pr=0 pw=0 time=11211 us cost=125 size=304 card=38)
      3834       3834       3834               INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=3651 pr=0 pw=0 time=6183 us cost=0 size=20 card=2)(object id 42783719)
    254919     254919     254919             HASH JOIN RIGHT OUTER (cr=103221 pr=0 pw=0 time=11917174 us cost=13666 size=3764016 card=17426)
      3834       3834       3834              VIEW  HR_LOCATIONS (cr=3898 pr=0 pw=0 time=14651 us cost=125 size=360 card=60)
      3834       3834       3834               NESTED LOOPS  (cr=3898 pr=0 pw=0 time=14267 us cost=125 size=1080 card=60)
      3834       3834       3834                TABLE ACCESS FULL HR_LOCATIONS_ALL (cr=247 pr=0 pw=0 time=9532 us cost=125 size=304 card=38)
      3834       3834       3834                INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=3651 pr=0 pw=0 time=9539 us cost=0 size=20 card=2)(object id 42783719)
    254919     254919     254919              HASH JOIN RIGHT OUTER (cr=99323 pr=0 pw=0 time=11817243 us cost=13541 size=3659460 card=17426)
        45         45         45               INDEX RANGE SCAN FND_LOOKUP_VALUES_U1 (cr=21 pr=0 pw=0 time=614 us cost=4 size=49 card=1)(object id 63685210)
    254919     254919     254919               HASH JOIN RIGHT OUTER (cr=99302 pr=0 pw=0 time=11729251 us cost=13537 size=2805586 card=17426)
        59         59         59                INDEX RANGE SCAN FND_LOOKUP_VALUES_U1 (cr=20 pr=0 pw=0 time=445 us cost=4 size=49 card=1)(object id 63685210)
    254919     254919     254919                NESTED LOOPS  (cr=99282 pr=0 pw=0 time=11653162 us cost=13533 size=1951712 card=17426)
         1          1          1                 NESTED LOOPS OUTER (cr=116 pr=0 pw=0 time=113273 us cost=3 size=40 card=1)
         1          1          1                  NESTED LOOPS  (cr=113 pr=0 pw=0 time=113227 us cost=2 size=32 card=1)
         1          1          1                   INDEX UNIQUE SCAN HR_ALL_ORGANIZATION_UNTS_TL_PK (cr=110 pr=0 pw=0 time=113164 us cost=1 size=17 card=1)(object id 63680720)
         1          1          1                   TABLE ACCESS BY INDEX ROWID HR_ALL_ORGANIZATION_UNITS (cr=3 pr=0 pw=0 time=59 us cost=1 size=15 card=1)
         1          1          1                    INDEX UNIQUE SCAN HR_ORGANIZATION_UNITS_PK (cr=2 pr=0 pw=0 time=7 us cost=0 size=0 card=1)(object id 42789144)
         1          1          1                  TABLE ACCESS BY INDEX ROWID HR_LOCATIONS_ALL (cr=3 pr=0 pw=0 time=42 us cost=1 size=8 card=1)
         1          1          1                   INDEX UNIQUE SCAN HR_LOCATIONS_PK (cr=2 pr=0 pw=0 time=7 us cost=0 size=0 card=1)(object id 42797079)
    254919     254919     254919                 TABLE ACCESS BY INDEX ROWID PO_HEADERS_ALL (cr=99166 pr=0 pw=0 time=11505632 us cost=13530 size=1254672 card=17426)
    255397     255397     255397                  INDEX SKIP SCAN PO_HEADERS_ALL_X3 (cr=1753 pr=0 pw=0 time=725236 us cost=352 size=0 card=37674)(object id 42773719)
    254883     254883     254883            INDEX UNIQUE SCAN AP_TERMS_TL_U1 (cr=179785 pr=0 pw=0 time=183291 us cost=0 size=8 card=1)(object id 42798416)
    482528     482528     482528           TABLE ACCESS FULL AP_SUPPLIER_SITES_ALL (cr=28861 pr=6 pw=0 time=227983 us cost=13727 size=4323123 card=480347)
    792959     792959     792959          TABLE ACCESS BY INDEX ROWID PO_LINES_ALL (cr=1432184 pr=128 pw=0 time=53002963 us cost=5 size=96 card=3)
    793375     793375     793375           INDEX RANGE SCAN PO_LINES_U2 (cr=504726 pr=20 pw=0 time=17603112 us cost=2 size=0 card=5)(object id 42755253)
    755121     755121     755121         TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ALL (cr=3342985 pr=5897 pw=0 time=71357938 us cost=4 size=68 card=1)
   1138558    1138558    1138558          INDEX RANGE SCAN PO_LINE_LOCATIONS_N15 (cr=1707311 pr=5830 pw=0 time=37903421 us cost=3 size=0 card=2)(object id 63697005)
    723002     723002     723002        VIEW PUSHED PREDICATE  HR_LOCATIONS (cr=295330 pr=0 pw=0 time=11391536 us cost=2 size=2 card=1)
    723002     723002     723002         NESTED LOOPS  (cr=295330 pr=0 pw=0 time=11004720 us cost=2 size=18 card=1)
    723002     723002     723002          INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=146911 pr=0 pw=0 time=1391389 us cost=1 size=10 card=1)(object id 42783719)
    723002     723002     723002          TABLE ACCESS BY INDEX ROWID HR_LOCATIONS_ALL (cr=148419 pr=0 pw=0 time=9233363 us cost=1 size=8 card=1)
    723002     723002     723002           INDEX UNIQUE SCAN HR_LOCATIONS_PK (cr=117800 pr=0 pw=0 time=836734 us cost=0 size=0 card=1)(object id 42797079)
    755119     755119     755119       INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=122637 pr=0 pw=0 time=829404 us cost=0 size=20 card=2)(object id 42783719)
    755121     755121     755121      INDEX UNIQUE SCAN HR_LOCATIONS_ALL_TL_PK (cr=29367 pr=0 pw=0 time=716408 us cost=0 size=20 card=2)(object id 42783719)
    768883     768883     768883     TABLE ACCESS BY INDEX ROWID PO_DISTRIBUTIONS_ALL (cr=1713352 pr=895 pw=0 time=75314769 us cost=3 size=17 card=1)
    768883     768883     768883      INDEX RANGE SCAN PO_DISTRIBUTIONS_N1 (cr=1096671 pr=874 pw=0 time=24392643 us cost=2 size=0 card=1)(object id 42782429)

The plan is a bit long, but you may recall that a query with scalar subqueries in the select list reports the plans for each of the separate scalar subqueries before reporting the main query block – and I’ve inserted blank lines in the output above to improve the visibility of the individual blocks / scalar subqueries.

An odd little detail of this tkprof output was that there was no report of the wait information recorded against the query, though the following information appeared as the summary for the trace file, giving us a very good idea of the wait events for the individual query:

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        6      0.85       2.14          0          6          0           0
Execute      6      0.00       0.00          0          7        104          85
Fetch    50358    279.03    1059.39     179103   30146895          0      755329
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    50370    279.88    1061.54     179103   30146908        104      755414

Misses in library cache during parse: 3

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   50363        0.00          0.00
  SQL*Net message from client                 50362      157.17        227.70
  row cache lock                                141        0.03          0.67
  library cache lock                             77        0.01          0.21
  library cache pin                              75        0.01          0.27
  Disk file operations I/O                      791        0.00          0.01
  gc current block 3-way                     835881        0.15        305.35
  gc current block 2-way                     471360        0.24        144.04
  KJC: Wait for msg sends to complete            40        0.00          0.00
  gc cr multi block request                       8        0.00          0.00
  gc current block congested                  10014        0.03          4.23
  gc cr block 3-way                           20215        0.06          4.69
  gc current grant busy                          20        0.00          0.00
  gc cr grant 2-way                          165010        0.07         25.13
  db file sequential read                    179103        0.05        196.31
  gc cr grant congested                         729        0.19          0.36
  gc current block busy                       71431        0.05        118.15
  gc cr block 2-way                            1800        0.01          0.31
  latch free                                      3        0.00          0.00
  gc cr block congested                         197        0.01          0.06
  latch: cache buffers chains                    45        0.00          0.00
  latch: gc element                              15        0.00          0.00
  gc cr block busy                               15        0.02          0.07
  latch: object queue header operation            1        0.00          0.00
  KSV master wait                                 2        0.00          0.00
  ASM file metadata operation                     1        0.00          0.00
  SQL*Net more data to client                     1        0.00          0.00
  gc current grant 2-way                          6        0.00          0.00

An important initial observation is that the query returned 750,000 rows in 50,000 fetches (all figures rounded for convenience) and that’s consistent with the SQL*Plus default arraysize of 15. So there might be a little time saved by setting the arraysize to a larger value (but only a few 10s of seconds – based on the 227 seconds total minus the 157 second maximum wait for the “SQL*Net message from client” figures and there may be some benefit of increasing the SQL*Net SDU size at the same time). Critically, though, we should ask “why do you want a query to return 750,000 rows?”, and “how fast do you think is ‘reasonable’?” You’ll also note from the “gc” waits that the system is based on RAC with at least 3 nodes – and RAC is always a suspect when you see unexpected time spent in a query.

Where in the driving query block does most of the time go between the last hash join (line 62) and the hash unique (line 61) – it’s in the query block whose plan starts at line 28 where we see 163,000 physical blocks read (pr=) and 535 seconds (time= microseconds) of which 6,400 blocks come from the index range scan operation at line 32 but most comes from line 31 fetching 1 million rows (by index rowid) from table po_lines_locations_archive_all.

    672743     672743     672743  TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=5507736 pr=163043 pw=0 time=535914552 us cost=3 size=27 card=1)
    672743     672743     672743   INDEX UNIQUE SCAN PO_LINE_LOCATIONS_ARCHIVE_U1 (cr=4560824 pr=163043 pw=0 time=533161038 us cost=2 size=0 card=1)(object id 42811947)
    755121     755121     755121    SORT AGGREGATE (cr=3540960 pr=163043 pw=0 time=530079821 us)
   1040963    1040963    1040963     TABLE ACCESS BY INDEX ROWID PO_LINE_LOCATIONS_ARCHIVE_ALL (cr=3540960 pr=163043 pw=0 time=534243973 us cost=5 size=15 card=1)
   1776649    1776649    1776649      INDEX RANGE SCAN PO_LINE_LOCATIONS_ARCHIVE_U1 (cr=1123074 pr=6392 pw=0 time=37128373 us cost=3 size=0 card=2)(object id 42811947)

This part of the workload comes from 672,743 executions of the subquery starting at line 36 of the original query text:

           (SELECT pla.promised_date
              FROM apps.po_line_locations_archive_all pla
             WHERE     pla.po_header_id = pha.po_header_id
                   AND pla.po_line_id = pll.po_line_id
                   AND pla.line_location_id = ploc.line_location_id
                   AND pla.revision_num =
                       (SELECT MIN (revision_num)
                          FROM apps.po_line_locations_archive_all plla2
                         WHERE     plla2.promised_date IS NOT NULL
                               AND plla2.line_location_id =
                                   ploc.line_location_id))

If we want to improve the performance of this query with a minimum of re-engineering, recoding and risk then a good point to start would be to examine this query block in isolation and see if there is a simple, low-cost way of improving its efficiency. (Note: this may not be a route to optimising the whole query “properly”, but it may give a quick win that is “good enough”.)

We could go a little further down this route of optimising the scalar subqueries by looking at the time spent in each of them in turn. Taking out the top line of each of the separate sections of the plan and extracting just the pr, pw and time values (which I’ll scale back from microseconds to seconds) we get the following

pr=      0      pw=0    time=   0
pr=     28      pw=0    time=  23
pr=      0      pw=0    time=   2
pr=      0      pw=0    time=   2
pr= 163043      pw=0    time= 536
pr=   3564      pw=0    time=  64
pr=      1      pw=0    time=   0
pr=   5535      pw=0    time= 182
pr=      6      pw=0    time=   9
pr=      0      pw=0    time=   1

The 8th scalar subquery (line 42 in the plan, line 75 in the query) gives us an opportunity to reduce the run time by 182 seconds, so might be worth a little investment in programmer time.

The 6th subquery (line 34 in the plan, line 49 in the query) adds only 64 seconds to the run time, so we might be less inclined to do anything about it.

You might note that the 2nd, 3rd and 4th subqueries are against the same table with the same predicate to get three different columns – this group is the “obvious” choice for recoding as a single join rather than three separate subqueries, but if you look at the total times of the three subqueries the “extra” two executions add only two seconds each to the total time – so although this scalar subquery coding pattern is undesirable, it’s not necessarily going to be worth expending the effort to rewrite it in this case.

If you’re wondering, by the way, why different subqueries are reporting different numbers of rows returned (and each one should return at most one row on each execution), there are two reasons for any subquery to be reporting fewer than the 768,890 rows reported by the basic driving hash join:

an execution may simply return no rows,
there may be some benefits from scalar subquery caching.

One of the nice details about newer versions of Oracle is that the “starts” statistic is also reported in the trace/tkprof output so you would be able to see how much your query had benefited from scalar subquery caching.

If we add together the time reported by each of the scalar subquery sections of the plan the total time reported is approximately 819 seconds. Cross-checking with the difference in the times reported for operations 61 and 62 (hash unique of hash join) we see: 1,059 seconds – 245 seconds = 814 seconds. This is a good match (allowing for the accumulation of a large number of small errors) for the 819 seconds reported in the subqueries – so the hash unique isn’t a significant part of the query even though it has virtually no effect on the volume of data. You’ll note that it didn’t spill to disc (pw = 0) but completed in memory.

Summary

I’ve written a quick note on this query because the coding style was undesirable and the execution plan quite lengthy. I’ve reviewed how the style of the SQL is echoed in the shape of the plan. I’ve then pursued the idea of optimising the code “piece-wise” to see if there were any opportunities for improving the performance “enough” without going through the effort of a complete redesign of the query. [Update: One of the participants in the thread is currently walking through the mechanics of manually unnesting the most expensive scalar subquery into an outer join.]

Given the information in the Row Source Operation section of the tkprof output it proved easy to identify where the largest amounts of times appeared that might be reduced by localised optimsation.

In passing I pointed out the possibility of reducing the time spent on network traffic by increasing the array fetch size, and increasing the SDU size (at both ends of the connection) for the SQL*Net messages to client.

Footnote (addendum)

I made a passing reference to the waits that told us that the user was running RAC. These waits merit a few follow-up comments.

The numbers for “gc” waits are high. Of particular note are the 71,000 waits and 118 seconds waited on “gc current block busy” which wave a big red flag telling us that there’s too much DML modifying the same object(s) from multiple nodes at the same time. (The even larger numbers for the “gc current block 2/3- way” say the same, but “busy” really emphasises the “hot-spot” aspect of the problem.)

Ideally we would like to see exactly where in the execution plan the bulk of those waits is occurring and, since the OP has been able to supply a trace file for the query, it’s possible that the query can be re-run to produce the SQL Monitor report (if the OP is suitably licenced) that summarises the Active Session History (ASH) for each line of the plan.

If the ASH data were available for a run of the report we could then do some analysis of parameter values recorded in v$active_session_history to see if that supplied further information. Unfortunately the view v$event_name doesn’t tell us what the parameter values mean for most of the “gc current%” waits, but a couple of the ones that do have descriptions report parameter1 as the file# and parameter2 as the block#, so maybe that’s true for many of them. (For some of the waits parameter1 is listed as the “le” (lock element), which doesn’t really help very much.)

Another diagnostic that could be very helpful is to take a snapshot of the session activity stats (v$sesstat) for the session as this might tell us that part of the load comes from “unexpected” work going on. In particular if we do an analysis of the “db file sequential read” waits we may find that many of the waits are for blocks in the undo tablespace, which would prompt us to examine the session stats to see what they tell us through the “% – undo records applied” statistics.

As a final comment – linking the “gc” activity back to my comment “That’s a lot of scalar subqueries so it’s worth asking whether the code should be rewritten to use joins” – if you add an extra table to a query with a simple join condition to add columns from that table to the query select list then Oracle can pin various index blocks; if you replace the join with a scalar subquery (which means you’re after just one column from one row each time) then Oracle has to pin and release all the index blocks on each call to the subquery. The benefit of the subquery approach is that scalar subquery caching may mean the subquery is rarely executed (check the highly suggestive stats in the plan for the first and seventh scalar subquery blocks – lines 16 and 39); the downside to the subquery approach is that you may end up spending a lot more time in buffer cache handling which, for RAC, includes the global cache (gc) management.

Comments (1)

July 26, 2022

Hinting

Filed under: Execution plans,Hints,Oracle,Upgrades — Jonathan Lewis @ 1:05 pm BST Jul 26,2022

This is just a lightweight note on the risks of hinting (which might also apply occasionally to SQL Plan Baselines). I’ve just rediscovered a little script I wrote (or possibly last tested/edited) in 2007 with a solution to the problem of how to structure a query to use an “index fast full scan” (index_ffs) following by a “table access by rowid” – a path that is not available to the optimizer for select statements (even when hinted) though it became available (sometimes inappropriately) for deletes and updates in 12c.

It’s possible that this method was something I designed for a client using 9i, but the code still behaves as expected in 11.1.0.7. Here’s the setup and query:

rem
rem     Script:         wildcard.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2007
rem
rem     Last tested
rem             11.1.0.7
rem

create table t1
as
select
        cast(dbms_random.string('a',8) as varchar2(8))  str,
        rpad('x',100)                                   padding
from
        all_objects
where
        rownum <= 10000
;

alter table t1 modify str not null;
create index t1_i1 on t1(str);

begin
        dbms_stats.gather_table_stats(
                user, 't1', 
                cascade => true,
                method_opt => 'for all columns size 1'
        );
end;
/

explain plan for
select  
        /*+ 
                qb_name(main) 
                unnest(@subq1)
                leading(@sel$2fc9d0fe t1@subq1 t1@main)
                index_ffs(@sel$2fc9d0fe t1@subq1(t1.str))
                use_nl(@sel$2fc9d0fe t1@main)
                rowid(@sel$2fc9d0fe t1@main)
        */
        * 
from    t1 
where   rowid in (
                select  /*+ qb_name(subq1) */
                        rowid 
                from    t1 
                where   upper(str) like '%CHD%'
)
;

select * from table(dbms_xplan.display(format=>'outline alias'));

As you can see, I’ve got an IN subquery (query block subq1) to generate a list of rowids from the table for the rows that match my predicate and then my main query (query block main) selects the rows identified by that list.

I’ve added hints to the main query block to unnest the subquery (which will result in a new query block appearing) then do a nested loop from the t1 referenced in subq1 (t1@subq1) to the t1 referenced in main (t1@main), starting with an index fast full scan of t1@subq1 and accessing t1@main by rowid.

The unnest hint was actually redundant – unnesting happened automatically and uncosted. You’ll notice all the other hints are directed at a query block called sel$2fc9d0fe which is the resulting query block name when subq1 is unnested into main.

Here’s the resulting execution plan showing, amongst other details in the Outline Data, that this really was running on 11.1.0.7

Plan hash value: 1953350015

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   500 | 65500 |   509   (0)| 00:00:07 |
|   1 |  NESTED LOOPS               |       |   500 | 65500 |   509   (0)| 00:00:07 |
|*  2 |   INDEX FAST FULL SCAN      | T1_I1 |   500 | 10500 |     9   (0)| 00:00:01 |
|   3 |   TABLE ACCESS BY USER ROWID| T1    |     1 |   110 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$2FC9D0FE
   2 - SEL$2FC9D0FE / T1@SUBQ1
   3 - SEL$2FC9D0FE / T1@MAIN

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      USE_NL(@"SEL$2FC9D0FE" "T1"@"MAIN")
      LEADING(@"SEL$2FC9D0FE" "T1"@"SUBQ1" "T1"@"MAIN")
      ROWID(@"SEL$2FC9D0FE" "T1"@"MAIN")
      INDEX_FFS(@"SEL$2FC9D0FE" "T1"@"SUBQ1" ("T1"."STR"))
      OUTLINE(@"SUBQ1")
      OUTLINE(@"MAIN")
      UNNEST(@"SUBQ1")
      OUTLINE_LEAF(@"SEL$2FC9D0FE")
      ALL_ROWS
      DB_VERSION('11.1.0.7')
      OPTIMIZER_FEATURES_ENABLE('11.1.0.7')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(UPPER("STR") LIKE '%CHD%')

Running the test under 19.11.0.0 (and adding the hint_report option to the dbms_xplan format) this is the resulting plan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 | 55000 |    47   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   500 | 55000 |    47   (0)| 00:00:01 |
--------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$48592A03 / T1@MAIN

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$48592A03" "T1"@"MAIN")
      OUTLINE(@"SUBQ1")
      OUTLINE(@"MAIN")
      ELIMINATE_SQ(@"SUBQ1")
      OUTLINE_LEAF(@"SEL$48592A03")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("T1"."STR") LIKE '%CHD%')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 5 (U - Unused (1), N - Unresolved (4))
---------------------------------------------------------------------------
   0 -  SEL$2FC9D0FE
         N -  index_ffs(@sel$2fc9d0fe t1@subq1(t1.str))
         N -  leading(@sel$2fc9d0fe t1@subq1 t1@main)
         N -  rowid(@sel$2fc9d0fe t1@main)
         N -  use_nl(@sel$2fc9d0fe t1@main)

   0 -  SUBQ1
         U -  unnest(@subq1)

Clearly the plan has changed – but the hint report says that Oracle has NOT ignored my hints; instead it tells us that they cannot be resolved. If we check the Query Block Name / Object Alias list and the Outline Data we see why: there is no query block named @sel$2fc9d0fe and the reason it doesn’t exist is that the optimizer has applied a previously non-existent transformation ‘eliminate_sq’ (which appeared in 12c) to subq1.

So, on the upgrade from 11.1.0.7 to 19.11.0.0 an SQL Plan Baseline that forced the path we wanted would no longer work (though it might be reported as “applied”) because there is a new transformation that we had (necessarily) not been blocking.

The solution is easy: add the hint no_eliminate_sq(@subq1) to our query and try again.

We still get the full tablescan even though the hint report tells us that the optimizer used the new hint. Here’s the new Outline Data, and the Hint Report showing that the hint was used.

  Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$8C456B9A" "T1"@"SUBQ1")
      OUTLINE(@"SUBQ1")
      OUTLINE(@"MAIN")
      UNNEST(@"SUBQ1")
      OUTLINE(@"SEL$2FC9D0FE")
      ELIMINATE_JOIN(@"SEL$2FC9D0FE" "T1"@"MAIN")
      OUTLINE_LEAF(@"SEL$8C456B9A")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("STR") LIKE '%CHD%')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 7 (U - Unused (4))
---------------------------------------------------------------------------
   0 -  SUBQ1
           -  no_eliminate_sq(@subq1)
           -  qb_name(subq1)

   1 -  SEL$8C456B9A
         U -  leading(@sel$2fc9d0fe t1@subq1 t1@main)
           -  qb_name(main)

   1 -  SEL$8C456B9A / T1@MAIN
         U -  rowid(@sel$2fc9d0fe t1@main)
         U -  use_nl(@sel$2fc9d0fe t1@main)

   1 -  SEL$8C456B9A / T1@SUBQ1
         U -  index_ffs(@sel$2fc9d0fe t1@subq1(t1.str))

But now the Outline Data is showing us a new hint – eliminate_join(@sel$2fc9dofe t1@main). So we’re not losing the subquery, but we’ve lost the join thanks to a transformation that was actually available in 10.2 but presumably couldn’t be applied to our code pattern until at least 12.1. So let’s try again adding in another blocking hint no_eliminate_join(@sel$2fc9dofe t1@main).

We still get the full tablescan – and this time the Outline Data tells us that the problem hint is now eliminate_join(@sel$2fc9dofe t1@subq1) – which we might have anticipated, and now address by adding no_eliminate_join(@sel$2fc9dofe t1@subq1) to the query and having one more go. This finally gets us back to the path that we had previously seen in 11.1.0.7.

(In fact, adding the hint optimizer_features_enable(‘11.1.0.’) to the original set of hints would – in this case, at least – have been enough to get the original execution plan.)

Summary

This note is just another simple demonstration that hints do not guarantee plan stability across upgrades – and then showing that it can take a few experimental steps to discover what’s new in the optimizer that is making your previous set of hints ineffective.

Typically the problem will be the availability of new transformations (or enhancements to existing transformations) which manage to invalidate the old hints before the optimizer has had a chance to consider them. This is (to some extent) why a SQL Plan Baseline always captures the value of optimiser_features_enable() as part of the baseline.

Comments (8)

July 22, 2022

Trim CPU

Filed under: Execution plans,Hash Join,Joins,Oracle,Performance,Problem Solving — Jonathan Lewis @ 6:56 am BST Jul 22,2022

Prompted by an unexpectedly high CPU usage on a hash join of two large dadta sets Stefan Koehler posted a poll on twitter recently asking for opinions on the following code fragment:

FROM
    TAB1
INNER JOIN TAB2 ON
    TAB1.COL1 = TAB2.COL1
AND TRIM(TAB1.COL3) > TRIM(TAB2.COL3)

While I struggle to imagine a realistic business requirement for the second predicate and think it’s indicative of a bad data model, I think it is nevertheless quite instructive to use the example to show how a hash join can use a lot of CPU if the join includes a predicate that isn’t on equality.

Trivia

Before examining the potential for wasting CPU, I’ll just point out two problems with using the trim() function in this way – because (while I hope that col3 is character string in both tables) I’ve seen code that uses “to_date(to_char(date_column))” instead of trunc(date_column):

Cut-n-paste from SQL*Plus:

SQL> select 1 from dual where trim(100) > trim(20);

no rows selected

==================================================================

SQL> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';

SQL> select d1, d2 from t2 where trim(d1) > trim(d2);

20-jul-2022 15:24:46 19-aug-2022 15:26:44

1 row selected.

SQL> alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';

SQL> select d1, d2 from t2 where trim(d1) > trim(d2);

no rows selected

The trim() function converts numerics and dates to strings using the default format for the session before the comparison takes place, so not only can you get unexpected (i.e. wrong) results, two users can get contradictory results from the same data at the same time because they’ve specified different session defaults!

The CPU issue

The critical point that everyone should remember is this: hash joins can only operate on equality (though, to avoid ambiguity, one should point out that “equality” does also mean “not equals”, which is why hash anti-joins can be efficient).

This means that even though the clause “where tab1.col1 = tab2.col1 and tab1.col3 > tab2.col3” might specify the matching rows for an individual tab1 row with high precision and great efficiency for a nested loop join with the right index, a hash join has a completely different workload. Every tab1 row has to have its col3 compared with every tab2 row that matches on col1. The secondary tests multiply up to “n-squared”, and if any col1 value is is highly repetitive then the work done on checking col3 becomes excessive.

It’s easier to see this in a worked example, so here’s some sample data:

rem
rem     Script:         trim_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table tab1 as select * from all_Objects where owner != 'PUBLIC' and object_type != 'SYNONYM' and rownum <= 200;

create table tab2 as select * from all_Objects where owner != 'PUBLIC' and object_type != 'SYNONYM';

On a new pdb in 19.11 and 21.3 the second statement gave me roughly 46,000 rows. checking owners and row counts I got the following results:

SQL> select owner, count(*) from tab1 group by owner;

OWNER                      COUNT(*)
------------------------ ----------
SYS                             128
SYSTEM                           65
OUTLN                             7

SQL> select owner, count(*) from tab2 group by owner;

OWNER                      COUNT(*)
------------------------ ----------
SYS                           40104
SYSTEM                          417
OUTLN                             7

... plus about 17 rows aggregating 6,000 rows

And here’s the query (indicating 4 variations) that I’m going to use to demonstrate the CPU issue, followed by its execution plan and rowsource_execution_statistics:

set serveroutput off
alter session set statistics_level = all;

select
        count(*)
from
        tab1
inner join 
        tab2 
on
        tab1.owner = tab2.owner
-- and  trim(tab1.object_name)  > trim(tab2.object_name)
-- and  rtrim(tab1.object_name) > rtrim(tab2.object_name)
-- and  ltrim(tab1.object_name) > ltrim(tab2.object_name)
and     tab1.object_name > tab2.object_name
;

select * from table(dbms_xplan.display_cursor(format=>'projection allstats last'));

SQL_ID  74m49y5av3mpg, child number 0
-------------------------------------
select  count(*) from  tab1 inner join  tab2 on  tab1.owner =
tab2.owner -- and trim(tab1.object_name)  > trim(tab2.object_name) -- and rtrim(tab1.object_name) > rtrim(tab2.object_name) 
-- and ltrim(tab1.object_name) > ltrim(tab2.object_name) and tab1.object_name > tab2.object_name

Plan hash value: 2043035240

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.39 |     942 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.39 |     942 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    101K|    329K|00:00:00.39 |     942 |  1335K|  1335K|  814K (0)|
|   3 |    TABLE ACCESS FULL| TAB1 |      1 |    200 |    200 |00:00:00.01 |       5 |       |       |          |
|   4 |    TABLE ACCESS FULL| TAB2 |      1 |  46014 |  46014 |00:00:00.01 |     937 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("TAB1"."OWNER"="TAB2"."OWNER")
       filter("TAB1"."OBJECT_NAME">"TAB2"."OBJECT_NAME")

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]
   2 - (#keys=1; rowset=407)
   3 - (rowset=256) "TAB1"."OWNER"[VARCHAR2,128], "TAB1"."OBJECT_NAME"[VARCHAR2,128]
   4 - (rowset=256) "TAB2"."OWNER"[VARCHAR2,128], "TAB2"."OBJECT_NAME"[VARCHAR2,128]

Comparing the basic colums the CPU time recorded at the Hash Join operation was 0.39 seconds, of which only a tiny amount was in the feeding tablescans. There are two things to note from the plan.

First is confirmation of my comments about the join having to be an equality and the inequality being applied later. You can see this in the Predicate Information in the way the user’s predicate list has been split at operation 2 into an access() predicate and a filter() predicate. The access predicate finds the right hash bucket and row(s) within bucket – the filter predicate is applied as a secondary test.

The second point to note is that the Column Projection Information shows us that the basic column values are passed up to the Hash Join, which tells us that the hash join operation has to do the trimming. The big question at that point is – how many times does the same value from the same incoming row get trimmed.

Remember that there are 128 rows in tab1 where where owner = ‘SYS’, so when a ‘SYS’ row arrives from tab2 the hash join has to find the right bucket then walk through the rows in that bucket (which will probably be nothing but those SYS rows). So how many times does Oracle evaluate trim(SYS). Arguably it needs to for each tab1 row in the bucket (though the hash table might have been built to include the trimmed value) but clearly it ought not to re-evaluate it 128 times for the column in the single tab2 row – and we’ll come back to that point later.

Let’s go back to the 3 variants on the first test. We were interested in the comparing trim() with trim(), but since trim() is equilavent to ltrim(rtrim()) I wondered if ltrim (left trim) and rtrim (right trim) took different amount of time, and whether the trim() time would be close to the sum of ltrim() time and rtrim() time.

Without showing the plans etc. here are the time reported in my 19.11.0.0 test at the hash join operation (the 21.3 times were very similar):

no trim – 0.39 seconds
ltrim() – 1.02 seconds
rtrim() – 2.66 seconds
trim() – 2.70 seconds.

Clearly that’s a lot of extra CPU on top of the base CPU cost. This is not entirely surprising since string operations tend to be expensive, neverthless the differences are large enough to be more than random fluctuations and operational error.

Remember that this is just two tables of 200 and 46,000 rows respectively. It turned out that the rowsources that Stefan was using were in the order of 800K and 2M rows – with CPU time increasing from 1,100 seconds to 2,970 seconds because of the trim().

So how many times was the trim() function called in total?

Faking it.

If we assume that the trim() built-in SQL function behaves in the same way as a deterministic PL/SQL function we can at least count the number of calls that take place by writing a deterministic function to put into the SQL. Something like:

create or replace package p1 as
        n1 number;
        function f1(v1 in varchar2) return varchar2 deterministic;
end;
/

create or replace package body p1 as 

        function f1 (v1 in varchar2)
        return varchar2 
        deterministic
        is
        begin
                p1.n1 := p1.n1 + 1;
                return trim(v1);
        end;

end;
/

set serveroutput off
alter session set statistics_level = all;

exec p1.n1 := 0

select
        count(*)
from
    tab1
inner join tab2 on
    tab1.owner = tab2.owner
and     p1.f1(tab1.object_name) > p1.f1(tab2.object_name)
-- and  p1.f1(tab1.object_name) > trim(tab2.object_name)
-- and  trim(tab1.object_name)  > p1.f1(tab2.object_name)
;

select * from table(dbms_xplan.display_cursor(format=>'projection allstats last'));

set serveroutput on
execute dbms_output.put_line(p1.n1);

I’ve created a package with a public variable n1 so that I can set it and read it from “outside”, then I’ve created (and lied about) a function that increments that variable and returns its input, claiming that it’s deterministic. Once I’ve got the package in place I’ve:

set the variable to zero
run a query that does one of
- use my function twice
- use my function once – on the build table
- use my function once – on the probe table
report the execution plan with stats
print the value of the variable

The timings are not really important, but here’s the execution plan when I used the function on both sides of the inequality:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:21.15 |    1513 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:21.15 |    1513 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |  23007 |    329K|00:00:21.13 |    1513 |  1335K|  1335K|  860K (0)|
|   3 |    TABLE ACCESS FULL| TAB1 |      1 |    200 |    200 |00:00:00.01 |       5 |       |       |          |
|   4 |    TABLE ACCESS FULL| TAB2 |      1 |  46014 |  46014 |00:00:00.02 |     937 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TAB1"."OWNER"="TAB2"."OWNER")
       filter("P1"."F1"("TAB1"."OBJECT_NAME")>"P1"."F1"("TAB2"."OBJECT_NAME"))

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT(*)[22]
   2 - (#keys=1)
   3 - (rowset=256) "TAB1"."OWNER"[VARCHAR2,128], "TAB1"."OBJECT_NAME"[VARCHAR2,128]
   4 - "TAB2"."OWNER"[VARCHAR2,128], "TAB2"."OBJECT_NAME"[VARCHAR2,128]

Apart from the change of function name the plan is the same – although it now takes over 21 CPU seconds to complete, of which most of the time is probably building and tearing down the PL/SQL stack. The important figure, though is the number of function calls I saw recorded in p1.n1: it was a little over 10 million calls to generate the 329 thousand rows (A-Rows for the hash join).

When I ran the code with only one call to my deterministic function it was called 5 million times regardless of whether it was used for the build or probe table. Oracle did nothing to minimise the number of times the function was called.

Predictive Mode

Near the start of this note I showed you a little query to aggregate the rows of the two tables by owner; with a little enhancement I can reuse that code to show you exactly how many times the deterministic function was called:

select
        v1.owner, ct1, ct2, ct1 * ct2, sum(ct1 * ct2) over() tot_ct
from
        (select owner, count(object_name) ct1 from tab1 group by owner) v1,
        (select owner, count(object_name) ct2 from tab2 group by owner) v2
where
        v2.owner = v1.owner
/

OWNER                  CT1        CT2    CT1*CT2     TOT_CT
--------------- ---------- ---------- ---------- ----------
SYS                    128      40104    5133312    5160466
SYSTEM                  65        417      27105    5160466
OUTLN                    7          7         49    5160466

3 rows selected.

The number of comparisons done by the filter() predicate 5,160,466: double it to get the number of function calls. For every single one of the 40,104 SYS rows in tab2 the function was called for every single one of the SYS rows in tab1, for both sides of the inequality.

It’s a shame that Oracle doesn’t calculate and project the “virtual columns” that will be used in the join predicates, because in my case that would have reduced the number of calls from 10 million to 40,232 – a factor of roughly 250. That would probably be worth a lot of CPU to Stefan.

Damage Limitation

For my silly little query that went from 0.39 seconds to 2.70 seconds you might decide there’s no point in trying to improve things – in fact many of the sites I’ve visited probably wouldn’t even notice the CPU wastage (on one call); but when the query runs for 2,970 seconds and a little fiddling around shows that it could run in 1,100 seconds you might be inclined to see if there’s something you could do improve things.

Andrew Sayer suggested the possibility of rewriting the query with a pair of CTEs (“with” subqueries) that were forced to materialize the trim() in the CTE. The cost of physically creating the two large GTTs might well be much less than the CPU spent on the trim()ed join.

Alternatively – and dependent on the ownership and quality of the application – you could write a check constraint on each table to ensure that the column value was always equal to the trim() of the column value.

A similar option would be to add an (invisible) column to each table and use a trigger to populate the column with the trimmed value and then use the trimmed column in the query.

Conclusion

I don’t think that anything I’ve done or described in this note could be called rocket science (or telescope science as, perhaps, it should be in honour of Webb); but it has shown how much insight you can gain into what Oracle is doing and how you may be able to pin-point excess work using a few simple mechanisms that have been around for more than 15 years.

Comments (4)

March 12, 2024

December 21, 2023

Workaround

Best Guess

Post script

November 20, 2023

Summary

November 6, 2023

Frivolous Footnote

November 1, 2023

Summary

Addendum

October 27, 2023

plan_stat=adaptive

Is it good enough?

October 18, 2023

“Cosmetic” effects

Further Observations

Summary

October 5, 2023

Workaround

Summary

September 5, 2023

Investigation

Resolution

Summary

June 26, 2023

Analysis of subquery

Option 1

Option 2

Option 3

Option 4

Conclusion

Footnote

January 6, 2023

Things we don’t know

Basic threats

Observations and Guesswork

Joins

Conclusion

November 15, 2022

Summary

Footnote

September 29, 2022

Summary

Footnote (addendum)

July 26, 2022

Summary

July 22, 2022

Trivia

The CPU issue

Faking it.

Predictive Mode

Damage Limitation

Conclusion

Click on the motto below

Search this blog

Categories

Special Links

Recent Posts

Recent Comments

Popular articles

Popular References

Archives

Blogroll

Posts by RSS