Oracle Scratchpad

April 13, 2022

Adaptive Joins

Filed under: CBO,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 1:53 pm BST Apr 13,2022

There’s a question on the Oracle Forums at the moment about a query that’s taking a long time to parse. Even after being reduced to one prebuilt (currently remote) table with two non-correlated outer joins to it, the parse time is several hundred seconds. This seems fairly bizarre – I have seen some very long parse times from Oracle, but 10 minutes for 3 tables is well over the top; it did remind me, though, of a note I started a few years ago about a 4 table join taking 4 seconds to parse, so I thought I’d present the query, the plan, and a little chat on debugging. Here’s the query:

select
        /*+ parallel(t4,3) */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

I’m expecting a simple cascade of hash joins, with t1, t2 and t3 – the “small” tables – turning into “build” tables, then t4 – the “large” table – passing through each of them in turn until the penultimate rowsource is aggregated.

Here’s the execution plan – which looks pretty much as I expected it to – but there’s something wrong about it that isn’t visible in the output. Why is the query (plan) saying it took 0.07 seconds to complete (A-time), returning only 60 rows, when my SQL*Plus session didn’t return any data for 4 seconds?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   300 (100)|          |        |      |            |     60 |00:00:00.07 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR                |          |      1 |        |            |          |        |      |            |     60 |00:00:00.07 |       5 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)          | :TQ10004 |      0 |      1 |   300   (4)| 00:00:01 |  Q1,04 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    HASH GROUP BY               |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,04 | PCWP |            |     60 |00:00:00.01 |       0 |      0 |  1394K|  1394K|     3/0/0|
|   4 |     PX RECEIVE                 |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,04 | PCWP |            |    180 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND HASH              | :TQ10003 |      0 |      1 |   300   (4)| 00:00:01 |  Q1,03 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY            |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,03 | PCWP |            |    180 |00:00:00.14 |    6114 |   6018 |  1394K|  1394K|     3/0/0|
|*  7 |        HASH JOIN               |          |      3 |   8460 |   299   (4)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.14 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|   8 |         JOIN FILTER CREATE     | :BF0000  |      3 |     22 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     75 |00:00:00.02 |       0 |      0 |       |       |          |
|   9 |          PX RECEIVE            |          |      3 |     22 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     75 |00:00:00.02 |       0 |      0 |       |       |          |
|  10 |           PX SEND BROADCAST    | :TQ10000 |      0 |     22 |     2   (0)| 00:00:01 |  Q1,00 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  11 |            PX SELECTOR         |          |      3 |        |            |          |  Q1,00 | SCWC |            |     25 |00:00:00.01 |       3 |      0 |       |       |          |
|* 12 |             TABLE ACCESS FULL  | T3       |      1 |     22 |     2   (0)| 00:00:01 |  Q1,00 | SCWP |            |     25 |00:00:00.01 |       3 |      0 |       |       |          |
|* 13 |         HASH JOIN              |          |      3 |  27322 |   297   (4)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.10 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|  14 |          JOIN FILTER CREATE    | :BF0001  |      3 |     21 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     72 |00:00:00.01 |       0 |      0 |       |       |          |
|  15 |           PX RECEIVE           |          |      3 |     21 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     72 |00:00:00.01 |       0 |      0 |       |       |          |
|  16 |            PX SEND BROADCAST   | :TQ10001 |      0 |     21 |     2   (0)| 00:00:01 |  Q1,01 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  17 |             PX SELECTOR        |          |      3 |        |            |          |  Q1,01 | SCWC |            |     24 |00:00:00.01 |       3 |      0 |       |       |          |
|* 18 |              TABLE ACCESS FULL | T2       |      1 |     21 |     2   (0)| 00:00:01 |  Q1,01 | SCWP |            |     24 |00:00:00.01 |       3 |      0 |       |       |          |
|* 19 |          HASH JOIN             |          |      3 |  92953 |   294   (3)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.10 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|  20 |           JOIN FILTER CREATE   | :BF0002  |      3 |     19 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     63 |00:00:00.01 |       0 |      0 |       |       |          |
|  21 |            PX RECEIVE          |          |      3 |     19 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     63 |00:00:00.01 |       0 |      0 |       |       |          |
|  22 |             PX SEND BROADCAST  | :TQ10002 |      0 |     19 |     2   (0)| 00:00:01 |  Q1,02 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  23 |              PX SELECTOR       |          |      3 |        |            |          |  Q1,02 | SCWC |            |     21 |00:00:00.01 |       3 |      0 |       |       |          |
|* 24 |               TABLE ACCESS FULL| T1       |      1 |     19 |     2   (0)| 00:00:01 |  Q1,02 | SCWP |            |     21 |00:00:00.01 |       3 |      0 |       |       |          |
|  25 |           JOIN FILTER USE      | :BF0000  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  26 |            JOIN FILTER USE     | :BF0001  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  27 |             JOIN FILTER USE    | :BF0002  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  28 |              PX BLOCK ITERATOR |          |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWC |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|* 29 |               TABLE ACCESS FULL| T4       |     48 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.06 |    6114 |   6018 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
  12 - SEL$1 / T3@SEL$1
  18 - SEL$1 / T2@SEL$1
  24 - SEL$1 / T1@SEL$1
  29 - SEL$1 / T4@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      OPT_PARAM('_fix_control' '16923858:5')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "T4"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T3"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T4"@"SEL$1" "T2"@"SEL$1" "T3"@"SEL$1")
      USE_HASH(@"SEL$1" "T4"@"SEL$1")
      USE_HASH(@"SEL$1" "T2"@"SEL$1")
      USE_HASH(@"SEL$1" "T3"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T4"@"SEL$1" BROADCAST NONE)
      PX_JOIN_FILTER(@"SEL$1" "T4"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T2"@"SEL$1" NONE BROADCAST)
      PX_JOIN_FILTER(@"SEL$1" "T2"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T3"@"SEL$1" NONE BROADCAST)
      PX_JOIN_FILTER(@"SEL$1" "T3"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "T2"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "T3"@"SEL$1")
      GBY_PUSHDOWN(@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T3"."ID"="T4"."ID3")
  12 - filter((TO_NUMBER("T3"."SMALL_VC")=1 OR TO_NUMBER("T3"."SMALL_VC")=2 OR TO_NUMBER("T3"."SMALL_VC")=3 OR TO_NUMBER("T3"."SMALL_VC")=4 OR TO_NUMBER("T3"."SMALL_VC")=5))
  13 - access("T2"."ID"="T4"."ID2")
  18 - filter((TO_NUMBER("T2"."SMALL_VC")=1 OR TO_NUMBER("T2"."SMALL_VC")=2 OR TO_NUMBER("T2"."SMALL_VC")=3 OR TO_NUMBER("T2"."SMALL_VC")=4))
  19 - access("T1"."ID"="T4"."ID1")
  24 - filter((TO_NUMBER("T1"."SMALL_VC")=1 OR TO_NUMBER("T1"."SMALL_VC")=2 OR TO_NUMBER("T1"."SMALL_VC")=3))
  29 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER_LIST(SYS_OP_BLOOM_FILTER(:BF0002,"T4"."ID1"),SYS_OP_BLOOM_FILTER(:BF0001,"T4"."ID2"),SYS_OP_BLOOM_FILTER(:BF0000,"T4"."ID3")))


One possible trouble-shooting step is simply to re-run the query, taking a snapshot of the session activity stats (v$mystat) and the session events (v$session_event) to see what they tell you (if anything) – here’s the critical extract from the session stats:

Name                                      Value
----                                      -----
CPU used when call started                  423
CPU used by this session                    429
parse time cpu                              411
parse time elapsed                          419
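
Those figures came from v$mystat via a snapshot package, but if you don’t have one to hand the relevant numbers can be pulled directly – a sketch of the underlying query, to be run before and after the test so that you can subtract one set of values from the other:

select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
and     sn.name in (
                'CPU used when call started',
                'CPU used by this session',
                'parse time cpu',
                'parse time elapsed'
        )
;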

Most of the time is parse time, spent on the CPU. (If this had been a much larger scale problem and had occurred in the recent past I might have looked at ASH (v$active_session_history) for any samples for the correct SQL_ID, and seen the problem revealed in the in_parse and in_hard_parse columns.)
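
That ASH check would be along the following lines – a sketch, using a SQL*Plus substitution variable rather than inventing a sql_id:

select  sample_time, session_id, in_parse, in_hard_parse
from    v$active_session_history
where   sql_id = '&m_sql_id'
order by
        sample_time
;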

So let’s enable event 10053 and run the query again – but since it’s “only” 4 seconds, let’s tweak the timer option to report any step that took longer than 0.1 seconds. The default timer setting is a whole second (10^6 microseconds), so we set the fix-control to 5 to get 0.1 seconds (10^5 microseconds).

alter session set "_fix_control"='16923858:5';
alter session set events '10053 trace name context forever';

-- run the query, find the trace file

 grep TIMER or19_ora_23370.trc 
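
If you’re not sure which trace file to pick up, the session can report its own trace file name before you start – a sketch:

select  value
from    v$diag_info
where   name = 'Default Trace File'
;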

Here’s the output from the call to grep: it looks like group by placement (GBP) is causing a problem.

TIMER:  GBP: costing SEL$1 cpu: 0.303 sec elapsed: 0.309 sec
TIMER: GBP: iteration (#1) SEL$1 cpu: 0.303 sec elapsed: 0.309 sec
TIMER:  GBP: costing SEL$1565E019 cpu: 0.293 sec elapsed: 0.298 sec
TIMER: GBP: iteration (#2) SEL$1565E019 cpu: 0.294 sec elapsed: 0.299 sec
TIMER:  GBP: costing SEL$23EAFE84 cpu: 0.528 sec elapsed: 0.533 sec
TIMER: GBP: iteration (#3) SEL$23EAFE84 cpu: 0.528 sec elapsed: 0.533 sec
TIMER:  GBP: costing SEL$B5D97CA0 cpu: 0.533 sec elapsed: 0.540 sec
TIMER: GBP: iteration (#4) SEL$B5D97CA0 cpu: 0.534 sec elapsed: 0.540 sec
TIMER:  GBP: costing SEL$6C9B46B6 cpu: 0.531 sec elapsed: 0.531 sec
TIMER: GBP: iteration (#5) SEL$6C9B46B6 cpu: 0.531 sec elapsed: 0.532 sec
TIMER:  GBP: costing SEL$ED1298E3 cpu: 0.522 sec elapsed: 0.523 sec
TIMER: GBP: iteration (#8) SEL$ED1298E3 cpu: 0.523 sec elapsed: 0.524 sec
TIMER:  GBP: costing SEL$5968095A cpu: 0.523 sec elapsed: 0.523 sec
TIMER: GBP: iteration (#9) SEL$5968095A cpu: 0.524 sec elapsed: 0.523 sec
TIMER:  GBP: costing SEL$4CA81688 cpu: 0.525 sec elapsed: 0.527 sec
TIMER: GBP: iteration (#12) SEL$4CA81688 cpu: 0.526 sec elapsed: 0.528 sec
TIMER: Group-By Placement SEL$1 cpu: 3.766 sec elapsed: 3.793 sec
TIMER: Cost-Based Transformations (Overall) SEL$1 cpu: 3.769 sec elapsed: 3.795 sec
TIMER: Access Path Analysis (Final) SEL$1 cpu: 0.288 sec elapsed: 0.289 sec
TIMER: SQL Optimization (Overall) SEL$1 cpu: 4.072 sec elapsed: 4.108 sec

If you check further up the page, though, you’ll see in the Outline Data that Oracle has not used group by placement (it has done a “group by pushdown”, but that’s different, and relates to aggregation in parallel execution). So one quick hack we could try is to add the hint /*+ no_place_group_by(@sel$1) */ to the query just to see what happens. As a sketch, the hinted query would look like this:
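
select
        /*+ no_place_group_by(@sel$1) parallel(t4,3) */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4, t1, t2, t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

And here’s the effect on the parse time: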

Name                                      Value
----                                      -----
parse time cpu                               33
parse time elapsed                           34

Problem solved – provided we can get the hint into the code (by hand, or via a SQL patch, etc.). If the code can’t be edited, a SQL patch is probably the least intrusive option – here’s a sketch, assuming 12.2 or later (where dbms_sqldiag.create_sql_patch is a public API with a sql_id overload), again using a substitution variable rather than inventing a sql_id:
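
declare
        m_patch_name    varchar2(128);
begin
        m_patch_name := dbms_sqldiag.create_sql_patch(
                sql_id    => '&m_sql_id',       -- supply the real sql_id here
                hint_text => 'no_place_group_by(@sel$1)',
                name      => 'gbp_workaround'
        );
        dbms_output.put_line(m_patch_name);
end;
/

But the question still remains: where did the time go? The trace file was fairly long (375,000 lines for the original, compared to 32,000 for the hinted) but a rapid scan seemed in order – and something very quickly caught my attention. It was pretty easy to spot because something big and nasty had happened 8 times.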

The answer was in “Adaptive Plans”, which (mostly) get flagged with the label “AP:” in the 10053 trace file, for example:

AP: Computing costs for inflection point at min value 0.00
AP: Using binary search for inflection point search
AP: Costing Join for ADM inflection point at card 0.00
AP: Costing Join for ADM inflection point at card 0.00

You can see here that the optimizer is searching for an “inflection point”; that means it’s “thinking about” an adaptive join, and searching for the number of rows at which a switch between a nested loop join and a hash join makes sense.

Notice particularly the comment about “using binary search”. After calculating the cost of the “best” join using the current estimates of cardinality for the incoming rowsource, the optimizer starts calculating the possible costs (nested loop or hash – it ignores merge) for a set of alternative cardinalities until it finds the cardinality where the nested loop join and hash join have the same cost. That’s the number of rows that Oracle will use at run time to decide whether it should switch from its initial selection of join method to the alternative.
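
As a side note: you can see the consequence of this search in an execution plan, because dbms_xplan has a format option that reports the full adaptive plan including the inactive operations. A sketch, for a statement still in the library cache:

select * from table(dbms_xplan.display_cursor(null, null, 'allstats last +adaptive'));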

If the initial join selection was a hash join (i.e. the large volume strategy) Oracle will use an alternative cardinality of 1 to start its list of estimates – on the assumption that that would produce a nested loop plan – then keep doubling the estimate until the resulting plan switches to a hash join, and finally close in on the break point by halving and doubling the remaining gap between the NL estimate and the HJ estimate.

If the initial join selection was a nested loop join Oracle will use a “worst case scenario” for the incoming estimate (acting, broadly speaking, as if every filter at that point had had a selectivity of 1 – i.e. 100% of the available data), and start by halving the estimate. This is fine when the starting estimate is (for example) for the first table in the join order and the estimate is just a few million rows.

Here’s the first estimate in my case (you’ll need a wide screen, or very small print):

6772:AP: Computing costs for inflection point at max value 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00 

After you’ve halved that number about 1,000 times you’re down to values in the region of a few thousand. And when you have to cost a couple of joins every time you halve, and when you’ve gone through the whole process 8 times, that’s a lot of CPU.

In fact, with the no_place_group_by hint there was still one point where the optimizer did this adaptive join work – which probably accounts for most of the final 0.33 CPU seconds – but it didn’t show up in any separately reported timer events.

Of course the final little test of this parse time hypothesis is to add the hint /*+ no_adaptive_plan */ – so I did that, and the parse time was still about 0.3 seconds! Investigation of the 10053 trace showed that even with the hint in place the optimizer still went through that one huge binary chop – but when it had worked out the inflection point it printed the message:

AP: Adaptive plans bypassed for query block SEL$1 due to disabled by hint (no_adaptive_plan)
AP: Adaptive joins bypassed for query block SEL$1 due to adaptive plans disabled

According to the hint report the hint was valid, however, so that behaviour looks a little buggy. It then occurred to me that maybe I could have hinted /*+ no_adaptive_plan(@sel$1) */ – and that worked properly, with the trace reporting:

AP: Checking validity for query block SEL$1, sqlid=7fjtvwktcmsgq
AP: Adaptive plans bypassed for query block SEL$1 due to disabled by hint (no_adaptive_plan)
AP: Adaptive joins bypassed for query block SEL$1 due to adaptive plans disabled

Had I not realised that this was possible I could have fallen back on the hint /*+ opt_param() */ to set the parameter optimizer_adaptive_plans to false for the query, in which case the following message (and no other “AP:” message) appeared near the start of the trace:

AP: Adaptive joins bypassed for query block SEL$1 due to disabled by adaptive join parameter

If you want to experiment on your own system here’s the script to create the data – the script name reflects the fact that I found this example by accident while working on something completely different:

rem
rem     Script:         12c_vector_transform_c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2015
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             12.1.0.2

drop table t1;
drop table t2;
drop table t3;
drop table t4;
purge recyclebin;

create table t1 
as
select
        rownum          id,
        to_char(mod(rownum,10)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t1 
        add constraint t1_pk primary key(id)
;

create table t2
as
select
        rownum          id,
        to_char(mod(rownum,12)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t2
        add constraint t2_pk primary key(id)
;

create table t3
as
select
        rownum          id,
        to_char(mod(rownum,14)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t3
        add constraint t3_pk primary key(id)
;

create table t4
nologging
as
select
        t1.id                   id1,
        t2.id                   id2,
        t3.id                   id3,
        rpad(rownum,10)         small_vc,
        rpad('x',100)           padding
from
        t1, t2, t3
;

spool 12c_vector_transform_c

set timing on
set linesize 255
set trimspool on
set pagesize 90
set arraysize 100
set serveroutput off

execute snap_my_stats.start_snap

alter session set statistics_level = all;

alter session set "_fix_control"='16923858:5';
alter session set events '10053 trace name context forever';

select
        /*+ 
--              opt_param('optimizer_adaptive_plans' 'false')
--              no_adaptive_plan
--              no_adaptive_plan(@sel$1)
--              no_place_group_by(@sel$1)
                parallel(t4,3)
        */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline allstats all note -bytes'));

alter session set events '10053 trace name context off';

set serveroutput on
execute snap_my_stats.end_snap

spool off

March 22, 2022

Upgrade Surprise

Filed under: 19c,Bugs,Oracle,Transformations,Upgrades — Jonathan Lewis @ 10:04 am GMT Mar 22,2022

Here’s a little surprise that showed up in the most recent (March 2022) article that I sent to Simpletalk for the series on transformations. I had been using 19c (19.11.0.0) to create and run my little demos but the editor had used 12.2.0.1 to check the examples and questioned a comment I had made about a “default plan”.

Here’s the query in question. I was using the emp and dept tables from the Scott schema to demonstrate a point about subquery execution:

rem
rem     Script:         unnest_demo_simpletalk_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem 

break on deptno skip 1

select
        /*+ 
                qb_name(main)
                gather_plan_statistics 
        */
        *
from    emp e1
where   e1.sal + nvl(e1.comm,0) > (
                select  /*+ qb_name(subq) */
                        avg(e2.sal + nvl(e2.comm,0))
                from    emp e2
                where   e2.deptno = e1.deptno
        )
order by
        e1.deptno, e1.empno
/

As you can see, I’ve used a correlated aggregate subquery to report all employees who earned more than the average for their department, where “earnings” is calculated as the sum of salary and commission.

Here’s the plan I got when I ran this query under 19c:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |      1 |      6 |00:00:00.01 |      24 |  2048 |  2048 | 2048  (0)|
|*  2 |   FILTER             |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   3 |    TABLE ACCESS FULL | EMP  |      1 |     14 |     14 |00:00:00.01 |       6 |       |       |          |
|   4 |    SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:00.01 |      18 |       |       |          |
|*  5 |     TABLE ACCESS FULL| EMP  |      3 |      5 |     14 |00:00:00.01 |      18 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("E1"."SAL"+NVL("E1"."COMM",0)>)
   5 - filter("E2"."DEPTNO"=:B1)

When my editor ran the query on 12.2.0.1, and when I started up an instance of 12.2.0.1 and ran the query, the plan looked like this:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|*  1 |  FILTER              |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|   2 |   SORT GROUP BY      |      |      1 |      4 |     14 |00:00:00.02 |      29 |      6 |  2048 |  2048 | 2048  (0)|
|*  3 |    HASH JOIN         |      |      1 |     70 |     70 |00:00:00.02 |      29 |      6 |  1922K|  1922K| 1053K (0)|
|   4 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      6 |       |       |          |
|   5 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9E18A930")
      MERGE(@"SEL$AA0D0E02" >"SEL$B4BE209F")
      OUTLINE(@"SEL$B4BE209F")
      UNNEST(@"SUBQ")
      OUTLINE(@"SEL$AA0D0E02")
      OUTLINE(@"SEL$D6166863")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      FULL(@"SEL$9E18A930" "E2"@"SUBQ")
      FULL(@"SEL$9E18A930" "E1"@"MAIN")
      LEADING(@"SEL$9E18A930" "E2"@"SUBQ" "E1"@"MAIN")
      USE_HASH(@"SEL$9E18A930" "E1"@"MAIN")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("E1"."SAL"+NVL("E1"."COMM",0)>SUM("E2"."SAL"+NVL("E2"."COMM",0))/COUNT("E2"."SAL"+NVL("E2"."COMM",0))
              )
   3 - access("E2"."DEPTNO"="E1"."DEPTNO")

(I’ve added in a request for the ‘outline’ in the display_cursor() format.) The correlated subquery has been unnested and the resulting inline view has been subject to complex view merging. It was only at this point that I realised that the 19c plan was a little surprising and not what I should have expected.

After checking that the configuration and statistics (including the system stats) were the same on the two instances I re-ran the query on 12c with the /*+ no_unnest */ hint to make it use the plan that 19c had produced, and I found (as expected) that the plan with the filter subquery produced by 19c had a higher cost than the unnesting plan produced by 12c.
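
For reference, the qb_name() hints make this easy to do from the top of the query – a sketch of one way to place the hint:

select
        /*+
                qb_name(main)
                no_unnest(@subq)
        */
        *
from    emp e1
where   e1.sal + nvl(e1.comm,0) > (
                select  /*+ qb_name(subq) */
                        avg(e2.sal + nvl(e2.comm,0))
                from    emp e2
                where   e2.deptno = e1.deptno
        )
order by
        e1.deptno, e1.empno
/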

Next I re-ran the query on 19c with the /*+ unnest */ hint to make it use the plan that 12c had produced – but it didn’t! 19c “ignored” the hint and carried on using the filter subquery plan. It did, however, supply the following extra information when I added the ‘hint_report’ option to the display_cursor() format:

Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------

   1 -  MAIN
           -  qb_name(main)

   4 -  SUBQ
         U -  unnest / Failed basic validity checks
           -  qb_name(subq)

The code in 19c thinks that it would be illegal to unnest the subquery that 12c was unnesting (does this mean that some people may be getting wrong results in 12c?). So I checked the 10053 (CBO) trace file to see if there was any further information there that would “explain” the problem. This is what I found:

SU: Checking validity for Null Accepting Semi JoinUnnesting for query block MAIN(#1)
SU:   Checking validity of unnesting subquery SUBQ (#2)
SU:     SU bypassed: More than 1 column in connect condition.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

The reference to “Null accepting” looks a little suspect but prompted a few experiments (modifying the data to eliminate nulls, adding not null declarations to columns, simplifying the query etc.) that suggested that the problem was essentially that the optimizer did not want to unnest when the comparison was with the expression (sal + comm) regardless of the operator, and even when all the relevant columns had been populated, declared not null, and the nvl() function had been removed.

It doesn’t seem reasonable in this case, but possibly the block is a generic strategy to avoid errors in some more subtle cases, and perhaps the block will be refined and relaxed in future versions. (Or maybe it’s a bug that the wrong test is being called at this point – there was nothing in the query requiring “null acceptance” by the time I got through the last test.)

I did find a workaround that you could use to avoid any code rewrite:

alter table emp add nvl_earnings 
        invisible 
        generated always as (sal + nvl(comm,0)) 
        virtual
;

There seems to be no good reason why this should work – but it does. The subquery unnests and the Predicate Information in the plan doesn’t give any clue that it’s using a virtual column.

Summary:

When you upgrade from 12c there are some queries involving correlated subqueries that will no longer unnest the subquery. This may have a significant impact on performance, and it may not be possible to bypass the problem unless you rewrite the query to do a manual unnest – although I did find a virtual column workaround for my specific example. So far I’ve tested the query on 19.11.0.0 and 21.3.0.0; the behaviour is the same in both versions.

Footnote:

After failing to find anything on MOS about the problem I emailed a draft of this note to Nigel Bayliss at Oracle – who did find a promising match on MOS.

The failure to unnest may be the consequence of the fix for bug 30593046: “A query having a scalar subquery returned a wrong result”. The fix was introduced in 19.9.0.0 but was too restrictive, leading to the creation of bug 33325981: “Query Loses Subquery Unnesting From Plan in 19.9 and Above”.

The fix for 33325981 was distributed in 19.13.0.0 and 21.4.0.0 (plus a couple of earlier RURs, with patches available for various versions back to 12.2.0.1). Unfortunately the newer fix still doesn’t go far enough in reducing the restrictions and my example still doesn’t unnest.
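
If you want to check the state of those two fixes on your own system, you can query v$system_fix_control – a quick sketch (the rows will only appear in versions where the fixes are present):

select  bugno, value, description
from    v$system_fix_control
where   bugno in (30593046, 33325981)
order by
        bugno
;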

Make sure you check any code that depends on “expression-based” subquery unnesting before you upgrade to 19.9 (or beyond), as it may change plan, which may make a difference to performance and create a requirement for a workaround.

March 2, 2022

Generated Predicates

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 11:25 am GMT Mar 2,2022

A question arrived on the MOS Community forum yesterday (needs an account if you want to see the original) that reported a couple of fragments of a CBO trace (10053) file:

----- Current SQL Statement for this session (sql_id=4c85twjpdg8g9) -----
select /*+ 123456 */ count(*) from gl_detail where prepareddatev='2022-01-22 15:00:00'

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "NC63"."GL_DETAIL" "GL_DETAIL" 
WHERE "GL_DETAIL"."PREPAREDDATEV"='2022-01-22 15:00:00' 
AND SUBSTR("GL_DETAIL"."PREPAREDDATEV",1,10)='2022-01-22'

The question was:

Why after transformations ,oracle add condition SUBSTR("GL_DETAIL"."PREPAREDDATEV",1,10)='2022-01-22'

Mark Powell asked for the execution plan and information about indexes (normal and function-based) and histograms, as well as asking for the Oracle version. I asked about constraints and virtual columns and, in particular, the possibility of a virtual column being used as a partition key.

We didn’t get explicit answers to all our questions, but we did get “no constraints, no virtual columns, no partitioning”, and we also got the full 10053 trace file which, given the simplicity of the query, was mercifully short – a mere 95KB and 2,800 lines.

The key aid to reading 10053 trace files is knowing what you’re expecting to see before you start looking. And with a generated predicate there was likely to be something that would tell me about the “column” that caused the predicate to appear, and about the arithmetic that was a consequence of that predicate coming into existence. So I started with the section headed “SINGLE TABLE ACCESS PATH” where the cardinality estimate (for each individual table) would be calculated. This showed two columns being considered for the single table in the query:

  Column (#77): 
    NewDensity:0.000000, OldDensity:0.000035 BktCnt:75, PopBktCnt:11, PopValCnt:1, NDV:8314506
  Column (#77): PREPAREDDATEV(
 
  Column (#88): 
    NewDensity:0.000188, OldDensity:0.000583 BktCnt:75, PopBktCnt:11, PopValCnt:1, NDV:4551
  Column (#88): SYS_NC00088$(

Check the name of column #88 – sys_nc00088$ – that’s an internally generated virtual column which may well be associated with a function-based index, so let’s back up a bit to the “BASIC STATISTICAL INFORMATION” and the (thirteen sets of) index stats for the table, where we find:

  Index: I_GL_DETAIL_7  Col#: 88
    LVLS: 3  #LB: 433301  #DK: 4551  LB/K: 95.00  DB/K: 5922.00  CLUF: 26953639.00

The obvious first guess is that column #88 is the invisible virtual column underpinning an index that has been created on substr(prepareddatev,1,10) and here’s a quick and dirty test script to demonstrate that this could be the correct guess.

create table t1 (v1 varchar2(20), v2 varchar2(1));
create index t1_i1 on t1(substr(v1,1,10));

select column_name, virtual_column, hidden_column from user_tab_cols where table_name = 'T1';
select * from user_ind_expressions where table_name = 'T1';

insert into t1 values('2022-03-02 09:01:00', 'x');
commit;

execute dbms_stats.gather_table_stats(user,'t1')

set autotrace traceonly explain

select /*+ full(t1) */  * from t1 where v1 = '2022-03-02 09:01:00';

set autotrace off

And here’s the output cut and pasted from an SQL*Plus session running 11.2.0.4 (which is the version the CBO trace file came from).

Table created.


Index created.


COLUMN_NAME          VIR HID
-------------------- --- ---
V1                   NO  NO
V2                   NO  NO
SYS_NC00003$         YES YES

3 rows selected.


INDEX_NAME           TABLE_NAME                COLUMN_EXPRESSION                        COLUMN_POSITION
-------------------- ------------------------- ---------------------------------------- ---------------
T1_I1                T1                        SUBSTR("V1",1,10)                                      1

1 row selected.


1 row created.


Commit complete.


PL/SQL procedure successfully completed.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    33 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("V1"='2022-03-02 09:01:00' AND
              SUBSTR("V1",1,10)='2022-03-02')

We see the “extra” predicate and a column with a name of the form sys_ncXXXXX$. The results from more recent versions of Oracle should be the same. I think there’s a pretty good chance that if the OP runs suitable queries against XXX_tab_cols and XXX_ind_expressions they’ll see similar results that explain the predicate that surprised them.
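
For example – a sketch, using the owner that appeared in the unparsed query in the trace file:

select  column_name, virtual_column, hidden_column, data_default
from    all_tab_cols
where   owner = 'NC63'
and     table_name = 'GL_DETAIL'
and     hidden_column = 'YES'
;

select  index_name, column_expression, column_position
from    all_ind_expressions
where   index_owner = 'NC63'
and     table_name = 'GL_DETAIL'
;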

Footnote

There are various notes on the blog about constraints and transitive closure generating extra predicates, and how the optimizer can use function-based indexes that have definitions that are “good enough” though not perfect matches for user-supplied predicates. This is just another little detail in how the optimizer tries to find as much helpful information as it can from the data dictionary. The earliest note I can find on my blog about this at present is about partition elimination and generated predicates – which prompted various comments about function-based indexes and predicate generation.

February 21, 2022

CTE Enhancement

Filed under: 12c,Oracle,Performance,Transformations — Jonathan Lewis @ 9:24 am GMT Feb 21,2022

For many years I’ve told people that when you materialize a CTE (common table expression / “with” subquery) the result set will be written to the temporary tablespace using direct path writes and will be read back using cached reads. This stopped being an accurate description in 12c.

There is a clue about this in the corresponding execution plans, and I’ll be pointing it out later. The key difference between earlier versions of Oracle and newer versions is that the GTT (global temporary table) that holds the materialized result set is not necessarily written to disc, and may even avoid allocating a temporary segment.

I started looking at this feature a couple of days ago after remembering that someone, somewhere, had mentioned some details about a temporary object being kept in the PGA rather than being written to disc if the size wasn’t too big. I couldn’t remember if this was GTTs or temporary LOBs (or something completely different) and I only had a vague memory that there was a critical size that might have been 256KB; so I started experimenting with materializing CTEs.

Here’s the script I used to establish a baseline in 11g. I took a fairly arbitrary starting guess that if there was a PGA/Temp split it might be around 64KB.

rem
rem     Script:         cte_writes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

alter session set events '10046 trace name context forever, level 8';

prompt  ==============================
prompt  First sample - just under 64KB
prompt  ==============================

set arraysize  35
set pagesize  35

set pause Waiting...
set pause on

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 63
)
select
        substr(v1,-10)
from
        g1
;

alter session set events '10046 trace name context off';

execute snap_my_stats.end_snap
execute snap_ts.end_snap

prompt  ==============================
prompt  Second sample - just over 64KB
prompt  ==============================

execute snap_ts.start_snap
execute snap_my_stats.start_snap

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 64
)
select
        substr(v1,-10)
from
        g1
;

execute snap_my_stats.end_snap
execute snap_ts.end_snap

set pause off

The calls to the package snap_my_stats are the usual ones I use (very old source at this Wayback URL) to report the session’s activity (v$mystat) between start and end snapshot, and the calls in the snap_ts package do the same for the I/O per tablespace, summing by tablespace across v$filestat and v$tempstat.

This script basically materializes and reports a GTT with a single column of 1,024 bytes, and in this case either 63 or 64 rows. You’ll note that I’ve built another assumption into the code that the CTE (if kept in memory) won’t follow all the details of a “real” table block, but will be a simple array with a tiny overhead per row – I’ve also assumed that the optimizer isn’t smart enough (or foolhardy enough) to push the substr() call inside the CTE.

I’ve set pause on and set both the arraysize and pagesize to a value less than the total number of rows I’ll be fetching so that I can check a couple of dynamic performance views (in particular v$sort_usage) from another session while the queries are running.
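
The check from the other session is a simple query – a sketch:

select  username, tablespace, contents, segtype, blocks
from    v$sort_usage
;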

As a starting point, here are some critical values I’ve selected from the various outputs for just the 63 row case when running 11.2.0.4:

-------------
Session stats
-------------
Name                                                 Value
----                                                 -----
physical reads                                           9
physical reads cache                                     9
physical writes                                          9
physical writes direct                                   9
physical writes direct temporary tablespace              9

---------
T/S Stats 
---------
TS#        Reads      Blocks   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max Tablespace
----       -----      ------   ---------    ---      ------      ------   ---------    --- -------------------
   3           1           9        .000      1           1           9        .000      0 TEMP


Since I've enabled extended tracing at level 8 (waits) I can also show you the I/O waits reported for the reads and writes:
WAIT #140185232180256: nam='direct path write temp' ela= 180 file number=201 first dba=35073 block cnt=9 obj#=-1 tim=1645178184267768
...
WAIT #140185232180256: nam='db file scattered read' ela= 183 file#=201 block#=35073 blocks=9 obj#=-40016369 tim=1645178184268342

A dump of the 9 blocks of the temporary file (the “file number=201” translates to tempfile 1 since I have db_files = 200) starting at block 35073 looks like an ordinary table with pctfree 10 (which is one of the annoying details of GTTs – you can’t adjust it), 3 ITL slots (which is normal for CTAS), and 7 rows per block.
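
For reference, the dump came from a command of the following form – a sketch, and the tempfile number and block range have to be adjusted to match the figures reported in your wait events:

alter system dump tempfile 1 block min 35073 block max 35081;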

So, for 11g, we conclude that the effect of materializing a CTE is simply to create a GTT in the temporary tablespace, write it out using direct path writes, then read it back into the buffer cache using db file scattered reads. (You might want to confirm that this always happens, even if the CTE holds only one row.)

If you take advantage of the pause to issue “alter system flush buffer_cache” from another session you can also dump the segment header block (35072 in my case) to see that it’s a normal table segment header block – using freelist management, not ASSM because that’s the way temporary tablespaces have to be declared. The segment header block didn’t get written to disc in the normal course of the test.

12c Enhancement

This is the moment where the second query, and the pause that allows me to query v$sort_usage, becomes significant. When I started 12.2.0.1 with the 63 row query I saw:

  • No I/O on the temporary tablespace
  • No entry in v$sort_usage

To my great satisfaction the 64 row query did report I/O to the temporary tablespace (10 blocks this time – needing one extra block to cater for the 64th row) with v$sort_usage reporting a segment being created on my behalf. Obviously I re-ran the test a couple of times, flushing the buffer cache and shared pool, and connecting to a new session each time. The results were totally consistent: 63 rows => no GTT, 64 rows => GTT.

If you’re feeling a little suspicious at this point, bear with me.

This is the point where I switched to 19.11.0.0 – and both queries ran in memory with no sign of a GTT being created. Luckily I had cloned the query several times in the script generating different pairs of numbers of rows: 127/128, 255/256, 511/512, 1023/1024, and when I hit 1024 (and 1023) my session produced a GTT.

Somewhere between 512 and 1023 rows I was hitting a critical breakpoint – so I nearly started working through a binary chop to find the actual breakpoint; luckily, though, I had a little inspiration: if the overhead per row was 3 bytes (as it would be for a normal table column of more than 254 bytes) then 1023 rows would have an overhead of about 3KB – so I should test 1021 rows if I wanted to test a memory requirement of just under 1MB.

Sure enough, at 1021 rows the GTT didn’t appear, at 1022 rows it did – time after time after time.

But …

My tests seemed to be totally repeatable. Again, I connected to a new session a few times, I flushed the buffer cache, I flushed the shared pool, I checked v$sort_usage. Consistently the results seemed to say:

  • 12.2 uses the PGA up to 64KB then dumps to a GTT
  • 19.11.0.0 uses the PGA up to 1MB then dumps to a GTT

Except – that night I had to shut down the two virtual machines because sometimes, for no obvious reason, I can’t hibernate my laptop while the VMs are running; and when I started everything up again the following morning and started re-running the tests as I wrote up the results something had changed. In fact my 19.11 instance didn’t dump to a GTT until I had reached nearly 10MB of data and my 12.2 wasn’t even dumping at 1MB; and I have no idea why a complete restart made such a difference.

After spending a little time trying to figure out what had changed – and I think it may have been that I’d been running the previous day’s tests after doing a lot of heavy work with temporary LOBs trying to pin down an anomaly with the handling of the temporary tablespace – I finally tried a google search using keywords that might be relevant and found this article that Keith Laker wrote about 5 years ago.

The feature is known as the In-memory “cursor-duration” temporary table. I mentioned a clue in the execution plans at the start of this note: materialization shows up with a “temp table transformation” operation followed, in 11g, by a child operation of “load as select”; but in 12.2 the child operation is “load as select (cursor duration memory)”. I really should have started my investigation by putting the entire text of that operation into a search engine.

Summary

(Basically the highlights from Keith’s article):

  • The “in-memory cursor-duration temporary table” change appeared in 12.2
  • It can be used in a number of transformations that the optimizer does
  • It’s not possible to force the use of the feature for a given query; it’s down to an internal algorithm
  • The mechanism uses memory that is “essentially” PGA
  • Despite the name, this feature does not require you to licence the In-Memory option
  • If you’re still using an older version of Oracle this could be a good reason for upgrading, as it can reduce the I/O load, particularly for “analytic” types of query, at the cost of using extra memory.

All the work I had done trying to find a break-point where Oracle switched from using PGA to using a GTT had been a waste of time – and the apparently consistent results on the first day had been an “accident”, dictated (possibly) by some PGA-related activity that had taken place before I started running my tests.

Footnotes and geeky things

Five years on from the publication date of Keith’s article we shouldn’t be surprised to see some changes. Keith notes that the mechanism will apply only to serial queries that do more than one pass over the table – but there are two points to raise there:

  • possibly the two-pass thing is because it usually takes two passes over a CTE before Oracle will materialize it automatically; my example shows the in-memory effect on a single pass – but that was a forced materialization.
  • the restrictions on parallelism may have been relaxed by now – if you check for hidden parameters (there’s a query sketch after this list) you will find: _in_memory_cdt_maxpx, default value 4, described as “Max Parallelizers allowed for IMCDT”.
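
This is the sort of query I use to list them – a sketch that has to be run as SYS because it references the x$ structures directly:

select
        ksppinm         parameter,
        ksppstvl        system_value,
        ksppdesc        description
from
        x$ksppi         a,
        x$ksppsv        b
where
        a.indx = b.indx
and     ksppinm like '%cdt%'
order by
        ksppinm
;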

Interestingly there are more “cdt” parameters in 12.2 than there are in 19.11, and there are clear indications of some changes in algorithm and mechanism:

12c parameters like '%cdt%'
Parameter                                  System Value         Description
--------------------------------------------------------- -------------------- ---------------------------------
_cdt_shared_memory_limit                   0                    Shared Memory limit for CDT
_cdt_shared_memory_query_percent           20                   Per Query Percentage from CDT pool
_imcdt_use_mga                             ON                   MGA for In Memory CDT
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT

19c parameters like '%cdt%'
Parameter                                  System Value         Description
--------------------------------------------------------- -------------------- ---------------------------------
_hcs_enable_in_mem_cdt_hint                FALSE                add hint opt_param('_in_memory_cdt', 'off')
_in_memory_cdt                             LIMITED              In Memory CDT
_in_memory_cdt_maxpx                       4                    Max Parallelizers allowed for IMCDT

The parameter “_in_memory_cdt” can take the values ON, LIMITED, or OFF – which tells you that even if you can’t force a query to use in-memory CDTs you can (if you really want to) stop a query from using the feature. There are a few notes about this parameter and its significance to RAC and parallel execution (for 12.2) on MOS – if you have an account – Doc ID 2388236.1 What is _in_memory_cdt Parameter?
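
As a sketch – with the usual warning that you shouldn’t set hidden parameters without Oracle’s blessing – that would be either a session-level setting or a hint for a single statement:

alter session set "_in_memory_cdt" = off;

with g1 as (
        select  /*+ materialize */
                lpad(rownum,1024)       v1
        from
                dual
        connect by
                level <= 63
)
select  /*+ opt_param('_in_memory_cdt','off') */
        substr(v1,-10)
from
        g1
;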

The reference to MGA (the “managed global area”) in 12.2 is also quite interesting. This is documented as a Solaris feature using OSM to share memory between processes. For more general details you can review MOS Doc ID 2638904.1 MGA (Managed Global Area) Reference Note (again, only if you have an account).

The “new” oradebug mechanism shows (from 18c) a couple of relevant components under SQL compilation and execution that you could trace if you want to investigate further.

Components in library RDBMS:
--------------------------
  SQL_Compiler                 SQL Compiler ((null))
    ICDT_Compile               In Memory CDT Compilation (qks3t)
  SQL_Execution                SQL Execution (qer, qes, kx, qee)
    ICDT_Exec                  In Memory CDT Execution (qes3t, kxtt)
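
So, as a sketch (using the unified trace syntax that goes with that listing, and assuming 18c or later), you could enable and disable tracing of the in-memory CDT execution like this:

alter session set events 'trace[ICDT_Exec] disk=medium';

-- run the materialized CTE query

alter session set events 'trace[ICDT_Exec] off';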

January 17, 2022

Hash Aggregation – 1

Filed under: CBO,Hints,Oracle,Performance,Tuning — Jonathan Lewis @ 1:15 pm GMT Jan 17,2022

Here’s an observation I made some time in 2007, but didn’t mention online until a (possibly) relevant question appeared on the Oracle database forum in 2017; and the topic reappeared in a nearly unrelated question a little while ago. The 2017 question was this:

I am running following SQL, which it taking almost 4-5 seconds and returning only 18 records.

SELECT
   SUM(TOTAL_AMOUNT) C1,
   a.LEVEL2_ENAME AS c2,
   c.CURR_YEARMO AS c3
FROM TERRITORY_PRESET_MSNP a,
   CV_RESTATED_MSNP b
   LEFT OUTER JOIN
   MONTH_D c
   ON b.YEARMO = c.CURR_YEARMO,
   PRODUCT_OFFERING d
WHERE   b.PO_ID = d.ROW_ID
    AND b.DATASOURCE_ID = 10
    AND b.YEARMO = 201704
    AND b.OWNER_TERR_ID = a.TERR_ID
    AND c.CURR_YEARMO = 201704
    AND a.YEARMO = 201706
GROUP BY c.CURR_YEARMO, a.LEVEL2_ENAME
ORDER BY C3, C2;

If I remove the ORDER BY clause it is returning results in 1 second.

Before saying anything else, I’ll just make a couple of points about the SQL:

  • It’s not a good idea to mix traditional Oracle syntax with “ANSI” syntax – it’s likely to make things harder for the next person to read the code and there’s just a slight possibility that the rewrite that Oracle applies to hide the ANSI syntax may block some of the possible execution paths.
  • The C3, C2 in the order by clause are the column aliases for the curr_yearmo, level2_ename columns used in the group by clause. Although Oracle allows you to use aliases in the order by (but not in the group by), doing so can only make the SQL a little harder to interpret (especially in a case like this where you have both clauses).
  • There’s a left outer join to month_d (aliased as c), but the where clause then includes the predicate c.CURR_YEARMO = 201704 which will eliminate any rows where curr_yearmo is null, thus converting (we hope – but the mix and match syntax might introduce a side-effect) the outer join to an inner join – so maybe that’s a design error in the SQL.

Addressing the question, though, the first thought (rapidly eliminated) is that perhaps this is the standard complaint of the GUI interface: “it’s fast until I add an order by clause”.

The commonest reason for this complaint is that the typical GUI interface shows you the first few rows and waits for you to page down, so your impression of the response time is “the time to see the first few rows” rather than “the time to get the complete result set” when it might take much more time to return the entire result set. When you add an order by clause it’s possible that Oracle will have to sort the entire result set before you see any of it. It’s often the difference between “first few rows” and “entire result set” that triggers the complaint.

In this case the “rapid elimination” of this thought is due to the OP saying the result set was only 18 rows, which is likely to produce the gut feeling that it shouldn’t take very long for Oracle to sort 18 rows if it had to find all of them before displaying them. On the other hand the thought might need a little follow-up, after all:

  • it’s possible that the GUI is only displaying 15 rows at a time and it takes a lot of time to find the extra 3 rows. Just think of a tablescan with a filter subquery when the rows you want are the first few in the table. Without an order by the rows can be displayed as they are found; with an order by Oracle will have to get to the end of the tablescan before the rows can be sorted and displayed.
  • the optimizer can produce terrible estimates and the order by clause might prompt it to say “if I start with a different table, driving through a specific index, and change the join order then I won’t have to do any sorting for the order by clause”. The resulting path may be a very bad idea if the arithmetic produces the wrong results.

The OP hasn’t shown us the execution plan – and that’s what we really need to see; but there is an alternative guess that we could make about what the optimizer is doing that would affect the performance this much.

The query is an aggregate query – we have a group by. Since 10g Oracle has been able to use “hash aggregation” – which shows up as the HASH GROUP BY operation in an execution plan. Here’s a little demo script, with a couple of sample queries:

rem
rem     Script:         sort_hash.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2012
rem 

create table t1
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                level <= 1e4  -- > comment to avoid wordpress format issue
)
select
        trunc(dbms_random.value(0,262144))      n_256K,
        trunc(dbms_random.value(0,131072))      n_128K,
        trunc(dbms_random.value(0,8192))        n_8k
from
        generator       v1,
        generator       v2
where
        rownum <= 8 * 1048576  -- > comment to avoid wordpress format issue
;

set arraysize 1000
set timing on
set autotrace traceonly 

prompt  ===========
prompt  No Order by
prompt  ===========

select
        n_8K, count(*) ct
from
        t1
group by
        n_8k
;


prompt  =============
prompt  With Order by
prompt  =============

select
        n_8K, count(*) ct
from
        t1
group by
        n_8k
order by
        1
;


My table has 8M rows, and my queries target the column with 8K distinct values. I’ve enabled autotrace from SQL*Plus, set a large arraysize (to reduce time lost to SQL*Net round-trips), and set timing on so we can get an elapsed time for total execution. I’ve set autotrace to “traceonly” so that the SQL*Plus client will fetch the data but won’t waste resources formatting it; I’m not actually interested in the handful of execution statistics that will be reported.

Here are the two sets of results from a test run on 19.11.0.0. Note, particularly, the figure for Elapsed:


===========
No Order by
===========

8192 rows selected.

Elapsed: 00:00:00.58

Execution Plan
----------------------------------------------------------
Plan hash value: 136660032

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  8388K|    32M|  3460  (21)| 00:00:01 |
---------------------------------------------------------------------------

=============
With Order by
=============

8192 rows selected.

Elapsed: 00:00:03.03

Execution Plan
----------------------------------------------------------
Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   1 |  SORT GROUP BY     |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  8388K|    32M|  3460  (21)| 00:00:01 |
---------------------------------------------------------------------------

The time has jumped from slightly under 0.6 seconds to just over 3 seconds as the critical operation changes from a HASH GROUP BY to a SORT GROUP BY (even though the estimated cost, hence predicted run-time, of execution has not changed).

Your first thought at this point is probably along the lines of “surely it doesn’t take 2.4 seconds to sort 8,192 small rows, why doesn’t Oracle do a hash group by followed by a sort order by?” The answer seems to be “it just doesn’t”. So here’s one way to make it happen (with execution plan and elapsed time from 19.11.0.0 again):

select
        dist_8k, ct
from
        (
        select  /*+ no_merge */
                n_8K dist_8k, count(*) ct
        from
                t1
        group by
                n_8k
        )
order by 
        dist_8k
;

8192 rows selected.

Elapsed: 00:00:00.59

Execution Plan
----------------------------------------------------------
Plan hash value: 1705136228

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  8192 |   208K|  6938  (61)| 00:00:01 |
|   1 |  SORT ORDER BY       |      |  8192 |   208K|  6938  (61)| 00:00:01 |
|   2 |   VIEW               |      |  8192 |   208K|  6938  (61)| 00:00:01 |
|   3 |    HASH GROUP BY     |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |  8388K|    32M|  3460  (21)| 00:00:01 |
-----------------------------------------------------------------------------


Again the estimated cost of execution doesn’t (to the nearest whole number) change from the basic cost of the hash aggregation – but we have brought the time back down to just under 0.6 seconds.

It’s worth reminding you at this point that if you can re-engineer any SQL that’s performing badly and can see that the driving core of the query can be reduced to something much simpler and more efficient, then wrapping that core into an inline view with the /*+ no_merge */ hint (and possibly putting it up into a “with subquery” clause) might be the safest first step and most effective way of improving performance.
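For completeness, the same rewrite translates directly into “with subquery” form. Here’s a sketch (untimed, using the same /*+ no_merge */ hint; a /*+ materialize */ hint in the subquery would be an alternative way of keeping the aggregation separate):

with agg as (
        select  /*+ no_merge */
                n_8K dist_8k, count(*) ct
        from
                t1
        group by
                n_8k
)
select
        dist_8k, ct
from
        agg
order by
        dist_8k
;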

There is an option for avoiding the query rewrite here – hint the path you want to see:


select  /*+ use_hash_aggregation */
        n_8K, count(*) ct
from
        t1
group by
        n_8k
order by 
        1
;

8192 rows selected.

Elapsed: 00:00:00.59

Execution Plan
----------------------------------------------------------
Plan hash value: 2808104874

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   1 |  SORT ORDER BY      |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |  8192 | 32768 |  6938  (61)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |  8388K|    32M|  3460  (21)| 00:00:01 |
----------------------------------------------------------------------------

The nice thing about this, of course, is that you don’t actually have to edit the text; the hint could be attached to the query through an SQL Patch (or by abusing the SQL Profile or SQL Plan Baseline mechanisms).
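As a sketch of the SQL Patch approach – assuming 12.2 or later, where dbms_sqldiag.create_sql_patch() is a documented function (in earlier versions the internal dbms_sqldiag_internal.i_create_patch() did a similar job) – you could attach the hint to the statement by quoting its sql_id:

set serveroutput on

declare
        v_patch varchar2(128);
begin
        v_patch := dbms_sqldiag.create_sql_patch(
                sql_id    => '&m_sql_id',       -- substitute the sql_id of the statement to patch
                hint_text => 'use_hash_aggregation(@sel$1)',
                name      => 'hash_agg_patch'
        );
        dbms_output.put_line('Patch created: ' || v_patch);
end;
/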

The difficult part of hinting is finding the correct query block name for a more complex query. I simply added the hint /*+ use_hash_aggregation */, but the hint can be aimed at a query block, so in the absence of explicit query block names I could have used /*+ use_hash_aggregation(@sel$1) */, relying on the default naming.

In a more complex case you can find the appropriate query block name by using the ‘alias’ format option when generating the execution plan. Consider the following query (where t2 and t3 are created from view all_objects), with its initial execution plan:


explain plan for
select 
        t2.owner, count(*)
from 
        t2 
where 
        t2.object_id in (
                select  t3.object_id 
                from    t3 
                where   t3.object_type = 'TABLE'
        )
group by 
        t2.owner
order by 
        t2.owner
/


select * from table(dbms_xplan.display(format=>'alias'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2646727453

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |    23 |   575 |   319   (8)| 00:00:01 |
|   1 |  SORT GROUP BY        |      |    23 |   575 |   319   (8)| 00:00:01 |
|*  2 |   HASH JOIN RIGHT SEMI|      |  2298 | 57450 |   317   (7)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | T3   |  2298 | 34470 |   158   (7)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T2   | 57448 |   561K|   156   (6)| 00:00:01 |
------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$5DA710D3
   3 - SEL$5DA710D3 / T3@SEL$2
   4 - SEL$5DA710D3 / T2@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
   3 - filter("T3"."OBJECT_TYPE"='TABLE')

The Query Block Name / Object Alias information tells us that the query block holding the sort group by at operation 1 is named SEL$5DA710D3 so we can use that as the target query block in the hint: /*+ use_hash_aggregation(@SEL$5DA710D3) */ and the plan changes to:


-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |    23 |   575 |   319   (8)| 00:00:01 |
|   1 |  SORT ORDER BY         |      |    23 |   575 |   319   (8)| 00:00:01 |
|   2 |   HASH GROUP BY        |      |    23 |   575 |   319   (8)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|      |  2298 | 57450 |   317   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T3   |  2298 | 34470 |   158   (7)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2   | 57448 |   561K|   156   (6)| 00:00:01 |
-------------------------------------------------------------------------------

It’s possible that you’ll only notice a significant difference in performance (and, perhaps, PGA memory allocated) when you aggregate a large number of rows into a small result set. And it does seem that this is one case where the only way to get the plan you want without a significant code rewrite is through a hint.

tl;dr

If you have some code which does a “group by X, Y, Z order by X, Y, Z” (the order of the columns/expressions used doesn’t have to be the same for the two clauses) then Oracle will use a sort group by operation to handle the aggregation and ordering in a single step, even though we can find cases where hash aggregation followed by sort ordering is more efficient.

If you come across such a case then injecting the hint /*+ use_hash_aggregation(@query_block_name) */ may be the only way to change the execution plan if you’re not allowed to edit the SQL.

Footnote

In the second of the two links to the Oracle Developer Forum you’ll see that one poster pointed out that if the order by clause uses a suitable expression to substitute for one of the columns in the group by clause then you don’t need to hint the code, e.g.

group by
        numeric_column
order by
        numeric_column + 0

Very cute, but not a good idea.
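For the record, applied to the earlier demo the trick would look like the following sketch – the order by expression no longer matches the group by column textually, so (in principle) hash aggregation becomes available again:

select
        n_8K, count(*) ct
from
        t1
group by
        n_8k
order by
        n_8k + 0
;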

Footnote 2

There is another part to the testing I started in 2007, and I’ll come back to that later in Hash Aggregation – 2.

January 1, 2022

Happy New Year

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,Transformations — Jonathan Lewis @ 12:01 am GMT Jan 1,2022

Here’s an entertaining little bug that appeared on the oracle-l list server just in time to end one year and start another in a suitable way. The thread starts with an offering from Matthias Rogel (shown below with some cosmetic changes) to be run on Oracle 12.2.0.1:

rem
rem     Script:         group_by_bug.sql
rem     Author:         Matthias Rogel / Jonathan Lewis
rem     Dated:          Dec 2021
rem
rem     Last tested 
rem             19.11.0.0       Fixed
rem             12.2.0.1        Fail
rem

create table t as (
        select date'2021-12-30' d from dual 
        union all 
        select date'2021-12-31'   from dual
);

select extract(year from d), count(*) from t group by extract(year from d);

alter table t add primary key(d);
select extract(year from d), count(*) from t group by extract(year from d);

This doesn’t look particularly exciting – I’ve created a table with two rows holding two dates in the same year, then counted the number of rows for “each” year before and after adding a primary key on the date column. Pause briefly to think about what the results might look like …

Table created.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          2

1 row selected.


Table altered.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          1
              2021          1

2 rows selected.

After adding the primary key (with its unique index) the result changes to something that is clearly (for this very simple data set) wrong.

At this point I offered a hypothetical reason why Oracle might be producing the wrong result, but Tim Gorman was one step ahead of me and supplied a bug reference from MOS: Wrong Result Using GROUP BY with EXTRACT Function Against DATE (Doc ID 2629968.1)

The MOS document describes this as a bug introduced in the upgrade from 12.1.0.2 to 12.2.0.1, demonstrates the error with the extract() function applied to a date, and supplies three possible workarounds (but not the workaround or explanation I supplied in my response on oracle-l).

The document also pointed to a further bug note that described how the problem also appeared with the combination of the to_char() function applied to a date column with a unique index: 12.2 Wrong Results from Query with GROUP BY Clause Based on To_char Function of Unique Index Columns (Doc ID 2294763.1), with a further suggestion for applying a patch (see MOS Doc ID: 26588069.8) or upgrading to 18.1 (where the bug has been fixed).

Matthias Rogel supplied a follow-up demonstrating the problem with to_char(), which prompted me to create an example showing that it wasn’t just about dates – which I’ve tidied up below (reminder, all results on this page are from 12.2.0.1):

create  table t1 as 
select  round(rownum/10,1) n1 
from    all_objects 
where   rownum <= 10
;


select n1 from t1 order by n1;
select n1, count(*) from t1 group by n1 order by n1;

column nch format A3

select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

alter table t1 add constraint t1_pk primary key(n1);
select to_char(n1,'99') nch , count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

As before I’ve created a simple table, and populated it with a few rows of data. The first two queries are there to show you the data (0.1 to 1.0 in steps of 0.1), and to show that aggregating the raw data produces one row per value.

I’ve then repeated the aggregation query, but converted each value to a character string that effectively rounds the value to an integer. Here are the two sets of results, before and after adding the primary key.

NCH   COUNT(*)
--- ----------
  0          4
  1          6

2 rows selected.

Table altered.

NCH   COUNT(*)
--- ----------
  0          1
  0          1
  0          1
  0          1
  1          1
  1          1
  1          1
  1          1
  1          1
  1          1

10 rows selected.

Again, the introduction of the primary key constraint on the column produces wrong results. In this example, though, I’ve pulled the execution plans from memory along with their outlines, and this is what the two plans look like.

SQL_ID  gt5a14jb0g4n0, child number 0
-------------------------------------
select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99')
order by 1,2

Plan hash value: 2808104874

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |     4 (100)|          |
|   1 |  SORT ORDER BY      |      |    10 |    30 |     4  (50)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    10 |    30 |     4  (50)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |    10 |    30 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


SQL_ID  4fxxtmrh8cpzp, child number 0
-------------------------------------
select to_char(n1,'99') nch , count(*) from t1 group by
to_char(n1,'99') order by 1,2

Plan hash value: 1252675504

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |     2 (100)|          |
|   1 |  SORT ORDER BY   |       |    10 |    30 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9BB7A81A")
      ELIM_GROUPBY(@"SEL$47952E7A")
      OUTLINE(@"SEL$47952E7A")
      ELIM_GROUPBY(@"SEL$1")
      OUTLINE(@"SEL$1")
      INDEX(@"SEL$9BB7A81A" "T1"@"SEL$1" ("T1"."N1"))
      END_OUTLINE_DATA
  */

In the absence of the primary key (index) Oracle does a full tablescan, then hash group by, then sort order by. When the primary key is put in place Oracle does an index full scan (which is legal because the index must contain all the data thanks to the not null declaration inherent in a primary key) and a sort order by without any group by.

You might wonder if the problem arises because Oracle assumes the indexed path somehow means the aggregation doesn’t apply – but with a /*+ full(t1) */ hint in place and a full tablescan in the plan the aggregation step is still missing — and if you look at the Outline Data section of the plan you can see that this is explicitly demanded by the hint(s): /*+ elim_groupby() */

My hypothesis (before I read the bug note) was that the optimizer had picked up the primary key declaration and seen that n1 was unique and therefore allowed the aggregating group by to be eliminated, but failed to “notice” that the to_char() – or extract() in the date example – meant that the assumption of uniqueness was no longer valid. To work around this problem very locally I simply added the hint /*+ no_elim_groupby */ (with no query block specified) to the query – and got the correct results.
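For reference, that workaround was simply:

select  /*+ no_elim_groupby */
        to_char(n1,'99') nch, count(*)
from    t1
group by
        to_char(n1,'99')
order by
        1,2
;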

Footnote

There is an interesting side note to this example (though not one that I would want to see used in a production system – this comment is for interest only). If you look at the Outline Data for the plan when there was no primary key you’ll notice that the only outline_leaf() is named sel$1, whereas in the plan with the primary key sel$1 appears as an outline() and the only outline_leaf() is named sel$9bb7a81a. An “outline leaf” is a query block that was used by the optimizer in constructing the final plan, while an “outline” is an intermediate query block that was examined before being transformed into another query block. So this difference in the Outline Data tells us that the problem appears thanks to a transformation that did not happen when there was no index – so what would our execution plan look like if the only hint we used in the query was /*+ outline_leaf(@sel$1) */ ?

SQL_ID  apgu34hc3ap7f, child number 0
-------------------------------------
select /*+ outline_leaf(@sel$1) */ to_char(n1,'99') nch , count(*) from
t1 group by to_char(n1,'99') order by 1,2

Plan hash value: 3280011052

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     3 (100)|          |
|   1 |  SORT ORDER BY    |       |    10 |    30 |     3  (67)| 00:00:01 |
|   2 |   HASH GROUP BY   |       |    10 |    30 |     3  (67)| 00:00:01 |
|   3 |    INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


This posting was scheduled to launch at 00:01 GMT on 1st January 2022. Happy new year – just be careful that you don’t try to extract() or to_char() it if you’re running 12.2.0.1 unless you’ve applied patch 26588069.

December 3, 2021

Column Groups

Filed under: CBO,Oracle,Statistics — Jonathan Lewis @ 10:40 am GMT Dec 3,2021

Here’s a little test that should have been the second test I ran when column groups were first made available in Oracle. It’s a check on the second of the two basic algorithms that Oracle uses for combining predicates:

  1. sel(predA and predB) = sel(predA) * sel(predB)
  2. sel(predA or predB) = sel(predA) + sel(predB) – sel(predA and predB)

As I pointed out many years ago in Cost Based Oracle – Fundamentals these are the formulae for combining probabilities of independent events. But if the predicates are not independent the formulae make the optimizer produce bad cardinality estimates and that’s why the Oracle developers introduced column groups – virtual columns that gather statistics about combinations of columns.

So let’s set up an extreme example and test the results (I’ve run it on 21c and 19c, but the issue is the same for all versions with column groups):

rem
rem     Script:         column_group_and_or.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2021
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        mod(rownum,100)                 n1,
        mod(rownum,100)                 n2,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator 
;

prompt  =========================================
prompt  Run the script twice, 
prompt  first without gathering the column group
prompt  then after gathering the column group
prompt  =========================================

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for columns (n1, n2) size 1'
        );
end;
/

column column_name format a32

select
        column_name, num_distinct, data_default
from
        user_tab_cols
where
        table_name = 'T1'
/

This script gives you a table with 10,000 rows, where the n1 and n2 columns always have exactly the same values as each other, and each column has 100 rows for each of 100 distinct values. If you allow the pl/sql block to run, the column stats query will report the following:

COLUMN_NAME                      NUM_DISTINCT DATA_DEFAULT
-------------------------------- ------------ --------------------------------------------------------------------------------
N1                                        100
N2                                        100
V1                                      10000
PADDING                                     1
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS            100 SYS_OP_COMBINED_HASH("N1","N2")

5 rows selected.

After creating the data (and statistics) we can execute the following simple statements:

set autotrace traceonly explain

select  * 
from    t1
where
        n1 = 50
/

select  * 
from    t1
where
        n1 = 50
and     n2 = 50
/

select  *
from    t1
where
        n1 = 50
or      n2 = 50
/

set autotrace off

We know that the first query will return 100 rows, and that the optimizer has enough information to get the correct estimate: the selectivity of “n1 = 50” is 1/100, the number of rows in the table is 10,000, so the estimate should be 1/100 * 10,000 = 100.

For the second query: if we don’t create the column group we expect the optimizer to multiply the selectivities of the two predicates together before multiplying by the number of rows in the table to get an estimate of (1/100 * 1/100) * 10,000 = 1

If we’ve created the column group we expect the optimizer to use the column group selectivity to calculate its estimate (1/100 * 10,000) = 100.

Then we get to the third query: if we don’t create the column group we expect the optimizer to slot 1/100 into the 4 places it has to appear in the formula to get an estimate (1/100 + 1/100 – (1/100 * 1/100)) * 10,000 = 199.

If we have created the column group the formula can use the column group’s 1/num_distinct for its third term to get: (1/100 + 1/100 – (1/100)) * 10,000 = 100

Here are the three execution plans (with a little cosmetic work) when the column group exists:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 11800 |    26   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 11800 |    26   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=50)


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 11800 |    26   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 11800 |    26   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=50 AND "N2"=50)


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   199 | 23482 |    27   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   199 | 23482 |    27   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=50 OR "N2"=50)

As you can see from the results in the Rows column, the optimizer has used the column group statistics for the AND estimate, but not for the OR estimate, resulting in an estimate that is out by a factor of nearly 2.

It should be noted that failure to use the column group for AND’ed predicates can introduce a massive error in cardinality estimates but for OR’ed predicates the worst it can be is a factor of 2.
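To see why the bound is 2, consider the extreme case of two perfectly correlated predicates each with the same selectivity s (as in this demo, where s = 1/100). Without the column group, formula 2 gives:

        sel(predA or predB) = s + s – (s * s) = s * (2 – s)

but the column group tells us the correct selectivity is simply s, so the estimate is too big by a factor of (2 – s), which approaches (but never reaches) 2 as s gets smaller. For the demo that’s 0.01 * (2 – 0.01) * 10,000 = 199 rows instead of 100.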

Summary

There is a possible enhancement to the optimizer’s use of column group statistics that could make the arithmetic consistent for AND and OR predicates and could allow the optimizer to produce better estimates for OR conditions.

Although any error in cardinality estimates can lead to undesirable execution plans, the scale of the error due to the current inconsistency will be at worst a factor of 2, which is unlikely to introduce many surprise plans. The nature of the error means the estimate will be too big, which might push the optimizer into using tablescans and hash joins when indexed accesses and nested loops might be the better choice.

October 15, 2021

use_nl redux

Filed under: CBO,Execution plans,Hints,Ignoring Hints,Oracle — Jonathan Lewis @ 2:58 pm BST Oct 15,2021

A question has just appeared on a note I wrote in 2012 about the incorrect use of the use_nl() hint in some sys-recursive SQL, linking forward to an explanation I wrote in 2017 of the use_nl() hint – particularly the interpretation of the form use_nl(a,b), which does not mean “use a nested loop from table A to table B”.

The question is essentially – “does Oracle pick the join order before it looks at the hints”?

I’m going to look at one of the queries in the question (based on the 2017 table creation code) and explain how Oracle gets to the plan it finally picks. I’ll be using an instance of 21.3 in the examples below. Here’s the query, followed by the plan:

select
        /*+ use_nl(b) */
        a.v1, b.v1, c.v1, d.v1
from
        a, b, c, d
where
        d.n100 = 0
and     a.n100 = d.id
and     b.n100 = a.n2
and     c.id   = a.id
/


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | C    | 10000 |   146K|    26   (4)| 00:00:01 |
|*  3 |   HASH JOIN          |      | 20000 |  1054K|    78   (4)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL | D    |   100 |  1800 |    26   (4)| 00:00:01 |
|*  5 |    HASH JOIN         |      | 20000 |   703K|    52   (4)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| B    | 10000 |   136K|    26   (4)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| A    | 10000 |   214K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      SWAP_JOIN_INPUTS(@"SEL$1" "C"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "D"@"SEL$1")
      USE_HASH(@"SEL$1" "C"@"SEL$1")
      USE_HASH(@"SEL$1" "D"@"SEL$1")
      USE_HASH(@"SEL$1" "A"@"SEL$1")
      LEADING(@"SEL$1" "B"@"SEL$1" "A"@"SEL$1" "D"@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "D"@"SEL$1")
      FULL(@"SEL$1" "A"@"SEL$1")
      FULL(@"SEL$1" "B"@"SEL$1")
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('21.1.0')
      OPTIMIZER_FEATURES_ENABLE('21.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C"."ID"="A"."ID")
   3 - access("A"."N100"="D"."ID")
   4 - filter("D"."N100"=0)
   5 - access("B"."N100"="A"."N2")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   6 -  SEL$1 / "B"@"SEL$1"
         U -  use_nl(b)

Note
-----
   - this is an adaptive plan

Points to note:

  • The Hint Report says the final plan did not use the use_nl(b) hint.
  • Whatever you may think the join order is by looking at the body of the plan, the leading() hint in the Outline Information tells us that the join order was (B A D C) – and that explains why the use_nl(b) hint could not be used, because B was never “the next table in the join order”.
  • The “visible” order of activity displayed in the plan is C D B A, but that’s because we swap_join_inputs(D) to put it above the (B,A) join, then swap_join_inputs(C) to put that above D.

So did Oracle completely pre-empt any plans that allowed B to be “the next table”, thus avoiding the hint, or did it consider some plans where B wasn’t the first table in the join order and, if so, would it have used a nested loop into B if that plan had had a low enough cost?

The only way to answer these questions is to look at the CBO (10053) trace file; for very simple queries it’s often enough to pick out a few lines as a starting point.
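If you want to reproduce the exercise, the traditional way to generate the trace file is through the 10053 event – a sketch follows; remember that the trace is written only when the statement is hard parsed, so change the SQL text slightly (or flush the shared pool) before re-running the query:

alter session set tracefile_identifier = 'use_nl_demo';
alter session set events '10053 trace name context forever, level 1';

-- execute (and hard parse) the query here

alter session set events '10053 trace name context off';

With the trace file in hand I used egrep to pick out the join orders and their costs: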

egrep -e "^Join order" -e"Best so far" or21_ora_15956.trc

Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Join order aborted2: cost > best plan cost
Join order[5]:  D[D]#0  C[C]#3  A[A]#1  B[B]#2
Join order aborted2: cost > best plan cost
Join order[6]:  D[D]#0  C[C]#3  B[B]#2  A[A]#1
Join order aborted2: cost > best plan cost

Join order[7]:  A[A]#1  D[D]#0  B[B]#2  C[C]#3
Join order aborted2: cost > best plan cost
Join order[8]:  A[A]#1  D[D]#0  C[C]#3  B[B]#2
Join order aborted2: cost > best plan cost
Join order[9]:  A[A]#1  B[B]#2  D[D]#0  C[C]#3
Join order aborted2: cost > best plan cost
Join order[10]:  A[A]#1  C[C]#3  D[D]#0  B[B]#2
Join order aborted2: cost > best plan cost
Join order[11]:  A[A]#1  C[C]#3  B[B]#2  D[D]#0
Join order aborted2: cost > best plan cost

Join order[12]:  B[B]#2  D[D]#0  A[A]#1  C[C]#3
Join order aborted2: cost > best plan cost
Join order[13]:  B[B]#2  A[A]#1  D[D]#0  C[C]#3
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
Join order[14]:  B[B]#2  A[A]#1  C[C]#3  D[D]#0
Join order aborted2: cost > best plan cost
Join order[15]:  B[B]#2  C[C]#3  D[D]#0  A[A]#1
Join order aborted2: cost > best plan cost

Join order[16]:  C[C]#3  D[D]#0  A[A]#1  B[B]#2
Join order aborted2: cost > best plan cost
Join order[17]:  C[C]#3  A[A]#1  D[D]#0  B[B]#2
Join order aborted2: cost > best plan cost
Join order[18]:  C[C]#3  A[A]#1  B[B]#2  D[D]#0
Join order aborted2: cost > best plan cost
Join order[19]:  C[C]#3  B[B]#2  D[D]#0  A[A]#1
Join order aborted2: cost > best plan cost

Oracle has considered 19 possible join orders (out of a maximum of 24 = 4!). In theory we should see 6 join orders starting with each of the 4 tables. In fact we see that the optimizer’s first choice started with table D, producing 6 join orders, then switched to starting with table A, producing only 5 join orders.

The “missing” order is (A, B, C, D) which should have appeared between join orders 9 and 10. If we check the trace file for join order 9 (A, B, D, C) in more detail we’ll see that the optimizer aborted after calculating the join from A to B because the cost had already exceeded the “Best so far” by then; so it didn’t even calculate the cost of getting to the 3rd table (D) in that join order. Clearly, then, there was no point in considering any other order that started with (A, B) – hence the absence of (A, B, C, D), which would otherwise have been the next in the list.

I’ve highlighted all the join orders where the optimizer didn’t abort with a “Best so far” line, but the method I used was engineered for ease and the result is misleading: that line shows only the cost of getting data from the first table in the join order, so I need to expand the output in each case to 4 lines (one for each table). This is what the 4 non-aborted summaries look like:

egrep -A+3 -e"Best so far" or21_ora_15956.trc

Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 2  cost: 30137.036118  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 30163.548157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 3  cost: 78.079517  card: 10000.000000  bytes: 550000.000000
              Table#: 2  cost: 30163.348157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 2  cost: 2483.956340  card: 1000000.000000  bytes: 32000000.000000
              Table#: 1  cost: 2530.068379  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 2556.580418  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
              Table#: 1  cost: 52.204078  card: 20000.000000  bytes: 720000.000000
              Table#: 0  cost: 78.479517  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 104.991556  card: 20000.000000  bytes: 1380000.000000

As you can see, when we start with (B A) the estimated cost drops dramatically.

Now that we’ve seen that Oracle looks at many (though not a completely exhaustive set of) plans on the way to the one it picks, the thing we need to do (in theory) is check that for every single calculation where B is “the next table” Oracle obeys our hint. Each time the optimizer joins “the next” table it usually considers the cost of a Nested Loop, a Sort Merge, and a Hash Join in that order; if the optimizer is obeying the hint it will only consider the nested loop join. Here’s a suitable call to egrep with the first four join orders:

egrep -e "^Join order" -e "^Now joining" -e"^NL Join" -e"^SM Join" -e"^HA Join" or21_ora_15956.trc

Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Now joining: A[A]#1
NL Join
SM Join
SM Join (with index on outer)
HA Join
Now joining: B[B]#2
NL Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Now joining: C[C]#3
NL Join
SM Join
HA Join
Now joining: B[B]#2
NL Join

Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Now joining: B[B]#2
NL Join
Now joining: A[A]#1
NL Join
SM Join
HA Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Now joining: C[C]#3
NL Join
Join order aborted2: cost > best plan cost


As you can see, the only join considered when “Now joining” B is a nested loop join; for all other tables the three possible joins (and sometimes two variants of the Sort Merge join) are evaluated.

You may also notice another of the clever strategies the optimizer uses to minimise its workload. On the second join order the optimizer goes straight to “Now joining C” because it has remembered the result of joining A from the previous join order.

This is only a very simple example and analysis, but I hope it’s given you some idea of how the optimizer works, and how clever it tries to be about minimising the work; and how it can obey a hint while still producing an execution plan that appears to have ignored the hint.

October 7, 2021

Hints and Costs

Filed under: 12c,CBO,Conditional SQL,Execution plans,Oracle — Jonathan Lewis @ 12:06 pm BST Oct 7,2021

This note is one I drafted three years ago, based on a question from the Oracle-L list server. It doesn’t directly address that question because at the time I was unable to create a data set that reproduced the problem, but it did highlight a detail that’s worth mentioning, so I’ve finally got around to completing it (and testing on a couple of newer versions of Oracle).

I’ll start with a model that was supposed to demonstrate the problem behind the question:


rem
rem     Script:         122_or_expand.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2018
rem     Purpose:        
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem             12.2.0.1
rem

create table t1
segment creation immediate
pctfree 80 pctused 20
nologging
as
select
        *
from
        all_objects
where
        rownum <= 50000
;

alter table t1 add constraint t1_pk
        primary key(object_id)
        using index pctfree 80
;

variable b1 number
variable b2 number
variable b3 number
variable b4 number
variable b5 number
variable b6 number

exec :b1 := 100
exec :b2 := 120
exec :b3 := 1100
exec :b4 := 1220
exec :b5 := 3100
exec :b6 := 3320

set serveroutput off

select
        object_name
from 
        t1
where
        object_id between :b1 and :b2
or      object_id between :b3 and :b4
or      object_id between :b5 and :b6
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));


The critical feature of the query is the list of disjuncts (ORs) which all specify a range for object_id. The problem was that the query used a plan with an index full scan when there were no statistics on the table (or its indexes), but switched to a plan that used index range scans when statistics were gathered – and the performance of the plan with the full scan was unacceptable. (Clearly the “proper” solution is to have some suitable statistics in place – but sometimes such things are out of the control of the people who have to solve the problems.)

The /*+ index() */ and (undocumented) /*+ index_rs_asc() */ hints had no effect on the plan. The reason why the /*+ index() */ hint made no difference is that an index full scan is one of the ways in which the /*+ index() */ hint can be obeyed – the hint doesn’t instruct the optimizer to pick an index range scan. The hint /*+ index_rs_asc() */ specifically tells the optimizer to pick an index Range Scan ASCending if the hint has been specified correctly and the choice is available and legal. So why was the optimizer not doing as it was told? Without seeing the execution plan or CBO trace file from a live example I can’t guarantee that the following hypothesis is correct, but I think it’s in the right ball park.

I think the optimizer was probably using the (new to 12c) cost-based “OR expansion” transformation, which basically transformed the query into a UNION ALL of several index range scans – and that’s why its outline would show /*+ index_rs_asc() */ hints. The hint would only become valid after the transformation had taken place, so if Oracle didn’t consider (or considered and discarded) the transformation when there were no stats in place then the hint would have to be “Unused” (as the new 19c hint report would say).

When I tried to model the problem the optimizer kept doing nice things with my data, so I wasn’t able to demonstrate the OP’s problem. However in one of my attempts to get a silly plan I did something silly – that can happen by accident if your client code isn’t careful! I’ll tell you what that was in a moment – first, a couple of plans.

As it stands, with the data and bind variables as shown, the optimizer used “b-tree / bitmap conversion” to produce an execution plan that did three separate index range scans, converted the rowids to bitmaps, OR-ed the bitmaps, then converted back to rowids before accessing the table:

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |    84 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   291 | 12804 |    84   (5)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |       |       |            |          |
|   3 |    BITMAP OR                        |       |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|   5 |      SORT ORDER BY                  |       |       |       |            |          |
|*  6 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
|   7 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|   8 |      SORT ORDER BY                  |       |       |       |            |          |
|*  9 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
|  10 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|  11 |      SORT ORDER BY                  |       |       |       |            |          |
|* 12 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

So the first thing I had to do was disable this feature, which I did by adding the hint /*+ opt_param(‘_b_tree_bitmap_plans’,’false’) */ to the query. This adjustment left Oracle doing the OR-expansion that I didn’t want to see:


----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   297 (100)|          |
|   1 |  VIEW                                  | VW_ORE_BA8ECEFB |   288 | 19008 |   297   (1)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    18 |   792 |    20   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_PK           |    18 |       |     2   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    97 |  4268 |   100   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T1_PK           |    97 |       |     2   (0)| 00:00:01 |
|*  9 |    FILTER                              |                 |       |       |            |          |
|  10 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |   173 |  7612 |   177   (1)| 00:00:01 |
|* 11 |      INDEX RANGE SCAN                  | T1_PK           |   173 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

You’ll notice that the three range scans have different row estimates and costs – that’s the effect of bind variable peeking and my careful choice of bind variables to define different sized ranges. Take note, by the way, of the three filter predicates flagged at operations 3, 6, and 9.  These are the “conditional plan” filters that say things like: “don’t run the sub-plan if the runtime value of :b5 is greater than :b6”.

Since I didn’t want to see OR-expansion just yet I then added the hint /*+ no_or_expand(@sel$1) */ to the query and that gave me a plan with tablescan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   617 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   291 | 12804 |   617   (4)| 00:00:01 |
--------------------------------------------------------------------------

This was a shame because I really wanted to see the optimizer produce an index full scan at this point – so I decided to add an “unnamed index” hint to the growing list of hints – specifically: /*+ index_(@sel$1 t1@sel$1) */
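By this stage the accumulated hint set looked something like this (a reconstruction from the hints quoted above, so treat it as a sketch):

select
        /*+
                opt_param('_b_tree_bitmap_plans','false')
                no_or_expand(@sel$1)
                index_(@sel$1 t1@sel$1)
        */
        object_name
from
        t1
where
        object_id between :b1 and :b2
or      object_id between :b3 and :b4
or      object_id between :b5 and :b6
;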

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |   405 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   291 | 12804 |   405   (2)| 00:00:01 |
|*  2 |   INDEX FULL SCAN                   | T1_PK |   291 |       |   112   (7)| 00:00:01 |
---------------------------------------------------------------------------------------------

This, of course, is where things started to get a little interesting – the index full scan costs less than the tablescan but didn’t appear until hinted. But after a moment’s thought you can dismiss this one (possibly correctly) as an example of the optimizer being cautious about the cost of access paths that are dictated by bind variables or unpeekable inputs. (But these bind variables were peekable – so maybe there’s more to it than that – I was still trying to get to a point where my model would behave more like the OP’s, so I didn’t follow up on this detail: maybe in a couple of years time … ).

One last tweak – and that will bring me to the main point of this note. In my original code I was using three ranges dictated by 3 pairs of bind variables, for example [:b5, :b6]. What would happen if I made :b5 greater than :b6, say by swapping their values?
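In SQL*Plus that’s just a pair of reassignments, reversing the values set earlier:

exec :b5 := 3320
exec :b6 := 3100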

The original btree/bitmap plan didn’t change, but where I had simply blocked btree/bitmap plans and seen OR-expansion as a result the plan changed to a full tablescan (with the cost of 617 that you saw above). So I tried again, adding the hint /*+ or_expand(@sel$1) */ to see why; and this is the plan I got:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   735 (100)|          |
|   1 |  VIEW                                  | VW_ORE_BA8ECEFB |   116 |  7656 |   735   (3)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    18 |   792 |    20   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_PK           |    18 |       |     2   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    97 |  4268 |   100   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T1_PK           |    97 |       |     2   (0)| 00:00:01 |
|*  9 |    FILTER                              |                 |       |       |            |          |
|* 10 |     TABLE ACCESS FULL                  | T1              |     1 |    44 |   615   (4)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

I still get the same three branches in the expansion, but look what’s happened to the sub-plan for the third pair of bind variables. The optimizer still has the FILTER at operation 9 – and that will evaluate to FALSE for the currently peeked values; but the optimizer has decided that it should use a tablescan for this part of the query if it ever gets a pair of bind variables in the right order; and the cost of the tablescan has echoed up the plan to make the total cost of the plan 735, which is (for obvious reasons) higher than the cost of running the whole query as a single tablescan.

The same anomaly appears in 19.11.0.0 and 21.3.0.0. On the plus side, it’s possible that if you have code like this the optimizer will be using the btree/bitmap conversion anyway.

tl;dr

As a generic point it’s worth ensuring that if you’re using bind variables in client code to define ranges then you’ve got to get the values in the right order, otherwise one day the answer to the question “nothing changed, why is the query running so slowly?” will be “someone got in first with the bound values the wrong way round”.

DB Optimizer

Filed under: CBO,Oracle — Jonathan Lewis @ 8:56 am BST Oct 7,2021

I’ve rediscovered this list of articles by Stefan Koehler on my laptop; it has a bias towards SAP users, and it’s several years old judging by the dates, so some details will have changed, but there still seem to be plenty of people using 12c (and SAP) so I thought I’d tidy it up and publish it.

September 20, 2021

Optimizer Tip

Filed under: CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 9:04 am BST Sep 20,2021

This is a note I drafted in March 2016, starting with a comment that it had been about the same time the previous year that I had written:

I’ve just responded to the call for items for the “IOUG Quick Tips” booklet for 2015 – so it’s probably about time to post the quick tip that I put into the 2014 issue. It’s probably nothing new to most readers of the blog, but sometimes an old thing presented in a new way offers fresh insights or better comprehension.

I keep finding ancient drafts like this and if they still seem relevant – even if they are a little behind the times – I’ve taken to dusting them down and publishing. (There are still more than 730 drafts on my blog at present – which gives me scope for one per day for the next 2 years!)

With the passing of time, though, new information becomes available, algorithms change and (occasionally) I discover I’ve made a significant error in a hypothesis (or guess). In this case there are a couple of important additions that I’ve added to the end of the original note.

Optimizer Tips (IOUG Quick Tips 2015)

There are two very common reasons why the optimizer picks a bad execution plan. The first is that its estimate of the required data volume is bad, the second is that it has a misleading impression of how scattered that data is.

The first issue is often due to problems with the selectivity of complex predicates, the second to unsuitable values for the clustering_factor of potentially useful indexes. Recent [ed: i.e. as at 2015] versions of the Oracle software have given us features that try to address both these issues and I’m going to comment on them in the following note.

As always any change can have side effects; introducing a new feature might have no effect on 99% of what we do, a beneficial effect on 99% of the remainder, and a hideous effect on the 1% of 1% that’s left, so I will be commenting on both the pros and cons of both features.

Column Group Stats

The optimizer assumes that the data in two different columns of a single table are independent – for example the registration number on your car (probably) has nothing to do with the account number of your bank account. So when we execute queries like:

     colX = 'abcd'
and  colY = 'wxyz'

the optimizer’s calculations will be something like:

“one row in 5,000 is likely to have colX = ‘abcd’ and one row in 2,000 is likely to have colY = ‘wxyz’, so the combination will probably appear in roughly one row in ten million”.

On the other hand we often find tables that do things like storing post codes (zipcodes) in one column and city names in another, and there’s a strong correlation between post codes and city – for example the district code (first part of the post code) “OX1” will be in the city of Oxford (Oxfordshire, UK). So if we query a table of addresses for rows where:

     district_code = 'OX1'
and  city          = 'Oxford'

there’s a degree of redundancy, but the optimizer will multiply the total number of distinct district codes in the UK by the total number of distinct city names in the UK as it tries to work out the number of addresses that match the combined predicate and will come up with a result that is far too small.

In cases like this we can define “column group” statistics about combinations of columns that we query together, using the function dbms_stats.create_extended_stats(). This function will create a type of virtual column for a table and report the system-generated name back to us, and we will be able to see that name in the view user_tab_cols, and the definition in the view user_stat_extensions. If we define a column group in this way we then need to gather stats on it, which we can do in one of two ways, either by using the generated name or by using the expression that created it.


SQL> create table addresses (district_code varchar2(8), city varchar2(40));

Table created.

SQL> execute dbms_output.put_line( -
>        dbms_stats.create_extended_stats( -
>            user,'addresses','(district_code, city)'))

SYS_STU12RZM_07240SN3V2667EQLW

PL/SQL procedure successfully completed.

begin
        dbms_stats.gather_table_stats(
                user, 'addresses',
                method_opt => 'for columns SYS_STU12RZM_07240SN3V2667EQLW size 1'
        );
        dbms_stats.gather_table_stats(
                user, 'addresses',
                method_opt => 'for columns (district_code, city) size 1'
        );
end;
/

I’ve included both options in the anonymous pl/sql block, but you only need one of them. In fact if you use the second one without calling create_extended_stats() first Oracle will create the column group implicitly, but you won’t know what it’s called until you query user_stat_extensions.

I’ve limited the stats collection to basic stats with the “size 1” option. You can collect a histogram on a column group but since the optimizer can only use a column group with equality predicates you should only create a histogram in the special cases where you know that you’re going to get a frequency histogram or “Top-N” histogram.

You can also define extended stats on expressions (e.g. trunc(delivery_date) – trunc(collection_date)) rather than column groups, but since you’re only allowed 20 column groups per table [but see update 1] it would be better to use virtual columns for expressions, since you can have as many virtual columns as you like on a table provided the total column count stays below the limit of 1,000 columns per table.
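As a sketch – the table and column names here are invented for illustration:

alter table deliveries add (
        delivery_days as (trunc(delivery_date) - trunc(collection_date)) virtual
);

begin
        dbms_stats.gather_table_stats(
                user, 'deliveries',
                method_opt => 'for columns delivery_days size 1'
        );
end;
/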

Warnings

  • Column group statistics are only used for equality expressions. [see also update 2]
  • Column group statistics will not be used if you’ve created a histogram on any of the underlying columns unless there’s also a histogram on the column group itself.
  • Column group statistics will not be used if you query any of the underlying columns with an “out of range” value. This, perhaps, is the biggest instability threat with column groups. As time passes and new data appears you may find people querying the new data. If you haven’t kept the column stats up to date then plans can change dramatically as the optimizer switches from using column group stats to multiplying the selectivities of underlying columns.
  • The final warning arrives with 12c. If you have all the adaptive optimizer options enabled the optimizer will keep a look out for tables that it thinks could do with column group stats, and will automatically create them the next time you gather stats on the table. In principle this shouldn’t be a problem – the optimizer should only do this when it has seen that column group stats should improve performance – but you might want to monitor your system for the arrival of new automatic columns.

Preference: table_cached_blocks

Even when the cardinality estimates are correct we may find that we get an inefficient execution plan because the optimizer doesn’t want to use an index that we think would be a really good choice. A common reason for this failure is that the clustering_factor on the index is unrealistically large.

The clustering_factor of an index is a measure of how randomly you will jump around the table as you do an index range scan through the index and the algorithm Oracle uses to calculate this measure has a serious flaw in it: it can’t tell the difference between a little bit of localised jumping and constant random leaps across the entire width of the table.

To calculate the clustering_factor Oracle basically walks the entire index in order using the rowid at the end of each index entry to check which table block it would have to visit, and every time it has to visit a “new” table block it increments a counter. The trouble with this approach is that, by default, it doesn’t remember its recent history so, for example, it can’t tell the difference in quality between the following two sequences of table block visits:

Block 612, block 87, block 154, block  3, block 1386, block 834, block 237
Block  98, block 99, block  98, block 99, block   98, block  99, block  98

In both cases Oracle would say that it had visited seven different blocks and the data was badly scattered. This has always been a problem, but it became much more of a problem when Oracle introduced ASSM (automatic segment space management). The point of ASSM is to ensure that concurrent inserts from different sessions tend to use different table blocks, the aim being to reduce contention due to buffer busy waits. As we’ve just seen, though, the clustering_factor doesn’t differentiate between “a little bit of scatter” and “a totally random disaster area”.

Oracle finally addressed this problem by introducing a “table preference” which allows you to tell it to “remember history” when calculating the clustering_factor. So, for example, a call like this:

execute dbms_stats.set_table_prefs(user,'t1','table_cached_blocks',16)

would tell Oracle that the next time you collect statistics on any indexes on table t1 the code to calculate the clustering_factor should remember the last 16 table blocks it had “visited” and not increment the counter if the next block to visit was already in that list.

If you look at the two samples above, this means the counter for the first list of blocks would reach 7 while the counter for the second list would only reach 2. Suddenly the optimizer will be able to tell the difference between data that is “locally” scattered and data that really is randomly scattered. You and the optimizer may finally agree on what constitutes a good index.

It’s hard to say whether there’s a proper “default” value for this preference. If you’re using ASSM (and there can’t be many people left who aren’t) then the obvious choice for the parameter would be 16 since ASSM tends to format 16 consecutive blocks at a time when a segment needs to make more space available for users [but see Update 3]. However, if you know that the real level of insert concurrency on a table is higher than 16 then you might be better off setting the value to match the known level of concurrency.

Are there any special risks to setting this preference to a value like 16? I don’t think so; it’s going to result in plans changing, of course, but indexes which should have a large clustering_factor should still end up with a large clustering_factor after setting the preference and gathering statistics; the indexes that ought to have a low clustering_factor are the ones most likely to change, and change in the right direction.
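
Pulling the pieces together, a minimal sketch for experimenting with the preference on table t1 (the cascade option makes the gather call recalculate the index stats that the preference affects):

execute dbms_stats.set_table_prefs(user, 't1', 'table_cached_blocks', 16)
execute dbms_stats.gather_table_stats(user, 't1', cascade => true)

select  index_name, clustering_factor
from    user_indexes
where   table_name = 'T1'
;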

Footnote: “Danger, Will Robinson”.

I’ve highlighted two features that are incredibly useful as tools to give the optimizer better information about your data and allow it to get better execution plans with less manual intervention. The usual warning applies, though: “if you want to get there, you don’t want to start from here”. When you manipulate the information the optimizer is using it will give you some new plans; better information will normally result in better plans but it is almost inevitable that some of your current queries are running efficiently “by accident” (possibly because of bugs) and the new code paths will result in some plans changing for the worse.

Clearly it is necessary to do some thorough testing but fortunately both features are incremental and any changes can be backed out very quickly and easily. We can change the table_cached_blocks one table at a time (or even, with a little manual intervention, one index at a time) and watch the effects; we can add column groups one at a time and watch for side effects. All it takes to back out of a change is a call to gather index stats, or a call to drop extended stats. It’s never nice to live through change – especially change that can have a dramatic impact – but if we find after going to production that we missed a problem with our testing we can reverse the changes very quickly.
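
As a sketch of what backing out looks like (the index name t1_i1 is hypothetical):

rem     reset the preference to its default, then recalculate the index stats
execute dbms_stats.delete_table_prefs(user, 't1', 'table_cached_blocks')
execute dbms_stats.gather_index_stats(user, 't1_i1')

rem     drop a column group (and its stats) using its defining expression
execute dbms_stats.drop_extended_stats(user, 'addresses', '(district_code, city)')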

Updates

Update 1 – 20 sets of extended stats. In fact the limit is the larger of 20 and ceiling(column count/10), and the way the arithmetic is applied is a little odd so there are ways to hack around the limit.

Update 2 – Column groups and equality. It’s worth a special mention that the predicate colX is null is not an equality predicate and column group stats will not apply, but there can be unexpected side effects even for cases where you don’t use this “is null” predicate. (More articles here about column groups.)

Update 3 – table_cached_blocks = 16. This suggestion doesn’t cater for systems running RAC.

August 25, 2021

qbregistry 2

Filed under: CBO,dbms_xplan,Oracle,Transformations — Jonathan Lewis @ 1:45 pm BST Aug 25,2021

Following a question (very similar to one I had been asking myself) that appeared on twitter in response to my original posting on the new qbregistry format option in the dbms_xplan package, I’ve drafted a note of how I interpreted the execution plan so that I could more clearly see how my visualisation of the transformation maps (or fails to map) to the Query Block Registry.

I can’t guarantee the correctness of the description I’ve given here, but it’s probably fairly accurate.

Original Query (hiding the unnest and no_semijoin hints)

select  
        /* sel$1 */
        * 
from    t1 
where   t1.owner = 'OUTLN' 
and     object_name in (
                select  /* sel$2 */
                        distinct t2.object_name 
                from   t2 
                where  t2.object_type = 'TABLE'
        )
;

Transformation 1: unnest the subquery

This is visible in the Outline Information of the execution plan as the hint UNNEST(@"SEL$2" UNNEST_INNERJ_DISTINCT_VIEW) and produces two new query blocks, the query block that “is” the unnested subquery and the query block that joins t1 to the unnested subquery vw_nso_1.

select  
        /* SEL$5DA710D3 */
        t1.* 
from
        t1,
        (
        select  /* SEL$683B0107 */
                distinct
                t2.object_name 
        from    t2 
        where   t2.object_type = 'TABLE'
        )       vw_nso_1
where
        t1.owner = 'OUTLN' 
and     vw_nso_1.object_name = t1.object_name



Transformation 2: view merge (join then aggregate)

This is visible in the Outline Information of the execution plan in the hint MERGE(@"SEL$683B0107" >"SEL$5DA710D3"). I think this produces three new query blocks; the block that “is” the merged view, a block that selects (projects) from the merged view, and the query block that the main query now becomes.

We will pretend that t1 has only 4 columns, owner, object_name, object_type, object_id.

select
        /* SEL$B186933D */
        vm_nwvw_2.owner,
        vm_nwvw_2.object_name,
        vm_nwvw_2.object_type,
        vm_nwvw_2.object_id
from    (
        select  /* SEL$2F1334C4 */
                -- no distinct, and t2.object_name and t1.rowid eliminated
                t1.owner,
                t1.object_name,
                t1.object_type,
                t1.object_id
        from    (
                select  /* SEL$88A77D12 */
                        distinct
                        t1.rowid,        -- ensures we don't duplicate t1 rows
                        t1.owner,
                        t1.object_name,
                        t2.object_name,  -- seems redundant, but is in the trace file
                        t1.object_type,
                        t1.object_id
                from
                        t1,
                        t2
                where
                        t2.object_type = 'TABLE'
                and     t1.owner = 'OUTLN'
                and     t1.object_name = t2.object_name
                ) 
        ) vm_nwvw_2
;

Transformation 3: aggregate into partial join

I realised only as I was writing this note that I had completely forgotten that the plan reported a semi join even though the subquery had been hinted with a no_semijoin hint, and that the reported semi join was actually a partial join.

However, the query block registry is identical with or without a partial join (controlled by the hint [no]partial_join) so there doesn’t seem to be a transformation stage corresponding to the choice of strategy. Maybe the apparently redundant query block layer allows the variation to appear if required.

It’s Difficult

A problem I have with the query block registry is deciding what it’s telling us – and maybe the trace file and the execution plan are not trying to tell us exactly the same thing. I think it’s quite difficult, anyway, to find a good way of presenting the information that is completely informative but clear and uncluttered.

Something that may help, when you can check the trace file and the final execution plan, is the order in which query blocks are registered. Some of them may be discarded, of course, as the optimizer works through options, some of them may be marked as COPY, but if you ignore those you may be able to see from what’s left the evolution of the plan. Here, for example, is the extract of the lines where each query block is registered, taken from the CBO trace for this query, with numbering:

32:Registered qb: SEL$1 0xc26e28e8 (PARSER)
36:  signature (): qb_name=SEL$1 nbfros=1 flg=0
37:    fro(0): flg=4 objn=76167 hint_alias="T1"@"SEL$1"

39:Registered qb: SEL$2 0xc26e0d58 (PARSER)
43:  signature (): qb_name=SEL$2 nbfros=1 flg=0
44:    fro(0): flg=4 objn=76168 hint_alias="T2"@"SEL$2"

966:Registered qb: SEL$683B0107 0xbcc187c8 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
970:  signature (): qb_name=SEL$683B0107 nbfros=1 flg=0
971:    fro(0): flg=0 objn=76168 hint_alias="T2"@"SEL$2"

973:Registered qb: SEL$5DA710D3 0xbb1fcb10 (SUBQUERY UNNEST SEL$1; SEL$2;)
977:  signature (): qb_name=SEL$5DA710D3 nbfros=2 flg=0
978:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
979:    fro(1): flg=5 objn=0 hint_alias="VW_NSO_1"@"SEL$5DA710D3"

1471:Registered qb: SEL$2F1334C4 0xbcf4b210 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3)
1475:  signature (): qb_name=SEL$2F1334C4 nbfros=1 flg=0
1476:    fro(0): flg=5 objn=0 hint_alias="VM_NWVW_2"@"SEL$2F1334C4"

1478:Registered qb: SEL$88A77D12 0xbcda5540 (PROJECTION VIEW FOR CVM SEL$683B0107)
1482:  signature (): qb_name=SEL$88A77D12 nbfros=2 flg=0
1483:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
1484:    fro(1): flg=1 objn=0 hint_alias="VW_NSO_1"@"SEL$5DA710D3"

1489:Registered qb: SEL$B186933D 0xbcda5540 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3)
1493:  signature (): qb_name=SEL$B186933D nbfros=2 flg=0
1494:    fro(0): flg=0 objn=76167 hint_alias="T1"@"SEL$1"
1495:    fro(1): flg=0 objn=76168 hint_alias="T2"@"SEL$2"

Because it’s a very simple query you can almost see the “thinking” in the clumping of the line numbers.

  • The first two registrations are the original query blocks.
  • After a break the next pair is the t2 subquery being unnested and the query which is the join between t1 and the unnested t2.
  • After another break we see, in rapid succession, the view using the merged join view, the projection view using that merged join view, then the query block selecting from that projection.

The thing I find difficult to keep clear in my mind (when trying to describe what the trace/registry is saying, that is, not when just reading the plan) is the “doubling” effect where transformation steps often seem to produce two query blocks, one for the inline view containing the transformed object and one for the query that is now using the transformed object; and a further source of confusion appears when a query block seems to be able to peek into an inner query block to reference the objects in it. I just keep losing track of the layers!

It’s probably as safe as it’s going to be to read this note (unless someone points out an error). I don’t think there’s any more that I could find to say about the example.

August 24, 2021

qbregistry

Filed under: CBO,dbms_xplan,Oracle,Transformations — Jonathan Lewis @ 11:54 am BST Aug 24,2021

If you look at the “Outline Information” from an execution plan it shows you a list of hints that will (in theory, at least) recreate the execution plan and it’s this information that gets stored as the “injection” part of an SQL Plan Baseline. Unfortunately the hints won’t necessarily allow you to infer what transformations the optimizer has used to get to the final execution plan.

If you’re prepared to generate the CBO trace file you could examine the Query Block Registry that appears near the end of the trace file to get some clues – here’s an example from 19.11.0.0 for a simple query involving a single table plus an IN subquery:

Query Block Registry:
SEL$2 0x6d47cde8 (PARSER)
  SEL$5DA710D3 0x6d480e60 (SUBQUERY UNNEST SEL$1; SEL$2;)
    SEL$2F1334C4 0x6d480e60 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$683B0107 0x6d47cde8 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
    SEL$B186933D 0x6d48e3a8 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3) [FINAL]
    SEL$88A77D12 0x6d48e3a8 (PROJECTION VIEW FOR CVM SEL$683B0107)
      SEL$B186933D 0x6d48e3a8 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3) [FINAL]
SEL$1 0x6d480e60 (PARSER)
  SEL$5DA710D3 0x6d480e60 (SUBQUERY UNNEST SEL$1; SEL$2;)
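
If you want to generate a trace file like this for yourself the classic approach is still event 10053; here’s a sketch using the query from Footnote 3 below (the tracefile_identifier is only there to make the trace file easy to find):

alter session set tracefile_identifier = 'qbreg';
alter session set events '10053 trace name context forever, level 1';

explain plan for
select  *
from    t1
where   owner = 'OUTLN'
and     object_name in (
                select  /*+ unnest no_semijoin */
                        distinct object_name
                from   t2
                where  object_type = 'TABLE'
        )
;

alter session set events '10053 trace name context off';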

I’m not going to say anything about interpreting this extract because I want to highlight a recent feature of the dbms_xplan package (brought to my attention by Franck Pachot some time ago). One of the format options for displaying execution plans will report the query block registry. Here’s the output from display_cursor(format=>'qbregistry') in 21.3.0.0 for the query that produced the CBO trace extract above:

Query Block Registry:
---------------------
  SEL$1 (PARSER)
    SEL$5DA710D3 (SUBQUERY UNNEST SEL$1 ; SEL$2)
      SEL$2F1334C4 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$2 (PARSER)
    SEL$683B0107 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
      SEL$88A77D12 (PROJECTION VIEW FOR CVM SEL$683B0107)
        SEL$B186933D (VIEW MERGE SEL$88A77D12 ; SEL$683B0107) [FINAL]

Two things to notice here – first that the output has reduced the 9 lines to 7 lines (which can only be helpful); secondly that the redundant memory addresses which appear in the trace file don’t get copied into the report.
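
For reference, the call that produces this output is of the form shown in the following sketch; run the query first in the same session so that it’s the most recent statement in your cursor cache:

select  *
from    table(dbms_xplan.display_cursor(format => 'qbregistry'));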

I’m still not going to say anything about interpreting the output because I want to show you the display_cursor() output for the same query when executed in 19.11.0.0. It looks like this:

Query Block Registry:
---------------------

  <q o="13"><n><![CDATA[SEL$88A77D12]]></n><p><![CDATA[SEL$683B0107]]></p><
        f><h><t><![CDATA[T1]]></t><s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[VW_N
        SO_1]]></t><s><![CDATA[SEL$5DA710D3]]></s></h></f></q>
  <q o="12"><n><![CDATA[SEL$683B0107]]></n><p><![CDATA[SEL$2]]></p><f><h><t
        ><![CDATA[T2]]></t><s><![CDATA[SEL$2]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[T1]]></t><s><![CDATA[S
        EL$1]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$2]]></n><f><h><t><![CDATA[T2]]></t><s><![CDATA[S
        EL$2]]></s></h></f></q>
  <q o="18" f="y" h="y"><n><![CDATA[SEL$B186933D]]></n><p><![CDATA[SEL$88A7
        7D12]]></p><i><o><t>VW</t><v><![CDATA[SEL$683B0107]]></v></o></i><f><h><t
        ><![CDATA[T1]]></t><s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[T2]]></t><s
        ><![CDATA[SEL$2]]></s></h></f></q>
  <q o="19" h="y"><n><![CDATA[SEL$5DA710D3]]></n><p><![CDATA[SEL$1]]></p><i
        ><o><t>SQ</t><v><![CDATA[SEL$2]]></v></o></i><f><h><t><![CDATA[T1]]></t><
        s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[VW_NSO_1]]></t><s><![CDATA[SEL$
        5DA710D3]]></s></h></f></q>
  <q o="15" f="y"><n><![CDATA[SEL$2F1334C4]]></n><p><![CDATA[SEL$5DA710D3]]
        ></p><f><h><t><![CDATA[VM_NWVW_2]]></t><s><![CDATA[SEL$2F1334C4]]></s></h
        ></f></q>

Yes, it’s naked XML (extracted from the v$sql_plan.other_xml column for operation 1).

I had been living in hope that someone else would write a messy bit of SQL to translate this into something readable – but the last time I searched the Internet for “other_xml qbregistry” I got the magical result of a Googlewhack (i.e. only one hit), which was in Russian, and largely a short description of all the options for the format command.

Since I’ve just installed 21.3 on a VM I decided to bite the bullet but I’ve taken the short-cut to writing the code. I’ve run a trace on a call to dbms_xplan.display_cursor() and extracted the critical query from the resulting trace file. Then I spent 30 minutes making it readable, hacking it to make it almost workable on 19c, then finding out why it can’t work without a little extra effort. Here’s the resulting hack:

rem
rem     Script:         qbregistry_query.sql
rem     Author:         Oracle Corp / Jonathan Lewis
rem     Dated:          Aug 2021
rem
rem     Last tested 
rem             19.11.0.0
rem

define m_sql_id='232sya6twg7sq'
define m_origin = 2

with 
xml as (
        select  other_xml
        from    V$sql_plan 
        where   sql_id = '&m_sql_id' 
        and     id = 1
        and     other_xml is not null
),
allqbs as ( 
        select 
                extractvalue(d.column_value, '/q/n') qbname, 
                extractvalue(d.column_value, '/q/@f') final, 
                extractvalue(d.column_value, '/q/p') prev, 
                extractvalue(d.column_value, '/q/@o') origin 
        from 
                table(xmlsequence(extract(xmltype ((select other_xml from xml)), '/other_xml/qb_registry/q'))) d 
), 
inpqbs as ( 
        select 
                xml.qbname qbname, 
                listagg(xml.depqbs, ',') within group (
                        order by xml.depqbs) depqbs 
        from 
                xmltable('/other_xml/qb_registry/q/i/o' passing xmltype((select other_xml from xml)) 
                        columns depqbs varchar2(256) path 'v', 
                        qbname varchar2(256) path './../../n'
                ) xml 
        where     xml.depqbs in ( select qbname from allqbs) 
        group by xml.qbname
), 
recqb   (src, origin, dest, final, lvl, inpobjs) as ( 
        select 
                qbname src, origin origin, null dest, final final, 1 lvl, null inpobjs 
        from 
                allqbs
        where 
--              origin = &m_origin
                origin in (2,3)
        union all 
        select 
                a.qbname src, a.origin origin, a.prev dest, a.final final, lvl+1, 
                (select depqbs from inpqbs i where i.qbname = a.qbname) inpobjs 
        from
                allqbs a, 
                recqb r 
        where a.prev = r.src
)
search depth first by src asc set ordseq, 
finalans as ( 
        select 
                src,
/* 
                (
                select 
                        name 
                from    v$query_block_origin 
                where 
                        origin_id=origin
                )       origin, 
*/
                origin,
                dest, final, lvl, inpobjs 
        from recqb order by ordseq
) 
select
        /*+ opt_param('parallel_execution_enabled', 'false') */ 
        g.qbreg 
from (
        select 
                rpad(' ', 2*(lvl-1)) || 
                src || ' (' || origin || 
                        case when length(dest)>0 
                                then ' ' || dest 
                                else '' 
                        end || 
                        case when length(inpobjs)>0 
                                then ' ; ' || inpobjs 
                                else ' ' 
                        end ||
                        ')' || 
                        case when final='y' 
                                then ' [final]' 
                                else '' 
                        end 
                qbreg 
        from 
                finalans
        ) g
/

In lines 10 and 11 I’ve defined a couple of substitution variables that appear further on in the script. One is the SQL_ID for the query you’re interested in, the other is (probably) a fixed symbolic constant used by Oracle.

Lines 14-20 are a “with” subquery that I’ve prepended to Oracle’s internal code to create a single row, single column table holding the other_xml value of the query of interest. You’ll notice that I’ve been fairly casual about this bit since I haven’t catered for the fact that a single sql_id may have several child cursors and might even be obsolete.

Lines 28 and 36 are where I’ve used my “with” subquery to supply the other_xml value that would have appeared as a bind variable (:B1) in the trace file.

Line 49 (commented out for the reason described in footnote 1) uses the m_origin variable to identify a row in the dynamic performance view v$query_block_origin (highlighted at line 68) and that’s where we have a problem with Oracle 19c: the view doesn’t exist, nor does the x$qbname structure that the view is based on (although it’s easy to find a table of the values in the oracle executable – albeit that several items in the 21c list don’t appear in the 19.11 executable).

In the code above I’ve actually commented out the whole of the inline scalar subquery that translates an origin number into an origin name and reported the actual value of origin. Originally I did this to check whether it was worth spending any more time working on the code – and this is the result I got from the initial test:

SEL$1 (2 )
  SEL$5DA710D3 (19 SEL$1 ; SEL$2)
    SEL$2F1334C4 (15 SEL$5DA710D3 ) [final]
SEL$2 (2 )
  SEL$683B0107 (12 SEL$2 )
    SEL$88A77D12 (13 SEL$683B0107 )
      SEL$B186933D (18 SEL$88A77D12 ; SEL$683B0107) [final]

A quick check by eye shows that it’s got the same pattern and set of query block names that the 21c output produced so it’s clearly a step in the right direction. Now all I need is a way to translate the origin numbers into names.

I could have tried searching x$ksmfsv to see if I could spot a pointer to the relevant structure and fake my way through the whole process of creating a “nearly dynamic” performance view, but I decided the quick and dirty workaround was to dump a CSV file listing the view contents in 21c, then read the file back as an external table to copy the data into a local IOT (index organized table) called my_query_block_origin. With the inline view back in play – and the name suitably changed – the 19c and 21c queries produced the same result (which is slightly surprising as the “SUBQUERY UNNEST” and “VIEW MERGE” options don’t seem to exist in the 19.11 list I found in the oracle executable.)
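
Here’s a sketch of that workaround. The structure of my_query_block_origin follows the view’s columns, but the directory name my_dir is an assumption of mine, as is the idea that the 21c output has been saved as a comma-separated file called query_block_origin.csv:

create table my_query_block_origin (
        origin_id       number          not null,
        name            varchar2(60),
        hint_token      varchar2(32),
        constraint mqbo_pk primary key (origin_id)
)
organization index
;

insert into my_query_block_origin
select  origin_id, name, hint_token
from    external (
                (
                origin_id       number,
                name            varchar2(60),
                hint_token      varchar2(32)
                )
                type oracle_loader
                default directory my_dir
                access parameters (
                        records delimited by newline
                        fields terminated by ',' optionally enclosed by '"'
                        missing field values are null
                )
                location ('query_block_origin.csv')
                reject limit unlimited
        ) ext
;

commit;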

Footnote 1

Here’s a query to show the content of that 21c view (which is fairly interesting in its own right):

set linesize 144
set pagesize 100
set trimspool on
set tab off

column  name format a60
column  hint_token format a32

spool query_block_origin.lst

select
        origin_id,
        name,
        hint_token
from
        v$query_block_origin
/

 ORIGIN_ID NAME                                                         HINT_TOKEN
---------- ------------------------------------------------------------ --------------------------------
         0 NOT NAMED
         1 ALLOCATE
         2 PARSER
         3 HINT
         4 COPY
         5 SAVE
         6 MV REWRITE                                                   REWRITE
         7 PUSHED PREDICATE                                             PUSH_PRED
         8 STAR TRANSFORM SUBQUERY
         9 COMPLEX VIEW MERGE
        10 COMPLEX SUBQUERY UNNEST
        11 OR EXPANSION                                                 USE_CONCAT
        12 SUBQ INTO VIEW FOR COMPLEX UNNEST
        13 PROJECTION VIEW FOR CVM
        14 GROUPING SET TO UNION
        15 SPLIT/MERGE QUERY BLOCKS
        16 COPY PARTITION VIEW
        17 RESTORE
        18 VIEW MERGE                                                   MERGE
        19 SUBQUERY UNNEST                                              UNNEST
        20 STAR TRANSFORM                                               STAR_TRANSFORMATION
        21 INDEX JOIN
        22 STAR TRANSFORM TEMP TABLE
        23 MAP QUERY BLOCK
        24 VIEW ADDED
        25 SET QUERY BLOCK
        26 QUERY BLOCK TABLES CHANGED
        27 QUERY BLOCK SIGNATURE CHANGED
        28 MV UNION QUERY BLOCK
        29 SPLIT QUERY BLOCK FOR GSET-TO-UNION                          EXPAND_GSET_TO_UNION
        30 PREDICATES REMOVED FROM QUERY BLOCK                          PULL_PRED
        31 PREDICATES ADDED TO QUERY BLOCK
        32 OLD PUSHED PREDICATE                                         OLD_PUSH_PRED
        33 ORDER BY REMOVED FROM QUERY BLOCK                            ELIMINATE_OBY
        34 JOIN REMOVED FROM QUERY BLOCK                                ELIMINATE_JOIN
        35 OUTER-JOIN REMOVED FROM QUERY BLOCK                          OUTER_JOIN_TO_INNER
        36 STAR TRANSFORMATION JOINBACK ELIMINATION                     ELIMINATE_JOIN
        37 BITMAP JOIN INDEX JOINBACK ELIMINATION                       ELIMINATE_JOIN
        38 CONNECT BY COST BASED TRANSFORMATION                         CONNECT_BY_COST_BASED
        39 CONNECT BY WITH FILTERING                                    CONNECT_BY_FILTERING
        40 CONNECT BY WITH NO FILTERING                                 NO_CONNECT_BY_FILTERING
        41 CONNECT BY START WITH QUERY BLOCK
        42 CONNECT BY FULL SCAN QUERY BLOCK
        43 PLACE GROUP BY                                               PLACE_GROUP_BY
        44 CONNECT BY NO FILTERING COMBINE                              NO_CONNECT_BY_FILTERING
        45 VIEW ON SELECT DISTINCT
        46 COALESCED SUBQUERY                                           COALESCE_SQ
        47 QUERY HAS COALESCED SUBQUERIES                               COALESCE_SQ
        48 SPLIT QUERY BLOCK FOR DISTINCT AGG OPTIM                     TRANSFORM_DISTINCT_AGG
        49 CONNECT BY ELIMINATE DUPLICATES FROM INPUT                   CONNECT_BY_ELIM_DUPS
        50 CONNECT BY COST BASED TRANSFORMATION FOR WHR ONLY            CONNECT_BY_CB_WHR_ONLY
        51 TABLE EXPANSION                                              EXPAND_TABLE
        52 TABLE EXPANSION BRANCH
        53 JOIN FACTORIZATION SET QUERY BLOCK                           FACTORIZE_JOIN
        54 DISTINCT PLACEMENT                                           PLACE_DISTINCT
        55 JOIN FACTORIZATION BRANCH QUERY BLOCK
        56 TABLE LOOKUP BY NESTED LOOP QUERY BLOCK                      TABLE_LOOKUP_BY_NL
        57 FULL OUTER JOIN TRANSFORMED TO OUTER                         FULL_OUTER_JOIN_TO_OUTER
        58 LEFT OUTER JOIN TRANSFORMED TO ANTI                          OUTER_JOIN_TO_ANTI
        59 VIEW DECORRELATED                                            DECORRELATE
        60 QUERY VIEW DECORRELATED                                      DECORRELATE
        61 NOT EXISTS SQ ADDED
        62 BRANCH WITH OUTER JOIN
        63 BRANCH WITH ANTI JOIN
        64 UNION ALL FOR FULL OUTER JOIN
        65 VECTOR TRANSFORMATION                                        VECTOR_TRANSFORM
        66 VECTOR TRANSFORMATION TEMP TABLE
        67 QUERY ANSI REARCHiTECTURE                                    ANSI_REARCH
        68 VIEW ANSI REARCHiTECTURE                                     ANSI_REARCH
        69 ELIMINATION OF GROUP BY                                      ELIM_GROUPBY
        70 UAL BRANCH OF UNNESTED SUBQUERY
        71 QUERY BLOCK HAS BUSHY JOIN                                   BUSHY_JOIN
        72 SUBQUERY ELIMINATE                                           ELIMINATE_SQ
        73 OR EXPANSION UNION ALL BRANCH
        74 OR EXPANSION UNION ALL VIEW                                  OR_EXPAND
        75 DIST AGG GROUPING SETS UNION ALL TRANSFORMATION              USE_DAGG_UNION_ALL_GSETS
        76 MATERIALIZED WITH CLAUSE
        77 STATISTCS BASED TRANSFORMED QB
        78 PQ TABLE EXPANSION
        79 LEFT OUTER JOIN TRANSFORMED TO BOTH INNER AND ANTI
        80 SHARD TEMP TABLE
        81 BRANCH OF COMPLEX UNNESTED SET QUERY BLOCK
        82 DIST AGG GROUPING SETS OPTIMIZATION                          DAGG_OPTIM_GSETS


You’ll notice that origin_id 2 has the name PARSER – that’s the (first) significant value when reporting the query block registry – but take note, also, of origin_id 3, which has the name HINT. This is where the code built into 21c goes wrong. If you use the qb_name hint to name all your query blocks then their origin_id will be 3, and Oracle’s code won’t find them.

When I added the hint /*+ qb_name(main) */ to the query this is what I got from my registry query:

Query Block Registry:
---------------------

  SEL$1 (PARSER)
    SEL$7D4DB4AA (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$1)
      SEL$EFD91A2C (PROJECTION VIEW FOR CVM SEL$7D4DB4AA)
        SEL$7086F02E (VIEW MERGE SEL$EFD91A2C ; SEL$7D4DB4AA) [FINAL]

And when I also added the hint /*+ qb_name(subq) */ to the subquery the result was this:

Query Block Registry:
---------------------

An uncaught error happened in display_cursor : ORA-06502: PL/SQL: numeric or value error

I’ve said for a long time: “always name all your query blocks”. I think 21c (temporarily) demonstrates why you have two options: name ALL of them or name NONE of them. If you name just some of them you might not notice that parts of your plan don’t appear in the registry report, and I’d say it’s better to see an error than to be fooled into thinking you’ve got complete information.
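
For reference, here’s a sketch of the fully-named version of the query (see Footnote 3 below for the original) with the two hints I described:

select
        /*+ qb_name(main) */
        *
from    t1
where   owner = 'OUTLN'
and     object_name in (
                select  /*+ qb_name(subq) unnest no_semijoin */
                        distinct object_name
                from   t2
                where  object_type = 'TABLE'
        )
;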

Footnote 2

There’s another new option for the format parameter in 21c which is qbregistry_graph. I haven’t considered playing about with the trace file to see if I can extract and hack the SQL that generates the appropriate output (but that might change if I pick up a tool to turn the textual description into a graphic). For the registry listing above this is what the “graph” output looks like:

Query Block Registry Graph (dot format):
---------------------
digraph g{
  rankdir = TB
  "SEL$88A77D12"
  "SEL$683B0107"
  "SEL$1"
  "SEL$2"
  "SEL$B186933D" [peripheries=2]
  "SEL$5DA710D3"
  "SEL$2F1334C4" [peripheries=2]
  "SEL$683B0107" -> "SEL$88A77D12" [label="PROJECTION VIEW FOR CVM"]
  "SEL$2" -> "SEL$683B0107" [label="SUBQ INTO VIEW FOR COMPLEX UNNEST"]
  "SEL$88A77D12" -> "SEL$B186933D" [label="VIEW MERGE"]
  "SEL$1" -> "SEL$5DA710D3" [label="SUBQUERY UNNEST"]
  "SEL$5DA710D3" -> "SEL$2F1334C4" [label="SPLIT/MERGE QUERY BLOCKS"]
  "SEL$683B0107" -> "SEL$B186933D" [style=dotted]
  "SEL$2" -> "SEL$5DA710D3" [style=dotted]
  { rank = same }
  {
    rank="sink";
    rankdir = LR;
    item1 [style=invis];
    item2 [shape="plaintext" label="Participating query blocks"];
    item3 [label="&nbsp;" peripheries=2];
    item4 [shape="plaintext" label="Final query blocks"];
    item1 -> item2 [style=dotted];
    { rank=same item3 item4; }
  }
}

Footnote 2.1 (a few days later)

A couple of days after publishing this note I received an email pointing out that the qbregistry_graph output is in the Graphviz DOT language (see http://www.graphviz.org/) and that there are even websites where it can easily be rendered into graphic form; pasting the output into one of those websites gave me a neatly rendered version of the graph.

I haven’t tried to think through a generalised pattern for drawing these pictures, but I think I’d prefer to see a diagram which showed that the “final” query block sel$2f1334c4 was used by query block sel$b186933d. After all, the word “final” in this context means only that the query block was one for which the optimizer produced an independent (sub-)plan.

Footnote 3

For completeness – here’s the original SQL and plan for the statement that produced this qbregistry example:

select
        *
from    t1
where   owner = 'OUTLN'
and     object_name in (
                select  /*+
                                unnest
                                no_semijoin
                        */
                        distinct object_name
                from   t2
                where  object_type = 'TABLE'
        )
;

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |       |       |     5 (100)|          |
|   1 |  VIEW                                  | VM_NWVW_2 |     1 |   483 |     5  (20)| 00:00:01 |
|   2 |   HASH UNIQUE                          |           |     1 |   155 |     5  (20)| 00:00:01 |
|   3 |    NESTED LOOPS SEMI                   |           |     1 |   155 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1        |     1 |   128 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_I1     |     1 |       |     1   (0)| 00:00:01 |
|*  6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2        |     1 |    27 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T2_I2     |    48 |       |     0   (0)|          |
----------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$B186933D / "VM_NWVW_2"@"SEL$2F1334C4"
   2 - SEL$B186933D
   4 - SEL$B186933D / "T1"@"SEL$1"
   5 - SEL$B186933D / "T1"@"SEL$1"
   6 - SEL$B186933D / "T2"@"SEL$2"
   7 - SEL$B186933D / "T2"@"SEL$2"

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('21.1.0')
      DB_VERSION('21.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$B186933D")
      MERGE(@"SEL$683B0107" >"SEL$5DA710D3")
      OUTLINE_LEAF(@"SEL$2F1334C4")
      OUTLINE(@"SEL$88A77D12")
      OUTLINE(@"SEL$683B0107")
      OUTLINE(@"SEL$5DA710D3")
      UNNEST(@"SEL$2" UNNEST_INNERJ_DISTINCT_VIEW)
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$2F1334C4" "VM_NWVW_2"@"SEL$2F1334C4")
      INDEX_RS_ASC(@"SEL$B186933D" "T1"@"SEL$1" ("T1"."OWNER"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B186933D" "T1"@"SEL$1")
      INDEX_RS_ASC(@"SEL$B186933D" "T2"@"SEL$2" ("T2"."OBJECT_TYPE"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B186933D" "T2"@"SEL$2")
      LEADING(@"SEL$B186933D" "T1"@"SEL$1" "T2"@"SEL$2")
      USE_NL(@"SEL$B186933D" "T2"@"SEL$2")
      USE_HASH_AGGREGATION(@"SEL$B186933D" UNIQUE)
      PARTIAL_JOIN(@"SEL$B186933D" "T2"@"SEL$2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("OWNER"='OUTLN')
   6 - filter("OBJECT_NAME"="OBJECT_NAME")
   7 - access("OBJECT_TYPE"='TABLE')

Query Block Registry:
---------------------
  SEL$1 (PARSER)
    SEL$5DA710D3 (SUBQUERY UNNEST SEL$1 ; SEL$2)
      SEL$2F1334C4 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$2 (PARSER)
    SEL$683B0107 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
      SEL$88A77D12 (PROJECTION VIEW FOR CVM SEL$683B0107)
        SEL$B186933D (VIEW MERGE SEL$88A77D12 ; SEL$683B0107) [FINAL]

Tables t1 and t2 are copies of the same data set, which is a subset of 100 rows from all_objects. You won’t necessarily see this plan on your systems because (even with the hints) the plan can vary depending on the number of rows with owner = ‘OUTLN’ (which is likely to be zero) or with object_type = ‘TABLE’ (which might be all of them). The script I started with was one I had used in a note I wrote about “distinct” appearing in the select list of subqueries, but the data it produced in the newer versions of Oracle was sufficiently different that I had to be a little more careful in constructing a data set that produced stable plans.

If you cross-check the Query Block Registry with the Outline Information you’ll see that the lines labelled FINAL start with the query block names that are shown as “outline_leaf” entries, and the other 5 query block names appear as “outline” entries.

Reading down the tree I then find myself struggling to interpret the QBR. I think I know what has happened, but I can’t quite manage to see exactly how the QBR is telling me that it happened.

QBR – tentative interpretation

Part of the difficulty is that the QBR seems to have a section for every initial query block in the query, so there’s (a) likely to be some overlap between sections and (b) some sequencing that means you can’t get the full picture just by reading straight from top to bottom. In this case we have two initial query blocks (the main query, implicitly named sel$1, and the subquery implicitly named sel$2), and I think the interpretation is as follows:

Starting with the sel$2 section we can see that its second line tells us that the subquery was unnested and the resulting aggregate inline view is the sole content of a query block called sel$683B0107.

Jumping backwards to the sel$1 section, its second line tells us that sel$5DA710D3 is a query block consisting of a join between t1 and the inline aggregate view.

Sticking with the sel$1 tree, we then see a query block that tells us that the optimizer has transformed an “aggregate then join” into a “join then aggregate”. sel$2F1334C4 is the query block holding nothing but a select from the view VM_NWVW_2.

Returning to the sel$2 tree, sel$88A77D12 is the resulting query block when the inline aggregate view is merged using complex view merging. This is where I get a bit stuck, because this seems to be repeating a step that we’ve handled in the sel$1 section by a different route.

The final step of the sel$2 tree is sel$B186933D, the query block where we select from the non-mergeable inline view VM_NWVW_2 that seems to have come from one of two different places.

Bottom line on this one: even though it’s an extremely simple query and I believe I understand what the execution plan is telling us about the transformations that took place, the query block registry is still something of a mystery to me.

August 23, 2021

Distributed Query

Filed under: distributed,Execution plans,Hints,Oracle,subqueries,Transformations,Troubleshooting — Jonathan Lewis @ 5:24 pm BST Aug 23,2021

Here’s an example that appeared on the Oracle Developer Community forum about a year ago that prompted me to do a little investigative work. The question involved a distributed query that was “misbehaving” – the interesting points were the appearance of the /*+ rule */ and /*+ driving_site() */ hints in the original query when combined with a suggestion to address the problem using the /*+ materialize */ hint with factored subqueries (common table expressions – CTEs), or when combined with my suggestion to use the /*+ no_merge */ hint.

If you don’t want to read the whole article there’s a tl;dr summary just before the end.

The original question was posed with a handful of poorly constructed code fragments that were supposed to describe the problem, viz:


select /*+ DRIVING_SITE (s1) */ * from  Table1 s1 WHERE condition in (select att1 from local_table) ; -- query n°1

select /*+ RULE DRIVING_SITE (s2) */  * from  Table2 s2 where  condition in (select att1 from local_table); -- query n°2

select * from
select /*+ DRIVING_SITE (s1) */ * from  Table1 s1 WHERE condition in (select att1 from local_table) ,
select /*+ RULE DRIVING_SITE (s2) */  * from  Table2 s2 where  condition in (select att1 from local_table)
where att_table_1 = att_table_2  -- sic

The crux of the problem was that the two separate statements individually produced an acceptable execution plan but the attempt to use the queries in inline views with a join resulted in a plan that (from the description) sounded like the result of Oracle merging the two inline views and running the two IN subqueries as FILTER (existence) subqueries.

We weren’t shown any execution plans and only had the title of the question (“Distributed sql query through multiple databases”) to give us the clue that there might be three different databases involved.

Obviously there are several questions worth asking when presented with this problem, the first being “can we have a more realistic piece of code”, followed by “which version of Oracle” and “where are the execution plans”. I can’t help feeling that there’s more to the problem than just the three tables that seem to be suggested by the fragments supplied.

More significant, though, was the surprise that rule and driving_site should work together. There’s a long-standing (but incorrect) assertion that “any other hint invalidates the RULE hint”. I think I’ve published an example somewhere showing that /*+ unnest */ would affect an execution plan where the optimizer still obeyed the /*+ rule */ hint, and there’s an old post on this blog which points out that transformation and optimisation are (or were, at the time) independent of each other, implying that you could combine the rule hint with “transformational” hints and still end up with a rule-based execution plan.

Despite old memories suggesting the contrary my first thought was that the rule and driving_site hints couldn’t be working together – and that made it worth running a little test. Then one of the other specialists on the forums suggested using subquery factoring with the materialize hint – and I thought that probably wouldn’t help because when you insert into a global temporary table the driving site has to become the site that holds the global temporary tables (in fact this isn’t just a feature of GTTs). So there was another thing prompting me to run a test. (And then I suggested using the /*+ no_merge */ hint – but thought I’d check if that idea was going to work before I suggested it.)

So here’s a code sample to create some data, and the first two simple queries with calls for their predicted execution plans:

rem
rem     Script:         distributed_multi.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

rem     create public database link test@loopback using 'test';
rem     create public database link test2@loopback using 'test2';

rem     create public database link orcl@loopback using 'orcl';
rem     create public database link orcl2@loopback using 'orcl2';

rem     create public database link orclpdb@loopback using 'orclpdb';
rem     create public database link orclpdb2@loopback using 'orclpdb2';

define m_target=test@loopback
define m_target2=test2@loopback

define m_target=orcl@loopback
define m_target2=orcl2@loopback

define m_target=orclpdb@loopback
define m_target2=orclpdb2@loopback

create table t0
as
select  *
from    all_objects
where   mod(object_id,4) = 1
;

create table t1
as
select  *
from    all_objects
where   mod(object_id,11) = 0
;

create table t2
as
select  *
from    all_objects
where   mod(object_id,13) = 0
;

explain plan for
select  /*+ driving_site(t1) */
        t1.object_name, t1.object_id
from    t1@&m_target    t1
where
        t1.object_id in (
                select  t0.object_id
                from    t0
        )
;

select * from table(dbms_xplan.display);

explain plan for
select
        /*+ rule driving_site(t2) */
        t2.object_name, t2.object_id
from    t2@&m_target2   t2
where
        t2.object_id in (
                select  t0.object_id
                from    t0
        )
;

select * from table(dbms_xplan.display);

Reading from the top down – t0 is in the local database, t1 is in remote database 1, t2 is in remote database 2. I’ve indicated the creation and selection of a pair of public database links at the top of the script – in this case both of them are loopback links to the local database, but I’ve used substitution variables in the SQL to allow me to adjust which databases are the remote ones. Since there are no indexes on any of the tables the optimizer is very limited in its choice of execution plans, which are as follows in 19.3 (the orclpdb/orclpdb2 links).

First, the query against t1@orclpdb – which will run cost-based:


-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|*  1 |  HASH JOIN SEMI        |      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|   2 |   TABLE ACCESS FULL    | T1   |  5168 |   222K|    16   (7)| 00:00:01 | ORCLP~ |      |
|   3 |   REMOTE               | T0   | 14058 |   178K|    40   (5)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A1"."OBJECT_ID"="A2"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "OBJECT_ID" FROM "T0" "A2" (accessing '!' )

Note
-----
   - fully remote statement

You’ll note that operation 3 is simply REMOTE, and t0 is the object accessed – which means this query is behaving as if the (local) t0 table is the remote one as far as the execution plan is concerned. The IN-OUT column tells us that this operation is “Remote to Serial” (R->S) and the instance called to is named “!”, which is how the local database is identified in the plan from a remote database.

We can also see that the execution plan gives us the “Remote SQL Information” for operation 3 – and that’s the text of the query that gets sent by the driving site to the instance that holds the object of interest. In this case the query is simply selecting the object_id values from all the rows in t0.

Now the plan for the query against t2@orclpdb2 which includes a /*+ rule */ hint:

-----------------------------------------------------------
| Id  | Operation              | Name     | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|          |        |      |
|   1 |  MERGE JOIN            |          |        |      |
|   2 |   SORT JOIN            |          |        |      |
|   3 |    TABLE ACCESS FULL   | T2       | ORCLP~ |      |
|*  4 |   SORT JOIN            |          |        |      |
|   5 |    VIEW                | VW_NSO_1 | ORCLP~ |      |
|   6 |     SORT UNIQUE        |          |        |      |
|   7 |      REMOTE            | T0       |      ! | R->S |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A1"."OBJECT_ID"="OBJECT_ID")
       filter("A1"."OBJECT_ID"="OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   7 - SELECT /*+ RULE */ "OBJECT_ID" FROM "T0" "A2" (accessing '!' )

Note
-----
   - fully remote statement
   - rule based optimizer used (consider using cbo)

The most striking feature of this plan is that it is an RBO (rule based optimizer) plan not a cost-based plan – and the Note section confirms that observation. We can also see that the Remote SQL Information is echoing the /*+ RULE */ hint back in its query against t0. Since the query is operating rule-based the hash join mechanism is not available (it’s a costed path – it needs to know the size of the data that will be used in the build table), and that’s why the plan is using a sort/merge join.

Following the “incremental build” strategy for writing SQL, all we have to do as the next step of producing the final code is put the two queries into separate inline views and join them:


explain plan for
select  v1.*, v2.*
from    (
        select  /*+ driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ rule driving_site(t2) */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);

And here’s the execution plan – which, I have to admit, gave me a bit of a surprise on two counts when I first saw it:


-----------------------------------------------------------
| Id  | Operation              | Name     | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT       |          |        |      |
|   1 |  MERGE JOIN            |          |        |      |
|   2 |   MERGE JOIN           |          |        |      |
|   3 |    MERGE JOIN          |          |        |      |
|   4 |     SORT JOIN          |          |        |      |
|   5 |      REMOTE            | T2       | ORCLP~ | R->S |
|*  6 |     SORT JOIN          |          |        |      |
|   7 |      REMOTE            | T1       | ORCLP~ | R->S |
|*  8 |    SORT JOIN           |          |        |      |
|   9 |     VIEW               | VW_NSO_1 |        |      |
|  10 |      SORT UNIQUE       |          |        |      |
|  11 |       TABLE ACCESS FULL| T0       |        |      |
|* 12 |   SORT JOIN            |          |        |      |
|  13 |    VIEW                | VW_NSO_2 |        |      |
|  14 |     SORT UNIQUE        |          |        |      |
|  15 |      TABLE ACCESS FULL | T0       |        |      |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
       filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   8 - access("T2"."OBJECT_ID"="OBJECT_ID")
       filter("T2"."OBJECT_ID"="OBJECT_ID")
  12 - access("T1"."OBJECT_ID"="OBJECT_ID")
       filter("T1"."OBJECT_ID"="OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   5 - SELECT /*+ RULE */ "OBJECT_NAME","OBJECT_ID" FROM "T2" "T2"
       (accessing 'ORCLPDB2.LOCALDOMAIN@LOOPBACK' )

   7 - SELECT /*+ RULE */ "OBJECT_NAME","OBJECT_ID" FROM "T1" "T1"
       (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Note
-----
   - rule based optimizer used (consider using cbo)

The two surprises were that (a) the entire plan was rule-based, and (b) the driving_site() selection has disappeared from the plan.

Of course as soon as I actually started thinking about what I’d written (instead of trusting the knee-jerk “just stick the two bits together”) the flaw in the strategy became obvious.

  • Either the whole query runs RBO or it runs CBO – you can’t split the planning.
  • In the words of The Highlander “There can be only one” (driving site, that is) – only one of the databases involved will decide how to decompose and distribute the query.

It’s an interesting detail that the /*+ rule */ hint seems to have pushed the whole query into the arms of the RBO despite being buried somewhere in the depths of the query rather than being in the top level query block – but we’ve seen that before in some old data dictionary views.

The complete disregard for the driving_site() hints is less interesting – there is, after all, a comment in the manuals somewhere to the effect that when two hints contradict each other they are both ignored. (But I did wonder why the Hint Report that should appear with 19.3 plans didn’t tell me that the hints had been observed but not used.)
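
If you want to force the issue, 19c lets you request the report explicitly through the format parameter. A sketch, assuming the documented hint_report format option applies to dbms_xplan.display in the same way it does to display_cursor:

select * from table(dbms_xplan.display(format => 'typical +hint_report'));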

The other problem (from the perspective of the OP) is that the two inline views have been merged so the join order no longer reflects the two isolated components we used to have. So let’s fiddle around a little bit to see how close we can get to what the OP wants. The first step would be to add the /*+ no_merge */ hint to both inline views and eliminate one of the /*+ driving_site() */ hints to see what happens, and since we’re modern we’ll also get rid of the /*+ rule */ hint:


explain plan for
select  v1.*, v2.*
from    (
        select  /*+ qb_name(subq1) no_merge driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ qb_name(subq2) no_merge */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |  4342 |   669K|    72   (9)| 00:00:01 |        |      |
|*  1 |  HASH JOIN             |      |  4342 |   669K|    72   (9)| 00:00:01 |        |      |
|   2 |   VIEW                 |      |  4342 |   334K|    14   (8)| 00:00:01 |        |      |
|   3 |    REMOTE              |      |       |       |            |          |      ! | R->S |
|   4 |   VIEW                 |      |  5168 |   398K|    57   (8)| 00:00:01 |        |      |
|*  5 |    HASH JOIN SEMI      |      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|   6 |     TABLE ACCESS FULL  | T1   |  5168 |   222K|    16   (7)| 00:00:01 | ORCLP~ |      |
|   7 |     REMOTE             | T0   | 14058 |   178K|    40   (5)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A2"."OBJECT_ID"="A1"."OBJECT_ID")
   5 - access("A3"."OBJECT_ID"="A6"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - EXPLAIN PLAN INTO "PLAN_TABLE" FOR SELECT /*+ QB_NAME ("SUBQ2") NO_MERGE */
       "A1"."OBJECT_NAME","A1"."OBJECT_ID" FROM  (SELECT DISTINCT "A3"."OBJECT_ID"
       "OBJECT_ID" FROM "T0" "A3") "A2","T2"@ORCLPDB2.LOCALDOMAIN@LOOPBACK "A1" WHERE
       "A1"."OBJECT_ID"="A2"."OBJECT_ID" (accessing '!' )

   7 - SELECT "OBJECT_ID" FROM "T0" "A6" (accessing '!' )

Note
-----
   - fully remote statement

In this plan we can see that the /*+ driving_site() */ hint has been applied – the plan is presented from the point of view of orclpdb (the database holding t1). The order of the two inline views has apparently been reversed as we move from the statement to its plan – but that’s just a minor side effect of the hash join (picking the smaller result set as the build table).

Operations 5 – 7 tell us that t1 is treated as the local table and used as the build table in a hash semi-join, while t0 is accessed by a call back to our database and its result set used as the probe table.

From operation 3 (in the body of the plan, and in the Remote SQL Information) we see that orclpdb has handed off the entire t2 query block to a remote operation – which is "accessing '!'". But there's a problem (in my opinion) in the SQL that it's handing off: the text is NOT the text of our inline view; it has already been through a heuristic transformation that has unnested the IN subquery of our original text into a "join distinct view" – if we had used a hint to force this transformation it would have been the /*+ unnest(UNNEST_INNERJ_DISTINCT_VIEW) */ variant.

SELECT /*+ NO_MERGE */
        "A1"."OBJECT_NAME","A1"."OBJECT_ID"
FROM
       (SELECT DISTINCT "A3"."OBJECT_ID" "OBJECT_ID" FROM "T0" "A3") "A2",
       "T2"@ORCLPDB2.LOCALDOMAIN@LOOPBACK "A1"
WHERE
        "A1"."OBJECT_ID"="A2"."OBJECT_ID"

I tried to change this by adding alternative versions of the /*+ unnest() */ hint to the original query, following the query block names indicated by the outline information (not shown), but it looks as if the code path that constructs the Remote SQL operates without considering the main query hints – perhaps the decomposition code is simply following the code path of the old heuristic "I'll do it if it's legal" unnest. The drawback is that if the original form of the text had been sent to the other site the optimizer that had to handle it could have used cost-based query transformation and might have come up with a better plan.
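
For reference, the shape of one of my attempts looked something like this – a sketch only, with "SEL$ABC12DEF" as a made-up stand-in for the generated query block name reported in the outline, and unnest_semij as one of the alternative variants that appear in outlines:

select
        /*+
                qb_name(subq2) no_merge
                unnest(@"SEL$ABC12DEF" unnest_semij)
        */
        t2.object_name, t2.object_id
from    t2@&m_target2 t2
where
        t2.object_id in (
                select  t0.object_id
                from    t0
        )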

You may be wondering why I left the /*+ driving_site() */ hint in one of the inline views rather than inserting it in the main query block. The answer is simple – it didn’t seem to work (even in 19.3) when I put /*+ driving_site(t1@subq1) */ in the main query block.

tl;dr

The optimizer has to operate rule-based or cost-based; it can't do a bit of both in the same query. So if you've got a /*+ RULE */ hint that takes effect anywhere in the query the entire query will be optimised by the rule-based optimizer.

There can be only one driving site for a query, and if you manage to get multiple driving_site() hints in a query that contradict each other the optimizer will ignore all of them.

When the optimizer decomposes a distributed query and produces non-trivial components to send to remote sites you may find that some of the queries constructed for the remote sites have been subject to transformations that you cannot influence by hinting.

Footnote

I mentioned factored subqueries and the /*+ materialize */ option in the opening notes. In plans where the attempt to specify the driving site failed (i.e. when the query ran locally) the factored subqueries did materialize. In any plans where the driving site was a remote site the factored subqueries were always inline. This may well be related to the documented (though not always implemented) restriction that temporary tables cannot take part in distributed transactions.
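
For anyone who wants to repeat the experiment, the factored-subquery variant of (one half of) the query took a shape something like this – a sketch, reusing the tables and substitution variables from the example above:

with t0_ids as (
        select  /*+ materialize */
                t0.object_id
        from    t0
)
select  /*+ driving_site(t1) */
        t1.object_name, t1.object_id
from    t1@&m_target    t1
where
        t1.object_id in (
                select  object_id
                from    t0_ids
        )
;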

August 9, 2021

Preferences

Filed under: CBO,Oracle,Statistics — Jonathan Lewis @ 12:58 pm BST Aug 9,2021

I've made a few comments in the past about setting "table preferences" for stats collection – most significantly the table_cached_blocks preference that affects the calculation of the clustering_factor of all the indexes on a table, the incremental preference dictating the strategy used for dealing with partitioned tables, and the method_opt preference dictating precise requirements for histograms.
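
If you want to adjust a preference for a single table the call is dbms_stats.set_table_prefs() – a minimal sketch (the schema and table names here are made up):

begin
        -- set a table-level preference
        dbms_stats.set_table_prefs(
                ownname => 'TEST_USER',
                tabname => 'T1',
                pname   => 'table_cached_blocks',
                pvalue  => '16'
        );

        -- drop it again so the table falls back to the global value
        dbms_stats.delete_table_prefs(
                ownname => 'TEST_USER',
                tabname => 'T1',
                pname   => 'table_cached_blocks'
        );
end;
/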

If you want to check the current preferences set for a table you can query the XXXX_tab_stat_prefs views. For some reason in the dim and distant past (perhaps in a beta release before the views had been created, or perhaps because the views show only the preferences that have been explicitly set) I wrote a little script to report all the possible table preferences, showing both the table value and the current global value.
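
The basic check is a simple query – a minimal sketch, assuming a table called T1 in the current schema:

select
        preference_name, preference_value
from    user_tab_stat_prefs
where   table_name = 'T1'
order by
        preference_name
;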

rem
rem     Script: get_table_prefs.sql
rem     Dated:  ???
rem     Author: Jonathan Lewis
rem
rem     Last tested
rem             19.11.0.0
rem
rem     Notes
rem     Report the table preferences for a given
rem     owner and table.
rem
rem     Needs to find a list of all legal preferences.
rem             Global prefs are in:    optstat_hist_control$ (sname, spare4)
rem             Table prefs are in:     optstat_user_prefs$ (valchar / valnum)
rem
rem     The public view is dba_tab_stat_prefs / user_tab_stat_prefs.
rem     But if a table has no prefs set there are no rows in the view
rem
rem     This script currently has to be run by sys or by a user
rem     with select privilege on sys.optstat_hist_control$ (and
rem     execute privilege on dbms_stats).
rem

define m_owner = '&enter_schema'
define m_table = '&enter_tablename'


<<anon_block>>
declare
        pref_count      number(2,0) := 0;
begin
        dbms_output.new_line;
        dbms_output.put_line(
                        rpad('Preference',32) || ' ' ||
                        rpad('Table value',32) || ' ' ||
                        '[Global value]'
        );
        dbms_output.put_line(
                        rpad('=',32,'=') || ' ' ||
                        rpad('=',32,'=') || ' ' ||
                        '================================'
        );
        for c1 in (
                select  sname, spare4 
                from    sys.optstat_hist_control$
                where   spare4 is not null
        ) loop
                anon_block.pref_count := anon_block.pref_count + 1;
                
                dbms_output.put_line(
                        rpad(c1.sname,32) || ' ' ||
                        rpad(dbms_stats.get_prefs(c1.sname,'&m_owner','&m_table'),32) || ' ' 
                        || '[' || c1.spare4 || ']'
                );      

        end loop;
        dbms_output.new_line;
        dbms_output.put_line('Preferences reported: ' || anon_block.pref_count);
end;
/
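
From SQL*Plus the script simply prompts for its two substitution variables – typical usage (with made-up schema and table names) looks like this:

SQL> set serveroutput on
SQL> @get_table_prefs
Enter value for enter_schema: TEST_USER
Enter value for enter_tablename: T1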

While I've hardly ever used the script – and so have never got round to reviewing the strategy it uses – the benefit of having it around is that on the occasions when I have run it I've sometimes discovered new preferences that I hadn't previously noticed (and ought to investigate).

Here’s a sample of the output – from a table with no special settings for preferences:

Preference                       Table value                      [Global value]
================================ ================================ ================================
TRACE                            0                                [0]
DEBUG                            0                                [0]
SYS_FLAGS                        1                                [1]
SPD_RETENTION_WEEKS              53                               [53]
CASCADE                          DBMS_STATS.AUTO_CASCADE          [DBMS_STATS.AUTO_CASCADE]
ESTIMATE_PERCENT                 DBMS_STATS.AUTO_SAMPLE_SIZE      [DBMS_STATS.AUTO_SAMPLE_SIZE]
DEGREE                           NULL                             [NULL]
METHOD_OPT                       FOR ALL COLUMNS SIZE AUTO        [FOR ALL COLUMNS SIZE AUTO]
NO_INVALIDATE                    DBMS_STATS.AUTO_INVALIDATE       [DBMS_STATS.AUTO_INVALIDATE]
GRANULARITY                      AUTO                             [AUTO]
PUBLISH                          TRUE                             [TRUE]
STALE_PERCENT                    10                               [10]
APPROXIMATE_NDV                  TRUE                             [TRUE]
APPROXIMATE_NDV_ALGORITHM        REPEAT OR HYPERLOGLOG            [REPEAT OR HYPERLOGLOG]
ANDV_ALGO_INTERNAL_OBSERVE       FALSE                            [FALSE]
INCREMENTAL                      FALSE                            [FALSE]
INCREMENTAL_INTERNAL_CONTROL     TRUE                             [TRUE]
AUTOSTATS_TARGET                 AUTO                             [AUTO]
CONCURRENT                       OFF                              [OFF]
JOB_OVERHEAD_PERC                1                                [1]
JOB_OVERHEAD                     -1                               [-1]
GLOBAL_TEMP_TABLE_STATS          SESSION                          [SESSION]
ENABLE_TOP_FREQ_HISTOGRAMS       3                                [3]
ENABLE_HYBRID_HISTOGRAMS         3                                [3]
TABLE_CACHED_BLOCKS              1                                [1]
INCREMENTAL_LEVEL                PARTITION                        [PARTITION]
INCREMENTAL_STALENESS            ALLOW_MIXED_FORMAT               [ALLOW_MIXED_FORMAT]
OPTIONS                          GATHER                           [GATHER]
GATHER_AUTO                      AFTER_LOAD                       [AFTER_LOAD]
STAT_CATEGORY                    OBJECT_STATS, REALTIME_STATS     [OBJECT_STATS, REALTIME_STATS]
SCAN_RATE                        0                                [0]
GATHER_SCAN_RATE                 HADOOP_ONLY                      [HADOOP_ONLY]
PREFERENCE_OVERRIDES_PARAMETER   FALSE                            [FALSE]
AUTO_STAT_EXTENSIONS             OFF                              [OFF]
WAIT_TIME_TO_UPDATE_STATS        15                               [15]
ROOT_TRIGGER_PDB                 FALSE                            [FALSE]
COORDINATOR_TRIGGER_SHARD        FALSE                            [FALSE]
MAINTAIN_STATISTICS_STATUS       FALSE                            [FALSE]
AUTO_TASK_STATUS                 OFF                              [OFF]
AUTO_TASK_MAX_RUN_TIME           3600                             [3600]
AUTO_TASK_INTERVAL               900                              [900]

Preferences reported: 41

As the notes that I’ve left in-line say: this version of the script has to be run by SYS or a DBA because of the privileges required.
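
If you wanted to make it available to a non-DBA account the minimal cover would be something like the following – a sketch, assuming a user called test_user (and bearing in mind that execute on dbms_stats is normally granted to public anyway):

grant select on sys.optstat_hist_control$ to test_user;
grant execute on sys.dbms_stats to test_user;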

You might notice, by the way, that this is one of those rare cases where I've remembered to use a label to name the PL/SQL block, and then used the label to qualify the name of a variable referenced inside the block.
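
If you haven't seen the syntax before, here's the mechanism in isolation – a trivial sketch:

<<my_block>>
declare
        counter number := 0;
begin
        -- the label lets you qualify the variable name unambiguously
        my_block.counter := my_block.counter + 1;
        dbms_output.put_line('counter = ' || my_block.counter);
end;
/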
