Oracle Scratchpad

August 23, 2019

Optimizer Tricks 1

Filed under: CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 12:39 pm BST Aug 23,2019

I’ve got a number of examples of clever little tricks the optimizer can do to transform your SQL before starting in on the arithmetic of optimisation. I was prompted to publish this one by a recent thread on ODC. It’s worth taking note of these tricks when you spot one, as background knowledge of what’s possible makes it much easier to interpret and trouble-shoot execution plans. I’ve labelled this one “#1” since I may publish a few more examples in the future, and then I’ll have to catalogue them – but I’m not making any promises about that.

Here’s a table definition, and a query that’s hinted to use an index on that table.


rem
rem     Script:         optimizer_tricks_01.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             11.2.0.4
rem

create table t1 (
        v1      varchar2(10),
        v2      varchar2(10),
        v3      varchar2(10),
        padding varchar2(100)
);

create index t1_i1 on t1(v1, v2, v3);


explain plan for
select
        /*+ index(t1 (v1, v2, v3)) */
        padding 
from 
        t1
where
        v1 = 'ABC'
and     nvl(v3,'ORA$BASE') = 'SET2'
;

select * from table(dbms_xplan.display);

The query uses the first and third columns of the index, but wraps the third column in an nvl() function. Because of the hint the optimizer will generate a plan with an index range scan, but the question is – what will the Predicate Information tell us about Oracle’s use of my two predicates:


Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    66 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    66 |     0   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"='ABC')
       filter(NVL("V3",'ORA$BASE')='SET2')

The nvl() test is used during the index range scan (from memory I think much older versions of Oracle would have postponed the predicate test until they had accessed the table itself). This means Oracle will do a range scan over the whole section of the index where v1 = ‘ABC’, testing every index entry it finds against the nvl() predicate.

But what happens if we modify column v3 to be NOT NULL? (“alter table t1 modify v3 not null;”) Here’s the new plan:


Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    66 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    66 |     0   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"='ABC' AND "V3"='SET2')
       filter("V3"='SET2')


The optimizer will decide that, with the NOT NULL status of the column, the nvl() function can be eliminated and the predicate can be replaced with a simple column comparison. At this point the v3 predicate can be used to reduce the number of index entries that need to be examined by using a type of skip-scan/iterator approach, but Oracle still has to test the predicate against the index entries it walks through – so the predicate also appears as a filter predicate.

You might notice, by the way, that the Plan hash value does not change as the predicate use changes – even though the change in use of predicates could make a huge difference to the performance. (As indicated in the comments at the top of the script, I’ve run this model against 11.2.0.4 – which is the version used in the ODC thread – and 19.3.0.0: the behaviour is the same in both versions, and the Plan hash value doesn’t change from version to version.)

Footnote

The reason why I decided to publish this note is that the original thread on the ODC forums reported the following contradictory details – an index definition and the optimizer’s use of that index as shown in the predicate section of the plan:


Index column name      Column position
---------------------- ----------------
FLEX_VALUE_SET_ID      1
PARENT_FLEX_VALUE      2
RANGE_ATTRIBUTE        3
CHILD_FLEX_VALUE_LOW   4
CHILD_FLEX_VALUE_HIGH  5
ZD_EDITION_NAME        6

---------------------------------------------------------------------------
|* 17 |      INDEX RANGE SCAN             | FND_FLEX_VALUE_NORM_HIER_U1   |
---------------------------------------------------------------------------
  17 - access("FLEX_VALUE_SET_ID"=:B1 AND NVL("ZD_EDITION_NAME",'ORA$BASE')='SET2')  
       filter((NVL("ZD_EDITION_NAME",'ORA$BASE')='SET2'  ..... lots more bits of filter predicate.

Since the expression nvl(zd_edition_name, ‘ORA$BASE’) = ‘SET2’ appears as an access predicate and a filter predicate it must surely be a column in the index. So either this isn’t the definition of the index being used or, somehow, there’s a trick that allows zd_edition_name to appear as a column name in the index when it really means nvl(zd_edition_name,’ORA$BASE’) at run-time. (And if there is I want to know what it is – edition-based redefinition and tricks with virtual columns spring to mind, but I avoid thinking about complicated explanations when a simpler one might be available.)
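
For what it’s worth, here’s a minimal sketch (a purely hypothetical table, nothing to do with the ODC system) of the most obvious way an nvl() expression can become an indexed access predicate – simply index the expression. Note, though, that with this approach the data dictionary would report a system-generated (SYS_NCnnnnn$) column name for the second index column rather than zd_edition_name, so it doesn’t quite match the index definition reported in the thread:

create table t_demo (
        flex_value_set_id       number,
        zd_edition_name         varchar2(30),
        padding                 varchar2(100)
);

create index t_demo_i1 on t_demo(flex_value_set_id, nvl(zd_edition_name,'ORA$BASE'));

explain plan for
select
        /*+ index(t_demo) */
        padding
from
        t_demo
where
        flex_value_set_id = 101
and     nvl(zd_edition_name,'ORA$BASE') = 'SET2'
;

select * from table(dbms_xplan.display);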

 

August 14, 2019

gather_system_stats

Filed under: CBO,Exadata,Oracle,Statistics,System Stats — Jonathan Lewis @ 2:20 pm BST Aug 14,2019

What happens when you execute dbms_stats.gather_system_stats() with the ‘Exadata’ option?
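
(For reference the gather itself is a one-liner using the documented ‘EXADATA’ option – but remember that it overwrites your current system statistics, so if you’re testing on a shared system take a copy first, e.g. with dbms_stats.create_stat_table() and dbms_stats.export_system_stats().)

execute dbms_stats.gather_system_stats('EXADATA')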

Here’s what my system stats look like (12.2.0.1 test results) after doing so. (The code to generate the two different versions is at the end of the note).


System Stats
============
Status: COMPLETED
Timed: 13-Aug-2019 15:00:00 - 13-Aug-2019 15:00:00
--------------------------------------------------
CPUSPEED        :
CPUSPEEDNW      :          918
IOSEEKTIM       :           10
IOTFRSPEED      :      204,800
MAXTHR          :
MBRC            :          128
MREADTIM        :
SLAVETHR        :
SREADTIM        :

PL/SQL procedure successfully completed.

MBRC       :          128
MREADTIM   :
SREADTIM   :
CPUSPEED   :
CPUSPEEDNW :          918
IOSEEKTIM  :           10
IOTFRSPEED :      204,800
MAXTHR     :
SLAVETHR   :

PL/SQL procedure successfully completed.

All the code does is set the MBRC, IOSEEKTIM, and IOTFRSPEED to fixed values and the only real gather is the CPUSPEEDNW. The parameters showing blanks are deliberately set to null by the procedure – before I called gather_system_stats() every parameter had a value. You could also check the SQL trace file (with bind capture enabled) to see the statements that deliberately set those parameters to null if you want more proof.
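
If you want to see those statements for yourself, a minimal sketch of the tracing step (using the documented dbms_monitor package to switch on bind capture for the current session) would be something like this:

execute dbms_monitor.session_trace_enable(binds => true)
execute dbms_stats.gather_system_stats('EXADATA')
execute dbms_monitor.session_trace_disable

rem     The trace file name is reported as 'Default Trace File' in v$diag_info.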

What are the consequences of this call (assuming you haven’t also done something with the calibrate_io() procedure)? Essentially Oracle now has information that says a single (8KB) block read request will take marginally over 10 milliseconds, and a multiblock read request of 1MB will take just over 15 milliseconds: in other words “tablescans are great, don’t use indexes unless they’re really precisely targeted”. To give you a quantitative feel for the numbers: given the choice between doing a tablescan of 1GB to pick 1,500 randomly scattered rows and using a perfect index the optimizer would choose the index.

To explain the time calculations: Oracle has set an I/O seek time of 10 ms, and a transfer rate of 204,800 bytes per ms (200 MB/s), with the guideline that a “typical” multiblock read is going to achieve 128 blocks. So the optimizer believes a single block read will take 10 + 8192/204800 ms = 10.04ms, while a multiblock read request for 1MB will take 10 + 1048576/204800 ms = 15.12 ms.
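
If you want to check that arithmetic it’s just a couple of lines of SQL against dual (assuming the 8KB block size):

select
        round(10 + 8192/204800, 2)              single_block_ms,
        round(10 + 128 * 8192/204800, 2)        one_mb_read_ms
from
        dual
;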

It’s also important to note that Oracle will use the 128 MBRC value in its calculation of the cost of the tablescan – even if you’ve set the db_file_multiblock_read_count parameter for the session or system to something smaller; and if you have set the db_file_multiblock_read_count that’s the maximum size of multiblock read that the run-time engine will use.
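
To put a rough number on the earlier “1GB tablescan vs. 1,500 scattered rows” comment: ignoring the CPU component and various rounding adjustments, the I/O cost of a tablescan is approximately (blocks / MBRC) * mreadtim / sreadtim, while a perfect index path costs roughly one single-block read per scattered row. A back-of-envelope sketch using the figures above (1GB = 131,072 blocks of 8KB):

select
        round((131072 / 128) * (15.12 / 10.04))         tablescan_io_cost
from
        dual
;

The result is about 1,542 – only marginally more than the (roughly) 1,500 single-block reads of the index path, which is why 1,500 scattered rows is close to the break-even point.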

Code

Here are the two procedures I used to report the values above. You will only need the privilege to execute the dbms_stats package for the second one, but you’ll need the privilege to access the SYS table aux_stats$ to use the first. The benefit of the first one is that it can’t go out of date as versions change.
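
(If you’re planning to run the first script from a non-SYS account you’ll need a direct grant on the underlying table from a suitably privileged user – test_user below is, of course, just a placeholder name.)

grant select on sys.aux_stats$ to test_user;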


rem
rem     Script:         get_system_stats.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2002
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

set linesize 180
set trimspool on
set pagesize 60
set serveroutput on

spool get_system_stats

declare
        m_value         number;
        m_status        varchar2(64);
        m_start         date;
        m_stop          date;
begin
        for r1 in (
                select  rownum rn, pname
                from    sys.aux_stats$
                where   sname = 'SYSSTATS_MAIN'
        ) loop
                dbms_stats.get_system_stats(m_status, m_start, m_stop, r1.pname, m_value);
                if r1.rn = 1 then
                        dbms_output.put_line('System Stats');
                        dbms_output.put_line('============');
                        dbms_output.put_line('Status: ' || m_status);
                        dbms_output.put_line(
                                'Timed: ' ||
                                to_char(m_start,'dd-Mon-yyyy hh24:mi:ss') ||
                                ' - ' ||
                                to_char(m_stop ,'dd-Mon-yyyy hh24:mi:ss')
                        );
                        dbms_output.put_line('--------------------------------------------------');
                end if;
                dbms_output.put_line(rpad(r1.pname,15) ||  ' : ' || to_char(m_value,'999,999,999'));
        end loop;
end;
/

declare
        m_value         number;
        m_status        varchar2(64);
        m_start         date;
        m_stop          date;
begin
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MBRC', m_value);
        dbms_output.put_line('MBRC       : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MREADTIM', m_value);
        dbms_output.put_line('MREADTIM   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'SREADTIM', m_value);
        dbms_output.put_line('SREADTIM   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'CPUSPEED', m_value);
        dbms_output.put_line('CPUSPEED   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'CPUSPEEDNW', m_value);
        dbms_output.put_line('CPUSPEEDNW : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'IOSEEKTIM', m_value);
        dbms_output.put_line('IOSEEKTIM  : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'IOTFRSPEED', m_value);
        dbms_output.put_line('IOTFRSPEED : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MAXTHR', m_value);
        dbms_output.put_line('MAXTHR     : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'SLAVETHR', m_value);
        dbms_output.put_line('SLAVETHR   : ' || to_char(m_value,'999,999,999'));
end;
/

spool off

July 12, 2019

opt_estimate 5

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 10:28 am BST Jul 12,2019

If you’ve been wondering why I resurrected my drafts on the opt_estimate() hint, a few weeks ago I received an email containing an example of a query where a couple of opt_estimate() hints were simply not working. The critical feature of the example was that the basic structure of the query was of a type that I had not previously examined. That’s actually a common type of problem when trying to investigate any Oracle feature from cold – you can spend days thinking about all the possible scenarios you should model, then the first time you need to apply your knowledge to a production system the requirement falls outside every model you’ve examined.

Before you go any further reading this note, though, I should warn you that it ends in frustration because I didn’t find a solution to the problem I wanted to fix – possibly because there just isn’t a solution, possibly because I didn’t look hard enough.

So here’s a simplified version of the problem – it involves pushing a predicate into a union all view. First some data and a baseline query:

rem
rem     Script:         opt_estimate_3a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem

create table t1
as
select
        rownum                          id,
        100 * trunc(rownum/100)-1       id2,
        mod(rownum,1e3)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        dual
connect by
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

create table t2a pctfree 75 as select * from t1;
create table t2b pctfree 75 as select * from t1;

create index t2ai on t2a(id);
create index t2bi on t2b(id);

explain plan for
select
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select 'a' flag, t2a.* from t2a
         union all
         select 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
/

select * from table(dbms_xplan.display(null,null,'outline alias'))
/


There is one row with t1.id = 99, and I would like the optimizer to use an indexed access path to select the one matching row from each of the two tables in the union all view. The smart execution plan would be a nested loop using a “pushed join predicate” – and that’s exactly what we get by default with this data set:


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     1 |    15 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     1 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     1 |    15 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1        / T1@SEL$1
   3 - SET$5715CE2E / T2@SEL$1
   4 - SET$5715CE2E
   5 - SEL$639F1A6F / T2A@SEL$2
   6 - SEL$639F1A6F / T2A@SEL$2
   7 - SEL$B01C6807 / T2B@SEL$3
   8 - SEL$B01C6807 / T2B@SEL$3

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
      INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
      INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
      USE_NL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SET$1")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
      OUTLINE_LEAF(@"SET$5715CE2E")
      OUTLINE_LEAF(@"SEL$B01C6807")
      OUTLINE_LEAF(@"SEL$639F1A6F")
      ALL_ROWS
      OPT_PARAM('_nlj_batching_enabled' 0)
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"=99)
   6 - access("T2A"."ID"="T1"."N1")
   8 - access("T2B"."ID"="T1"."N1")

So that worked well – operation 2 predicts one row for the tablescan of t1, with a nested loop join and union all pushed predicate where index range scans of t2ai and t2bi give us one row from each table. The “Predicate Information” tells us that the t1.n1 join predicate has been pushed inside the view to both subqueries, so we see “t2a.id = t1.n1” and “t2b.id = t1.n1”.

So what if I want to tell Oracle that it will actually find 5 rows in the t2a range scan and table access and 7 rows in the t2b range scan and table access (estimates that, in a more complex query, might persuade Oracle to use two different indexes to get into the view and change the join order and access method for the next few tables it accessed)? Since I’ve recently written about the nlj_index_scan option for opt_estimate() you might think that this is the one we need to use – perhaps something like:


opt_estimate(@sel$639f1a6f nlj_index_scan, t2a@sel$2 (t1), t2ai, scale_rows=5)
opt_estimate(@sel$b01c6807 nlj_index_scan, t2b@sel$3 (t1), t2bi, scale_rows=7)

You’ll notice I’ve been very careful to find the fully qualified aliases for t2a and t2b by looking at the “Query Block Name / Object Alias” section of the plan (if the view appeared as a result of Oracle using Concatenation or OR-Expansion you would find that you got two query block names that looked similar but had suffixes of “_1” and “_2”). But it wasn’t worth the effort: it didn’t work. Fiddling around with all the possible variations I could think of didn’t help (maybe I should have used set$5715ce2e as the query block target for both the hints – no; what if I …)

Of course if we look at the “Outline Data” we’d notice that the use_nl() hint in the outline says: “USE_NL(@SEL$1 T2@SEL$1)”, so we don’t have a nested loop into t2a and t2b, we have a nested loop into the view t2. So I decided to forget the nested loop idea and just go for the indexes (and the tables, when I got to them) with the following hints (you’ll notice that during the course of my experiments I added my own query block names to the initial query blocks – so the generated query block names have changed):



explain plan for
select
        /*+
                qb_name(main)
                opt_estimate(@sel$f2bf1101, index_scan, t2a@subq_a, t2ai, scale_rows=5)
                opt_estimate(@sel$f2bf1101, table,      t2a@subq_a,       scale_rows=5)
                opt_estimate(@sel$f4e7a233, index_scan, t2b@subq_b, t2bi, scale_rows=7)
                opt_estimate(@sel$f4e7a233, table,      t2b@subq_b,       scale_rows=7)
        */
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select /*+ qb_name(subq_a) */ 'a' flag, t2a.* from t2a
         union all
         select /*+ qb_name(subq_b) */ 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
;

select * from table(dbms_xplan.display(null,null,'outline alias'));


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     5 |    75 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     5 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     7 |   105 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     7 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------


Excellent – we get the cardinalities we want to see for the tables – except the view operator doesn’t hold the sum of the table cardinalities, and the join doesn’t multiply up the estimates either. I couldn’t find a way of getting the view to show 12 rows (not even with a guessed – but presumably unimplemented – opt_estimate(view …) hint!); however, during the course of my experiments I tried the hint: “opt_estimate(@main, table, t2@main, scale_rows=15)”. This didn’t have any visible effect in the plan but while searching through the 10053 trace file I found the following lines:

Table Stats::
  Table: from$_subquery$_002  Alias: T2  (NOT ANALYZED)
  #Rows: 20000  SSZ: 0  LGR: 0  #Blks:  37  AvgRowLen:  15.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 9

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 20000.000000 to 300000.000000 due to opt_estimate hint

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 12.000000 to 180.000000 due to opt_estimate hint

So at some point in the code path the optimizer is aware that 5 + 7 = 12, and that 12 * 15 = 180. But this doesn’t show up in the final execution plan. You might notice, by the way, that the scale_rows=15 has been applied NOT ONLY to the place where I was aiming – it’s also been applied to scale up the 20,000 rows that are estimated to be in the union all to 300,000 as the estimate for a tablescan of the two tables.

Possibly if I spent more time working through the 10053 trace file (which, as I’ve said before, I try to avoid doing) I might have found exactly which code path Oracle followed to get to the plan it produced and managed to tweak some hints to get the numbers I wanted to see. Possibly the optimizer was already following the code path that actually produced the numbers I wanted, then “forgot” to use them. One day, perhaps, I’ll take another look at the problem – but since I wasn’t trying to solve a problem for a client (and given that there was an alternative workaround) I closed the 10053 trace file and put the model aside for a rainy day.

Footnote

One thought did cross my mind as a way of finding out if there was a real solution – and I offer this for anyone who wants to play: create a second data set that genuinely produces the 5 and 7 I want to see (and check that the view reports the sum of the two components); then run the original query against the original data so that you’ve got the execution plan in memory, then overwrite the original data with the new data set (without changing the statistics on the original). Then use the SQL Tuning Advisor to see if it produces a SQL profile for the captured SQL_ID that reproduces the correct plan for the second data set and check what opt_estimate() hints it uses. (Warning – this may turn into a frustrating waste of time.)
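
If you do want to try it, the tuning task part is only a few calls to the (documented) dbms_sqltune package – here’s a minimal sketch, where the sql_id is just a placeholder for whatever you capture and the session needs the ADVISOR privilege:

declare
        m_task  varchar2(64);
begin
        m_task := dbms_sqltune.create_tuning_task(
                sql_id          => '0123456789abc',     -- placeholder sql_id
                task_name       => 'opt_est_sts_test'
        );
        dbms_sqltune.execute_tuning_task(task_name => m_task);
end;
/

set long 1000000
select dbms_sqltune.report_tuning_task('opt_est_sts_test') from dual;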

 

July 1, 2019

opt_estimate 4

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:18 pm BST Jul 1,2019

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specific query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given a query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case, where the hint is inside the query block it applies to, the “@mat_cte” isn’t strictly necessary.)
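
In other words the commented-out hint in the CTE above could be stripped of the query block reference when it’s written inline – something like the following sketch (the same query, just without the qb_name()/“@mat_cte” plumbing):

with demo as (
        select  /*+
                        materialize
                        opt_estimate(query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);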

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint and pointed out that there were cases where you might also want to include the query_block option as well. Here’s a final example query showing the combined effect, using the scale_rows option, after creating a table t2 as a copy of t1 (but with pctfree 75, to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour but to be safe in a production environment that’s a hint that I should have included.

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat – hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and, thanks to a Freudian slip while typing, “not sage”). You probably don’t know how to do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual, starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I did on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values is 14 (roughly half the actual number in my case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan and that makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created >= 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complex view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard, you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.

 

June 28, 2019

opt_estimate 3

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:12 pm BST Jun 28,2019

This is just a quick note to throw out a couple of the lesser-known options for the opt_estimate() hint – and they may be the variants that are most likely to be useful since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------
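
Where does the 200 come from? It’s simply the num_distinct recorded for column n1, which you can confirm with a quick dictionary query (assuming statistics are present on t1):

select
        column_name, num_distinct
from
        user_tab_cols
where
        table_name  = 'T1'
and     column_name = 'N1'
;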

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause, so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by option I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it – with autotrace enabled here are three queries: first the unhinted baseline (which uses the standard 5% on my having clause), then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4 (which means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3 with the nett effect of producing a cardinality of 6. Here are the plans:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of the lines for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

June 25, 2019

opt_estimate 2

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 8:22 pm BST Jun 25,2019

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t affect the arithmetic but it does mean the appearance of the nested loop goes back to the original pre-9i form that happens to make it a little easier to see costs and cardinalities adding and multiplying their way through the plan.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2: the optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15*15 = 225 and the cost comes to 15 (t1 rows) * 16 (t2 unit cost) + 2 (t1 cost) = 242.
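
That “scattered across 15 blocks” figure comes from the clustering_factor of the t2 index: each probe selects 15 of the 3,000 rows, so the table access cost is roughly 15/3,000 of the clustering_factor, and for the badly clustered t2 the clustering_factor is close to the full 3,000. A quick dictionary check (once stats are in place) shows the contrast with t1:

select
        index_name, num_rows, leaf_blocks, clustering_factor
from
        user_indexes
where
        index_name in ('T1_I1', 'T2_I1')
order by
        index_name
;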

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name nlj_index_scan, table_alias (list of possible driving tables), target_index, numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than by a list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two columns before it can be used.

The things to note in this plan are: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that's 0.06 * the original 15). This hasn't changed the cost of the range scan (because that cost was already 1 before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access has dropped to 2 (from 16). On the other hand the table cardinality hasn't dropped, so it is no longer consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is 15 (t1 rows) * 2 (t2 unit cost) + 2 (t1 cost).

Let's try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I'll dictate a scaling factor that will make it easy to see the effect – let's tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we've told the optimizer that it should scale its estimated row count for t2 down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide not to follow the path we expected, but that factor of 0.2 will still be applied whenever t2 is accessed. Notice in this plan that the join cardinality in operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table, the numbers involved in the selectivity will carry on through the plan, producing a different "adjusted NDV" for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side note: If you have access to MoS you'll find that Doc ID 2402821.1 "How To Use Optimizer Hints To Specify Cardinality For Join Operation" seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins.
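
As a sketch (untested; the rows= option for the join() clause and the multi-alias form of the cardinality() hint are my assumptions based on that note, not something I've demonstrated here) the two ways of dictating the join cardinality directly might look like this:


select
        /*+
                qb_name(main)
                opt_estimate(@main join(t2 t1), rows=23)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

select
        /*+
                cardinality(t1 t2, 23)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;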

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

With a little engineering on the optimizer estimates we've managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn't appear in any of the other examples): I've left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up, as that made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

June 6, 2019

Scalar Subquery Costing

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 7:54 pm BST Jun 6,2019

A question came up on Oracle-l list-server a few days ago about how Oracle calculates costs for a scalar subquery in the select list. The question included an example to explain the point of the question. I’ve reproduced the test below, with the output from an 18.3 test system. The numbers don’t match the numbers produced in the original posting but they are consistent with the general appearance.

rem
rem     Script:         ssq_costing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

create table t_1k ( n1 integer ) ;
create table t_100k ( n1 integer ) ;

insert into t_1k
  select
         level
    from dual
    connect by level <= 1e3;

insert into t_100k
  select level
    from dual
    connect by level <= 1e5;

commit ;

begin
  dbms_stats.gather_table_stats ( null, 'T_1K') ;
  dbms_stats.gather_table_stats ( null, 'T_100K') ;
end ;
/

explain plan for
select 
        /*+ qb_name(QB_MAIN) */
        (
        select /*+ qb_name(QB_SUBQ) */ count(*)
        from t_1k
        where t_1k.n1 = t_100k.n1
        )
from t_100k
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |   100K|   488K|  1533K  (2)| 00:01:00 |
|   1 |  SORT AGGREGATE    |        |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T_1K   |     1 |     4 |    17   (0)| 00:00:01 |
|   3 |  TABLE ACCESS FULL | T_100K |   100K|   488K|    36   (9)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T_1K"."N1"=:B1)

The key point to note is this – the scalar subquery has to execute 100,000 times because that’s the number of rows in the driving table. The cost for executing the scalar subquery once is 17 – so the total cost of the query should be 1,700,036 – not 1,533K (and for execution plans the K means x1000, not x1024). There’s always room for rounding errors, of course, but a check of the 10053 (CBO trace) file shows the numbers to be 17.216612 for the t_1k tablescan, 36.356072 for the t_100K tablescan, and 1533646.216412 for the whole query. So how is Oracle managing to get a cost that looks lower than it ought to be?

There’s plenty of scope for experimenting to see how the numbers change – and my first thought was simply to see what happens as you change the number of distinct values in the t_100K.n1 column. It would be rather tedious to go through the process of modifying the data a few hundred times to see what happens, so I took advantage of the get_column_stats() and set_column_stats() procedures in the dbms_stats package to create a PL/SQL loop that faked a number of different scenarios that lied about the actual table data.


delete from plan_table;
commit;

declare

        srec                    dbms_stats.statrec;
        n_array                 dbms_stats.numarray;

        m_distcnt               number;
        m_density               number;
        m_nullcnt               number;
        m_avgclen               number;


begin

        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 't_100k',
                colname         => 'n1', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => m_avgclen
        ); 

        for i in 1 .. 20 loop

                m_distcnt := 1000 * i;
                m_density := 1/m_distcnt;

                dbms_stats.set_column_stats(
                        ownname         => user,
                        tabname         => 't_100k',
                        colname         => 'n1', 
                        distcnt         => m_distcnt,
                        density         => m_density,
                        nullcnt         => m_nullcnt,
                        srec            => srec,
                        avgclen         => m_avgclen
                ); 


        execute immediate
        '
                explain plan set statement_id = ''' || m_distcnt || 
        '''
                for
                select
                        /*+ qb_name(QB_MAIN) */
                        (
                        select /*+ qb_name(QB_SUBQ) */ count(*)
                        from t_1k
                        where t_1k.n1 = t_100k.n1
                        )
                from t_100k
        ';
        
        end loop;       

end;
/

The code is straightforward. I've declared a few variables to hold the column stats from the t_100k.n1 column, called get_column_stats(), then looped 20 times through a process that changes the number of distinct values (and corresponding density) recorded in the column stats, then used execute immediate to call "explain plan" for the original query.

You’ll notice I’ve given each plan a separate statement_id that corresponds to the num_distinct that generated the plan. In the code above I’ve changed the num_distinct from 1,000 to 20,000 in steps of 1,000.

Once the PL/SQL block ends I’ll have a plan table with 20 execution plans stored in it and, rather than reporting those plans with calls to dbms_xplan.display(), I’m going to be selective about which rows and columns I report.

select
        statement_id, 
        io_cost,
        io_cost - lag(io_cost,1) over (order by to_number(statement_id)) io_diff,
        cpu_cost,
        cpu_cost - lag(cpu_cost,1) over (order by to_number(statement_id)) cpu_diff,
        cost
from 
        plan_table
where 
        id = 0
order by 
        to_number(statement_id)
;

I’ve picked id = 0 (the top line of the plan) for each statement_id and I’ve reported the cost column, which is made up of the io_cost column plus a scaled down value of the cpu_cost column. I’ve also used the analytic lag() function to calculate how much the io_cost and cpu_cost changed from the previous statement_id. Here are my results from 18c:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
1000                                17033            1099838920                 17253
2000                                34033      17000 2182897480 1083058560      34470
3000                                51033      17000 3265956040 1083058560      51686
4000                                68033      17000 4349014600 1083058560      68903
5000                                85033      17000 5432073160 1083058560      86119
6000                               102033      17000 6515131720 1083058560     103336
7000                               119033      17000 7598190280 1083058560     120553
8000                               136033      17000 8681248840 1083058560     137769
9000                               153033      17000 9764307400 1083058560     154986
10000                              170033      17000 1.0847E+10 1083058560     172202
11000                              197670      27637 1.2608E+10 1760725019     200191
12000                              338341     140671 2.1570E+10 8962036084     342655
13000                              457370     119029 2.9153E+10 7583261303     463200
14000                              559395     102025 3.5653E+10 6499938259     566525
15000                              647816      88421 4.1287E+10 5633279824     656073
16000                              725185      77369 4.6216E+10 4929119846     734428
17000                              793452      68267 5.0565E+10 4349223394     803565
18000                              854133      60681 5.4431E+10 3865976350     865019
19000                              908427      54294 5.7890E+10 3459031472     920005
20000                              957292      48865 6.1003E+10 3113128324     969492

The first pattern that hits the eye is the constant change of 17,000 in the io_cost in the first few lines of the output. For "small" numbers of distinct values the (IO) cost of the query is (33 + 17 * num_distinct) – in other words, the arithmetic seems to assume that it will execute the subquery once for each distinct value and then cache the results so that repeated executions for any given value will not be needed. This looks as if the optimizer is trying to match its arithmetic to the "scalar subquery caching" mechanism.
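
If you want to check that claim against the plan table you've just populated, a query like the following should do it (a sketch that depends on the statement_id still being the raw num_distinct, as it is in the loop above):


select
        statement_id,
        io_cost,
        33 + 17 * to_number(statement_id)       predicted_io_cost
from
        plan_table
where
        id = 0
and     to_number(statement_id) <= 10000
order by
        to_number(statement_id)
;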

But things change somewhere between 10,000 and 11,000 distinct values. The point comes where adding one more distinct value causes a much bigger jump in cost than 17, and that’s because Oracle assumes it’s reached a point where there’s a value that it won’t have room for in the cache and will have to re-run the subquery multiple times for that value as it scans the rest of the table. Let’s find the exact break point where that happens.

Changing my PL/SQL loop so that we calculate m_distcnt as "10910 + i", this is the output from the final query:


-- m_distcnt := 10910 + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
10911                              185520            1.1834E+10                187887
10912                              185537         17 1.1835E+10    1083059     187904
10913                              185554         17 1.1836E+10    1083058     187921
10914                              185571         17 1.1837E+10    1083059     187938
10915                              185588         17 1.1838E+10    1083058     187956
10916                              185605         17 1.1839E+10    1083059     187973
10917                              185622         17 1.1841E+10    1083059     187990
10918                              185639         17 1.1842E+10    1083058     188007
10919                              185656         17 1.1843E+10    1083059     188025
10920                              185673         17 1.1844E+10    1083058     188042
10921                              185690         17 1.1845E+10    1083059     188059
10922                              185707         17 1.1846E+10    1083058     188076
10923                              185770         63 1.1850E+10    4027171     188140
10924                              185926        156 1.1860E+10    9914184     188298
10925                              186081        155 1.1870E+10    9912370     188455
10926                              186237        156 1.1880E+10    9910555     188613
10927                              186393        156 1.1890E+10    9908741     188770
10928                              186548        155 1.1900E+10    9906928     188928
10929                              186703        155 1.1909E+10    9905114     189085
10930                              186859        156 1.1919E+10    9903302     189243

If we have 10,922 distinct values in the column the optimizer calculates as if it will be able to cache them all; but if we have 10,923 distinct values the optimizer thinks that there’s going to be one value where it can’t cache the result and will have to run the subquery more than once.

Before looking at this in more detail let's go to the other interesting point – when does the cost stop changing? We can see the cost increasing as the number of distinct values grows, but we saw at the start that the cost didn't seem to get as large as we expected, so there must be a point where it stops increasing before it "ought" to.

I’ll jump straight to the answer: here’s the output from the test when I start num_distinct off at slightly less than half the number of rows in the table:


 -- m_distcnt := (50000 - 10) + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
49991                             1514281            9.6488E+10               1533579
49992                             1514288          7 9.6489E+10     473357    1533586
49993                             1514296          8 9.6489E+10     473337    1533594
49994                             1514303          7 9.6490E+10     473319    1533601
49995                             1514311          8 9.6490E+10     473299    1533609
49996                             1514318          7 9.6491E+10     473281    1533616
49997                             1514325          7 9.6491E+10     473262    1533624
49998                             1514333          8 9.6492E+10     473243    1533631
49999                             1514340          7 9.6492E+10     473224    1533639
50000                             1514348          8 9.6493E+10     473205    1533646
50001                             1514348          0 9.6493E+10          0    1533646
50002                             1514348          0 9.6493E+10          0    1533646
50003                             1514348          0 9.6493E+10          0    1533646
50004                             1514348          0 9.6493E+10          0    1533646
50005                             1514348          0 9.6493E+10          0    1533646
50006                             1514348          0 9.6493E+10          0    1533646
50007                             1514348          0 9.6493E+10          0    1533646
50008                             1514348          0 9.6493E+10          0    1533646
50009                             1514348          0 9.6493E+10          0    1533646
50010                             1514348          0 9.6493E+10          0    1533646

The cost just stops changing when num_distinct = half the rows in the table.

Formulae

During the course of these experiments I had been exchanging email messages with Nenad Noveljic via the Oracle-L list-server (full monthly archive here) and he came up with the suggestion of a three-part formula that assumed a cache size and gave a cost of:

  • “tablescan cost + num_distinct * subquery unit cost” for values of num_distinct up to the cache size;
  • then, for values of num_distinct greater than the cache size and up to half the number of rows in the table, added a marginal cost representing the probability that some values would not be cached;
  • then for values of num_distinct greater than half the number of rows in the table reported the cost associated with num_distinct = half the number of rows in the table.

Hence:

  • for 1 <= num_distinct <= 10922, cost = (33 + num_distinct * 17)
  • for 10,923 <= num_distinct <= 50,000, cost = (33 + 10,922 * 17) + (1 – 10,922/num_distinct) * 100,000 * 17
  • for 50,000 <= num_distinct <= 100,000, cost = cost(50,000).

The middle line needs a little explanation: (1 – 10,922/num_distinct) is the probability that a value will not be in the cache; this has to be multiplied by 100,000 to give the expected number of rows that will not be cached, and then multiplied by 17 as the cost of running the subquery for those rows.

The middle line can be re-arranged as 33 + 17 * (10,922 + (1 – 10,922/num_distinct) * 100,000)
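
Written as something you can evaluate from SQL*Plus for any given num_distinct (a sketch using a substitution variable, and ignoring for the moment the small correction that appears in the next section):


define m_distcnt = 20000

select
        round(
                33 + 17 * (10922 + (1 - 10922/&m_distcnt) * 100000)
        )       predicted_io_cost
from
        dual
;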

Tweaking

At this point I could modify my code loop to report the calculated value for the cost and compare it with the actual cost to show you that the two values didn't quite match. Instead I'll jump forward a little bit to a correction that needs to be made to the formula above. It revolves around how Oracle determines the cache size. There's a hidden parameter (which I mentioned in CBO Fundamentals) that controls scalar subquery caching. In the book I think I only referenced it in the context of subqueries in the "where" clause. The parameter is "_query_execution_cache_max_size" and has a default value of 131072 (power(2,17)) – so when I found that the initial formula didn't quite work I made the following observation:

  • 131072 / 10922 = 12.00073
  • 131072 / 12 = 10922.666…

So I put 10922.66667 into the formula to see if that would improve things.
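
(By the way, if you want to check the value of _query_execution_cache_max_size on your own system, the standard hidden parameter query, which has to be run as SYS, looks something like this:)


select
        nam.ksppinm     name,
        val.ksppstvl    value
from
        x$ksppi         nam,
        x$ksppcv        val
where
        val.indx = nam.indx
and     nam.ksppinm = '_query_execution_cache_max_size'
;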

For the code change I added a variable m_cost to the PL/SQL block, and set it inside the loop as follows:

m_cost := round(33 + 17 * (10922.66667 + 100000 * (1 - (10922.66667 / m_distcnt))));

Then in the “execute immediate” I changed the “explain plan” line to read:

explain plan set statement_id = ''' || lpad(m_distcnt,7) || ' - ' || lpad(m_cost,8) ||

This allowed me to show the formula's prediction of the (IO) cost in the final output, and here's what I got for values of num_distinct in the region of 10,922:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  10911 -   183901                 185520            1.1834E+10                187887
  10912 -   184057                 185537         17 1.1835E+10    1083059     187904
  10913 -   184212                 185554         17 1.1836E+10    1083058     187921
  10914 -   184368                 185571         17 1.1837E+10    1083059     187938
  10915 -   184524                 185588         17 1.1838E+10    1083058     187956
  10916 -   184680                 185605         17 1.1839E+10    1083059     187973
  10917 -   184836                 185622         17 1.1841E+10    1083059     187990
  10918 -   184992                 185639         17 1.1842E+10    1083058     188007
  10919 -   185147                 185656         17 1.1843E+10    1083059     188025
  10920 -   185303                 185673         17 1.1844E+10    1083058     188042
  10921 -   185459                 185690         17 1.1845E+10    1083059     188059
  10922 -   185615                 185707         17 1.1846E+10    1083058     188076
  10923 -   185770                 185770         63 1.1850E+10    4027171     188140
  10924 -   185926                 185926        156 1.1860E+10    9914184     188298
  10925 -   186081                 186081        155 1.1870E+10    9912370     188455
  10926 -   186237                 186237        156 1.1880E+10    9910555     188613
  10927 -   186393                 186393        156 1.1890E+10    9908741     188770
  10928 -   186548                 186548        155 1.1900E+10    9906928     188928
  10929 -   186703                 186703        155 1.1909E+10    9905114     189085
  10930 -   186859                 186859        156 1.1919E+10    9903302     189243

The formula is only supposed to work in the range 10923 – 50,000, so the first few results don’t match; but in the range 10,923 to 10,930 the match is exact. Then, in the region of 50,000 we get:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  49991 -  1514281                1514281            9.6488E+10               1533579
  49992 -  1514288                1514288          7 9.6489E+10     473357    1533586
  49993 -  1514296                1514296          8 9.6489E+10     473337    1533594
  49994 -  1514303                1514303          7 9.6490E+10     473319    1533601
  49995 -  1514311                1514311          8 9.6490E+10     473299    1533609
  49996 -  1514318                1514318          7 9.6491E+10     473281    1533616
  49997 -  1514325                1514325          7 9.6491E+10     473262    1533624
  49998 -  1514333                1514333          8 9.6492E+10     473243    1533631
  49999 -  1514340                1514340          7 9.6492E+10     473224    1533639
  50000 -  1514348                1514348          8 9.6493E+10     473205    1533646
  50001 -  1514355                1514348          0 9.6493E+10          0    1533646
  50002 -  1514363                1514348          0 9.6493E+10          0    1533646
  50003 -  1514370                1514348          0 9.6493E+10          0    1533646
  50004 -  1514377                1514348          0 9.6493E+10          0    1533646
  50005 -  1514385                1514348          0 9.6493E+10          0    1533646
  50006 -  1514392                1514348          0 9.6493E+10          0    1533646
  50007 -  1514400                1514348          0 9.6493E+10          0    1533646
  50008 -  1514407                1514348          0 9.6493E+10          0    1533646
  50009 -  1514415                1514348          0 9.6493E+10          0    1533646
  50010 -  1514422                1514348          0 9.6493E+10          0    1533646

Again, the formula applies only in the range up to 50,000 (half the rows in the table) – and the match is perfect in that range.

Next steps

The work so far gives us some idea of the algorithm that the optimizer is using to derive a cost, but this is just one scenario and there are plenty of extra questions we might ask. What, as the most pressing one, is the significance of the number 12 in the calculation 131,072/12? From previous experience I guessed that it was related to the length of the input and output values of the scalar subquery – as in "value X for n1 returns value Y for count(*)".

To pursue this idea I recreated the data sets using varchar2(10) as the definition of n1 and lpad(rownum,10) as the value – the "breakpoint" dropped from 10,922 down to 5,461. Checking the arithmetic: 131,072 / 5461 = 24.001456, then 131,072 / 24 = 5461.333… And that's the number that made the formula work perfectly for the modified data set.
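
The change to the data sets was minimal; a sketch reconstructed from the description above (not the exact script I ran) would be:


drop table t_1k;
drop table t_100k;

create table t_1k   ( n1 varchar2(10) ) ;
create table t_100k ( n1 varchar2(10) ) ;

insert into t_1k
  select lpad(rownum,10)
    from dual
    connect by level <= 1e3;

insert into t_100k
  select lpad(rownum,10)
    from dual
    connect by level <= 1e5;

commit;

begin
  dbms_stats.gather_table_stats ( null, 'T_1K') ;
  dbms_stats.gather_table_stats ( null, 'T_100K') ;
end ;
/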

Then I used set_column_stats() to hack the avg_col_len of t_100K.n1 to 15 and the break point dropped to 4,096. Again we do the two arithmetic steps: 131072/4096 = 32 (but then we don't need to do the reverse step since the first result is integral).
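
The avg_col_len hack follows the same get/set pattern as the earlier PL/SQL block; here's a minimal sketch:


declare
        srec            dbms_stats.statrec;
        m_distcnt       number;
        m_density       number;
        m_nullcnt       number;
        m_avgclen       number;
begin
        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 't_100k',
                colname         => 'n1',
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => m_avgclen
        );

        dbms_stats.set_column_stats(
                ownname         => user,
                tabname         => 't_100k',
                colname         => 'n1',
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => 15           -- the hacked average column length
        );
end;
/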

Checking the original data set, where n1 was numeric, the avg_col_len was 5, so we have three reference points:

  • Avg_col_len = 5. “Cache unit size” = 12
  • Avg_col_len = 11. Cache unit size = 24 (don’t forget the avg_col_len includes the length byte, so our padded varchar2(10) has a length of 11).
  • Avg_col_len = 15, Cache unit size = 32

There's an obvious pattern here: "Cache unit size" = (2 x avg_col_len + 2).  Since I hadn't been changing the t_1k.n1 column at the same time, that really does look like a deliberate factor of 2 (I'd thought initially that maybe the 12 was affected by the lengths of both columns in the predicate – but that doesn't seem to be the case.)

The scientific method says I should now make a prediction based on my hypothesis – so I set the avg_col_len for t_100K.n1 to 23 and guessed that the break point would be at 2,730 – and it was (131072 / (2 * 23 + 2) = 2730.6666…).

The next question, of course, is: where does the "spare 2" come from? Trying to minimize the change in the code I modified my subquery to select sum(to_number(n1)) rather than count(*), then to avg(to_number(n1)) – remember I had changed n1 to a varchar2(10) that looked like a number left-padded with spaces. In both cases all I had to do to get an exact match between the basic formula and the optimizer's cost calculation was to use "2 * avg_col_len + 22" as the cache unit size – and 22 is the nominal maximum length of an internally stored numeric column.
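
For reference, the sum() variant of the test query would have looked like this (the avg() version simply swaps the aggregate function):


explain plan for
select
        /*+ qb_name(QB_MAIN) */
        (
        select /*+ qb_name(QB_SUBQ) */ sum(to_number(n1))
        from t_1k
        where t_1k.n1 = t_100k.n1
        )
from t_100k
;

select * from table(dbms_xplan.display);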

Bottom line: the cache unit size seems to be related to the input and output values, but I don't know why there's a factor of 2 applied to the input column length, and I don't know why the length of count(*) is deemed to be 2 when other derived numeric outputs have the more intuitive 22 for their length.

tl;dr

The total cost calculation for a scalar subquery in the select list is largely affected by:

  • a fixed cache size (131,072 bytes) possibly set by hidden parameter _query_execution_cache_max_size
  • the avg_col_len of the input (correlating) column(s) from the driving table
  • the nominal length of the output (select list) of the subquery

There is an unexplained factor of 2 used with the avg_col_len of the input, and a slightly surprising value of 2 if the output is simply count(*).

If the number N of distinct values for the driving column(s) is less than the number of possible cache entries the effect of the scalar subquery is to add N * (estimated cost of executing the subquery once). As the number of distinct values for the driving column(s) goes above that limit the incremental effect of the subquery is based on the expected number of times an input value will not be cached. When the number of distinct values in the driving column(s) exceeds half the number of rows in the driving table the cost stops increasing – there is no obvious reason why the algorithm does this.

There are many more cases that I could investigate at this point – but I think this model is enough as an indication of general method. If you come across a variation where you actually need to work out how the optimizer derived a cost then this framework will probably be enough to get you started in the right direction.

 

April 12, 2019

In-table predicates

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:49 pm BST Apr 12,2019

This note was prompted by a recent email asking about the optimizer’s method for estimating the selectivity of a predicate which compared two columns in the same table – for example:  “where orders.amount_invoiced = orders.amount_paid”. It’s been about 14 years since I wrote “Cost Based Oracle – Fundamentals” so my memory of what I wrote (and whether I even mentioned this case) was rather hazy, so I sent off a quick reply and decided to do a little checking.

It turned out that I'd already written a blog note with a throwaway comment about the estimates and a general workaround for optimizer problems caused by examples of this kind. The comment I made about the estimate was that the selectivity seems to be the smaller of the selectivities of (using the example above) "amount_paid = :unpeekable_bind" and "amount_invoiced = :unpeekable_bind". I'm fairly sure I've made similar comments several times in the past, but after replying to the email I started to wonder whether this would still be true if there were histograms on the columns. So I ran up a little test and here, to start things off, is the code to generate the data I used for testing:


rem
rem     Script:         column_equality_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem     Purpose:

create table t1(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t2(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t3(
        n1      number(6,0),
        n2      number(6,0),
        v1      varchar2(50)
)
;

execute dbms_random.seed(0)

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t3 (n1, n2, v1)
select
        t1.n1,
        t2.n1,
        rpad(rownum,50)
from
        t1, t2
where
        t1.id = t2.id
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 254 for columns v1 size 1'
        );
end;
/

select
        table_name, column_name, num_distinct, density, histogram,
        low_value, high_value
from
        user_tab_cols
where
        table_name in ('T1','T2','T3')
and     column_name in ('N1','N2')
order by
        table_name, column_name
;


TABLE_NAME      COLUMN_NAME     NUM_DISTINCT    DENSITY HISTOGRAM       LOW_VALUE  HIGH_VALUE
--------------- --------------- ------------ ---------- --------------- ---------- ----------
T1              N1                        38     .00005 FREQUENCY       80         C128

T2              N1                        38     .00005 FREQUENCY       80         C126

T3              N1                        38     .00005 FREQUENCY       80         C128
                N2                        38     .00005 FREQUENCY       80         C126


I’ve created two sets of 10,000 rows each of normally distributed data – but taken the absolute values so I’ve only got half the bell curve, and I’ve scaled up by a factor of 10 and truncated. This has given me two similar but slightly different sets of values which happen to cover 38 distinct values each.

I’ve then generated my test set by joining these two tables on the unique (though not declared as such) id column to give a table with the same number of rows and two skewed sets of data. The calls to dbms_stats create histograms on the skewed data sets, and I’ve reported a few significant numbers about the 4 relevant columns.

Looking at the column statistics we have num_distinct = 38 across the board – so my observation from paragraph 2 above would tend to suggest that the optimizer would report 10,000/38 = 263 as the cardinality estimate for the predicate "t3.n1 = t3.n2" (I'm fairly confident that in this case 1/num_distinct will be preferred over using the density from user_tab_cols). But here's what we get from a call to explain plan:


explain plan for
select
        v1
from
        t3
where
        n1 = n2
;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   564 | 32148 |    18   (6)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |   564 | 32148 |    18   (6)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"="N2")

The estimate is 564 – which is a pretty good estimate in this case (the actual result was 552) as the two columns were randomly generated and there's no correlation between them. Unfortunately this is quite a long way off my assumption of 263, so where did the optimizer get that number from?

Here’s a query (with result set) that you may recognise from an earlier post.


break on report skip 1
compute count of value on report
compute sum of t1_frequency on report
compute sum of t2_frequency on report
compute sum of product on report

column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N1'
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N2'
)
select
        f1.value,
        f1.frequency    t1_frequency,
        f2.frequency    t2_frequency,
        f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
order by
        f1.value
;



     VALUE T1_FREQUENCY T2_FREQUENCY      PRODUCT
---------- ------------ ------------ ------------
         0          777          768      596,736
         1          806          753      606,918
         2          794          779      618,526
         3          808          763      616,504
         4          752          749      563,248
         5          627          729      457,083
         6          623          628      391,244
         7          584          616      359,744
         8          544          597      324,768
         9          512          546      279,552
        10          441          439      193,599
        11          409          342      139,878
        12          345          370      127,650
        13          318          300       95,400
        14          257          282       72,474
        15          244          242       59,048
        16          214          206       44,084
        17          172          193       33,196
        18          161          140       22,540
        19          113          114       12,882
        20          108           93       10,044
        21           95           81        7,695
        22           72           55        3,960
        23           54           56        3,024
        24           43           36        1,548
        25           38           31        1,178
        26           23           18          414
        27           18           23          414
        28            7           14           98
        29            9           13          117
        30           14           11          154
        31            4            2            8
        32            5            3           15
        33            1            3            3
        35            4            1            4
        37            2            2            4
---------- ------------ ------------ ------------
        36
                   9998         9998    5,643,754


I'm querying the histogram information for the two columns, and where t3.n1 and t3.n2 have a value in common I've reported the two frequencies for that value and the product of the frequencies. For convenience I've included a count and a couple of sums to show that there isn't a perfect match in the set of values for the two columns. The most important number at the bottom of the page, though, is the sum of the products of frequencies of common values. Take that value and divide by 10,000 and you get 564.3754 – compare that with the cardinality estimate of the predicate "t3.n1 = t3.n2", it's a perfect match (allowing for rounding).

The query against user_tab_histograms is the query I used to calculate the cardinality of a join where there were frequency histograms on the columns at both ends of the join. The optimizer’s estimate for “intra-table” predicates is consistent with its estimate for joins (in the special cases of “no histograms” and “two frequency histograms”, at least). Viewing it from a slightly different angle: the selectivity of the predicate “n1 = n2” can be derived as “the cardinality estimate for joining t3 to itself” divided by “the cardinality of the cartesian join” (the latter being num_rows * num_rows, of course).
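
If you just want the optimizer's number in a single step you can collapse the query into one aggregate, reusing the f1 and f2 factored subqueries above (the 10,000 is num_rows for t3):


with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N1'
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N2'
)
select
        round(sum(f1.frequency * f2.frequency) / 10000)  estimated_card
from
        f1, f2
where
        f2.value = f1.value
;

On this data set that reports 564 (i.e. 564.3754 rounded), matching the estimate in the plan above.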

Just as a closing demo – lets generate a plan for the appropriate self-join of t3 and check the cardinality estimate:


explain plan for
select
        t3a.v1, t3b.v1
from
        t3 t3a, t3 t3b
where
        t3a.n2 = t3b.n1
;

select * from table(dbms_xplan.display);


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  5643K|   581M|   138  (83)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  5643K|   581M|   138  (83)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3A"."N2"="T3B"."N1")


As expected the (rounded) join cardinality is reported as 5,643K.

So the selectivity of the single table predicate "n1 = n2" will be 5,643,000 / (10,000 * 10,000) = 0.05643 and the cardinality estimate of the single table query will be 10,000 * 0.05643 = 564.3. QED.

I haven’t tested any other variations of types of histogram, degree of overlap of value ranges, etc. but I suspect that the general principle is probably going to give the selectivity as (or with the appearance of): “estimated cardinality of self-join” / “square of num_rows (allowing for nulls)”.

 

December 20, 2018

Transitive Closure

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 1:19 pm BST Dec 20,2018

This is a follow-up to a note I wrote nearly 12 years ago, looking at the problems of transitive closure (or absence thereof) from the opposite direction. Transitive closure gives the optimizer one way of generating new predicates from the predicates you supply in your where clause (or, in some cases, your constraints); but it’s a mechanism with some limitations. Consider the following pairs of predicates:


    t1.col1 = t2.col2
and t2.col2 = t3.col3

    t1.col1 = t2.col2
and t2.col2 = 'X'

A person can see that the first pair of predicates allows us to infer that "t1.col1 = t3.col3" and the second pair of predicates allows us to infer that "t1.col1 = 'X'". The optimizer is coded only to recognize the second inference. This has an important side effect that can have a dramatic impact on performance in a way that's far more likely to appear if your SQL is generated by code. Consider this sample data set (reproduced from the 2006 article):

rem
rem     Script:         transitive_loop.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2006
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem

create table t1 
as
select
        mod(rownum,100) col1,
        rpad('x',200)   v1
from
        all_objects
where   
        rownum <= 2000
;

create table t2
as
select
        mod(rownum,100) col2,
        rpad('x',200)   v2
from
        all_objects
where   
        rownum <= 2000
;

create table t3
as
select
        mod(rownum,100) col3,
        rpad('x',200)   v3
from
        all_objects
where   
        rownum <= 2000
;

-- gather stats if necessary

set autotrace traceonly explain

prompt  =========================
prompt  Baseline - two hash joins
prompt  =========================

select 
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
;

prompt  ================================================
prompt  Force mismatch between predicates and join order
prompt  ================================================

select 
        /*+
                leading(t1 t3 t2)
        */
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
;

The first query simply joins the tables in the from clause order on a column we know will have 20 rows for each distinct value, so the result sets will grow from 2,000 rows to 40,000 rows to 800,000 rows. Looking at the second query we would like to think that when we force Oracle to use the join order t1 -> t3 -> t2 it would be able to use the existing predicates to generate the predicate “t3.col3 = t1.col1” and therefore be able to do the same amount of work as the first query (and, perhaps, manage to produce the same final cardinality estimate).

Here are the two plans, taken from an instance of 12.2.0.1:


=========================
Baseline - two hash joins
=========================

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   800K|   466M|    48  (38)| 00:00:01 |
|*  1 |  HASH JOIN          |      |   800K|   466M|    48  (38)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T3   |  2000 |   398K|    10   (0)| 00:00:01 |
|*  3 |   HASH JOIN         |      | 40000 |    15M|    21   (5)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   |  2000 |   398K|    10   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T2   |  2000 |   398K|    10   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."COL3"="T2"."COL2")
   3 - access("T2"."COL2"="T1"."COL1")

================================================
Force mismatch between predicates and join order
================================================

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |   800K|   466M| 16926   (3)| 00:00:01 |
|*  1 |  HASH JOIN            |      |   800K|   466M| 16926   (3)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | T2   |  2000 |   398K|    10   (0)| 00:00:01 |
|   3 |   MERGE JOIN CARTESIAN|      |  4000K|  1556M| 16835   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T1   |  2000 |   398K|    10   (0)| 00:00:01 |
|   5 |    BUFFER SORT        |      |  2000 |   398K| 16825   (2)| 00:00:01 |
|   6 |     TABLE ACCESS FULL | T3   |  2000 |   398K|     8   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."COL2"="T1"."COL1" AND "T3"."COL3"="T2"."COL2")

As you can see, there’s a dramatic difference between the two plans, and a huge difference in cost (though the predicted time for both is still no more than 1 second).

The first plan, where we leave Oracle to choose the join order, builds an in-memory hash table from t3, then joins t1 to t2 with a hash table and uses the result to join to t3 by probing the in-memory hash table.

The second plan, where we force Oracle to use a join order that (I am pretending) we believe to be better, results in Oracle doing a Cartesian merge join between t1 and t3 that explodes the intermediate result set up to 4 million rows (and the optimizer's estimate is correct) before eliminating a huge amount of redundant data.

As far as performance is concerned, the first query took 0.81 seconds to generate its result set, the second query took 8.81 seconds. In both cases CPU time was close to 100% of the total time.

As a follow-up demo I added the extra predicate “t3.col3 = t1.col1” to the second query, allowing the optimizer to use a hash join with the join order t1 -> t3 -> t2, and this brought the run time back down (with a slight increase due to the extra predicate check on the second join).
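
For reference, the follow-up query was just the hinted query with the one extra (logically redundant) predicate:


select
        /*+
                leading(t1 t3 t2)
        */
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
and     t3.col3 = t1.col1
;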

Summary

The choice of columns in join predicates may stop Oracle from choosing the best join order because it is not able to use transitive closure to generate all the extra predicates that the human eye can see. If you are using programs to generate SQL rather than writing SQL by hand you are more likely to see this limitation resulting in some execution plans being less efficient than they could be.

 


December 18, 2018

NULL predicate

Filed under: CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 1:13 pm BST Dec 18,2018

People ask me from time to time if I’m going to write another book on the Cost Based Optimizer – and I think the answer has to be no because the product keeps growing so fast it’s not possible to keep up and because there are always more and more little details that might have been around for years and finally show up when someone asks me a question about some little oddity I’ve never noticed before.

The difficulty with the "little oddities" is the amount of time you could spend trying to work out whether or not they matter and whether it's worth writing about them. Here's a little example to show what I mean – first the data set:


rem
rem     Script:         null_filter.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.1.0.2
rem

create table t1
nologging
as
select  *
from    all_objects
where   rownum <= 50000 -- > comment to avoid wordpress format issue
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
commit;

create index t1_i1 on t1(object_type, data_object_id, object_id, created);

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );
end;
/

It’s a simple data set with a single index. The only significant thing about the index is that the second column (data_object_id) is frequently null. This leads to a little quirk in the execution plans for a very similar pair of statements:


set serveroutput off
alter session set statistics_level = all;

select
        object_name, owner
from
        t1
where
        object_type = 'TABLE'
and     data_object_id = 20002
and     object_id = 20002
and     created > trunc(sysdate - 90)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select
        object_name, owner
from
        t1
where
        object_type = 'TABLE'
and     data_object_id is null
and     object_id = 20002
and     created > trunc(sysdate - 90)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

How much difference would you expect in the execution plans for these two queries? There is, of course, the side effect of the “is null” predicate disabling the “implicit column group” that is the index distinct_keys value, but in this case I’ve got a range-based predicate on one of the columns so Oracle won’t be using the distinct_keys anyway.

Of course there's the point that you can't use the equality operator with null, you have to use "is null" – and that might make a difference, but how? Here are the two execution plans:


----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.01 |       3 |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |      0 |00:00:00.01 |       3 |      1 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |      1 |      1 |      0 |00:00:00.01 |       3 |      1 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='TABLE' AND "DATA_OBJECT_ID"=20002 AND "OBJECT_ID"=20002 AND
              "CREATED">TRUNC(SYSDATE@!-90))

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.01 |       3 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |      0 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |      1 |      1 |      0 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='TABLE' AND "DATA_OBJECT_ID" IS NULL AND "OBJECT_ID"=20002 AND
              "CREATED">TRUNC(SYSDATE@!-90))
       filter(("OBJECT_ID"=20002 AND "CREATED">TRUNC(SYSDATE@!-90)))

The query with the predicate “data_object_id is null” repeats the object_id and created predicates as both access predicates and filter predicates. This seems a little surprising, and a potential performance threat. In the first query the run-time engine will hit the correct index leaf block in exactly the right place very efficiently and then walk along it, supplying every rowid to the parent operator until it hits the end of the range.

With the “is null” plan the run-time engine will be checking the actual value of object_id and created for every index entry on the way – how much extra CPU will this use and, more importantly, might Oracle start with the first index entry where object_type = ‘TABLE’ and data_object_id is null and walk through every index entry that has that null, checking for the correct object_id as it goes?

That last question is the reason for running the query with rowsource execution stats enabled. The first query did a single physical read while the second didn’t have to, but the more important detail is that both queries did the same number of buffer gets – and there is, by the way, a set of eight rows where the object_id and data_object_id are 20,002, but they were created several years ago so the index range scan returns no rows in either case.

Based on that comparison, how do we show that Oracle has not walked all the way from the first index entry where object_type = ‘TABLE’ and data_object_id is null checking every entry on the way or, to put it another way, has Oracle really managed to prune down the index range scan to the minimum “wedge” indicated by the presence of the predicates “OBJECT_ID”=20002 AND “CREATED”>TRUNC(SYSDATE@!-90) as access predicates?

Let’s just count the number of leaf blocks that might be relevant, using the sys_op_lbid() function (last seen here) that Oracle uses internally to count the number of leaf blocks in an index. First we get the index object_id, then we scan it to see how many leaf blocks hold entries that match our object_type and data_object_id predicates but appear in the index before our target value of 20,002:


column object_id new_value m_index_id

select
        object_id
from
        user_objects
where
        object_type = 'INDEX'
and     object_name = 'T1_I1'
;

select  distinct sys_op_lbid(&m_index_id, 'L', rowid)
from    t1
where   object_type    = 'TABLE'
and     data_object_id is null
and     object_id      < 20002
;


SYS_OP_LBID(159271
------------------
AAAm4nAAFAAACGDAAA
AAAm4nAAFAAACF9AAA
AAAm4nAAFAAACGCAAA
AAAm4nAAFAAACF/AAA
AAAm4nAAFAAACF+AAA
AAAm4nAAFAAACGFAAA
AAAm4nAAFAAACGEAAA
AAAm4nAAFAAACGGAAA

8 rows selected.


This tells us that there are 8 leaf blocks in the index that we would have to range through before we found object_id 20,002 and we would have seen 8 buffer gets, not 3 in the rowsource execution stats, if Oracle had not actually been clever with its access predicates and narrowed down the wedge of the index it was probing.
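As a final sanity check – this query is my addition, following the same sys_op_lbid() pattern, not part of the original test – you could also count the leaf blocks in the whole of the (‘TABLE’, null) section of the index; that’s the volume Oracle would have had to walk if it hadn’t been able to use the object_id and created predicates as access predicates:

select  /*+ index(t1 t1_i1) */
        count(distinct sys_op_lbid(&m_index_id, 'L', rowid))    null_section_leaf_blocks
from    t1
where   object_type    = 'TABLE'
and     data_object_id is null
;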

Bottom line: for a multi-column index there seems to be a difference in execution plans between “column is null” and “column = constant” when the column is one of the earlier columns in the index – but even though the “is null” option results in some access predicates re-appearing as filter predicates in the index range scan the extra workload is probably not significant – Oracle still uses the minimum number of index leaf blocks in the index range scan.

Update (May 2019)

Randolf Geist has written a blog note that extends this pattern to something that introduces a much bigger threat of a performance issue even though the SQL is only marginally more complex than the example above.

December 14, 2018

Extreme Nulls

Filed under: CBO,extended stats,Oracle,Statistics — Jonathan Lewis @ 7:01 pm BST Dec 14,2018

This note is a variant of a note that I wrote a few months ago about the impact of nulls on column groups. The effect showed up recently on a client site with a little camouflage that confused the issue for a little while, so I thought it would be worth a repeat.  We’ll start with a script to generate some test data:

rem
rem     Script:         pt_hash_cbo_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.1.0.1
rem             12.1.0.2
rem

create table t1 (
        hash_col,
        rare_col,
        n1,
        padding
)
nologging
partition by hash (hash_col)
partitions 32
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        mod(rownum,128),
        case when mod(rownum,1021) = 0 
                then rownum + trunc(dbms_random.value(-256, 256))
        end case,
        rownum,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1048576 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(hash_col, rare_col) nologging
local compress 1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                granularity => 'ALL',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve got a hash-partitioned table with 32 partitions; the partitioning key is called hash_col, and there is another column called rare_col that is almost always null – roughly 1 row in every 1,000 holds a value. I’ve added a local index on (hash_col, rare_col), compressing the leading column since hash_col is very repetitive, and gathered stats on the partitions and table. Here’s a view of the data for a single value of hash_col, and a summary report of the whole data set:

select  
        hash_col, rare_col, count(*)
from
        t1
where
        hash_col = 63
group by
        hash_col, rare_col
order by
        hash_col, rare_col
;

  HASH_COL   RARE_COL   COUNT(*)
---------- ---------- ----------
        63     109217          1
        63     240051          1
        63     370542          1
        63     501488          1
        63     631861          1
        63     762876          1
        63     893249          1
        63    1023869          1
        63                  8184

9 rows selected.

select
        count(*), ct
from    (
        select
                hash_col, rare_col, count(*) ct
        from
                t1
        group by
                hash_col, rare_col
        order by
                hash_col, rare_col
        )
group by ct
order by count(*)
;

  COUNT(*)         CT
---------- ----------
         3       8183
       125       8184
      1027          1

Given the way I’ve generated the data, any one value of hash_col will have 8,184 (or 8,183) rows where rare_col is null; but there are 1,027 rows which have a value for both hash_col and rare_col, with just one row for each combination.

Now we get to the problem. Whenever rare_col is non null the combination of hash_col and rare_col is unique (though this wasn’t quite the case at the client site) so when we query for a given hash_col and rare_col we would hope that the optimizer would be able to estimate a cardinality of one row; but this is what we see:


variable n1 number
variable n2 number

explain plan for
select /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

========================================

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   908 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

The optimizer has predicted a massive 908 rows. A quick check of the object stats shows us that this is “number of rows in table” / “number of distinct keys in index” (1,048,576 / 1,155, rounded up).
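If you want to check those numbers for yourself, here’s a quick sketch of the sort of dictionary queries I mean (my addition, not part of the original script):

select
        t.num_rows, i.distinct_keys,
        ceil(t.num_rows / i.distinct_keys)      rows_per_key    -- 1,048,576 / 1,155 = 908 (rounded up)
from
        user_tables     t,
        user_indexes    i
where
        t.table_name = 'T1'
and     i.index_name = 'T1_I1'
;

select
        column_name, num_distinct, num_nulls
from
        user_tab_cols
where
        table_name  = 'T1'
and     column_name in ('HASH_COL','RARE_COL')
;

The second query is a reminder of the detail that is about to matter – the very large num_nulls figure for rare_col.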

Any row with rare_col set to null cannot match the predicate “rare_col = :n2”, but because the optimizer is looking at the statistics of complete index entries (and there are 1048576 of them, with 1155 distinct combinations, and none that are completely null) it has lost sight of the frequency of nulls for rare_col on its own. (The same problem appears with column groups – which is what I commented on in my previous post on this topic).

I’ve often said in the past that you shouldn’t create histograms on data unless your code is going to use them. In this case I need to stop the optimizer from looking at the index.distinct_keys and one way to do that is to create a histogram on one of the columns that defines the index; and I’ve chosen to do this with a fairly arbitrary size of 10 buckets:


execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns rare_col size 10')

explain plan for
select /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

========================================

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |     1 |    12 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

Bonus observation

This problem came to my attention (and I’ve used a partitioned table in my demonstration) because I had noticed an obvious optimizer error in the client’s execution plan for a query exactly as simple as this one. I can demonstrate the effect the client saw by running the test again without creating the histogram but declaring hash_col to be not null. Immediately after creating the index I’m going to add the line:


alter table t1 modify hash_col not null;

(The client’s system didn’t declare the column not null, but their equivalent of hash_col was part of the primary key of the table which meant it was implicitly declared not null). Here’s what my execution plan looked like with this constraint in place:


--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |    28 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

Spot the difference.

The estimate of index rowids is far smaller than the estimate of the rows that will be fetched using those rowids. This is clearly an error.

If you’re wondering how Oracle got this number divide 908 by 32 (the number of partitions in the table) – the answer is 28.375.
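Just to make the arithmetic explicit (the numbers are copied straight from the text above):

select
        ceil(1048576/1155)      table_level_estimate,   -- 908
        908/32                  index_level_estimate    -- 28.375, displayed as 28
from    dual
;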

Fortunately it’s (probably) an error that doesn’t matter despite looking worryingly wrong. Critically the division hasn’t changed the estimate of the number of table rows (we’ll ignore the fact that the estimate is wrong anyway thanks to a different error), and the cost of the index range scan and table access have not changed. The error is purely cosmetic in effect.

Interestingly if you modify the query to be index-only (i.e. you restrict the select list to columns in the index) this extra division disappears.

Summary

1) If you have a B-tree index where one (or more) of the columns is null for a large fraction of the entries then the optimizer may over-estimate the cardinality of a predicate of the form: “(list of all index columns) = (list of values)” as it will be using the index.distinct_keys in its calculations and ignore the effects of nulls in the individual columns. If you need to work around this issue then creating a histogram on one of the index columns will be sufficient to switch Oracle back to the strategy of multiplying the individual column selectivities.

2) There are cases of plans for accessing partitioned tables where Oracle starts by using table-level statistics to get a suitable set of estimates but then displays a plan with the estimate of rows for an index range scan scaled down by the number of partitions in the table. This results in a visible inconsistency between the index estimate and the table estimate, but it doesn’t affect the cardinality estimate for the table access or either of the associated costs – so it probably doesn’t have a destabilising effect on the plan.

November 15, 2018

num_index_keys

Filed under: 12c,Bugs,CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:13 pm BST Nov 15,2018

The title is the name of an Oracle hint that came into existence in Oracle 10.2.0.3 and made an appearance recently in a question on the rarely used “My Oracle Support” Community forum (you’ll need a MOS account to be able to read the original). I wouldn’t have found it but the author also emailed me the link asking if I could take a look at it.  (If you want to ask me for help – without paying me, that is – then posting a public question in the Oracle (ODC) General Database or SQL forums and emailing me a private link is the strategy most likely to get an answer, by the way.)

The question was about a very simple query using a straightforward index – with a quirky change of plan after upgrading from 10.2.0.3 to 12.2.0.1. Setting the optimizer_features_enable to ‘10.2.0.3’ in the 12.2.0.1 system re-introduced the 10g execution plan. Here’s the query:


SELECT t1.*
   FROM   DW1.t1
  WHERE   t1.C1 = '0001' 
    AND   t1.C2 IN ('P', 'F', 'C')
    AND   t1.C3 IN (
                    '18110034450001',
                    '18110034450101',
                    '18110034450201',
                    '18110034450301',
                    '18110034450401',
                    '18110034450501'
          );
 

Information supplied: t1 holds about 500 million rows at roughly 20 rows per block, the primary key index is (c1, c2, c3, c4), there are just a few values for each of c1, c2 and c4, while c3 is “nearly unique” (which, for clarity, was expanded to “the number of distinct values of c3 is virtually the same as the number of rows in the table”).

At the moment we don’t have any information about histograms and we don’t know whether or not “nearly unique” might still allow a few values of c3 to have a large number of duplicates, so that’s something we might want to follow up on later.

Here are the execution plans – the fast one (from 10g) first, then the slow (12c) plan – and you should look carefully at the predicate section of the two plans:


10g (pulled from memory with rowsource execution statistics enabled)
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   1 |  INLIST ITERATOR             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |     18 |      5 |      6 |00:00:00.01 |      58 |      5 |
|*  3 |    INDEX RANGE SCAN          | PK_T1            |     18 |      5 |      6 |00:00:00.01 |      52 |      4 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P')) AND (("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR "C3"='18110034450401' OR
              "C3"='18110034450501')))

 

12c (from explain plan)
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     1 |   359 |     7   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                  |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1               |     1 |   359 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | PK_T1            |     1 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND ("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P'))
       filter("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR
              "C3"='18110034450401' OR "C3"='18110034450501')
  

When comparing plans it’s better, of course, to present the same sources from the two systems; it’s not entirely helpful to have the generated plan from explain plan for one version and a run-time plan with stats for the other – given the choice I’d like to see the run-time plan from both. Despite this, I felt fairly confident that the prediction would match the run-time for 12c and that I could at least guess the “starts” figure for 12c.

The important thing to notice is the way that the access predicate in 10g has split into an access predicate followed by a filter predicate in 12c. So 12c is going to iterate three times (once for each of the values  ‘C’, ‘F’, ‘P’) and then walk a potentially huge linked list of index leaf blocks looking for 6 values of c3, while 10g is going to probe the index 18 times (3 combinations of c2 x six combinations of c3) to find “nearly unique” rows which means probably one leaf block per probe.

The 12c plan was taking minutes to run, the 10g plan was taking less than a second. The difference in execution time was probably the effect of the 12c plan ranging through (literally) thousands of index leaf blocks.

There are many bugs and anomalies relating to in-list iteration and index range scans and cardinality calculations – here’s a quick sample of v$system_fix_control in 12.2.0.1:


select optimizer_feature_enable ofe, sql_feature, bugno, description
from v$system_fix_control
where
	optimizer_feature_enable between '10.2.0.4' and '12.2.0.1'
and	(   sql_feature like '%CBO%'
	 or sql_feature like '%CARDINALITY%'
	)
and	(    lower(description) like '%list%'
	 or  lower(description) like '%iterat%'
	 or  lower(description) like '%multi%col%'
	)
order by optimizer_feature_enable, sql_feature, bugno
;

OFE        SQL_FEATURE                      BUGNO DESCRIPTION
---------- --------------------------- ---------- ----------------------------------------------------------------
10.2.0.4   QKSFM_CBO_5259048              5259048 undo unused inlist
           QKSFM_CBO_5634346              5634346 Relax equality operator restrictions for multicolumn inlists

10.2.0.5   QKSFM_CBO_7148689              7148689 Allow fix of bug 2218788 for in-list predicates

11.1.0.6   QKSFM_CBO_5139520              5139520 kkoDMcos: For PWJ on list dimension, use part/subpart bits

11.2.0.1   QKSFM_CBO_6818410              6818410 eliminate redundant inlist predicates

11.2.0.2   QKSFM_CBO_9069046              9069046 amend histogram column tracking for multicolumn stats

11.2.0.3   QKSFM_CARDINALITY_11876260    11876260 use index filter inlists with extended statistics
           QKSFM_CBO_10134677            10134677 No selectivity for transitive inlist predicate from equijoin
           QKSFM_CBO_11834739            11834739 adjust NDV for list partition key column after pruning
           QKSFM_CBO_11853331            11853331 amend index cost compare with inlists as filters
           QKSFM_CBO_12591120            12591120 check inlist out-of-range values with extended statistics

11.2.0.4   QKSFM_CARDINALITY_12828479    12828479 use dynamic sampling cardinality for multi-column join key check
           QKSFM_CARDINALITY_12864791    12864791 adjust for NULLs once for multiple inequalities on nullable colu
           QKSFM_CARDINALITY_13362020    13362020 fix selectivity for skip scan filter with multi column stats
           QKSFM_CARDINALITY_14723910    14723910 limit multi column group selectivity due to NDV of inlist column
           QKSFM_CARDINALITY_6873091      6873091 trim histograms based on in-list predicates
           QKSFM_CBO_13850256            13850256 correct estimates for transitive inlist predicate with equijoin

12.2.0.1   QKSFM_CARDINALITY_19847091    19847091 selectivity caching for inlists
           QKSFM_CARDINALITY_22533539    22533539 multi-column join sanity checks for table functions
           QKSFM_CARDINALITY_23019286    23019286 Fix cdn estimation with multi column stats on fixed data types
           QKSFM_CARDINALITY_23102649    23102649 correction to inlist element counting with constant expressions
           QKSFM_CBO_17973658            17973658 allow partition pruning due to multi-inlist iterator
           QKSFM_CBO_21057343            21057343 order predicate list
           QKSFM_CBO_22272439            22272439 correction to inlist element counting with bind variables

There are also a number of system parameters relating to inlists that are new (or have changed values) in 12.2.0.1 when compared with 10.2.0.3 – but I’m not going to go into those right now.

I was sufficiently curious about this anomaly that I emailed the OP to say I would be happy to take a look at the 10053 trace files for the query – the files probably weren’t going to be very large given that it was only a single table query – but in the end it turned out that I solved the problem before he’d had time to email them. (Warning – don’t email me a 10053 file on spec; if I want one I’ll ask for it.)

Based on the description I created an initial model of the problem – it took about 10 minutes to code:


rem     Tested on 12.2.0.1, 18.3.0.1

drop table t1 purge;

create table t1 (
	c1 varchar2(4) not null,
	c2 varchar2(1) not null,
	c3 varchar2(15) not null,
	c4 varchar2(4)  not null,
	v1 varchar2(250)
)
;

insert into t1
with g as (
	select rownum id 
	from dual
	connect by level <= 1e4 -- > hint to avoid wordpress format issue
)
select
	'0001',
	chr(65 + mod(rownum,11)),
	'18110034'||lpad(1+100*rownum,7,'0'),
	lpad(mod(rownum,9),4,'0'),
	rpad('x',250,'x')
from
	g,g
where
        rownum <= 1e5 -- > hint to avoid wordpress format issue
;


create unique index t1_i1 on t1(c1, c2, c3, c4);

begin
        dbms_stats.gather_table_stats(
                null,
                't1',
                method_opt => 'for all columns size 1'
        );
end;
/

alter session set statistics_level = all;
set serveroutput off

prompt	==========================
prompt	Default optimizer features
prompt	==========================

select
        /*+ optimizer_features_enable('12.2.0.1') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

select 
        /*+ optimizer_features_enable('10.2.0.3') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

alter session set statistics_level = all;
set serveroutput off

The two queries produced the same plan – regardless of the setting for optimizer_features_enable – it was the plan originally used by the OP’s 10g setting:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    20 (100)|      0 |00:00:00.01 |      35 |
|   1 |  INLIST ITERATOR             |       |      1 |        |            |      0 |00:00:00.01 |      35 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     18 |      2 |    20   (0)|      0 |00:00:00.01 |      35 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     18 |      2 |    19   (0)|      0 |00:00:00.01 |      35 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')) AND
              (("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501')))

There was one important difference between the 10g and the 12c plans – in 10g the cost of the table access (hence the cost of the total query) was 20; in 12c it jumped to 28 – maybe there’s a change in the arithmetic for costing the iterator, and maybe that’s sufficient to cause a problem.

Before going further it’s worth checking what the costs would look like (and, indeed, if the plan is possible in both versions) if we force Oracle into the “bad” plan. That’s where we finally get to the hint in the title of this piece. If I add the hint /*+ num_index_keys(t1 t1_i1 2) */ what’s going to happen? (Technically I’ve included a hint to use the index, and specified the query block name to make sure Oracle doesn’t decide to switch to a tablescan):


select
        /*+
            optimizer_features_enable('12.2.0.1')
            index_rs_asc(@sel$1 t1@sel$1 (t1.c1 t1.c2 t1.c3 t1.c4))
            num_index_keys(@sel$1 t1@sel$1 t1_i1 2)
        */
        t1.*
FROM        t1
WHERE
        t1.c1 = '0001'
AND        t1.c2 in ('H', 'F', 'C')
AND        t1.c3 in (
                '18110034450001',
                '18110034450101',
                '18110034450201',
                '18110034450301',
                '18110034450401',
                '18110034450501'
        )
;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   150 (100)|      0 |00:00:00.01 |     154 |      1 |
|   1 |  INLIST ITERATOR                     |       |      1 |        |            |      0 |00:00:00.01 |     154 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      3 |     18 |   150   (2)|      0 |00:00:00.01 |     154 |      1 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |      3 |     18 |   142   (3)|      0 |00:00:00.01 |     154 |      1 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')))
       filter(("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501'))

This was the plan from 12.2.0.1 – and again the plan for 10.2.0.3 was identical except for costs which became 140 for the index range scan and 141 for the table access. At first sight it looks like 10g may be using the total selectivity of the entire query as the scaling factor for the index clustering_factor to find the table cost while 12c uses the cost of accessing the table for one iteration (rounding up) before multiplying by the number of iterations.

Having observed this detail I thought I’d do a quick test of what happened by default if I requested 145 distinct values of c3. Both versions defaulted to the access/filter path rather than the pure access path – but again there was a difference in costs. The 10g index cost was 140 with a table access cost of 158, while 12c had an index cost of 179 and a table cost of 372. So both versions switch plans at some point – do they switch at the same point? Reader, I could not resist temptation, so I ran a test loop. With my data set the 12c version switched paths at 61 values in the in-list and 10g switched at 53 values.
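For anyone who wants to repeat the experiment, here’s a rough sketch of the type of loop involved – it isn’t the code I actually ran, just an illustration of the idea: build an in-list of n values following the pattern of the generated c3 data, explain the query for each n, then check which statement_ids show c3 dropping out of the access predicates into the filter predicates on the index line.

delete from plan_table;
commit;

declare
        v_list  varchar2(32767);
begin
        for n in 2 .. 150 loop
                v_list := null;
                for i in 1 .. n loop
                        v_list := v_list ||
                                  case when i > 1 then ',' end ||
                                  '''18110034' || lpad(1 + 100*i, 7, '0') || '''';
                end loop;

                execute immediate
                        'explain plan set statement_id = ''LIST_' || lpad(n,3,'0') || ''' for ' ||
                        'select t1.* from t1 '                  ||
                        'where t1.c1 = ''0001'' '               ||
                        'and   t1.c2 in (''H'', ''F'', ''C'') ' ||
                        'and   t1.c3 in (' || v_list || ')';
        end loop;
end;
/

prompt  In-list sizes where c3 has become a filter predicate on the index line
prompt  =======================================================================

select  statement_id
from    plan_table
where   statement_id like 'LIST_%'
and     operation    = 'INDEX'
and     filter_predicates like '%C3%'
order by
        statement_id
;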

Conclusion: there’s been a change in the selectivity calculations for the use of in-list iterators, which leads to a change in costs, which can lead to a change in plans; the OP was just unlucky with his data set and stats. Possibly there’s something about his data or stats that makes the switch appear with a much smaller in-list than mine.

Footnote:

When I responded to the thread on MOSC with the suggestion that the problem was in part due to statistics and might be affected by out of date stats (or a histogram on the (low-frequency) c2 column) the OP noted that stats hadn’t been gathered since some time in August – and found that the 12c path changed to the efficient (10g) one after re-gathering stats on the table.

November 8, 2018

Where / Having

Filed under: CBO,Conditional SQL,Execution plans,Oracle — Jonathan Lewis @ 12:11 pm BST Nov 8,2018

There’s a very old mantra about the use of the “having” clause that tells us that if it’s valid (i.e. will always give the same results) then any predicate that could be moved from the having clause to the where clause should be moved. In recent versions of Oracle the optimizer will do this for itself in some cases but (for reasons that I’m not going to mention) I came across a silly example recently where a little manual editing produced a massive performance improvement.

Here’s a quick demo:


rem
rem     Script:         where_having.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

create table t1
as
select * 
from all_objects 
where rownum <= 50000   -- > comment to avoid WordPress format issue
;

spool where_having.lst

set serveroutput off

select /*+ gather_plan_statistics */ 
        object_type, count(*) 
from    t1 
group by 
        object_type 
having  count(*) > 0 
and     1 = 2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
;

The big question is: will Oracle do a full tablescan of t1, or will it apply a “null is not null” filter early to bypass that part of the plan. Here’s the plan pulled from memory, with run-time statistics (all versions from 11g to 18c):


--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      0 |00:00:00.02 |     957 |    955 |       |       |          |
|*  1 |  FILTER             |      |      1 |        |      0 |00:00:00.02 |     957 |    955 |       |       |          |
|   2 |   HASH GROUP BY     |      |      1 |      1 |     27 |00:00:00.02 |     957 |    955 |  1186K|  1186K| 1397K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |  50000 |  50000 |00:00:00.01 |     957 |    955 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((COUNT(*)>0 AND 1=2))


As you can see, the filter at operation 1 includes the contradiction “1=2”, but Oracle tests this only after doing the full tablescan and aggregation. If you move the “1=2” into the where clause the tablescan doesn’t happen.
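For completeness, here’s the manual rewrite I’m referring to – a trivial sketch rather than a reproduction of the original test script:

select /*+ gather_plan_statistics */
        object_type, count(*)
from    t1
where   1 = 2
group by
        object_type
having  count(*) > 0
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
;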

Interestingly, if you write the query with an in-line view and trailing where clause:


select /*+ gather_plan_statistics */
        *
from    (
        select
                object_type, count(*)
        from    t1
        group by
                object_type
        having  count(*) > 0
        )
where
        1 = 2
;

The optimizer is clever enough to push the final predicate inside the view (where you might expect it to become part of the having clause) and push it all the way down into a where clause on the base table.


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      0 |00:00:00.01 |
|*  1 |  FILTER              |      |      1 |        |      0 |00:00:00.01 |
|   2 |   HASH GROUP BY      |      |      1 |      1 |      0 |00:00:00.01 |
|*  3 |    FILTER            |      |      1 |        |      0 |00:00:00.01 |
|   4 |     TABLE ACCESS FULL| T1   |      0 |  50000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(COUNT(*)>0)
   3 - filter(NULL IS NOT NULL)



A quirky case of the optimizer handling the (apparently) more complex query better than it handles the simpler one.

November 1, 2018

Join Cardinality – 5

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:34 pm BST Nov 1,2018

So far in this series I’ve written about the way that the optimizer estimates cardinality for an equijoin where one end of the join has a frequency histogram and the other end has one of the other types of histogram covered in the earlier instalments.

It’s now time to look at a join where the other end has a height-balanced histogram. Arguably it’s not sensible to spend time writing about this since you shouldn’t be creating them in 12c (depending, instead, on the hybrid histogram that goes with the auto_sample_size), and the arithmetic is different in 11g. However, there still seem to be plenty of people running 12c but not using the auto_sample_size and that means they could be generating some height-balanced histograms – so let’s generate some data and see what happens.


rem
rem     Script:         freq_hist_join_04a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4        Different results
rem

drop table t2 purge;
drop table t1 purge;

set linesize 156
set trimspool on
set pagesize 60

set feedback off

execute dbms_random.seed(0)

create table t1(
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)      
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))   j2      
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

prompt  ==========================================================
prompt  Using estimate_percent => 100 to get height-balanced in t2
prompt  ==========================================================

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                estimate_percent => 100,
                method_opt       => 'for all columns size 1 for columns j2 size 20'
        );
end;
/

As in earlier examples I’ve created some empty tables, then inserted randomly generated data (after calling the dbms_random.seed(0) function to make the data reproducible). Then I’ve gathered stats, knowing that there will be 22 distinct values in t2 so forcing a height-balanced histogram of 20 buckets to appear.

When we try to calculate the join cardinality we’re going to need various details from the histogram information, such as bucket sizes, number of distinct values, and so on, so in the next few queries to display the histogram information I’ve captured a few values into SQL*Plus variables. Here’s the basic information about the histograms on the join columns t1.j1 and t2.j2:


column num_distinct new_value m_t2_distinct
column num_rows     new_value m_t2_rows
column num_buckets  new_value m_t2_buckets
column bucket_size  new_value m_t2_bucket_size

select  table_name, column_name, histogram, num_distinct, num_buckets, density
from    user_tab_cols
where   table_name in ('T1','T2')
and     column_name in ('J1','J2')
order by
        table_name
;

select  table_name, num_rows, decode(table_name, 'T2', num_rows/&m_t2_buckets, null) bucket_size
from    user_tables
where   table_name in ('T1','T2')
order by
        table_name
;

column table_name format a3 heading "Tab"
break on table_name skip 1 on report skip 1

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
)
select f1.* from f1
union all
select f2.* from f2
order by 1,2
;


Tab                  COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS    DENSITY
-------------------- -------------------- --------------- ------------ ----------- ----------
T1                   J1                   FREQUENCY                 10          10       .005
T2                   J2                   HEIGHT BALANCED           22          20 .052652266

Tab                    NUM_ROWS BUCKET_SIZE
-------------------- ---------- -----------
T1                          100
T2                          800          40

Tab      VALUE ROW_OR_BUCKET_COUNT ENDPOINT_NUMBER
--- ---------- ------------------- ---------------
T1           2                   5               5
             5                  15              20
             7                  15              35
            10                  17              52
            12                  13              65
            15                  13              78
            17                  11              89
            20                   7              96
            22                   3              99
            25                   1             100

T2           1                   0               0
            14                   1               1
            17                   1               2
            18                   1               3
            19                   1               4
            20                   1               5
            21                   2               7
            22                   1               8
            23                   1               9
            24                   2              11
            25                   2              13
            26                   3              16
            27                   2              18
            28                   2              20

As you can see, there is a frequency histogram on t1 reporting a cumulative total of 100 rows; and the histogram on t2 is a height-balanced histogram of 20 buckets, showing 21, 24, 25, 26, 27 and 28 as popular values with 2,2,2,2,3 and 2 endpoints (i.e. buckets) respectively. You’ll also note that the t2 histogram has 21 rows with row/bucket 0 showing us the minimum value in the column and letting us know that bucket 1 is not exclusively full of the value 14. (If 14 had been the minimum value for the column as well as an end point Oracle would not have created a bucket 0 – that may be a little detail that isn’t well-known – and will be the subject of a little follow-up blog note.)

Let’s modify the code to join the two sets of histogram data on data value – using a full outer join so we don’t lose any data, but restricting ourselves to values where the histograms overlap. We’re going to follow the idea we’ve developed in earlier postings and multiply frequencies together to derive a join frequency, so we’ll start with a simple full outer join and assume that when we find a real match value we should treat the height-balanced (t2) buckets with a bucket count of 2 or greater as completely full buckets belonging to popular values.

I’ve also included in this query (because it had a convenient full outer join) a column selection that counts how many rows there are in t1 with values that fall inside the range of the t2 histogram but don’t match a popular value in t2.


column unmatch_ct   new_value m_unmatch_ct
column product format 999,999.99

break on report skip 1
compute sum of product on report

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
),
join1 as (
select
        f1.value t1_value, 
        f2.value t2_value, 
        f1.frequency t1_frequency, 
        f2.frequency t2_frequency, 
        sum(
                case
                        when f2.frequency > 1 and f1.frequency is not null
                                then 0
                                else f1.frequency
                end
        ) over()        unmatch_ct,
        f2.frequency * &m_t2_bucket_size *
        case
                when f2.frequency > 1 and f1.frequency is not null
                        then f1.frequency
        end     product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
--      coalesce(f1.value, f2.value) between &m_low and &m_high
order by
        coalesce(f1.value, f2.value)
)
select  *
from    join1
;

  T1_VALUE   T2_VALUE T1_FREQUENCY T2_FREQUENCY UNMATCH_CT     PRODUCT
---------- ---------- ------------ ------------ ---------- -----------
         2                       5                      99
         5                      15                      99
         7                      15                      99
        10                      17                      99
        12                      13                      99
                   14                         1         99
        15                      13                      99
        17         17           11            1         99
                   18                         1         99
                   19                         1         99
        20         20            7            1         99
                   21                         2         99
        22         22            3            1         99
                   23                         1         99
                   24                         2         99
        25         25            1            2         99       80.00
                                                           -----------
sum                                                              80.00


We captured the bucket size (&m_t2_bucket_size) for the t2 histogram as 40 in the earlier SQL, and we can see now that in the overlap range (which I’ve hard coded as 2 – 25) we have three histogram entries that identify popular values (21, 24 and 25), but only one of them corresponds to a value in the frequency histogram on t1, so the Product column shows a value of 1 * 2 * 40 = 80. Unfortunately this is a long way off the prediction that the optimizer is going to make for the simple join. (Eventually we’ll see it’s 1,893 so we have a lot more rows to estimate for).

Our code so far only accounts for items that are popular in both tables. Previous experience tells us that when a popular value exists only at one end of the join predicate we need to derive a contribution to the total prediction through an “average selectivity” calculated for the other end of the join predicate. For frequency histograms we’ve seen that “half the number of the least frequently occurring value” seems to be the appropriate frequency estimate, and if we add that in we’ll get two more contributions to the total from the values 21 and 24, which appear in the height-balanced (t2) histogram as popular but don’t appear in the frequency (t1) histogram. Since the lowest frequency in t1 is 1 this would give us two contributions of 0.5 * 2 (buckets) * 40 (bucket size), viz: two contributions of 40, bringing our total to 160 – still a serious shortfall from Oracle’s prediction. So we need to work out how Oracle generates an “average frequency” for the non-popular values of t2 and then apply it to the 99 rows in t1 that haven’t yet been accounted for in the output above.
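As a quick sanity check of the running total so far (all the numbers come from the paragraphs above):

select
        1 * 2 * 40                              popular_both_ends,      -- value 25
        2 * (0.5 * 2 * 40)                      popular_t2_only,        -- values 21 and 24
        1 * 2 * 40 + 2 * (0.5 * 2 * 40)         running_total           -- 160
from    dual
;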

To calculate the “average selectivity” of a non-popular row in t2 I need a few numbers (some of which I’ve already acquired above): the total number of rows in the table (NR), the number of distinct values (NDV), and the number of popular values (NPV), from which we can derive the number of distinct non-popular values and the number of rows for the non-popular values. The model that Oracle uses to derive these numbers is simply to assume that a value is popular if its frequency in the histogram is greater than one, and that the number of rows for a popular value is “frequency * bucket size”.
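Here’s a little sketch (my addition) that applies that model directly to the t2 histogram – the 40 is the bucket size we captured earlier, hard-coded here for brevity:

select
        endpoint_value          popular_value,
        bucket_ct,
        bucket_ct * 40          implied_rows
from    (
        select
                endpoint_value,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) bucket_ct
        from
                user_tab_histograms
        where
                table_name  = 'T2'
        and     column_name = 'J2'
        )
where
        bucket_ct > 1
order by
        endpoint_value
;

The output tallies with the six popular values and 13 buckets quoted in the next paragraph.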

The first query we ran against the t2 histogram showed 6 popular values, accounting for 13 buckets of 40 rows each. We reported 22 distinct values for the column and 800 rows for the table so the optimizer assumes the non-popular values account for (22 – 6) = 16 distinct values and (800 – 13 * 40) = 280 rows. So the selectivity of non-popular values is (280/800) * (1/16) = 0.021875. This needs to be multiplied by the 99 rows in t1 which don’t match a popular value in t2 – so we now need to write some SQL to derive that number.
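Before deriving those values properly, here’s a quick check of the arithmetic with the constants plugged in:

select
        (800 - 13*40) / (800 * (22 - 6))                non_popular_selectivity,        -- 0.021875
        800 * (800 - 13*40) / (800 * (22 - 6))          avg_non_popular_rows,           -- 17.5
        99 * 800 * (800 - 13*40) / (800 * (22 - 6))     unmatched_contribution          -- 1,732.5
from    dual
;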

We could enhance our earlier full outer join and slot 0.5, 99, and 0.021875 into it as “magic” constants. Rather than do that though I’m going to write a couple of messy queries to derive the values (and the low/high range we’re interested in) so that I will be able to tweak the data later on and see if the formula still produces the right answer.


column range_low    new_value m_low
column range_high   new_value m_high
column avg_t1_freq  new_value m_avg_t1_freq
column new_density  new_value m_avg_t2_dens

with f1 as (
        select  endpoint_value ep_val,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
        from    user_tab_histograms
        where   table_name  = 'T1'
        and     column_name = 'J1'
),
f2 as (
        select  endpoint_value ep_val,
                endpoint_number ep_num,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
        from    user_tab_histograms
        where   table_name  = 'T2'
        and     column_name = 'J2'
)
select
        max(min_v) range_low, min(max_v) range_high, min(min_f)/2 avg_t1_freq, max(new_density) new_density
from    (
        select  min(ep_val) min_v, max(ep_val) max_v, min(frequency) min_f, to_number(null) new_density
        from f1
        union all
        select  min(ep_val) min_v, max(ep_val) max_v, null           min_f,
                (max(ep_num) - sum(case when frequency > 1 then frequency end)) /
                (
                        max(ep_num) *
                        (&m_t2_distinct - count(case when frequency > 1 then 1 end))
                )       new_density
        from    f2
        )
;

 RANGE_LOW RANGE_HIGH AVG_T1_FREQ NEW_DENSITY
---------- ---------- ----------- -----------
         2         25          .5     .021875


This query finds the overlap by querying the two histograms and reporting the higher of the two low values and the lower of the two high values. It also reports the minimum frequency from the frequency histogram divided by 2, and calculates the number of non-popular rows divided by the total number of rows and by the number of distinct non-popular values. (Note that I’ve picked up the number of distinct values in t2.j2 as a substitution variable generated by one of my earlier queries.) In my full script this messy piece of code runs before the query I showed earlier that told us how well (or badly) the two histograms matched.

Finally we can use the various values we’ve picked up in a slightly more complex version of the full outer join – with a special row added through a union all to give us our estimate:


break on report skip 1
compute sum of product on report

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
),
join1 as (
select
        f1.value t1_value, f2.value t2_value,
        f1.frequency t1_frequency, f2.frequency t2_frequency,
        f2.frequency *
        case
                when f2.frequency > 1 and f1.frequency is not null
                        then f1.frequency
                when f2.frequency > 1 and f1.frequency is null
                        then &m_avg_t1_freq
        end *
        &m_t2_bucket_size        product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between &m_low and &m_high
order by
        coalesce(f1.value, f2.value)
)
select  *
from    join1
union all
select
        null,
        &m_avg_t2_dens,
        &m_unmatch_ct,
        &m_t2_rows * &m_avg_t2_dens,
        &m_t2_rows * &m_avg_t2_dens * &m_unmatch_ct
from
        dual
;


  T1_VALUE   T2_VALUE T1_FREQUENCY T2_FREQUENCY     PRODUCT
---------- ---------- ------------ ------------ -----------
         2                       5
         5                      15
         7                      15
        10                      17
        12                      13
                   14                         1
        15                      13
        17         17           11            1
                   18                         1
                   19                         1
        20         20            7            1
                   21                         2       40.00
        22         22            3            1
                   23                         1
                   24                         2       40.00
        25         25            1            2       80.00
              .021875           99         17.5    1,732.50
                                                -----------
sum                                                1,892.50


It remains only to check what the optimizer thinks the cardinality will be on a simple join, and then modify the data slightly to see if the string of queries continues to produce the right result. Here’s a starting test:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';
alter session set tracefile_identifier='BASELINE';

select
        count(*)
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';


 COUNT(*)
----------
      1327


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      41 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1893 |   1327 |00:00:00.01 |      41 |  2545K|  2545K| 1367K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       7 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."J1"="T2"."J2")

The E-Rows figure for the hash join operation is 1893 – and a quick check of the 10053 trace file shows that this is 1892.500000 rounded: a perfect match for the result from my query. I’ve modified the data in various ways – notably updating the t1 table to change the value 25 (i.e. the current maximum value of j1) to other, lower, values – and the algorithm in the script seems to be sound for 12c and 18c. I won’t be surprised, however, if someone comes up with a data pattern where the wrong estimate appears.
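
The figure in the 10053 trace appears on the line starting “Join Card”, and since the test sets tracefile_identifier to BASELINE the file is easy to spot in the trace directory. As a sketch (not part of the original script) you can also ask the session for the name of its own trace file before searching it for that line:


select  value
from    v$diag_info
where   name = 'Default Trace File'
;
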

Don’t look back

Upgrades are a pain. With the same data set and same statistics on 11.2.0.4, running the same join query between t1 and t2, here’s the execution plan I got:


SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      12 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      12 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1855 |   1327 |00:00:00.01 |      12 |  2440K|  2440K| 1357K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       6 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

Notice that the E-Rows value is different: the join cardinality algorithm seems to have changed in the upgrade from 11.2.0.4 to 12c. I haven’t quite figured out how to get to the 11g result, but I seem to get quite close most of the time by making a simple change to the final query that I used to predict the optimizer’s estimate. In the case expression that chooses between the actual t1.j1 frequency and the “average frequency”, don’t choose – just use the latter:


        case
                when f2.frequency > 1 and f1.frequency is not null
                        -- then f1.frequency    -- 12c
                        then &m_avg_t1_freq     -- 11g
                when f2.frequency > 1 and f1.frequency is null
                        then &m_avg_t1_freq
        end *
 

As I modified the t1 row holding the value 25 to hold other values, this change kept producing results that were exactly 2, 2.5, or 3.0 away from the execution plan E-Rows – except in one case where the error was exactly 15.5 (which looks suspiciously like 17.5, the “average frequency in t2”, minus 2). I’m not keen to spend time working out exactly what’s going on, but the takeaway is that anyone upgrading from 11g to 12c may find that some of their queries change plans simply because they happen to match the type of example I’ve been working with in this post.

In some email I exchanged with Chinar Aliyev he suggested three fix-controls that might be relevant. I’ve added these to an earlier posting I did when I first hit the anomaly a few days ago, but I’ll repeat them here; I will be testing their effects at some point in the not too distant future:

14033181 1 QKSFM_CARDINALITY_14033181   correct ndv for non-popular values in join cardinality comp.         (12.1.0.1)
19230097 1 QKSFM_CARDINALITY_19230097   correct join card when popular value compared to non popular         (12.2.0.1)
22159570 1 QKSFM_CARDINALITY_22159570   correct non-popular region cardinality for hybrid histogram          (12.2.0.1)
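
I haven’t tested them yet, but the usual way to experiment with fix-controls at the session level is through the “_fix_control” parameter, and v$session_fix_control reports the values currently in force. A sketch only – and, as ever with hidden parameters, not something to change on a production system without guidance from Oracle support:


rem     Check the current settings of the three fix-controls

select  bugno, value, description
from    v$session_fix_control
where   bugno in (14033181, 19230097, 22159570)
order by
        bugno
;

rem     Disable one of them for the current session to see the
rem     effect on the join cardinality estimate

alter session set "_fix_control" = '22159570:0';
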

October 25, 2018

Join Cardinality – 4

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 9:09 am BST Oct 25,2018

In previous installments of this series I’ve been describing how Oracle estimates the join cardinality for single column joins with equality where the columns have histograms defined. So far I’ve  covered two options for the types of histogram involved: frequency to frequency, and frequency to top-frequency. Today it’s time to examine frequency to hybrid.

My first thought about this combination was that it was likely to be very similar to frequency to top-frequency because a hybrid histogram has a list of values with “repeat counts” (which is rather like a simple frequency histogram), and a set of buckets with variable sizes that could allow us to work out an “average selectivity” of the rest of the data.

I was nearly right, but the arithmetic didn’t quite work out the way I expected. Fortunately Chinar Aliyev’s document highlighted my error – the optimizer doesn’t use all the repeat counts; it uses only those repeat counts that identify popular values, and a popular value is one whose endpoint_repeat_count is not less than the average number of rows per bucket. Let’s work through an example – first the data (which repeats an earlier article, but is included here for ease of reference):

rem
rem     Script:         freq_hist_join_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

set linesize 156
set pagesize 60
set trimspool on

execute dbms_random.seed(0)

create table t1 (
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)
)
;

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 13'
        );
end;
/

As before I’ve got a table with 100 rows using the sqrt() function to generate column j1, and a table with 800 rows using the dbms_random.normal function to generate column j2. So the two columns have skewed patterns of data distribution, with a small number of low values and larger numbers of higher values – but the two patterns are different.

I’ve generated a histogram with 254 buckets for the t1.j1 column – which dropped to 10 buckets because j1 has only 10 distinct values, so it’s actually a frequency histogram – and a histogram with 13 buckets for the t2.j2 column, as I knew (after a little trial and error) that this would give me a hybrid histogram.
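
If you want to confirm that the two gather calls really have produced a frequency histogram on t1.j1 and a hybrid histogram on t2.j2, a quick query against user_tab_cols (a sketch, not part of the original script) will show the histogram types along with the bucket and distinct-value counts:


select
        table_name, column_name, num_distinct, num_buckets, histogram
from
        user_tab_cols
where
        table_name  in ('T1', 'T2')
and     column_name in ('J1', 'J2')
order by
        table_name, column_name
;
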

Here’s a simple query, with its result set, to report the two histograms – using a full outer join to line up matching values and show the gaps where (endpoint) values in one histogram do not appear in the other:


define m_popular = 62

break on report skip 1

compute sum of product on report
compute sum of product_rp on report

compute sum of t1_count on report
compute sum of t2_count on report
compute sum of t2_repeats on report
compute sum of t2_pop_count on report

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        to_number(null) pop_count
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
order by
        endpoint_value
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        case when endpoint_repeat_count >= &m_popular
                        then endpoint_repeat_count
                        else null
        end     pop_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
)
select
        f1.value t1_value,
        f2.value t2_value,
        f1.row_or_bucket_count t1_count,
        f2.row_or_bucket_count t2_count,
        f1.endpoint_repeat_count t1_repeats,
        f2.endpoint_repeat_count t2_repeats,
        f2.pop_count t2_pop_count
from
        f1
full outer join
        f2
on
        f2.value = f1.value
order by
        coalesce(f1.value, f2.value)
;


  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT T1_REPEATS T2_REPEATS T2_POP_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ------------
                    1                     1                     1
         2                     5                     0
         5                    15                     0
         7                    15                     0
        10                    17                     0
        12                    13                     0
        15         15         13         55          0         11
        17         17         11         56          0         34
                   19                    67                    36
        20         20          7         57          0         57
                   21                    44                    44
        22         22          3         45          0         45
                   23                    72                    72           72
                   24                    70                    70           70
        25         25          1         87          0         87           87
                   26                   109                   109          109
                   27                    96                    96           96
                   28                    41                    41
---------- ---------- ---------- ----------            ---------- ------------
                             100        800                   703          434

You’ll notice that there’s a substitution variable (m_popular) in this script that I use to identify the “popular values” in the hybrid histogram so that I can report them separately. I’ve set this value to 62 for this example because a quick check of user_tables and user_tab_cols tells me I have 800 rows in the table (user_tables.num_rows) and 13 buckets (user_tab_cols.num_buckets) in the histogram: 800/13 = 61.54. A value is popular only if its repeat count is 62 or more.
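
Rather than doing the division by hand you could derive the threshold with a simple query – a sketch, not part of the original script:


select
        t.num_rows,
        c.num_buckets,
        t.num_rows / c.num_buckets      popularity_threshold
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name  = 'T2'
and     c.table_name  = 'T2'
and     c.column_name = 'J2'
;

Since repeat counts are integers, “not less than the threshold” translates into a repeat count of 62 or more for this data set.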

This is where you may hit a problem – I certainly did when I switched from testing 18c to testing 12c (which I just knew was going to work – but I tested anyway). Although my data has been engineered so that I get the same “random” data in both versions of Oracle, I got different hybrid histograms (hence my complaint in a recent post). The rest of this note covers 18c in detail, but if you’re running 12c there are a couple of defined values that you can change to get the right results.

At this point I need to “top and tail” the output because the arithmetic only applies where the histograms overlap, so I need to pick the range from 2 to 25. Then I need to inject a “representative” or “average” count/frequency into all the gaps, then cross-multiply. The average frequency for the frequency histogram is “half the frequency of the least frequently occurring value” (which seems to be identical to new_density * num_rows), and the representative frequency for the hybrid histogram is (“number of non-popular rows” / “number of non-popular values”). There are 800 rows in the table with 22 distinct values in the column, and the output above shows us that we have 5 popular values totalling 434 rows, so the average frequency is (800 – 434) / (22 – 5) = 21.5294. (Alternatively we could say that the average selectivities – which is what I’ve used in the next query – are 0.5/100 and 21.5294/800.)

[Note for 12c, you’ll get 4 popular values covering 338 rows, so your figures will be: (800 – 338) / (22 – 4) = 25.6666… and 0.0320833]
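
You don’t have to do this arithmetic by hand either. Here’s a sketch (again, not part of the original script) that reuses the &m_popular threshold to report the average non-popular frequency and the matching selectivity straight from the data dictionary – for the 18c data it should report 21.5294… and 0.0269118:


select
        (t.num_rows - p.pop_rows) / (c.num_distinct - p.pop_count)                avg_nonpop_freq,
        (t.num_rows - p.pop_rows) / ((c.num_distinct - p.pop_count) * t.num_rows) avg_nonpop_sel
from
        user_tables     t,
        user_tab_cols   c,
        (
        select
                sum(endpoint_repeat_count)      pop_rows,       -- rows covered by popular values
                count(*)                        pop_count       -- number of popular values
        from    user_tab_histograms
        where   table_name  = 'T2'
        and     column_name = 'J2'
        and     endpoint_repeat_count >= &m_popular
        )               p
where
        t.table_name  = 'T2'
and     c.table_name  = 'T2'
and     c.column_name = 'J2'
;
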

So here’s a query that restricts the output to the rows we want from the histograms, discards a couple of columns, and does the arithmetic:


define m_t2_sel = 0.0320833
define m_t2_sel = 0.0269118
define m_t1_sel = 0.005

break on table_name skip 1 on report skip 1

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        to_number(null) pop_count
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
order by
        endpoint_value
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        case when endpoint_repeat_count >= &m_popular
                        then endpoint_repeat_count
                        else null
        end     pop_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
)
select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,100 * &m_t1_sel) t1_count,
        nvl(f2.pop_count,          800 * &m_t2_sel) t2_count,
        case when (   f1.row_or_bucket_count is not null
                   or f2.pop_count is not null
        )    then
                nvl(f1.row_or_bucket_count,100 * &m_t1_sel) *
                nvl(f2.pop_count,          800 * &m_t2_sel)
        end      product_rp
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;


 F1_VALUE   F2_VALUE   T1_COUNT   T2_COUNT PRODUCT_RP
---------- ---------- ---------- ---------- ----------
         2                     5   21.52944   107.6472
         5                    15   21.52944   322.9416
         7                    15   21.52944   322.9416
        10                    17   21.52944  366.00048
        12                    13   21.52944  279.88272
        15         15         13   21.52944  279.88272
        17         17         11   21.52944  236.82384
                   19         .5   21.52944
        20         20          7   21.52944  150.70608
                   21         .5   21.52944
        22         22          3   21.52944   64.58832
                   23         .5         72         36
                   24         .5         70         35
        25         25          1         87         87
                      ---------- ---------- ----------
sum                          102  465.82384 2289.41456

There’s an important detail that I haven’t mentioned so far. In the output above you can see that some rows show “product_rp” as blank. While we cross multiply the frequencies from t1.j1 and t2.j2, filling in average frequencies where necessary, we exclude from the final result any rows where average frequencies have been used for both histograms.

[Note for 12c, you’ll get the result 2698.99736 for the query, and 2699 for the execution plan]

Of course we now have to check that the predicted cardinality for a simple join between these two tables really is 2,289. So let’s run a suitable query and see what the optimizer predicts:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

SQL_ID  cf4r52yj2hyd2, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |     108 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |     108 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   2289 |   1327 |00:00:00.01 |     108 |  2546K|  2546K| 1194K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |      18 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |      34 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

As you can see, the E-Rows for the join is 2,289, as required.

I can’t claim that the model I’ve produced is definitely what Oracle does, but it looks fairly promising. No doubt, though, there are some variations on the theme that I haven’t considered – even when sticking to a simple (non-partitioned) join on equality on a single column.
