Oracle Scratchpad

July 9, 2020

Execution Plans

Filed under: Execution plans,extended stats,Histograms,Oracle,Performance,Problem Solving,Statistics,Troubleshooting — Jonathan Lewis @ 4:54 pm BST Jul 9,2020

This is an example from the Oracle Developer Community of using the output of SQL Monitor to detect a problem with object statistics that resulted in an extremely poor choice of execution plan.

A short time after posting the original statement of the problem the OP identified where he thought the problem was and the general principle of why he thought he had a problem – so I didn’t have to read the entire execution plan to work out a strategy that would be (at least) a step in the right direction of solving the performance problem.

This note, then, is just a summary of the five minutes I spent confirming the OP’s hypothesis and explaining how to work around the problem he had identified. It does, however, give a little lead-in to the comments I made to the OP in order to give a more rounded picture of what his execution plan was telling us.

So here’s the top half of the plan (which is the first subquery of a “concatenation”) with the first few predicates:

===============================================================================================================================================================================================================================  
| Id |                 Operation                  |            Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity |             Activity Detail             |  
|    |                                            |                            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |               (# samples)               |  
===============================================================================================================================================================================================================================  
|  0 | SELECT STATEMENT                           |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  1 |   CONCATENATION                            |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |          |                                         |  
|  2 |    FILTER                                  |                            |         |      |     12191 |     +4 |     1 |     864K |      |       |         |       |     0.03 | Cpu (4)                                 |  
|  3 |     FILTER                                 |                            |         |      |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  4 |      NESTED LOOPS                          |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.02 | Cpu (3)                                 |  
|  5 |       NESTED LOOPS                         |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.07 | Cpu (8)                                 |  
|  6 |        NESTED LOOPS                        |                            |     241 | 251K |     12232 |     +4 |     1 |      26M |      |       |         |       |     0.05 | Cpu (6)                                 |  
|  7 |         NESTED LOOPS                       |                            |    5407 | 233K |     12242 |     +4 |     1 |      86M |      |       |         |       |          |                                         |  
|  8 |          MERGE JOIN CARTESIAN              |                            |       1 |   35 |     12242 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
|  9 |           TABLE ACCESS BY INDEX ROWID      | REF1                       |       1 |    3 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 10 |            INDEX RANGE SCAN                | REF1_PK                    |       1 |    2 |     12242 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 11 |           BUFFER SORT                      |                            |      84 |   32 |     12242 |     +4 |     1 |     1000 |      |       |         |  104K |          |                                         |  
| 12 |            TABLE ACCESS BY INDEX ROWID     | STAGE                      |      84 |   32 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 13 |             INDEX RANGE SCAN               | STAGE_IDX1                 |      84 |    4 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 14 |          PARTITION RANGE ITERATOR          |                            |    8292 | 232K |     12232 |     +4 |  1000 |      86M |      |       |         |       |          |                                         |  
| 15 |           TABLE ACCESS STORAGE FULL        | TAB1                       |    8292 | 232K |     12245 |     +1 |  1000 |      86M | 103M | 521GB |   1.96% |    7M |    51.81 | gc buffer busy acquire (1)              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: cache buffers chains (1)         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (1196)                              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (2) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | reliable message (5)                    |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (2827)  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell smart table scan (1977)            |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | read by other session (304)             |  
| 16 |         PARTITION RANGE ITERATOR           |                            |       1 |   12 |     12191 |     +4 |   86M |      26M |      |       |         |       |     0.42 | Cpu (51)                                |  
| 17 |          TABLE ACCESS BY LOCAL INDEX ROWID | TAB2                       |       1 |   12 |     12191 |     +4 |   86M |      26M |   4M |  28GB |         |       |    32.14 | gc cr grant 2-way (20)                  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc cr request (2)                       |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc remaster (6)                         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (319)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (4) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: gc element (2)                   |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (3563)  |  
| 18 |           INDEX RANGE SCAN                 | TAB2_IX1                   |     166 |    3 |     12210 |     +2 |   86M |      26M |   1M |  11GB |         |       |    15.17 | Cpu (292)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1557)  |  
| 19 |        INDEX UNIQUE SCAN                   | MTD_PK                     |       1 |    1 |     12242 |     +4 |   26M |      26M |  292 |   2MB |         |       |     0.17 | Cpu (20)                                |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1)     |  
| 20 |       TABLE ACCESS BY INDEX ROWID          | REF2                       |       1 |    2 |     12191 |     +4 |   26M |      26M |    7 | 57344 |         |       |     0.11 | Cpu (13)                                |  
| 21 |      TABLE ACCESS BY INDEX ROWID           | CONTROLTAB                 |       1 |    1 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 22 |       INDEX UNIQUE SCAN                    | CONTROLTAB_PK              |       1 |      |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 23 |     MINUS                                  |                            |         |      |       102 |     +4 |    25 |        3 |      |       |         |       |          |                                         |  
| 24 |      TABLE ACCESS BY INDEX ROWID           | CUST_ORG_PK                |       1 |    3 |       942 |     +4 |    25 |       10 |      |       |         |       |          |                                         |  
| 25 |       INDEX UNIQUE SCAN                    | MC_PK                      |       1 |    2 |       942 |     +4 |    25 |       25 |      |       |         |       |          |                                         |  
| 26 |      SORT UNIQUE NOSORT                    |                            |       1 |    4 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 27 |       TABLE ACCESS BY INDEX ROWID          | REF1                       |       1 |    3 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 28 |        INDEX RANGE SCAN                    | REF1_PK                    |       1 |    2 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter( EXISTS ( (SELECT /*+ INDEX_RS_ASC ("CUST_ORG_PK" "MC_PK") */ "CUST_ID" FROM "USER1"."CUST_ORG_PK"  "CUST_ORG_PK" 
               WHERE "CUST_ID"=:B1 AND "CUST_ORG_PK"."INDICATR"='Y') MINUS (SELECT /*+ INDEX_RS_ASC ("REF1" "REF1_PK") */ 
               TO_NUMBER("VAL") FROM "USER1"."REF1" "REF1" WHERE "PUSER"='ZZZ' AND "EDATE" .ge. TRUNC(SYSDATE@!) AND TO_NUMBER("VAL")=:B2  
               AND "SDATE" .le. TRUNC(SYSDATE@!))))  
   3 - filter( EXISTS (SELECT /*+ INDEX_RS_ASC ("CONTROLTAB" "CONTROLTAB_PK") */ 0 FROM  "USER2"."CONTROLTAB" "CONTROLTAB" WHERE
              "CONTROLTAB"."CNTRLID"=9999 AND  NVL("CONTROLTAB"."STATUS",'F')='S'))  
   9 - filter("REF1"."EDATE" .ge. TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE" .le. TRUNC(SYSDATE@!))  
  13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')  

Note: various inequality symbols changed to .le. / .ge. to avoid WordPress format issue.

The first thing to note is that the “Time Active(s)” shown on the top line is about 12,000 seconds – so it’s a long-running query. Your next observation – before you look at the shape of the plan – might be to note that operations 15, 17 and 18 between them record thousands of seconds of time, mostly I/O time but including well over 1,200 seconds of CPU time. This might draw your eye to the part of the plan that tells you what you are doing at these operations and why you are doing it.

Looking at the detail – operation 15 is a full tablescan that is the child of a partition range iterator (operation 14), and that iterator is the second child of a nested loop join (operation 7). Why is the optimizer so out of balance that it thinks a table scan of (possibly) multiple partitions of a partitioned table is a good candidate for the second child of a nested loop?! The answer comes from the first child – the  Merge Join Cartesian at operation 8 has been given a cardinality estimate of just one row. When the cardinality estimate is one for the first table in a join then it doesn’t matter whether Oracle uses a nested loop join or a hash join, whatever happens next is only supposed to happen once.

Unfortunately when we compare “Rows (Estim)” and “Rows (Actual)” for the operation we see that the Merge Join Cartesian produced 1,000 rows – so the partition tablescan was going to happen 1,000 times (which we can confirm from the “Execs” column of operation 14). As a first step, then, we need to ensure that the optimizer gets a better estimate of the number of rows produced by the Cartesian join. So let’s look at its child operations.

  • Operation 9 (Table Access of REF1) is predicted to return one row – with “Rows (actual)” = 1.
  • Operation 11 (Buffer Sort of data from STAGE) is predicted to return 84 rows – with “Rows (actual)” = 1,000

Since the number of rows produced by a Cartesian join should be the product of the number of rows of the two inputs, this tells us that the optimizer’s estimate of the number of rows from REF1 has been rounded up to 1 from a very small fraction (less than 1/84). If we can correct this then we may get Oracle to change the awful nested loop to an acceptable hash join. Even if we can’t fix this mis-estimate we may be able to do something that improves the estimate for STAGE to something sufficiently large that it will trigger the switch to a hash join. So let’s look at the predicates for these two tables.

REF1 predicates

   9 - filter("REF1"."EDATE">=TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE"<=TRUNC(SYSDATE@!))  

The index range scan is based on an access predicate (with no filter predicate), so it looks like there’s a nearly perfect three-column index on REF1, but the optimizer can’t use the number of distinct keys in the index to get a good estimate of cardinality because one of the predicates is range-based. So the arithmetic will look at the three predicates separately and multiply up their selectivities. (It’s possible, of course, that this might be the first three columns of a 4, or more, column index.)

It’s a reasonable guess that the number of distinct combinations of (puser, name) will be much smaller than num_distinct(puser) * num_distinct(name) – so one strategy that might help increase the table’s cardinality estimate is to create extended statistics on the column group (puser, name).
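If you have access to the data it’s easy to check that guess directly. Here’s a quick sketch – the table and column names come from the predicates above, and the ‘~’ separator is an arbitrary choice:

select
        count(distinct puser)                   puser_count,
        count(distinct name)                    name_count,
        count(distinct puser || '~' || name)    combined_count
from
        ref1
/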

Another reasonable guess is that the number of distinct values for the two columns is (relatively) small, with some skew to the distribution (name = ‘CODE’ looks particularly susceptible to being a commonly occurring value) – so perhaps we need a histogram on one or both of the columns (which would then require a histogram to be created on the column group as well if we wanted the optimizer to use the column group). We’d also have to make sure that the queried values didn’t fall outside the known low/high values for the columns if we wanted the column group to be used.
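As a sketch of the sort of call that could do it (schema omitted, and the 254-bucket request is just the “maximum” size – adjust to taste): in recent versions naming the column group in method_opt creates the extension if it doesn’t already exist and gathers a histogram on it, leaving the stats on the other columns untouched; alternatively you could call dbms_stats.create_extended_stats() first and then gather.

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'REF1',
                method_opt  => 'for columns (puser, name) size 254'
        );
end;
/

The same pattern would apply to the stage(name, stat) column group discussed below.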

STAGE predicates

13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')

This is the access(-only) predicate for the index stage_idx1, and there are no filter predicates when we reach the table. So stage_idx1 might be a two-column index on the table that we are using completely, or it may be an index with more columns that we are using only incompletely. We can see that the cardinality estimate is out by a factor of 12 (84 predicted, 1,000 actual) so if this is the complete index (which would allow Oracle to use the distinct_keys value to estimate cardinality) there must be an uneven data distribution in the values; but if this is just the first two columns of a longer index then we might benefit from extended stats (viz: another column group) on this pair of columns.

Again, even if we create a column group, or take automatic advantage of the distinct_keys figure, the predicate STAT=’I’ (is that state, status?) looks a little dangerous – status columns tend to have a small number of distinct values with a significant skew to the distribution of values – so we may want to create a histogram on the STAT column, which would then require us to create a histogram on the column group if we also wanted the column group stats to have an effect.

What Happened Next?

I made the suggestions about column groups and histograms to the OP – without the level of detail that I’ve given in the explanations above – and got the following response:

You are spot on.

There does exists frequency histogram on column NAME(having 14 distinct value) and STAT(having 7 distinct values) of table STAGE. Also there already exists a frequency histogram on column PUSER and height balanced histogram on column NAME of table REF1. But still they were not helping the existing query.

By creating a column group on both on ref1 (puser, name) and stage(name, stat) with histogram for both i.e. ‘FOR COLUMNS SIZE 254’. The merge Cartesian removed automatically from the path by optimizer and its doing a hash join now with TAB1 finishing the query in quick time.

Summary

When the cardinality (rows) estimate for an operation drops to one (which might mean much less than one and rounded up) then all hell can break loose and the choice of parent operation – and its cardinality estimate – might be insanely stupid, leading to a catastrophically poor execution plan.

Very low cardinality estimates are often the result of multiplying individual column selectivities to produce an estimated selectivity that is unrealistic (much too small) when compared with the actual number of distinct combinations that exist in the table. In such cases creating a column group, (possibly with a histogram) may be all you need to do to get a better cardinality estimate and a vastly improved execution plan.

 

April 10, 2020

raw timestamp

Filed under: Histograms,Oracle,Troubleshooting — Jonathan Lewis @ 6:59 pm BST Apr 10,2020

Many years ago I wrote a note with some comments about converting the low_value and high_value columns of user_tab_columns to see what values they actually held. At the time the dbms_stats package held a few procedures to convert raw values and it was necessary to wrap these procedures in user-defined functions if you wanted a function to do the job.

In recent versions of Oracle (18c onwards) the package has finally acquired several functions to do a proper job. These are:

  • convert_raw_to_bin_double()
  • convert_raw_to_bin_float()
  • convert_raw_to_date()
  • convert_raw_to_number()
  • convert_raw_to_nvarchar()
  • convert_raw_to_rowid()
  • convert_raw_to_varchar()

You might note that these functions do not appear in the PL/SQL Packages and Types Reference, but they are documented in the script $ORACLE_HOME/rdbms/admin/dbmsstat.sql that creates the package:


-- Convert the internal representation of a minimum or maximum value
-- into a datatype-specific value. The minval and maxval fields
-- of the StatRec structure as filled in by get_column_stats or
-- prepare_column_values are appropriate values for input.

One thing you’ll notice about the list is that there’s no convert_raw_to_timestamp(), and a question came up recently on the oracle-l list server asking how to do this. This note answers that question and, in passing, demonstrates the typical use of the other functions.

As ever we start by creating some data:


rem
rem     Script:         raw_to_timestamp.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1(
        t0 timestamp(0),
        t3 timestamp(3),
        t6 timestamp(6),
        t9 timestamp(9),
        ts timestamp
);

insert into t1
with t_tab as (
        select to_timestamp('01-Apr-2020 21:15:10.123456789','dd-mon-yyyy hh24:mi:ss.ff9') ts
        from dual
)
select  t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts
from    t_tab
;

insert into t1
with t_tab as (
        select to_timestamp('01-Apr-2020 21:15:10.987654321','dd-mon-yyyy hh24:mi:ss.ff9') ts
        from dual
)
select  t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts
from    t_tab
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 2'
        );
end;
/

I’ve created a table to hold timestamps constrained to different levels of precision. The maximum precision allowed is 9 decimal places for the seconds, the default (column ts) is 6. I’ve then created two rows in the table using slightly different timestamps for the rows but giving all the columns in a single row the same value. Then I’ve gathered stats – including a histogram – on the table and all its columns.

I can now query user_tab_cols to pick up the low and high values:


select
        column_name, low_value, high_value
from
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
/

COLUMN_NAME          LOW_VALUE                  HIGH_VALUE
-------------------- -------------------------- --------------------------
T0                   7878040116100B             7878040116100C
T3                   7878040116100B0754D4C0     7878040116100B3AE3AF00
T6                   7878040116100B075BCDE8     7878040116100B3ADE6770
T9                   7878040116100B075BCD15     7878040116100B3ADE68B1
TS                   7878040116100B075BCDE8     7878040116100B3ADE6770


And here’s a dump of the columns so that you can see the actual values held in the table in their internal representation.


select
        dump(t0,16), 
        dump(t3,16), 
        dump(t6,16), 
        dump(t9,16), 
        dump(ts,16) 
from 
        t1;


DUMP(T0,16)
-------------------------------------------------------------------------------------
DUMP(T3,16)
-------------------------------------------------------------------------------------
DUMP(T6,16)
-------------------------------------------------------------------------------------
DUMP(T9,16)
-------------------------------------------------------------------------------------
DUMP(TS,16)
-------------------------------------------------------------------------------------
Typ=180 Len=7: 78,78,4,1,16,10,b
Typ=180 Len=11: 78,78,4,1,16,10,b,7,54,d4,c0
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,e8
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,15
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,e8

Typ=180 Len=7: 78,78,4,1,16,10,c
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,e3,af,0
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,67,70
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,68,b1
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,67,70

You’ll notice that, despite the attempt to limit the precision, all the columns other than the first use 11 bytes. The first column is different, with zero precision we use only 7 bytes and if you look closely (and know your internal storage formats) you’ll realise that those 7 bytes are using the standard format for stored dates. The last 4 bytes hold the fraction of the second stored as a number of nano-seconds as a simple 32-bit binary number. (Warning: it’s possible that you will see a difference in this set of bytes if your machine uses a different endianness from mine – I can’t test that for myself at present.)
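To confirm that reading of the last four bytes you can do the conversion by hand: take the trailing bytes 7,5b,cd,e8 from the t6 (or ts) dump above and convert the hex string to a number.

select to_number('075BCDE8','XXXXXXXX') nanoseconds from dual;

-- returns 123457000, i.e. 0.123457 seconds - which matches the
-- fractional second (rounded to 6 places) of the first value inserted.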

So let’s see what we get if we try to convert the raw values using the call to dbms_stats.convert_raw_to_date(). I’ll start by setting the nls_XXX formats to get the full content of a date column or a timestamp column reported from SQL*Plus.


alter session set nls_date_format = 'dd-Mon-yyyy hh24:mi:ss';
alter session set nls_timestamp_format = 'dd-Mon-yyyy hh24:mi:ss.ff9';

set linesize 165

column t0 format a32
column t3 format a32
column t6 format a32
column t9 format a32
column ts format a32

select  t0, t3, t6, t9, ts from t1;


select
        column_name, 
        dbms_stats.convert_raw_to_date(low_value)       date_low,
        dbms_stats.convert_raw_to_date(high_value)      date_high
from 
        user_tab_cols
where
        table_name = 'T1'
order by 
        column_name
/



T0                               T3                               T6                               T9                               TS
-------------------------------- -------------------------------- -------------------------------- -------------------------------- --------------------------------
01-Apr-2020 21:15:10.000000000   01-Apr-2020 21:15:10.123000000   01-Apr-2020 21:15:10.123457000   01-Apr-2020 21:15:10.123456789   01-Apr-2020 21:15:10.123457000
01-Apr-2020 21:15:11.000000000   01-Apr-2020 21:15:10.988000000   01-Apr-2020 21:15:10.987654000   01-Apr-2020 21:15:10.987654321   01-Apr-2020 21:15:10.987654000

2 rows selected.


COLUMN_NAME          DATE_LOW             DATE_HIGH
-------------------- -------------------- --------------------
T0                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:11
T3                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
T6                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
T9                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
TS                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10

5 rows selected.

Since I’ve inserted only two rows, with each row holding a single value stored at different levels of precision, we expect those values to be reflected in the low_value and high_value for the columns – and we’re not disappointed (probably). But, as you might have expected, Oracle has processed the timestamp data type into a date data type by simply ignoring the last 4 bytes and dealing with the first 7 bytes as if they were a date. We need to do a little more work to check that we can convert the low and high values into exactly the values that appear in the table itself.

So we can write a messy bit of SQL that extracts the first 7 bytes and converts them using the convert_raw_to_date() call, then extracts the last 4 bytes and converts them to a number using a simple to_number() call (with an nvl() thrown in to deal with the special case of there being no nanosecond component), divides by 1e9, converts the result to an interval in seconds using the numtodsinterval() function, and adds that to the date. To keep things clean I’ll only apply the mess to the low_value column. You’ll note that I’ve included comments in the SQL to explain the separate steps.


select 
        column_name, 
        low_value,
--
--      convert bytes 8 onwards to numeric with an nvl()
--      to handle the case of zero precision when there 
--      are no bytes to show the number of nanoseconds
--
--      Note - bytes 8 onwards of the raw value appear as
--      characters 15 - 22 of the hex string, hence the
--      substr(low_value,15,8) below.
--
        nvl(
                to_number(
                        hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                ),
                0
        ) nanoseconds,
--
--      repeat the conversion, and wrap with numtodsinterval()
--      to show the resulting interval in seconds.
--
        numtodsinterval(
                nvl(
                        to_number(
                                hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                        )/1e9,
                        0
                )+0,
                'SECOND'
        ) interval_val,
--
--      Convert the 1st 7 bytes (14 characters) to a date,
--      coerce the result to a timestamp, then convert the 
--      nanoseconds bytes to an interval and add
--
        to_timestamp(
                dbms_stats.convert_raw_to_date(hextoraw(substr(low_value,1,14)))
        ) + 
        numtodsinterval(
                nvl(
                        to_number(
                                hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                        )/1e9,
                        0
                ),
                'SECOND'
        ) timestamp_val
from 
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
/

COLUMN_NAME          LOW_VALUE                  NANOSECONDS INTERVAL_VAL                     TIMESTAMP_VAL
-------------------- -------------------------- ----------- -------------------------------- --------------------------------
T0                   7878040116100B                       0 +000000000 00:00:00.000000000    01-Apr-2020 21:15:10.000000000
T3                   7878040116100B0754D4C0       123000000 +000000000 00:00:00.123000000    01-Apr-2020 21:15:10.123000000
T6                   7878040116100B075BCDE8       123457000 +000000000 00:00:00.123457000    01-Apr-2020 21:15:10.123457000
T9                   7878040116100B075BCD15       123456789 +000000000 00:00:00.123456789    01-Apr-2020 21:15:10.123456789
TS                   7878040116100B075BCDE8       123457000 +000000000 00:00:00.123457000    01-Apr-2020 21:15:10.123457000

5 rows selected.

If you go back to the original output we got from selecting the actual values from the table you can see that the timestamp_val column we’ve generated correctly matches the column values for the lower of the two original values we inserted into the table.

Q.E.D.
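As a final sketch – if you wanted to hide the mess behind a single call, a user-defined function along these lines would do the job (the function name and lack of error handling are my own, and it assumes the 7 or 11 byte timestamp format described above):

create or replace function raw_to_timestamp(p_raw in raw)
return timestamp
as
        m_date  date;
        m_nanos number := 0;
begin
--      the first 7 bytes are a standard stored date
        m_date := dbms_stats.convert_raw_to_date(utl_raw.substr(p_raw, 1, 7));

--      bytes 8 - 11 (if present) hold the nanoseconds as a
--      simple 32-bit binary number
        if utl_raw.length(p_raw) > 7 then
                m_nanos := to_number(rawtohex(utl_raw.substr(p_raw, 8, 4)),'XXXXXXXX');
        end if;

        return cast(m_date as timestamp) + numtodsinterval(m_nanos / 1e9, 'SECOND');
end;
/

select column_name, raw_to_timestamp(low_value) low_ts from user_tab_cols where table_name = 'T1';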

October 30, 2019

Strange Estimates.

Filed under: CBO,extended stats,Oracle,Statistics — Jonathan Lewis @ 1:10 pm GMT Oct 30,2019

A question came up on the Oracle-L list server a couple of days ago expressing some surprise at the following execution plan:


--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |       |       |   845 (100)|          |
|   1 |  SORT AGGREGATE                      |                 |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| ANY_TABLE       | 84827 |   579K|   845   (1)| 00:00:01 |
|   3 |    SORT CLUSTER BY ROWID             |                 | 68418 |       |    76   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                 | ANY_INDEX       | 68418 |       |    76   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("X"."ANY_COLUMN1"='J')
   4 - access("X"."ANY_COLUMN2"=89155)

You’ll notice that this is a very simple query accessing a table by index, yet the estimated number of table rows found exceeds the estimated number of index entries used to probe the table. How can this happen? The answer (most frequently) is that there’s a mismatch between the table (or, more commonly, column) statistics and the index statistics. This seems to happen very frequently when you start mixing partitioned tables with global (or globally partitioned) indexes, but it can happen in very simple cases, especially since a call to gather_table_stats() with cascade set to true and using the auto_sample_size will take a small sample from the index while using a 100% “sample” from the table.
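For reference, that’s the sort of “default” call I mean – a sketch only, with the table name arbitrary:

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                cascade          => true,
                estimate_percent => dbms_stats.auto_sample_size
        );
end;
/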

Here’s an example I engineered very quickly to demonstrate the point. There’s no particular reason for the choice of DML I’ve used on the data beyond a rough idea of setting up a percentage of nulls and deleting a non-uniform pattern of rows.


rem
rem     Script:         table_index_mismatch.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,1000)                n1,
        mod(rownum,1000)                n2,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1, for columns (n1,n2) size 1'
        );
end;
/

create index t1_i1 on t1(n1);

delete from t1 where mod(trunc(sqrt(n1)),7) = 0;
update t1 set n1 = null where mod(n1,10) = 0;
delete from t1 where mod(n1,10) = trunc(dbms_random.value(0,10));

execute dbms_stats.gather_table_stats(user,'t1',estimate_percent=>1)
execute dbms_stats.gather_index_stats(null,'t1_i1',estimate_percent=> 0.01)

Of course you’re not supposed to collect stats with arbitrary samples in any recent version of Oracle, so going for a 1% and 0.01% sample seems a little daft but I’m just doing that to demonstrate the problem with a very small data set.

After generating the data and gathering the stats I ran a few queries to pick out some critical numbers.


select
        table_name, sample_size, num_rows
from
        user_tables
where
        table_name = 'T1'
/

select 
        index_name, sample_size, num_rows, distinct_keys
from
        user_indexes
where
        table_name = 'T1'
and     index_name = 'T1_I1'
/

select
        column_name, sample_size, num_nulls, num_distinct
from
        user_tab_cols
where
        table_name = 'T1'
and     (
            column_name = 'N1'
         or virtual_column = 'YES'
        )
order by
        column_name
/

You’ll notice that I’ve only picked one of my original columns and any virtual columns. My gather_table_stats() call had a method_opt that included the creation of extended stats for the column group (n1, n2) and I want to report the stats on the resulting virtual column.


TABLE_NAME           SAMPLE_SIZE   NUM_ROWS
-------------------- ----------- ----------
T1                          7865     786500


INDEX_NAME           SAMPLE_SIZE   NUM_ROWS DISTINCT_KEYS
-------------------- ----------- ---------- -------------
T1_I1                     385779     713292           714


COLUMN_NAME                      SAMPLE_SIZE  NUM_NULLS NUM_DISTINCT
-------------------------------- ----------- ---------- ------------
N1                                      7012      85300          771
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS          7865          0          855

A couple of observations on the stats

  • the table sample size is, as expected, 1% of the reported num_rows (the actual count is 778,154).
  • The index sample size is much bigger than expected – but that’s probably related to the normal “select 1,100 leaf blocks strategy”. Because of the skew in the pattern of deleted values it’s possible for the sample size in this model to vary between 694,154 and something in the region of 380,000.
  • The n1 sample size is about 10% smaller than the table sample size – but that’s because I set 10% of the column to null.
  • The column group sample size matches the table sample size because column group hash values are never null, even if an underlying column is null.

So let’s check the execution plan for a very simple query:


set autotrace on explain
select id from t1 where n1 = 140 and n2 = 140;
set autotrace off


---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |   920 | 11960 |   918   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   920 | 11960 |   918   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |   909 |       |     5   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=140)
   2 - access("N1"=140)

The estimate for relevant index rowids is smaller than the estimate for the number of table rows! The difference is not as extreme as the case reported on Oracle-l, but I’m only trying to demonstrate a principle, not reproduce the exact results.

There are several ways in which contradictory results like this can appear – but in this case we can see the following:

  • For the table access: table.num_rows / column.num_distinct = 786,500 / 855 = 919.88 (using the column group num_distinct)
  • For the index range scan: (table.num_rows – column.num_nulls) / column.num_distinct = (786500 – 85300) / 771 = 909.47 (using the n1 statistics)

So the change in strategy as it becomes possible for the optimizer to take advantage of the column group means the index and table have been using incompatible sets of stats (in particular there’s that loss of information about NULLs) as their cardinalities are calculated. The question, then, is “how much is that likely to matter”, and the follow-up, if it can matter, is “in what circumstances could the effect be large enough to cause problems”. But that’s a topic for another day.
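For the record, you can reproduce those two figures directly from the data dictionary. Here’s a sketch – the hidden column name is whatever your system generated for the (n1,n2) extension, so I’ve simply hard-coded the one that appeared in my earlier output (yours will differ):

select
        round(t.num_rows / cg.num_distinct, 2)                  table_card,
        round((t.num_rows - c.num_nulls) / c.num_distinct, 2)   index_card
from
        user_tables     t,
        user_tab_cols   c,
        user_tab_cols   cg
where
        t.table_name   = 'T1'
and     c.table_name   = 'T1'
and     c.column_name  = 'N1'
and     cg.table_name  = 'T1'
and     cg.column_name = 'SYS_STUBZH0IHA7K$KEBJVXO5LOHAS'
/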

Update / Footnote

In the case of the Oracle-l example there was no column group, and in some cases the optimizer would produce a plan where the table estimate was much smaller than the index estimate, while in other cases (like the opening plan above) the table estimate was significantly greater than the index estimate. This was a side effect of adaptive statistics: the low table estimate was due to the basic “multiply the separate selectivities” arithmetic; but with adaptive statistics enabled Oracle started sampling the table to check the correlation between the two columns, produced an SQL Plan Directive to do so, and got to the higher (and correct) result.

 

 

August 14, 2019

gather_system_stats

Filed under: CBO,Exadata,Oracle,Statistics,System Stats — Jonathan Lewis @ 2:20 pm BST Aug 14,2019

What happens when you execute dbms_stats.gather_system_stats() with the ‘Exadata’ option?

Here’s what my system stats look like (12.2.0.1 test results) after doing so. (The code to generate the two different versions is at the end of the note).
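For reference, the call itself is just the following (it needs no other parameters):

execute dbms_stats.gather_system_stats('EXADATA')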


System Stats
============
Status: COMPLETED
Timed: 13-Aug-2019 15:00:00 - 13-Aug-2019 15:00:00
--------------------------------------------------
CPUSPEED        :
CPUSPEEDNW      :          918
IOSEEKTIM       :           10
IOTFRSPEED      :      204,800
MAXTHR          :
MBRC            :          128
MREADTIM        :
SLAVETHR        :
SREADTIM        :

PL/SQL procedure successfully completed.

MBRC       :          128
MREADTIM   :
SREADTIM   :
CPUSPEED   :
CPUSPEEDNW :          918
IOSEEKTIM  :           10
IOTFRSPEED :      204,800
MAXTHR     :
SLAVETHR   :

PL/SQL procedure successfully completed.

All the call does is set MBRC, IOSEEKTIM, and IOTFRSPEED to fixed values; the only statistic that is really gathered is CPUSPEEDNW. The parameters showing blanks are deliberately set to null by the procedure – before I called gather_system_stats() every parameter had a value. You could also check the SQL trace file (with bind capture enabled) to see the statements that deliberately set those parameters to null if you want more proof.
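For example, a sketch of the traditional approach – the 10046 event at level 4 captures the SQL along with the bind values:

alter session set events '10046 trace name context forever, level 4';
execute dbms_stats.gather_system_stats('EXADATA')
alter session set events '10046 trace name context off';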

What are the consequences of this call (assuming you haven’t also done something with the calibrate_io() procedure)? Essentially Oracle now has information that says a single (8KB) block read request will take marginally over 10 milliseconds, and a multiblock read request of 1MB will take just over 15 milliseconds: in other words “tablescans are great, don’t use indexes unless they’re really precisely targeted”. To give you a quantitative feel for the numbers: given the choice between doing a tablescan of 1GB to pick 1,500 randomly scattered rows and using a perfect index the optimizer would choose the index.

To explain the time calculations: Oracle has set an I/O seek time of 10 ms, and a transfer rate of 204,800 bytes per ms (200 MB/s), with the guideline that a “typical” multiblock read is going to achieve 128 blocks. So the optimizer believes a single block read will take 10 + 8192/204800 ms = 10.04ms, while a multiblock read request for 1MB will take 10 + 1048576/204800 ms = 15.12 ms.
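Putting that arithmetic into a quick query (8KB block size assumed; the third column ignores any CPU component and simply expresses the I/O time for a 1GB tablescan in units of single block reads – which is why roughly 1,500 scattered single block reads is close to the break-even point suggested above):

select
        round(10 + 8192/204800, 2)                      sreadtim_ms,
        round(10 + (128 * 8192)/204800, 2)              mreadtim_ms,
        round(
                (1024 * 1024 * 1024 / (128 * 8192)) *   -- 1MB multiblock reads in 1GB
                (10 + (128 * 8192)/204800) /            -- ms per multiblock read
                (10 + 8192/204800)                      -- scaled to single block reads
        )                                               tablescan_units_1gb
from
        dual
/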

It’s also important to note that Oracle will use the 128 MBRC value in its calculation of the cost of the tablescan – even if you’ve set the db_file_multiblock_read_count parameter for the session or system to something smaller; and if you have set db_file_multiblock_read_count that’s the maximum size of multiblock read that the run-time engine will use.

Code

Here are the two procedures I used to report the values above. You will only need the privilege to execute the dbms_stats package for the second one, but you’ll need the privilege to access the SYS table aux_stats$ to use the first. The benefit of the first one is that it can’t go out of date as versions change.


rem
rem     Script:         get_system_stats.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2002
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

set linesize 180
set trimspool on
set pagesize 60
set serveroutput on

spool get_system_stats

declare
        m_value         number;
        m_status        varchar2(64);
        m_start         date;
        m_stop          date;
begin
        for r1 in (
                select  rownum rn, pname
                from    sys.aux_stats$
                where   sname = 'SYSSTATS_MAIN'
        ) loop
                dbms_stats.get_system_stats(m_status, m_start, m_stop, r1.pname, m_value);
                if r1.rn = 1 then
                        dbms_output.put_line('System Stats');
                        dbms_output.put_line('============');
                        dbms_output.put_line('Status: ' || m_status);
                        dbms_output.put_line(
                                'Timed: ' ||
                                to_char(m_start,'dd-Mon-yyyy hh24:mi:ss') ||
                                ' - ' ||
                                to_char(m_stop ,'dd-Mon-yyyy hh24:mi:ss')
                        );
                        dbms_output.put_line('--------------------------------------------------');
                end if;
                dbms_output.put_line(rpad(r1.pname,15) ||  ' : ' || to_char(m_value,'999,999,999'));
        end loop;
end;
/

declare
        m_value         number;
        m_status        varchar2(64);
        m_start         date;
        m_stop          date;
begin
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MBRC', m_value);
        dbms_output.put_line('MBRC       : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MREADTIM', m_value);
        dbms_output.put_line('MREADTIM   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'SREADTIM', m_value);
        dbms_output.put_line('SREADTIM   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'CPUSPEED', m_value);
        dbms_output.put_line('CPUSPEED   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'CPUSPEEDNW', m_value);
        dbms_output.put_line('CPUSPEEDNW : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'IOSEEKTIM', m_value);
        dbms_output.put_line('IOSEEKTIM  : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'IOTFRSPEED', m_value);
        dbms_output.put_line('IOTFRSPEED : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MAXTHR', m_value);
        dbms_output.put_line('MAXTHR     : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'SLAVETHR', m_value);
        dbms_output.put_line('SLAVETHR   : ' || to_char(m_value,'999,999,999'));
end;
/

spool off

July 12, 2019

opt_estimate 5

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 10:28 am BST Jul 12,2019

If you’ve been wondering why I resurrected my drafts on the opt_estimate() hint: a few weeks ago I received an email containing an example of a query where a couple of opt_estimate() hints were simply not working. The critical feature of the example was that the basic structure of the query was of a type that I had not previously examined. That’s actually a common type of problem when trying to investigate any Oracle feature from cold – you can spend days thinking about all the possible scenarios you should model, then the first time you need to apply your knowledge to a production system the requirement falls outside every model you’ve examined.

Before you go any further reading this note, though, I should warn you that it ends in frustration because I didn’t find a solution to the problem I wanted to fix – possibly because there just isn’t a solution, possibly because I didn’t look hard enough.

So here’s a simplified version of the problem – it involves pushing a predicate into a union all view. First some data and a baseline query:

rem
rem     Script:         opt_estimate_3a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem

create table t1
as
select
        rownum                          id,
        100 * trunc(rownum/100)-1       id2,
        mod(rownum,1e3)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        dual
connect by
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

create table t2a pctfree 75 as select * from t1;
create table t2b pctfree 75 as select * from t1;

create index t2ai on t2a(id);
create index t2bi on t2b(id);

explain plan for
select
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select 'a' flag, t2a.* from t2a
         union all
         select 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
/

select * from table(dbms_xplan.display(null,null,'outline alias'))
/


There is one row with t1.id = 99, and I would like the optimizer to use an indexed access path to select the one matching row from each of the two tables in the union all view. The smart execution plan would be a nested loop using a “pushed join predicate” – and that’s exactly what we get by default with this data set:


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     1 |    15 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     1 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     1 |    15 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1        / T1@SEL$1
   3 - SET$5715CE2E / T2@SEL$1
   4 - SET$5715CE2E
   5 - SEL$639F1A6F / T2A@SEL$2
   6 - SEL$639F1A6F / T2A@SEL$2
   7 - SEL$B01C6807 / T2B@SEL$3
   8 - SEL$B01C6807 / T2B@SEL$3

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
      INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
      INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
      USE_NL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SET$1")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
      OUTLINE_LEAF(@"SET$5715CE2E")
      OUTLINE_LEAF(@"SEL$B01C6807")
      OUTLINE_LEAF(@"SEL$639F1A6F")
      ALL_ROWS
      OPT_PARAM('_nlj_batching_enabled' 0)
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"=99)
   6 - access("T2A"."ID"="T1"."N1")
   8 - access("T2B"."ID"="T1"."N1")

So that worked well – operation 2 predicts one row for the tablescan of t1, with a nested loop join and union all pushed predicate where an index range scan of t2ai and t2bi gives us one row from each table. The “Predicate Information” tells us that the t1.n1 join predicate has been pushed inside the view to both subqueries, so we see “t2a.id = t1.n1” and “t2b.id = t1.n1”.

So what if I want to tell Oracle that it will actually find 5 rows in the t2a range scan and table access and 7 rows in the t2b range scan and table access (perhaps in a more complex view that would persuade Oracle to use two different indexes to get into the view and change the join order and access method for the next few tables it accessed). Since I’ve recently written about the nlj_index_scan option for opt_estimate() you might think that this is the one we need to use – perhaps something like:


opt_estimate(@sel$639f1a6f nlj_index_scan, t2a@sel$2 (t1), t2a_i1, scale_rows=5)
opt_estimate(@sel$b01c6807 nlj_index_scan, t2b@sel$3 (t1), t2b_i1, scale_rows=7)

You’ll notice I’ve been very careful to find the fully qualified aliases for t2a and t2b by looking at the “Query Block Name / Object Alias” section of the plan (if the view appeared as a result of Oracle using Concatenation or OR-Expansion you would find that you got two query block names that looked similar but had suffixes of “_1” and “_2”). But it wasn’t worth the effort: it didn’t work. Fiddling around with all the possible variations I could think of didn’t help (maybe I should have used set$5715ce2e as the query block target for both the hints – no; what if I …)

Of course if we look at the “Outline Data” we’d notice that the use_nl() hint in the outline says: “USE_NL(@SEL$1 T2@SEL$1)”, so we don’t have a nested loop into t2a and t2b, we have a nested loop into the  view t2. So I decided to forget the nested loop idea and just go for the indexes (and the tables, when I got to them) with the following hints (you’ll notice that during the course of my experiments I added my own query block names to the initial query blocks – so the generated query block names have changed):



explain plan for
select
        /*+
                qb_name(main)
                opt_estimate(@sel$f2bf1101, index_scan, t2a@subq_a, t2ai, scale_rows=5)
                opt_estimate(@sel$f2bf1101, table,      t2a@subq_a,       scale_rows=5)
                opt_estimate(@sel$f4e7a233, index_scan, t2b@subq_b, t2bi, scale_rows=7)
                opt_estimate(@sel$f4e7a233, table,      t2b@subq_b,       scale_rows=7)
        */
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select /*+ qb_name(subq_a) */ 'a' flag, t2a.* from t2a
         union all
         select /*+ qb_name(subq_b) */ 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
;

select * from table(dbms_xplan.display(null,null,'outline alias'));


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     5 |    75 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     5 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     7 |   105 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     7 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------


Excellent – we get the cardinalities we want to see for the tables – except the view operator doesn’t hold the sum of the table cardinalities, and the join doesn’t multiply up the estimates either. I couldn’t find a way of getting the view to show 12 rows (not even with a guessed – but presumably unimplemented – opt_estimate(view …) hint!), however during the course of my experiments I tried the hint: “opt_estimate(@main, table, t2@main, scale_rows=15)”. This didn’t have any visible effect in the plan but while searching through the 10053 trace file I found the following lines:

Table Stats::
  Table: from$_subquery$_002  Alias: T2  (NOT ANALYZED)
  #Rows: 20000  SSZ: 0  LGR: 0  #Blks:  37  AvgRowLen:  15.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 9

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 20000.000000 to 300000.000000 due to opt_estimate hint

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 12.000000 to 180.000000 due to opt_estimate hint

So at some point in the code path the optimizer is aware that 5 + 7 = 12, and that 12 * 15 = 180. But this doesn’t show up in the final execution plan. You might notice, by the way, that the scale_rows=15 has been applied NOT ONLY to the place where I was aiming – it’s also been applied to scale up the 20,000 rows that are estimated to be in the union all to 300,000 as the estimate for a tablescan of the two tables.

Possibly if I spent more time working through the 10053 trace file (which, as I’ve said before, I try to avoid doing) I might have found exactly which code path Oracle followed to get to the plan it produced and managed to tweak some hints to get the numbers I wanted to see. Possibly the optimizer was already following the code path that actually produced the numbers I wanted, then “forgot” to use them. One day, perhaps, I’ll take another look at the problem – but since I wasn’t trying to solve a problem for a client (and given that there was an alternative workaround) I closed the 10053 trace file and put the model aside for a rainy day.

Footnote

One thought did cross my mind as a way of finding out if there was a real solution – and I offer this for anyone who wants to play: create a second data set that genuinely produces the 5 and 7 I want to see (and check that the view reports the sum of the two components); run the original query against the original data so that you’ve got the execution plan in memory; overwrite the original data with the new data set (without changing the statistics on the original); then use the SQL Tuning Advisor to see if it produces a SQL profile for the captured SQL_ID that reproduces the correct plan for the second data set, and check what opt_estimate() hints it uses.  (Warning – this may turn into a frustrating waste of time.)
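For anyone who wants to try that, here’s a minimal sketch of the DBMS_SQLTUNE calls involved (the task and profile names are my own invention, and the sql_id substitution variable is a placeholder for the SQL_ID you’ve captured):


declare
        m_task  varchar2(64);
begin
        m_task := dbms_sqltune.create_tuning_task(
                sql_id     => '&m_sql_id',
                time_limit => 60,
                task_name  => 'jpl_opt_est_task'
        );
        dbms_sqltune.execute_tuning_task(task_name => m_task);
end;
/

set long 1000000
select dbms_sqltune.report_tuning_task('jpl_opt_est_task') from dual;

-- if a profile is offered and you want to test it:
-- execute dbms_sqltune.accept_sql_profile(task_name => 'jpl_opt_est_task', name => 'jpl_opt_est_profile')
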

Update Oct 2019

I’ve been saying for years that I don’t like the trick of pulling the Outline Information from an execution plan in memory and storing it in the database as an SQL Profile because that’s effectively storing an SQL Plan Baseline as an SQL Profile, and there might be subtle and (potentially) misleading side effects of abusing the two mechanisms. Behind the argument I’ve also made the observation that while both mechanisms store hints, the hints for an SQL Profile are about statistics while the hints for an SQL Plan Baseline are about transformations, joins, and other mechanisms.

However …

I’ve now done the test I described in the footnote above – created a table with data in it that made Oracle choose full tablescans for the t2a and t2b tables, then changed the data (without changing the object statistics) and ran the SQL Tuning tool to see if the optimizer would suggest the plan I wanted and offer a profile to produce it.

I was successful – Oracle offered the profile, and when I looked at it (before accepting it) it looked like this:


         1 OPT_ESTIMATE(@"SEL$1", TABLE, "T2"@"SEL$1", SCALE_ROWS=200)
         1 OPT_ESTIMATE(@"SEL$1", JOIN, ("T2"@"SEL$1", "T1"@"SEL$1"), SCALE_ROWS=15)
         1 OPTIMIZER_FEATURES_ENABLE(default)
         1 IGNORE_OPTIM_EMBEDDED_HINTS

But when I accepted it and looked at it again it looked like this:


        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
        IGNORE_OPTIM_EMBEDDED_HINTS
        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
        INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
        USE_NL(@"SEL$1" "T2"@"SEL$1")
        LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
        NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
        FULL(@"SEL$1" "T1"@"SEL$1")
        OUTLINE(@"SET$1")
        OUTLINE(@"SEL$3")
        OUTLINE(@"SEL$2")
        OUTLINE_LEAF(@"SEL$1")
        PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
        OUTLINE_LEAF(@"SET$5715CE2E")
        OUTLINE_LEAF(@"SEL$B01C6807")
        OUTLINE_LEAF(@"SEL$639F1A6F")
        ALL_ROWS
        DB_VERSION('19.1.0')
        OPTIMIZER_FEATURES_ENABLE('19.1.0')
        INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))

In other words, Oracle has recorded something that looks like an SQL Plan Baseline and called it an SQL Profile.
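If you want to pull the stored hints from an accepted profile yourself, a query along the following lines is the commonly used approach – a sketch only, since it reads the SYS-owned sqlobj$ and sqlobj$data tables, which are undocumented and may change between versions:


select
        extractvalue(value(h), '.')     hint_text
from
        sys.sqlobj$             so,
        sys.sqlobj$data         od,
        table(xmlsequence(
                extract(xmltype(od.comp_data), '/outline_data/hint')
        ))                      h
where
        so.name      = '&m_profile_name'
and     od.signature = so.signature
and     od.category  = so.category
and     od.obj_type  = so.obj_type
and     od.plan_id   = so.plan_id
;
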

July 1, 2019

opt_estimate 4

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:18 pm BST Jul 1,2019

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specific query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given a query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case, because the hint is inside the query block it applies to, the “@mat_cte” isn’t strictly necessary.)

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint, and pointed out that there were cases where you might also want to include the query_block option as well. Here’s a final example query showing the effect, using the scale_rows option, after creating a table t2 as a copy of t1 (but with pctfree 75 to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour, but to be safe in a production environment that’s a hint that I should have included.
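For reference, here’s what the safer version of the inline view would look like with the no_merge hint in place – just a sketch, I haven’t re-run it for this note:


        (
        select  /*+
                        qb_name(main)
                        no_merge
                        opt_estimate(@main group_by    scale_rows=4)
                        opt_estimate(@main having      scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
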

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat – hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and, thanks to a Freudian slip while typing, “not sage”). You probably don’t know how to do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which came from a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which:

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values is 14 (roughly half the actual number in my case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored, because it applies to a query block that doesn’t exist in the final plan and that makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complex view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard, you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.
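As a taster – and purely as a sketch, since the hint is undocumented and the details may change between versions – the column_stats() hint (along with its companion table_stats()) is the sort of thing you’ll see in SQL profiles generated by the SQL Tuning Advisor, in a form something like this (owner, table, and figures invented):


        table_stats("TEST_USER"."T2", scale, blocks=1000 rows=20000)
        column_stats("TEST_USER"."T2", "N1", scale, length=4 distinct=200 nulls=0)
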

 

June 28, 2019

opt_estimate 3

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:12 pm BST Jun 28,2019

This is just a quick note to throw out a couple of the lesser-known options for the opt_estimate() hint – and they may be the variants that are most useful since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause, so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by option I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it – with autotrace enabled here are three queries: first the unhinted baseline (which uses the standard 5% on my having clause), then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4 (which means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3, with the net effect of producing a cardinality of 6. Here are the plans.

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of the lines for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

June 25, 2019

opt_estimate 2

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 8:22 pm BST Jun 25,2019

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t change the arithmetic the optimizer uses, but it does mean the presentation of the nested loop goes back to the original pre-9i form which makes it a little easier to see costs and cardinalities adding and multiplying their way through the plan. I do not do this in production systems.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2. The optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15 * 15 = 225 and the cost comes to: 2 (t1 cost) + (15 (t1 rows) * 16 (t2 unit cost)) = 242.

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name   nlj_index_scan,   target_table_alias   (list of possible driving tables),   target_index,   numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than by a list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two columns before it can be used.
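Purely as a syntax sketch (I haven’t tested this here), assuming a hypothetical two-column index t2_i2 on t2(n1, n2) and a query with a filter predicate on the second column, the filter variant would look something like:


opt_estimate(@main nlj_index_filter, t2@main (t1), t2_i2, scale_rows=0.5)
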

The things to note in this plan are: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn’t dropped, so now it’s not consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is: 2 (t1 cost) + (15 (t1 rows) * 2 (t2 unit cost)).

Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale the estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide to follow the path we expected and that factor of 0.2 will be applied whenever t2 is accessed. Notice in this plan that the join cardinality at operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit); with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side note: if you have access to MoS you’ll find that Doc ID 2402821.1 “How To Use Optimizer Hints To Specify Cardinality For Join Operation” seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins (where “multi” can be greater than 2).
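As a syntax-only sketch (with invented figures), the two forms for setting the estimate of a three-table join might look like this:


cardinality(t1 t2 t3, 100)
opt_estimate(@main join(t1 t2 t3), rows=100)
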

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (plus the qb_name() and no_nlj_prefetch hints – and remember we’ve disabled “nlj batching“).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

With a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in any of the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.
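If you want to check whether any directives have appeared, they’re exposed through a couple of dictionary views – here’s a minimal sketch (you may need to call dbms_spd.flush_sql_plan_directive first, since directives are written to the data dictionary only periodically):


execute dbms_spd.flush_sql_plan_directive

select
        o.owner, o.object_name, o.object_type,
        d.type, d.state, d.reason
from
        dba_sql_plan_directives         d,
        dba_sql_plan_dir_objects        o
where
        o.directive_id = d.directive_id
and     o.owner        = user
and     o.object_name  in ('T1','T2')
order by
        o.object_name, o.object_type
;
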

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up, as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

April 12, 2019

In-table predicates

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:49 pm BST Apr 12,2019

This note was prompted by a recent email asking about the optimizer’s method for estimating the selectivity of a predicate which compared two columns in the same table – for example:  “where orders.amount_invoiced = orders.amount_paid”. It’s been about 14 years since I wrote “Cost Based Oracle – Fundamentals” so my memory of what I wrote (and whether I even mentioned this case) was rather hazy, so I sent off a quick reply and decided to do a little checking.

It turned out that I’d already written a blog note with a throwaway comment about the estimates and a general workaround for optimizer problems caused by examples of this kind. The comment I made about the estimate was that the selectivity seems to be the smaller of the selectivities of (using the example above) “amount_paid = :unpeekable_bind” and “amount_invoiced = :unpeekable_bind”. I’m fairly sure I’ve made similar comments several times in the past, but after replying to the email I started to wonder whether this would still be true if there were histograms on the columns. So I ran up a little test and here, to start things off, is the code to generate the data I used for testing:


rem
rem     Script:         column_equality_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem     Purpose:

create table t1(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t2(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t3(
        n1      number(6,0),
        n2      number(6,0),
        v1      varchar2(50)
)
;

execute dbms_random.seed(0)

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t3 (n1, n2, v1)
select
        t1.n1,
        t2.n1,
        rpad(rownum,50)
from
        t1, t2
where
        t1.id = t2.id
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 254 for columns v1 size 1'
        );
end;
/

select
        table_name, column_name, num_distinct, density, histogram,
        low_value, high_value
from
        user_tab_cols
where
        table_name in ('T1','T2','T3')
and     column_name in ('N1','N2')
order by
        table_name, column_name
;


TABLE_NAME      COLUMN_NAME     NUM_DISTINCT    DENSITY HISTOGRAM       LOW_VALUE  HIGH_VALUE
--------------- --------------- ------------ ---------- --------------- ---------- ----------
T1              N1                        38     .00005 FREQUENCY       80         C128

T2              N1                        38     .00005 FREQUENCY       80         C126

T3              N1                        38     .00005 FREQUENCY       80         C128
                N2                        38     .00005 FREQUENCY       80         C126


I’ve created two sets of 10,000 rows each of normally distributed data – but taken the absolute values so I’ve only got half the bell curve, and I’ve scaled up by a factor of 10 and truncated. This has given me two similar but slightly different sets of values which happen to cover 38 distinct values each.

I’ve then generated my test set by joining these two tables on the unique (though not declared as such) id column to give a table with the same number of rows and two skewed sets of data. The calls to dbms_stats create histograms on the skewed data sets, and I’ve reported a few significant numbers about the 4 relevant columns.

Looking at the column statistics we have num_distinct = 38 across the board – so my observation from paragraph 2 above would tend to suggest that the optimizer would report 10,000/38 = 263 as the cardinality estimate for the predicate “t3.n1 = t3.n2” (I’m fairly confident that in this case 1/num_distinct will be preferred over using the density from user_tab_cols). But here’s what we get from a call to explain plan:


explain plan for
select
        v1
from
        t3
where
        n1 = n2
;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   564 | 32148 |    18   (6)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |   564 | 32148 |    18   (6)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"="N2")

The estimate is 564 – which is a pretty good estimate in this case (the actual result was 552) as the two columns were randomly generated and there’s no correlation between them. Unfortunately this is quite a long way off my assumption of 263, so where did the optimizer get that number from?

Here’s a query (with result set) that you may recognise from an earlier post.


break on report skip 1
compute count of value on report
compute sum of t1_frequency on report
compute sum of t2_frequency on report
compute sum of product on report

column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N1'
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N2'
)
select
        f1.value,
        f1.frequency    t1_frequency,
        f2.frequency    t2_frequency,
        f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
order by
        f1.value
;



     VALUE T1_FREQUENCY T2_FREQUENCY      PRODUCT
---------- ------------ ------------ ------------
         0          777          768      596,736
         1          806          753      606,918
         2          794          779      618,526
         3          808          763      616,504
         4          752          749      563,248
         5          627          729      457,083
         6          623          628      391,244
         7          584          616      359,744
         8          544          597      324,768
         9          512          546      279,552
        10          441          439      193,599
        11          409          342      139,878
        12          345          370      127,650
        13          318          300       95,400
        14          257          282       72,474
        15          244          242       59,048
        16          214          206       44,084
        17          172          193       33,196
        18          161          140       22,540
        19          113          114       12,882
        20          108           93       10,044
        21           95           81        7,695
        22           72           55        3,960
        23           54           56        3,024
        24           43           36        1,548
        25           38           31        1,178
        26           23           18          414
        27           18           23          414
        28            7           14           98
        29            9           13          117
        30           14           11          154
        31            4            2            8
        32            5            3           15
        33            1            3            3
        35            4            1            4
        37            2            2            4
---------- ------------ ------------ ------------
        36
                   9998         9998    5,643,754


I’m querying the histogram information for the two columns, and where t3.n1 and t3.n2 have a value in common I’ve reported the two frequencies for that value and the product of the frequencies. For convenience I’ve included a count and a couple of sums to show that there isn’t a perfect match in the set of values for the two columns. The most important number at the bottom of the page, though, is the sum of the products of frequencies of common values. Take that value and divide by 10,000 and you get 564.3754 – compare that with the cardinality estimate for the predicate “t3.n1 = t3.n2” and it’s a perfect match (allowing for rounding).

The query against user_tab_histograms is the query I used to calculate the cardinality of a join where there were frequency histograms on the columns at both ends of the join. The optimizer’s estimate for “intra-table” predicates is consistent with its estimate for joins (in the special cases of “no histograms” and “two frequency histograms”, at least). Viewing it from a slightly different angle: the selectivity of the predicate “n1 = n2” can be derived as “the cardinality estimate for joining t3 to itself” divided by “the cardinality of the cartesian join” (the latter being num_rows * num_rows, of course).

Just as a closing demo – lets generate a plan for the appropriate self-join of t3 and check the cardinality estimate:


explain plan for
select
        t3a.v1, t3b.v1
from
        t3 t3a, t3 t3b
where
        t3a.n2 = t3b.n1
;

select * from table(dbms_xplan.display);


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  5643K|   581M|   138  (83)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  5643K|   581M|   138  (83)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3A"."N2"="T3B"."N1")


As expected the (rounded) join cardinality is reported as 5,643K.

So the selectivity of the single table predicate “n1 = n2” will be 5,643,000 / (10,000 * 10,000) = 0.05643, and the cardinality estimate of the single table query will be 10,000 * 0.05643 = 564.3. QED.
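
If you want to close the loop on the single table case as well, a quick check against the same t3 (a sketch – I haven’t repeated the test here) should show the optimizer reporting an estimate in the region of 564 rows at the table access line:

explain plan for
select  count(*)
from    t3
where   n1 = n2
;

select * from table(dbms_xplan.display);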

I haven’t tested any other variations of types of histogram, degree of overlap of value ranges, etc. but I suspect that the general principle is probably going to give the selectivity as (or with the appearance of): “estimated cardinality of self-join” / “square of num_rows (allowing for nulls)”.

 

March 22, 2019

Stats advisor

Filed under: 12c,Oracle,Statistics — Jonathan Lewis @ 1:10 pm GMT Mar 22,2019

This is just a little shout-out about the Stats Advisor – if you decide to give it a go, what sort of things is it likely to tell you? The answer is in a dynamic performance view called v$stats_advisor_rules – which I’ve listed below from an instance running 18.3.0.0.


SQL> set linesize 180
SQL> set trimspool on
SQL> set pagesize 40
SQL> column description format a75
SQL> column name format a32
SQL> break on rule_type duplicate skip 1
SQL> select * from v$stats_advisor_rules;

  RULE_ID NAME                             RULE_TYPE DESCRIPTION                                                                     CON_ID
---------- -------------------------------- --------- --------------------------------------------------------------------------- ----------
         0                                  SYSTEM                                                                                         0
         1 UseAutoJob                       SYSTEM    Use Auto Job for Statistics Collection                                               0
         2 CompleteAutoJob                  SYSTEM    Auto Statistics Gather Job should complete successfully                              0
         3 MaintainStatsHistory             SYSTEM    Maintain Statistics History                                                          0
         4 UseConcurrent                    SYSTEM    Use Concurrent preference for Statistics Collection                                  0
         5 UseDefaultPreference             SYSTEM    Use Default Preference for Stats Collection                                          0
         6 TurnOnSQLPlanDirective           SYSTEM    SQL Plan Directives should not be disabled                                           0

         7 AvoidSetProcedures               OPERATION Avoid Set Statistics Procedures                                                      0
         8 UseDefaultParams                 OPERATION Use Default Parameters in Statistics Collection Procedures                           0
         9 UseGatherSchemaStats             OPERATION Use gather_schema_stats procedure                                                    0
        10 AvoidInefficientStatsOprSeq      OPERATION Avoid inefficient statistics operation sequences                                     0

        11 AvoidUnnecessaryStatsCollection  OBJECT    Avoid unnecessary statistics collection                                              0
        12 AvoidStaleStats                  OBJECT    Avoid objects with stale or no statistics                                            0
        13 GatherStatsAfterBulkDML          OBJECT    Do not gather statistics right before bulk DML                                       0
        14 LockVolatileTable                OBJECT    Statistics for objects with volatile data should be locked                           0
        15 UnlockNonVolatileTable           OBJECT    Statistics for objects with non-volatile should not be locked                        0
        16 MaintainStatsConsistency         OBJECT    Statistics of dependent objects should be consistent                                 0
        17 AvoidDropRecreate                OBJECT    Avoid drop and recreate object seqauences                                            0
        18 UseIncremental                   OBJECT    Statistics should be maintained incrementally when it is beneficial                  0
        19 NotUseIncremental                OBJECT    Statistics should not be maintained incrementally when it is not beneficial          0
        20 AvoidOutOfRange                  OBJECT    Avoid Out of Range Histogram endpoints                                               0
        21 UseAutoDegree                    OBJECT    Use Auto Degree for statistics collection                                            0
        22 UseDefaultObjectPreference       OBJECT    Use Default Object Preference for statistics collection                              0
        23 AvoidAnalyzeTable                OBJECT    Avoid using analyze table commands for statistics collection                         0

24 rows selected.

As you can see the rules fall into three groups: system, operation, and object – and at all three levels you can’t help noticing how often the theme is: “just stick with the defaults!”.
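
If you want to see what the advisor has to say about your own system, a minimal sequence of calls looks something like the following (a sketch only – it assumes you’re on 12.2 or later with the necessary privileges, and the task name is just one I’ve invented):

set long 1000000
set longchunksize 32767

variable task_name varchar2(64)

begin
        :task_name := dbms_stats.create_advisor_task('jpl_advisor_task');
        :task_name := dbms_stats.execute_advisor_task('jpl_advisor_task');
end;
/

select  dbms_stats.report_advisor_task('jpl_advisor_task') report
from    dual
;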

As so often happens when I start writing a catch-up or “remind myself” note I found that Tim Hall has already written all about it.

March 13, 2019

Hash Partitions

Filed under: Oracle,Partitioning,Statistics — Jonathan Lewis @ 1:13 pm GMT Mar 13,2019

Here’s an important thought if you’ve got any large tables which are purely hash partitioned. As a general guideline you should not need partition level stats on those tables. The principle of hash partitioned tables is that the rows are distributed uniformly and randomly based on the hash key so, with the assumption that the number of different hash keys is “large” compared to the number of partitions, any one partition should look the same as any other partition.

Consider, as a thought experiment (and as a warning), a table of product_deliveries which is hash partitioned by product_id with ca. 65,000 distinct products that have been hashed across 64 partitions. (Hash partitioning should always use a power of 2 for the partition count if you want the number of rows per partition to be roughly the same across all partitions – if you don’t pick a power of two then some of the partitions will be roughly twice the size of others.)
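
If you want to see that 2:1 split for yourself, a quick throwaway experiment along the following lines (table name invented) should do the job – grouping by the data object id is a convenient way of counting rows per partition:

create table hash_demo (id number)
partition by hash (id) partitions 48
;

insert into hash_demo
select rownum from dual connect by level <= 96000
;
commit;

select
        dbms_rowid.rowid_object(rowid)  data_object_id,
        count(*)                        row_count
from
        hash_demo
group by
        dbms_rowid.rowid_object(rowid)
order by
        count(*)
;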

Consider a query for “deliveries to Basingstoke” – in the absence of a histogram on the delivery location the optimizer will produce a cardinality estimate that is:

  • total rows in table / number of distinct delivery locations in table

Now consider a query for: “deliveries of product X to Basingstoke” – again in the absence of histograms. The optimizer could have two ways of calculating this cardinality:

  • total rows in table / (number of distinct products in table * number of distinct delivery locations in table)
  • total rows in relevant partition / (number of distinct products in relevant partition * number of distinct delivery locations in relevant partition)

But given the intent of hash partitioning to distribute data evenly we can make three further observations:

  1. the number of rows in any one partition should be very similar to the number of rows in the table divided by the number of partitions
  2. the number of distinct products in any one partition should be very similar to the number of distinct products in the table divided by the number of partitions
  3. the number of distinct locations in any one partition should be very similar to the number of distinct locations in the whole table.

The second observation holds because product is the partition key; the third holds because location is not the partition key.

So we can rewrite the second, partition-oriented, formula as:

  • (total rows in table / number of partitions) / ((number of distinct products in table / number of partitions) * number of distinct locations in table)

which, re-arranging parentheses and cancelling common factors, reduces to:

  • total rows in table / (number of distinct products in table * number of distinct locations in table)

which matches the first formula. (Q.E.D.) In the absence of any statistics on hash partitions the optimizer can (ought to be able to) produce reasonable cardinality estimates based purely on table-level stats.
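
To put some invented numbers to that: say the table holds 6,400,000 rows spread across 64 partitions, with 64,000 distinct products and 1,000 distinct delivery locations. Then:

  • table-level:     6,400,000 / (64,000 * 1,000) = 0.1 rows
  • partition-level: (6,400,000 / 64) / ((64,000 / 64) * 1,000) = 100,000 / 1,000,000 = 0.1 rows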

In fact if you look back into the history of partitioning this observation is implicit in the early days of composite partitioning when the only option was for range/hash composite partitions – the optimizer never used sub-partition stats to calculate costs or cardinality it used only partition-level statistics. (And it was several years before the optimizer caught up to the fact that (e.g.) range/list composite partitioning might actually need to do arithmetic based on subpartition stats.)

I did say that the example was also a warning. Hash partitioning is “expected” to have a large number of distinct key values compared to the number of partitions. (If you don’t meet this requirement then possibly you should be using list partitioning). There’s also a “uniformity” assumption built into the arithmetic (both the basic arithmetic and the hand-waving discussion I produced above). Just imagine that your company supplies a handful of products that for some strange reason are incredibly popular in Basingstoke. If this is the case then the assumption that “all partitions look alike” is weakened, and you may have to produce a workaround for the poor cardinality estimates that the variation could introduce.

A pattern of this type has two generic effects on the optimizer, of course. First is the simple skew in the data – to have a significant impact the number of rows for the problem products would have to be much larger than average, which suggests the need for a suitably crafted histogram; secondly there’s an implied correlation between a few products and Basingstoke, so you might even end up creating a column group and manually coding a histogram on it to capture the correlation.
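
As a sketch of what that might look like – using the hypothetical product_deliveries table and column names from the thought experiment, and settling for gathering (rather than hand-crafting) a histogram on the column group:

declare
        m_extension     varchar2(64);
begin
        -- create the (product_id, delivery_location) column group
        m_extension := dbms_stats.create_extended_stats(
                ownname         => user,
                tabname         => 'PRODUCT_DELIVERIES',
                extension       => '(product_id, delivery_location)'
        );

        -- gather histograms on the individual location column and on the column group
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'PRODUCT_DELIVERIES',
                method_opt      => 'for columns delivery_location size 254 for columns (product_id, delivery_location) size 254'
        );
end;
/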

 

March 11, 2019

sys_op_lbid

Filed under: Indexing,Infrastructure,IOT,Oracle,Statistics — Jonathan Lewis @ 1:23 pm GMT Mar 11,2019

I’ve made use of the sys_op_lbid() function a few times in the past, for example in this posting on the dangers of using reverse key indexes, but every time I’ve mentioned it I’ve only been interested in the “leaf blocks per key” option. There are actually four different variations of the function, relevant to different types of index and controlled by setting a flag parameter to one of four different values.

The call to sys_op_lbid() takes 3 parameters: an index (or index [sub]partition) object id, a flag value, and a table “rowid”, where the flag value can be one of L, R, O, or G. The variations of the call are as follows (with a quick illustration of the ‘L’ call after the list):

  • L – the function will return the row directory address (i.e. something that looks like a rowid) of the first index entry in the leaf block that holds the index entry for the referenced table rowid. The effect of this is that the number of distinct values returned by calling the function for every row in the table is equal to the number of index leaf blocks which currently hold an active entry.
  • R – Relevant only to bitmap indexes; the function will return the row directory address of the bitmap index entry for the referenced table rowid. The effect of this is that the number of distinct values returned by calling the function for every row in the table is equal to the number of index entries in the bitmap index.
  • O – Relevant only to the primary key index of an index organized table with an overflow. The function is used with a non-key column instead of a rowid and returns a rowid that corresponds to the row directory entry in the overflow segment. An interesting detail of the overflow entries is that there is an “nrid” (next rowid) pointer in the primary key index entry that does not get deleted when all the columns in the related overflow entry are set null – so you can delete all the data from the overflow (set every overflow column in every row to null) and the primary key clustering factor would not change.
  • G – Relevant only to secondary indexes on an index organized table. Like the L and R options this function takes a rowid (which is a special case for IOTs) as one of its inputs and uses the block guess from the secondary index to construct a row directory entry for the first entry in the primary key leaf block that corresponds to that block guess. This serves two purposes – it allows Oracle to calculate the clustering factor of the secondary index (as you walk the secondary index in order how much do you jump around the leaf blocks of the primary key), and it allows Oracle to produce the pct_direct_access figure for the secondary index by joining the secondary index to the primary key index on primary key, and comparing the ‘G’ result for the secondary with the ‘L’ result from the primary, which gives a count of the number of times the guess is correct.

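As a quick illustration of the ‘L’ call, the following sketch counts the number of leaf blocks in an index that currently hold at least one live entry – I’ve used 351334, the id that appears for the t1_pk index in the generated SQL further down, but on your system the id (and the table) will obviously be different:

select
        count(distinct sys_op_lbid(351334, 'L', t.rowid))  leaf_block_count
from
        t1 t
where
        id is not null
;
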
These observations can be confirmed by gathering stats on different structures with trace enabled, and doing a couple of block dumps. For reference the following is just a simple script to create an index organized table with overflow and secondary index:


rem
rem     Script:         sys_op_lbid_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem

create table t1(
        id      constraint t1_pk primary key,
        v1      ,
        v2      ,
        v3      ,
        padding 
)
organization index
pctthreshold 2
overflow
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum,
        lpad(rownum,30),
        lpad(rownum,30),
        lpad(rownum,40),
        rpad('x',100,'x')
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(v3);

alter session set sql_trace true;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter session set sql_trace false;

select
        object_id, data_object_id, object_name
from
        user_objects
order by
        object_id
;

The significance of the query for object_id and data_object_id shows up in the trace file (and subsequent dumps) when Oracle uses one or other of the values in its SQL and rowid construction.

Here are the interesting SQL statements generated as the stats are gathered – but cosmetically altered to be reader-friendly. In order they are:

  1. Stats for primary key of IOT: using the ‘L’ option for counting leaf blocks and the ‘O’ option for the clustering factor into overflow segment.
  2. Stats for secondary index of IOT: using the ‘L’ option for counting leaf blocks and the ‘G’ option for the clustering factor into the primary key index
  3. Calculate pct_direct_access: the ‘L’ option gives the actual leaf block in the primary key index, the ‘G’ option gives the leaf block guessed by the secondary index

select 
        /*+ index(t,t1_pk) */ 
        count(*) as nrw,
        count(distinct sys_op_lbid(351334,'L',t.rowid)) as nlb,
        null as ndk,
        sys_op_countchg(sys_op_lbid(351334,'O',V1),1) as clf
from
        t1 t 
where 
        id is not null
;


select 
        /*+ index(t,t1_i1) */ 
        count(*) as nrw,
        count(distinct sys_op_lbid(351335,'L',t.rowid)) as nlb,
        null as ndk,
        sys_op_countchg(sys_op_lbid(351335,'G',t.rowid),1) as clf
from
        t1 t 
where 
        v3 is not null
;


select
        case when count(*) = 0
                then 100
                else round(
                        count(
                                case when substr(gdba,7,9)=substr(lbid,7,9)
                                        then 1
                                        else null
                                end
                        )/count(*)*100
                )
        end
from    (
        select
                /*+
                        ordered
                        use_hash(i.t1 t2)
                        index_ffs(t2,t1_pk)
                */
                sys_op_lbid(351334,'L',t2.rowid) lbid,
                gdba
        from (
                select
                        /*+ index_ffs(t1,t1_i1) */
                        sys_op_lbid(351335,'G',t1.rowid) gdba,
                        t1.ID
                from
                        t1 t1
                ) i,
        t1 t2
        where
                i.id = t2.id
        )
;

The strange substr(,7,9) that appears in the join between the primary key index and the secondary index is needed because the ‘G’ option uses the object_id of the table to turn an absolute block guess into a rowid while the ‘L’ option is using the data_object_id of the primary key index to turn its block address into a rowid. (This means there may be variants of this SQL for IOTs using partitioning.)

 

December 14, 2018

Extreme Nulls

Filed under: CBO,extended stats,Oracle,Statistics — Jonathan Lewis @ 7:01 pm GMT Dec 14,2018

This note is a variant of a note that I wrote a few months ago about the impact of nulls on column groups. The effect showed up recently on a client site with a little camouflage that confused the issue for a little while, so I thought it would be worth a repeat.  We’ll start with a script to generate some test data:

rem
rem     Script:         pt_hash_cbo_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.1.0.1
rem             12.1.0.2
rem

create table t1 (
        hash_col,
        rare_col,
        n1,
        padding
)
nologging
partition by hash (hash_col)
partitions 32
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        mod(rownum,128),
        case when mod(rownum,1021) = 0 
                then rownum + trunc(dbms_random.value(-256, 256))
        end case,
        rownum,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1048576 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(hash_col, rare_col) nologging
local compress 1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                granularity => 'ALL',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve got a hash-partitioned table with 32 partitions; the partitioning key is called hash_col, and there is another column called rare_col that is almost always null – roughly 1 row in every 1,000 holds a value. I’ve added a local index on (hash_col, rare_col), compressing the leading column since hash_col is very repetitive, and gathered stats on the partitions and table. Here’s a view of the data for a single value of hash_col, and a summary report of the whole data set:

select  
        hash_col, rare_col, count(*)
from
        t1
where
        hash_col = 63
group by
        hash_col, rare_col
order by
        hash_col, rare_col
;

  HASH_COL   RARE_COL   COUNT(*)
---------- ---------- ----------
        63     109217          1
        63     240051          1
        63     370542          1
        63     501488          1
        63     631861          1
        63     762876          1
        63     893249          1
        63    1023869          1
        63                  8184

9 rows selected.

select
        count(*), ct
from    (
        select
                hash_col, rare_col, count(*) ct
        from
                t1
        group by
                hash_col, rare_col
        order by
                hash_col, rare_col
        )
group by ct
order by count(*)
;

  COUNT(*)         CT
---------- ----------
         3       8183
       125       8184
      1027          1

Given the way I’ve generated the data, any one value for hash_col will have 8,184 (or 8,183) rows where rare_col is null; but across the whole table there are 1,027 rows which have a value for both hash_col and rare_col, with just one row for each combination.

Now we get to the problem. Whenever rare_col is non null the combination of hash_col and rare_col is unique (though this wasn’t quite the case at the client site) so when we query for a given hash_col and rare_col we would hope that the optimizer would be able to estimate a cardinality of one row; but this is what we see:


variable n1 number
variable n2 number

explain plan for
select /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

========================================

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   908 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

The optimizer has predicted a massive 908 rows. A quick check of the object stats shows us that this is “number of rows in table” / “number of distinct keys in index” (1,048,576 / 1,155, rounded up).
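
The “quick check” is nothing more sophisticated than a couple of queries against the data dictionary (using the names from my model):

select  num_rows
from    user_tables
where   table_name = 'T1'
;

select  distinct_keys, num_rows
from    user_indexes
where   index_name = 'T1_I1'
;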

Any row with rare_col set to null cannot match the predicate “rare_col = :n2”, but because the optimizer is looking at the statistics of complete index entries (and there are 1048576 of them, with 1155 distinct combinations, and none that are completely null) it has lost sight of the frequency of nulls for rare_col on its own. (The same problem appears with column groups – which is what I commented on in my previous post on this topic).

I’ve often said in the past that you shouldn’t create histograms on data unless your code is going to use them. In this case I need to stop the optimizer from looking at the index.distinct_keys and one way to do that is to create a histogram on one of the columns that defines the index; and I’ve chosen to do this with a fairly arbitrary size of 10 buckets:


execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns rare_col size 10')

explain plan for
select /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

========================================

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |     1 |    12 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

Bonus observation

This problem came to my attention (and I’ve used a partitioned table in my demonstration) because I had noticed an obvious optimizer error in the client’s execution plan for a query exactly this simple. I can demonstrate the effect the client saw by running the test again without creating the histogram but declaring hash_col to be not null. Immediately after creating the index I’m going to add the line:


alter table t1 modify hash_col not null;

(The client’s system didn’t declare the column not null, but their equivalent of hash_col was part of the primary key of the table which meant it was implicitly declared not null). Here’s what my execution plan looked like with this constraint in place:


--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |    28 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

Spot the difference.

The estimate of index rowids is far smaller than the estimate of the rows that will be fetched using those rowids. This is clearly an error.

If you’re wondering how Oracle got this number, divide 908 by 32 (the number of partitions in the table) – the answer is 28.375.

Fortunately it’s (probably) an error that doesn’t matter despite looking worryingly wrong. Critically the division hasn’t changed the estimate of the number of table rows (we’ll ignore the fact that the estimate is wrong anyway thanks to a different error), and the cost of the index range scan and table access have not changed. The error is purely cosmetic in effect.

Interestingly if you modify the query to be index-only (i.e. you restrict the select list to columns in the index) this extra division disappears.

Summary

1) If you have a B-tree index where one (or more) of the columns is null for a large fraction of the entries then the optimizer may over-estimate the cardinality of a predicate of the form: “(list of all index columns) = (list of values)” as it will be using the index.distinct_keys in its calculations and ignore the effects of nulls in the individual columns. If you need to work around this issue then creating a histogram on one of the index columns will be sufficient to switch Oracle back to the strategy of multiplying the individual column selectivities.

2) There are cases of plans for accessing partitioned tables where Oracle starts by using table-level statistics to get a suitable set of estimates but then displays a plan with the estimate of rows for an index range scan scaled down by the number of partitions in the table. This results in a visible inconsistency between the index estimate and the table estimate, but it doesn’t affect the cardinality estimate for the table access or either of the associated costs – so it probably doesn’t have a destabilising effect on the plan.

December 10, 2018

Case Study

Filed under: Execution plans,Oracle,Statistics — Jonathan Lewis @ 1:10 pm GMT Dec 10,2018

A recent thread on the ODC database forum highlighted a case where the optimizer was estimating 83,000 for a particular index full scan when the SQL Monitor output for the operation showed that it was returning 11,000,000 rows.

Apart from the minor detail that the OP didn’t specifically ask a question, the information supplied was pretty good. The OP had given us a list of bind variables, with values, and the SQL statement, followed by the text output of the Monitor’ed SQL and, to get the predicate section of the plan, the output from a call to dbms_xplan. This was followed by the DDL for the critical index and a list of the stats for all the columns in the index.

Here’s the critical line of the plan (from the SQL Monitor report) followed by its predicate section (from the dbms_xplan output, but cosmetically enhanced) and some details of the columns used in the predicate:

SQL Plan Monitoring Details (Plan Hash Value=3210215320)
=================================================================================================================================================================================================================================
| Id    |            Operation            |         Name            |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write | Mem  | Temp | Activity |       Activity Detail       | Progress | 
|       |                                 |                         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |      |      |   (%)    |         (# samples)         |          |
=================================================================================================================================================================================================================================
|    11 |             INDEX FULL SCAN     | PK_HOUSEHOLD_GDC        |   83917 | 22799 |        86 |     +1 |     1 |      11M |     9 | 73728 |       |       |      |      |    24.21 | Cpu (77)                    |          |
=================================================================================================================================================================================================================================

  11 - filter(
        (    TO_DATE(:SYS_B_00||TO_CHAR("MONTH")||:SYS_B_01||TO_CHAR("YEAR"),:SYS_B_02)>=ADD_MONTHS(TRUNC(TO_DATE(:SYS_B_03,:SYS_B_04),:SYS_B_05),(-:SYS_B_06)) 
         AND TO_DATE(:SYS_B_00||TO_CHAR("MONTH")||:SYS_B_01||TO_CHAR("YEAR"),:SYS_B_02)<=TRUNC(TO_DATE(:SYS_B_07,:SYS_B_08),:SYS_B_09)-:SYS_B_10)
        )

COLUMN_NAME                    DATA_TYPE       NUM_DISTINCT  DENSITY  NUM_NULLS LAST_ANALYZED       HISTOGRAM
------------------------------ --------------- ------------ -------- ---------- ------------------- ---------------
YEAR                           NUMBER                     5        0          0 2018-12-02 13:19:10 FREQUENCY
MONTH                          NUMBER                    12        0          0 2018-12-02 13:19:10 FREQUENCY

I’ve included the full Monitor output at the end of the posting, or you could visit the ODC page if you want to see it, but if we look at just this line we can see that the index full scan starts running in the first second of the query (‘Start Active’), runs once (‘Execs’) and, as the OP said, retrieved 11M rows in that one scan compared to an estimated 83,917.

When we examine the predicate section we can understand why the optimizer could make such a large error – the SQL requires Oracle to combine two columns from the table with various bits of bind variables to construct a date, which is then compared with a couple of constant dates (derived from several other bind variables) using range-based comparisons.

This is an example of Oracle using a fixed estimate of 5% for the selectivity of “unknown range-based comparison” – but with two comparisons the selectivity becomes 5% of 5% = 0.25% (i.e. 1/400).

If we look at the column definitions and stats we see that we seem to have 5 possible years and 12 possible months (which could mean a range as small as 3 years and 2 months) – so a selectivity of 1/400 would be in the right ballpark if we were querying for a date range of roughly 4.5 days. Working the figures the other way around – if 83,917 is 1/400 of the data then there are about 33.5M rows in the table and we are querying for something more like 1/3 of the table.

Observations

I find it curious that the optimizer used an “index full scan” to fetch a huge amount of data from the index when there is no requirement for sorting (there is a subsequent “hash unique”, rather than “sort unique nosort”). I would have expected an “index fast full scan” so I am curious to know if some optimizer parameters have been fiddled with to get the optimizer to bypass the fast full scan. Possibly a change in parameter settings would result in a very different plan.

The names of the bind variables are of the form “SYS_B_nn” – which means that the original query has been subject to the effects of forced cursor sharing. Since we are apparently expecting to identify and manipulate millions of rows this looks like the type of query where you don’t want to use cursor sharing. If the session can set “cursor_sharing=exact” before running the query, or inject the hint /*+ cursor_sharing_exact */ into the query, then perhaps we’d get a better estimate of rows (and a better plan). If hinting or setting session parameters is possible then setting optimizer_dynamic_sampling to level 3, or possibly 4, might be sufficient.
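
As a sketch (not tested against the OP’s system) the session-level options would be something like:

-- stop forced cursor sharing from replacing the literals
alter session set cursor_sharing = exact;

-- or let the optimizer sample the data at optimisation time
alter session set optimizer_dynamic_sampling = 4;

The statement-level equivalents are the /*+ cursor_sharing_exact */ and /*+ dynamic_sampling(4) */ hints.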

The messy expression combining month and year is a crippling handicap to the optimizer – so fixing the query to make the literals visible isn’t actually going to help. This is Oracle 12c, though – so we could add a virtual date column (declared as invisible to avoid the threat of inserts that don’t specify column lists) and gather stats on it. The combination of virtual column and literal values might give the optimizer the information it really needs. Here’s a little script to demonstrate:


rem
rem     Script:         virtual_study.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        sysdate - (5 * 365) + rownum / 550      d1,
        to_number(
                to_char(
                        (sysdate - (5 * 365) + rownum / 550),
                        'MM'
                )
        )                                       month,
        to_number(
                to_char(
                        (sysdate - (5 * 365) + rownum / 550),
                        'YYYY'
                )
        )                                       year,
        lpad(rownum,10,'0')                     v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns month size 12 for columns year size 6'
        );
end;
/

I’ve created a table with a million rows with data going back roughly 5 years from current date, which means I need roughly 550 rows per day. I’ve then created histograms on the month and year columns to match the original posting. Now I’ll set up the bind variables and values specified by the OP and run a simple query to show the date information that the bind variables give, and the 1/400 selectivity of the OP’s predicate:


var SYS_B_00 varchar2(32);
var SYS_B_01 varchar2(32);
var SYS_B_02 varchar2(32);
var SYS_B_03 varchar2(32);
var SYS_B_04 varchar2(32);
var SYS_B_05 varchar2(32);
var SYS_B_06 number;
var SYS_B_07 varchar2(32);
var SYS_B_08 varchar2(32);
var SYS_B_09 varchar2(32);
var SYS_B_10 number;

exec :SYS_B_00:='01/';
exec :SYS_B_01:='/';
exec :SYS_B_02:='dd/MM/yyyy';
exec :SYS_B_03:='10/04/2018';
exec :SYS_B_04:='MM/dd/yyyy';
exec :SYS_B_05:='q';
exec :SYS_B_06:=12;
exec :SYS_B_07:='10/04/2018';
exec :SYS_B_08:='MM/dd/yyyy';
exec :SYS_B_09:='q';
exec :SYS_B_10:=1;

select
        to_date(:sys_b_00||to_char(month)||:sys_b_01||to_char(year),:sys_b_02)  d1, 
        add_months(trunc(to_date(:sys_b_03,:sys_b_04),:sys_b_05),(-:sys_b_06))  c1,
        to_date(:sys_b_00||to_char(month)||:sys_b_01||to_char(year),:sys_b_02)  d2,
        trunc(to_date(:sys_b_07,:sys_b_08),:sys_b_09)-:sys_b_10                 c2
from
        t1
where
        rownum = 1
;

set serveroutput off
alter session set statistics_level = all;

select  count(*)
from    t1
where
        (    to_date(:sys_b_00||to_char(month)||:sys_b_01||to_char(year),:sys_b_02) >= add_months(trunc(to_date(:sys_b_03,:sys_b_04),:sys_b_05),(-:sys_b_06)) 
         and to_date(:sys_b_00||to_char(month)||:sys_b_01||to_char(year),:sys_b_02) <= trunc(to_date(:sys_b_07,:sys_b_08),:sys_b_09)-:sys_b_10 )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

===========================================

D1        C1        D2        C2
--------- --------- --------- ---------
01-DEC-13 01-OCT-17 01-DEC-13 30-SEP-18


  COUNT(*)
----------
    200750

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:07.39 |    4980 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:07.39 |    4980 |
|*  2 |   FILTER            |      |      1 |        |    200K|00:00:06.42 |    4980 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   2500 |    200K|00:00:04.59 |    4980 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TRUNC(TO_DATE(:SYS_B_07,:SYS_B_08),:SYS_B_09)-:SYS_B_10 .ge. ADD_MON
              THS(TRUNC(TO_DATE(:SYS_B_03,:SYS_B_04),:SYS_B_05),(-:SYS_B_06)))
   3 - filter((TO_DATE(:SYS_B_00||TO_CHAR("MONTH")||:SYS_B_01||TO_CHAR("YEAR")
              ,:SYS_B_02) .ge. ADD_MONTHS(TRUNC(TO_DATE(:SYS_B_03,:SYS_B_04),:SYS_B_05),(-:SYS_B
              _06)) AND TO_DATE(:SYS_B_00||TO_CHAR("MONTH")||:SYS_B_01||TO_CHAR("YEAR"),:SYS
              _B_02) .le. TRUNC(TO_DATE(:SYS_B_07,:SYS_B_08),:SYS_B_09)-:SYS_B_10))


Note: in this and subsequent execution plans I’ve had to use .le. to represent “less than or equal to” and .ge. to represent “greater than or equal to”.

This shows us that the first row in my table has a date component of 1st Dec 2013, while the date range required by the OP was one year’s worth of data between 1st Oct 2017 and 30th Sept 2018. The optimizer’s estimate of 2,500 rows out of 1M is the 1/400 we expect.

Let’s test the effect of running the query using literals (i.e. in the OP’s environment stop the “cursor_sharing = force” effect):


select
        count(*)
from    t1
where
        (    to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy') >= add_months(trunc(to_date('10/04/2018','dd/MM/yyyy'),'q'),(-12)) 
         and to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy') <= trunc(to_date('10/04/2018','dd/MM/yyyy'),'q')-1 )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'))

========================================================

 COUNT(*)
----------
    200750


--------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   892 (100)|      1 |00:00:05.17 |    4980 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |            |      1 |00:00:05.17 |    4980 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   2500 |   892  (30)|    200K|00:00:04.30 |    4980 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((TO_DATE('01/'||TO_CHAR("MONTH")||'/'||TO_CHAR("YEAR"),'dd/MM/yyyy') .ge. TO_DAT
              E(' 2017-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TO_DATE('01/'||TO_CHAR("MONTH")||'/'||TO_CHAR("YEAR"),'dd/MM/yyyy') .le. TO_DATE(' 2018-03-31
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')))


We can see that the literals have echoed through the plan to the predicate section, but the optimizer hasn’t changed its estimate. Let’s create the virtual column, gather stats on it, and try again:


alter table t1 add v_date invisible generated always as (
        to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy')
) virtual
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns v_date size 1')

select  /* virtual column */
        count(*)
from    t1
where
        (    to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy') >= add_months(trunc(to_date('10/04/2018','dd/MM/yyyy'),'q'),(-12)) 
         and to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy') <= trunc(to_date('10/04/2018','dd/MM/yyyy'),'q')-1 )
;

 select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

=======================================================================

 COUNT(*)
----------
    200750

--------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   950 (100)|      1 |00:00:06.27 |    4980 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |            |      1 |00:00:06.27 |    4980 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    236K|   950  (34)|    200K|00:00:04.78 |    4980 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((TO_DATE('01/'||TO_CHAR("MONTH")||'/'||TO_CHAR("YEAR"),'dd/MM/yyyy') .ge. TO_DAT
              E(' 2017-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TO_DATE('01/'||TO_CHAR("MONTH")||'/'||TO_CHAR("YEAR"),'dd/MM/yyyy') .le. TO_DATE(' 2018-03-31
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')))



The optimizer sees that the expression involving month and year matches the virtual column definition, evaluates the two date expressions to produce simple constants, and gives us a cardinality estimate in the right ballpark.

Conclusion

Cursor sharing and “big” queries don’t mix. If you have queries that have to manipulate large volumes of data then the overhead of optimising each one separately is likely to be insignificant, and the threat of cardinality errors introduced by bind variables being re-used could be significant.

If you have to make use of an existing (bad) table definition, and can’t manage to write predicates that allow the optimizer to use existing column statistics, remember that you might be able to create a virtual (and invisible) column that captures the necessary definition, thereby allowing you to give Oracle some statistics about the critical predicate.

Footnote

In case you didn’t want to scan through the ODC page, here’s the full SQL Monitor output for the original query:


Global Stats
==============================================================================================
| Elapsed |   Cpu   |    IO    | Cluster  |  Other   | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes |
==============================================================================================
|     320 |      76 |      140 |       39 |       66 |     8M | 257K |   2GB |  1528 | 306MB |
==============================================================================================
 
 
SQL Plan Monitoring Details (Plan Hash Value=3210215320)
=================================================================================================================================================================================================================================
| Id    |            Operation            |         Name            |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write | Mem  | Temp | Activity |       Activity Detail       | Progress | 
|       |                                 |                         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |      |      |   (%)    |         (# samples)         |          |
=================================================================================================================================================================================================================================
|  -> 0 | SELECT STATEMENT                |                         |         |       |       180 |   +142 |     1 |        0 |       |       |       |       |      |      |          |                             |          |
|  -> 1 |   SORT UNIQUE                   |                         |    1093 | 52574 |       180 |   +142 |     1 |        0 |       |       |   534 | 107MB |   2M | 113M |     0.94 | Cpu (3)                     |          |
|  -> 2 |    NESTED LOOPS                 |                         |    1093 | 52573 |       180 |   +142 |     1 |       3M |       |       |       |       |      |      |     0.31 | Cpu (1)                     |          |
|  -> 3 |     NESTED LOOPS                |                         |    1118 | 52573 |       180 |   +142 |     1 |       3M |       |       |       |       |      |      |     0.31 | Cpu (1)                     |          |
|  -> 4 |      HASH JOIN RIGHT SEMI       |                         |    1118 | 52238 |       189 |   +133 |     1 |       3M |       |       |       |       | 153M |      |     1.57 | Cpu (5)                     |          |
|     5 |       VIEW                      |                         |    157K | 31145 |         9 |   +134 |     1 |       2M |       |       |       |       |      |      |          |                             |          |
|     6 |        WINDOW SORT              |                         |    157K | 31145 |        57 |    +86 |     1 |       4M |  3777 | 199MB |   994 | 199MB |      |      |     3.14 | Cpu (5)                     |     100% |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path read temp (5)   |          |
|     7 |         HASH JOIN               |                         |    157K | 29653 |        50 |    +85 |     1 |       4M |       |       |       |       |      |      |     1.26 | Cpu (4)                     |          |
|     8 |          VIEW                   |                         |   81771 | 23273 |         1 |    +86 |     1 |       1M |       |       |       |       |      |      |          |                             |          |
|     9 |           HASH UNIQUE           |                         |   81771 | 23273 |        75 |    +12 |     1 |       1M |       |       |       |       |      |      |     1.89 | Cpu (6)                     |          |
|    10 |            FILTER               |                         |         |       |        78 |     +9 |     1 |      11M |       |       |       |       |      |      |     0.31 | Cpu (1)                     |          |
|    11 |             INDEX FULL SCAN     | PK_HOUSEHOLD_GDC        |   83917 | 22799 |        86 |     +1 |     1 |      11M |     9 | 73728 |       |       |      |      |    24.21 | Cpu (77)                    |          |
|    12 |          INDEX FULL SCAN        | PK_ADV_HOUSEHOLD_ACCT   |      8M |  6332 |        49 |    +86 |     1 |       8M |       |       |       |       |      |      |    12.58 | gc cr block 2-way (37)      |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | gc current block 2-way (3)  |          |
| -> 13 |       INDEX FULL SCAN           | PK_ADV_HOUSEHOLD_ACCT   |      8M |  6332 |       180 |   +142 |     1 |       7M |       |       |       |       |      |      |     0.63 | Cpu (2)                     |          |
| -> 14 |      INDEX RANGE SCAN           | IDX4_LPL_BETA_CUST_RLTN |       1 |     1 |       181 |   +141 |    3M |       3M | 75759 | 592MB |       |       |      |      |    23.27 | gc current grant 2-way (1)  |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | Cpu (21)                    |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | db file parallel read (52)  |          |
| -> 15 |     TABLE ACCESS BY INDEX ROWID | IMPL_LPL_BETA_CUST_RLTN |       1 |     1 |       180 |   +142 |    3M |       3M |  177K |   1GB |       |       |      |      |    29.56 | Cpu (12)                    |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | db file parallel read (81)  |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (1) |          |
=================================================================================================================================================================================================================================

November 1, 2018

Join Cardinality – 5

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:34 pm GMT Nov 1,2018

So far in this series I’ve written about the way that the optimizer estimates cardinality for an equijoin where one end of the join has a frequency histogram and the other end has a frequency, a top-frequency, or a hybrid histogram.

It’s now time to look at a join where the other end has a height-balanced histogram. Arguably it’s not sensible to spend time writing about this since you shouldn’t be creating them in 12c (depending, instead, on the hybrid histogram that goes with the auto_sample_size), and the arithmetic is different in 11g. However, there still seem to be plenty of people running 12c but not using the auto_sample_size and that means they could be generating some height-balanced histograms – so let’s generate some data and see what happens.
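
(For contrast, the 12c-preferred call would let auto_sample_size produce a hybrid histogram on t2.j2 – something like the following sketch, which is deliberately not what the test script below does, since I want to force the height-balanced case:)

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                estimate_percent => dbms_stats.auto_sample_size,
                method_opt       => 'for all columns size 1 for columns j2 size 20'
        );
end;
/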


rem
rem     Script:         freq_hist_join_04a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4        Different results
rem

drop table t2 purge;
drop table t1 purge;

set linesize 156
set trimspool on
set pagesize 60

set feedback off

execute dbms_random.seed(0)

create table t1(
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)      
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))   j2      
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

prompt  ==========================================================
prompt  Using estimate_percent => 100 to get height-balanced in t2
prompt  ==========================================================

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                estimate_percent => 100,
                method_opt       => 'for all columns size 1 for columns j2 size 20'
        );
end;
/

As in earlier examples I’ve created some empty tables, then inserted randomly generated data (after calling the dbms_random.seed(0) function to make the data reproducible). Then I’ve gathered stats, knowing that there will be 22 distinct values in t2 so forcing a height-balanced histogram of 20 buckets to appear.

When we try to calculate the join cardinality we’re going to need various details from the histogram information – such as bucket sizes and the number of distinct values – so in the next few queries, which display that information, I’ve captured a few values into SQL*Plus variables. Here’s the basic information about the histograms on the join columns t1.j1 and t2.j2:


column num_distinct new_value m_t2_distinct
column num_rows     new_value m_t2_rows
column num_buckets  new_value m_t2_buckets
column bucket_size  new_value m_t2_bucket_size

select  table_name, column_name, histogram, num_distinct, num_buckets, density
from    user_tab_cols
where   table_name in ('T1','T2')
and     column_name in ('J1','J2')
order by
        table_name
;

select  table_name, num_rows, decode(table_name, 'T2', num_rows/&m_t2_buckets, null) bucket_size
from    user_tables
where   table_name in ('T1','T2')
order by
        table_name
;

column table_name format a3 heading "Tab"
break on table_name skip 1 on report skip 1

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
)
select f1.* from f1
union all
select f2.* from f2
order by 1,2
;


Tab                  COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS    DENSITY
-------------------- -------------------- --------------- ------------ ----------- ----------
T1                   J1                   FREQUENCY                 10          10       .005
T2                   J2                   HEIGHT BALANCED           22          20 .052652266

Tab                    NUM_ROWS BUCKET_SIZE
-------------------- ---------- -----------
T1                          100
T2                          800          40

Tab      VALUE ROW_OR_BUCKET_COUNT ENDPOINT_NUMBER
--- ---------- ------------------- ---------------
T1           2                   5               5
             5                  15              20
             7                  15              35
            10                  17              52
            12                  13              65
            15                  13              78
            17                  11              89
            20                   7              96
            22                   3              99
            25                   1             100

T2           1                   0               0
            14                   1               1
            17                   1               2
            18                   1               3
            19                   1               4
            20                   1               5
            21                   2               7
            22                   1               8
            23                   1               9
            24                   2              11
            25                   2              13
            26                   3              16
            27                   2              18
            28                   2              20

As you can see, there is a frequency histogram on t1 reporting a cumulative total of 100 rows; and the histogram on t2 is a height-balanced histogram of 20 buckets, showing 21, 24, 25, 26, 27 and 28 as popular values with 2, 2, 2, 3, 2 and 2 endpoints (i.e. buckets) respectively. You’ll also note that the t2 histogram starts with a row for bucket 0, showing us the minimum value in the column and letting us know that bucket 1 is not exclusively full of the value 14. (If 14 had been the minimum value for the column as well as an end point Oracle would not have created a bucket 0 – that may be a little detail that isn’t well-known – and it will be the subject of a little follow-up blog note.)
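
If you’d rather pick the popular values out programmatically than by eye, here’s a minimal sketch that re-uses the same user_tab_histograms query and simply filters on a derived bucket count greater than one; against the data above it should report just the six values 21, 24, 25, 26, 27 and 28:

select  popular_value, buckets
from    (
        select
                endpoint_value   popular_value,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) buckets
        from    user_tab_histograms
        where   table_name  = 'T2'
        and     column_name = 'J2'
        )
where   buckets > 1
order by
        popular_value
;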

Let’s modify the code to join the two sets of histogram data on data value – using a full outer join so we don’t lose any data, but restricting ourselves to values where the histograms overlap. We’re going to follow the idea we’ve developed in earlier postings and multiply frequencies together to derive a join frequency, so we’ll start with a simple full outer join and assume that when we find a real match value we should treat any t2 value whose height-balanced bucket count is 2 or greater as a popular value whose buckets are completely full.

I’ve also included in this query (because it had a convenient full outer join) a column selection that counts how many rows there are in t1 with values that fall inside the range of the t2 histogram but don’t match a popular value in t2.


column unmatch_ct   new_value m_unmatch_ct
column product format 999,999.99

break on report skip 1
compute sum of product on report

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
),
join1 as (
select
        f1.value t1_value, 
        f2.value t2_value, 
        f1.frequency t1_frequency, 
        f2.frequency t2_frequency, 
        sum(
                case
                        when f2.frequency > 1 and f1.frequency is not null
                                then 0
                                else f1.frequency
                end
        ) over()        unmatch_ct,
        f2.frequency * &m_t2_bucket_size *
        case
                when f2.frequency > 1 and f1.frequency is not null
                        then f1.frequency
        end     product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
--      coalesce(f1.value, f2.value) between &m_low and &m_high
order by
        coalesce(f1.value, f2.value)
)
select  *
from    join1
;

  T1_VALUE   T2_VALUE T1_FREQUENCY T2_FREQUENCY UNMATCH_CT     PRODUCT
---------- ---------- ------------ ------------ ---------- -----------
         2                       5                      99
         5                      15                      99
         7                      15                      99
        10                      17                      99
        12                      13                      99
                   14                         1         99
        15                      13                      99
        17         17           11            1         99
                   18                         1         99
                   19                         1         99
        20         20            7            1         99
                   21                         2         99
        22         22            3            1         99
                   23                         1         99
                   24                         2         99
        25         25            1            2         99       80.00
                                                           -----------
sum                                                              80.00


We captured the bucket size (&m_t2_bucket_size) for the t2 histogram as 40 in the earlier SQL, and we can see now that in the overlap range (which I’ve hard coded as 2 – 25) there are three values (21, 24 and 25) that the t2 histogram identifies as popular, but only one of them (25) corresponds to a value in the frequency histogram on t1, so the Product column shows a single value of 1 * 2 * 40 = 80. Unfortunately this is a long way off the prediction that the optimizer is going to make for the simple join. (Eventually we’ll see it’s 1,893, so we have a lot more rows to account for.)

Our code so far only accounts for values that are popular in both tables. Previous experience tells us that when a popular value exists only at one end of the join predicate we need to derive a contribution to the total prediction through an “average selectivity” calculated for the other end of the join predicate. For frequency histograms we’ve seen that “half the frequency of the least frequently occurring value” seems to be the appropriate estimate, and if we add that in we’ll get two more contributions to the total from the values 21 and 24, which appear as popular in the height-balanced (t2) histogram but don’t appear in the frequency (t1) histogram at all. Since the lowest frequency in t1 is 1 this gives us two contributions of 0.5 * 2 (buckets) * 40 (bucket size), viz: two contributions of 40, bringing our total to 160 – still a serious shortfall from Oracle’s prediction. So we need to work out how Oracle generates an “average frequency” for the non-popular values of t2 and then apply it to the 99 rows in t1 that haven’t yet been accounted for in the output above.
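
Just to keep a running total, here’s the arithmetic so far as a quick cross-check from dual (nothing new, just the figures from the paragraph above):

select
        0.5 * 2 * 40                    popular_in_t2_only,     -- values 21 and 24: half the lowest t1 frequency
        1   * 2 * 40                    popular_in_both,        -- value 25: t1 frequency 1, 2 buckets of 40 rows
        2 * (0.5 * 2 * 40) + 1 * 2 * 40 running_total           -- 40 + 40 + 80 = 160
from
        dual
;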

To calculate the “average selectivity” of a non-popular row in t2 I need a few numbers (some of which I’ve already acquired above): the total number of rows in the table (NR), the number of distinct values (NDV), and the number of popular values (NPV), from which we can derive the number of distinct non-popular values and the number of rows for the non-popular values. The model that Oracle uses to derive these numbers is simply to assume that a value is popular if its frequency in the histogram is greater than one, and that the number of rows for a popular value is “frequency * bucket size”.

The first query we ran against the t2 histogram showed 6 popular values, accounting for 13 buckets of 40 rows each. We reported 22 distinct values for the column and 800 rows for the table so the optimizer assumes the non-popular values account for (22 – 6) = 16 distinct values and (800 – 13 * 40) = 280 rows. So the selectivity of non-popular values is (280/800) * (1/16) = 0.021875. This needs to be multiplied by the 99 rows in t1 which don’t match a popular value in t2 – so we now need to write some SQL to derive that number.
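
As a quick sanity check of that arithmetic, here it is again in SQL form (the figures are exactly the ones quoted above):

select
        800 - 13 * 40                           non_popular_rows,        -- NR minus the rows covered by popular buckets
        22 - 6                                  non_popular_ndv,         -- NDV minus the number of popular values
        ((800 - 13 * 40) / 800) / (22 - 6)      non_popular_selectivity  -- (280/800) * (1/16) = 0.021875
from
        dual
;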

We could enhance our earlier full outer join and slot 0.5, 99, and 0.021875 into it as “magic” constants. Rather than do that though I’m going to write a couple of messy queries to derive the values (and the low/high range we’re interested in) so that I will be able to tweak the data later on and see if the formula still produces the right answer.


column range_low    new_value m_low
column range_high   new_value m_high
column avg_t1_freq  new_value m_avg_t1_freq
column new_density  new_value m_avg_t2_dens

with f1 as (
        select  endpoint_value ep_val,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
        from    user_tab_histograms
        where   table_name  = 'T1'
        and     column_name = 'J1'
),
f2 as (
        select  endpoint_value ep_val,
                endpoint_number ep_num,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
        from    user_tab_histograms
        where   table_name  = 'T2'
        and     column_name = 'J2'
)
select
        max(min_v) range_low, min(max_v) range_high, min(min_f)/2 avg_t1_freq, max(new_density) new_density
from    (
        select  min(ep_val) min_v, max(ep_val) max_v, min(frequency) min_f, to_number(null) new_density
        from f1
        union all
        select  min(ep_val) min_v, max(ep_val) max_v, null           min_f,
                (max(ep_num) - sum(case when frequency > 1 then frequency end)) /
                (
                        max(ep_num) *
                        (&m_t2_distinct - count(case when frequency > 1 then 1 end))
                )       new_density
        from    f2
        )
;

 RANGE_LOW RANGE_HIGH AVG_T1_FREQ NEW_DENSITY
---------- ---------- ----------- -----------
         2         25          .5     .021875


This query finds the overlap by querying the two histograms and reporting the lower of the two high values and the higher of the two low values. It also reports the minimum frequency from the frequency histogram divided by 2, and calculates the “new density” as the number of non-popular rows divided by (the total number of rows multiplied by the number of distinct non-popular values). (Note that I’ve picked up the number of distinct values in t2.j2 as a substitution variable generated by one of my earlier queries.) In my full script this messy piece of code runs before the query I showed earlier that told us how well (or badly) the two histograms matched.
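
One point worth noting: the “new density” in this query is calculated from bucket counts rather than row counts, i.e. (20 buckets minus 13 popular buckets) divided by (20 buckets times 16 non-popular distinct values). Since every bucket represents the same 40 rows the bucket size cancels out and the result matches the row-based arithmetic above; a quick check from dual, using only the numbers we’ve already seen:

select
        (20 - 13) / (20 * (22 - 6))             bucket_based_density,   -- (total buckets - popular buckets) / (total buckets * non-popular ndv)
        ((800 - 13 * 40) / 800) / (22 - 6)      row_based_density       -- (non-popular rows / total rows) / non-popular ndv
from
        dual
;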

Finally we can use the various values we’ve picked up in a slightly more complex version of the full outer join – with a special row added through a union all to give us the final estimate:


break on report skip 1
compute sum of product on report

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
),
join1 as (
select
        f1.value t1_value, f2.value t2_value,
        f1.frequency t1_frequency, f2.frequency t2_frequency,
        f2.frequency *
        case
                when f2.frequency > 1 and f1.frequency is not null
                        then f1.frequency
                when f2.frequency > 1 and f1.frequency is null
                        then &m_avg_t1_freq
        end *
        &m_t2_bucket_size        product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between &m_low and &m_high
order by
        coalesce(f1.value, f2.value)
)
select  *
from    join1
union all
select
        null,
        &m_avg_t2_dens,
        &m_unmatch_ct,
        &m_t2_rows * &m_avg_t2_dens,
        &m_t2_rows * &m_avg_t2_dens * &m_unmatch_ct
from
        dual
;


  T1_VALUE   T2_VALUE T1_FREQUENCY T2_FREQUENCY     PRODUCT
---------- ---------- ------------ ------------ -----------
         2                       5
         5                      15
         7                      15
        10                      17
        12                      13
                   14                         1
        15                      13
        17         17           11            1
                   18                         1
                   19                         1
        20         20            7            1
                   21                         2       40.00
        22         22            3            1
                   23                         1
                   24                         2       40.00
        25         25            1            2       80.00
              .021875           99         17.5    1,732.50
                                                -----------
sum                                                1,892.50


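Before asking the optimizer, here’s the whole estimate reassembled by hand from the pieces reported above (a sketch only, repeating figures we’ve already derived):

select
        1   * 2 * 40                                    match_25,       -- popular in both histograms
        2 * (0.5 * 2 * 40)                              match_21_24,    -- popular only in t2
        99 * 800 * 0.021875                             non_popular,    -- unmatched t1 rows * t2 rows * new density
        1*2*40 + 2*(0.5*2*40) + 99*800*0.021875         total_estimate  -- 80 + 80 + 1,732.5 = 1,892.5
from
        dual
;
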
It remains only to check what the optimizer thinks the cardinality will be on a simple join, and then modify the data slightly to see if the string of queries continues to produce the right result. Here’s a starting test:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';
alter session set tracefile_identifier='BASELINE';

select
        count(*)
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';


 COUNT(*)
----------
      1327


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      41 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1893 |   1327 |00:00:00.01 |      41 |  2545K|  2545K| 1367K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       7 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."J1"="T2"."J2")

The E-rows for the hash join operation reports 1893 – and a quick check of the 10053 trace file shows that this is 1892.500000 rounded – a perfect match for the result from my query. I’ve modified the data in various ways (notably updating the t1 table to change the value 25 (i.e. the current maximum value of j1) to other, lower, values) and the algorithm in the script seems to be sound – for 12c and 18c. I won’t be surprised, however, if someone comes up with a data pattern where the wrong estimate appears.
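
For anyone who wants to repeat that kind of experiment, here’s a minimal sketch of the sort of change I mean (the specific replacement value 21 is just an illustration, not part of the original test):

-- move the old t1 maximum down into t2's popular range, then regather

update  t1
set     j1 = 21
where   j1 = 25
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
end;
/

Re-run the histogram queries and the join after each change and compare the derived estimate with the new E-Rows figure.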

Don’t look back

Upgrades are a pain. With the same data set and same statistics on 11.2.0.4, running the same join query between t1 and t2, here’s the execution plan I got:


SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      12 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      12 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1855 |   1327 |00:00:00.01 |      12 |  2440K|  2440K| 1357K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       6 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

Notice that the E-Rows value is different. The join cardinality algorithm seems to have changed in the upgrade from 11.2.0.4 to 12c. I haven’t quite figured out how to get to the 11g result, but I seem to get quite close most of the time by making a simple change to the final query that I used to predict the optimizer’s estimate. In the case expression that chooses between the actual t1.j1 frequency and the “average frequency”, don’t choose; just use the latter:


        case
                when f2.frequency > 1 and f1.frequency is not null
                        -- then f1.frequency    -- 12c
                        then &m_avg_t1_freq     -- 11g
                when f2.frequency > 1 and f1.frequency is null
                        then &m_avg_t1_freq
        end *
 

As I modified the t1 row with the value 25 to hold other values, this change kept producing results that were exactly 2, 2.5, or 3.0 different from the execution plan E-Rows – except in one case where the error was exactly 15.5 (which looks suspiciously like 17.5, the “average frequency in t2”, minus 2). I’m not keen to spend time trying to work out exactly what’s going on, but the takeaway from this change is that anyone upgrading from 11g to 12c may find that some of their queries change plans because they happen to match the type of example I’ve been working with in this post.

In some email I exchanged with Chinar Aliyev he suggested three fix controls that might be relevant. I’ve added these to an earlier posting I did when I first hit the anomaly a few days ago, but I’ll repeat them here. I will be testing their effects at some point in the not too distant future (there’s a sketch of how such a test might start after the list):

14033181 1 QKSFM_CARDINALITY_14033181   correct ndv for non-popular values in join cardinality comp.         (12.1.0.1)
19230097 1 QKSFM_CARDINALITY_19230097   correct join card when popular value compared to non popular         (12.2.0.1)
22159570 1 QKSFM_CARDINALITY_22159570   correct non-popular region cardinality for hybrid histogram          (12.2.0.1)
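
For what it’s worth, here’s a sketch of how such a test might start: toggle one of the fix controls at the session level through the “_fix_control” parameter and re-run the join. I haven’t yet verified which (if any) of the three changes the estimate.

-- temporarily disable the 12.2 "popular vs. non-popular" correction
alter session set "_fix_control" = '19230097:0';

select  count(*)
from    t1, t2
where   t1.j1 = t2.j2
;

-- and switch it back on
alter session set "_fix_control" = '19230097:1';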