Oracle Scratchpad

November 15, 2018

num_index_keys

Filed under: 12c,Bugs,CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:13 pm GMT Nov 15,2018

The title is the name of an Oracle hint that came into existence in Oracle 10.2.0.3 and made an appearance recently in a question on the rarely used “My Oracle Support” Community forum (you’ll need a MOS account to be able to read the original). I wouldn’t have found it but the author also emailed me the link asking if I could take a look at it.  (If you want to ask me for help – without paying me, that is – then posting a public question in the Oracle (ODC) General Database or SQL forums and emailing me a private link is the strategy most likely to get an answer, by the way.)

The question was about a very simple query using a straightforward index – with a quirky change of plan after upgrading from 10.2.0.3 to 12.2.0.1. Setting the optimizer_features_enable to ‘10.2.0.3’ in the 12.2.0.1 system re-introduced the 10g execution plan. Here’s the query:


SELECT t1.*
   FROM   DW1.t1
  WHERE   t1.C1 = '0001' 
    AND   t1.C2 IN ('P', 'F', 'C')
    AND   t1.C3 IN (
                    '18110034450001',
                    '18110034450101',
                    '18110034450201',
                    '18110034450301',
                    '18110034450401',
                    '18110034450501'
          );
 

Information supplied: t1 holds about 500 million rows at roughly 20 rows per block, the primary key index is (c1, c2, c3, c4), there are just a few values for each of c1, c2 and c4, while c3 is “nearly unique” (which, for clarity, was expanded to “the number of distinct values of c3 is virtually the same as the number of rows in the table”).

At the moment we don’t have any information about histograms and we don’t know whether or not “nearly unique” might still allow a few values of c3 to have a large number of duplicates, so that’s something we might want to follow up on later.
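
If we did want to follow up, a couple of simple checks would do it. This is just a sketch – the owner, table and column names are taken from the query above, and on a 500 million row table you’d probably want to sample rather than aggregate the whole of c3:

select  column_name, num_distinct, num_nulls, histogram
from    dba_tab_cols
where   owner       = 'DW1'
and     table_name  = 'T1'
and     column_name in ('C1','C2','C3','C4')
order by
        column_name
;

select  c3, count(*) duplicates
from    dw1.t1
group by
        c3
having  count(*) > 10
order by
        count(*) desc
;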

Here are the execution plans – the fast one (from 10g) first, then the slow (12c) plan – and you should look carefully at the predicate section of the two plans:


10g (pulled from memory with rowsource execution statistics enabled)
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   1 |  INLIST ITERATOR             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |     18 |      5 |      6 |00:00:00.01 |      58 |      5 |
|*  3 |    INDEX RANGE SCAN          | PK_T1            |     18 |      5 |      6 |00:00:00.01 |      52 |      4 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P')) AND (("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR "C3"='18110034450401' OR
              "C3"='18110034450501')))

 

12c (from explain plan)
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     1 |   359 |     7   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                  |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1               |     1 |   359 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | PK_T1            |     1 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND ("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P'))
       filter("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR
              "C3"='18110034450401' OR "C3"='18110034450501')
  

When comparing plans it’s better, of course, to present the same sources from the two systems; it’s not entirely helpful to have the plan generated by explain plan in one version and a run-time plan with stats in the other – given the choice I’d like to see the run-time plan from both. Despite this, I felt fairly confident that the prediction would match the run-time for 12c and that I could at least guess the “starts” figure for 12c.

The important thing to notice is the way that the access predicate in 10g has split into an access predicate followed by a filter predicate in 12c. So 12c is going to iterate three times (once for each of the values  ‘C’, ‘F’, ‘P’) and then walk a potentially huge linked list of index leaf blocks looking for 6 values of c3, while 10g is going to probe the index 18 times (3 combinations of c2 x six combinations of c3) to find “nearly unique” rows which means probably one leaf block per probe.

The 12c plan was taking minutes to run, the 10g plan was taking less than a second. The difference in execution time was probably the effect of the 12c plan ranging through (literally) thousands of index leaf blocks.
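
As a rough sanity check of that claim – just a sketch, using only the figures supplied and assuming a reasonably even spread of rows across the combinations of (c1, c2) – you could estimate how much of the index a single iteration of the 12c plan has to walk:

select
        round(t.num_rows    / (c1.num_distinct * c2.num_distinct)) index_entries_per_iteration,
        round(i.leaf_blocks / (c1.num_distinct * c2.num_distinct)) leaf_blocks_per_iteration
from
        dba_tables      t,
        dba_indexes     i,
        dba_tab_cols    c1,
        dba_tab_cols    c2
where
        t.owner  = 'DW1' and t.table_name  = 'T1'
and     i.owner  = 'DW1' and i.table_name  = 'T1' and i.index_name  = 'PK_T1'
and     c1.owner = 'DW1' and c1.table_name = 'T1' and c1.column_name = 'C1'
and     c2.owner = 'DW1' and c2.table_name = 'T1' and c2.column_name = 'C2'
;

With 500 million rows and only a handful of distinct values for c1 and c2 that’s going to be thousands of leaf blocks per iteration.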

There are many bugs and anomalies relating to in-list iteration and index range scans and cardinality calculations – here’s a quick sample of v$system_fix_control in 12.2.0.1:


select optimizer_feature_enable ofe, sql_feature, bugno, description
from v$system_fix_control
where
	optimizer_feature_enable between '10.2.0.4' and '12.2.0.1'
and	(   sql_feature like '%CBO%'
	 or sql_feature like '%CARDINALITY%'
	)
and	(    lower(description) like '%list%'
	 or  lower(description) like '%iterat%'
	 or  lower(description) like '%multi%col%'
	)
order by optimizer_feature_enable, sql_feature, bugno
;

OFE        SQL_FEATURE                      BUGNO DESCRIPTION
---------- --------------------------- ---------- ----------------------------------------------------------------
10.2.0.4   QKSFM_CBO_5259048              5259048 undo unused inlist
           QKSFM_CBO_5634346              5634346 Relax equality operator restrictions for multicolumn inlists

10.2.0.5   QKSFM_CBO_7148689              7148689 Allow fix of bug 2218788 for in-list predicates

11.1.0.6   QKSFM_CBO_5139520              5139520 kkoDMcos: For PWJ on list dimension, use part/subpart bits

11.2.0.1   QKSFM_CBO_6818410              6818410 eliminate redundant inlist predicates

11.2.0.2   QKSFM_CBO_9069046              9069046 amend histogram column tracking for multicolumn stats

11.2.0.3   QKSFM_CARDINALITY_11876260    11876260 use index filter inlists with extended statistics
           QKSFM_CBO_10134677            10134677 No selectivity for transitive inlist predicate from equijoin
           QKSFM_CBO_11834739            11834739 adjust NDV for list partition key column after pruning
           QKSFM_CBO_11853331            11853331 amend index cost compare with inlists as filters
           QKSFM_CBO_12591120            12591120 check inlist out-of-range values with extended statistics

11.2.0.4   QKSFM_CARDINALITY_12828479    12828479 use dynamic sampling cardinality for multi-column join key check
           QKSFM_CARDINALITY_12864791    12864791 adjust for NULLs once for multiple inequalities on nullable colu
           QKSFM_CARDINALITY_13362020    13362020 fix selectivity for skip scan filter with multi column stats
           QKSFM_CARDINALITY_14723910    14723910 limit multi column group selectivity due to NDV of inlist column
           QKSFM_CARDINALITY_6873091      6873091 trim histograms based on in-list predicates
           QKSFM_CBO_13850256            13850256 correct estimates for transitive inlist predicate with equijoin

12.2.0.1   QKSFM_CARDINALITY_19847091    19847091 selectivity caching for inlists
           QKSFM_CARDINALITY_22533539    22533539 multi-column join sanity checks for table functions
           QKSFM_CARDINALITY_23019286    23019286 Fix cdn estimation with multi column stats on fixed data types
           QKSFM_CARDINALITY_23102649    23102649 correction to inlist element counting with constant expressions
           QKSFM_CBO_17973658            17973658 allow partition pruning due to multi-inlist iterator
           QKSFM_CBO_21057343            21057343 order predicate list
           QKSFM_CBO_22272439            22272439 correction to inlist element counting with bind variables

There are also a number of system parameters relating to inlists that are new (or have changed values) in 12.2.0.1 when compared with 10.2.0.3 – but I’m not going to go into those right now.
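
If you want to see which parameters might be involved you can scan the parameter names and descriptions – a sketch only, since the x$ structures are undocumented and you need to be connected as SYS to query them:

select
        i.ksppinm       name,
        c.ksppstvl      value,
        i.ksppdesc      description
from
        x$ksppi         i,
        x$ksppcv        c
where
        c.indx = i.indx
and     (   lower(i.ksppinm)  like '%inlist%'
         or lower(i.ksppdesc) like '%inlist%'
        )
order by
        i.ksppinm
;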

I was sufficiently curious about this anomaly that I emailed the OP to say I would be happy to take a look at the 10053 trace files for the query – the files probably weren’t going to be very large given that it was only a single table query – but in the end it turned out that I solved the problem before he’d had time to email them. (Warning – don’t email me a 10053 file on spec; if I want one I’ll ask for it.)

Based on the description I created an initial model of the problem – it took about 10 minutes to code:


rem     Tested on 12.2.0.1, 18.3.0.1

drop table t1 purge;

create table t1 (
	c1 varchar2(4) not null,
	c2 varchar2(1) not null,
	c3 varchar2(15) not null,
	c4 varchar2(4)  not null,
	v1 varchar2(250)
)
;

insert into t1
with g as (
	select rownum id 
	from dual
	connect by level <= 1e4 -- > hint to avoid wordpress format issue
)
select
	'0001',
	chr(65 + mod(rownum,11)),
	'18110034'||lpad(1+100*rownum,7,'0'),
	lpad(mod(rownum,9),4,'0'),
	rpad('x',250,'x')
from
	g,g
where
        rownum <= 1e5 -- > hint to avoid wordpress format issue
;


create unique index t1_i1 on t1(c1, c2, c3, c4);

begin
        dbms_stats.gather_table_stats(
                null,
                't1',
                method_opt => 'for all columns size 1'
        );
end;
/

alter session set statistics_level = all;
set serveroutput off

prompt	==========================
prompt	Default optimizer features
prompt	==========================

select
        /*+ optimizer_features_enable('12.2.0.1') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

select 
        /*+ optimizer_features_enable('10.2.0.3') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));


The two queries produced the same plan – regardless of the setting for optimizer_features_enable – it was the plan originally used by the OP’s 10g setting:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    20 (100)|      0 |00:00:00.01 |      35 |
|   1 |  INLIST ITERATOR             |       |      1 |        |            |      0 |00:00:00.01 |      35 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     18 |      2 |    20   (0)|      0 |00:00:00.01 |      35 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     18 |      2 |    19   (0)|      0 |00:00:00.01 |      35 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')) AND
              (("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501')))

There was one important difference between the 10g and the 12c plans – in 10g the cost of the table access (hence the cost of the total query) was 20; in 12c it jumped to 28 – maybe there’s a change in the arithmetic for costing the iterator, and maybe that’s sufficient to cause a problem.

Before going further it’s worth checking what the costs would look like (and, indeed, whether the plan is even possible in both versions) if we force Oracle into the “bad” plan. That’s where we finally get to the hint in the title of this piece. If I add the hint /*+ num_index_keys(t1 t1_i1 2) */ what’s going to happen? (Technically I’ve also included a hint to use the index, and specified the query block name, to make sure Oracle doesn’t decide to switch to a tablescan):


select
        /*+
            optimizer_features_enable('12.2.0.1')
            index_rs_asc(@sel$1 t1@sel$1 (t1.c1 t1.c2 t1.c3 t1.c4))
            num_index_keys(@sel$1 t1@sel$1 t1_i1 2)
        */
        t1.*
FROM        t1
WHERE
        t1.c1 = '0001'
AND        t1.c2 in ('H', 'F', 'C')
AND        t1.c3 in (
                '18110034450001',
                '18110034450101',
                '18110034450201',
                '18110034450301',
                '18110034450401',
                '18110034450501'
        )
;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   150 (100)|      0 |00:00:00.01 |     154 |      1 |
|   1 |  INLIST ITERATOR                     |       |      1 |        |            |      0 |00:00:00.01 |     154 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      3 |     18 |   150   (2)|      0 |00:00:00.01 |     154 |      1 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |      3 |     18 |   142   (3)|      0 |00:00:00.01 |     154 |      1 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')))
       filter(("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501'))

This was the plan from 12.2.0.1 – and again the plan for 10.2.0.3 was identical except for costs which became 140 for the index range scan and 141 for the table access. At first sight it looks like 10g may be using the total selectivity of the entire query as the scaling factor for the index clustering_factor to find the table cost while 12c uses the cost of accessing the table for one iteration (rounding up) before multiplying by the number of iterations.

Having observed this detail I thought I’d do a quick test of what happened by default if I requested 145 distinct values of c3. Both versions defaulted to the access/filter path rather than the pure access path – but again there was a difference in costs. The 10g index cost was 140 with a table access cost of 158, while 12c had an index cost of 179 and a table cost of 372. So both versions switch plans at some point – but do they switch at the same point? Reader, I could not resist temptation, so I ran a test loop. With my data set the 12c version switched paths at 61 values in the in-list and 10g switched at 53 values.
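
The loop itself is easy to automate – here’s a sketch (not the exact script I used) that builds a steadily growing c3 in-list against the model table, explains each statement, and reports the point at which c3 drops out of the access predicate and becomes a filter predicate on the index:

set serveroutput on

declare
        m_sql           varchar2(32767);
        m_filter        varchar2(4000);
begin
        for n in 2 .. 100 loop

                -- build the statement with n values in the c3 in-list
                m_sql := 'select t1.* from t1 where t1.c1 = ''0001'' and t1.c2 in (''H'',''F'',''C'') and t1.c3 in (';
                for i in 1 .. n loop
                        m_sql := m_sql || case when i > 1 then ',' end ||
                                 '''' || to_char(18110034450001 + 100 * (i - 1)) || '''';
                end loop;
                m_sql := m_sql || ')';

                delete from plan_table;
                execute immediate 'explain plan for ' || m_sql;

                -- does c3 now appear in a filter predicate on the index line of the plan ?
                select  max(filter_predicates)
                into    m_filter
                from    plan_table
                where   operation = 'INDEX'
                ;

                if m_filter like '%C3%' then
                        dbms_output.put_line('Switched to access/filter at ' || n || ' values in the list');
                        exit;
                end if;

        end loop;
end;
/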

Conclusion: there’s been a change in the selectivity calculations for the use of in-list iterators, which leads to a change in costs, which can lead to a change in plans; the OP was just unlucky with his data set and stats. Possibly there’s something about his data or stats that makes the switch appear with a much smaller in-list than mine.

Footnote:

When I responded to the thread on MOSC with the suggestion that the problem was in part due to statistics and might be affected by out of date stats (or a histogram on the (low-frequency) c2 column) the OP noted that stats hadn’t been gathered since some time in August – and found that the 12c path changed to the efficient (10g) one after re-gathering stats on the table.
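
For the record the fix was nothing more subtle than a fresh call to gather stats – something like the following, though the method_opt here is purely an assumption on my part; the key point was simply that the stats were out of date:

begin
        dbms_stats.gather_table_stats(
                ownname    => 'DW1',
                tabname    => 'T1',
                method_opt => 'for all columns size auto'
        );
end;
/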

 

November 8, 2018

Where / Having

Filed under: CBO,Conditional SQL,Execution plans,Oracle — Jonathan Lewis @ 12:11 pm GMT Nov 8,2018

There’s a very old mantra about the use of the “having” clause that tells us that if it’s valid (i.e. will always give the same results) then any predicate that could be moved from the having clause to the where clause should be moved. In recent versions of Oracle the optimizer will do this for itself in some cases but (for reasons that I’m not going to mention) I came across a silly example recently where a little manual editing produced a massive performance improvement.

Here’s a quick demo:


rem
rem     Script:         where_having.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

create table t1
as
select * 
from all_objects 
where rownum <= 50000   -- > comment to avoid WordPress format issue
;

spool where_having.lst

set serveroutput off

select /*+ gather_plan_statistics */ 
        object_type, count(*) 
from    t1 
group by 
        object_type 
having  count(*) > 0 
and     1 = 2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
;

The big question is: will Oracle do a full tablescan of t1, or will it apply a “null is not null” filter early to bypass that part of the plan? Here’s the plan pulled from memory, with run-time statistics (all versions from 11g to 18c):


--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      0 |00:00:00.02 |     957 |    955 |       |       |          |
|*  1 |  FILTER             |      |      1 |        |      0 |00:00:00.02 |     957 |    955 |       |       |          |
|   2 |   HASH GROUP BY     |      |      1 |      1 |     27 |00:00:00.02 |     957 |    955 |  1186K|  1186K| 1397K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |  50000 |  50000 |00:00:00.01 |     957 |    955 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((COUNT(*)>0 AND 1=2))


As you can see, the filter at operation 1 includes the contradiction “1=2”, but Oracle tests this only after doing the full tablescan and aggregation. If you move the “1=2” into the where clause the tablescan doesn’t happen.
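
For completeness, the manual edit is simply the same statement with the contradiction moved into the where clause:

select /*+ gather_plan_statistics */
        object_type, count(*)
from    t1
where   1 = 2
group by
        object_type
having  count(*) > 0
;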

Interestingly, if you write the query with an in-line view and trailing where clause:


select /*+ gather_plan_statistics */
        *
from    (
        select
                object_type, count(*)
        from    t1
        group by
                object_type
        having  count(*) > 0
        )
where
        1 = 2
;

The optimizer is clever enough to push the final predicate inside the view (where you might expect it to become part of the having clause) and push it all the way down into a where clause on the base table.


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      0 |00:00:00.01 |
|*  1 |  FILTER              |      |      1 |        |      0 |00:00:00.01 |
|   2 |   HASH GROUP BY      |      |      1 |      1 |      0 |00:00:00.01 |
|*  3 |    FILTER            |      |      1 |        |      0 |00:00:00.01 |
|   4 |     TABLE ACCESS FULL| T1   |      0 |  50000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(COUNT(*)>0)
   3 - filter(NULL IS NOT NULL)



A quirky case of the optimizer handling the (apparently) more complex query better than it handles the simpler one.

November 1, 2018

Join Cardinality – 5

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:34 pm GMT Nov 1,2018

So far in this series I’ve written about the way that the optimizer estimates cardinality for an equijoin where one end of the join has a frequency histogram and the other end has a histogram of type frequency, top-frequency, or hybrid.

It’s now time to look at a join where the other end has a height-balanced histogram. Arguably it’s not sensible to spend time writing about this since you shouldn’t be creating them in 12c (depending, instead, on the hybrid histogram that goes with the auto_sample_size), and the arithmetic is different in 11g. However, there still seem to be plenty of people running 12c but not using the auto_sample_size and that means they could be generating some height-balanced histograms – so let’s generate some data and see what happens.


rem
rem     Script:         freq_hist_join_04a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4        Different results
rem

drop table t2 purge;
drop table t1 purge;

set linesize 156
set trimspool on
set pagesize 60

set feedback off

execute dbms_random.seed(0)

create table t1(
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)      
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))   j2      
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

prompt  ==========================================================
prompt  Using estimate_percent => 100 to get height-balanced in t2
prompt  ==========================================================

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                estimate_percent => 100,
                method_opt       => 'for all columns size 1 for columns j2 size 20'
        );
end;
/

As in earlier examples I’ve created some empty tables, then inserted randomly generated data (after calling the dbms_random.seed(0) function to make the data reproducible). Then I’ve gathered stats, knowing that there will be 22 distinct values in t2 so forcing a height-balanced histogram of 20 buckets to appear.

When we try to calculate the join cardinality we’re going to need various details from the histogram information, such as bucket sizes, number of distinct values, and so on, so in the next few queries to display the histogram information I’ve captured a few values into SQL*Plus variables. Here’s the basic information about the histograms on the join columns t1.j1 and t2.j2:


column num_distinct new_value m_t2_distinct
column num_rows     new_value m_t2_rows
column num_buckets  new_value m_t2_buckets
column bucket_size  new_value m_t2_bucket_size

select  table_name, column_name, histogram, num_distinct, num_buckets, density
from    user_tab_cols
where   table_name in ('T1','T2')
and     column_name in ('J1','J2')
order by
        table_name
;

select  table_name, num_rows, decode(table_name, 'T2', num_rows/&m_t2_buckets, null) bucket_size
from    user_tables
where   table_name in ('T1','T2')
order by
        table_name
;

column table_name format a3 heading "Tab"
break on table_name skip 1 on report skip 1

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
)
select f1.* from f1
union all
select f2.* from f2
order by 1,2
;


Tab                  COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS    DENSITY
-------------------- -------------------- --------------- ------------ ----------- ----------
T1                   J1                   FREQUENCY                 10          10       .005
T2                   J2                   HEIGHT BALANCED           22          20 .052652266

Tab                    NUM_ROWS BUCKET_SIZE
-------------------- ---------- -----------
T1                          100
T2                          800          40

Tab      VALUE ROW_OR_BUCKET_COUNT ENDPOINT_NUMBER
--- ---------- ------------------- ---------------
T1           2                   5               5
             5                  15              20
             7                  15              35
            10                  17              52
            12                  13              65
            15                  13              78
            17                  11              89
            20                   7              96
            22                   3              99
            25                   1             100

T2           1                   0               0
            14                   1               1
            17                   1               2
            18                   1               3
            19                   1               4
            20                   1               5
            21                   2               7
            22                   1               8
            23                   1               9
            24                   2              11
            25                   2              13
            26                   3              16
            27                   2              18
            28                   2              20

As you can see, there is a frequency histogram on t1 reporting a cumulative total of 100 rows; and the histogram on t2 is a height-balanced histogram of 20 buckets, showing 21, 24, 25, 26, 27 and 28 as popular values with 2, 2, 2, 3, 2 and 2 endpoints (i.e. buckets) respectively. You’ll also note that the t2 histogram starts with a “bucket 0” row (row/bucket count 0) showing us the minimum value in the column and letting us know that bucket 1 is not exclusively full of the value 14. (If 14 had been the minimum value for the column as well as an end point Oracle would not have created a bucket 0 – that may be a little detail that isn’t well-known – and will be the subject of a little follow-up blog note.)
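
If you want to check that little detail for yourself, a quick sketch of a query compares the actual column minimum with the value recorded against bucket 0:

select
        (select min(j2) from t2)        actual_min,
        endpoint_value                  bucket_0_value
from
        user_tab_histograms
where
        table_name      = 'T2'
and     column_name     = 'J2'
and     endpoint_number = 0
;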

Let’s modify the code to join the two sets of histogram data on data value – using a full outer join so we don’t lose any data but restricting ourselves to values where the histograms overlap. We’re going to follow the idea we’ve developed in earlier postings and multiply frequencies together to derive a join frequency, so we’ll start with a simple full outer join and assume that when we find a real match value we should behave as if the height-balanced (t2) buckets where the bucket count is 2 or greater represent completely full buckets and are popular values.

I’ve also included in this query (because it had a convenient full outer join) a column selection that counts how many rows there are in t1 with values that fall inside the range of the t2 histogram but don’t match a popular value in t2.


column unmatch_ct   new_value m_unmatch_ct
column product format 999,999.99

break on report skip 1
compute sum of product on report

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
),
join1 as (
select
        f1.value t1_value, 
        f2.value t2_value, 
        f1.frequency t1_frequency, 
        f2.frequency t2_frequency, 
        sum(
                case
                        when f2.frequency > 1 and f1.frequency is not null
                                then 0
                                else f1.frequency
                end
        ) over()        unmatch_ct,
        f2.frequency * &m_t2_bucket_size *
        case
                when f2.frequency > 1 and f1.frequency is not null
                        then f1.frequency
        end     product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
--      coalesce(f1.value, f2.value) between &m_low and &m_high
order by
        coalesce(f1.value, f2.value)
)
select  *
from    join1
;

  T1_VALUE   T2_VALUE T1_FREQUENCY T2_FREQUENCY UNMATCH_CT     PRODUCT
---------- ---------- ------------ ------------ ---------- -----------
         2                       5                      99
         5                      15                      99
         7                      15                      99
        10                      17                      99
        12                      13                      99
                   14                         1         99
        15                      13                      99
        17         17           11            1         99
                   18                         1         99
                   19                         1         99
        20         20            7            1         99
                   21                         2         99
        22         22            3            1         99
                   23                         1         99
                   24                         2         99
        25         25            1            2         99       80.00
                                                           -----------
sum                                                              80.00


We captured the bucket size (&m_t2_bucket_size) for the t2 histogram as 40 in the earlier SQL, and we can see now that in the overlap range (which I’ve hard coded as 2 – 25) we have three popular values in t2 (21, 24 and 25), but only one of them corresponds to a value in the frequency histogram on t1, so the Product column shows a value of 1 * 2 * 40 = 80. Unfortunately this is a long way off the prediction that the optimizer is going to make for the simple join. (Eventually we’ll see it’s 1,893, so we have a lot more rows to account for.)

Our code so far only accounts for items that are popular in both tables. Previous experience tells us that when a popular value exists only at one end of the join predicate we need to derive a contribution to the total prediction through an “average selectivity” calculated for the other end of the join predicate. For frequency histograms we’ve seen that “half the number of the least frequently occurring value” seems to be the appropriate frequency estimate, and if we add that in we’ll get two more contributions to the total from the values 21 and 24 which appear in the height-balanced (t2) histogram as popular but don’t appear in the frequency (t1) histogram. Since the lowest frequency in t1 is 1 this would give us two contributions of 0.5 * 2 (buckets) * 40 (bucket size), viz. two contributions of 40, bringing our total to 160 – still a serious shortfall from Oracle’s prediction. So we need to work out how Oracle generates an “average frequency” for the non-popular values of t2 and then apply it to the 99 rows in t1 that haven’t yet been accounted for in the output above.

To calculate the “average selectivity” of a non-popular row in t2 I need a few numbers (some of which I’ve already acquired above): the total number of rows in the table (NR), the number of distinct values (NDV), and the number of popular values (NPV), from which we can derive the number of distinct non-popular values and the number of rows for the non-popular values. The model that Oracle uses to derive these numbers is simply to assume that a value is popular if its frequency in the histogram is greater than one and that the number of rows for that value is “frequency * bucket size”.

The first query we ran against the t2 histogram showed 6 popular values, accounting for 13 buckets of 40 rows each. We reported 22 distinct values for the column and 800 rows for the table so the optimizer assumes the non-popular values account for (22 – 6) = 16 distinct values and (800 – 13 * 40) = 280 rows. So the selectivity of non-popular values is (280/800) * (1/16) = 0.021875. This needs to be multiplied by the 99 rows in t1 which don’t match a popular value in t2 – so we now need to write some SQL to derive that number.

We could enhance our earlier full outer join and slot 0.5, 99, and 0.021875 into it as “magic” constants. Rather than do that though I’m going to write a couple of messy queries to derive the values (and the low/high range we’re interested in) so that I will be able to tweak the data later on and see if the formula still produces the right answer.


column range_low    new_value m_low
column range_high   new_value m_high
column avg_t1_freq  new_value m_avg_t1_freq
column new_density  new_value m_avg_t2_dens

with f1 as (
        select  endpoint_value ep_val,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
        from    user_tab_histograms
        where   table_name  = 'T1'
        and     column_name = 'J1'
),
f2 as (
        select  endpoint_value ep_val,
                endpoint_number ep_num,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
        from    user_tab_histograms
        where   table_name  = 'T2'
        and     column_name = 'J2'
)
select
        max(min_v) range_low, min(max_v) range_high, min(min_f)/2 avg_t1_freq, max(new_density) new_density
from    (
        select  min(ep_val) min_v, max(ep_val) max_v, min(frequency) min_f, to_number(null) new_density
        from f1
        union all
        select  min(ep_val) min_v, max(ep_val) max_v, null           min_f,
                (max(ep_num) - sum(case when frequency > 1 then frequency end)) /
                (
                        max(ep_num) *
                        (&m_t2_distinct - count(case when frequency > 1 then 1 end))
                )       new_density
        from    f2
        )
;

 RANGE_LOW RANGE_HIGH AVG_T1_FREQ NEW_DENSITY
---------- ---------- ----------- -----------
         2         25          .5     .021875


This query finds the overlap by querying the two histograms and reporting the lower of the two high values and the higher of the two low values. It also reports the minimum frequency from the frequency histogram divided by 2, and calculates the new density as the number of non-popular rows divided by (the total number of rows multiplied by the number of distinct non-popular values). (Note that I’ve picked up the number of distinct values in t2.j2 as a substitution variable generated by one of my earlier queries.) In my full script this messy piece of code runs before the query I showed earlier that told us how well (or badly) the two histograms matched.

Finally we can use the various values we’ve picked up in a slightly more complex version of the full outer join – with a special row added through a union all to give us the estimate:


break on report skip 1
compute sum of product on report

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
),
join1 as (
select
        f1.value t1_value, f2.value t2_value,
        f1.frequency t1_frequency, f2.frequency t2_frequency,
        f2.frequency *
        case
                when f2.frequency > 1 and f1.frequency is not null
                        then f1.frequency
                when f2.frequency > 1 and f1.frequency is null
                        then &m_avg_t1_freq
        end *
        &m_t2_bucket_size        product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between &m_low and &m_high
order by
        coalesce(f1.value, f2.value)
)
select  *
from    join1
union all
select
        null,
        &m_avg_t2_dens,
        &m_unmatch_ct,
        &m_t2_rows * &m_avg_t2_dens,
        &m_t2_rows * &m_avg_t2_dens * &m_unmatch_ct
from
        dual
;


  T1_VALUE   T2_VALUE T1_FREQUENCY T2_FREQUENCY     PRODUCT
---------- ---------- ------------ ------------ -----------
         2                       5
         5                      15
         7                      15
        10                      17
        12                      13
                   14                         1
        15                      13
        17         17           11            1
                   18                         1
                   19                         1
        20         20            7            1
                   21                         2       40.00
        22         22            3            1
                   23                         1
                   24                         2       40.00
        25         25            1            2       80.00
              .021875           99         17.5    1,732.50
                                                -----------
sum                                                1,892.50


It remains only to check what the optimizer thinks the cardinality will be on a simple join, and then modify the data slightly to see if the string of queries continues to produce the right result. Here’s a starting test:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';
alter session set tracefile_identifier='BASELINE';

select
        count(*)
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';


 COUNT(*)
----------
      1327


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      41 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1893 |   1327 |00:00:00.01 |      41 |  2545K|  2545K| 1367K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       7 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."J1"="T2"."J2")

The E-Rows value for the hash join operation is 1893 – and a quick check of the 10053 trace file shows that this is 1892.500000 rounded – a perfect match for the result from my query. I’ve modified the data in various ways (notably updating the t1 table to change the value 25 (i.e. the current maximum value of j1) to other, lower, values) and the algorithm in the script seems to be sound – for 12c and 18c. I won’t be surprised, however, if someone comes up with a data pattern where the wrong estimate appears.
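
The sort of modification I mean is as simple as this sketch – any lower value will do, provided the histogram is re-gathered afterwards:

update  t1
set     j1 = 20
where   j1 = 25
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname    => null,
                tabname    => 'T1',
                method_opt => 'for all columns size 1 for columns j1 size 254'
        );
end;
/

After a change like that the whole chain of queries (and the join itself) has to be re-run so that the new prediction can be compared with the new E-Rows figure.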

Don’t look back

Upgrades are a pain. With the same data set and same statistics on 11.2.0.4, running the same join query between t1 and t2, here’s the execution plan I got:


SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      12 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      12 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1855 |   1327 |00:00:00.01 |      12 |  2440K|  2440K| 1357K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       6 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

Notice that the E-Rows value is different. The join cardinality algorithm seems to have changed in the upgrade from 11.2.0.4 to 12c. I haven’t quite figured out how to get to the 11g result, but I seem to get quite close most of the time by making a simple change to the final query that I used to predict the optimizer’s estimate. In the case expression that chooses between the actual t1.j1 frequency and the “average frequency”, don’t choose – just use the latter:


        case
                when f2.frequency > 1 and f1.frequency is not null
                        -- then f1.frequency    -- 12c
                        then &m_avg_t1_freq     -- 11g
                when f2.frequency > 1 and f1.frequency is null
                        then &m_avg_t1_freq
        end *
 

As I modified the t1 row with the value 25 to hold other values this change kept producing results that were exactly 2, 2.5, or 3.0 different from the execution plan E-Rows – except in one case where the error was exactly 15.5 (which looks suspiciously like 17.5: the “average frequency in t2” minus 2). I’m not keen to spend time trying to work out exactly what’s going on but the takeaway from this change is that anyone upgrading from 11g to 12c may find that some of their queries change plans because they happen to match the type of example I’ve been working with in this post.

In some email I exchanged with Chinar Aliyev, he suggested three fix-controls that might be relevant. I’ve added these to an earlier posting I did when I first hit the anomaly a few days ago but I’ll repeat them here. I will be testing their effects at some point in the not too distant future:

14033181 1 QKSFM_CARDINALITY_14033181   correct ndv for non-popular values in join cardinality comp.         (12.1.0.1)
19230097 1 QKSFM_CARDINALITY_19230097   correct join card when popular value compared to non popular         (12.2.0.1)
22159570 1 QKSFM_CARDINALITY_22159570   correct non-popular region cardinality for hybrid histogram          (12.2.0.1)

October 25, 2018

Join Cardinality – 4

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 9:09 am GMT Oct 25,2018

In previous installments of this series I’ve been describing how Oracle estimates the join cardinality for single column joins with equality where the columns have histograms defined. So far I’ve  covered two options for the types of histogram involved: frequency to frequency, and frequency to top-frequency. Today it’s time to examine frequency to hybrid.

My first thought about this combination was that it was likely to be very similar to frequency to top-frequency because a hybrid histogram has a list of values with “repeat counts” (which is rather like a simple frequency histogram), and a set of buckets with variable sizes that could allow us to work out an “average selectivity” of the rest of the data.

I was nearly right but the arithmetic didn’t quite work out the way I expected.  Fortunately Chinar Aliyev’s document highlighted my error – the optimizer doesn’t use all the repeat counts, it uses only those repeat counts that identify popular values, and a popular value is one where the endpoint_repeat_count is not less than the average number of rows in a bucket. Let’s work through an example – first the data (which repeats an earlier article, but is included here for ease of reference):

rem
rem     Script:         freq_hist_join_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

set linesize 156
set pagesize 60
set trimspool on

execute dbms_random.seed(0)

create table t1 (
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)
)
;

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 13'
        );
end;
/

As before I’ve got a table with 100 rows using the sqrt() function to generate column j1, and a table with 800 rows using the dbms_random.normal function to generate column j2. So the two columns have skewed patterns of data distribution, with a small number of low values and larger numbers of higher values – but the two patterns are different.

I’ve generated a histogram with 254 buckets (which dropped to 10) for the t1.j1 column, and generated a histogram with 13 buckets for the t2.j2 column as I knew (after a little trial and error) that this would give me a hybrid histogram.

Here’s a simple query, with its result set, to report the two histograms – using a full outer join to line up matching values and show the gaps where (endpoint) values in one histogram do not appear in the other:


define m_popular = 62

break on report skip 1

compute sum of product on report
compute sum of product_rp on report

compute sum of t1_count on report
compute sum of t2_count on report
compute sum of t2_repeats on report
compute sum of t2_pop_count on report

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        to_number(null)
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
order by
        endpoint_value
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        case when endpoint_repeat_count >= &m_popular
                        then endpoint_repeat_count
                        else null
        end     pop_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
)
select
        f1.value t1_value,
        f2.value t2_value,
        f1.row_or_bucket_count t1_count,
        f2.row_or_bucket_count t2_count,
        f1.endpoint_repeat_count t1_repeats,
        f2.endpoint_repeat_count t2_repeats,
        f2.pop_count t2_pop_count
from
        f1
full outer join
        f2
on
        f2.value = f1.value
order by
        coalesce(f1.value, f2.value)
;


  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT T1_REPEATS T2_REPEATS T2_POP_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ------------
                    1                     1                     1
         2                     5                     0
         5                    15                     0
         7                    15                     0
        10                    17                     0
        12                    13                     0
        15         15         13         55          0         11
        17         17         11         56          0         34
                   19                    67                    36
        20         20          7         57          0         57
                   21                    44                    44
        22         22          3         45          0         45
                   23                    72                    72           72
                   24                    70                    70           70
        25         25          1         87          0         87           87
                   26                   109                   109          109
                   27                    96                    96           96
                   28                    41                    41
---------- ---------- ---------- ----------            ---------- ------------
                             100        800                   703          434

You’ll notice that there’s a substitution variable (m_popular) in this script that I use to identify the “popular values” in the hybrid histogram so that I can report them separately. I’ve set this value to 62 for this example because a quick check of user_tables and user_tab_cols tells me I have 800 rows in the table (user_tables.num_rows) and 13 buckets (user_tab_cols.num_buckets) in the histogram: 800/13 = 61.54. A value is popular only if its repeat count is 62 or more.
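
Rather than doing that arithmetic by hand you could derive the threshold – a quick sketch against the same two views:

select
        t.num_rows,
        c.num_buckets,
        ceil(t.num_rows / c.num_buckets)        popular_threshold
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name    = 'T2'
and     c.table_name    = 'T2'
and     c.column_name   = 'J2'
;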

This is where you may hit a problem – I certainly did when I switched from testing 18c to testing 12c (which I just knew was going to work – but I tested anyway). Although my data has been engineered so that I get the same “random” data in both versions of Oracle, I got different hybrid histograms (hence my complaint in a recent post.) The rest of this covers 18c in detail, but if you’re running 12c there are a couple of defined values that you can change to get the right results in 12c.

At this point I need to “top and tail” the output because the arithmetic only applies where the histograms overlap, so I need to pick the range from 2 to 25. Then I need to inject a “representative” or “average” count/frequency in all the gaps, then cross-multiply. The average frequency for the frequency histogram is “half the frequency of the least frequently occurring value” (which seems to be identical to new_density * num_rows), and the representative frequency for the hybrid histogram is (“number of non-popular rows” / “number of non-popular values”). There are 800 rows in the table with 22 distinct values in the column, and the output above shows us that we have 5 popular values totalling 434 rows, so the average frequency is (800 – 434) / (22 – 5) = 21.5294. (Alternatively we could say that the average selectivities (which is what I’ve used in the next query) are 0.5/100 and 21.5294/800.)

[Note for 12c, you’ll get 4 popular values covering 338 rows, so your figures will be: (800 – 338) / (22 – 4) = 25.6666… and 0.0302833]
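
A quick dual query confirms the arithmetic and the two selectivities I’m about to plug into the script (these are the 18c figures; plug in 338 and 4 for 12c):

select
        (800 - 434) / (22 - 5)          t2_avg_frequency,       -- 21.5294
        (800 - 434) / (22 - 5) / 800    t2_avg_selectivity,     -- 0.0269118 (rounded)
        0.5 / 100                       t1_avg_selectivity      -- 0.005
from
        dual
;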

So here’s a query that restricts the output to the rows we want from the histograms, discards a couple of columns, and does the arithmetic:


define m_t2_sel = 0.0302833
define m_t2_sel = 0.0269118
define m_t1_sel = 0.005

break on table_name skip 1 on report skip 1

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        to_number(null) pop_count
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
order by
        endpoint_value
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        case when endpoint_repeat_count >= &m_popular
                        then endpoint_repeat_count
                        else null
        end     pop_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
)
select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,100 * &m_t1_sel) t1_count,
        nvl(f2.pop_count,          800 * &m_t2_sel) t2_count,
        case when (   f1.row_or_bucket_count is not null
                   or f2.pop_count is not null
        )    then
                nvl(f1.row_or_bucket_count,100 * &m_t1_sel) *
                nvl(f2.pop_count,          800 * &m_t2_sel)
        end      product_rp
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;


 F1_VALUE   F2_VALUE   T1_COUNT   T2_COUNT PRODUCT_RP
---------- ---------- ---------- ---------- ----------
         2                     5   21.52944   107.6472
         5                    15   21.52944   322.9416
         7                    15   21.52944   322.9416
        10                    17   21.52944  366.00048
        12                    13   21.52944  279.88272
        15         15         13   21.52944  279.88272
        17         17         11   21.52944  236.82384
                   19         .5   21.52944
        20         20          7   21.52944  150.70608
                   21         .5   21.52944
        22         22          3   21.52944   64.58832
                   23         .5         72         36
                   24         .5         70         35
        25         25          1         87         87
                      ---------- ---------- ----------
sum                          102  465.82384 2289.41456

There’s an important detail that I haven’t mentioned so far. In the output above you can see that some rows show “product_rp” as blank. While we cross multiply the frequencies from t1.j1 and t2.j2, filling in average frequencies where necessary, we exclude from the final result any rows where average frequencies have been used for both histograms.

[Note for 12c, you’ll get the result 2698.99736 for the query, and 2699 for the execution plan]

Of course we now have to check that the predicted cardinality for a simple join between these two tables really is 2,289. So let’s run a suitable query and see what the optimizer predicts:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

SQL_ID  cf4r52yj2hyd2, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |     108 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |     108 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   2289 |   1327 |00:00:00.01 |     108 |  2546K|  2546K| 1194K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |      18 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |      34 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

As you can see, the E-Rows for the join is 2,289, as required.

I can’t claim that the model I’ve produced is definitely what Oracle does, but it looks fairly promising. No doubt, though, there are some variations on the theme that I haven’t considered – even when sticking to a simple (non-partitioned) join on equality on a single column.

October 22, 2018

Column Groups

Filed under: Bugs,CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 5:36 pm GMT Oct 22,2018

Sometimes a good thing becomes a bad thing when you hit some sort of special case – today’s post is an example of this that came up on the Oracle-L listserver a couple of years ago with a question about what the optimizer was doing. I’ll set the scene by creating some data to reproduce the problem:


rem
rem     Script:         distinct_key_prob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2016
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.1.0.2
rem             11.2.0.4
rem

drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        cast(mod(rownum-1,10) as number(8,0))   non_null,
        cast(null as number(8,0))               null_col,
        cast(lpad(rownum,10) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(null_col, non_null);

begin

/*
        dbms_output.put_line(
                dbms_stats.create_extended_stats(user,'t1','(non_null, null_col)')
        );
*/

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

So I have a table with 1,000,000 rows; one of its columns is always null and another has a very small number of distinct values and is never null (though it hasn’t been declared as not null). I’ve created an index that starts with the “always null” column (in a production system we’d really be looking at a column that was “almost always” null with a few special rows where the column was not null, so an index like this can make sense).

I’ve also got a few lines, commented out, to create extended stats on the column group (non_null, null_col) because any anomaly relating to the handling of the number of distinct keys in a multi-column index may also be relevant to column groups. I can run two variations of this code, one with the index, one without the index but with the column group, and see the same cardinality issue appearing in both cases.

So let’s execute a couple of queries – after setting up a couple of bind variables – and pull their execution plans from memory:


variable b_null    number
variable b_nonnull number

exec :b_null    := 5
exec :b_nonnull := 5

set serveroutput off

prompt  ===================
prompt  Query null_col only
prompt  ===================

select  count(small_vc)
from    t1
where
        null_col = :b_null
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

prompt  =========================
prompt  Query (null_col,non_null)
prompt  =========================

select  count(small_vc)
from    t1
where
        null_col = :b_null
and     non_null = :b_nonnull
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

The optimizer has statistics that tell it that null_col is always null so its estimate of rows where null_col = 5 should be zero (which will be rounded up to 1); and we have an index starting with null_col so we might expect the optimizer to use an index range scan on that index for these queries. Here are the plans that actually appeared:


SQL_ID  danj9r6rq3c7g, child number 0
-------------------------------------
select count(small_vc) from t1 where  null_col = :b_null

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |    24 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    24 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("NULL_COL"=:B_NULL)



SQL_ID  d8kbtq594bsp0, child number 0
-------------------------------------
select count(small_vc) from t1 where  null_col = :b_null and non_null =
:b_nonnull

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  2189 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    27 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  2636K|  2189   (4)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("NULL_COL"=:B_NULL AND "NON_NULL"=:B_NONNULL))

Take a careful look at what we’ve got: the second query has to access exactly the same table rows as those identified by the first query and then apply a second predicate which may discard some of those rows – but the optimizer has changed the access path from a low-cost index-driven access to a high cost tablescan. This is clearly idiotic – there has to be a flaw in the optimizer logic in this situation.

The defect revolves around a slight inconsistency in the handling of column groups – whether they are explicitly created, or simply inferred by reference to user_indexes.distinct_keys. The anomaly is most easily seen by explicitly creating the column group, gathering stats, and reporting from user_tab_cols.


select
        column_name, sample_size, num_distinct, num_nulls, density, histogram, data_default
from
        user_tab_cols
where
        table_name = upper('T1')
order by
        column_id

;

COLUMN_NAME			       Sample	  Distinct  NUM_NULLS	 DENSITY HISTOGRAM	 DATA_DEFAULT
-------------------------------- ------------ ------------ ---------- ---------- --------------- --------------------------------------------
NON_NULL			    1,000,000		10	    0	      .1 NONE
NULL_COL						 0    1000000	       0 NONE
SMALL_VC			    1,000,000	   995,008	    0 .000001005 NONE
PADDING 			    1,000,000		 1	    0	       1 NONE
SYS_STULC#01EE$DE1QB7UY1K4$PBI	    1,000,000		10	    0	      .1 NONE		 SYS_OP_COMBINED_HASH("NON_NULL","NULL_COL")

As you can see, the optimizer can note that “null_col” is always null so the arithmetic for “null_col = :bind1” is going to produce a very small cardinality estimate; on the other hand when the optimizer sees “null_col = :bind1 and non_null = :bind2” it’s going to transform this into the single predicate “SYS_STULC#01EE$DE1QB7UY1K4$PBI = sys_op_combined_hash(null_col, non_null)”, and the statistics say there are 10 distinct values for this (virtual) column with no nulls – hence the huge cardinality estimate and full tablescan.
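
The optimizer gets the same figure even when the extended stats haven’t been created explicitly because it can pick up user_indexes.distinct_keys instead. Here’s a minimal check, where I’d expect distinct_keys to be 10 for this index, matching the num_distinct of the virtual column above:

select
        index_name, num_rows, distinct_keys
from
        user_indexes
where
        table_name = 'T1'
;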

The “slight inconsistency” in handling that I mentioned above is that if you used a predicate like “null_col is null and non_null = :bind2” the optimizer would not use the column group because of the “is null” condition – even though it’s exactly the case where the column group statistics would be appropriate. (In the example I’ve constructed the optimizer’s estimate from ignoring the column group would actually be correct – and identical to the estimate it would get from using the column group – because the column is null for every single row.)
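
For comparison, this is the shape of query that would dodge the column group, sketched here with the bind variable declared earlier:

select  count(small_vc)
from    t1
where
        null_col is null
and     non_null = :b_nonnull
;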

tl;dr

Column groups can give you some very bad estimates, and counter-intuitive behaviour, if any of the columns in the group has a significant percentage of nulls; this happens because the column group makes the optimizer lose sight of the number of nulls in the underlying data set.

 

October 9, 2018

Join Cardinality – 3

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:01 pm GMT Oct 9,2018

In the previous posting I listed the order of precision of histograms as:

  • Frequency
  • Top-Frequency
  • Hybrid
  • Height-balanced
  • None

Having covered the Frequency/Frequency join (for a single column, no nulls, equijoin) in the previous posting I’ve decided to work down the list and address Frequency/Top-Frequency in this posting. It gets a little harder to generate data as we move to the less precise histograms since we need to have skew, we want some gaps, and (for Top-Frequency) we need to have some data that can be “ignored”. On the plus side, though, I want to keep the output of any queries I run fairly short, so I’m going to stick with a small number of buckets, which means the “small” volume of “ignorable” data (the “spare” bucket) can be relatively large. Here’s the code I used to generate data for my investigation – 100 rows for the table with a frequency histogram and 800 rows for the table with a top-frequency histogram.


rem
rem     Script:         freq_hist_join_05.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

execute dbms_random.seed(0)

create table t1 (
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)      
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2      
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 16'
        );
end;
/

In this example I’ve used the sqrt() function and the dbms_random.normal() function to generate the data. The scaling and truncating I’ve done on the results has given me two sets of data which have a nice skew, some gaps, but different patterns (though both have a small number of small values and a larger number of larger values). The data from dbms_random.normal() will produce 22 distinct values, so I’ve requested a histogram with 16 buckets and checked that this will produce a Top-Frequency histogram. (If I want a Hybrid histogram – for the next thrilling installment in the series – I’ll just reduce the number of buckets slightly).

Here are the resulting stats, preceded by the code that reported them:


select  table_name, column_name, histogram, num_distinct, num_buckets, density
from    user_tab_cols
where   table_name in ('T1','T2')
and     column_name in ('J1','J2')
order by table_name
;

select  table_name, num_rows
from    user_tables
where   table_name in ('T1','T2')
order by table_name
;

break on table_name skip 1 on report skip 1

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
order by 
        endpoint_value
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
order by 
        endpoint_value
)
select f1.* from f1
union all
select f2.* from f2
order by 1,2
;


TABLE_NAME           COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS    DENSITY
-------------------- -------------------- --------------- ------------ ----------- ----------
T1                   J1                   FREQUENCY                 10          10       .005
T2                   J2                   TOP-FREQUENCY             22          16    .000625

TABLE_NAME             NUM_ROWS
-------------------- ----------
T1                          100
T2                          800

TABLE_NAME                VALUE ROW_OR_BUCKET_COUNT ENDPOINT_NUMBER
-------------------- ---------- ------------------- ---------------
T1                            2                   5               5
                              5                  15              20
                              7                  15              35
                             10                  17              52
                             12                  13              65
                             15                  13              78
                             17                  11              89
                             20                   7              96
                             22                   3              99
                             25                   1             100

T2                            1                   1               1
                             13                  14              15
                             15                  11              26
                             16                  22              48
                             17                  34              82
                             18                  31             113
                             19                  36             149
                             20                  57             206
                             21                  44             250
                             22                  45             295
                             23                  72             367
                             24                  70             437
                             25                  87             524
                             26                 109             633
                             27                  96             729
                             28                  41             770

Table t1 reports 100 rows, 10 distinct values and a Frequency histogram with 10 buckets.
Table t2 reports 800 rows, 22 distinct values and a Top-Frequency histogram with 16 buckets.

Things we notice from the histograms are: t1 has a range from 2 to 25, while t2 has a range from 1 to 28. We also notice that the highest endpoint_number for t2 is only 770 out of a possible 800 – we’ve “lost” 30 rows. We don’t really care what they are for the purposes of the arithmetic, but if we did a quick “select j2, count(*)” query we’d see that we had lost the following:


SQL> select j2, count(*) from t2 group by j2 order by count(*), j2;

	J2   COUNT(*)
---------- ----------
	 1	    1
	 9	    1  *
	 8	    3  *
	11	    4  *
	10	    5  *
	12	    8  *
	14	    9  *
	15	   11
...

The reason why the total number of rows accounted for is less than the total number of rows in the table comes in two parts. The Top-Frequency histogram is designed to hold the Top N most popular entries in the table, so there will be some entries that don’t make an appearance in the histogram despite contributing rows to the total table count; the number of “lost” rows can then be increased because the Top N popular values may not include the column low and high values, and these two values must appear in the histogram. Looking at the output above we can see that we could have reported 14 as the 16th most popular value; instead we have to record 1 (the low value), losing a further 9 rows and regaining 1.

Let’s test the pure join query on the two tables to see what the optimizer is predicting as the join cardinality, and then try to re-create that cardinality from the histogram data:


alter session set statistics_level = all;
alter session set events '10053 trace name context forever';
alter session set tracefile_identifier='BASELINE';

select
        count(*) 
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      41 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1608 |   1327 |00:00:00.01 |      41 |  2545K|  2545K| 1355K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       7 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

Our target is to work out how we can query the histogram data in a way that gets the result 1,608. Ideally we’ll also think of a rationale for justifying our method, and then we’ll apply the same method with 15 buckets and 17 buckets, and with a couple of variations to the data (e.g. update all rows where j1 = 25 to set j1 = 28), to see if the method still gets the right result.

All we did with the frequency/frequency join was to join the two histograms on matching values, multiply the frequencies on each resulting row, then sum down the set, and this automatically eliminated rows which were outside the “highest low” and “lowest high” (i.e. we only examined rows where the histograms overlapped). We might hope that things shouldn’t be too different when one of the histograms is a top-frequency histogram.

There is an important difference, though, between frequency and top-frequency histograms – in the latter case there are values in the table which will not be in the histogram, so we ought to make some allowance for these (even though it’s only “one bucket’s worth”). It’s possible that some of these values might match values in the frequency histogram so we need to include a mechanism for adding in a factor to allow for them. So as a first step let’s work out the “average number of rows per value” for the missing values.

We have 22 distinct values and 16 end points so there are 6 missing values. We have 800 rows in the table but only 770 rows reported in the histogram so there are 30 missing rows. So let’s say the missing values have an average cardinality of 30/6 = 5 (and we might extend that to say they have an average selectivity of 5/800 = 0.00625).
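
Here’s a sketch of a query that derives those four figures from the dictionary rather than by hand (using the stats gathered above):

select
        c.num_distinct - c.num_buckets                          missing_values,         -- expect 6
        t.num_rows - max(h.endpoint_number)                     missing_rows,           -- expect 30
        (t.num_rows - max(h.endpoint_number)) /
                (c.num_distinct - c.num_buckets)                avg_frequency,          -- expect 5
        (t.num_rows - max(h.endpoint_number)) /
                (c.num_distinct - c.num_buckets) / t.num_rows   avg_selectivity         -- expect 0.00625
from
        user_tables             t,
        user_tab_cols           c,
        user_tab_histograms     h
where
        t.table_name    = 'T2'
and     c.table_name    = t.table_name
and     c.column_name   = 'J2'
and     h.table_name    = c.table_name
and     h.column_name   = c.column_name
group by
        c.num_distinct, c.num_buckets, t.num_rows
;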

Let’s bring that value into the query we wrote for the frequency/frequency case by using an outer join (which I’ll write as an “ANSI” Full Outer Join) with a predicate in place that restricts the result to just the overlapping range, which is [2,25], the “higher low value” and “lower high value” across the two histograms. Here’s some code – with an odd little detail included:


column product format 999,999,999.99
compute sum of product on report

compute sum of t1_count on report
compute sum of t1_value on report
compute sum of t2_count on report
compute sum of t2_value on report

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
order by 
        endpoint_value
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
order by 
        endpoint_value
)
select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,0.00) t1_count, 
        nvl(f2.row_or_bucket_count,800*0.00625) t2_count,
        nvl(f1.row_or_bucket_count,0.00) * 
        nvl(f2.row_or_bucket_count,800*0.006250) product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;

I’ve included an nvl() on the count column from the top-frequency histogram to convert nulls (i.e. the preserved rows derived from the frequency histogram) into the average frequency we’ve just calculated, using the “num_rows * selectivity” representation. The odd little detail that I commented on above does something similar for the preserved rows derived from the top-frequency histogram because this first guess at the calculation was wrong and needed an adjustment which I’m anticipating. Here are the results I got with this code:

  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT         PRODUCT
---------- ---------- ---------- ---------- ---------------
         2                     5          5           25.00
         5                    15          5           75.00
         7                    15          5           75.00
        10                    17          5           85.00
        12                    13          5           65.00
                   13          0         14             .00
        15         15         13         11          143.00
                   16          0         22             .00
        17         17         11         34          374.00
                   18          0         31             .00
                   19          0         36             .00
        20         20          7         57          399.00
                   21          0         44             .00
        22         22          3         45          135.00
                   23          0         72             .00
                   24          0         70             .00
        25         25          1         87           87.00
---------- ---------- ---------- ---------- ---------------
       135        233        100        548        1,463.00

The figure is too low, so there has to be an adjustment. What if the code is allowing for the “maybe there are other values” algorithm that the optimizer uses with frequency histograms? If you’ve gathered a frequency histogram on a column but query it with a value that isn’t in the histogram then Oracle applies an algorithm that looks like: “if you’re asking for something that isn’t in the histogram I’ll assume that there must be some data there and use a frequency that’s half the lowest frequency I have recorded”**Important footnote. The value 25 appears once in our histogram so let’s include a fudge-factor of 0.5 (i.e. half a row) in the nvl() expression for the t1 frequencies and see what happens.
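
In other words the only change is in the final select statement. Here’s a sketch of the adjusted query with the two t1-side nvl() defaults set to 0.5 (the f1 and f2 factored subqueries are exactly as before):

select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,0.5)          t1_count,
        nvl(f2.row_or_bucket_count,800*0.00625)  t2_count,
        nvl(f1.row_or_bucket_count,0.5) *
        nvl(f2.row_or_bucket_count,800*0.00625)  product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;

This is what the new results look like: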


  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT         PRODUCT
---------- ---------- ---------- ---------- ---------------
         2                     5          5           25.00
         5                    15          5           75.00
         7                    15          5           75.00
        10                    17          5           85.00
        12                    13          5           65.00
                   13         .5         14            7.00
        15         15         13         11          143.00
                   16         .5         22           11.00
        17         17         11         34          374.00
                   18         .5         31           15.50
                   19         .5         36           18.00
        20         20          7         57          399.00
                   21         .5         44           22.00
        22         22          3         45          135.00
                   23         .5         72           36.00
                   24         .5         70           35.00
        25         25          1         87           87.00
---------- ---------- ---------- ---------- ---------------
       135        233      103.5        548        1,607.50

Since we were looking for 1,608 I’m going to call that a success. I can check precision, of course, by looking at the 10053 trace file. Extracting a few critical lines:

egrep -e"Density" -e"Join Card" orcl12c_ora_6520_BASELINE.trc

    AvgLen: 3 NDV: 22 Nulls: 0 Density: 0.006250 Min: 1.000000 Max: 28.000000
    AvgLen: 3 NDV: 10 Nulls: 0 Density: 0.005000 Min: 2.000000 Max: 25.000000

Join Card:  1607.500000 = outer (100.000000) * inner (800.000000) * sel (0.020094)

The “Density” lines come from the column statistics – note the 0.00625 that matches the “average selectivity” I derived from the top-frequency figures. You might also note that the “half the least frequent value” could be derived from the t1.j1 density (0.005) * t1.num_rows (100).
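
The frequency-histogram side of that claim is easy to check with a dictionary query; a minimal sketch, where I’d expect the product to come out at 0.5:

select
        c.density, t.num_rows,
        c.density * t.num_rows  derived_frequency
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name    = 'T1'
and     c.table_name    = t.table_name
and     c.column_name   = 'J1'
;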

The “Join Card” line is exactly what it says – the join cardinality calculation showing that the plan’s prediction of 1,608 rows was actually a rounded 1607.5.

There is one more important thing to check before I start tweaking the data to see if there are any other factors involved. Is the 0.5 I stuck into the query really the value of “half the least common frequency” or is it a fixed value in all cases? A nice easy way of testing this is to update the t1 table to change one row from 22 to 25 (22 will still be present in the table and histogram before and after this test, so it’s a minimal and safe change). Making this change and re-running the calculation query leaving the 0.5 unchanged gives the following:


update t1 set j1 = 25 where j1 = 22 and rownum = 1;

...

                   21         .5         44           22.00
        22         22          2         45           90.00
                   23         .5         72           36.00
                   24         .5         70           35.00
        25         25          2         87          174.00
                      ---------- ---------- ---------------
sum                        103.5        548        1,649.50

Without reporting all the details:

  • the estimate in the plan went up from 1,608 to 1,794
  • leaving 0.5 in the query the derived result was 1,649.5 (last few lines of output above)
  • changing the 0.5 to 1.0 the derived result was 1,794.0

Conclusion – the “fudge factor” is consistent with the model the optimizer uses with frequency histogram calculations. The optimizer models “missing” rows in the join calculation as “half the frequency of the least frequently occurring value”**Important footnote.

Filter Predicates:

After a dozen tests varying the number of buckets in the top-frequency histogram (and checking it really was still a top-frequency histogram), and tweaking the t1 (frequency histogram) data to use values on the boundaries of, or outside, the range of the t2 (top-frequency) data, I concluded that my approach was probably correct. Outer join the two histograms, restrict to the overlap, supply the “num_rows * density” figure on the top-frequency side, and “half the lowest frequency”**Important footnote on the frequency side, and the query produces the same result as the optimizer for the pure join cardinality.

So the next step is to check what happens when you add filter predicates on one, or both, sides. I listed a fragment of code earlier on to execute the pure join and count the number of rows it produced, enabling the 10053 trace and pulling the actual plan from memory at the same time. I repeated this code with 3 variations and checked the “Join Card” lines from the resulting trace files:


select count(*) from  t1, t2 where  t1.j1 = t2.j2
select count(*) from  t1, t2 where  t1.j1 = t2.j2 and t1.n04 = 2
select count(*) from  t1, t2 where  t1.j1 = t2.j2                and t2.n30 = 25
select count(*) from  t1, t2 where  t1.j1 = t2.j2 and t1.n04 = 2 and t2.n30 = 25

egrep -e"Join Card" orcl12c_ora_10447*.trc

orcl12c_ora_10447_BASELINE.trc:Join Card:  1607.500000 = outer (800.000000) * inner (100.000000) * sel (0.020094)
orcl12c_ora_10447_FILTERJ1.trc:Join Card:  401.875000 = outer (800.000000) * inner (25.000000) * sel (0.020094)
orcl12c_ora_10447_FILTERJ2.trc:Join Card:  53.583333 = outer (100.000000) * inner (26.666667) * sel (0.020094)
orcl12c_ora_10447_FILTJ1J2.trc:Join Card:  13.395833 = outer (26.666667) * inner (25.000000) * sel (0.020094)

As you can see in all 4 cases, Oracle reports an inner and outer cardinality estimate and a join selectivity. The join selectivity remains unchanged throughout; it’s the value we can derive from our pure join test (0.020094 = 1607.5 / (100 * 800)). All that changes is that the individual table predicates are applied to the base tables before the join selectivity is applied to the product of the filtered base table cardinalities:

  • Column n04 has 4 distinct values in 100 rows – filter cardinality = 100/4 = 25
  • Column n30 has 30 distinct values in 800 rows – filter cardinality = 800/30 = 26.66666…
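
Putting those two filter cardinalities together with the join selectivity we derived (1607.5 / (100 * 800)) reproduces the three filtered estimates from the trace files. A quick check from dual:

select
        round(800     * (100/4)  * (1607.5/(100*800)), 6)  filter_t1_only,     -- 401.875
        round(100     * (800/30) * (1607.5/(100*800)), 6)  filter_t2_only,     -- 53.583333
        round((100/4) * (800/30) * (1607.5/(100*800)), 6)  filter_both         -- 13.395833
from
        dual
;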

Conclusion

For a single column equijoin on columns with no nulls where one column has a frequency histogram and the other has a top-frequency histogram the optimizer calculates the “pure” join cardinality using the overlapping range of column values and two approximating frequencies, then derives the final estimate by applying the base table filters, calculating the cardinality of the cartesian join of the filtered data sets, and multiplying by the pure join selectivity.

 

 

**Important Footnote  Until Chinar Aliyev questioned what I had written, I had never noticed that the “half the lowest frequency” that I describe at various points in the arithmetic was anything other than a fixed fudge factor. In fact, in perfect symmetry with the expression used for the average selectivity in the top-frequency part of the calculation, this “fudge factor” is simply “num_rows * column_density” for the column with the frequency histogram. (Whether the “half the lowest frequency” drops out as a side effect of the density calculation, or whether the column density is derived from half the lowest frequency is another matter.)

October 5, 2018

Join Cardinality – 2

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 3:37 pm GMT Oct 5,2018

In the previous note I posted about Join Cardinality I described a method for calculating the figure that the optimizer would give for the special case where you had a query that:

  • joined two tables
  • used a single-column to join on equality
  • had no nulls in the join columns
  • had a perfect frequency histogram on the columns at the two ends of the join
  • had no filter predicates associated with either table

The method simply said: “Match up rows from the two frequency histograms, multiply the corresponding frequencies” and I supplied a simple SQL statement that would read and report the two sets of histogram data, doing the arithmetic and reporting the final cardinality for you. In an update I also added an adjustment needed in 11g (or, you might say, removed in 12c) where gaps in the histograms were replaced by “ghost rows” with a frequency that was half the lowest frequency in the histogram.

This is a nice place to start as the idea is very simple, and it’s likely that extensions of the basic idea will be used in all the other cases we have to consider. There are 25 possibilities that could need separate testing – though only 16 of them ought to be relevant from 12c onwards. Oracle allows for four kinds of histograms – in order of how precisely they describe the data they are:

  • Frequency – with a perfect description of the data
  • Top-N (a.k.a. Top-Frequency) – which describes all but a tiny fraction (ca. one bucket’s worth) of data perfectly
  • Hybrid – which can (but doesn’t usually, by default) describe up to 2,048 popular values perfectly and gives an approximate distribution for the rest
  • Height-balanced – which can (but doesn’t usually, by default) describe at most 1,024 popular values with some scope for misinformation.

Finally, of course, we have the general case of no histogram, using only 4 numbers (low value, high value, number of “non-null” rows, number of distinct values) to give a rough picture of the data – and the need for histograms appears, of course, when the data doesn’t look anything like an even distribution of values between the low and high with close to “number of non-null rows”/“number of distinct values” for each value.

So there are 5 possible statistical descriptions for the data in a column – which means there are 5 * 5 = 25 possible options to consider when we join two columns, or 4 * 4 = 16 if we label height-balanced histograms as obsolete and ignore them (which would be a pity because Chinar has done some very nice work explaining them).

Of course, once we’ve worked out a single-column equijoin between two tables there are plenty more options to consider:  multi-column joins, joins involving range-based predicates, joins involving more than 2 tables, and queries which (as so often happens) have predicates which aren’t involved in the joins.

For the moment I’m going to stick to the simplest case – two tables, one column, equality – and comment on the effects of filter predicates. It seems to be very straightforward, as I’ll demonstrate with a new model.

rem
rem     Script:         freq_hist_join_03.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

execute dbms_random.seed(0)

create table t1(
        id      number(8,0),
        n0040   number(4,0),
        n0090   number(4,0),
        n0190   number(4,0),
        n0990   number(4,0),
        n1      number(4,0)
)
;

create table t2(
        id      number(8,0),
        n0050   number(4,0),
        n0110   number(4,0),
        n0230   number(4,0),
        n1150   number(4,0),
        n1      number(4,0)
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   40) + 1                   n0040,
        mod(rownum,   90) + 1                   n0090,
        mod(rownum,  190) + 1                   n0190,
        mod(rownum,  990) + 1                   n0990,
        trunc(30 * abs(dbms_random.normal))     n1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   50) + 1                   n0050,
        mod(rownum,  110) + 1                   n0110,
        mod(rownum,  230) + 1                   n0230,
        mod(rownum, 1150) + 1                   n1150,
        trunc(30 * abs(dbms_random.normal))     n1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
end;
/

You’ll notice that in this script I’ve created empty tables and then populated them. This is because of an anomaly that appeared in 18.3 when I used “create as select”, and should allow the results from 18.3 to be an exact match for 12c. You don’t need to pay much attention to all the Nxxx columns; they were there so I could experiment with a few variations in the selectivity of filter predicates.

Given the purpose of the demonstration I’ve gathered histograms on the column I’m going to use to join the tables (called n1 in this case), and here are the summary results:


TABLE_NAME           COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS
-------------------- -------------------- --------------- ------------ -----------
T1                   N1                   FREQUENCY                119         119
T2                   N1                   FREQUENCY                124         124

     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2488       2619    6,516,072
         1       2693       2599    6,999,107
         2       2635       2685    7,074,975
         3       2636       2654    6,995,944
...
       113          1          3            3
       115          1          2            2
       116          4          3           12
       117          1          1            1
       120          1          2            2
                                 ------------
sum                               188,114,543

We’ve got frequency histograms, and we can see that they don’t have a perfect overlap. I haven’t printed every single line from the cardinality query, just enough to show you the extreme skew, a few gaps (114, 118, 119), and the total. So here are three queries with execution plans:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
and     t1.n0990 = 20
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));


select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
and     t1.n0990 = 20
and     t2.n1150 = 25
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

I’ve queried the pure join – the count was exactly the 188,114,543 predicted by the cardinality query, of course – then I’ve applied a filter to one table, then to both tables. The first filter n0990 = 20 will (given the mod(,990) definition) identify one row in 990 from the original 100,000 in t1; the second filter n1150 = 25 will identify one row in 1,150 from t2. That’s filtering down to 101 rows and 87 rows respectively from the two tables. So what do we see in the plans:


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:23.47 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:23.47 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    188M|    188M|00:00:23.36 |     748 |  6556K|  3619K| 8839K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")



-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.02 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.02 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    190K|    200K|00:00:00.02 |     748 |  2715K|  2715K| 1647K (0)|
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    101 |    101 |00:00:00.01 |     374 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")
   3 - filter("T1"."N0990"=20)



-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    165 |    165 |00:00:00.01 |     748 |  2715K|  2715K| 1678K (0)|
|*  3 |    TABLE ACCESS FULL| T2   |      1 |     87 |     87 |00:00:00.01 |     374 |       |       |          |
|*  4 |    TABLE ACCESS FULL| T1   |      1 |    101 |    101 |00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")
   3 - filter("T2"."N1150"=25)
   4 - filter("T1"."N0990"=20)


The first execution plan shows an estimate of 188M rows – but we’ll have to check the trace file to confirm whether that’s only an approximate match to our calculation, or whether it’s an exact match. So here’s the relevant pair of lines:


Join Card:  188114543.000000 = outer (100000.000000) * inner (100000.000000) * sel (0.018811)
Join Card - Rounded: 188114543 Computed: 188114543.000000

Yes, the cardinality calculation and the execution plan estimates match perfectly. But there are a couple of interesting things to note. First, Oracle seems to be deriving the cardinality by multiplying the individual cardinalities of the two tables with a figure it calls “sel” – the thing that Chinar Aliyev has labelled Jsel, the “Join Selectivity”. Secondly, Oracle can’t do arithmetic (or, removing tongue from cheek, the value it reports for the join selectivity is shown to only 6 decimal places but stored to far greater precision). What is the Join Selectivity, though? It’s the figure we derive from our cardinality query divided by the cardinality of the cartesian join of the two tables – i.e. 188,114,543 / (100,000 * 100,000).
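
That last sentence is easy to verify with a one-line query:

select
        188114543 / (100000 * 100000)   join_selectivity        -- 0.0188114543
from
        dual
;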

With the clue from the first trace file, can we work out why the second and third plans show 190K and 165 rows respectively? How about this – multiply the filtered cardinalities of the two separate tables, then multiply the result by the join selectivity:

  • 1a)   n0990 = 20: gives us 1 row in every 990.    100,000 / 990 = 101.010101…    (echoing the rounded execution plan estimate).
  • 1b)   100,000 * (100,000/990) * 0.0188114543 = 190,014.69898989…    (which is in the ballpark of the plan and needs confirmation from the trace file).

 

  • 2a)   n1150 = 25: gives us 1 row in every 1,150.    100,000 / 1,150 = 86.9565217…    (echoing the rounded execution plan estimate)
  • 2b)   (100,000/990) * (100,000/1,150) * 0.0188114543 = 165.2301651..    (echoing the rounded execution plan estimate).
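
The same one-liner approach, using the full-precision selectivity, confirms both of those estimates before we even look at the trace files:

select
        100000 * (100000/990)        * (188114543/1e10)   filter_n0990,        -- 190014.69
        (100000/990) * (100000/1150) * (188114543/1e10)   filter_n0990_n1150   -- 165.23
from
        dual
;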

Cross-checking against extracts from the 10053 trace files:


Join Card:  190014.689899 = outer (101.010101) * inner (100000.000000) * sel (0.018811)
Join Card - Rounded: 190015 Computed: 190014.689899

Join Card:  165.230165 = outer (86.956522) * inner (101.010101) * sel (0.018811)
Join Card - Rounded: 165 Computed: 165.230165

Conclusion.

Remembering that we’re still looking at very simple examples with perfect frequency histograms: it looks as if we can work out a “Join Selectivity” (Jsel) – the selectivity of a “pure” unfiltered join of the two tables – by querying the histogram data then use the resulting value to calculate cardinalities for simple two-table equi-joins by multiplying together the individual (filtered) table cardinality estimates and scaling by the Join Selectivity.

Acknowledgements

Most of this work is based on a document written by Chinar Aliyev in 2016 and presented at the Hotsos Symposium the same year. I am most grateful to him for responding to a recent post of mine and getting me interested in spending some time to get re-acquainted with the topic. His original document is a 35 page pdf file, so there’s plenty more material to work through, experiment with, and write about.

 

October 3, 2018

Join Cardinality

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 12:01 pm GMT Oct 3,2018

Following up my “Hacking for Skew” article from a couple of days ago, Chinar Aliyev has written an article about a method for persuading the optimizer to calculate the correct cardinality estimate without using any undocumented, or otherwise dubious, mechanisms. His method essentially relies on the optimizer’s mechanism for estimating join cardinality when there are histograms at both ends of the join so I thought I’d write a short note describing the simplest possible example of the calculation – an example where the query is a single column equi-join with no nulls in either column and perfect frequency histograms at both ends of the join.  (For a detailed description of more general cases I always refer to the work done by Alberto Dell’Era a few years ago). We start with two data sets that exhibit a strong skew in their data distributions:

rem
rem     Script:         freq_hist_join_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      n1
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      n1
from
        generator       v1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 254'
        );
end;
/


I’ve generated two tables of 10,000 randomly generated values using the dbms_random.normal() function, but I’ve scaled the value up by a factor of three and taken the absolute value – which has given me a range of 12 distinct integer values with a nicely skewed distribution. Then I’ve gathered stats requesting histograms of up to 254 buckets. Since I’ve tested this only on versions from 11.2.0.4 onwards I’ll get a perfect histogram on the n1 columns in both tables.

Now I’m going to run a query that reports the values and frequencies from the two tables by querying user_tab_histograms using a variant of an analytic query I published a long time ago to convert the cumulative frequencies recorded as the endpoint values into simple frequencies. If, for some reason, this query doesn’t run very efficiently in your tests you could always /*+ materialize */ the two factored subqueries (CTEs – common table expressions):


prompt  =======================================================================
prompt  Multiply and sum matching frequencies. An outer join is NOT needed
prompt  because rows that don't match won't contribute to the join cardinality
prompt  =======================================================================

break on report skip 1
compute sum of product on report
column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_value
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'N1'
order by
        endpoint_value
)
select
        f1.value, f1.frequency, f2.frequency, f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
;


     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2658       2532    6,730,056
         1       2341       2428    5,683,948
         2       1828       1968    3,597,504
         3       1305       1270    1,657,350
         4        856        845      723,320
         5        513        513      263,169
         6        294        249       73,206
         7        133        117       15,561
         8         40         54        2,160
         9         23         17          391
        10          5          5           25
        11          4          2            8
                                 ------------
sum                                18,746,698

As you can see, the two columns do have a highly skewed data distribution. The pattern of the two data sets is similar though the frequencies aren’t identical, of course. The total I get from this calculation is (I claim) the cardinality (rows) estimate that the optimizer will produce for doing an equi-join on these two tables – so let’s see the test:


set serveroutput off
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

And the resulting output:

Session altered.
Session altered.


  COUNT(*)
----------
  18746698


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0wxytnyqs4b5j, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.n1 = t2.n1

Plan hash value: 906334482
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:03.23 |      40 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:03.23 |      40 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |     18M|     18M|00:00:02.96 |      40 |  2616K|  2616K| 2098K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |      20 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |  10000 |  10000 |00:00:00.01 |      20 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")

As we can see, the estimate for the hash join is “18M” which is in the right ballpark but, in its current format, isn’t entirely helpful, which is why I’ve enabled the 10053 trace to get an exact figure from the trace file, and this is what we see:


***********************
Best so far:  Table#: 0  cost: 4.352468  card: 9487.000000  bytes: 28461.000000
              Table#: 1  cost: 378.482370  card: 18467968.000000  bytes: 110807808.000000
***********************

The optimizer’s estimate is exactly the sum of the products of the frequencies of matching values from the (frequency) histogram data. There is a simple rationale for this – it gets the right answer.

For each row in t1 with value ‘X’ the (frequency) histogram on t2 tells Oracle how many rows will appear in the join, so multiplying the frequency of ‘X’ in t1 by the frequency of ‘X’ in t2 tells Oracle how many rows the ‘X’s will contribute to the join. Repeat for every distinct value that appears in both (frequency) histograms and sum the results.

As a refinement on this (very simple) example, let’s delete data from the two tables so that we have rows in t1 that won’t join to anything in t2, and vice versa – then re-gather stats, query the histograms, and check the new prediction. We want to check whether a value that appears in the t1 histogram contributes to the join cardinality estimate even if there are no matching values in the t2 histogram (and vice versa):


delete from t1 where n1 = 4;
delete from t2 where n1 = 6;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 254', no_invalidate=>false)
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 254', no_invalidate=>false)

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_value
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'N1'
order by
        endpoint_value
)
select
        f1.value, f1.frequency, f2.frequency, f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
;


set serveroutput off
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

And the output – with a little cosmetic tidying:


856 rows deleted.
249 rows deleted.

PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.


     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2658       2532    6,730,056
         1       2341       2428    5,683,948
         2       1828       1968    3,597,504
         3       1305       1270    1,657,350
         5        513        513      263,169
         7        133        117       15,561
         8         40         54        2,160
         9         23         17          391
        10          5          5           25
        11          4          2            8
                                 ------------
sum                                17,950,172


Session altered.
Session altered.


  COUNT(*)
----------
  17950172


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0wxytnyqs4b5j, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.n1 = t2.n1

Plan hash value: 906334482
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:02.89 |      40 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:02.89 |      40 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |     17M|     17M|00:00:02.61 |      40 |  2616K|  2616K| 2134K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |   9144 |   9144 |00:00:00.01 |      20 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |   9751 |   9751 |00:00:00.01 |      20 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")


From the 10053 trace file:
***********************
Best so far:  Table#: 0  cost: 4.340806  card: 9144.000000  bytes: 27432.000000
              Table#: 1  cost: 368.100010  card: 17950172.000000  bytes: 107701032.000000
***********************

You can see that values 4 and 6 have been “lost” from the frequency histogram report; the total from the report matches the actual number of rows returned by the query, and the cardinality estimate in the plan is again in the right ballpark – with the trace file showing an exact match.

I’ve run this test on 11.2.0.4, 12.1.0.2, 12.2.0.1 and 18.3.0.0 (which generated a different set of random values) – and there’s an anomaly that appears in 11.2.0.4 (though maybe that should be “disappeared from”): the optimizer’s estimate for the cardinality was a little larger than the value generated in the query against user_tab_histograms. [Edit: Now explained (probably), see below]

Conclusion:

For an incredibly simple class of queries with perfect frequency histograms there’s a very simple way to calculate the cardinality estimate that the optimizer will predict. Match up rows from the two frequency histograms, multiply the corresponding frequencies (making sure you don’t multiply the cumulative frequencies) and sum.

This is, of course, only a tiny step in the direction of seeing how Oracle uses histograms and covers only a type of query that is probably too simple to appear in a production system, but it’s a basis on which I may build in future notes over the next few weeks.

Update (5th Oct)

The “error” in the 11g calculation irritated me a little, and I woke up this morning with an idea about the solution. In 10.2.0.4 Oracle changed the way the optimizer calculated the cardinality for a predicate that used a value which did not appear in the frequency histogram: it did the arithmetic for “half the least frequently occurring value”. So I thought I’d run up a test where I emulated this model in my “sum of products” query. I had to change my query to an “ANSI”-style full outer join, and here it is:

with f1 as (
select 
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'N1'
),
f2 as (
select 
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'N1'
)
select
        f1.value, f2.value,
        nvl(f1.frequency, 0)                t1_frequency,
        nvl(f2.frequency, 0)                t2_frequency,
        nvl(f1.frequency, &t1_least / 2) *
        nvl(f2.frequency, &t2_least / 2)    product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
order by
        coalesce(f1.value, f2.value)
;

Running this code, and noting that the lowest frequency in t1 was 4 while the lowest frequency in t2 was 2, I got the following results (with the 10053 trace file summary following the output):


     VALUE      VALUE T1_FREQUENCY T2_FREQUENCY      PRODUCT
---------- ---------- ------------ ------------ ------------
         0          0         2658         2532    6,730,056
         1          1         2341         2428    5,683,948
         2          2         1828         1968    3,597,504
         3          3         1305         1270    1,657,350
                    4            0          845        1,690
         5          5          513          513      263,169
         6                     294            0          294
         7          7          133          117       15,561
         8          8           40           54        2,160
         9          9           23           17          391
        10         10            5            5           25
        11         11            4            2            8
                      ------------ ------------ ------------
sum                           9144         9751   17,952,156


Join Card:  17952157.000000 = outer (9751.000000) * inner (9144.000000) * sel (0.201341)
Join Card - Rounded: 17952157 Computed: 17952157.00
 

That’s a pretty good match to the trace file result – and the difference of 1 may simply be a rounding error (despite the trace file’s text suggesting it is accurate to 6 d.p.). Maybe one day I’ll wake up with an inspired guess about that difference – but since it’s relevant only to 11g I’m not going to worry about it anymore.

Footnote

Following an exchange of email with Chinar Aliyev, it’s fairly clear that the “half the least frequency” can actually be derived as “table.num_rows * column.density”.
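If you want to check that claim against the data dictionary, a query along these lines (a sketch, using the t1/t2 tables and column n1 from above) should produce figures close to half the lowest frequency reported by the earlier histogram queries:

select
        t.table_name, t.num_rows, c.density,
        round(t.num_rows * c.density, 3)        derived_half_least
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name    in ('T1','T2')
and     c.table_name    = t.table_name
and     c.column_name   = 'N1'
order by
        t.table_name
;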

 

September 27, 2018

Column Group Catalog

Filed under: CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 5:16 pm GMT Sep 27,2018

I seem to have written a number of articles about column groups – the rather special, and most useful, variant on extended stats. To make it as easy as possible to find the right article I’ve decided to produce a little catalogue (catalog) of all the relevant articles, with a little note about the topic each article covers. Some of the articles will link to others in the list, and there are a few items in the list from other blogs. There are also a few items which are the titles of drafts which have been hanging around for the last few years.

 

July 31, 2018

Extended Histograms

Filed under: CBO,extended stats,Histograms,Oracle,Statistics — Jonathan Lewis @ 11:05 pm GMT Jul 31,2018

Today’s little puzzle comes courtesy of the Oracle-L mailing list. A table has two columns (c2 and c3), which contain only the values ‘Y’ and ‘N’, with the following distribution:


select   c2, c3, count(*)
from     t1
group by c2, c3
;

C C   COUNT(*)
- - ----------
N Y       1994
Y N      71482

2 rows selected.

The puzzle is this – how do you get the optimizer to predict a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:

where   c2 = 'N' and c3 = 'N'

Here are 4 tests you might try:

  • Create simple stats (no histograms) on c2 and c3.
  • Create frequency histograms on c2 and c3
  • Create a column group (extended stats) on (c2,c3) but no histograms
  • Create a column group (extended stats) on (c2,c3) with a histogram on (c2, c3)

If you do these tests you’ll find the estimated cardinalities are (from 12.1.0.2):

  • 18,369 – derived as 73,476 / 4  … total rows over total possible combinations
  • 1,940   – derived as 73,476 * (1,994/73,476) * (71,482/73,476) … total rows * fraction where c2 = ‘N’ * fraction where c3 = ‘N’
  • 36,738 – derived as 73,476 / 2 … total rows / number of distinct combinations of (c2, c3)
  • 997      – derived as 1,994 / 2 … half the frequency of the least frequently occurring value in the histogram

The last algorithm appeared in 10.2.0.4; prior to that a “value not in frequency histogram” would have been given an estimated cardinality of 1 (which is what the person on Oracle-L wanted to see).

In fact the optimizer’s behaviour can be reverted to the 10.2.0.3 mechanism by setting fix-control 5483301 to zero (or off), either with an “alter session” call or inside the /*+ opt_param() */ hint. There is, however, another option – if you get the column stats and then immediately set them back (dbms_stats.get_column_stats(), dbms_stats.set_column_stats()) the optimizer marks the stats as “user defined” and (for reasons I don’t know – perhaps it’s an oversight) reverts to the 10.2.0.3 behaviour. Here’s some code to demonstrate the point; as the script header says, I’ve tested it on versions up to 18.1.


rem
rem     Script:         histogram_hack_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             18.1.0.0        via LiveSQL (with some edits)
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to avoid format issue
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to avoid format issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 10 for columns (c2,c3) size 10');

column column_name format a128 new_value m_colname

select  column_name
from    user_tab_cols
where   table_name = 'T1'
and     column_name not in ('C2','C3')
;

set autotrace traceonly explain
select /* pre-hack */ * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

declare
        l_distcnt number default null;
        l_density number default null;
        l_nullcnt number default null;
        l_srec    dbms_stats.statrec;
        l_avgclen number default null;

begin

        dbms_stats.get_column_stats (
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

        dbms_stats.set_column_stats(
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

end;
/

set autotrace traceonly explain
select /* post-hack */  * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

I’ve created a simple table for the data and collected stats including histograms on the two columns and on the column group. I’ve taken a simple strategy to find the name of the column group (I could have used the function dbms_stats.create_extended_stats() to set an SQL variable to the name of the column group, of course), and then run a little bit of PL/SQL that literally does nothing more than copy the column group’s stats into memory then write them back to the data dictionary.
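As a side note, the create_extended_stats() alternative might look something like the following sketch; the function simply returns the name of the column group, so you could call it before the gather (or instead of relying on the gather call to create the extension) and capture the name in a bind variable:

variable m_colname varchar2(128)

begin
        :m_colname := dbms_stats.create_extended_stats(
                ownname         => user,
                tabname         => 't1',
                extension       => '(c2,c3)'
        );
end;
/

print m_colname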

Here are the “before” and “after” execution plans that we get from autotrace:

BEFORE
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   997 |  3988 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   997 |  3988 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')


AFTER
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     4 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     4 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')

As required – the estimate for the (‘N’,’N’) rows drops down to (the optimizer’s best approximation to) zero.

Footnote:

An alternative strategy (and, I’d say, a better strategic approach) would have been to create a “fake” frequency histogram that included the value (‘N’,’N’) with a frequency of 1 – a method I’ve suggested in the past, but with the little problem that you need to be able to work out the value to use in the array passed to dbms_stats.set_column_stats() to represent the (‘N’,’N’) combination – and I’ve written about that topic in the past as well.

You might wonder why the optimizer is programmed to use “half the least popular” for predicates referencing values not in the histogram. Prior to 12c it’s easy to make an argument for the algorithm. Frequency histograms used to be sampled with a very small sample size, so if you were unlucky a “slightly less popular” value could be missed completely in the sample; if you were requesting a value that didn’t appear in the histogram then (presumably) you knew it should exist in the data, so guessing a cardinality somewhat less than the least popular must have seemed like a good idea.

In 12c, of course, you ought to be taking advantage of the “approximate NDV” implementation, which uses a 100% sample to generate frequency (and Top-N / Top-Frequency) histograms. If you’ve got a 12c frequency histogram then the absence of a value in the histogram means the data really wasn’t there, so a cardinality estimate of 1 makes more sense. (Of course, you might have allowed Oracle to gather the histogram at the wrong time – but that’s a different issue). If you’ve got a Top-N histogram then the optimizer will behave as if a “missing” value is one of those nominally allowed for in the “low frequency” bucket and use neither the 1 nor the “half the least popular”.

So, for 12c and columns with frequency histograms it seems perfectly reasonable to set the fix control to zero – after getting approval from Oracle support, of course.
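For reference, the two ways of setting the fix control that I mentioned above would look something like this (a sketch only; test the effect, and get that approval, before going anywhere near production):

alter session set "_fix_control" = '5483301:0';

select
        /*+ opt_param('_fix_control' '5483301:0') */
        *
from    t1
where   c2 = 'N' and c3 = 'N'
;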

 

July 13, 2018

pushing predicates

Filed under: CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:05 pm GMT Jul 13,2018

I came across this odd limitation (maybe defect) with pushing predicates (join predicate push down) a few years ago. Fixing it made a dramatic difference to a client query, but the problem had managed to hide itself rather cunningly until you looked closely at what was going on. Searching my library for something completely different I’ve just rediscovered the model I built to demonstrate the issue, so I’ve tested it against a couple of newer versions of Oracle (including 18.1) and found that the anomaly still exists. It’s an interesting little detail about checking execution plans properly, so I’ve written up the details. The critical feature of the problem is a union all view:


rem
rem	Script:		push_pred_limitation.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Jan 2015
rem
rem	Last tested 
rem		18.1.0.0	via LiveSQL
rem		12.2.0.1
rem		12.1.0.2
rem		11.2.0.4
rem

create table t1
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t2
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t3
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1 for columns owner size 254'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T3',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t2_id on t2(object_id);
-- create index t2_id_ot on t2(object_id, object_type);

create index t3_name_type on t3(object_name, object_type);

create or replace view v1
as
select 
	/*+ qb_name(part1) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.created	date_2,
	t3.created	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
union all
select
	/*+ qb_name(part2) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.last_ddl_time	date_2,
	t3.last_ddl_time	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
;

Two points to note so far: first, the view is basically joining the same two tables in the same way twice but selecting different columns. It’s a close model of what the client was doing, but so much simpler that it wouldn’t be hard to find a different way of getting the same result: the client’s version would have been far harder to rewrite. Secondly, I’ve listed two possible indexes for table t2 but commented one of them out. The indexing will make a difference that I’ll describe later.

So here’s the query with execution plan (from explain plan – but pulling the plan from memory gives the same result):


select
	/*+ qb_name(main) */
	t1.object_name, t1.object_type,
	v1.object_id, v1.date_2, v1.date_3
from
	t1,
	v1
where
	v1.object_id = t1.object_id
and	v1.object_type_2 = t1.object_type
and	v1.object_type_3 = t1.object_type
and	t1.owner = 'OUTLN'
;

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   588 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   588 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|*  3 |   VIEW                                   | V1           |     1 |    44 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   3 - filter("V1"."OBJECT_TYPE_2"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

The execution plan appears to be fine – we can see at operation 4 that the union all view has been accessed with the pushed predicate option and that the subsequent sub-plan has used index-driven nested loop joins in both branches – until we look a little more closely and examine the Predicate section of the plan. What, exactly, has been pushed?

Look at the predicate for operation 3: "V1"."OBJECT_TYPE_2"="T1"."OBJECT_TYPE". It’s a join predicate that hasn’t been pushed into the view. On the other hand the original, and similar, join predicate v1.object_type_3 = t1.object_type has been pushed into the view, appearing at operations 9 and 15. There is a difference, of course: the object_type_3 column appears as the second column of the index on table t3.

Two questions then: (a) will the object_type_2 predicate be pushed if we add it to the relevant index on table t2; (b) is there a way to get the predicate pushed without adding it to the index? The answer to both questions is yes. First the index – re-run the test but create the alternative index on t2 and the plan changes to:

Plan hash value: 497545587

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

Notice how the predicate at operation 3 has disappeared, and the access predicate at operation 8 now includes the predicate “T2″.”OBJECT_TYPE”=”T1″.”OBJECT_TYPE”.

Alternatively, don’t mess about with the indexes – just tell Oracle to push the predicate. Normally I would just try /*+ push_pred(v1) */ as the hint to do this, but the Outline section of the original execution plan already included a push_pred() hint that looked like this: PUSH_PRED(@"MAIN" "V1"@"MAIN" 3 1), so I first copied exactly that into the SQL to see if it would make any difference. It did – I got the following plan (and the hint in the outline changed to PUSH_PRED(@"MAIN" "V1"@"MAIN" 3 2 1), so this may be a case where the plan produced by a baseline will perform better than the plan that produced the baseline!):

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|* 13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='TEST_USER')
   7 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  13 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

In this case we see that the critical late-joining predicate has disappeared from operation 3 and re-appeared as a filter predicate at operation 7. In many cases you may find that the change in predicate use makes little difference to the performance – in my example the variation in run time over several executions of each query was larger than the average run time of the query; nevertheless it’s worth noting that the delayed use of the predicate could have increased the number of probes into table t3 for both branches of the union all and resulted in redundant data passing up through several layers of the call stack before being eliminated … and “eliminate early” is one of the major commandments of optimisation.

You might notice that the Plan Hash Value for the hinted execution plan is the same as for the original execution plan: the hashing algorithm doesn’t take the predicates into account (just one of many points that Randolf Geist raised in a blog post several years ago). This is one of the little details that makes it easy to miss the little changes in a plan that can make a big difference in performance.

Summary

If you have SQL that joins simple tables to set-based (union all, etc.) views and you see the pushed predicate option appearing, take a little time to examine the predicate section of the execution plan to see if the optimizer is pushing all the join predicates that it should and, if it isn’t, test the effects of pushing more predicates.

In many cases adding the hint /*+ push_pred(your_view_name) */ at the top of the query may be sufficient to get the predicate pushing you need, but you may need to look at the outline section of the execution plan and add a series of more complicated push_pred() and no_push_pred() hints because the push_pred hint has evolved over time to deal with increasingly complicated transformations.
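As a sketch of what that might look like for the example above, starting with the simple form of the hint (and falling back to the outline-style form quoted earlier if the simple version isn’t enough):

select
        /*+ qb_name(main) push_pred(v1) */
        t1.object_name, t1.object_type,
        v1.object_id, v1.date_2, v1.date_3
from
        t1,
        v1
where
        v1.object_id = t1.object_id
and     v1.object_type_2 = t1.object_type
and     v1.object_type_3 = t1.object_type
and     t1.owner = 'OUTLN'
;

-- if that's not sufficient, copy the outline form, e.g.
-- /*+ qb_name(main) push_pred(@"MAIN" "V1"@"MAIN" 3 1) */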

 

July 2, 2018

Clustering_Factor

Filed under: CBO,Indexing,Oracle,Statistics — Jonathan Lewis @ 1:24 pm GMT Jul 2,2018

Here’s another little note on the clustering_factor for an index and the table preference table_cached_blocks that can be set with a call to dbms_stats.set_table_prefs(). I might be repeating a point that someone made in a comment on an older posting but if that’s the case I can’t find the comment at present, and it’s worth its own posting anyway.

The call to dbms_stats.set_table_prefs(null, '{tablename}', 'table_cached_blocks', N) – where N can be any integer between 1 and 255 – will modify Oracle’s algorithm for calculating the clustering_factor of an index. The default is 1, which often means the clustering_factor is much higher than it ought to be from a humanly visible perspective and leads to Oracle not using an index that could be a very effective index.

The big point is this: the preference has no effect when you execute a “create index” statement, or an “alter index rebuild” statement. Here’s a simple script to demonstrate the point.


rem
rem     Script:         table_cached_blocks_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

drop table t1 purge;
create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum-1,100)               n1,
        mod(rownum-1,100)               n2,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator
;

column blocks new_value m_blocks

select  blocks 
from    user_tables
where   table_name = 'T1'
;

column preference_value format a40

select  preference_name, preference_value
from    user_tab_stat_prefs
where
        table_name = 'T1'
;

I’ve created a very simple table of 10,000 rows with two identical columns and captured the number of blocks (which I know will be less than 256) in a substitution variable which I’m going to use in a call to set_table_prefs(). I’ve also run a quick check to show there are no table preferences set for the table. I’ll be running the same check again after setting the table_cached_blocks preference. Step 1 – create two indexes, but set the preference after building the first one; I’ve shown the result of the query against user_indexes immediately after the query:


create index t1_i1 on t1(n1);

execute dbms_stats.set_table_prefs(null,'t1','table_cached_blocks',&m_blocks)

create index t1_i2 on t1(n2);

select
        index_name, clustering_factor, to_char(last_analyzed, 'dd-mon-yyyy hh24:mi:ss') analyzed
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name
;

INDEX_NAME           CLUSTERING_FACTOR ANALYZED
-------------------- ----------------- -----------------------------
T1_I1                            10000 26-jun-2018 14:13:51
T1_I2                            10000 26-jun-2018 14:13:51


Now we check the effect of rebuilding the t1_i2 index – the one second sleep is so that we can use the last_analyzed time to see that new stats have been created for the index:


execute dbms_lock.sleep(1)
alter index t1_i2 rebuild /* online */ ;

select
        index_name, clustering_factor, to_char(last_analyzed, 'dd-mon-yyyy hh24:mi:ss') analyzed
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name
;

INDEX_NAME           CLUSTERING_FACTOR ANALYZED
-------------------- ----------------- -----------------------------
T1_I1                            10000 26-jun-2018 14:13:51
T1_I2                            10000 26-jun-2018 14:13:52


Finally we do an explicit gather_index_stats():


execute dbms_lock.sleep(1)
execute dbms_stats.gather_index_stats(null,'t1_i2')

select
        index_name, clustering_factor, to_char(last_analyzed, 'dd-mon-yyyy hh24:mi:ss') analyzed
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name
;

INDEX_NAME           CLUSTERING_FACTOR ANALYZED
-------------------- ----------------- -----------------------------
T1_I1                            10000 26-jun-2018 14:13:51
T1_I2                              179 26-jun-2018 14:13:53

At last – on the explicit call to gather stats – the table_cached_blocks preference is used.

Dire Threat

Think about what this means: you’ve carefully worked out that a couple of indexes really need a special setting of table_cached_blocks and you gathered stats on those indexes so you have a suitable value for the clustering_factor. Then, one night, someone decides that they’re going to rebuild some of those indexes. The following morning the clustering_factor is much higher and a number of critical execution plans change as a consequence, and don’t revert until the index statistics (which are perfectly up to date!) are re-gathered.

Footnote

The same phenomenon appears even when you’ve set the global preference for stats collection with dbms_stats.set_global_prefs().
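For reference, the global version of the preference is set with a call like the following (a sketch; the value 16 is just an illustrative number, and the same warning about rebuilds applies):

execute dbms_stats.set_global_prefs('table_cached_blocks', '16')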

June 23, 2018

Cursor_sharing force

Filed under: CBO,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 12:05 pm GMT Jun 23,2018

Prompted by a recent ODC (OTN) question I’ve just written up an example of one case where setting the cursor_sharing parameter to force doesn’t work as you might expect. It’s a specific example of what I believe is a theme that can appear in several different circumstances: if your SQL mixes “genuine” bind variables with literals then the literals may not be substituted.

Here’s a simple data set to start with:


rem
rem     Script:         cursor_sharing_limit.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem     Purpose:
rem
rem     Last tested
rem             18.1.0.0
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4 -- > comment to avoid WordPress format issue
;

I haven’t bothered to gather stats – it’s not needed in trivial cases of CTAS.

Here’s one way to execute SQL that doesn’t get literal substitution when (apparently) it should:


alter system flush shared_pool;
alter session set cursor_sharing=force;

declare
        m_ct number;
        m_n1 number := 20;
begin
        execute immediate
                'select /*+ trace this */ count(*) from t1 where n2 = 15 and n1 = :b1'
                into m_ct using m_n1;
        dbms_output.put_line(m_ct);

        execute immediate
                'select /*+ trace this too */ count(*) from t1 where n1 = 15 and n2 = 15'
                into m_ct;
        dbms_output.put_line(m_ct);
end;
/

alter session set cursor_sharing=exact;

I’ve included a “hint” that allows me to find the SQL statements in v$sql very easily – and here they are, preceded by the query I used to find them:


select  sql_id, parse_calls, executions, rows_processed, sql_text 
from    v$sql
where   sql_text like 'select%trace this%'
and     sql_text not like '%v$sql%'
;


SQL_ID        PARSE_CALLS EXECUTIONS ROWS_PROCESSED
------------- ----------- ---------- --------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
cbu4s78h5pfj5           1          1              1
select /*+ trace this too */ count(*) from t1 where n1 = :"SYS_B_0" and n2 = :"SYS_B_1"

cru67sufptx8x           1          1              1
select /*+ trace this */ count(*) from t1 where n2 = 15 and n1 = :b1

Notice how one statement – which used only literals – underwent bind variable transformation but the other – with its mix of literals and bind variables – didn’t. On the other hand, if I execute the mixed format statement from a basic SQL*Plus session then pull the plan from memory, this is what I get:


SQL> select /*+ SQL*Plus session */ count(*) from t1 where n2 = 15 and n1 = :b1;

  COUNT(*)
----------
         0

1 row selected.

SQL> select  sql_id, parse_calls, executions, rows_processed, sql_text
  2  from    v$sql
  3  where   sql_text like 'select%Plus session%'
  4  and     sql_text not like '%v$sql%'
  5  ;

SQL_ID        PARSE_CALLS EXECUTIONS ROWS_PROCESSED
------------- ----------- ---------- --------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
gq2qy2a9yuta7           1          1              1
select /*+ SQL*Plus session */ count(*) from t1 where n2 = :"SYS_B_0" and n1 = :b1

When trying to take advantage of cursor_sharing = force, you have to think about how the SQL gets to the database engine – is it delivered directly, or is it sent through some layer of code that means the raw SQL statement is protected in some way from the substitution code.

Footnote:

I happen to have used an example that puts the bind variable into the where clause; you will see the same effect even if the bind variables are in the select list – for example if you’ve selected something like to_char(date_col, :formatting_variable).
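Here’s a sketch of the kind of statement I mean; the table and column names are invented purely for illustration, the point being that the statement mixes a genuine bind variable with a literal, so the literal may survive unsubstituted depending on how the statement reaches the engine:

select
        to_char(date_col, :formatting_variable)         formatted_date
from
        some_table
where
        status_col = 'OPEN'
;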

May 30, 2018

Index Bouncy Scan 3

Filed under: 12c,Bugs,CBO,Oracle — Jonathan Lewis @ 1:15 pm GMT May 30,2018

This is a follow-up to a problem I had with yesterday’s example of using recursive CTEs to “bounce” along a multi-column index to pick out the unique set of combinations of the first two columns. Part of the resulting query used a pair of aggregate scalar subqueries in a select list – and Andrew Sayer improved on my query by introducing a “cross apply” (which I simply hadn’t thought of) which the optimizer transformed into a lateral view (which I had thought of, but couldn’t get to work).

After seeing what Andrew and the optimizer had done I looked a little more closely at my lateral view experiment and modified it so that it worked. Here are the three critical versions of the relevant code fragment; first is my original code, then Andrew’s cross apply, then my working lateral view version:

select
        (select min(t1.val1) val1 from t1 where t1.val1 > bounce1.val1) val1,
        (select min(t1.val2) val2 from t1 where t1.val1 > bounce1.val1 and rownum = 1) val2
from    bounce1
where   bounce1.val1 is not null
 
 
select
        ca.val1 ,ca.val2
from    bounce1
cross  apply (select val1, val2
              from  (select /*+ index(t1) no_index_ffs(t1) */
                             val1, val2
                     from    t1
                     where   t1.val1 > bounce1.val1
                     and     rownum = 1
                    )
             ) ca
where  bounce1.val1 is not null
 
----

select
        ca.val1 ,ca.val2
from    bounce1, 
        lateral(select val1, val2
              from  (select /*+ index(t1) no_index_ffs(t1) */
                             val1, val2
                     from    t1
                     where   t1.val1 > bounce1.val1
                     and     rownum = 1
                    )
             ) ca
where  bounce1.val1 is not null

All I’ve done to modify Andrew’s code is put a comma after the table (actually CTE) bounce1, then change “cross apply” to “lateral”. Compare the resulting text with the following lateral version that doesn’t work:


select
        ca.val1 ,ca.val2
from    bounce1, 
        lateral (
                   select /*+ index(t1) no_index_ffs(t1) */
                             val1, val2
                     from    t1
                     where   t1.val1 > bounce1.val1
                     and     rownum = 1
             ) ca
where  bounce1.val1 is not null

To get from not working to working all I’ve done is wrap the text in my lateral() subquery inside one more (apparently redundant) layer of “select * from ()”!

In fact my assumption that my code wasn’t working was incorrect – what was really going on was that the code I had written was producing the wrong results, but I thought that I had made a mistake in the way I was writing it and couldn’t figure out what I had done wrong.

Problem Solving:

To get a better idea of what’s going on, I took a closer look at the execution plans. Here are the plans (main body only) for the two variants of using the lateral() view – the first from the SQL with the “redundant” select, the second as I originally wrote it. Notice that the number of rows (A-Rows) returned in the first case is the expected 30 while in the second case it’s only 10.


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                 |      1 |        |   125 (100)|     30 |00:00:00.01 |      40 |     28 |       |       |          |
|   1 |  SORT ORDER BY                               |                 |      1 |      4 |   125   (3)|     30 |00:00:00.01 |      40 |     28 |  2048 |  2048 | 2048  (0)|
|*  2 |   VIEW                                       |                 |      1 |      4 |   124   (2)|     30 |00:00:00.01 |      40 |     28 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST  |                 |      1 |        |            |     33 |00:00:00.01 |      40 |     28 |  1024 |  1024 |          |
|*  4 |     VIEW                                     |                 |      1 |      2 |    61   (2)|      3 |00:00:00.01 |       8 |      4 |       |       |          |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |      1 |        |            |      3 |00:00:00.01 |       8 |      4 |  1024 |  1024 |          |
|*  6 |       VIEW                                   |                 |      1 |      1 |    29   (0)|      1 |00:00:00.01 |       2 |      1 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY                 |                 |      1 |  10000 |    29   (0)|      1 |00:00:00.01 |       2 |      1 | 73728 | 73728 |          |
|   8 |         INDEX FULL SCAN                      | T1_PK           |      1 |  10000 |    29   (0)|      2 |00:00:00.01 |       2 |      1 |       |       |          |
|   9 |       NESTED LOOPS                           |                 |      3 |      1 |    31   (0)|      2 |00:00:00.01 |       6 |      3 |       |       |          |
|  10 |        RECURSIVE WITH PUMP                   |                 |      3 |        |            |      3 |00:00:00.01 |       0 |      0 |       |       |          |
|  11 |        VIEW                                  | VW_LAT_1BBF5C63 |      3 |      1 |     2   (0)|      2 |00:00:00.01 |       6 |      3 |       |       |          |
|  12 |         VIEW                                 |                 |      3 |      1 |     2   (0)|      2 |00:00:00.01 |       6 |      3 |       |       |          |
|* 13 |          COUNT STOPKEY                       |                 |      3 |        |            |      2 |00:00:00.01 |       6 |      3 |       |       |          |
|* 14 |           INDEX RANGE SCAN                   | T1_PK           |      3 |      1 |     2   (0)|      2 |00:00:00.01 |       6 |      3 |       |       |          |
|  15 |     SORT AGGREGATE                           |                 |     30 |      1 |            |     30 |00:00:00.01 |      32 |     24 |       |       |          |
|  16 |      FIRST ROW                               |                 |     30 |      1 |     2   (0)|     27 |00:00:00.01 |      32 |     24 |       |       |          |
|* 17 |       INDEX RANGE SCAN (MIN/MAX)             | T1_PK           |     30 |      1 |     2   (0)|     27 |00:00:00.01 |      32 |     24 |       |       |          |
|  18 |     RECURSIVE WITH PUMP                      |                 |     11 |        |            |     30 |00:00:00.01 |       0 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                 |      1 |        |   125 (100)|     10 |00:00:00.01 |      16 |       |       |          |
|   1 |  SORT ORDER BY                               |                 |      1 |      4 |   125   (3)|     10 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|*  2 |   VIEW                                       |                 |      1 |      4 |   124   (2)|     10 |00:00:00.01 |      16 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST  |                 |      1 |        |            |     11 |00:00:00.01 |      16 |  1024 |  1024 |          |
|*  4 |     VIEW                                     |                 |      1 |      2 |    61   (2)|      1 |00:00:00.01 |       4 |       |       |          |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |      1 |        |            |      1 |00:00:00.01 |       4 |  1024 |  1024 |          |
|*  6 |       VIEW                                   |                 |      1 |      1 |    29   (0)|      1 |00:00:00.01 |       2 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY                 |                 |      1 |  10000 |    29   (0)|      1 |00:00:00.01 |       2 | 73728 | 73728 |          |
|   8 |         INDEX FULL SCAN                      | T1_PK           |      1 |  10000 |    29   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|   9 |       NESTED LOOPS                           |                 |      1 |      1 |    31   (0)|      0 |00:00:00.01 |       2 |       |       |          |
|  10 |        RECURSIVE WITH PUMP                   |                 |      1 |        |            |      1 |00:00:00.01 |       0 |       |       |          |
|* 11 |        VIEW                                  | VW_DCL_1BBF5C63 |      1 |      1 |     2   (0)|      0 |00:00:00.01 |       2 |       |       |          |
|* 12 |         COUNT STOPKEY                        |                 |      1 |        |            |      1 |00:00:00.01 |       2 |       |       |          |
|  13 |          INDEX FULL SCAN                     | T1_PK           |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       2 |       |       |          |
|  14 |     SORT AGGREGATE                           |                 |     10 |      1 |            |     10 |00:00:00.01 |      12 |       |       |          |
|  15 |      FIRST ROW                               |                 |     10 |      1 |     2   (0)|      9 |00:00:00.01 |      12 |       |       |          |
|* 16 |       INDEX RANGE SCAN (MIN/MAX)             | T1_PK           |     10 |      1 |     2   (0)|      9 |00:00:00.01 |      12 |       |       |          |
|  17 |     RECURSIVE WITH PUMP                      |                 |     11 |        |            |     10 |00:00:00.01 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Most importantly we can see that the optimizer has used two different transformations. For the working query we see the view name VW_LAT_xxxxxxxx at operation 11 – this is Oracle implementing a lateral view; for the problem query we see the view name VW_DCL_xxxxxxxx at operation 11 – this is Oracle implementing a transformation to a "decorrelated lateral view".
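A side note: an easy way to get these generated view and query block names reported when you pull a plan from memory is to add the 'alias' option to the dbms_xplan format string. This is just the generic call (combine the option with whatever format you normally use), not something lifted from the original thread:

select * from table(dbms_xplan.display_cursor(null, null, 'alias'));

The extra "Query Block Name / Object Alias" section of the output then lists names like VW_LAT_xxxxxxxx or VW_DCL_xxxxxxxx against the relevant plan operations.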

My first test after noting this difference was to see what would happen if I added the hint /*+ no_query_transformation */ to the query: it resulted in the VW_DCL_xxxxxxxx view name changing to VW_LAT_xxxxxxxx and the query producing the right result. Andrew Sayer, on the ODC thread, then pointed out that he'd done a couple more experiments and used the /*+ no_decorrelate() */ hint, so I tried that with my query, adding it (with no parameters) to the subquery inside the lateral() clause – again the plan changed from using VW_DCL to VW_LAT and the results were correct.

Test Case

Bottom line on this – it looks like the optimizer is decorrelating a subquery when it shouldn’t, leading to wrong results. To make it easier to see this anomaly I stripped the original sample down to a basic test case starting with the table that I used in the previous posting:

rem
rem     Script:         decorralate.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem
rem     Last tested 
rem             18.1.0.0  -- via liveSQL
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum-1,3)                 val1,
        mod(rownum-1,10)                val2,
        lpad('x',100,'x')               padding
from
        generator       v1
order by
        dbms_random.value
;

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter table t1 add constraint t1_pk primary key(val1, val2, id);

Now two versions of a simplified piece of code that should select the distinct values of val1 greater than the lowest value (each row in the UNION ALL of dual is emulating the way in which yesterday’s recursive CTE was effectively saying “this is a current known value, find the next higher”):


prompt  =============
prompt  Right results
prompt  =============

select
        v1.val1, v1.val2
from    (
        select  0 val1, 0 val2 from dual
        union all
        select 1,0 from dual
        union all
        select 2,0 from dual
        ) bounce1,
        lateral (
            select val1, val2 from (
              select  /*+ index(t1) no_index_ffs(t1) */
                      t1.val1, t1.val2
              from    t1
              where   t1.val1 > bounce1.val1
              and     rownum = 1
            )
        ) v1
;

prompt  ===========================================
prompt  Wrong results -- "redundant" select removed
prompt  ===========================================

select
        v1.val1, v1.val2
from    (
        select  0 val1, 0 val2 from dual
        union all
        select 1,0 from dual
        union all
        select 2,0 from dual
        ) bounce1,
        lateral (
            -- select val1, val2 from (
              select  /*+ index(t1) no_index_ffs(t1) */
                      t1.val1, t1.val2
              from    t1
              where   t1.val1 > bounce1.val1
              and     rownum = 1
            -- )
        ) v1
;

Here’s a cut-n-paste from running the two queries:


=============
Right results
=============

      VAL1       VAL2
---------- ----------
         1          0
         2          0

2 rows selected.

============================================
Wrong results  -- "redundant" select removed
============================================

no rows selected

Finally, to get an idea of what’s gone wrong – and to show that the optimizer has done something wrong when attempting to decorrelate – we can take a look at the optimizer trace file to see the final transformed SQL that the optimizer has produced a plan for. (I enabled the trace with the command “alter session set events ‘trace [rdbms.SQL_Transform.*]’;” to limit the trace to just the information about optimizer transformations.) This – cosmetically altered – is the final “unparsed” query:

select 
        vw_dcl_a18161ff.val1 val1,
        vw_dcl_a18161ff.val2 val2 
from    ( 
                (select 0 val1 from sys.dual dual) 
                union all  
                (select 1 1 from sys.dual dual) 
                union all  
                (select 2 2 from sys.dual dual)
        ) bounce1, 
        (
        select
                 /*+ no_index_ffs (t1) index (t1) */ 
                t1.val1 val1_0,
                t1.val2 val2_1 
        from
                test_user.t1 t1
        where 
                rownum = 1
        ) vw_dcl_a18161ff 
where 
        vw_dcl_a18161ff.val1 > bounce1.val1

As you can see, the lateral view has turned into a non-mergeable inline view which selects the first row available from t1 by following the supplied hints, and joins that single row result set to bounce1. I have a suspicion that lateral views which include rownum predicates should not be decorrelated. I have looked on MoS to see if I can find any bugs related to decorrelating lateral views, but either there are none or my search terms weren’t good enough.
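For reference, this is roughly what the hinted workaround looks like when applied to the simplified "wrong results" query above – it's a sketch of where the hint goes, based on the description earlier in this note, rather than a fresh cut-n-paste:

select
        v1.val1, v1.val2
from    (
        select  0 val1, 0 val2 from dual
        union all
        select 1,0 from dual
        union all
        select 2,0 from dual
        ) bounce1,
        lateral (
              select  /*+ no_decorrelate index(t1) no_index_ffs(t1) */
                      t1.val1, t1.val2
              from    t1
              where   t1.val1 > bounce1.val1
              and     rownum = 1
        ) v1
;

Based on the earlier experiments this should be enough to bring back the VW_LAT_xxxxxxxx plan (and the two correct rows); the /*+ no_query_transformation */ hint at the top of the query had the same effect, but it's a much blunter instrument.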

 

May 28, 2018

Filtering LOBs

Filed under: CBO,Execution plans,LOBs,Oracle,subqueries — Jonathan Lewis @ 8:25 am GMT May 28,2018

A two-part question about the FILTER operation appeared on the Oracle-L list server a couple of days ago. The first part was a fairly common question – one that’s often prompted by the way the optimizer used to behave in older versions of Oracle. Paraphrased, it was: “Why is the total cost of the query so high compared to the sum of its parts?”

Here’s the query, and the execution plan.

 INSERT INTO TEMP
  SELECT DISTINCT 'id',
    PHT.emplid
  FROM PHOTO PHT
  WHERE 1               =1
  AND PHT.PHOTO IS NOT NULL
  AND NOT EXISTS
    (SELECT 'x'
    FROM TEMP TMP
    WHERE PHT.EMPLID=TMP.EMPLID_SRCH
    AND TMP.OPRID  = 'id'
    )
  ;  

  
-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |          | 21210 |  3334K|  5802K  (2)| 00:03:47 |
|   1 |  LOAD TABLE CONVENTIONAL | TEMP     |       |       |            |          |
|*  2 |   FILTER                 |          |       |       |            |          |
|*  3 |    TABLE ACCESS FULL     | PHOTO    | 21211 |  3334K|   313   (1)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL     | TEMP     |     1 |    17 |   380   (2)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT 0 FROM "TEMP" "TMP" WHERE
              "TMP"."EMPLID_SRCH"=:B1 AND "TMP"."OPRID"='id'))
   3 - filter("PHT"."PHOTO" IS NOT NULL)
   4 - filter("TMP"."EMPLID_SRCH"=:B1 AND "TMP"."OPRID"='id')

Note that the “not exists” subquery against temp runs as a filter subquery with a cost of 380 for the tablescan. Combine that with the cost of 313 for the driving tablescan of photo and you might wonder why the resulting cost isn’t something like 693 – and in some old versions of Oracle that’s probably how it would be reported.

Historically the optimizer has been very bad about producing a final cost when queries have included subqueries – whether as filter subqueries in the predicate section or as scalar subqueries in the select list. Sometimes the subquery cost would simply vanish from the final figure, sometimes it would be added to the total just once regardless of how many times the subquery might actually execute.

In this example the subquery against temp is a correlated subquery and might have to run once for every row in photo where the column photo is not null. At best it would have to run once for every distinct value of the photo.emplid column (the correlation column) found in those rows. In recent versions of Oracle the optimizer has tried to introduce some estimate of how many times the subquery will run as part of its calculation of the total cost. So (to a crude approximation) 5802K = 313 + N * 380. Unfortunately if we try to work backwards to N we find it would be about 15,267, which is about 72% of the 21,200 rows estimated as the result of the tablescan of photo. I haven't tried to investigate the algorithms yet, but presumably the optimizer makes some allowance somewhere for "self caching" as the subquery runs.
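If you want to see the working for that back-of-the-envelope calculation, here's a trivial bit of SQL arithmetic – the inputs are the rounded figures taken from the plan, so the output is only approximate:

select
        trunc((5802000 - 313) / 380)                            estimated_subquery_runs,        -- 15,267
        round(100 * ((5802000 - 313) / 380) / 21211, 1)         pct_of_photo_rows               -- roughly 72
from
        dual
;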

The more interesting part of the question came when the OP decided to test the effect of getting rid of the subquery. Check the costs in the resulting plan:


  INSERT INTO TEMP
  SELECT DISTINCT 'id',
    PHT.emplid
  FROM PHOTO PHT
  WHERE 1               =1
  AND PHT.PHOTO IS NOT NULL;

  
-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |          | 21211 |  3334K|  3659   (1)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | TEMP     |       |       |            |          |
|*  2 |   TABLE ACCESS FULL      | PHOTO    | 21211 |  3334K|  3659   (1)| 00:00:01 |
-------------------------------------------------------------------------------------
  
Predicate Information (identified by operation id):
---------------------------------------------------
     2 - filter("PHT"."PHOTO" IS NOT NULL)

Note how the cost of the tablescan of photo has gone up from 313 in the previous query to 3,659 in the simpler query! How can a tablescan that drives a subquery have a lower cost than the tablescan on its own? Bear in mind that in both cases the Cost attributed to the operation “Table Access Full” is purely about scanning the rows in the photo table and is (or should be) entirely disconnected from the cost and frequency of the subquery.

The clue is in the table definition. The column photo.photo is a BLOB.

Models

I think this example exposes two potential errors in the optimizer. The first is that it's adding in a cost that it shouldn't even be considering; the second is that it's inconsistent in the way it derives that cost.

To demonstrate what I think is happening, I built a variant of the OP’s example as follows:


rem
rem     Script:         optimizer_lob_costs.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

create table photo (
        emplid          varchar2(11) not null,
        photo           clob,
        other_col       varchar2(1000)
)
lob (photo) 
store as
        photo_lob(
        disable storage in row 
        cache
        logging
)
;

create unique index ph_uk on photo(emplid);

insert /*+ append */ into photo
select
        lpad(2 * rownum,10,0),
        rpad('x',1000),
        rpad('x',1000)
from
        all_objects
where
        rownum <= 10000 -- > comment to avoid wordpress format issue
;

commit;

create table temp(
        oprid           varchar2(30),
        emplid_srch     varchar2(11)
)
;

insert /*+ append */ into temp
select
        'id',
        lpad(2 * rownum,10,0)
from
        all_objects
where
        rownum <= 1000 -- > comment to avoid wordpress format issue
;

commit;

execute dbms_stats.gather_table_stats(user,'photo',method_opt=>'for all columns size 1', cascade=>true)
execute dbms_stats.gather_table_stats(user,'temp', method_opt=>'for all columns size 1', cascade=>true)


I’ve changed the BLOB to a CLOB defined with storage in row disabled, and I’ve introduced a varchar2() column of the same size as the CLOB column. I’ve declared the correlating column not null and created a unique index on it. Here are the two queries I want to review – slightly simplified versions of the original:


explain plan for
insert into temp(emplid_srch)
select 
        distinct pht.emplid
from 
        photo pht
where 
        1 = 1
and  pht.photo is not null
-- and     pht.other_col is not null
and     not exists (
                select /*+ no_unnest */
                        null
                from 
                        temp tmp
                where 
                        pht.emplid=tmp.emplid_srch
        )
;  

select * from table(dbms_xplan.display);

explain plan for
insert into temp(emplid_srch)
select
        distinct pht.emplid
from    photo pht
where   1               =1
and  pht.photo is not null
-- and     pht.other_col is not nulL
;  

select * from table(dbms_xplan.display);

As you can see I've had to include a /*+ no_unnest */ hint in my SQL to get the FILTER operation to appear in the plan (the OP had the hidden parameter "_unnest_subquery" set to false); I've also allowed for two variants of each query, one referencing the CLOB column, the other referencing the varchar2() column. The only results I'll show are for the queries accessing the CLOB, and here are the plans – first with, then without, the subquery. Check the cost of the tablescan of the photo table in the two cases:


----------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |       |  9999 |   956K| 10458   (3)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | TEMP  |       |       |            |          |
|*  2 |   FILTER                 |       |       |       |            |          |
|*  3 |    TABLE ACCESS FULL     | PHOTO | 10000 |   957K|   216   (1)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL     | TEMP  |     1 |    11 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "TEMP" "TMP"
              WHERE "TMP"."EMPLID_SRCH"=:B1))
   3 - filter("PHT"."PHOTO" IS NOT NULL)
   4 - filter("TMP"."EMPLID_SRCH"=:B1)


----------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |       | 10000 |   957K|   285   (2)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | TEMP  |       |       |            |          |
|*  2 |   TABLE ACCESS FULL      | PHOTO | 10000 |   957K|   285   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PHT"."PHOTO" IS NOT NULL)

With the subquery in place the tablescan of photo reports a cost of 216; in the absence of the subquery it reports a cost of 285 – a difference of 69. Repeating the test but using the varchar2() column the cost of the tablescan was 213 in both cases – suggesting that the variation was due to the column being a LOB.

With no further clues in the plan it looked like one of those rare occasions when I have to look at the 10053 (optimizer) trace file.
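In case you want to reproduce the exercise, here's a minimal sketch of how the trace could be captured around the explain plan call – the tracefile_identifier is just a convenience I've added, it wasn't part of the original test:

alter session set tracefile_identifier = 'lob_costing';
alter session set events '10053 trace name context forever, level 1';

explain plan for
insert into temp(emplid_srch)
select
        distinct pht.emplid
from    photo pht
where   1 = 1
and     pht.photo is not null
;

alter session set events '10053 trace name context off';

This is what I got from the 12.1.0.2 trace, looking at the section headed "SINGLE TABLE ACCESS PATH" for the photo table. First the base query without the subquery: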


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for PHOTO[PHT]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#2): PHOTO(LOB)
    AvgLen: 87 NDV: 0 Nulls: 0 Density: 0.000000
  Table: PHOTO  Alias: PHT
    Card: Original: 10000.000000  Rounded: 10000  Computed: 10000.000000  Non Adjusted: 10000.000000
  Scan IO  Cost (Disk) =   210.000000
  Scan CPU Cost (Disk) =   13571440.480000
  Total Scan IO  Cost  =   210.000000 (scan (Disk))
                         + 70.000000 (io filter eval) (= 0.007000 (per row) * 10000.000000 (#rows))
                       =   280.000000
  Total Scan CPU  Cost =   13571440.480000 (scan (Disk))
                         + 9138463.200000 (cpu filter eval) (= 913.846320 (per row) * 10000.000000 (#rows))
                       =   22709903.680000

Note that the "Total Scan IO Cost" includes a component labelled "(io filter eval)" – why, for the predicate "photo is not null", would we do any special I/O when that predicate can be met during the basic table scan?

(Note: a predicate like "lob_column is null" means there is no lob locator in place, so no lob access need be done for that test. In fact the related, but very different, predicate "length(lob_column) = 0" – meaning the lob locator exists but the lob is "empty" – could also be satisfied during the tablescan without reference to the physical lob segment(s), because the length of the lob is included in the lob locator.)
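If you want to see that distinction for yourself, here's a tiny illustration against the model photo table created above – the two emplid values are just labels I've invented for the demo, and the rollback means the rows don't interfere with anything else in the test:

insert into photo(emplid, photo, other_col) values ('NULL_LOB',  null,         'x');
insert into photo(emplid, photo, other_col) values ('EMPTY_LOB', empty_clob(), 'x');

select
        emplid,
        case when photo is null then 'no locator' else 'locator present' end    locator,
        dbms_lob.getlength(photo)                                               lob_length
from    photo
where   emplid in ('NULL_LOB','EMPTY_LOB')
;

rollback;

The null lob reports "no locator" with a null length; the empty lob reports a locator with a length of zero – and it's only the presence (or absence) of the locator and its embedded length that the two predicates above need to check.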

Let's assume that the optimizer is incorrectly assuming the run-time engine will have to access the lob in some way to determine that the lob is null. The worst case scenario is that Oracle will start by accessing the LOBindex – so why don't we check how big the LOBindex is? The first step I took was to check the object_id of the LOBindex and then do a tree dump (which showed 66 leaf blocks), and then I checked the segment header block and dumped it, with the following results:


  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 1      #blocks: 127
                  last map  0x00000000  #maps: 0      offset: 4128
      Highwater::  0x01400447  ext#: 0      blk#: 70     ext size: 127
  #blocks in seg. hdr's freelists: 4
  #blocks below: 70
  mapblk  0x00000000  offset: 0
                   Unlocked
     Map Header:: next  0x00000000  #extents: 1    obj#: 194295 flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x01400401  length: 127

See the "Highwater::" line in the dump above – the allocated space in the segment is the first 70 blocks of the first extent. That's (almost certainly) where the incremental cost of 70 (single block I/Os) comes from.  (And I did a couple of big updates to the LOB, designed to expand the LOBindex without changing the segment size of the underlying table, to corroborate that hypothesis.)
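For anyone who wants to repeat the checks, here's a sketch of the sort of SQL*Plus script I'm describing – the dictionary views and dump events are the standard ones, but treat it as illustrative (you need suitable privileges for dba_segments and the block dump, and the substitution variable names are just my choice):

column index_name   new_value m_index_name
column object_id    new_value m_object_id
column header_file  new_value m_header_file
column header_block new_value m_header_block

rem     Which index is the LOBindex for photo.photo, and what is its object_id?

select  index_name
from    user_lobs
where   table_name  = 'PHOTO'
and     column_name = 'PHOTO'
;

select  object_id
from    user_objects
where   object_name = '&m_index_name'
;

rem     Tree dump of the LOBindex, then dump of its segment header block

alter session set events 'immediate trace name treedump level &m_object_id';

select  header_file, header_block
from    dba_segments
where   owner        = user
and     segment_name = '&m_index_name'
;

alter system dump datafile &m_header_file block &m_header_block;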

This brings us to the question of why the cost of the tablescan drops when the subquery is included. Again we generate the 10053 trace and examine the details under the “SINGLE TABLE ACCESS PATH”:


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for PHOTO[PHT]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Table: PHOTO  Alias: PHT
    Card: Original: 10000.000000  Rounded: 10000  Computed: 10000.000000  Non Adjusted: 10000.000000
  Scan IO  Cost (Disk) =   210.000000
  Scan CPU Cost (Disk) =   13571440.480000
  Total Scan IO  Cost  =   210.000000 (scan (Disk))
                         + 3.500000 (io filter eval) (= 0.000350 (per row) * 10000.000000 (#rows))
                       =   213.500000
  Total Scan CPU  Cost =   13571440.480000 (scan (Disk))
                         + 656923.160000 (cpu filter eval) (= 65.692316 (per row) * 10000.000000 (#rows))
                       =   14228363.640000


In this case the "(io filter eval)" component is only 3.5 – and if you know your optimizer and how it handles subqueries you're allowed to guess that this could be one of Oracle's standard guesses of 5% coming into play: 5% of the 70 block I/Os we saw for the LOBindex in the previous trace is exactly 3.5. (Again, growing the index seemed to corroborate this hypothesis.)

So here's (possible) bug number 2. The first bug was adding a cost for accessing the LOBindex when there should be no need to access the index at all; the second is the inconsistency in the arithmetic. The execution plan says we will get 10,000 rows from the table, and the filter (not exists) predicate reports a cardinality reduced by just 1 – it is correlated on a column that has been declared with a uniqueness constraint – yet a fairly commonly used "guess factor" of 5% is used as an indicator of the number of times the lob predicate will be tested. The various bits of the arithmetic are not consistent with each other.

Summary notes:

If you have a tablescan with a predicate that references a lob column then the cost of the tablescan includes the cost of the lob access – and there are cases where lob access is not needed but still gets costed {this is bug number 1 – the predicates in question are "column is [not] null" and "length(column) = / != 0"}.

If the lob data itself does not need to be accessed then the size of the lob index – which you can’t easily find – may have a significant impact on the cost of the tablescan.

If the query also includes predicates that result in the optimizer guessing about cardinality effects (1%, 5%, 0.25% are common guesses) then that guess may be used to scale the assumed (and potentially irrelevant) cost of the lob access. (There is scope for further experimentation in this area to examine the effects of "non-guess" predicates and the assumed order of application of predicates – e.g. are lob predicates costed as the last to be applied, and does the costing algorithm match the execution order?)

As often happens it’s easy to see that there are oddities in the arithmetic that affect the cost of a query in ways that might make the optimizer pick a silly execution plan. Unfortunately it’s not easy to predict when you’re likely to see the effects of these oddities; the best we can do is remember that there is an anomaly with costing lob-based predicates and hope that we think of it when we see the optimizer picking a bad plan for reasons that initially are not obvious.
