Oracle Scratchpad

November 1, 2018

Join Cardinality – 5

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:34 pm GMT Nov 1,2018

So far in this series I’ve written about the way that the optimizer estimates cardinality for an equijoin where one end of the join has a frequency histogram and the other end has a histogram of type:

  • Frequency
  • Top-Frequency
  • Hybrid

It’s now time to look at a join where the other end has a height-balanced histogram. Arguably it’s not sensible to spend time writing about this since you shouldn’t be creating them in 12c (depending, instead, on the hybrid histogram that goes with the auto_sample_size), and the arithmetic is different in 11g. However, there still seem to be plenty of people running 12c but not using the auto_sample_size and that means they could be generating some height-balanced histograms – so let’s generate some data and see what happens.


rem
rem     Script:         freq_hist_join_04a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4        Different results
rem

drop table t2 purge;
drop table t1 purge;

set linesize 156
set trimspool on
set pagesize 60

set feedback off

execute dbms_random.seed(0)

create table t1(
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)      
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))   j2      
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

prompt  ==========================================================
prompt  Using estimate_percent => 100 to get height-balanced in t2
prompt  ==========================================================

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                estimate_percent => 100,
                method_opt       => 'for all columns size 1 for columns j2 size 20'
        );
end;
/

As in earlier examples I’ve created some empty tables, then inserted randomly generated data (after calling the dbms_random.seed(0) function to make the data reproducible). Then I’ve gathered stats, knowing that there will be 22 distinct values in t2.j2 so that the request for 20 buckets forces a height-balanced histogram to appear.

When we try to calculate the join cardinality we’re going to need various details from the histogram information, such as bucket sizes, number of distinct values, and so on, so in the next few queries to display the histogram information I’ve captured a few values into SQL*Plus variables. Here’s the basic information about the histograms on the join columns t1.j1 and t2.j2:


column num_distinct new_value m_t2_distinct
column num_rows     new_value m_t2_rows
column num_buckets  new_value m_t2_buckets
column bucket_size  new_value m_t2_bucket_size

select  table_name, column_name, histogram, num_distinct, num_buckets, density
from    user_tab_cols
where   table_name in ('T1','T2')
and     column_name in ('J1','J2')
order by
        table_name
;

select  table_name, num_rows, decode(table_name, 'T2', num_rows/&m_t2_buckets, null) bucket_size
from    user_tables
where   table_name in ('T1','T2')
order by
        table_name
;

column table_name format a3 heading "Tab"
break on table_name skip 1 on report skip 1

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
)
select f1.* from f1
union all
select f2.* from f2
order by 1,2
;


Tab                  COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS    DENSITY
-------------------- -------------------- --------------- ------------ ----------- ----------
T1                   J1                   FREQUENCY                 10          10       .005
T2                   J2                   HEIGHT BALANCED           22          20 .052652266

Tab                    NUM_ROWS BUCKET_SIZE
-------------------- ---------- -----------
T1                          100
T2                          800          40

Tab      VALUE ROW_OR_BUCKET_COUNT ENDPOINT_NUMBER
--- ---------- ------------------- ---------------
T1           2                   5               5
             5                  15              20
             7                  15              35
            10                  17              52
            12                  13              65
            15                  13              78
            17                  11              89
            20                   7              96
            22                   3              99
            25                   1             100

T2           1                   0               0
            14                   1               1
            17                   1               2
            18                   1               3
            19                   1               4
            20                   1               5
            21                   2               7
            22                   1               8
            23                   1               9
            24                   2              11
            25                   2              13
            26                   3              16
            27                   2              18
            28                   2              20

As you can see, there is a frequency histogram on t1 reporting a cumulative total of 100 rows; and the histogram on t2 is a height-balanced histogram of 20 buckets, showing 21, 24, 25, 26, 27 and 28 as popular values with 2, 2, 2, 3, 2 and 2 endpoints (i.e. buckets) respectively. You’ll also note that the t2 histogram starts with a row for the value 1 with a row/bucket count of zero, showing us the minimum value in the column and letting us know that bucket 1 is not exclusively full of the value 14. (If 14 had been the minimum value for the column as well as an end point Oracle would not have created a bucket 0 – that may be a little detail that isn’t well-known – and will be the subject of a little follow-up blog note.)
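As a quick sanity check, the popular values can be listed straight from user_tab_histograms using the same lag() technique as the queries above – an endpoint is popular if it accounts for more than one bucket. This is just a convenience rewrite of the query we’ve already run, not part of the original script:

select  value, bucket_count
from    (
        select
                endpoint_value  value,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) bucket_count
        from    user_tab_histograms
        where   table_name  = 'T2'
        and     column_name = 'J2'
        )
where   bucket_count > 1
order by
        value
;

This should report exactly the six values listed above: 21, 24, 25, 26, 27 and 28.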

Let’s modify the code to join the two sets of histogram data on data value – using a full outer join so we don’t lose any data, but restricting ourselves to values where the histograms overlap. We’re going to follow the idea we’ve developed in earlier postings and multiply frequencies together to derive a join frequency, so we’ll start with a simple full outer join and assume that when we find a real match value we should treat any height-balanced (t2) bucket count of 2 or more as a set of completely full buckets identifying a popular value.

I’ve also included in this query (because it had a convenient full outer join) a column selection that counts how many rows there are in t1 with values that fall inside the range of the t2 histogram but don’t match a popular value in t2.


column unmatch_ct   new_value m_unmatch_ct
column product format 999,999.99

break on report skip 1
compute sum of product on report

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
),
join1 as (
select
        f1.value t1_value, 
        f2.value t2_value, 
        f1.frequency t1_frequency, 
        f2.frequency t2_frequency, 
        sum(
                case
                        when f2.frequency > 1 and f1.frequency is not null
                                then 0
                                else f1.frequency
                end
        ) over()        unmatch_ct,
        f2.frequency * &m_t2_bucket_size *
        case
                when f2.frequency > 1 and f1.frequency is not null
                        then f1.frequency
        end     product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
--      coalesce(f1.value, f2.value) between &m_low and &m_high
order by
        coalesce(f1.value, f2.value)
)
select  *
from    join1
;

  T1_VALUE   T2_VALUE T1_FREQUENCY T2_FREQUENCY UNMATCH_CT     PRODUCT
---------- ---------- ------------ ------------ ---------- -----------
         2                       5                         99
         5                      15                        99
         7                      15                        99
        10                      17                        99
        12                      13                        99
                   14                         1         99
        15                      13                        99
        17         17           11            1         99
                   18                         1         99
                   19                         1         99
        20         20            7            1         99
                   21                         2         99
        22         22            3            1         99
                   23                         1         99
                   24                         2         99
        25         25            1            2         99      80.00
                                                           -----------
sum                                                              80.00


We captured the bucket size (&m_t2_bucket_size) for the t2 histogram as 40 in the earlier SQL, and we can see now that in the overlap range (which I’ve hard coded as 2 – 25) there are three popular values (21, 24 and 25), but only one of them (25) corresponds to a value in the frequency histogram on t1, so the Product column shows just one value: 1 * 2 * 40 = 80. Unfortunately this is a long way off the prediction that the optimizer is going to make for the simple join. (Eventually we’ll see it’s 1,893 – so we have a lot more rows to account for.)

Our code so far only accounts for values that are popular at both ends of the join. Previous experience tells us that when a popular value exists only at one end of the join predicate we need to derive a contribution to the total prediction through an “average selectivity” calculated for the other end of the join predicate. For frequency histograms we’ve seen that “half the frequency of the least frequently occurring value” seems to be the appropriate estimate, and if we add that in we’ll get two more contributions to the total from the values 21 and 24, which appear in the height-balanced (t2) histogram as popular but don’t appear in the frequency (t1) histogram at all. Since the lowest frequency in t1 is 1 this gives us two contributions of 0.5 * 2 (buckets) * 40 (bucket size), i.e. two contributions of 40, bringing our total to 160 – still a serious shortfall from Oracle’s prediction (see the sanity check below). So we need to work out how Oracle generates an “average frequency” for the non-popular values of t2 and then apply it to the 99 rows in t1 that haven’t yet been accounted for in the output above.
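A quick bit of arithmetic from dual keeps track of the running total – the constants below are just the figures we’ve already derived (bucket size 40, two buckets per popular value, and the 0.5 “half the lowest frequency” estimate), so treat this as a sanity check rather than part of the script:

select
        1 * 2 * 40                              popular_match,
        2 * (0.5 * 2 * 40)                      semi_popular_matches,
        1 * 2 * 40 + 2 * (0.5 * 2 * 40)         running_total
from
        dual
;

The query reports 80, 80 and 160 respectively.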

To calculate the “average selectivity” of a non-popular row in t2 I need a few numbers (some of which I’ve already acquired above): the total number of rows in the table (NR), the number of distinct values (NDV), and the number of popular values (NPV), from which we can derive the number of distinct non-popular values and the number of rows for the non-popular values. The model that Oracle uses to derive these numbers is simply to assume that a value is popular if its frequency in the histogram is greater than one, and that the number of rows for such a value is “frequency * bucket size”.

The first query we ran against the t2 histogram showed 6 popular values, accounting for 13 buckets of 40 rows each. We reported 22 distinct values for the column and 800 rows for the table so the optimizer assumes the non-popular values account for (22 – 6) = 16 distinct values and (800 – 13 * 40) = 280 rows. So the selectivity of non-popular values is (280/800) * (1/16) = 0.021875. This needs to be multiplied by the 99 rows in t1 which don’t match a popular value in t2 – so we now need to write some SQL to derive that number.
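Before writing that SQL it’s easy to sanity-check the arithmetic from dual – the constants below are just the NR, bucket count, bucket size, NDV and NPV figures quoted above, together with the 99 unmatched rows reported by the earlier query:

select
        (800 - 13 * 40) / (800 * (22 - 6))                      non_popular_sel,
        800 * ((800 - 13 * 40) / (800 * (22 - 6))) * 99         unmatched_contribution
from
        dual
;

This reports 0.021875 and 1,732.5 – keep an eye out for both numbers in the figures that follow.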

We could enhance our earlier full outer join and slot 0.5, 99, and 0.021875 into it as “magic” constants. Rather than do that though I’m going to write a couple of messy queries to derive the values (and the low/high range we’re interested in) so that I will be able to tweak the data later on and see if the formula still produces the right answer.


column range_low    new_value m_low
column range_high   new_value m_high
column avg_t1_freq  new_value m_avg_t1_freq
column new_density  new_value m_avg_t2_dens

with f1 as (
        select  endpoint_value ep_val,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
        from    user_tab_histograms
        where   table_name  = 'T1'
        and     column_name = 'J1'
),
f2 as (
        select  endpoint_value ep_val,
                endpoint_number ep_num,
                endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
        from    user_tab_histograms
        where   table_name  = 'T2'
        and     column_name = 'J2'
)
select
        max(min_v) range_low, min(max_v) range_high, min(min_f)/2 avg_t1_freq, max(new_density) new_density
from    (
        select  min(ep_val) min_v, max(ep_val) max_v, min(frequency) min_f, to_number(null) new_density
        from f1
        union all
        select  min(ep_val) min_v, max(ep_val) max_v, null           min_f,
                (max(ep_num) - sum(case when frequency > 1 then frequency end)) /
                (
                        max(ep_num) *
                        (&m_t2_distinct - count(case when frequency > 1 then 1 end))
                )       new_density
        from    f2
        )
;

 RANGE_LOW RANGE_HIGH AVG_T1_FREQ NEW_DENSITY
---------- ---------- ----------- -----------
         2         25          .5     .021875


This query finds the overlap by querying the two histograms and reporting the “higher low value” and the “lower high value”. It also reports the minimum frequency from the frequency histogram divided by 2, and calculates the number of non-popular rows divided by (the total number of rows multiplied by the number of distinct non-popular values). (Note that I’ve picked up the number of distinct values in t2.j2 as a substitution variable generated by one of my earlier queries.) In my full script this messy piece of code runs before the query I showed earlier that told us how well (or badly) the two histograms matched.

Finally we can use the various values we’ve picked up in a slightly more complex version of the full outer join – with a special row added through a union all to give us the estimate:


break on report skip 1
compute sum of product on report

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency,
        endpoint_number
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
),
join1 as (
select
        f1.value t1_value, f2.value t2_value,
        f1.frequency t1_frequency, f2.frequency t2_frequency,
        f2.frequency *
        case
                when f2.frequency > 1 and f1.frequency is not null
                        then f1.frequency
                when f2.frequency > 1 and f1.frequency is null
                        then &m_avg_t1_freq
        end *
        &m_t2_bucket_size        product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between &m_low and &m_high
order by
        coalesce(f1.value, f2.value)
)
select  *
from    join1
union all
select
        null,
        &m_avg_t2_dens,
        &m_unmatch_ct,
        &m_t2_rows * &m_avg_t2_dens,
        &m_t2_rows * &m_avg_t2_dens * &m_unmatch_ct
from
        dual
;


  T1_VALUE   T2_VALUE T1_FREQUENCY T2_FREQUENCY     PRODUCT
---------- ---------- ------------ ------------ -----------
         2                       5
         5                      15
         7                      15
        10                      17
        12                      13
                   14                         1
        15                      13
        17         17           11            1
                   18                         1
                   19                         1
        20         20            7            1
                   21                         2       40.00
        22         22            3            1
                   23                         1
                   24                         2       40.00
        25         25            1            2       80.00
              .021875           99         17.5    1,732.50
                                                -----------
sum                                                1,892.50
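
Putting the pieces together by hand confirms the total – one fully popular match (80), two semi-popular contributions (40 each), and the non-popular contribution we predicted above:

select  80 + 40 + 40 + (800 * 0.021875 * 99)    estimated_join_card
from    dual
;

The result is 1,892.5.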


It remains only to check what the optimizer thinks the cardinality will be on a simple join, and then modify the data slightly to see if the string of queries continues to produce the right result. Here’s a starting test:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';
alter session set tracefile_identifier='BASELINE';

select
        count(*)
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';


 COUNT(*)
----------
      1327


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      41 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1893 |   1327 |00:00:00.01 |      41 |  2545K|  2545K| 1367K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       7 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."J1"="T2"."J2")

The E-rows for the hash join operation reports 1893 – and a quick check of the 10053 trace file shows that this is 1892.500000 rounded – a perfect match for the result from my query. I’ve modified the data in various ways (notably updating the t1 table to change the value 25 (i.e. the current maximum value of j1) to other, lower, values) and the algorithm in the script seems to be sound – for 12c and 18c. I won’t be surprised, however, if someone comes up with a data pattern where the wrong estimate appears.

Don’t look back

Upgrades are a pain. With the same data set and same statistics on 11.2.0.4, running the same join query between t1 and t2, here’s the execution plan I got:


SQL_ID  f8wj7karu0hhs, child number 0
-------------------------------------
select         count(*) from         t1, t2 where         t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      12 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      12 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1855 |   1327 |00:00:00.01 |      12 |  2440K|  2440K| 1357K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       6 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

Notice that the E-rows value is different. The join cardinality algorithm seems to have changed in the upgrade from 11.2.0.4 to 12c. I haven’t quite figured out how to get to the 11g result, but I seem to get quite close most of the time by making a simple change to the final query that I used to predict the optimizer’s estimate. In the case expression that chooses between the actual t1.j1 frequency and the “average frequency” don’t choose, just use the latter:


        case
                when f2.frequency > 1 and f1.frequency is not null
                        -- then f1.frequency    -- 12c
                        then &m_avg_t1_freq     -- 11g
                when f2.frequency > 1 and f1.frequency is null
                        then &m_avg_t1_freq
        end *
 

As I modified the t1 row with the value 25 to hold other values this change kept producing results that were exactly 2, 2.5, or 3.0 different from the execution plan E-Rows – except in one case where the error was exactly 15.5 (which looks suspiciously like 17.5: the “average frequency in t2” minus 2). I’m not keen to spend time trying to work out exactly what’s going on but the takeaway from this change is that anyone upgrading from 11g to 12c may find that some of their queries change plans because they happen to match the type of example I’ve been working with in this post.

In some email I exchanged with Chinar Aliyev, he suggested three fix-controls that might be relevant. I’ve added these to an earlier posting I did when I first hit the anomaly a few days ago but I’ll repeat them here. I will be testing their effects at some point in the not too distant future:

14033181 1 QKSFM_CARDINALITY_14033181   correct ndv for non-popular values in join cardinality comp.         (12.1.0.1)
19230097 1 QKSFM_CARDINALITY_19230097   correct join card when popular value compared to non popular         (12.2.0.1)
22159570 1 QKSFM_CARDINALITY_22159570   correct non-popular region cardinality for hybrid histogram          (12.2.0.1)
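
For anyone who wants to experiment, fix-controls can be switched at the session level; the sketch below turns each of the three off (the “:0” disables the fix) – though I haven’t yet tested what effect, if any, each one has on this example:

alter session set "_fix_control" = '14033181:0';
alter session set "_fix_control" = '19230097:0';
alter session set "_fix_control" = '22159570:0';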

October 25, 2018

Join Cardinality – 4

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 9:09 am BST Oct 25,2018

In previous installments of this series I’ve been describing how Oracle estimates the join cardinality for single column joins with equality where the columns have histograms defined. So far I’ve covered two options for the types of histogram involved: frequency to frequency, and frequency to top-frequency. Today it’s time to examine frequency to hybrid.

My first thought about this combination was that it was likely to be very similar to frequency to top-frequency because a hybrid histogram has a list of values with “repeat counts” (which is rather like a simple frequency histogram), and a set of buckets with variable sizes that could allow us to work out an “average selectivity” of the rest of the data.

I was nearly right but the arithmetic didn’t quite work out the way I expected.  Fortunately Chinar Aliyev’s document highlighted my error – the optimizer doesn’t use all the repeat counts, it uses only those repeat counts that identify popular values, and a popular value is one where the endpoint_repeat_count is not less than the average number of rows in a bucket. Let’s work through an example – first the data (which repeats an earlier article, but is included here for ease of reference):

rem
rem     Script:         freq_hist_join_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

set linesize 156
set pagesize 60
set trimspool on

execute dbms_random.seed(0)

create table t1 (
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)
)
;

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 13'
        );
end;
/

As before I’ve got a table with 100 rows using the sqrt() function to generate column j1, and a table with 800 rows using the dbms_random.normal function to generate column j2. So the two columns have skewed patterns of data distribution, with a small number of low values and larger numbers of higher values – but the two patterns are different.

I’ve requested a histogram with 254 buckets for the t1.j1 column – which collapsed to a 10-bucket frequency histogram because the column holds only 10 distinct values – and generated a histogram with 13 buckets for the t2.j2 column as I knew (after a little trial and error) that this would give me a hybrid histogram.

Here’s a simple query, with its result set, to report the two histograms – using a full outer join to line up matching values and show the gaps where (endpoint) values in one histogram do not appear in the other:


define m_popular = 62

break on report skip 1

compute sum of product on report
compute sum of product_rp on report

compute sum of t1_count on report
compute sum of t2_count on report
compute sum of t2_repeats on report
compute sum of t2_pop_count on report

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        to_number(null)
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
order by
        endpoint_value
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        case when endpoint_repeat_count >= &m_popular
                        then endpoint_repeat_count
                        else null
        end     pop_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
)
select
        f1.value t1_value,
        f2.value t2_value,
        f1.row_or_bucket_count t1_count,
        f2.row_or_bucket_count t2_count,
        f1.endpoint_repeat_count t1_repeats,
        f2.endpoint_repeat_count t2_repeats,
        f2.pop_count t2_pop_count
from
        f1
full outer join
        f2
on
        f2.value = f1.value
order by
        coalesce(f1.value, f2.value)
;


  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT T1_REPEATS T2_REPEATS T2_POP_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ------------
                    1                     1                     1
         2                     5                     0
         5                    15                     0
         7                    15                     0
        10                    17                     0
        12                    13                     0
        15         15         13         55          0         11
        17         17         11         56          0         34
                   19                    67                    36
        20         20          7         57          0         57
                   21                    44                    44
        22         22          3         45          0         45
                   23                    72                    72           72
                   24                    70                    70           70
        25         25          1         87          0         87           87
                   26                   109                   109          109
                   27                    96                    96           96
                   28                    41                    41
---------- ---------- ---------- ----------            ---------- ------------
                             100        800                   703          434

You’ll notice that there’s a substitution variable (m_popular) in this script that I use to identify the “popular values” in the hybrid histogram so that I can report them separately. I’ve set this value to 62 for this example because a quick check of user_tables and user_tab_cols tells me I have 800 rows in the table (user_tables.num_rows) and 13 buckets (user_tab_cols.num_buckets) in the histogram: 800/13 = 61.54. A value is popular only if its repeat count is 62 or more.
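
If you don’t want to do the division by hand you can derive the threshold directly from the data dictionary – a small convenience query (not part of the original script):

select
        t.num_rows,
        c.num_buckets,
        ceil(t.num_rows / c.num_buckets)        popularity_threshold
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name  = 'T2'
and     c.table_name  = t.table_name
and     c.column_name = 'J2'
;

For this data set it reports 800, 13 and 62.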

This is where you may hit a problem – I certainly did when I switched from testing 18c to testing 12c (which I just knew was going to work – but I tested anyway). Although my data has been engineered so that I get the same “random” data in both versions of Oracle, I got different hybrid histograms (hence my complaint in a recent post.) The rest of this covers 18c in detail, but if you’re running 12c there are a couple of defined values that you can change to get the right results in 12c.

At this point I need to “top and tail” the output because the arithmetic only applies where the histograms overlap, so I need to pick the range from 2 to 25. Then I need to inject a “representative” or “average” count/frequency in all the gaps, then cross-multiply. The average frequency for the frequency histogram is “half the frequency of the least frequently occurring value” (which seems to be identical to new_density * num_rows), and the representative frequency for the hybrid histogram is (“number of non-popular rows” / “number of non-popular values”). There are 800 rows in the table with 22 distinct values in the column, and the output above shows us that we have 5 popular values totalling 434 rows, so the average frequency is (800 – 434) / (22 – 5) = 21.5294. (Alternatively we could say that the average selectivities (which is what I’ve used in the next query) are 0.5/100 and 21.5294/800.)

[Note for 12c, you’ll get 4 popular values covering 338 rows, so your figures will be: (800 – 338) / (22 – 4) = 25.6666… and 0.0302833]
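
Either set of figures is easy to verify from dual – this version uses the 18c counts (substitute 338 rows and 4 popular values for 12c):

select
        (800 - 434) / (22 - 5)                  avg_frequency,
        (800 - 434) / ((22 - 5) * 800)          avg_selectivity
from
        dual
;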

So here’s a query that restricts the output to the rows we want from the histograms, discards a couple of columns, and does the arithmetic:


rem     first m_t2_sel is the 12c figure; the second (18c) overrides it
define m_t2_sel = 0.0302833
define m_t2_sel = 0.0269118
define m_t1_sel = 0.005

break on table_name skip 1 on report skip 1

with f1 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        to_number(null) pop_count
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'J1'
order by
        endpoint_value
),
f2 as (
select
        table_name,
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number,
        endpoint_repeat_count,
        case when endpoint_repeat_count >= &m_popular
                        then endpoint_repeat_count
                        else null
        end     pop_count
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'J2'
order by
        endpoint_value
)
select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,100 * &m_t1_sel) t1_count,
        nvl(f2.pop_count,          800 * &m_t2_sel) t2_count,
        case when (   f1.row_or_bucket_count is not null
                   or f2.pop_count is not null
        )    then
                nvl(f1.row_or_bucket_count,100 * &m_t1_sel) *
                nvl(f2.pop_count,          800 * &m_t2_sel)
        end      product_rp
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;


  F1_VALUE   F2_VALUE   T1_COUNT   T2_COUNT PRODUCT_RP
---------- ---------- ---------- ---------- ----------
         2                     5   21.52944   107.6472
         5                    15   21.52944   322.9416
         7                    15   21.52944   322.9416
        10                    17   21.52944  366.00048
        12                    13   21.52944  279.88272
        15         15         13   21.52944  279.88272
        17         17         11   21.52944  236.82384
                   19         .5   21.52944
        20         20          7   21.52944  150.70608
                   21         .5   21.52944
        22         22          3   21.52944   64.58832
                   23         .5         72         36
                   24         .5         70         35
        25         25          1         87         87
                      ---------- ---------- ----------
sum                          102  465.82384 2289.41456

There’s an important detail that I haven’t mentioned so far. In the output above you can see that some rows show “product_rp” as blank. While we cross multiply the frequencies from t1.j1 and t2.j2, filling in average frequencies where necessary, we exclude from the final result any rows where average frequencies have been used for both histograms.

[Note for 12c, you’ll get the result 2698.99736 for the query, and 2699 for the execution plan]

Of course we now have to check that the predicted cardinality for a simple join between these two tables really is 2,289. So let’s run a suitable query and see what the optimizer predicts:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

SQL_ID  cf4r52yj2hyd2, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.j1 = t2.j2

Plan hash value: 906334482

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |     108 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |     108 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   2289 |   1327 |00:00:00.01 |     108 |  2546K|  2546K| 1194K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |      18 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |      34 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

As you can see, the E-Rows for the join is 2,289, as required.

I can’t claim that the model I’ve produced is definitely what Oracle does, but it looks fairly promising. No doubt, though, there are some variations on the theme that I haven’t considered – even when sticking to a simple (non-partitioned) join on equality on a single column.

October 9, 2018

Join Cardinality – 3

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:01 pm BST Oct 9,2018

In the previous posting I listed the order of precision of histograms as:

  • Frequency
  • Top-Frequency
  • Hybrid
  • Height-balanced
  • None

Having covered the Frequency/Frequency join (for a single column, no nulls, equijoin) in the previous posting I’ve decided to work down the list and address Frequency/Top-Frequency in this posting. It gets a little harder to generate data as we move to the less precise histograms since we need to have skew, we want some gaps, and (for Top-Frequency) we need to have some data that can be “ignored”. On the plus side, though, I want to keep the output of any queries I run fairly short, so I’m going to stick with a small number of buckets, which means the “small” volume of “ignorable” data (the “spare” bucket) can be relatively large. Here’s the code I used to generate data for my investigation – 100 rows for the table with a frequency histogram and 800 rows for the table with a top-frequency.


rem
rem     Script:         freq_hist_join_05.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

execute dbms_random.seed(0)

create table t1 (
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)      
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2      
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 16'
        );
end;
/

In this example I’ve used the sqrt() function and the dbms_random.normal() function to generate the data. The scaling and truncating I’ve done on the results has given me two sets of data which have a nice skew, some gaps, but different patterns (though both have a small number of small values and a larger number of larger values). The data from dbms_random.normal() will produce 22 distinct values, so I’ve requested a histogram with 16 buckets and checked that this will produce a Top-Frequency histogram. (If I want a Hybrid histogram – for the next thrilling installment in the series – I’ll just reduce the number of buckets slightly).

Here are the resulting stats, preceded by the code that reported them:


select  table_name, column_name, histogram, num_distinct, num_buckets, density
from    user_tab_cols
where   table_name in ('T1','T2')
and     column_name in ('J1','J2')
order by table_name
;

select  table_name, num_rows
from    user_tables
where   table_name in ('T1','T2')
order by table_name
;

break on table_name skip 1 on report skip 1

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
order by 
        endpoint_value
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
order by 
        endpoint_value
)
select f1.* from f1
union all
select f2.* from f2
order by 1,2
;


TABLE_NAME           COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS    DENSITY
-------------------- -------------------- --------------- ------------ ----------- ----------
T1                   J1                   FREQUENCY                 10          10       .005
T2                   J2                   TOP-FREQUENCY             22          16    .000625

TABLE_NAME             NUM_ROWS
-------------------- ----------
T1                          100
T2                          800

TABLE_NAME                VALUE ROW_OR_BUCKET_COUNT ENDPOINT_NUMBER
-------------------- ---------- ------------------- ---------------
T1                            2                   5               5
                              5                  15              20
                              7                  15              35
                             10                  17              52
                             12                  13              65
                             15                  13              78
                             17                  11              89
                             20                   7              96
                             22                   3              99
                             25                   1             100

T2                            1                   1               1
                             13                  14              15
                             15                  11              26
                             16                  22              48
                             17                  34              82
                             18                  31             113
                             19                  36             149
                             20                  57             206
                             21                  44             250
                             22                  45             295
                             23                  72             367
                             24                  70             437
                             25                  87             524
                             26                 109             633
                             27                  96             729
                             28                  41             770

Table t1 reports 100 rows, 10 distinct values and a Frequency histogram with 10 buckets.
Table t2 reports 800 rows, 22 distinct values and a Top-Frequency histogram with 16 buckets.

Things we notice from the histograms are: t1 has a range from 2 to 25, while t2 has a range from 1 to 28. We also notice that the highest endpoint_number for t2 is only 770 out of a possible 800 – we’ve “lost” 30 rows. We don’t really care what they are for the purposes of the arithmetic, but if we did a quick “select j2, count(*)” query we’d see that we had lost the following:


SQL> select j2, count(*) from t2 group by j2 order by count(*), j2;

        J2   COUNT(*)
---------- ----------
         1          1
         9          1  *
         8          3  *
        11          4  *
        10          5  *
        12          8  *
        14          9  *
        15         11
...

The reason why the total number of rows accounted for is less than the total number of rows in the table comes in two parts. The Top-Frequency histogram is designed to hold the Top N most popular entries in the table, so there will be some entries that don’t make an appearance in the histogram despite contributing rows to the total table count; the number of “lost” rows can then be increased because the Top N popular values may not include the column low and high values, and these two values must appear in the histogram. Looking at the output above we can see that we could have reported 14 as the 16th most popular value; instead we have to record 1 (the column low value), losing a further 9 rows and regaining 1.
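
If you want to see exactly which values (and how many rows) failed to make an appearance in the histogram you can compare the table data with the endpoint list – a simple, if inefficient, sketch:

select  j2, count(*)    lost_rows
from    t2
where   j2 not in (
                select  endpoint_value
                from    user_tab_histograms
                where   table_name  = 'T2'
                and     column_name = 'J2'
        )
group by
        j2
order by
        j2
;

This should return the six asterisked values from the output above (8, 9, 10, 11, 12 and 14), accounting for the 30 missing rows.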

Let’s test the pure join query on the two tables to see what the optimizer is predicting as the join cardinality, and then try to re-create that cardinality from the histogram data:


alter session set statistics_level = all;
alter session set events '10053 trace name context forever';
alter session set tracefile_identifier='BASELINE';

select
        count(*) 
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      41 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1608 |   1327 |00:00:00.01 |      41 |  2545K|  2545K| 1355K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       7 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

Our target is to work out how we can query the histogram data in a way that gets the result 1,608. Ideally we’ll also think of a rationale for justifying our method, and then we’ll apply the same method with 15 buckets and 17 buckets, and with a couple of variations to the data (e.g. update all rows where j1 = 25 to set j1 = 28), to see if the method still gets the right result.

All we did with the frequency/frequency join was to join the two histograms on matching values, multiply the frequencies on each resulting row, then sum down the set, and this automatically eliminated rows which were outside the “highest low” and “lowest high” (i.e. we only examined rows where the histograms overlapped). We might hope that things shouldn’t be too different when one of the histograms is a top-frequency histogram.

There is an important difference, though, between frequency and top-frequency histograms – in the latter case there are values in the table which will not be in the histogram, so we ought to make some allowance for these (even though it’s only “one bucket’s worth”). It’s possible that some of these values might match values in the frequency histogram so we need to include a mechanism for adding in a factor to allow for them. So as a first step let’s work out the “average number of rows per value” for the missing values.

We have 22 distinct values and 16 end points so there are 6 missing values. We have 800 rows in the table but only 770 rows reported in the histogram so there are 30 missing rows. So let’s say the missing values have an average cardinality of 30/6 = 5 (and we might extend that to say they have an average selectivity of 5/800 = 0.00625).
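
Expressed as a query from dual (the constants are just the row and value counts we’ve read off the histogram and table stats):

select
        (800 - 770) / (22 - 16)                 avg_missing_rows,
        (800 - 770) / ((22 - 16) * 800)         avg_missing_sel
from
        dual
;

The results are 5 and 0.00625 respectively.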

Let’s bring that value into the query we wrote for the frequency/frequency case by using an outer join (which I’ll write as an “ANSI” full outer join) with a predicate in place that restricts the result to just the overlapping range, which is [2,25], the “higher low value” and “lower high value” across the two histograms. Here’s some code – with an odd little detail included:


column product format 999,999,999.99
compute sum of product on report

compute sum of t1_count on report
compute sum of t1_value on report
compute sum of t2_count on report
compute sum of t2_value on report

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
order by 
        endpoint_value
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
order by 
        endpoint_value
)
select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,0.00) t1_count, 
        nvl(f2.row_or_bucket_count,800*0.00625) t2_count,
        nvl(f1.row_or_bucket_count,0.00) * 
        nvl(f2.row_or_bucket_count,800*0.006250) product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;

I’ve included an nvl() on the columns from the top-frequency histogram that converts nulls (i.e. the rows preserved from the frequency histogram) into the average frequency we’ve just calculated, using the “num_rows * selectivity” representation. The odd little detail that I commented on above does something similar for the rows preserved from the top-frequency histogram, because this first guess at the calculation turns out to be wrong and to need an adjustment which I’m anticipating. Here are the results I got with this code:

  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT         PRODUCT
---------- ---------- ---------- ---------- ---------------
         2                     5          5           25.00
         5                    15          5           75.00
         7                    15          5           75.00
        10                    17          5           85.00
        12                    13          5           65.00
                   13          0         14             .00
        15         15         13         11          143.00
                   16          0         22             .00
        17         17         11         34          374.00
                   18          0         31             .00
                   19          0         36             .00
        20         20          7         57          399.00
                   21          0         44             .00
        22         22          3         45          135.00
                   23          0         72             .00
                   24          0         70             .00
        25         25          1         87           87.00
---------- ---------- ---------- ---------- ---------------
       135        233        100        548        1,463.00

The figure is too low, so there has to be an adjustment. What if the code is allowing for the “maybe there are other values” algorithm that the optimizer uses with frequency histograms ? If you’ve gathered a frequency histogram on a column but query it with a value that isn’t in the histogram then Oracle applies an algorithm that looks like: “if you’re asking for something that isn’t in the histogram I’ll assume that there must be some data there and use a frequency that’s half the lowest frequency I have recorded”**Important footnote. The value 25 appears once in our histogram, so let’s include a fudge-factor of 0.5 (i.e. half a row) in the nvl() expression for the t1 frequencies and see what happens.
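We can predict the effect of that change before running it: the seven t2-only rows in the first report have counts 14, 22, 31, 36, 44, 72 and 70, so the fudge factor should add half their total to the 1,463 we already have:

select 1463 + 0.5 * (14 + 22 + 31 + 36 + 44 + 72 + 70) adjusted_total from dual;

That’s 1,463 + 144.5 = 1,607.5. This is what the new results look like: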


  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT         PRODUCT
---------- ---------- ---------- ---------- ---------------
         2                     5          5           25.00
         5                    15          5           75.00
         7                    15          5           75.00
        10                    17          5           85.00
        12                    13          5           65.00
                   13         .5         14            7.00
        15         15         13         11          143.00
                   16         .5         22           11.00
        17         17         11         34          374.00
                   18         .5         31           15.50
                   19         .5         36           18.00
        20         20          7         57          399.00
                   21         .5         44           22.00
        22         22          3         45          135.00
                   23         .5         72           36.00
                   24         .5         70           35.00
        25         25          1         87           87.00
---------- ---------- ---------- ---------- ---------------
       135        233      103.5        548        1,607.50

Since we were looking for 1,608 I’m going to call that a success. I can check precision, of course, by looking at the 10053 trace file. Extracting a few critical lines:

egrep -e"Density" -e"Join Card" orcl12c_ora_6520_BASELINE.trc

    AvgLen: 3 NDV: 22 Nulls: 0 Density: 0.006250 Min: 1.000000 Max: 28.000000
    AvgLen: 3 NDV: 10 Nulls: 0 Density: 0.005000 Min: 2.000000 Max: 25.000000

Join Card:  1607.500000 = outer (100.000000) * inner (800.000000) * sel (0.020094)

The “Density” lines come from the column statistics – note the 0.00625 that matches the “average selectivity” I derived from the top-frequency figures. You might also note that the “half the least frequent value” could be derived from the t1.j1 density (0.005) * t1.num_rows (100).
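If you’d rather pull that figure from the data dictionary than from a trace file, a query like this (a sketch using the t1/j1 names from this test) should echo the 0.5:

select
        t.num_rows,
        c.density,
        t.num_rows * c.density          derived_fudge_factor
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name    = 'T1'
and     c.table_name    = 'T1'
and     c.column_name   = 'J1'
;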

The “Join Card” line is exactly what it says – the join cardinality calculation showing that the plan’s prediction of 1,608 rows was actually a rounded 1,607.5.

There is one more important thing to check before I start tweaking the data to see if there are any other factors involved. Is the 0.5 I stuck into the query really the value of “half the least common frequency” or is it a fixed value in all cases ? A nice easy way of testing this is to update the t1 table to change one row from 22 to 25 (22 will still be present in the table and histogram before and after this test, so it’s a minimal and safe change). Making this change and re-running the calculation query leaving the 0.5 unchanged gives the following:


update t1 set j1 = 25 where j1 = 22 and rownum = 1;

...

                   21         .5         44           22.00
        22         22          2         45           90.00
                   23         .5         72           36.00
                   24         .5         70           35.00
        25         25          2         87          174.00
                      ---------- ---------- ---------------
sum                        103.5        548        1,649.50

Without reporting all the details:

  • the estimate in the plan went up from 1,608 to 1,794
  • leaving 0.5 in the query the derived result was 1,649.5 (last few lines of output above)
  • changing the 0.5 to 1.0 the derived result was 1,794.0

Conclusion – the “fudge factor” is not a fixed 0.5; it’s consistent with the model the optimizer uses in frequency histogram calculations. The optimizer models the “missing” rows in the join calculation as “half the frequency of the least frequently occurring value”**Important footnote.

Filter Predicates:

After a dozen tests varying the number of buckets in the top-frequency histogram (and checking it really was still a top-frequency histogram), and tweaking the t1 (frequency histogram) data to use values on the boundaries of, or outside, the range of the t2 (top-frequency) data, I concluded that my approach was probably correct. Outer join the two histograms, restrict to the overlap, supply the “num_rows * density” figure on the top-frequency side, and “half the lowest frequency”**Important footnote on the frequency side, and the query produces the same result as the optimizer for the pure join cardinality.

So the next step is to check what happens when you add filter predicates on one, or both, sides. I listed a fragment of code earlier on to execute the pure join and count the number of rows it produced, enabling the 10053 trace and pulling the actual plan from memory at the same time. I repeated this code with 3 variations and checked the “Join Card” lines from the resulting trace files:


select count(*) from  t1, t2 where  t1.j1 = t2.j2
select count(*) from  t1, t2 where  t1.j1 = t2.j2 and t1.n04 = 2
select count(*) from  t1, t2 where  t1.j1 = t2.j2                and t2.n30 = 25
select count(*) from  t1, t2 where  t1.j1 = t2.j2 and t1.n04 = 2 and t2.n30 = 25

egrep -e"Join Card" orcl12c_ora_10447*.trc

orcl12c_ora_10447_BASELINE.trc:Join Card:  1607.500000 = outer (800.000000) * inner (100.000000) * sel (0.020094)
orcl12c_ora_10447_FILTERJ1.trc:Join Card:  401.875000 = outer (800.000000) * inner (25.000000) * sel (0.020094)
orcl12c_ora_10447_FILTERJ2.trc:Join Card:  53.583333 = outer (100.000000) * inner (26.666667) * sel (0.020094)
orcl12c_ora_10447_FILTJ1J2.trc:Join Card:  13.395833 = outer (26.666667) * inner (25.000000) * sel (0.020094)

As you can see, in all 4 cases Oracle reports an inner and outer cardinality estimate and a join selectivity. The join selectivity remains unchanged throughout; it’s the value we can derive from our pure join test (0.020094 = 1607.5 / (100 * 800)). All that changes is that the individual table predicates are applied to the base tables before the join selectivity is applied to the product of the filtered base table cardinalities (there’s a quick arithmetic check after the list):

  • Column n04 has 4 distinct values in 100 rows – filter cardinality = 100/4 = 25
  • Column n30 has 30 distinct values in 800 rows – filter cardinality = 800/30 = 26.66666…
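Putting the pieces together in a single query (a sketch, using the exact join selectivity of 1607.5/80000 rather than the rounded 0.020094 from the trace):

select
        round(800 * 100          * (1607.5/80000), 6)   baseline,
        round(800 * (100/4)      * (1607.5/80000), 6)   filter_j1,
        round((800/30) * 100     * (1607.5/80000), 6)   filter_j2,
        round((800/30) * (100/4) * (1607.5/80000), 6)   filt_j1j2
from
        dual
;

The four results – 1607.5, 401.875, 53.583333 and 13.395833 – match the four “Join Card” lines above.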

Conclusion

For a single-column equijoin on columns with no nulls, where one column has a frequency histogram and the other has a top-frequency histogram, the optimizer calculates the “pure” join cardinality using the overlapping range of column values and the two approximating frequencies, then derives the final estimate by applying the base table filters, calculating the cardinality of the cartesian join of the filtered data sets, and multiplying by the pure join selectivity.

**Important Footnote  Until Chinar Aliyev questioned what I had written, I had never noticed that the “half the lowest frequency” that I describe at various points in the arithmetic was anything other than a fixed fudge factor. In fact, in perfect symmetry with the expression used for the average selectivity in the top-frequency part of the calculation, this “fudge factor” is simply “num_rows * column_density” for the column with the frequency histogram. (Whether the “half the lowest frequency” drops out as a side effect of the density calculation, or whether the column density is derived from half the lowest frequency is another matter.)

October 5, 2018

Join Cardinality – 2

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 3:37 pm BST Oct 5,2018

In the previous note I posted about Join Cardinality I described a method for calculating the figure that the optimizer would give for the special case where you had a query that:

  • joined two tables
  • used a single column to join on equality
  • had no nulls in the join columns
  • had a perfect frequency histogram on the columns at the two ends of the join
  • had no filter predicates associated with either table

The method simply said: “Match up rows from the two frequency histograms, multiply the corresponding frequencies” and I supplied a simple SQL statement that would read and report the two sets of histogram data, doing the arithmetic and reporting the final cardinality for you. In an update I also added an adjustment needed in 11g (or, you might say, removed in 12c) where gaps in the histograms were replaced by “ghost rows” with a frequency that was half the lowest frequency in the histogram.

This is a nice place to start as the idea is very simple, and it’s likely that extensions of the basic idea will be used in all the other cases we have to consider. There are 25 possibilities that could need separate testing – though only 16 of them ought to be relevant from 12c onwards. Oracle allows for four kinds of histograms – in order of how precisely they describe the data, they are:

  • Frequency – with a perfect description of the data
  • Top-N (a.k.a. Top-Frequency) – which describes all but a tiny fraction (ca. one bucket’s worth) of data perfectly
  • Hybrid – which can (but doesn’t usually, by default) describe up to 2,048 popular values perfectly and gives an approximate distribution for the rest
  • Height-balanced – which can (but doesn’t usually, by default) describe at most 1,024 popular values with some scope for misinformation.

Finally we have the general case of no histogram, using only 4 numbers (low value, high value, number of “non-null” rows, number of distinct values) to give a rough picture of the data – and the need for histograms appears, of course, when the data doesn’t look anything like an even distribution of values between the low and high, with close to “number of non-null rows” / “number of distinct values” rows for each value.

So there are 5 possible statistical descriptions for the data in a column – which means there are 5 * 5 = 25 possible options to consider when we join two columns, or 4 * 4 = 16 if we label height-balanced histograms as obsolete and ignore them (which would be a pity because Chinar has done some very nice work explaining them).

Of course, once we’ve worked out a single-column equijoin between two tables there are plenty more options to consider:  multi-column joins, joins involving range-based predicates, joins involving more than 2 tables, and queries which (as so often happens) have predicates which aren’t involved in the joins.

For the moment I’m going to stick to the simplest case – two tables, one column, equality – and comment on the effects of filter predicates. It seems to be very straightforward, as I’ll demonstrate with a new model.

rem
rem     Script:         freq_hist_join_03.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

execute dbms_random.seed(0)

create table t1(
        id      number(8,0),
        n0040   number(4,0),
        n0090   number(4,0),
        n0190   number(4,0),
        n0990   number(4,0),
        n1      number(4,0)
)
;

create table t2(
        id      number(8,0),
        n0050   number(4,0),
        n0110   number(4,0),
        n0230   number(4,0),
        n1150   number(4,0),
        n1      number(4,0)
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   40) + 1                   n0040,
        mod(rownum,   90) + 1                   n0090,
        mod(rownum,  190) + 1                   n0190,
        mod(rownum,  990) + 1                   n0990,
        trunc(30 * abs(dbms_random.normal))     n1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   50) + 1                   n0050,
        mod(rownum,  110) + 1                   n0110,
        mod(rownum,  230) + 1                   n0230,
        mod(rownum, 1150) + 1                   n1150,
        trunc(30 * abs(dbms_random.normal))     n1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
end;
/

You’ll notice that in this script I’ve created empty tables and then populated them. This is because of an anomaly that appeared in 18.3 when I used “create as select”, and should allow the results from 18.3 to be an exact match for 12c. You don’t need to pay much attention to all the Nxxx columns – they were there so I could experiment with a few variations in the selectivity of filter predicates.

Given the purpose of the demonstration I’ve gathered histograms on the column I’m going to use to join the tables (called n1 in this case), and here are the summary results:


TABLE_NAME           COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS
-------------------- -------------------- --------------- ------------ -----------
T1                   N1                   FREQUENCY                119         119
T2                   N1                   FREQUENCY                124         124

     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2488       2619    6,516,072
         1       2693       2599    6,999,107
         2       2635       2685    7,074,975
         3       2636       2654    6,995,944
...
       113          1          3            3
       115          1          2            2
       116          4          3           12
       117          1          1            1
       120          1          2            2
                                 ------------
sum                               188,114,543

We’ve got frequency histograms, and we can see that they don’t have a perfect overlap. I haven’t printed every single line from the cardinality query, just enough to show you the extreme skew, a few gaps (114, 118, 119), and the total. So here are three queries with execution plans:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
and     t1.n0990 = 20
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));


select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
and     t1.n0990 = 20
and     t2.n1150 = 25
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

I’ve queried the pure join – the count was exactly the 188,114,543 predicted by the cardinality query, of course – then I’ve applied a filter to one table, then to both tables. The first filter n0990 = 20 will (given the mod(,990) definition) identify one row in 990 from the original 100,000 in t1; the second filter n1150 = 25 will identify one row in 1150 from t2. That’s filtering down to 101 rows and 87 rows respectively from the two tables. So what do we see in the plans:


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:23.47 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:23.47 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    188M|    188M|00:00:23.36 |     748 |  6556K|  3619K| 8839K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")



-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.02 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.02 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    190K|    200K|00:00:00.02 |     748 |  2715K|  2715K| 1647K (0)|
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    101 |    101 |00:00:00.01 |     374 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")
   3 - filter("T1"."N0990"=20)



-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    165 |    165 |00:00:00.01 |     748 |  2715K|  2715K| 1678K (0)|
|*  3 |    TABLE ACCESS FULL| T2   |      1 |     87 |     87 |00:00:00.01 |     374 |       |       |          |
|*  4 |    TABLE ACCESS FULL| T1   |      1 |    101 |    101 |00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")
   3 - filter("T2"."N1150"=25)
   4 - filter("T1"."N0990"=20)


The first execution plan shows an estimate of 188M rows – but we’ll have to check the trace file to confirm whether that’s only an approximate match to our calculation, or whether it’s an exact match. So here’s the relevant pair of lines:


Join Card:  188114543.000000 = outer (100000.000000) * inner (100000.000000) * sel (0.018811)
Join Card - Rounded: 188114543 Computed: 188114543.000000

Yes, the cardinality calculation and the execution plan estimates match perfectly. But there are a couple of interesting things to note. First, Oracle seems to be deriving the cardinality by multiplying the individual cardinalities of the two tables with a figure it calls “sel” – the thing that Chinar Aliyev has labelled Jsel, the “Join Selectivity”. Secondly, Oracle can’t do arithmetic – or, removing tongue from cheek, the value it reports for the join selectivity is printed to only 6 decimal places but stored to far greater precision. What is the Join Selectivity, though ? It’s the figure we derive from our cardinality query divided by the cardinality of the cartesian join of the two tables – i.e. 188,114,543 / (100,000 * 100,000).
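You can check that from SQL*Plus with a one-liner:

select 188114543 / (100000 * 100000) join_selectivity from dual;

which returns 0.0188114543 – the figure that appears (rounded) as “sel” in the trace line above.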

With the clue from the first trace file, can we work out why the second and third plans show 190K and 165 rows respectively ? How about this – multiply the filtered cardinalities of the two separate tables, then multiply the result by the join selectivity (there’s a quick SQL check after the list):

  • 1a)   n0990 = 20: gives us 1 row in every 990.    100,000 / 990 = 101.010101…    (echoing the rounded execution plan estimate).
  • 1b)   100,000 * (100,000/990) * 0.0188114543 = 190,014.69898989…    (which is in the ballpark of the plan and needs confirmation from the trace file).

 

  • 2a)   n1150 = 25: gives us 1 row in every 1,150.    100,000 / 1,150 = 86.9565217…    (echoing the rounded execution plan estimate)
  • 2b)   (100,000/990) * (100,000/1,150) * 0.0188114543 = 165.2301651..    (echoing the rounded execution plan estimate).
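Here’s a sketch of that arithmetic as a single query, using the join selectivity quoted above:

select
        round(100000 * (100000/990)        * 0.0188114543, 6)   filter_t1_only,
        round((100000/990) * (100000/1150) * 0.0188114543, 6)   filter_both
from
        dual
;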

Cross-checking against extracts from the 10053 trace files:


Join Card:  190014.689899 = outer (101.010101) * inner (100000.000000) * sel (0.018811)
Join Card - Rounded: 190015 Computed: 190014.689899

Join Card:  165.230165 = outer (86.956522) * inner (101.010101) * sel (0.018811)
Join Card - Rounded: 165 Computed: 165.230165

Conclusion.

Remembering that we’re still looking at very simple examples with perfect frequency histograms: it looks as if we can work out a “Join Selectivity” (Jsel) – the selectivity of a “pure” unfiltered join of the two tables – by querying the histogram data, then use the resulting value to calculate cardinalities for simple two-table equi-joins by multiplying together the individual (filtered) table cardinality estimates and scaling by the Join Selectivity.

Acknowledgements

Most of this work is based on a document written by Chinar Aliyev in 2016 and presented at the Hotsos Symposium the same year. I am most grateful to him for responding to a recent post of mine and getting me interested in spending some time to get re-acquainted with the topic. His original document is a 35 page pdf file, so there’s plenty more material to work through, experiment with, and write about.

 

October 3, 2018

Join Cardinality

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 12:01 pm BST Oct 3,2018

Following up my “Hacking for Skew” article from a couple of days ago, Chinar Aliyev has written an article about a method for persuading the optimizer to calculate the correct cardinality estimate without using any undocumented, or otherwise dubious, mechanisms. His method essentially relies on the optimizer’s mechanism for estimating join cardinality when there are histograms at both ends of the join so I thought I’d write a short note describing the simplest possible example of the calculation – an example where the query is a single-column equi-join with no nulls in either column and a perfect frequency histogram at both ends of the join.  (For a detailed description of more general cases I always refer to the work done by Alberto Dell’Era a few years ago). We start with two data sets that exhibit a strong skew in their data distributions:

rem
rem     Script:         freq_hist_join_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      n1
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      n1
from
        generator       v1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 254'
        );
end;
/


I’ve generated two tables of 10,000 rows each using the dbms_random.normal() function, scaling the value up by a factor of three and taking the absolute value – which has given me a range of 12 distinct integer values with a nicely skewed distribution. Then I’ve gathered stats requesting histograms of up to 254 buckets. Since I’ve tested this only on versions from 11.2.0.4 onwards this means I’ll get a perfect frequency histogram on the n1 columns of both tables.
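If you want to see the skew directly from the data, rather than reading it back from the histogram as I’m about to do, a quick aggregate against t1 (assuming you’ve just run the script above) will echo the frequencies that should appear in the histogram:

select
        n1, count(*)
from
        t1
group by
        n1
order by
        n1
;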

Now I’m going to run a query that reports the values and frequencies from the two tables by querying user_tab_histograms, using a variant of an analytic query I published a long time ago to convert the cumulative frequencies recorded in the endpoint numbers into simple frequencies. If, for some reason, this query doesn’t run very efficiently in your tests you could always /*+ materialize */ the two factored subqueries (CTEs – common table expressions):


prompt  =======================================================================
prompt  Multiply and sum matching frequencies. An outer join is NOT needed
prompt  because rows that don't match won't contribute to the join cardinality
prompt  =======================================================================

break on report skip 1
compute sum of product on report
column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_value
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'N1'
order by
        endpoint_value
)
select
        f1.value, f1.frequency, f2.frequency, f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
;


     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2658       2532    6,730,056
         1       2341       2428    5,683,948
         2       1828       1968    3,597,504
         3       1305       1270    1,657,350
         4        856        845      723,320
         5        513        513      263,169
         6        294        249       73,206
         7        133        117       15,561
         8         40         54        2,160
         9         23         17          391
        10          5          5           25
        11          4          2            8
                                 ------------
sum                                18,746,698

As you can see, the two columns do have a highly skewed data distribution. The pattern of the two data sets is similar though the frequencies aren’t identical, of course. The total I get from this calculation is (I claim) the cardinality (rows) estimate that the optimizer will produce for doing an equi-join on these two tables – so let’s see the test:


set serveroutput off
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

And the resulting output:

Session altered.
Session altered.


  COUNT(*)
----------
  18746698


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0wxytnyqs4b5j, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.n1 = t2.n1

Plan hash value: 906334482
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:03.23 |      40 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:03.23 |      40 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |     18M|     18M|00:00:02.96 |      40 |  2616K|  2616K| 2098K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |      20 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |  10000 |  10000 |00:00:00.01 |      20 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")

As we can see, the estimate for the hash join is “18M” which is in the right ballpark but, in its current format, isn’t entirely helpful, which is why I’ve enabled the 10053 trace to get an exact figure from the trace file, and this is what we see:


***********************
Best so far:  Table#: 0  cost: 4.352468  card: 9487.000000  bytes: 28461.000000
              Table#: 1  cost: 378.482370  card: 18467968.000000  bytes: 110807808.000000
***********************

The optimizer’s estimate is exactly the sum of the products of the frequencies of matching values from the (frequency) histogram data. There is a simple rationale for this – it gets the right answer.

For each row in t1 with value ‘X’ the (frequency) histogram on t2 tells Oracle how many rows will appear in the join, so multiplying the frequency of ‘X’ in t1 by the frequency of ‘X’ in t2 tells Oracle how many rows the ‘X’s will contribute to the join. Repeat for every distinct value that appears in both (frequency) histograms and sum the results.
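Spelling that arithmetic out for the twelve values in the report above (every number comes straight from the two frequency columns):

select
        2658*2532 + 2341*2428 + 1828*1968 + 1305*1270 +
         856* 845 +  513* 513 +  294* 249 +  133* 117 +
          40*  54 +   23*  17 +    5*   5 +    4*   2   join_cardinality
from
        dual
;

which gives 18,746,698 – the figure reported by the cardinality query.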

As a refinement on this (very simple) example, let’s delete data from the two tables so that we have rows in t1 that won’t join to anything in t2, and vice versa – then re-gather stats, query the histograms, and check the new prediction. We want to check whether a value that appears in the t1 histogram contributes to the join cardinality estimate even if there are no matching values in the t2 histogram (and vice versa):


delete from t1 where n1 = 4;
delete from t2 where n1 = 6;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 254', no_invalidate=>false)
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 254', no_invalidate=>false)

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_value
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'N1'
order by
        endpoint_value
)
select
        f1.value, f1.frequency, f2.frequency, f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
;


set serveroutput off
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

And the output – with a little cosmetic tidying:


856 rows deleted.
249 rows deleted.

PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.


     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2658       2532    6,730,056
         1       2341       2428    5,683,948
         2       1828       1968    3,597,504
         3       1305       1270    1,657,350
         5        513        513      263,169
         7        133        117       15,561
         8         40         54        2,160
         9         23         17          391
        10          5          5           25
        11          4          2            8
                                 ------------
sum                                17,950,172


Session altered.
Session altered.


  COUNT(*)
----------
  17950172


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0wxytnyqs4b5j, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.n1 = t2.n1

Plan hash value: 906334482
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:02.89 |      40 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:02.89 |      40 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |     17M|     17M|00:00:02.61 |      40 |  2616K|  2616K| 2134K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |   9144 |   9144 |00:00:00.01 |      20 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |   9751 |   9751 |00:00:00.01 |      20 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")


From the 10053 trace file:
***********************
Best so far:  Table#: 0  cost: 4.340806  card: 9144.000000  bytes: 27432.000000
              Table#: 1  cost: 368.100010  card: 17950172.000000  bytes: 107701032.000000
***********************

You can see that we “lost” values 4 and 6 from the frequency histogram report; the total from the report matches the actual number of rows returned by the query, and the cardinality estimate in the plan is again in the right ballpark – with the trace file showing an exact match.

I’ve run this test on 11.2.0.4,  12.1.0.2,  12.2.0.1 and  18.3.0.0 (which generated a different set of random values) – and there’s an anomaly that appears in 11.2.0.4 (though maybe that should be “disappeared from”): the optimizer’s estimate for the cardinality was a little larger than the value generated in the query against user_tab_histograms. [Edit: Now explained (probably), see below]

Conclusion:

For an incredibly simple class of queries with perfect frequency histograms there’s a very simple way to calculate the cardinality estimate that the optimizer will predict. Match up rows from the two frequency histograms, multiply the corresponding frequencies (making sure you don’t multiply the cumulative frequencies) and sum.

This is, of course, only a tiny step in the direction of seeing how Oracle uses histograms and covers only a type of query that is probably too simple to appear in a production system, but it’s a basis on which I may build in future notes over the next few weeks.

Update (5th Oct)

The “error” in the 11g calculation irritated me a little, and I woke up this morning with an idea about the solution. In 10.2.0.4 Oracle changed the way the optimizer calculated the selectivity of a predicate that used a value that did not appear in the frequency histogram: it did the arithmetic for “half the least frequently occurring value”. So I thought I’d run up a test where my “sum of products” query emulated this model. I had to change my query to an “ANSI”-style full outer join, and here it is:

with f1 as (
select 
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'N1'
),
f2 as (
select 
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'N1'
)
select
        f1.value, f2.value,
        nvl(f1.frequency, 0)                t1_frequency,
        nvl(f2.frequency, 0)                t2_frequency,
        nvl(f1.frequency, &t1_least / 2) *
        nvl(f2.frequency, &t2_least / 2)    product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
order by
        coalesce(f1.value, f2.value)
;

Running this code, and noting that the lowest frequency in t1 was 4 while the lowest frequency in t2 was 2, I got the following results (with the 10053 trace file summary following the output):


     VALUE      VALUE T1_FREQUENCY T2_FREQUENCY      PRODUCT
---------- ---------- ------------ ------------ ------------
         0          0         2658         2532    6,730,056
         1          1         2341         2428    5,683,948
         2          2         1828         1968    3,597,504
         3          3         1305         1270    1,657,350
                    4            0          845        1,690
         5          5          513          513      263,169
         6                     294            0          294
         7          7          133          117       15,561
         8          8           40           54        2,160
         9          9           23           17          391
        10         10            5            5           25
        11         11            4            2            8
                      ------------ ------------ ------------
sum                           9144         9751   17,952,156


Join Card:  17952157.000000 = outer (9751.000000) * inner (9144.000000) * sel (0.201341)
Join Card - Rounded: 17952157 Computed: 17952157.00
 

That’s a pretty good match to the trace file result – and the difference of 1 may simply be a rounding error (despite the trace file’s text suggesting it is accurate to 6 d.p.). Maybe one day I’ll wake up with an inspired guess about that difference – but since it’s relevant only to 11g I’m not going to worry about it anymore.

Footnote

Following an exchange of email with Chinar Aliyev, it’s fairly clear that the “half the least frequency” can actually be derived as “table.num_rows * column.density”.
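A quick dictionary query (a sketch using the t1/t2/n1 names from this test) lets you check that derivation – for the data above it should report 2 for t1 (half the lowest frequency of 4) and 1 for t2 (half of 2):

select
        t.table_name,
        t.num_rows,
        c.density,
        t.num_rows * c.density          half_least_frequency
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name    in ('T1', 'T2')
and     c.table_name    = t.table_name
and     c.column_name   = 'N1'
order by
        t.table_name
;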

 

July 12, 2018

Cardinality Puzzle

Filed under: Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 12:57 pm BST Jul 12,2018

One of the difficulties of being a DBA and being required to solve performance problems is that you probably never have enough time to think about how you got to a solution and why the solution works; and if you don’t learn about the process itself, you just don’t get better at it. That’s why I try (at least some of the time) to write articles and books (as I did with CBO Fundamentals) that

  1. explain simple details that can be used as background facts
  2. walk through the steps of solving a problem

So here’s an example from a question on the ODC database forum asking about the cause and workaround for a bad cardinality estimate that is producing a poorly performing execution plan. It’s actually a type of problem that comes up quite frequently on large data sets and explains why a simple “gather stats” is almost guaranteed to leave you with a few headaches (regardless of whether or not you choose to include histograms as part of the process). I’m not going to offer “the answer” – I’m just going to talk about the inferences we can make from the facts supplied and where we have to go from there.

The DBA has a table holding 80,000,000,000 rows. It is list/hash partitioned with 2 partitions and 1,024 sub-partitions (per partition) but neither of the partitioning key columns appears in the query. The query runs parallel, and the optimizer (presumably thanks to the specific settings of various parameters related to parallel execution) uses dynamic sampling at level 3.

There is an inline view defined in the query and the DBA has isolated this as a key component of the problem and supplied a query and plan (from “explain plan”) against that view.


select * from TAB2 T
WHERE T.DT = to_date(:b1,'MM/DD/YYYY HH24:MI:SS');
 
------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                          |   479M|    76G|  1756K (14)| 05:51:14 |       |       |        |      |            |
|   1 |  PX COORDINATOR              |                          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000                 |   479M|    76G|  1756K (14)| 05:51:14 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL     |                          |   479M|    76G|  1756K (14)| 05:51:14 |     1 |  1024 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS STORAGE FULL| TAB1                     |   479M|    76G|  1756K (14)| 05:51:14 |     1 |  2048 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - storage(COALESCE("TB1"."DFG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("TB1"."DT_TM"))=TO_DATE(:B1,'MM/DD/YYYY
              HH24:MI:SS'))
       filter(COALESCE("TB1"."DFG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("TB1"."DT_TM"))=TO_DATE(:B1,'MM/DD/YYYY
              HH24:MI:SS'))

Note
-----
   - dynamic sampling used for this statement (level=3)

The DBA’s problem is that if the estimated cardinality of this extract goes over roughly 500M the optimizer chooses a bad plan for the overall query – and on occasion this extract has given an estimate of 5 billion rows. Moreover, the actual number of rows returned by this extract is typically in the order of 40M, so the estimate is a long way off even when it’s “good enough”.

So where do we start looking to work out what’s going wrong? You’ll note, of course, that after text expansion the user’s single predicate has changed, and an extra predicate (previously hidden inside the view) has appeared; instead of just T.DT = to_date(:b1,’MM/DD/YYYY HH24:MI:SS’) we now have (cosmetically adjusted):

        COALESCE(DFG,'N')='N' 
AND     TRUNC(DT_TM)=TO_DATE(:B1,'MM/DD/YYYY HH24:MI:SS')

There are two immediately obvious threats here – first that the combination of predicates means Oracle is likely to make a mistake because it will check the individual selectivities and multiply them together to get the combined selectivity, second that the appearance of predicates of the form “function(column) = constant” means that Oracle will guess 1% as the individual selectivities.

Without checking more details we might assume that a possible quick fix (that would require no changes to existing code) would be to create a couple of virtual columns (or extended stats) to represent the two expressions and gather stats on the resulting columns – though it is a restriction of extended stats that you can’t “double up” and create a column group on the two column expressions, so there’s still some scope for a cardinality estimate that is still sufficiently bad even with this approach. We also note that if we can change the coalesce(DFG,’N’) that must have been hidden in the view to nvl(DFG,’N’) then Oracle would be able to “or expand” the nvl() and use a more appropriate selectivity for that part of the predicate.
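To make that concrete, here’s a sketch of the sort of thing I mean. The TAB1/DFG/DT_TM names come from the plan above, but the virtual column names and the bucket counts are purely illustrative – and gathering stats on a table this size is not free, so you’d want to think carefully about sample sizes before running anything like this:

alter table tab1 add (
        dfg_n   as (coalesce(dfg, 'N')),        -- virtual column for the coalesce() expression
        dt_day  as (trunc(dt_tm))               -- virtual column for the trunc() expression
);

begin
        dbms_stats.gather_table_stats(
                ownname    => null,
                tabname    => 'TAB1',
                method_opt => 'for columns dfg_n size 1 for columns dt_day size 254'
        );
end;
/

The alter table is a metadata-only change – no data is stored or rewritten – but any code doing “insert ... select *” against the table would need checking.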

However, the points I’ve covered so far tend to produce estimates that are too small and often much too small. So maybe the key to the problem is in the Note section that tells us that Oracle has (successfully) used dynamic sampling for this statement. In other words, all the theory of how the optimizer calculates selectivity may be irrelevant – the estimate will be based on the luck of the sample.

So let’s take a look at the (slightly edited) table stats we’ve been given:

column_name data_type num_distinct low_value      high_value     density   num_null  histogram
DT_TM       DATE           6179571 78740B1E0A383C 7876020B01262B 1.6182E-7 0         NONE
DFG         VARCHAR2             1 4E             4E             1         0         NONE

Notice that the DFG column (apparently) has the value ‘N’ for every row in the table (low_value = high_value = 0x4E, num_nulls = 0). The DT_TM range is 30-Nov-2016 to 11-Feb-2018, with 6.18M distinct values for 80 billion rows – and neither column has a histogram.

A little arithmetic tells us that (on average) there ought to be about 182M (= 80B / 438 days) rows for any one day – and that’s worth thinking about for three separate reasons.

First, an estimate of 479M against an average of 182M isn’t too surprising if it’s based on a fairly small sample – it’s only out by a factor of 2.6. On the other hand, getting an estimate of 5 billion – which can happen on bad days – is extremely unlikely if the data is uniformly distributed across dates.

Secondly, the DBA supplied us with some data from the recent past with an aggregate query for “trunc(dt_tm)”, with the following results:

TRUNC(DT_TM)   COUNT(*)
------------ ----------
01-FEB-18    44,254,425
02-FEB-18    46,585,349
03-FEB-18    43,383,099
04-FEB-18    32,748,364
05-FEB-18    37,993,126
06-FEB-18    39,708,994
07-FEB-18    38,696,777
08-FEB-18    41,871,780
09-FEB-18    46,702,852
10-FEB-18    42,744,870
11-FEB-18    34,971,845
12-FEB-18    37,165,983

Recent data seems to follow an average of around 40M rows per day, so the estimate of 182M that we can derive from the stored statistics is a long way off: the present is behaving very differently from the past and that’s a relatively common problem with very large data sets – though it’s more usual for rolling averages to increase from the past to the present because the data is often representing the growth of a business over time. Can we create a hypothesis to explain the discrepancy, and could that hypothesis also account for the sample producing some very strange estimates ?

Finally, slightly more subtle and only included for completeness, if this column is supposed to hold date and time to the nearest second – which is what you might expect from an Oracle date type – there are 38 million possible values (438 x 86,400) it could be holding, and that’s more than the actual number of distinct values by a factor of 6. We can also work out that 80 billion rows over 438 days is 2,000 rows per second (on average). Averages are often misleading, of course, many systems have a pattern where a working day shows most of the data created in a 12 – 16 hour window with a couple of hours of more intense activity. For reference, though: average rows per second for the recent data is roughly 40M/86400 = 460; while the average we derive from the stored statistics is 80B / 6M = 13000 rows per second; this unlikely pattern needs a “non-uniform” explanation.
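Pulling those back-of-envelope numbers together in a single query (using the 6,179,571 num_distinct from the stats above):

select
        438 * 86400                     possible_second_values,
        round(80e9 / (438 * 86400))     avg_rows_per_second,
        round(40e6 / 86400)             recent_rows_per_second,
        round(80e9 / 6179571)           rows_per_distinct_value
from
        dual
;

which gives roughly 37.8M, 2,100, 460 and 12,900 respectively.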

How do these three thoughts help us to understand or, to be more accurate, to make a sensible guess about why the optimizer can use dynamic sampling and get a wildly variable estimate which can be 1 or 2 orders of magnitude wrong (varying between 479M and 5,000M compared to the recent actual 40M)?

Here’s one simple idea: extrapolate the 40M rows per day over 80B rows: that’s 2,000 days (possibly rather more since businesses tend to grow). What if the dt_tm is the timestamp for the moment the row was loaded into the database, and a couple of years ago (maybe around “30th Nov 2016”) the data was restructured and the existing five years of data was loaded over a very short period of time – let’s say one week. This would leave you with 17B rows of “new” data with a dt_tm spread at 40M rows per day for most of 438 days, and 63B rows of “historic” data packed into 7 days (at 9B rows per day).

I don’t know how Oracle would have randomly selected its sample from an extremely large table with 2,048 physical data segments but it’s totally believable that a small, widely scattered sample could end up with an extremely unrepresentative subset of the data. A completely random sample of the data would produce an estimate of around 500M rows for the predicate; but it would only take a fairly small variation in the sample (taking a few too many “historic” rows) to produce a large enough change in the estimate to change the execution plan, and a rare, but not extreme variation could easily take the estimate up to 5B.

Next Steps

It would be at this point in a performance assignment that I’d be asking around to find out if my guess about a massive data load operation was correct – if I couldn’t get the answer by talking to people I’d run a query against the whole data set to check the hypothesis, because there’s clearly some sort of skew in the data that’s causing a problem. I’d also run the critical part of the query a couple of times with events 10046/level 4 and 10053 set (but only fetching the first few rows) to find out from the trace file how large a sample Oracle was using, and then run the sampling query a few times to see what the sampled results looked like. Depending on the results I’d either find a way to stop Oracle from sampling for this query or I might create a virtual column (or just extended stats since it’s 11g) on just the trunc(dt_tm), possibly with a histogram in place (maybe coded by hand) if that could isolate the special dates and leave Oracle with a better estimate of the typical date. I might find I had to change the coalesce() to an nvl() as well – or create a virtual column – to stop the sampling.
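For reference, the event settings for that test, following the pattern of the 10053 calls used elsewhere in these notes (level 4 is the level that adds bind variable values to the 10046 trace):

alter session set events '10046 trace name context forever, level 4';
alter session set events '10053 trace name context forever';

-- run the critical part of the query, fetching only the first few rows

alter session set events '10046 trace name context off';
alter session set events '10053 trace name context off';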

Finally, it’s worth noting that in 11g it’s possible to create pending (table preference “PUBLISH” = FALSE) stats for testing purposes; it’s also worth noting that the default histogram on trunc(dt_tm) would be a height-balanced histogram while we could create a frequency histogram in 12c since 12c allows us to specify up to 2,048 buckets.
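Setting that preference is a one-liner through the dbms_stats API – a sketch, using the TAB1 name from the plan above:

begin
        dbms_stats.set_table_prefs(
                ownname => null,
                tabname => 'TAB1',
                pname   => 'PUBLISH',
                pvalue  => 'FALSE'
        );
end;
/

Stats gathered after this are held as pending; you can test them in a session with “alter session set optimizer_use_pending_statistics = true”, then publish them with dbms_stats.publish_pending_stats() or discard them with dbms_stats.delete_pending_stats().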

Footnote

If you check the ODC thread you’ll see that the OP has marked as correct a suggestion to change:

    TRUNC (TB1.DT_TM)  = to_date(:b1,'MM/DD/YYYY HH24:MI:SS');  

to

    dt_tm >= trunc(to_date(:b1,'MM/DD/YYYY HH24:MI:SS'))
and dt_tm <  trunc(to_date(:b1,'MM/DD/YYYY HH24:MI:SS'))+1

Note that that’s “greater than or equal to” at one end and “strictly less than” at the other when using “date + 1”.

This has the effect of giving the optimizer a chance of using the low/high values of the column to produce a better (though perhaps still overlarge) and consistent estimate of the rows in the date range; and it may also stop the optimizer from doing dynamic sampling at level 3 (the “I’m guessing, let’s check” level) though it’s possible that the sampling would be disabled only if the coalesce() were changed to an nvl() as well.

Of course, from the information supplied, this looks like the OP would have to change a view definition and the run-time code to achieve the result. But in an ideal world doing things that avoid confusing the optimizer is usually the sensible strategy provided it doesn’t take an extreme amount of coding and testing.

 

June 17, 2016

Cardinality trick

Filed under: CBO,Oracle — Jonathan Lewis @ 1:02 pm BST Jun 17,2016

In the absence of a virtual column or function-based index, the optimizer uses a basic selectivity guess of 1% for a predicate of the form “function(column) = constant”; but there is (at least) one special case – simple type conversion – where it gets clever:


create table t1 nologging
as
select  cast(
                case
                        when mod(rownum,1000) = 0 then 0
                        when mod(rownum,100)  = 0 then 1
                                                  else 9
                end as varchar2(1)
        ) v1
from
        all_objects
where   rownum <= 50000  -- > comment to avoid WordPress format issue
;

execute dbms_stats.gather_table_stats(user,'t1')

set autotrace on explain

select count(*) from t1 where v1 = 9;
select count(*) from t1 where sign(v1) = 1;

set autotrace off

If you think about the table creation script you’ll agree that there are 49,500 rows where v1 = ‘9’ so the first query could (in theory) produce an estimated cardinality of 49,500. However I’ve got a datatype error in the predicate and I haven’t created a histogram – and that’s not very helpful in two different ways. The optimizer will rewrite the predicate as to_number(v1) = 9, and if it followed the general rule for function(column) = constant it would then use a guessed selectivity of 1%, which would make it report an estimated cardinality of 500 for my sample query; but in this case Oracle uses the number of distinct values for the column (i.e. 3) to infer a value for the number of distinct values for the function and then uses that in the first query:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |    25   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 16667 | 33334 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_NUMBER("V1")=9)

On the other hand, while the optimizer “knows” that the number of distinct values for the varchar2 will match the number of distinct numerical equivalents (not that that’s actually true), it has no idea how many of the varchar2 values will equate to negative, zero, or positive values, so the 1% selectivity re-appears for the second query:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |    25   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  1000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SIGN(TO_NUMBER("V1"))=1)

It shouldn’t surprise you to see that you would also get 500 as the estimated cardinality if the predicate were to read “sign(v1) = 2” – a value that the sign() function can never return. The optimizer is using a generic rule; it doesn’t know the specifics of the function you’re using.
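
As a quick cross-check, you can confirm the arithmetic from the data dictionary – here’s a sketch using user_tab_cols (the calculations in the comments are mine, not Oracle’s output):

select
        column_name, num_distinct, num_nulls, histogram
from
        user_tab_cols
where
        table_name  = 'T1'
and     column_name = 'V1'
;

-- to_number(v1) = 9       : 50,000 rows / 3 distinct values = 16,667
-- sign(to_number(v1)) = 1 : 50,000 rows * 1/100 (fixed guess) =  500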

Footnote:

If you’re wondering when the number of distinct character values doesn’t match the number of distinct numeric values (and assuming all the character values are valid for conversion to numeric), just remember that the same number can be represented in different ways; for example, you might change the original cast() that I used to generate the data to:

        cast(
                case
                        when mod(rownum,1000) = 0 then '0'
                        when mod(rownum, 100) = 0 then '1'
                        when mod(rownum,   2) = 0 then '9'
                                                  else '09'
                end as varchar2(2)
        ) v1

Now we have 4 distinct character values (so the optimizer’s estimate for the first query would drop to 50,000/4 = 12,500) but only 3 distinct numeric equivalents.
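
If you want to see the mismatch directly (this assumes you’ve rebuilt t1 with the modified cast() above), a simple query shows the two different counts:

select
        count(distinct v1)              character_ndv,  -- 4 ('0', '1', '9', '09')
        count(distinct to_number(v1))   numeric_ndv     -- 3 (0, 1, 9)
from
        t1
;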

This, by the way, is why the optimizer transforms a predicate like “character_column = {numeric value}” into “to_number(character_column) = {numeric value}”, rather than converting it to “character_column = to_char({numeric value})”. A character string can only represent one numeric value while a numeric value can be displayed as an infinite number of different character strings (assuming the availability of the appropriate number of typing monkeys).

 

December 8, 2014

Cardinality Change

Filed under: CBO,Oracle — Jonathan Lewis @ 9:35 pm GMT Dec 8,2014

Here’s an entertaining little change across versions of Oracle, brought to my attention by Tony Hasler during UKOUG Tech 14. It’s a join cardinality estimate, so here are a couple of tables to demonstrate the issue – the only columns needed are the alpha_06 columns, but I reused some code from other demonstrations to create my test case, so there are lots of irrelevant columns in the create table script:


create table t1 nologging as
with generator as (
        select rownum id
        from dual
        connect by rownum <= 1000
)
select
        rownum                                          id,
        mod(rownum-1,200)                               mod_200,
        trunc(dbms_random.value(0,300))                 rand_300,
        mod(rownum-1,10000)                             mod_10000,
        trunc(sysdate) +
                trunc(dbms_random.value(0,1000))        date_1000,
        dbms_random.string('l',6)                       alpha_06,
        dbms_random.string('l',20)                      alpha_20
from
        generator,
        generator
where
        rownum <= 1e6
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

create table t2 nologging as select * from t1;
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 1')

I’m going to join t1 to t2 with a predicate based on the alpha_06 columns – using a LIKE predicate. Before I do so I’ll point out that there are 1,000,000 rows in the table, and (checking the column stats) 985,920 distinct values for alpha_06. Here’s my query, with the execution plan I got from 11.1.0.7:


select
        count(*)
from
        t1, t2
where
        t2.alpha_06 like t1.alpha_06
;

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1122M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |    50G|   651G|  1122M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1123   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   | 50000 |   341K|  1122   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The 50,000 cardinality estimate for t2 looks like the standard 5% guess for “column >= {unknown value}”, following which the join cardinality of 50G is the same 5% guess applied to the Cartesian join between t1 and t2 (1M * 1M * 0.05). It’s not a good estimate in my case because the right answer happens to be close to 1M rows, specifically 1,003,176. So let’s upgrade to 11.2.0.4 and see what we get instead:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1050M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |  2014K|    26M|  1050M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1051   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   |     2 |    14 |  1050   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The estimate has dropped from 50 billion rows down to 2 million – a factor of about 25,000: possibly an indicator that the algorithm has changed, and that a few people might find execution plans changing as they upgrade to a newer version of Oracle. The change occurred at 11.2.0.2, as revealed by fix control 9303766, which has the description: “use 1/NDV+1/NROWS for col1 LIKE col2 selectivities”.
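
You can check the fix on your own system by querying v$system_fix_control (a sketch – you’ll need the appropriate privileges on the v$ views):

select
        bugno, value, optimizer_feature_enable, description
from
        v$system_fix_control
where
        bugno = 9303766
;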

Just as a quick check on the arithmetic: there are 1 million rows in table t2, with (as noted above) 985,920 distinct values in the column, so the selectivity should be: 1/1000000 + 1/985920 = 2.014281e-6. Multiply the selectivity by 1e6 and you get 2, the cardinality estimate for t2; multiply the selectivity by 1M*1M (the Cartesian join) and you get 2,014,281, the cardinality estimate of the join. QED.

There are workarounds, of course. One would be to reverse out the fix control, either as an initialisation parameter or in a session logon trigger; another might be to modify the SQL – I think the following would be equivalent:


select
        *
from    t1, t2
where
        t2.alpha_06 like substr(t1.alpha_06,1,length(t1.alpha_06))||'%'
and     t1.alpha_06 is not null
and     t2.alpha_06 is not null

This changes the critical predicate from the form “col1 like col2” to “col1 like {unknown value from function}”, i.e. back to a case where the optimizer uses the 5% guess, and the cardinality estimates go back to the original values.
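
For reference, here’s a minimal sketch of the first of those workarounds – reversing out the fix control at the session level (a value of 0 disables the fix; the logon trigger variant simply wraps the same statement, and you should test the side effects carefully before using either in production):

alter session set "_fix_control" = '9303766:0';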

November 5, 2014

Cardinality Feedback

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 6:43 pm GMT Nov 5,2014

A fairly important question, and a little surprise, appeared on Oracle-L a couple of days ago. Running 11.2.0.3 a query completed quickly on the first execution then ran very slowly on the second execution because Oracle had used cardinality feedback to change the plan. This shouldn’t really be entirely surprising – if you read all the notes that Oracle has published about cardinality feedback – but it’s certainly a little counter-intuitive.

Of course there are several known bugs related to cardinality feedback that could cause this anomaly to appear – a common complaint seems to relate to views on the right-hand (inner table) side of nested loop joins, and cardinality feedback being used on a table inside the view; but there’s an inherent limitation to cardinality feedback that makes it fairly easy to produce an example of a query doing more work on the second execution.

The limitation is that cardinality feedback generally can’t be used (sensibly) on all the tables where better information is needed. This blog describes the simplest example I can come up with to demonstrate the point. Inevitably it’s a little contrived, but it captures the type of guesswork and mis-estimation that can make the problem appear in real data sets. Here’s the query I’m going to use:


select
        t1.n1, t1.n2, t2.n1, t2.n2
from
        t1, t2
where
        t1.n1 = 0
and     t1.n2 = 1000
and     t2.id = t1.id
and     t2.n1 = 0
and     t2.n2 = 400
;

You’ll notice that I’ve got two predicates on each table so, in the absence of “column-group” extended stats, the optimizer will enable cardinality feedback as the query runs to check whether or not its “independent columns” treatment of the predicates gives a suitably accurate estimate of cardinality and a reasonable execution plan. If the estimates are bad enough the optimizer will use information it has gathered as the query ran as an input to re-optimising the query on the next execution.
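
As an aside, if you wanted the optimizer to understand the column correlation rather than depend on feedback, the column-group stats are easy to create – a quick sketch (not part of my demonstration, which deliberately leaves the optimizer guessing):

declare
        v_name  varchar2(128);
begin
        v_name := dbms_stats.create_extended_stats(user, 'T1', '(N1,N2)');
        v_name := dbms_stats.create_extended_stats(user, 'T2', '(N1,N2)');
end;
/

A subsequent call to gather_table_stats() would then collect statistics on the hidden virtual columns that represent the two column groups.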

So here’s the trick.  I’m going to set up the data so that there seem to be only two sensible plans:  (a) full scan of t1, with nested loop unique index access to t2; (b) full scan of t2, with nested loop unique index access to t1. But I’m going to make sure that the optimizer thinks that (a) is more efficient than (b) by making the stats look as if (on average) the predicates on t1 should return 100 rows while the predicates on t2 return 200 rows.

On the other hand I’ve set the data up so that (for this specific set of values) t1 returns 2,000 rows, which means Oracle will decide that its estimate was so far out that it will re-optimize with 2,000 as the estimated single table access cardinality for t1 – and that means it will decide to do the nested loop from t2 to t1. But what the optimizer doesn’t know (and hasn’t been able to find out by running the first plan) is that with this set of predicates t2 will return 20,000 rows to drive the nested loop into t1 – and the new execution plan will do more buffer gets and use more CPU (and time) than the old plan. Since cardinality feedback is applied only once, the optimizer won’t be able to take advantage of the second execution to change the plan again, or even to switch back to the first plan.

Here’s the setup so you can test the behaviour for yourselves:

rem
rem     Script:         cardinality_feedback_trap.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2014
rem
rem     Last tested 
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        rownum                  id,
        mod(rownum,2)           n1,
        mod(rownum,2000)        n2,     -- 200 rows for each value on average
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4e5 -- > comment to avoid wordpress format issue
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        rownum                  id,
        mod(rownum,2)           n1,
        2 * mod(rownum,1000)    n2,     -- 400 rows for each value on average, same range as t1
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4e5 -- > comment to avoid wordpress format issue
;

alter table t2 add constraint t2_pk primary key(id);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

--
-- Now update both tables to put the data out of sync with the statistics
-- We need a skewed value in t1 that is out by a factor of at least 8 (triggers use of CF)
-- We need a skewed value in t2 that is so bad that the second plan is more resource intensive than the first
--

update t1 set n2 = 1000 where n2 between 1001 and 1019;
update t2 set n2 =  400 where n2 between 402 and 598;
commit;
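
Before reading the plans, here’s a quick sanity check on what the data now looks like compared to the optimizer’s model (the expected outputs in the comments come from the A-Rows figures in the plans below):

select count(*) from t1 where n1 = 0 and n2 = 1000;     -- 2,000 rows  (stats suggest 100)
select count(*) from t2 where n1 = 0 and n2 =  400;     -- 20,000 rows (stats suggest 200)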

Here are the execution plans for the first and second executions (with rowsource execution statistics enabled, and the “allstats last” option used in a call to dbms_xplan.display_cursor()).


----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |  1049 (100)|      0 |00:00:00.36 |   11000 |   6588 |
|   1 |  NESTED LOOPS                |       |      1 |    100 |  1049   (3)|      0 |00:00:00.36 |   11000 |   6588 |
|   2 |   NESTED LOOPS               |       |      1 |    100 |  1049   (3)|   2000 |00:00:00.35 |    9000 |   6552 |
|*  3 |    TABLE ACCESS FULL         | T1    |      1 |    100 |   849   (4)|   2000 |00:00:00.30 |    6554 |   6551 |
|*  4 |    INDEX UNIQUE SCAN         | T2_PK |   2000 |      1 |     1   (0)|   2000 |00:00:00.02 |    2446 |      1 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T2    |   2000 |      1 |     2   (0)|      0 |00:00:00.01 |    2000 |     36 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   4 - access("T2"."ID"="T1"."ID")
   5 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |  1249 (100)|      0 |00:00:00.66 |   32268 |   1246 |
|   1 |  NESTED LOOPS                |       |      1 |    200 |  1249   (3)|      0 |00:00:00.66 |   32268 |   1246 |
|   2 |   NESTED LOOPS               |       |      1 |    200 |  1249   (3)|  20000 |00:00:00.56 |   12268 |    687 |
|*  3 |    TABLE ACCESS FULL         | T2    |      1 |    200 |   849   (4)|  20000 |00:00:00.12 |    6559 |    686 |
|*  4 |    INDEX UNIQUE SCAN         | T1_PK |  20000 |      1 |     1   (0)|  20000 |00:00:00.19 |    5709 |      1 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T1    |  20000 |      1 |     2   (0)|      0 |00:00:00.15 |   20000 |    559 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("T2"."N2"=400 AND "T2"."N1"=0))
   4 - access("T2"."ID"="T1"."ID")
   5 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))

Note
-----
   - cardinality feedback used for this statement

The second plan does fewer reads because of the buffering side effects from the first plan – but that’s not what the optimizer is looking at. The key feature is that the first plan predicts 100 rows for t1, with 100 starts for the index probe, but discovers 2,000 rows and does 2,000 probes. Applying cardinality feedback the optimizer decides that fetching 200 rows from t2 and probing t1 200 times will be lower cost than running the join the other way round with the 2,000 rows it now knows it will get – but at runtime Oracle actually gets 20,000 rows, does three times as many buffer gets, and spends twice as much time as it did on the first plan.

Hinting

Oracle hasn’t been able to learn (in time) that t2 will supply 20,000 rows – but if you knew this would happen you could use the cardinality() hint to tell the optimizer the truth about both tables. With the hints /*+ cardinality(t1 2000) cardinality(t2 20000) */ in place, this is the plan you would get:

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |  1698 (100)|      0 |00:00:00.06 |   13109 |  13105 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |   2000 |  1698   (4)|      0 |00:00:00.06 |   13109 |  13105 |  1696K|  1696K| 1647K (0)|
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   2000 |   849   (4)|   2000 |00:00:00.05 |    6554 |   6552 |       |       |          |
|*  3 |   TABLE ACCESS FULL| T2   |      1 |  20000 |   849   (4)|  20000 |00:00:00.09 |    6555 |   6553 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   2 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   3 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

Unfortunately, unless you have used hints, it doesn’t matter how many times you re-run the query after cardinality feedback has pushed you into the bad plan – it’s not going to change again (unless you mess around flushing the shared pool or using dbms_shared_pool.purge() to kick out the specific statement).
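
For completeness, here’s a sketch of the purge call for a single statement (dbms_shared_pool is owned by SYS, so you may need an explicit execute grant; the name parameter is the “address,hash_value” pair from v$sqlarea and the ‘C’ flag identifies the object as a cursor):

declare
        v_name  varchar2(64);
begin
        select  address || ',' || hash_value
        into    v_name
        from    v$sqlarea
        where   sql_id = '&m_sql_id'            -- substitute the offending sql_id
        ;

        dbms_shared_pool.purge(v_name, 'C');
end;
/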

Upgrade

If you upgrade to 12c the optimizer does a much better job of handling this query – it produces an adaptive execution plan (starting with the nested loop join but dynamically switching to the hash join as the query runs). Here’s the full adaptive plan pulled from memory after the first execution – as you can see both the t1/t2 nested loop and hash joins were considered, then the nested loop was discarded in mid-execution. Checking the 10053 trace file I found that Oracle has set the inflexion point (cross-over from NLJ to HJ) at 431 rows.


----------------------------------------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT              |       |      1 |        |  1063 (100)|      0 |00:00:00.06 |   13113 |  13107 |       |       |          |
|  *  1 |  HASH JOIN                    |       |      1 |    100 |  1063   (3)|      0 |00:00:00.06 |   13113 |  13107 |  1519K|  1519K| 1349K (0)|
|-    2 |   NESTED LOOPS                |       |      1 |    100 |  1063   (3)|   2000 |00:00:00.11 |    6556 |   6553 |       |       |          |
|-    3 |    NESTED LOOPS               |       |      1 |    100 |  1063   (3)|   2000 |00:00:00.10 |    6556 |   6553 |       |       |          |
|-    4 |     STATISTICS COLLECTOR      |       |      1 |        |            |   2000 |00:00:00.09 |    6556 |   6553 |       |       |          |
|  *  5 |      TABLE ACCESS FULL        | T1    |      1 |    100 |   863   (4)|   2000 |00:00:00.08 |    6556 |   6553 |       |       |          |
|- *  6 |     INDEX UNIQUE SCAN         | T2_PK |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|- *  7 |    TABLE ACCESS BY INDEX ROWID| T2    |      0 |      1 |     2   (0)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  *  8 |   TABLE ACCESS FULL           | T2    |      1 |      1 |     2   (0)|  20000 |00:00:00.07 |    6557 |   6554 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   5 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   6 - access("T2"."ID"="T1"."ID")
   7 - filter(("T2"."N2"=400 AND "T2"."N1"=0))
   8 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

Footnote:

For reference, here are a few of the bug (or patch) numbers associated with cardinality feedback:

  • Patch 13454409: BAD CARDINALITY FROM FEEDBACK (CFB) ON VIEW RHS OF NLJ
  • Bug 16837274 (fixed in 12.2): Bad cost estimate for object on RHS of NLJ
  • Bug 12557401: The table that is being incorrectly adjusted is in the right hand side of a nested loops.
  • Bug 8521689: Bad cardinality feedback estimate for view on right-hand side of NLJ

 

November 29, 2009

Cardinality

Filed under: CBO — Jonathan Lewis @ 7:49 pm GMT Nov 29,2009

I received an email recently which said:

I have a question about Page 54 of Cost Based Oracle.

10g Update
As with equalities, 10.1.0.4 suddenly changes what happens when you fall outside the low/high limits. The arithmetic, or picture, used for ranges outside the limits is just the same as the new mechanism we saw for equalities.

I can’t work out how the formula provides a cardinality of 82 for the values ” month_no between 14 and 17″. Can you please elaborate on the out of bounds formula ?

The example in the book was only intended to show the general pattern of behaviour, and I didn’t attempt to explain what I thought the optimizer was doing – which is an odd little oversight.

(more…)

May 11, 2009

Cardinality feedback

Filed under: CBO,Execution plans,Statistics,Troubleshooting — Jonathan Lewis @ 8:09 pm BST May 11,2009

One of my favourite presentations at Collaborate 09 was a quick tip (30 minutes) on Tuning by Cardinality Feedback – i.e. comparing the optimizer’s predictions of cardinality with the actual rowcounts returned when you run the query.

The strategy is one that I first saw demonstrated in a presentation by Wolfgang Breitling a few years ago at one of the Hotsos seminars – but this latest example demonstrated an important variation on the theme in a short, precise, lesson.

(more…)

April 25, 2008

Cardinality Change

Filed under: CBO,Execution plans,Histograms,Statistics,Troubleshooting — Jonathan Lewis @ 8:13 pm BST Apr 25,2008

Here’s an interesting point that’s just been raised on the Oracle-L list-server:

Metalink bug: 5483301.8

If a column has a frequency histogram and a query uses an equality predicate over the column looking for a value that does not exist in the histogram then the CBO was estimating a cardinality of 1. This could favor Nested Loops too much.

The fix introduces a notable change in behavior: instead of estimating a cardinality of 1 then with this fix CBO estimates a cardinality of (0.5 * the lowest cardinality found in the histogram). This can result in plan changes.

(more…)

April 13, 2022

Adaptive Joins

Filed under: CBO,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 1:53 pm BST Apr 13,2022

There’s a question on the Oracle Forums at the moment about a query that’s taking a long time to parse. Even after being reduced to one prebuilt (currently remote) table with two non-correlated outer joins to it the parse time is several hundred seconds. This seems fairly bizarre – I have seen some very long parse times from Oracle, but 10 minutes for 3 tables is well over the top; it did remind me, though, of a note I started a few years ago about a 4 table join taking 4 seconds to parse, so I thought I’d present the query, the plan, and a little chat on debugging. Here’s the query:

select
        /*+ parallel(t4,3) */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

I’m expecting a simple cascade of hash joins, with t1, t2 and t3 – the “small” tables – turning into “build” tables, then t4 – the “large” table – passing through each of them in turn until the penultimate rowsource is aggregated.

Here’s the execution plan – which looks pretty much as I expected it to – but there’s something wrong about it that isn’t visible in the output. Why does the query (plan) say it took 0.07 seconds to complete (A-time), returning only 60 rows, when my SQL*Plus session didn’t return any data for 4 seconds?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   300 (100)|          |        |      |            |     60 |00:00:00.07 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR                |          |      1 |        |            |          |        |      |            |     60 |00:00:00.07 |       5 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)          | :TQ10004 |      0 |      1 |   300   (4)| 00:00:01 |  Q1,04 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    HASH GROUP BY               |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,04 | PCWP |            |     60 |00:00:00.01 |       0 |      0 |  1394K|  1394K|     3/0/0|
|   4 |     PX RECEIVE                 |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,04 | PCWP |            |    180 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND HASH              | :TQ10003 |      0 |      1 |   300   (4)| 00:00:01 |  Q1,03 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY            |          |      3 |      1 |   300   (4)| 00:00:01 |  Q1,03 | PCWP |            |    180 |00:00:00.14 |    6114 |   6018 |  1394K|  1394K|     3/0/0|
|*  7 |        HASH JOIN               |          |      3 |   8460 |   299   (4)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.14 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|   8 |         JOIN FILTER CREATE     | :BF0000  |      3 |     22 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     75 |00:00:00.02 |       0 |      0 |       |       |          |
|   9 |          PX RECEIVE            |          |      3 |     22 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     75 |00:00:00.02 |       0 |      0 |       |       |          |
|  10 |           PX SEND BROADCAST    | :TQ10000 |      0 |     22 |     2   (0)| 00:00:01 |  Q1,00 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  11 |            PX SELECTOR         |          |      3 |        |            |          |  Q1,00 | SCWC |            |     25 |00:00:00.01 |       3 |      0 |       |       |          |
|* 12 |             TABLE ACCESS FULL  | T3       |      1 |     22 |     2   (0)| 00:00:01 |  Q1,00 | SCWP |            |     25 |00:00:00.01 |       3 |      0 |       |       |          |
|* 13 |         HASH JOIN              |          |      3 |  27322 |   297   (4)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.10 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|  14 |          JOIN FILTER CREATE    | :BF0001  |      3 |     21 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     72 |00:00:00.01 |       0 |      0 |       |       |          |
|  15 |           PX RECEIVE           |          |      3 |     21 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     72 |00:00:00.01 |       0 |      0 |       |       |          |
|  16 |            PX SEND BROADCAST   | :TQ10001 |      0 |     21 |     2   (0)| 00:00:01 |  Q1,01 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  17 |             PX SELECTOR        |          |      3 |        |            |          |  Q1,01 | SCWC |            |     24 |00:00:00.01 |       3 |      0 |       |       |          |
|* 18 |              TABLE ACCESS FULL | T2       |      1 |     21 |     2   (0)| 00:00:01 |  Q1,01 | SCWP |            |     24 |00:00:00.01 |       3 |      0 |       |       |          |
|* 19 |          HASH JOIN             |          |      3 |  92953 |   294   (3)| 00:00:01 |  Q1,03 | PCWP |            |  12600 |00:00:00.10 |    6114 |   6018 |  2171K|  2171K|     3/0/0|
|  20 |           JOIN FILTER CREATE   | :BF0002  |      3 |     19 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     63 |00:00:00.01 |       0 |      0 |       |       |          |
|  21 |            PX RECEIVE          |          |      3 |     19 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |            |     63 |00:00:00.01 |       0 |      0 |       |       |          |
|  22 |             PX SEND BROADCAST  | :TQ10002 |      0 |     19 |     2   (0)| 00:00:01 |  Q1,02 | S->P | BROADCAST  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  23 |              PX SELECTOR       |          |      3 |        |            |          |  Q1,02 | SCWC |            |     21 |00:00:00.01 |       3 |      0 |       |       |          |
|* 24 |               TABLE ACCESS FULL| T1       |      1 |     19 |     2   (0)| 00:00:01 |  Q1,02 | SCWP |            |     21 |00:00:00.01 |       3 |      0 |       |       |          |
|  25 |           JOIN FILTER USE      | :BF0000  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  26 |            JOIN FILTER USE     | :BF0001  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  27 |             JOIN FILTER USE    | :BF0002  |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|  28 |              PX BLOCK ITERATOR |          |      3 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWC |            |  19517 |00:00:00.09 |    6114 |   6018 |       |       |          |
|* 29 |               TABLE ACCESS FULL| T4       |     48 |    343K|   289   (2)| 00:00:01 |  Q1,03 | PCWP |            |  19517 |00:00:00.06 |    6114 |   6018 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
  12 - SEL$1 / T3@SEL$1
  18 - SEL$1 / T2@SEL$1
  24 - SEL$1 / T1@SEL$1
  29 - SEL$1 / T4@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      OPT_PARAM('_fix_control' '16923858:5')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "T4"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T3"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T4"@"SEL$1" "T2"@"SEL$1" "T3"@"SEL$1")
      USE_HASH(@"SEL$1" "T4"@"SEL$1")
      USE_HASH(@"SEL$1" "T2"@"SEL$1")
      USE_HASH(@"SEL$1" "T3"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T4"@"SEL$1" BROADCAST NONE)
      PX_JOIN_FILTER(@"SEL$1" "T4"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T2"@"SEL$1" NONE BROADCAST)
      PX_JOIN_FILTER(@"SEL$1" "T2"@"SEL$1")
      PQ_DISTRIBUTE(@"SEL$1" "T3"@"SEL$1" NONE BROADCAST)
      PX_JOIN_FILTER(@"SEL$1" "T3"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "T2"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "T3"@"SEL$1")
      GBY_PUSHDOWN(@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T3"."ID"="T4"."ID3")
  12 - filter((TO_NUMBER("T3"."SMALL_VC")=1 OR TO_NUMBER("T3"."SMALL_VC")=2 OR TO_NUMBER("T3"."SMALL_VC")=3 OR TO_NUMBER("T3"."SMALL_VC")=4 OR TO_NUMBER("T3"."SMALL_VC")=5))
  13 - access("T2"."ID"="T4"."ID2")
  18 - filter((TO_NUMBER("T2"."SMALL_VC")=1 OR TO_NUMBER("T2"."SMALL_VC")=2 OR TO_NUMBER("T2"."SMALL_VC")=3 OR TO_NUMBER("T2"."SMALL_VC")=4))
  19 - access("T1"."ID"="T4"."ID1")
  24 - filter((TO_NUMBER("T1"."SMALL_VC")=1 OR TO_NUMBER("T1"."SMALL_VC")=2 OR TO_NUMBER("T1"."SMALL_VC")=3))
  29 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER_LIST(SYS_OP_BLOOM_FILTER(:BF0002,"T4"."ID1"),SYS_OP_BLOOM_FILTER(:BF0001,"T4"."ID2"),SYS_OP_BLOOM_FILTER(:BF0000,"T4"."ID3")))


One possible trouble-shooting step is simply to re-run the query, taking a snapshot of the session activity stats (v$mystat) and the session events (v$session_event) to see what they tell you (if anything) – here’s the critical extract from the session stats:

Name                                      Value
----                                      -----
CPU used when call started                  423
CPU used by this session                    429
parse time cpu                              411
parse time elapsed                          419

Most of the time is parse time, spent on the CPU. (If this had been a much larger scale problem and had occurred in the recent past I might have looked at ASH (v$active_session_history) for any samples for the relevant SQL_ID, and seen the problem revealed in the in_parse and in_hard_parse columns.)
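
If you don’t have a snapshot package to hand, a minimal version of the check is to run a query like the following before and after the parse call and difference the results by eye:

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     (
            sn.name like 'parse time%'
         or sn.name like 'CPU used%'
        )
order by
        sn.name
;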

So let’s enable event 10053 and run the query again – but since it’s “only” 4 seconds, let’s tweak the timer option to report any step that took longer than 0.1 seconds. The default timer setting is a whole second (10^6 microseconds), so we set the fix-control to 5 to get 0.1 seconds (10^5 microseconds).

alter session set "_fix_control"='16923858:5';
alter session set events '10053 trace name context forever';

-- run the query, find the trace file

grep TIMER or19_ora_23370.trc

Here’s the output from the call to grep: it looks like group by placement (GBP) is causing a problem.

TIMER:  GBP: costing SEL$1 cpu: 0.303 sec elapsed: 0.309 sec
TIMER: GBP: iteration (#1) SEL$1 cpu: 0.303 sec elapsed: 0.309 sec
TIMER:  GBP: costing SEL$1565E019 cpu: 0.293 sec elapsed: 0.298 sec
TIMER: GBP: iteration (#2) SEL$1565E019 cpu: 0.294 sec elapsed: 0.299 sec
TIMER:  GBP: costing SEL$23EAFE84 cpu: 0.528 sec elapsed: 0.533 sec
TIMER: GBP: iteration (#3) SEL$23EAFE84 cpu: 0.528 sec elapsed: 0.533 sec
TIMER:  GBP: costing SEL$B5D97CA0 cpu: 0.533 sec elapsed: 0.540 sec
TIMER: GBP: iteration (#4) SEL$B5D97CA0 cpu: 0.534 sec elapsed: 0.540 sec
TIMER:  GBP: costing SEL$6C9B46B6 cpu: 0.531 sec elapsed: 0.531 sec
TIMER: GBP: iteration (#5) SEL$6C9B46B6 cpu: 0.531 sec elapsed: 0.532 sec
TIMER:  GBP: costing SEL$ED1298E3 cpu: 0.522 sec elapsed: 0.523 sec
TIMER: GBP: iteration (#8) SEL$ED1298E3 cpu: 0.523 sec elapsed: 0.524 sec
TIMER:  GBP: costing SEL$5968095A cpu: 0.523 sec elapsed: 0.523 sec
TIMER: GBP: iteration (#9) SEL$5968095A cpu: 0.524 sec elapsed: 0.523 sec
TIMER:  GBP: costing SEL$4CA81688 cpu: 0.525 sec elapsed: 0.527 sec
TIMER: GBP: iteration (#12) SEL$4CA81688 cpu: 0.526 sec elapsed: 0.528 sec
TIMER: Group-By Placement SEL$1 cpu: 3.766 sec elapsed: 3.793 sec
TIMER: Cost-Based Transformations (Overall) SEL$1 cpu: 3.769 sec elapsed: 3.795 sec
TIMER: Access Path Analysis (Final) SEL$1 cpu: 0.288 sec elapsed: 0.289 sec
TIMER: SQL Optimization (Overall) SEL$1 cpu: 4.072 sec elapsed: 4.108 sec

If you check further up the page, though, you’ll see in the Outline Data that Oracle has not used group by placement (it has done a “group by pushdown”, but that’s different – it relates to aggregation in parallel execution). So one quick hack we could try is to add the hint /*+ no_place_group_by(@sel$1) */ to the query just to see what happens – and here’s the effect on the parse time:

Name                                      Value
----                                      -----
parse time cpu                               33
parse time elapsed                           34

Problem solved – provided we can get the hint into the code (by hand, or via a SQL Patch – there’s a sketch of that approach at the end of this post). But the question still remains: where did the time go? The trace file was fairly long (375,000 lines for the original, compared to 32,000 for the hinted version) but a rapid scan seemed in order – and something very quickly caught my attention. It was pretty easy to spot because something big and nasty had happened 8 times.

The answer was in “Adaptive Plans”, which (mostly) get flagged with the label “AP:” in the 10053 trace file, for example:

AP: Computing costs for inflection point at min value 0.00
AP: Using binary search for inflection point search
AP: Costing Join for ADM inflection point at card 0.00
AP: Costing Join for ADM inflection point at card 0.00

You can see here that the optimizer is searching for an “inflection point” – that means it’s “thinking about” an adaptive join, searching for the number of rows where a switch between a nested loop join and a hash join makes sense.

Notice particularly the comment about “using binary search”. After calculating the cost of the “best” join using the current estimates of cardinality for the incoming rowsource the optimizer starts calculating the possible costs (nested loop or hash, it ignores merge) for a set of alternative cardinalities until it finds the cardinality where the nested loop join and hash join have the same cost. That’s the number of rows that Oracle will use at run time to decide whether it should switch from its initial selection of join method to the alternative.

If the initial join selection was a hash join (i.e. the large volume strategy) Oracle will use an alternative cardinality of 1 to start its list of estimates – on the assumption that that would produce a nested loop plan – then keep doubling the estimate until the resulting plan switches to a hash join, closing in on the break point by halving and doubling the remaining gap between the NL estimate and the HJ estimate.

If the initial join selection was a nested loop join Oracle will use a “worst case scenario” for the incoming estimate (acting, broadly speaking, as if every filter at that point had had a selectivity of 1 – i.e. 100% of the available data), and start by halving the estimate. This is fine when the starting estimate is (for example) for the first table in the join order and the estimate is just a few million rows.

Here’s the first estimate in my case (you’ll need a wide screen, or very small print):

6772:AP: Computing costs for inflection point at max value 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00 

After you’ve halved that number about 1,000 times you’re down to values in the region of a few thousand – and when you have to cost a couple of joins every time you halve, and you’ve gone through the whole process 8 times, that’s a lot of CPU.
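
As a sanity check on the “about 1,000 times”, here’s a little PL/SQL sketch that counts the halvings. Note that I’ve rounded the starting value to 1.79e308 and used binary_double, since the figure in the trace is (roughly) the IEEE double-precision maximum and far too large for an Oracle number:

set serveroutput on

declare
        v_card  binary_double := 1.79e308d;     -- roughly the "max value" from the trace
        v_steps pls_integer   := 0;
begin
        while v_card > 431 loop                 -- 431 was the inflection point in my trace
                v_card  := v_card / 2;
                v_steps := v_steps + 1;
        end loop;
        dbms_output.put_line('Halvings: ' || v_steps);  -- expect a little over 1,000
end;
/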

In fact, with the no_place_group_by hint there was still one point where the optimizer did this adaptive join work – which probably accounts for most of the final 0.33 CPU seconds – but it didn’t show up in any separately reported timer events.

Of course the final little test of this parse time hypothesis is to add the hint /*+ no_adaptive_plan */ – so I did that, and the parse time was still about 0.3 seconds! Investigation of the 10053 trace showed that even with the hint in place the optimizer still went through that one huge binary chop – but when it had worked out the inflection point it printed the message:

AP: Adaptive plans bypassed for query block SEL$1 due to disabled by hint (no_adaptive_plan)
AP: Adaptive joins bypassed for query block SEL$1 due to adaptive plans disabled

According to the hint report the hint was valid, however, so that behaviour looks a little buggy. It then occurred to me that maybe I could have hinted /*+ no_adaptive_plan(@sel$1) */ – and that worked properly with the trace reporting:

AP: Checking validity for query block SEL$1, sqlid=7fjtvwktcmsgq
AP: Adaptive plans bypassed for query block SEL$1 due to disabled by hint (no_adaptive_plan)
AP: Adaptive joins bypassed for query block SEL$1 due to adaptive plans disabled

Had I not realised that this was possible I would have fallen back on the hint /*+ opt_param('optimizer_adaptive_plans' 'false') */ to set the parameter to false for the query, in which case the following message (and no other “AP:” message) appeared near the start of the trace:

AP: Adaptive joins bypassed for query block SEL$1 due to disabled by adaptive join parameter
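
A final note on deployment: if you can’t edit the SQL to add whichever hint you settle on, a SQL Patch is one way of attaching it to the statement. Here’s a sketch using dbms_sqldiag.create_sql_patch (which, if I remember the version timing correctly, became a public, documented call in 12.2); the sql_id is the one reported in my trace file and the patch name is arbitrary:

declare
        v_patch varchar2(128);
begin
        v_patch := dbms_sqldiag.create_sql_patch(
                sql_id          => '7fjtvwktcmsgq',
                hint_text       => 'no_place_group_by(@sel$1)',
                name            => 'gbp_workaround'
        );
end;
/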

If you want to experiment on your own system here’s the script to create the data – the script name reflects the fact that I found this example by accident while working on something completely different:

rem
rem     Script:         12c_vector_transform_c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2015
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem             12.1.0.2

drop table t1;
drop table t2;
drop table t3;
drop table t4;
purge recyclebin;

create table t1 
as
select
        rownum          id,
        to_char(mod(rownum,10)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t1 
        add constraint t1_pk primary key(id)
;

create table t2
as
select
        rownum          id,
        to_char(mod(rownum,12)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t2
        add constraint t2_pk primary key(id)
;

create table t3
as
select
        rownum          id,
        to_char(mod(rownum,14)) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t3
        add constraint t3_pk primary key(id)
;

create table t4
nologging
as
select
        t1.id                   id1,
        t2.id                   id2,
        t3.id                   id3,
        rpad(rownum,10)         small_vc,
        rpad('x',100)           padding
from
        t1, t2, t3
;

spool 12c_vector_transform_c

set timing on
set linesize 255
set trimspool on
set pagesize 90
set arraysize 100
set serveroutput off

execute snap_my_stats.start_snap

alter session set statistics_level = all;

alter session set "_fix_control"='16923858:5';
alter session set events '10053 trace name context forever';

select
        /*+ 
--              opt_param('optimizer_adaptive_plans' 'false')
--              no_adaptive_plan
--              no_adaptive_plan(@sel$1)
--              no_place_group_by(@sel$1)
                parallel(t4,3)
        */
        t1.small_vc,
        t2.small_vc,
        t3.small_vc,
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
and     t1.small_vc in (1,2,3)
and     t2.small_vc in (1,2,3,4)
and     t3.small_vc in (1,2,3,4,5)
group by
        t1.small_vc,
        t2.small_vc,
        t3.small_vc
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline allstats all note -bytes'));

alter session set events '10053 trace name context off';

set serveroutput on
execute snap_my_stats.end_snap

spool off

March 7, 2022

v$sql_hint

Filed under: Hints,Oracle — Jonathan Lewis @ 9:50 am GMT Mar 7,2022

Here’s a quick and dirty script I thought I’d published a very long time ago. It has to be run as SYS and queries v$sql_hint to produce a list of all the things which (at least in theory) could be used as hints somewhere in the code.

I’ve ordered the list by the columns version then version_outline. I believe the version is supposed to be the version in which the hint first appeared (even if it wasn’t documented) and the version_outline is the version at which the hint could appear as part of a query’s “Outline Information” (which can then be used to generate a Stored Outline or SQL Plan Baseline – or SQL Patch). You’ll notice that a lot of the hints don’t have a version_outline.

One of the conveniences of this report is the appearance of the inverse column. For many hints there is an “opposite” hint. Hints often come in pairs (especially in recent versions of Oracle) following the pattern XXXX / NO_XXXX, though there are many hints that don’t have an inverse and a few that don’t follow the pattern (e.g. use_concat / no_expand).

Perhaps one of the most useful parts of the report comes from the target_level, which is a bitmap that can be decoded to show the query level(s) the hint can apply to. There are 4 levels (corresponding to bits 0 to 3): statement, query block, object, and join – a useful piece of information when you’re trying to make best use of hints. For example, the cardinality hint is not well known, and most of the people who knew anything about it seemed to assume it could apply only to a table, when in fact it can also be used for a query block (which can be extremely useful with subquery factoring) or a multi-table join (although that option is quite hard to use effectively except in the simplest cases).

rem
rem     Script:         sql_hints.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2007
rem

set linesize 255
set pagesize 100
set trimspool on
set tab off

column  name            format a32
column  inverse         format a32
column  sql_feature     format a30
column  version         format a8
column  outline         format a8
column  class           format a35
column  target          format 999
column  prop            format 999

column  bit0            format a6
column  bit1            format a6
column  bit2            format a6
column  bit3            format a6

column  bit4            format a3
column  bit5            format a3
column  bit6            format a3
column  bit7            format a3
column  bit8            format a3

break on version skip 1
compute count of version on version

spool sql_hints

select
        version,
        version_outline         outline,
        name,
        inverse,
        sql_feature,
        class,
        decode(bitand(target_level,1),1,'State')        bit0,
        decode(bitand(target_level,2),2,'QBlock')       bit1,
        decode(bitand(target_level,4),4,'Object')       bit2,
        decode(bitand(target_level,8),8,'Join')         bit3,
        target_level                                    target,
        decode(bitand(property,16),16,'16')     bit4,
        decode(bitand(property,32),32,'32')     bit5,
        decode(bitand(property,64),64,'64')     bit6,
        decode(bitand(property,128),128,'128')  bit7,
        decode(bitand(property,256),256,'256')  bit8,
        property                                prop
from
        v$sql_hint
order by
        to_number(substr(version,1,3)),
        version,
        to_number(substr(version_outline,1,3)) nulls last,
        version_outline nulls last,
        name
;

spool off

Footnote

There’s also a property column, which seems to be another bitmap but (at present) only using bits 4 to 8 of “something”. That points to the possibility that the level and property might be derived from the same couple of bytes but a quick check on x$qksht (the x$ structure underneath gv$sql_hint) shows two different columns (level_qkshtsyrow, props_qkshtsyrow) – with a 4 byte difference in their offsets. (This may be a fake, of course, since the address (addr) of every single “row” in the structure is the same, which means that when you’re looking at the x$ it’s been dynamically constructed from somewhere else.)

I haven’t yet worked out a consistent pattern for the bits in the property column, even though there are a number of “coincidences” that look almost convincing, but I keep reporting the property – broken down by bits as well as the actual value – in this report in case one day I spot some clear correlation between the bits and the function of the hints.

If you’re interested in the results and don’t have access to v$sql_hint, here’s the output from 21.3.0.0. Of the 388 hints listed only about 120 are (officially) documented anywhere – there are a lot of hints you’re not supposed to use:

VERSION  OUTLINE  NAME                             INVERSE                          SQL_FEATURE                    CLASS                               BIT0   BIT1   BIT2   BIT3   TARGET BIT BIT BIT BIT BIT PROP
-------- -------- -------------------------------- -------------------------------- ------------------------------ ----------------------------------- ------ ------ ------ ------ ------ --- --- --- --- --- ----
8.0.0    8.1.7    CLUSTER                                                           QKSFM_CBO                      ACCESS                                            Object             4 16              256  272
         8.1.7    INDEX                            NO_INDEX                         QKSFM_INDEX                    ACCESS                                            Object             4 16  32          256  304
         8.1.7    ROWID                                                             QKSFM_CBO                      ACCESS                                            Object             4 16              256  272
         10.1.0   NO_MERGE                         MERGE                            QKSFM_CVM                      MERGE                                      QBlock Object             6 16                    16
                  EXPR_CORR_CHECK                                                   QKSFM_CBO                      EXPR_CORR_CHECK                     State                            1                        0
                  MERGE_CONST_ON                                                    QKSFM_CBO                      MERGE_CONST_ON                      State                            1                        0
                  NL_AJ                                                             QKSFM_JOIN_METHOD              ANTIJOIN                                   QBlock                    2 16                    16
                  NL_SJ                                                             QKSFM_JOIN_METHOD              SEMIJOIN                                   QBlock                    2 16                    16
                  NO_MONITORING                                                     QKSFM_ALL                      NO_MONITORING                       State                            1                        0
                  NO_ORDER_ROLLUPS                                                  QKSFM_TRANSFORMATION           NO_ORDER_ROLLUPS                           QBlock                    2                        0
                  NO_STATS_GSETS                                                    QKSFM_ALL                      NO_STATS_GSETS                             QBlock                    2                        0
                  ORDERED_PREDICATES                                                QKSFM_CBO                      ORDERED_PREDICATES                         QBlock                    2 16                    16
                  QUEUE_CURR                                                        QKSFM_CBO                      ACCESS                                            Object             4                 256  256
                  QUEUE_ROWP                                                        QKSFM_CBO                      ACCESS                                            Object             4                 256  256
--------
      14

8.1.0    8.1.5    BITMAP                                                            QKSFM_CBO                      BITMAP                                     QBlock                    2                 256  256
         8.1.5    NO_PUSH_PRED                     PUSH_PRED                        QKSFM_FILTER_PUSH_PRED         PUSH_PRED                                  QBlock Object             6 16                    16
         8.1.5    PUSH_PRED                        NO_PUSH_PRED                     QKSFM_FILTER_PUSH_PRED         PUSH_PRED                                  QBlock Object             6 16                    16
         8.1.5    RULE                                                              QKSFM_RBO                      MODE                                State                            1 16                    16
         8.1.7    AND_EQUAL                                                         QKSFM_AND_EQUAL                ACCESS                                            Object             4 16  32          256  304
         8.1.7    DRIVING_SITE                                                      QKSFM_ALL                      DRIVING_SITE                                      Object             4                 256  256
         8.1.7    FACT                             NO_FACT                          QKSFM_STAR_TRANS               FACT                                              Object             4 16              256  272
         8.1.7    FULL                                                              QKSFM_FULL                     ACCESS                                            Object             4 16              256  272
         8.1.7    HASH                                                              QKSFM_ALL                      ACCESS                                            Object             4 16              256  272
         8.1.7    HASH_AJ                                                           QKSFM_JOIN_METHOD              ANTIJOIN                                   QBlock                    2 16                    16
         8.1.7    HASH_SJ                                                           QKSFM_JOIN_METHOD              SEMIJOIN                                   QBlock                    2 16                    16
         8.1.7    INDEX_COMBINE                                                     QKSFM_INDEX_COMBINE            ACCESS                                            Object             4 16  32      128 256  432
         8.1.7    INDEX_DESC                       NO_INDEX                         QKSFM_INDEX_DESC               ACCESS                                            Object             4 16  32          256  304
         8.1.7    INDEX_FFS                                                         QKSFM_INDEX_FFS                ACCESS                                            Object             4 16  32          256  304
         8.1.7    MERGE_AJ                                                          QKSFM_JOIN_METHOD              ANTIJOIN                                   QBlock                    2 16                    16
         8.1.7    MERGE_SJ                                                          QKSFM_JOIN_METHOD              SEMIJOIN                                   QBlock                    2 16                    16
         8.1.7    NO_EXPAND                        USE_CONCAT                       QKSFM_USE_CONCAT               OR_EXPAND                                  QBlock                    2 16                    16
         8.1.7    NO_FACT                          FACT                             QKSFM_STAR_TRANS               FACT                                              Object             4 16              256  272
         8.1.7    ORDERED                                                           QKSFM_CBO                      ORDERED                                    QBlock                    2 16                    16
         8.1.7    SEMIJOIN_DRIVER                                                   QKSFM_CBO                      SEMIJOIN_DRIVER                            QBlock                    2 16                    16
         8.1.7    STAR_TRANSFORMATION              NO_STAR_TRANSFORMATION           QKSFM_STAR_TRANS               STAR_TRANSFORMATION                        QBlock Object             6 16                    16
         8.1.7    SWAP_JOIN_INPUTS                 NO_SWAP_JOIN_INPUTS              QKSFM_CBO                      SWAP_JOIN_INPUTS                                  Object             4 16              256  272
         8.1.7    USE_CONCAT                       NO_EXPAND                        QKSFM_USE_CONCAT               OR_EXPAND                                  QBlock                    2 16                    16
         8.1.7    USE_HASH                         NO_USE_HASH                      QKSFM_USE_HASH                 JOIN                                              Object             4 16      64  128 256  464
         8.1.7    USE_MERGE                        NO_USE_MERGE                     QKSFM_USE_MERGE                JOIN                                              Object             4 16      64      256  336
         8.1.7    USE_NL                           NO_USE_NL                        QKSFM_USE_NL                   JOIN                                              Object             4 16      64      256  336
         10.1.0   MERGE                            NO_MERGE                         QKSFM_CVM                      MERGE                                      QBlock Object             6 16                    16
         10.2.0.1 ALL_ROWS                                                          QKSFM_ALL_ROWS                 MODE                                State                            1 16                    16
         10.2.0.1 FIRST_ROWS                                                        QKSFM_FIRST_ROWS               MODE                                State                            1 16                    16
         10.2.0.5 PUSH_SUBQ                        NO_PUSH_SUBQ                     QKSFM_TRANSFORMATION           PUSH_SUBQ                                  QBlock                    2 16                    16
                  APPEND                           NOAPPEND                         QKSFM_CBO                      APPEND                              State                            1                        0
                  CACHE                            NOCACHE                          QKSFM_EXECUTION                CACHE                                             Object             4                 256  256
                  CHOOSE                                                            QKSFM_CHOOSE                   MODE                                State                            1 16                    16
                  DEREF_NO_REWRITE                                                  QKSFM_ALL                      DEREF_NO_REWRITE                    State                            1                        0
                  INDEX_ASC                        NO_INDEX                         QKSFM_INDEX_ASC                ACCESS                                            Object             4 16  32          256  304
                  NESTED_TABLE_GET_REFS                                             QKSFM_ALL                      NESTED_TABLE_GET_REFS               State                            1                        0
                  NOAPPEND                         APPEND                           QKSFM_CBO                      APPEND                              State                            1                        0
                  NOCACHE                          CACHE                            QKSFM_EXECUTION                CACHE                                             Object             4                 256  256
                  NOPARALLEL                       SHARED                           QKSFM_PARALLEL                 SHARED                              State         Object             5                 256  256
                  NO_PARALLEL_INDEX                PARALLEL_INDEX                   QKSFM_PQ                       PARALLEL_INDEX                                    Object             4     32          256  288
                  PARALLEL_INDEX                   NO_PARALLEL_INDEX                QKSFM_PQ                       PARALLEL_INDEX                                    Object             4     32          256  288
                  PIV_GB                                                            QKSFM_ALL                      PIV_GB                                     QBlock                    2                        0
                  PIV_SSF                                                           QKSFM_ALL                      PIV_SSF                                    QBlock                    2                        0
                  REMOTE_MAPPED                                                     QKSFM_ALL                      REMOTE_MAPPED                              QBlock                    2 16              256  272
                  SHARED                           NO_PARALLEL                      QKSFM_PARALLEL                 SHARED                              State         Object             5                 256  256
                  STAR                                                              QKSFM_STAR_TRANS               STAR                                       QBlock                    2 16                    16
                  TIV_GB                                                            QKSFM_ALL                      PIV_GB                                     QBlock                    2                        0
                  TIV_SSF                                                           QKSFM_ALL                      PIV_SSF                                    QBlock                    2                        0
                  USE_ANTI                                                          QKSFM_CBO                      USE_ANTI                                          Object             4 16              256  272
                  USE_SEMI                                                          QKSFM_CBO                      USE_SEMI                                          Object             4 16              256  272
--------
      50

8.1.5    8.1.7    NO_ACCESS                                                         QKSFM_ALL                      NO_ACCESS                                         Object             4                 256  256
         8.1.7    NO_INDEX                         INDEX                            QKSFM_INDEX                    NO_INDEX                                          Object             4 16  32          256  304
         8.1.7    NO_REWRITE                       REWRITE                          QKSFM_TRANSFORMATION           REWRITE                                    QBlock                    2 16                    16
         8.1.7    PQ_DISTRIBUTE                                                     QKSFM_PQ_DISTRIBUTE            PQ_DISTRIBUTE                                     Object             4 16              256  272
         8.1.7    REWRITE                          NO_REWRITE                       QKSFM_TRANSFORMATION           REWRITE                                    QBlock                    2 16                    16
         10.1.0.3 INDEX_JOIN                                                        QKSFM_INDEX_JOIN               ACCESS                                            Object             4 16  32          256  304
         10.2.0.1 DOMAIN_INDEX_NO_SORT             DOMAIN_INDEX_SORT                QKSFM_CBO                      DOMAIN_INDEX_SORT                          QBlock                    2                        0
         10.2.0.1 DOMAIN_INDEX_SORT                DOMAIN_INDEX_NO_SORT             QKSFM_CBO                      DOMAIN_INDEX_SORT                          QBlock                    2                        0
                  BUFFER                           NO_BUFFER                        QKSFM_CBO                      BUFFER                                     QBlock                    2                        0
                  BYPASS_UJVC                                                       QKSFM_CBO                      BYPASS_UJVC                                QBlock                    2                        0
                  CACHE_CB                         NOCACHE                          QKSFM_CBO                      CACHE_CB                                          Object             4                 256  256
                  CUBE_GB                                                           QKSFM_CBO                      CUBE_GB                                    QBlock                    2                        0
                  NESTED_TABLE_SET_SETID                                            QKSFM_ALL                      NESTED_TABLE_SET_SETID              State                            1                        0
                  NO_BUFFER                        BUFFER                           QKSFM_CBO                      BUFFER                                     QBlock                    2                        0
                  RESTORE_AS_INTERVALS                                              QKSFM_CBO                      RESTORE_AS_INTERVALS                       QBlock                    2                        0
                  SAVE_AS_INTERVALS                                                 QKSFM_CBO                      SAVE_AS_INTERVALS                          QBlock                    2                        0
                  SCN_ASCENDING                                                     QKSFM_ALL                      SCN_ASCENDING                       State                            1                        0
--------
      17

8.1.6    10.1.0   NO_UNNEST                        UNNEST                           QKSFM_UNNEST                   UNNEST                                     QBlock                    2 16                    16
         10.1.0   UNNEST                           NO_UNNEST                        QKSFM_UNNEST                   UNNEST                                     QBlock                    2 16                    16
         10.1.0.3 LEADING                                                           QKSFM_JOIN_ORDER               LEADING                                                  Join        8 16              256  272
                  SYS_PARALLEL_TXN                                                  QKSFM_CBO                      SYS_PARALLEL_TXN                           QBlock                    2                        0
--------
       4

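As a sanity check on the decoding: the target value is just the sum of the bits that are set, so 6 = 2 + 4 is a hint that applies at both the query block and object level (e.g. STAR_TRANSFORMATION above), and 14 = 2 + 4 + 8 adds the join level (e.g. CARDINALITY and OPT_ESTIMATE further down). You can see the arithmetic in isolation with a one-liner against dual:

select
        decode(bitand(14, 1), 1, 'State')       bit0,
        decode(bitand(14, 2), 2, 'QBlock')      bit1,
        decode(bitand(14, 4), 4, 'Object')      bit2,
        decode(bitand(14, 8), 8, 'Join')        bit3
from
        dual
;

which returns a null, then 'QBlock', 'Object' and 'Join' – matching what you see on the rows reporting a target of 14.
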

VERSION  OUTLINE  NAME                             INVERSE                          SQL_FEATURE                    CLASS                               BIT0   BIT1   BIT2   BIT3   TARGET BIT BIT BIT BIT BIT PROP
-------- -------- -------------------------------- -------------------------------- ------------------------------ ----------------------------------- ------ ------ ------ ------ ------ --- --- --- --- --- ----
9.0.0    10.2.0.1 INDEX_SS                         NO_INDEX_SS                      QKSFM_INDEX_SS                 ACCESS                                            Object             4 16  32          256  304
         10.2.0.1 INDEX_SS_DESC                    NO_INDEX_SS                      QKSFM_INDEX_SS_DESC            ACCESS                                            Object             4 16  32          256  304
         10.2.0.1 PQ_MAP                           PQ_NOMAP                         QKSFM_PQ_MAP                   PQ_MAP                                            Object             4 16              256  272
         10.2.0.1 PQ_NOMAP                         PQ_MAP                           QKSFM_PQ_MAP                   PQ_MAP                                            Object             4 16              256  272
         18.1.0   INLINE                           MATERIALIZE                      QKSFM_TRANSFORMATION           INLINE                                     QBlock                    2 16                    16
         18.1.0   MATERIALIZE                      INLINE                           QKSFM_TRANSFORMATION           INLINE                                     QBlock                    2 16                    16
                  ANTIJOIN                                                          QKSFM_TRANSFORMATION           ANTIJOIN                                   QBlock                    2 16                    16
                  BYPASS_RECURSIVE_CHECK                                            QKSFM_ALL                      BYPASS_RECURSIVE_CHECK                     QBlock                    2                        0
                  CARDINALITY                                                       QKSFM_STATS                    CARDINALITY                                QBlock Object Join       14 16              256  272
                  CPU_COSTING                      NO_CPU_COSTING                   QKSFM_CPU_COSTING              CPU_COSTING                                QBlock                    2 16                    16
                  CURSOR_SHARING_EXACT                                              QKSFM_CBO                      CURSOR_SHARING_EXACT                       QBlock                    2                        0
                  DML_UPDATE                                                        QKSFM_CBO                      DML_UPDATE                          State                            1                        0
                  GBY_CONC_ROLLUP                                                   QKSFM_TRANSFORMATION           GBY_CONC_ROLLUP                            QBlock                    2                        0
                  HWM_BROKERED                                                      QKSFM_CBO                      HWM_BROKERED                               QBlock                    2                        0
                  INDEX_RRS                                                         QKSFM_CBO                      ACCESS                                            Object             4 16  32          256  304
                  INDEX_SS_ASC                     NO_INDEX_SS                      QKSFM_INDEX_SS_ASC             ACCESS                                            Object             4 16  32          256  304
                  LOCAL_INDEXES                                                     QKSFM_CBO                      LOCAL_INDEXES                              QBlock                    2                        0
                  MV_MERGE                                                          QKSFM_TRANSFORMATION           MV_MERGE                                   QBlock                    2                        0
                  NO_CPU_COSTING                   CPU_COSTING                      QKSFM_CPU_COSTING              CPU_COSTING                                QBlock                    2 16                    16
                  NO_PRUNE_GSETS                                                    QKSFM_TRANSFORMATION           NO_PRUNE_GSETS                             QBlock                    2                        0
                  NO_SEMIJOIN                      SEMIJOIN                         QKSFM_TRANSFORMATION           SEMIJOIN                                   QBlock                    2 16                    16
                  OVERFLOW_NOMOVE                                                   QKSFM_CBO                      OVERFLOW_NOMOVE                            QBlock                    2                        0
                  SEMIJOIN                         NO_SEMIJOIN                      QKSFM_TRANSFORMATION           SEMIJOIN                                   QBlock                    2 16                    16
                  SKIP_EXT_OPTIMIZER                                                QKSFM_CBO                      SKIP_EXT_OPTIMIZER                         QBlock                    2 16                    16
                  SQLLDR                                                            QKSFM_CBO                      SQLLDR                              State                            1                        0
                  USE_TTT_FOR_GSETS                                                 QKSFM_TRANSFORMATION           USE_TTT_FOR_GSETS                          QBlock                    2                        0
--------
      26

9.2.0    10.1.0   EXPAND_GSET_TO_UNION             NO_EXPAND_GSET_TO_UNION          QKSFM_TRANSFORMATION           EXPAND_GSET_TO_UNION                       QBlock                    2                        0
         10.1.0   NO_EXPAND_GSET_TO_UNION          EXPAND_GSET_TO_UNION             QKSFM_TRANSFORMATION           EXPAND_GSET_TO_UNION                       QBlock                    2                        0
         10.2.0.5 NO_PUSH_SUBQ                     PUSH_SUBQ                        QKSFM_TRANSFORMATION           PUSH_SUBQ                                  QBlock                    2 16                    16
         11.1.0.6 FORCE_XML_QUERY_REWRITE          NO_XML_QUERY_REWRITE             QKSFM_XML_REWRITE              FORCE_XML_QUERY_REWRITE             State                            1                        0
         11.1.0.6 NO_XML_QUERY_REWRITE             FORCE_XML_QUERY_REWRITE          QKSFM_XML_REWRITE              FORCE_XML_QUERY_REWRITE             State                            1                        0
                  DYNAMIC_SAMPLING                                                  QKSFM_DYNAMIC_SAMPLING         DYNAMIC_SAMPLING                           QBlock Object             6 16              256  272
                  DYNAMIC_SAMPLING_EST_CDN                                          QKSFM_DYNAMIC_SAMPLING_EST_CDN DYNAMIC_SAMPLING_EST_CDN                          Object             4 16              256  272
                  IGNORE_WHERE_CLAUSE                                               QKSFM_ALL                      IGNORE_WHERE_CLAUSE                 State                            1                        0
                  NO_QKN_BUFF                                                       QKSFM_CBO                      NO_QKN_BUFF                                QBlock                    2                        0
                  NO_REF_CASCADE                   REF_CASCADE_CURSOR               QKSFM_CBO                      REF_CASCADE_CURSOR                  State                            1                        0
                  REF_CASCADE_CURSOR               NO_REF_CASCADE                   QKSFM_CBO                      REF_CASCADE_CURSOR                  State                            1                        0
                  SYS_DL_CURSOR                                                     QKSFM_CBO                      SYS_DL_CURSOR                       State                            1                        0
                  SYS_RID_ORDER                                                     QKSFM_ALL                      SYS_RID_ORDER                              QBlock                    2                        0
--------
      13

10.1.0.3 10.1.0.3 NO_BASETABLE_MULTIMV_REWRITE     REWRITE                          QKSFM_ALL                      REWRITE                                    QBlock                    2 16                    16
         10.1.0.3 NO_INDEX_FFS                     INDEX_FFS                        QKSFM_INDEX_FFS                NO_INDEX_FFS                                      Object             4 16  32          256  304
         10.1.0.3 NO_INDEX_SS                      INDEX_SS                         QKSFM_INDEX_SS                 NO_INDEX_SS                                       Object             4 16  32          256  304
         10.1.0.3 NO_MULTIMV_REWRITE               REWRITE                          QKSFM_ALL                      REWRITE                                    QBlock                    2 16                    16
         10.1.0.3 NO_SET_TO_JOIN                   SET_TO_JOIN                      QKSFM_SET_TO_JOIN              SET_TO_JOIN                                QBlock                    2 16                    16
         10.1.0.3 NO_STAR_TRANSFORMATION           STAR_TRANSFORMATION              QKSFM_STAR_TRANS               STAR_TRANSFORMATION                        QBlock Object             6 16                    16
         10.1.0.3 NO_SWAP_JOIN_INPUTS              SWAP_JOIN_INPUTS                 QKSFM_CBO                      SWAP_JOIN_INPUTS                                  Object             4 16              256  272
         10.1.0.3 NO_USE_HASH                      USE_HASH                         QKSFM_USE_HASH                 NO_USE_HASH                                       Object             4 16      64      256  336
         10.1.0.3 NO_USE_MERGE                     USE_MERGE                        QKSFM_USE_MERGE                NO_USE_MERGE                                      Object             4 16      64      256  336
         10.1.0.3 NO_USE_NL                        USE_NL                           QKSFM_USE_NL                   NO_USE_NL                                         Object             4 16      64      256  336
         10.1.0.3 SET_TO_JOIN                      NO_SET_TO_JOIN                   QKSFM_SET_TO_JOIN              SET_TO_JOIN                                QBlock                    2 16                    16
         10.2.0.1 IGNORE_OPTIM_EMBEDDED_HINTS                                       QKSFM_ALL                      IGNORE_OPTIM_EMBEDDED_HINTS         State                            1                        0
         10.2.0.1 OPTIMIZER_FEATURES_ENABLE                                         QKSFM_ALL                      OPTIMIZER_FEATURES_ENABLE           State                            1 16              256  272
                  COLUMN_STATS                                                      QKSFM_STATS                    TABLE_STATS                         State                            1 16              256  272
                  FBTSCAN                                                           QKSFM_CBO                      FBTSCAN                             State                            1                        0
                  GATHER_PLAN_STATISTICS                                            QKSFM_GATHER_PLAN_STATISTICS   GATHER_PLAN_STATISTICS              State                            1                        0
                  INCLUDE_VERSION                                                   QKSFM_ALL                      INCLUDE_VERSION                     State                            1                        0
                  INDEX_STATS                                                       QKSFM_STATS                    TABLE_STATS                         State                            1 16              256  272
                  MODEL_DONTVERIFY_UNIQUENESS                                       QKSFM_TRANSFORMATION           MODEL_DONTVERIFY_UNIQUENESS                QBlock                    2                        0
                  MODEL_MIN_ANALYSIS                                                QKSFM_TRANSFORMATION           MODEL_MIN_ANALYSIS                         QBlock                    2                        0
                  MODEL_NO_ANALYSIS                                                 QKSFM_ALL                      MODEL_MIN_ANALYSIS                         QBlock                    2                        0
                  MODEL_PUSH_REF                   NO_MODEL_PUSH_REF                QKSFM_TRANSFORMATION           MODEL_PUSH_REF                             QBlock                    2                        0
                  NESTED_TABLE_FAST_INSERT                                          QKSFM_ALL                      NESTED_TABLE_FAST_INSERT            State                            1                        0
                  NO_MODEL_PUSH_REF                MODEL_PUSH_REF                   QKSFM_ALL                      MODEL_PUSH_REF                             QBlock                    2                        0
                  NO_PARALLEL                      SHARED                           QKSFM_CBO                      SHARED                              State         Object             5                 256  256
                  NO_PARTIAL_COMMIT                                                 QKSFM_CBO                      NO_PARTIAL_COMMIT                   State                            1                        0
                  NO_QUERY_TRANSFORMATION                                           QKSFM_TRANSFORMATION           NO_QUERY_TRANSFORMATION             State                            1 16                    16
                  OPAQUE_TRANSFORM                                                  QKSFM_TRANSFORMATION           OPAQUE_TRANSFORM                    State                            1                        0
                  OPAQUE_XCANONICAL                                                 QKSFM_TRANSFORMATION           OPAQUE_XCANONICAL                   State                            1                        0
                  OPT_ESTIMATE                                                      QKSFM_OPT_ESTIMATE             OPT_ESTIMATE                               QBlock Object Join       14 16              256  272
                  QB_NAME                                                           QKSFM_ALL                      QB_NAME                                    QBlock                    2                 256  256
                  RESTRICT_ALL_REF_CONS                                             QKSFM_ALL                      RESTRICT_ALL_REF_CONS               State                            1                        0
                  REWRITE_OR_ERROR                                                  QKSFM_TRANSFORMATION           REWRITE                                    QBlock                    2                        0
                  SKIP_UNQ_UNUSABLE_IDX                                             QKSFM_CBO                      SKIP_UNQ_UNUSABLE_IDX               State                            1                        0
                  STREAMS                                                           QKSFM_CBO                      STREAMS                             State                            1                        0
                  TABLE_STATS                                                       QKSFM_STATS                    TABLE_STATS                         State                            1 16              256  272
                  TRACING                                                           QKSFM_EXECUTION                TRACING                             State                            1                        0
                  USE_NL_WITH_INDEX                NO_USE_NL                        QKSFM_USE_NL_WITH_INDEX        USE_NL_WITH_INDEX                                 Object             4 16  32          256  304
                  USE_WEAK_NAME_RESL                                                QKSFM_ALL                      USE_WEAK_NAME_RESL                  State                            1                        0
                  VECTOR_READ                                                       QKSFM_CBO                      VECTOR_READ                         State                            1                        0
                  VECTOR_READ_TRACE                                                 QKSFM_CBO                      VECTOR_READ_TRACE                   State                            1                        0
                  X_DYN_PRUNE                                                       QKSFM_CBO                      X_DYN_PRUNE                                QBlock                    2                        0
--------
      42

10.2.0.1 10.2.0.1 BITMAP_TREE                                                       QKSFM_BITMAP_TREE              ACCESS                                            Object             4 16  32          256  304
         10.2.0.1 ELIMINATE_JOIN                   NO_ELIMINATE_JOIN                QKSFM_TABLE_ELIM               ELIMINATE_JOIN                                    Object             4 16                    16
         10.2.0.1 ELIMINATE_OBY                    NO_ELIMINATE_OBY                 QKSFM_OBYE                     ELIMINATE_OBY                              QBlock                    2 16                    16
         10.2.0.1 NO_ELIMINATE_JOIN                ELIMINATE_JOIN                   QKSFM_TABLE_ELIM               ELIMINATE_JOIN                                    Object             4 16                    16
         10.2.0.1 NO_ELIMINATE_OBY                 ELIMINATE_OBY                    QKSFM_OBYE                     ELIMINATE_OBY                              QBlock                    2 16                    16
         10.2.0.1 NO_PULL_PRED                     PULL_PRED                        QKSFM_PULL_PRED                PULL_PRED                                         Object             4 16                    16
         10.2.0.1 OLD_PUSH_PRED                                                     QKSFM_OLD_PUSH_PRED            OLD_PUSH_PRED                              QBlock Object             6 16                    16
         10.2.0.1 OPT_PARAM                                                         QKSFM_ALL                      OPT_PARAM                           State                            1 16              256  272
         10.2.0.1 OUTLINE                                                           QKSFM_ALL                      OUTLINE                                    QBlock                    2                        0
         10.2.0.1 OUTLINE_LEAF                                                      QKSFM_ALL                      OUTLINE_LEAF                               QBlock                    2                        0
         10.2.0.1 PULL_PRED                        NO_PULL_PRED                     QKSFM_PULL_PRED                PULL_PRED                                         Object             4 16                    16
         10.2.0.1 RBO_OUTLINE                                                       QKSFM_RBO                      RBO_OUTLINE                         State                            1                        0
         10.2.0.5 NO_USE_HASH_AGGREGATION          USE_HASH_AGGREGATION             QKSFM_ALL                      USE_HASH_AGGREGATION                       QBlock                    2                        0
         10.2.0.5 USE_HASH_AGGREGATION             NO_USE_HASH_AGGREGATION          QKSFM_ALL                      USE_HASH_AGGREGATION                       QBlock                    2                        0
         11.1.0.6 NO_PX_JOIN_FILTER                PX_JOIN_FILTER                   QKSFM_PX_JOIN_FILTER           PX_JOIN_FILTER                                    Object             4 16      64      256  336
         11.1.0.6 NO_XML_DML_REWRITE                                                QKSFM_XML_REWRITE              NO_XML_DML_REWRITE                  State                            1                        0
         11.1.0.6 PX_JOIN_FILTER                   NO_PX_JOIN_FILTER                QKSFM_PX_JOIN_FILTER           PX_JOIN_FILTER                                    Object             4 16      64      256  336
                  DBMS_STATS                                                        QKSFM_DBMS_STATS               DBMS_STATS                          State                            1                        0
                  INLINE_XMLTYPE_NT                                                 QKSFM_ALL                      INLINE_XMLTYPE_NT                   State                            1                        0
                  MODEL_COMPILE_SUBQUERY                                            QKSFM_TRANSFORMATION           MODEL_COMPILE_SUBQUERY                     QBlock                    2                        0
                  MODEL_DYNAMIC_SUBQUERY                                            QKSFM_TRANSFORMATION           MODEL_DYNAMIC_SUBQUERY                     QBlock                    2                        0
                  NO_CARTESIAN                                                      QKSFM_ALL                      NO_CARTESIAN                                      Object             4 16      64      256  336
                  NO_SQL_TUNE                                                       QKSFM_ALL                      NO_SQL_TUNE                         State                            1                        0
                  PRECOMPUTE_SUBQUERY                                               QKSFM_TRANSFORMATION           PRECOMPUTE_SUBQUERY                        QBlock                    2                        0
                  PRESERVE_OID                                                      QKSFM_ALL                      PRESERVE_OID                        State                            1                        0
--------
      25

10.2.0.2 10.2.0.2 CONNECT_BY_COST_BASED            NO_CONNECT_BY_COST_BASED         QKSFM_TRANSFORMATION           CONNECT_BY_COST_BASED                      QBlock                    2 16                    16
         10.2.0.2 CONNECT_BY_FILTERING             NO_CONNECT_BY_FILTERING          QKSFM_ALL                      CONNECT_BY_FILTERING                       QBlock                    2 16                    16
         10.2.0.2 NO_CONNECT_BY_COST_BASED         CONNECT_BY_COST_BASED            QKSFM_TRANSFORMATION           CONNECT_BY_COST_BASED                      QBlock                    2 16                    16
         10.2.0.2 NO_CONNECT_BY_FILTERING          CONNECT_BY_FILTERING             QKSFM_ALL                      CONNECT_BY_FILTERING                       QBlock                    2 16                    16
--------
       4

10.2.0.3 10.2.0.3 NATIVE_FULL_OUTER_JOIN           NO_NATIVE_FULL_OUTER_JOIN        QKSFM_ALL                      NATIVE_FULL_OUTER_JOIN                     QBlock                    2 16                    16
         10.2.0.3 NO_NATIVE_FULL_OUTER_JOIN        NATIVE_FULL_OUTER_JOIN           QKSFM_ALL                      NATIVE_FULL_OUTER_JOIN                     QBlock                    2 16                    16
         10.2.0.3 NUM_INDEX_KEYS                                                    QKSFM_CBO                      ACCESS                                            Object             4 16  32          256  304
--------
       3

10.2.0.4 10.2.0.4 CONNECT_BY_COMBINE_SW            NO_CONNECT_BY_COMBINE_SW         QKSFM_ALL                      CONNECT_BY_COMBINE_SW                      QBlock                    2 16                    16
         10.2.0.4 NO_CONNECT_BY_COMBINE_SW         CONNECT_BY_COMBINE_SW            QKSFM_ALL                      CONNECT_BY_COMBINE_SW                      QBlock                    2 16                    16
--------
       2

10.2.0.5 10.2.0.5 CONNECT_BY_CB_WHR_ONLY           NO_CONNECT_BY_CB_WHR_ONLY        QKSFM_TRANSFORMATION           CONNECT_BY_CB_WHR_ONLY                     QBlock                    2 16                    16
         10.2.0.5 GBY_PUSHDOWN                     NO_GBY_PUSHDOWN                  QKSFM_ALL                      GBY_PUSHDOWN                               QBlock                    2 16                    16
         10.2.0.5 NO_CONNECT_BY_CB_WHR_ONLY        CONNECT_BY_CB_WHR_ONLY           QKSFM_TRANSFORMATION           CONNECT_BY_CB_WHR_ONLY                     QBlock                    2 16                    16
         10.2.0.5 NO_GBY_PUSHDOWN                  GBY_PUSHDOWN                     QKSFM_ALL                      GBY_PUSHDOWN                               QBlock                    2 16                    16
--------
       4

11.1.0.6 11.1.0.6 COST_XML_QUERY_REWRITE           NO_COST_XML_QUERY_REWRITE        QKSFM_COST_XML_QUERY_REWRITE   COST_XML_QUERY_REWRITE              State                            1                        0
         11.1.0.6 DB_VERSION                                                        QKSFM_ALL                      DB_VERSION                          State                            1 16              256  272
         11.1.0.6 DOMAIN_INDEX_FILTER              NO_DOMAIN_INDEX_FILTER           QKSFM_CBO                      DOMAIN_INDEX_FILTER                               Object             4 16  32          256  304
         11.1.0.6 INDEX_RS_ASC                                                      QKSFM_INDEX_RS_ASC             ACCESS                                            Object             4 16  32          256  304
         11.1.0.6 INDEX_RS_DESC                                                     QKSFM_INDEX_RS_DESC            ACCESS                                            Object             4 16  32          256  304
         11.1.0.6 NLJ_BATCHING                     NO_NLJ_BATCHING                  QKSFM_EXECUTION                ACCESS                                            Object             4 16              256  272
         11.1.0.6 NLJ_PREFETCH                     NO_NLJ_PREFETCH                  QKSFM_EXECUTION                NLJ_PREFETCH                                      Object             4 16              256  272
         11.1.0.6 NO_COST_XML_QUERY_REWRITE        COST_XML_QUERY_REWRITE           QKSFM_COST_XML_QUERY_REWRITE   NO_COST_XML_QUERY_REWRITE           State                            1                        0
         11.1.0.6 NO_DOMAIN_INDEX_FILTER           DOMAIN_INDEX_FILTER              QKSFM_CBO                      NO_DOMAIN_INDEX_FILTER                            Object             4 16  32          256  304
         11.1.0.6 NO_NLJ_BATCHING                  NLJ_BATCHING                     QKSFM_EXECUTION                ACCESS                                            Object             4 16              256  272
         11.1.0.6 NO_NLJ_PREFETCH                  NLJ_PREFETCH                     QKSFM_EXECUTION                NLJ_PREFETCH                                      Object             4 16              256  272
         11.1.0.6 NO_OUTER_JOIN_TO_INNER           OUTER_JOIN_TO_INNER              QKSFM_OUTER_JOIN_TO_INNER      OUTER_JOIN_TO_INNER                        QBlock Object             6 16                    16
         11.1.0.6 NO_PLACE_GROUP_BY                PLACE_GROUP_BY                   QKSFM_PLACE_GROUP_BY           PLACE_GROUP_BY                             QBlock                    2 16                    16
         11.1.0.6 NO_SUBQUERY_PRUNING              SUBQUERY_PRUNING                 QKSFM_CBO                      SUBQUERY_PRUNING                                  Object             4 16              256  272
         11.1.0.6 NO_USE_INVISIBLE_INDEXES         USE_INVISIBLE_INDEXES            QKSFM_INDEX                    USE_INVISIBLE_INDEXES               State                            1                        0
         11.1.0.6 NO_XMLINDEX_REWRITE              XMLINDEX_REWRITE                 QKSFM_XMLINDEX_REWRITE         XMLINDEX_REWRITE                    State                            1                        0
         11.1.0.6 NO_XMLINDEX_REWRITE_IN_SELECT    XMLINDEX_REWRITE_IN_SELECT       QKSFM_XMLINDEX_REWRITE         XMLINDEX_REWRITE                    State                            1                        0
         11.1.0.6 OUTER_JOIN_TO_INNER              NO_OUTER_JOIN_TO_INNER           QKSFM_OUTER_JOIN_TO_INNER      OUTER_JOIN_TO_INNER                        QBlock Object             6 16                    16
         11.1.0.6 PLACE_GROUP_BY                   NO_PLACE_GROUP_BY                QKSFM_PLACE_GROUP_BY           PLACE_GROUP_BY                             QBlock                    2 16                    16
         11.1.0.6 SUBQUERY_PRUNING                 NO_SUBQUERY_PRUNING              QKSFM_CBO                      SUBQUERY_PRUNING                                  Object             4 16              256  272
         11.1.0.6 USE_INVISIBLE_INDEXES            NO_USE_INVISIBLE_INDEXES         QKSFM_INDEX                    USE_INVISIBLE_INDEXES               State                            1                        0
         11.1.0.6 USE_MERGE_CARTESIAN                                               QKSFM_USE_MERGE_CARTESIAN      JOIN                                              Object             4 16      64      256  336
         11.1.0.6 XMLINDEX_REWRITE                 NO_XMLINDEX_REWRITE              QKSFM_XMLINDEX_REWRITE         XMLINDEX_REWRITE                    State                            1                        0
         11.1.0.6 XMLINDEX_REWRITE_IN_SELECT       NO_XMLINDEX_REWRITE_IN_SELECT    QKSFM_XMLINDEX_REWRITE         XMLINDEX_REWRITE                    State                            1                        0
         11.1.0.6 XML_DML_RWT_STMT                                                  QKSFM_XML_REWRITE              XML_DML_RWT_STMT                    State                            1                        0
                  CHECK_ACL_REWRITE                NO_CHECK_ACL_REWRITE             QKSFM_CHECK_ACL_REWRITE        CHECK_ACL_REWRITE                   State                            1                        0
                  MONITOR                          NO_MONITOR                       QKSFM_ALL                      MONITOR                             State                            1                        0
                  NO_CHECK_ACL_REWRITE             CHECK_ACL_REWRITE                QKSFM_CHECK_ACL_REWRITE        NO_CHECK_ACL_REWRITE                State                            1                        0
                  NO_LOAD                                                           QKSFM_EXECUTION                NO_LOAD                             State                            1                        0
                  NO_MONITOR                       MONITOR                          QKSFM_ALL                      MONITOR                             State                            1                        0
                  NO_RESULT_CACHE                  RESULT_CACHE                     QKSFM_EXECUTION                RESULT_CACHE                               QBlock                    2                        0
                  RESULT_CACHE                     NO_RESULT_CACHE                  QKSFM_EXECUTION                RESULT_CACHE                               QBlock                    2                        0
--------
      32

11.1.0.7          BIND_AWARE                       NO_BIND_AWARE                    QKSFM_CURSOR_SHARING           BIND_AWARE                          State                            1                        0
                  CHANGE_DUPKEY_ERROR_INDEX                                         QKSFM_DML                      CHANGE_DUPKEY_ERROR_INDEX                         Object             4     32          256  288
                  IGNORE_ROW_ON_DUPKEY_INDEX                                        QKSFM_DML                      IGNORE_ROW_ON_DUPKEY_INDEX                        Object             4     32          256  288
                  NO_BIND_AWARE                    BIND_AWARE                       QKSFM_CURSOR_SHARING           BIND_AWARE                          State                            1                        0
                  RETRY_ON_ROW_CHANGE                                               QKSFM_DML                      RETRY_ON_ROW_CHANGE                 State                            1                        0
--------
       5

11.2.0.1 11.2.0.1 COALESCE_SQ                      NO_COALESCE_SQ                   QKSFM_COALESCE_SQ              COALESCE_SQ                                QBlock                    2 16                    16
         11.2.0.1 CONNECT_BY_ELIM_DUPS             NO_CONNECT_BY_ELIM_DUPS          QKSFM_ALL                      CONNECT_BY_ELIM_DUPS                       QBlock                    2 16                    16
         11.2.0.1 EXPAND_TABLE                     NO_EXPAND_TABLE                  QKSFM_TABLE_EXPANSION          EXPAND_TABLE                                      Object             4 16                    16
         11.2.0.1 FACTORIZE_JOIN                   NO_FACTORIZE_JOIN                QKSFM_JOINFAC                  FACTORIZE_JOIN                             QBlock                    2 16                    16
         11.2.0.1 NO_COALESCE_SQ                   COALESCE_SQ                      QKSFM_COALESCE_SQ              COALESCE_SQ                                QBlock                    2 16                    16
         11.2.0.1 NO_CONNECT_BY_ELIM_DUPS          CONNECT_BY_ELIM_DUPS             QKSFM_ALL                      CONNECT_BY_ELIM_DUPS                       QBlock                    2 16                    16
         11.2.0.1 NO_EXPAND_TABLE                  EXPAND_TABLE                     QKSFM_TABLE_EXPANSION          EXPAND_TABLE                                      Object             4 16                    16
         11.2.0.1 NO_FACTORIZE_JOIN                FACTORIZE_JOIN                   QKSFM_JOINFAC                  FACTORIZE_JOIN                             QBlock                    2 16                    16
         11.2.0.1 NO_PLACE_DISTINCT                PLACE_DISTINCT                   QKSFM_DIST_PLCMT               PLACE_DISTINCT                             QBlock                    2 16                    16
         11.2.0.1 NO_TRANSFORM_DISTINCT_AGG        TRANSFORM_DISTINCT_AGG           QKSFM_TRANSFORMATION           TRANSFORM_DISTINCT_AGG                     QBlock                    2                        0
         11.2.0.1 PLACE_DISTINCT                   NO_PLACE_DISTINCT                QKSFM_DIST_PLCMT               PLACE_DISTINCT                             QBlock                    2 16                    16
         11.2.0.1 TRANSFORM_DISTINCT_AGG           NO_TRANSFORM_DISTINCT_AGG        QKSFM_TRANSFORMATION           TRANSFORM_DISTINCT_AGG                     QBlock                    2                        0
         21.1.0.1 NO_DST_UPGRADE_INSERT_CONV       DST_UPGRADE_INSERT_CONV          QKSFM_ALL                      DST_UPGRADE_INSERT_CONV             State                            1                        0
                  APPEND_VALUES                    NOAPPEND                         QKSFM_CBO                      APPEND_VALUES                       State                            1                        0
                  DST_UPGRADE_INSERT_CONV          NO_DST_UPGRADE_INSERT_CONV       QKSFM_ALL                      DST_UPGRADE_INSERT_CONV             State                            1                        0
                  NO_STATEMENT_QUEUING             STATEMENT_QUEUING                QKSFM_PARALLEL                 STATEMENT_QUEUING                   State                            1                        0
                  NO_SUBSTRB_PAD                                                    QKSFM_EXECUTION                NO_SUBSTRB_PAD                      State                            1                        0
                  STATEMENT_QUEUING                NO_STATEMENT_QUEUING             QKSFM_PARALLEL                 STATEMENT_QUEUING                   State                            1                        0
                  XMLINDEX_SEL_IDX_TBL                                              QKSFM_ALL                      XMLINDEX_SEL_IDX_TBL                State                            1                        0
--------
      19

11.2.0.2 11.2.0.2 NO_TABLE_LOOKUP_BY_NL            TABLE_LOOKUP_BY_NL               QKSFM_TABLE_LOOKUP_BY_NL       TABLE_LOOKUP_BY_NL                                Object             4 16                    16
         11.2.0.2 NO_USE_HASH_GBY_FOR_PUSHDOWN     USE_HASH_GBY_FOR_PUSHDOWN        QKSFM_ALL                      USE_HASH_GBY_FOR_PUSHDOWN                  QBlock                    2                        0
         11.2.0.2 TABLE_LOOKUP_BY_NL               NO_TABLE_LOOKUP_BY_NL            QKSFM_TABLE_LOOKUP_BY_NL       TABLE_LOOKUP_BY_NL                                Object             4 16                    16
         11.2.0.2 USE_HASH_GBY_FOR_PUSHDOWN        NO_USE_HASH_GBY_FOR_PUSHDOWN     QKSFM_ALL                      USE_HASH_GBY_FOR_PUSHDOWN                  QBlock                    2                        0
                  NO_XDB_FASTPATH_INSERT           XDB_FASTPATH_INSERT              QKSFM_ALL                      XDB_FASTPATH_INSERT                 State                            1                        0
                  XDB_FASTPATH_INSERT              NO_XDB_FASTPATH_INSERT           QKSFM_ALL                      XDB_FASTPATH_INSERT                 State                            1                        0
--------
       6

11.2.0.3 11.2.0.3 FULL_OUTER_JOIN_TO_OUTER         NO_FULL_OUTER_JOIN_TO_OUTER      QKSFM_CBO                      FULL_OUTER_JOIN_TO_OUTER                          Object             4 16              256  272
         11.2.0.3 NO_FULL_OUTER_JOIN_TO_OUTER      FULL_OUTER_JOIN_TO_OUTER         QKSFM_CBO                      FULL_OUTER_JOIN_TO_OUTER                          Object             4 16              256  272
         11.2.0.3 NO_OUTER_JOIN_TO_ANTI            OUTER_JOIN_TO_ANTI               QKSFM_CBO                      OUTER_JOIN_TO_ANTI                                Object             4 16              256  272
         11.2.0.3 NO_SEMI_TO_INNER                 SEMI_TO_INNER                    QKSFM_CBO                      NO_SEMI_TO_INNER                                  Object             4 16              256  272
         11.2.0.3 OUTER_JOIN_TO_ANTI               NO_OUTER_JOIN_TO_ANTI            QKSFM_CBO                      OUTER_JOIN_TO_ANTI                                Object             4 16              256  272
         11.2.0.3 SEMI_TO_INNER                    NO_SEMI_TO_INNER                 QKSFM_CBO                      SEMI_TO_INNER                                     Object             4 16              256  272
--------
       6

11.2.0.4          DISABLE_PARALLEL_DML             ENABLE_PARALLEL_DML              QKSFM_DML                      ENABLE_PARALLEL_DML                 State                            1                        0
                  ENABLE_PARALLEL_DML              DISABLE_PARALLEL_DML             QKSFM_DML                      ENABLE_PARALLEL_DML                 State                            1                        0
--------
       2

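If you want to check a single hint rather than scanning the whole report, the raw values are easy to pull back directly (again assuming v$sql_hint is the source):

select
        name, inverse, version, version_outline,
        target_level, property
from
        v$sql_hint
where
        name = 'USE_HASH'
;

From the listing above this should show target_level = 4 (an object-level hint) and a property that includes bits 16, 64, 128 and 256 – the 464 you see in the PROP column.
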
12.1.0.1 12.1.0.1 BATCH_TABLE_ACCESS_BY_ROWID      NO_BATCH_TABLE_ACCESS_BY_ROWID   QKSFM_EXECUTION                BATCH_TABLE_ACCESS_BY_ROWID                       Object             4 16              256  272
         12.1.0.1 BITMAP_AND                                                        QKSFM_BITMAP_TREE              BITMAP_AND                                        Object             4 16  32                48
         12.1.0.1 CLUSTERING                       NO_CLUSTERING                    QKSFM_CLUSTERING               CLUSTERING                          State                            1                        0
         12.1.0.1 CLUSTER_BY_ROWID                 NO_CLUSTER_BY_ROWID              QKSFM_CBO                      CLUSTER_BY_ROWID                                  Object             4 16              256  272
         12.1.0.1 CUBE_AJ                                                           QKSFM_JOIN_METHOD              ANTIJOIN                                   QBlock                    2 16                    16
         12.1.0.1 CUBE_SJ                                                           QKSFM_JOIN_METHOD              SEMIJOIN                                   QBlock                    2 16                    16
         12.1.0.1 DATA_SECURITY_REWRITE_LIMIT      NO_DATA_SECURITY_REWRITE         QKSFM_DATA_SECURITY_REWRITE    DATA_SECURITY_REWRITE_LIMIT         State                            1                        0
         12.1.0.1 DECORRELATE                      NO_DECORRELATE                   QKSFM_DECORRELATE              DECORRELATE                                QBlock                    2 16                    16
         12.1.0.1 NO_BATCH_TABLE_ACCESS_BY_ROWID   BATCH_TABLE_ACCESS_BY_ROWID      QKSFM_EXECUTION                BATCH_TABLE_ACCESS_BY_ROWID                       Object             4 16              256  272
         12.1.0.1 NO_CLUSTERING                    CLUSTERING                       QKSFM_CLUSTERING               CLUSTERING                          State                            1                        0
         12.1.0.1 NO_CLUSTER_BY_ROWID              CLUSTER_BY_ROWID                 QKSFM_CBO                      CLUSTER_BY_ROWID                                  Object             4 16              256  272
         12.1.0.1 NO_DATA_SECURITY_REWRITE         DATA_SECURITY_REWRITE_LIMIT      QKSFM_DATA_SECURITY_REWRITE    DATA_SECURITY_REWRITE_LIMIT         State                            1                        0
         12.1.0.1 NO_DECORRELATE                   DECORRELATE                      QKSFM_DECORRELATE              DECORRELATE                                QBlock                    2 16                    16
         12.1.0.1 NO_PARTIAL_JOIN                  PARTIAL_JOIN                     QKSFM_PARTIAL_JOIN             PARTIAL_JOIN                                      Object             4 16              256  272
         12.1.0.1 NO_PARTIAL_ROLLUP_PUSHDOWN       PARTIAL_ROLLUP_PUSHDOWN          QKSFM_PQ                       PARTIAL_ROLLUP_PUSHDOWN                    QBlock                    2 16                    16
         12.1.0.1 NO_PQ_CONCURRENT_UNION           PQ_CONCURRENT_UNION              QKSFM_PQ                       PQ_CONCURRENT_UNION                 State  QBlock                    3                        0
         12.1.0.1 NO_PQ_REPLICATE                  PQ_REPLICATE                     QKSFM_PQ_REPLICATE             PQ_REPLICATE                                      Object             4 16              256  272
         12.1.0.1 NO_PQ_SKEW                       PQ_SKEW                          QKSFM_PQ                       PQ_SKEW                                           Object             4 16              256  272
         12.1.0.1 NO_PX_FAULT_TOLERANCE            PX_FAULT_TOLERANCE               QKSFM_PQ                       PX_FAULT_TOLERANCE                  State                            1                        0
         12.1.0.1 NO_USE_CUBE                      USE_CUBE                         QKSFM_USE_CUBE                 JOIN                                              Object             4 16      64      256  336
         12.1.0.1 NO_ZONEMAP                       ZONEMAP                          QKSFM_ZONEMAP                  ZONEMAP                                           Object             4                 256  256
         12.1.0.1 PARTIAL_JOIN                     NO_PARTIAL_JOIN                  QKSFM_PARTIAL_JOIN             PARTIAL_JOIN                                      Object             4 16              256  272
         12.1.0.1 PARTIAL_ROLLUP_PUSHDOWN          NO_PARTIAL_ROLLUP_PUSHDOWN       QKSFM_PQ                       PARTIAL_ROLLUP_PUSHDOWN                    QBlock                    2 16                    16
         12.1.0.1 PQ_CONCURRENT_UNION              NO_PQ_CONCURRENT_UNION           QKSFM_PQ                       PQ_CONCURRENT_UNION                 State  QBlock                    3                        0
         12.1.0.1 PQ_DISTRIBUTE_WINDOW                                              QKSFM_PQ                       PQ_DISTRIBUTE_WINDOW                       QBlock                    2 16                    16
         12.1.0.1 PQ_FILTER                                                         QKSFM_PQ                       PQ_FILTER                                  QBlock                    2                        0
         12.1.0.1 PQ_REPLICATE                     NO_PQ_REPLICATE                  QKSFM_PQ_REPLICATE             PQ_REPLICATE                                      Object             4 16              256  272
         12.1.0.1 PQ_SKEW                          NO_PQ_SKEW                       QKSFM_PQ                       PQ_SKEW                                           Object             4 16              256  272
         12.1.0.1 PX_FAULT_TOLERANCE               NO_PX_FAULT_TOLERANCE            QKSFM_PQ                       PX_FAULT_TOLERANCE                  State                            1                        0
         12.1.0.1 USE_CUBE                         NO_USE_CUBE                      QKSFM_USE_CUBE                 JOIN                                              Object             4 16      64      256  336
         12.1.0.1 ZONEMAP                          NO_ZONEMAP                       QKSFM_ZONEMAP                  ZONEMAP                                           Object             4                 256  256
                  AUTO_REOPTIMIZE                  NO_AUTO_REOPTIMIZE               QKSFM_AUTO_REOPT               AUTO_REOPTIMIZE                     State                            1                        0
                  GATHER_OPTIMIZER_STATISTICS      NO_GATHER_OPTIMIZER_STATISTICS   QKSFM_DBMS_STATS               GATHER_OPTIMIZER_STATISTICS         State                            1                        0
                  NO_AUTO_REOPTIMIZE               AUTO_REOPTIMIZE                  QKSFM_AUTO_REOPT               AUTO_REOPTIMIZE                     State                            1                        0
                  NO_GATHER_OPTIMIZER_STATISTICS   GATHER_OPTIMIZER_STATISTICS      QKSFM_DBMS_STATS               GATHER_OPTIMIZER_STATISTICS         State                            1                        0
                  USE_HIDDEN_PARTITIONS                                             QKSFM_PARTITION                USE_HIDDEN_PARTITIONS                      QBlock                    2                        0
                  WITH_PLSQL                                                        QKSFM_ALL                      WITH_PLSQL                          State                            1                        0
--------
      37

12.1.0.2 12.1.0.2 ADAPTIVE_PLAN                    NO_ADAPTIVE_PLAN                 QKSFM_ADAPTIVE_PLAN            ADAPTIVE_PLAN                       State                            1 16                    16
         12.1.0.2 ANSI_REARCH                      NO_ANSI_REARCH                   QKSFM_ANSI_REARCH              ANSI_REARCH                                QBlock                    2 16                    16
         12.1.0.2 ELIM_GROUPBY                     NO_ELIM_GROUPBY                  QKSFM_TRANSFORMATION           ELIM_GROUPBY                               QBlock                    2 16                    16
         12.1.0.2 INMEMORY                         NO_INMEMORY                      QKSFM_EXECUTION                INMEMORY                                   QBlock Object             6         64            64
         12.1.0.2 INMEMORY_PRUNING                 NO_INMEMORY_PRUNING              QKSFM_EXECUTION                INMEMORY_PRUNING                           QBlock Object             6         64            64
         12.1.0.2 NO_ADAPTIVE_PLAN                 ADAPTIVE_PLAN                    QKSFM_ADAPTIVE_PLAN            ADAPTIVE_PLAN                       State                            1 16                    16
         12.1.0.2 NO_ANSI_REARCH                   ANSI_REARCH                      QKSFM_ANSI_REARCH              ANSI_REARCH                                QBlock                    2 16                    16
         12.1.0.2 NO_ELIM_GROUPBY                  ELIM_GROUPBY                     QKSFM_TRANSFORMATION           ELIM_GROUPBY                               QBlock                    2 16                    16
         12.1.0.2 NO_INMEMORY                      INMEMORY                         QKSFM_EXECUTION                INMEMORY                                   QBlock Object             6         64            64
         12.1.0.2 NO_INMEMORY_PRUNING              INMEMORY_PRUNING                 QKSFM_EXECUTION                INMEMORY_PRUNING                           QBlock Object             6         64            64
         12.1.0.2 NO_USE_VECTOR_AGGREGATION        USE_VECTOR_AGGREGATION           QKSFM_VECTOR_AGG               USE_VECTOR_AGGREGATION                     QBlock                    2 16                    16
         12.1.0.2 NO_VECTOR_TRANSFORM              VECTOR_TRANSFORM                 QKSFM_VECTOR_AGG               VECTOR_TRANSFORM                           QBlock                    2 16                    16
         12.1.0.2 NO_VECTOR_TRANSFORM_DIMS         VECTOR_TRANSFORM_DIMS            QKSFM_VECTOR_AGG               VECTOR_TRANSFORM_DIMS                             Object             4 16      64            80
         12.1.0.2 NO_VECTOR_TRANSFORM_FACT         VECTOR_TRANSFORM_FACT            QKSFM_VECTOR_AGG               VECTOR_TRANSFORM_FACT                             Object             4 16      64            80
         12.1.0.2 USE_VECTOR_AGGREGATION           NO_USE_VECTOR_AGGREGATION        QKSFM_VECTOR_AGG               USE_VECTOR_AGGREGATION                     QBlock                    2 16                    16
         12.1.0.2 VECTOR_TRANSFORM                 NO_VECTOR_TRANSFORM              QKSFM_VECTOR_AGG               VECTOR_TRANSFORM                           QBlock                    2 16                    16
         12.1.0.2 VECTOR_TRANSFORM_DIMS            NO_VECTOR_TRANSFORM_DIMS         QKSFM_VECTOR_AGG               VECTOR_TRANSFORM_DIMS                             Object             4 16      64            80
         12.1.0.2 VECTOR_TRANSFORM_FACT            NO_VECTOR_TRANSFORM_FACT         QKSFM_VECTOR_AGG               VECTOR_TRANSFORM_FACT                             Object             4 16      64            80
                  RESERVOIR_SAMPLING                                                QKSFM_EXECUTION                RESERVOIR_SAMPLING                  State                            1                        0
--------
      19

12.2.0.1 12.2.0.1 BUSHY_JOIN                       NO_BUSHY_JOIN                    QKSFM_BUSHY_JOIN               BUSHY_JOIN                                 QBlock                    2 16                    16
         12.2.0.1 CONTAINERS                                                        QKSFM_ALL                      CONTAINERS                          State                            1                        0
         12.2.0.1 DIST_AGG_PROLLUP_PUSHDOWN        NO_DIST_AGG_PROLLUP_PUSHDOWN     QKSFM_PQ                       DIST_AGG_PROLLUP_PUSHDOWN                  QBlock                    2 16                    16
         12.2.0.1 ELIMINATE_SQ                     NO_ELIMINATE_SQ                  QKSFM_ELIMINATE_SQ             ELIMINATE_SQ                               QBlock                    2 16                    16
         12.2.0.1 NO_BUSHY_JOIN                    BUSHY_JOIN                       QKSFM_BUSHY_JOIN               BUSHY_JOIN                                 QBlock                    2 16                    16
         12.2.0.1 NO_DIST_AGG_PROLLUP_PUSHDOWN     DIST_AGG_PROLLUP_PUSHDOWN        QKSFM_PQ                       DIST_AGG_PROLLUP_PUSHDOWN                  QBlock                    2 16                    16
         12.2.0.1 NO_ELIMINATE_SQ                  ELIMINATE_SQ                     QKSFM_ELIMINATE_SQ             ELIMINATE_SQ                               QBlock                    2 16                    16
         12.2.0.1 NO_OR_EXPAND                     OR_EXPAND                        QKSFM_CBQT_OR_EXPANSION        OR_EXPAND                                  QBlock                    2 16                    16
         12.2.0.1 NO_USE_DAGG_UNION_ALL_GSETS      USE_DAGG_UNION_ALL_GSETS         QKSFM_GROUPING_SET_XFORM       DAGG_OPTIM_GSETS                           QBlock                    2                        0
         12.2.0.1 NO_USE_HASH_GBY_FOR_DAGGPSHD     USE_HASH_GBY_FOR_DAGGPSHD        QKSFM_ALL                      USE_HASH_GBY_FOR_DAGGPSHD                  QBlock                    2                        0
         12.2.0.1 NO_USE_PARTITION_WISE_DISTINCT   USE_PARTITION_WISE_DISTINCT      QKSFM_PARTITION                USE_PARTITION_WISE_DISTINCT                QBlock                    2                        0
         12.2.0.1 NO_USE_PARTITION_WISE_GBY        USE_PARTITION_WISE_GBY           QKSFM_PARTITION                USE_PARTITION_WISE_GBY                     QBlock                    2                        0
         12.2.0.1 ORDER_SUBQ                                                        QKSFM_TRANSFORMATION           ORDER_SUBQ                                 QBlock                    2 16                    16
         12.2.0.1 OR_EXPAND                        NO_OR_EXPAND                     QKSFM_CBQT_OR_EXPANSION        OR_EXPAND                                  QBlock                    2 16                    16
         12.2.0.1 USE_DAGG_UNION_ALL_GSETS         NO_USE_DAGG_UNION_ALL_GSETS      QKSFM_GROUPING_SET_XFORM       DAGG_OPTIM_GSETS                           QBlock                    2                        0
         12.2.0.1 USE_HASH_GBY_FOR_DAGGPSHD        NO_USE_HASH_GBY_FOR_DAGGPSHD     QKSFM_ALL                      USE_HASH_GBY_FOR_DAGGPSHD                  QBlock                    2                        0
         12.2.0.1 USE_PARTITION_WISE_DISTINCT      NO_USE_PARTITION_WISE_DISTINCT   QKSFM_PARTITION                USE_PARTITION_WISE_DISTINCT                QBlock                    2                        0
         12.2.0.1 USE_PARTITION_WISE_GBY           NO_USE_PARTITION_WISE_GBY        QKSFM_PARTITION                USE_PARTITION_WISE_GBY                     QBlock                    2                        0
                  DATA_VALIDATE                                                     QKSFM_EXECUTION                DATA_VALIDATE                       State                            1                        0
                  FRESH_MV                                                          QKSFM_MVIEWS                   FRESH_MV                            State                            1                        0
                  SQL_SCOPE                                                         QKSFM_COMPILATION              SQL_SCOPE                           State                            1                        0
                  XMLTSET_DML_ENABLE                                                QKSFM_ALL                      XMLTSET_DML_ENABLE                  State                            1                        0
--------
      22

18.1.0   18.1.0   ANSWER_QUERY_USING_STATS         NO_ANSWER_QUERY_USING_STATS      QKSFM_ANSWER_QUERY_USING_STATS ANSWER_QUERY_USING_STATS                   QBlock                    2 16                    16
         18.1.0   NO_ANSWER_QUERY_USING_STATS      ANSWER_QUERY_USING_STATS         QKSFM_ANSWER_QUERY_USING_STATS ANSWER_QUERY_USING_STATS                   QBlock                    2 16                    16
         18.1.0   NO_PUSH_HAVING_TO_GBY            PUSH_HAVING_TO_GBY               QKSFM_EXECUTION                PUSH_HAVING_TO_GBY                         QBlock                    2                        0
         18.1.0   NO_REORDER_WIF                   REORDER_WIF                      QKSFM_PARTITION                REORDER_WIF                                QBlock                    2                        0
         18.1.0   NO_USE_PARTITION_WISE_WIF        USE_PARTITION_WISE_WIF           QKSFM_PARTITION                USE_PARTITION_WISE_WIF                     QBlock                    2                        0
         18.1.0   PUSH_HAVING_TO_GBY               NO_PUSH_HAVING_TO_GBY            QKSFM_EXECUTION                PUSH_HAVING_TO_GBY                         QBlock                    2                        0
         18.1.0   REORDER_WIF                      NO_REORDER_WIF                   QKSFM_PARTITION                REORDER_WIF                                QBlock                    2                        0
         18.1.0   USE_PARTITION_WISE_WIF           NO_USE_PARTITION_WISE_WIF        QKSFM_PARTITION                USE_PARTITION_WISE_WIF                     QBlock                    2                        0
                  AV_CACHE                                                          QKSFM_EXECUTION                AV_CACHE                                   QBlock                    2                        0
                  CURRENT_INSTANCE                                                  QKSFM_ALL                      CURRENT_INSTANCE                    State                            1                        0
                  MEMOPTIMIZE_WRITE                                                 QKSFM_EXECUTION                MEMOPTIMIZE_WRITE                   State                            1                        0
                  PDB_LOCAL_ONLY                                                    QKSFM_DML                      PDB_LOCAL_ONLY                      State                            1                        0
                  SKIP_PROXY                                                        QKSFM_ALL                      SKIP_PROXY                          State                            1                        0
                  SUPPRESS_LOAD                                                     QKSFM_DDL                      SUPPRESS_LOAD                       State                            1                        0
                  SYSTEM_STATS                                                      QKSFM_ALL                      SYSTEM_STATS                        State                            1 16              256  272
--------
      15

19.1.0   19.1.0   NO_PQ_EXPAND_TABLE               PQ_EXPAND_TABLE                  QKSFM_TABLE_EXPANSION          PQ_EXPAND_TABLE                                   Object             4 16                    16
         19.1.0   NO_USE_SCALABLE_GBY_INVDIST      USE_SCALABLE_GBY_INVDIST         QKSFM_PQ                       USE_SCALABLE_GBY_INVDIST                   QBlock                    2                        0
         19.1.0   PQ_EXPAND_TABLE                  NO_PQ_EXPAND_TABLE               QKSFM_TABLE_EXPANSION          PQ_EXPAND_TABLE                                   Object             4 16                    16
         19.1.0   USE_SCALABLE_GBY_INVDIST         NO_USE_SCALABLE_GBY_INVDIST      QKSFM_PQ                       USE_SCALABLE_GBY_INVDIST                   QBlock                    2                        0
                  JSON_LENGTH                                                       QKSFM_EXECUTION                JSON_LENGTH                         State                            1                        0
                  QUARANTINE                                                        QKSFM_EXECUTION                QUARANTINE                          State                            1                        0
--------
       6

20.1.0   20.1.0   FORCE_JSON_TABLE_TRANSFORM       NO_JSON_TABLE_TRANSFORM          QKSFM_JSON_REWRITE             FORCE_JSON_TABLE_TRANSFORM          State                            1                        0
         20.1.0   NO_JSON_TABLE_TRANSFORM          FORCE_JSON_TABLE_TRANSFORM       QKSFM_JSON_REWRITE             FORCE_JSON_TABLE_TRANSFORM          State                            1                        0
         20.1.0   NO_SET_GBY_PUSHDOWN              SET_GBY_PUSHDOWN                 QKSFM_ALL                      SET_GBY_PUSHDOWN                           QBlock                    2 16                    16
         20.1.0   SET_GBY_PUSHDOWN                 NO_SET_GBY_PUSHDOWN              QKSFM_ALL                      SET_GBY_PUSHDOWN                           QBlock                    2 16                    16
                  ANALYTIC_VIEW_SQL                                                 QKSFM_COMPILATION              ANALYTIC_VIEW_SQL                          QBlock                    2                        0
                  DENORM_AV                                                         QKSFM_COMPILATION              DENORM_AV                                  QBlock                    2                        0
--------
       6

21.1.0   21.1.0   DAGG_OPTIM_GSETS                 NO_DAGG_OPTIM_GSETS              QKSFM_GROUPING_SET_XFORM       DAGG_OPTIM_GSETS                           QBlock                    2                        0
         21.1.0   HASHSET_BUILD                                                     QKSFM_EXECUTION                HASHSET_BUILD                              QBlock                    2 16                    16
         21.1.0   NO_DAGG_OPTIM_GSETS              DAGG_OPTIM_GSETS                 QKSFM_GROUPING_SET_XFORM       DAGG_OPTIM_GSETS                           QBlock                    2                        0
         21.1.0   NO_OBY_GBYPD_SEPARATE            OBY_GBYPD_SEPARATE               QKSFM_PQ                       OBY_GBYPD_SEPARATE                         QBlock                    2 16                    16
         21.1.0   NO_PQ_NONLEAF_SKEW               PQ_NONLEAF_SKEW                  QKSFM_PQ                       PQ_NONLEAF_SKEW                                   Object             4 16              256  272
         21.1.0   OBY_GBYPD_SEPARATE               NO_OBY_GBYPD_SEPARATE            QKSFM_PQ                       OBY_GBYPD_SEPARATE                         QBlock                    2 16                    16
         21.1.0   ORDER_KEY_VECTOR_USE                                              QKSFM_VECTOR_AGG               ORDER_KEY_VECTOR_USE                       QBlock                    2 16              256  272
         21.1.0   OSON_GET_CONTENT                                                  QKSFM_JSON                     OSON_GET_CONTENT                    State                            1                        0
         21.1.0   PQ_NONLEAF_SKEW                  NO_PQ_NONLEAF_SKEW               QKSFM_PQ                       PQ_NONLEAF_SKEW                                   Object             4 16              256  272
--------
       9


388 rows selected
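
This is the sort of output you get from v$sql_hint. As a sketch, a query along the following lines could produce the listing; the view and its name, inverse, sql_feature, class, target_level, property, version and version_outline columns are all real, but the bit decodes and the break/compute commands are my reconstruction from the output above, so treat them as an assumption rather than the exact script.

set linesize 256

break on version skip 1
compute count of name on version

select
        version, version_outline, name, inverse, sql_feature, class,
        decode(bitand(target_level, 1), 1, 'State')   bit0,    -- statement-level hint (assumed decode)
        decode(bitand(target_level, 2), 2, 'QBlock')  bit1,    -- query-block-level hint (assumed decode)
        decode(bitand(target_level, 4), 4, 'Object')  bit2,    -- object-level hint (assumed decode)
        decode(bitand(target_level, 8), 8, 'Join')    bit3,    -- join-level hint (assumed decode)
        target_level,
        nullif(bitand(property,  16), 0)        bit4,           -- individual property bits,
        nullif(bitand(property,  32), 0)        bit5,           -- shown blank when not set
        nullif(bitand(property,  64), 0)        bit6,
        nullif(bitand(property, 128), 0)        bit7,
        nullif(bitand(property, 256), 0)        bit8,
        property
from
        v$sql_hint
order by
        version, name
;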

March 2, 2022

Generated Predicates

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 11:25 am GMT Mar 2,2022

A question arrived on the MOS Community forum yesterday (you'll need an account if you want to see the original) reporting a couple of fragments of a CBO trace (10053) file:

----- Current SQL Statement for this session (sql_id=4c85twjpdg8g9) -----
select /*+ 123456 */ count(*) from gl_detail where prepareddatev='2022-01-22 15:00:00'

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "NC63"."GL_DETAIL" "GL_DETAIL" 
WHERE "GL_DETAIL"."PREPAREDDATEV"='2022-01-22 15:00:00' 
AND SUBSTR("GL_DETAIL"."PREPAREDDATEV",1,10)='2022-01-22'

The question was:

Why after transformations ,oracle add condition SUBSTR("GL_DETAIL"."PREPAREDDATEV",1,10)='2022-01-22'

Mark Powell asked for the execution plan, information about indexes (normal and function-based) and histograms, and the Oracle version. I asked about constraints and virtual columns and, in particular, the possibility of a virtual column being used as a partition key.

We didn’t get explicit answers to all our questions, but we did get “no constraints, no virtual columns, no partitioning”, and we also got the full 10053 trace file which, given the simplicity of the query, was mercifully short: a mere 95KB and 2,800 lines.
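
If you’ve never generated a 10053 trace yourself, here’s a minimal sketch of the traditional way to capture one for a single statement – the event syntax is the generic one, and I’ve simply reused the OP’s statement:

rem     set an identifier so the trace file is easy to find

alter session set tracefile_identifier = 'gen_pred';
alter session set events '10053 trace name context forever, level 1';

select /*+ 123456 */ count(*) from gl_detail where prepareddatev='2022-01-22 15:00:00';

alter session set events '10053 trace name context off';

rem     from 11g onwards you can ask the session where its trace file is

select value from v$diag_info where name = 'Default Trace File';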

The key aid to reading 10053 trace files is knowing what you’re expecting to see before you start looking. And with a generated predicate there was likely to be something that would tell me about the “column” that caused the predicate to appear, and about the arithmetic that was a consequence of that predicate coming into existence. So I started with the section headed “SINGLE TABLE ACCESS PATH”, where the cardinality estimate (for each individual table) would be calculated. This showed two columns being considered for the single table in the query:

  Column (#77): 
    NewDensity:0.000000, OldDensity:0.000035 BktCnt:75, PopBktCnt:11, PopValCnt:1, NDV:8314506
  Column (#77): PREPAREDDATEV(
 
  Column (#88): 
    NewDensity:0.000188, OldDensity:0.000583 BktCnt:75, PopBktCnt:11, PopValCnt:1, NDV:4551
  Column (#88): SYS_NC00088$(

Check the name of column #88 – sys_nc00088$ – that’s an internally generated virtual column which may well be associated with a function-based index. (Note, too, that the NDV drops from 8,314,506 for column #77 to just 4,551 for column #88 – exactly the sort of collapse you’d expect if millions of distinct timestamp strings were reduced to a few thousand distinct dates.) So let’s back up a bit to the “BASIC STATISTICAL INFORMATION” and the (thirteen sets of) index stats for the table, where we find:

  Index: I_GL_DETAIL_7  Col#: 88
    LVLS: 3  #LB: 433301  #DK: 4551  LB/K: 95.00  DB/K: 5922.00  CLUF: 26953639.00

The obvious first guess is that column #88 is the invisible virtual column underpinning an index that has been created on substr(prepareddatev,1,10), and here’s a quick and dirty test script to demonstrate that this could be the correct guess.

create table t1 (v1 varchar2(20), v2 varchar2(1));
create index t1_i1 on t1(substr(v1,1,10));

rem     the function-based index should have created a hidden virtual column

select column_name, virtual_column, hidden_column from user_tab_cols where table_name = 'T1';
select * from user_ind_expressions where table_name = 'T1';

insert into t1 values('2022-03-02 09:01:00', 'x');
commit;

execute dbms_stats.gather_table_stats(user,'t1')

set autotrace traceonly explain

rem     hinted full scan - we're only interested in the predicate section

select /*+ full(t1) */  * from t1 where v1 = '2022-03-02 09:01:00';

set autotrace off

And here’s the output cut and pasted from an SQL*Plus session running 11.2.0.4 (which is the version the CBO trace file came from).

Table created.


Index created.


COLUMN_NAME          VIR HID
-------------------- --- ---
V1                   NO  NO
V2                   NO  NO
SYS_NC00003$         YES YES

3 rows selected.


INDEX_NAME           TABLE_NAME                COLUMN_EXPRESSION                        COLUMN_POSITION
-------------------- ------------------------- ---------------------------------------- ---------------
T1_I1                T1                        SUBSTR("V1",1,10)                                      1

1 row selected.


1 row created.


Commit complete.


PL/SQL procedure successfully completed.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    33 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("V1"='2022-03-02 09:01:00' AND
              SUBSTR("V1",1,10)='2022-03-02')

We see the “extra” predicate and a column with a name of the form sys_ncXXXXX$. The results from more recent versions of Oracle should be the same. I think there’s a pretty good chance that if the OP runs suitable queries against XXX_tab_cols and XXX_ind_expressions, they’ll see similar results that explain the predicate that surprised them.
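
As a sketch of what those queries might look like – I’ve assumed the user owns the table (the dba_/all_ versions would need an owner predicate), and since data_default and column_expression are LONG columns you’ll want a suitable “set long”:

set long 20000

rem     the expression behind a sys_ncXXXXX$ column appears in data_default

select  column_name, virtual_column, hidden_column, data_default
from    user_tab_cols
where   table_name = 'GL_DETAIL'
and     hidden_column = 'YES'
;

select  index_name, column_expression, column_position
from    user_ind_expressions
where   table_name = 'GL_DETAIL'
;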

Footnote

There are various notes on the blog about constraints and transitive closure generating extra predicates, and about how the optimizer can use function-based indexes whose definitions are “good enough”, though not perfect matches, for user-supplied predicates. This is just another little detail of how the optimizer tries to find as much helpful information as it can in the data dictionary. The earliest note I can find on my blog about this at present is one about partition elimination and generated predicates – which prompted various comments about function-based indexes and predicate generation.
