Oracle Scratchpad

November 9, 2020

I wish

Filed under: CBO,Execution plans,Oracle,sorting,Wishlist — Jonathan Lewis @ 12:01 pm GMT Nov 9,2020

Here’s a lovely little mechanism new to Postgres 13 that can minimise sorting costs: the “incremental sort”. It would be nice to see it in Oracle as well, as it could make an enormous difference to “fetch first N” queries.

The concept is simple – if a rowsource moving up a plan is known to be in “partially sorted” order when it reaches a sort operation, the optimizer can choose whether to sort the entire rowsource in one go or to sort it in batches as it arrives.

For example, if you have a query whose final clause is “order by cola, colb” and the plan has used an index range scan on an index on just (cola), then the “sort order by” operation can accept the rows for the first value of cola, sort them by colb and pass them on; then accept the rows for the second value of cola, sort them by colb and pass them on; and so on.
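
In pseudo-code the mechanism might look something like this sketch (PL/SQL-flavoured and purely conceptual: the table t, its columns cola and colb, and the procedure sort_and_emit() are all hypothetical placeholders, not anything Oracle or Postgres actually supplies):

-- Conceptual sketch only: consume a rowsource that arrives already
-- ordered by cola, sorting and emitting one batch (one value of cola)
-- at a time instead of sorting the entire rowsource in one go.
declare
        type row_list is table of t%rowtype;
        batch   row_list := row_list();
begin
        for r in (select * from t order by cola) loop           -- presorted input
                if batch.count > 0 and r.cola != batch(batch.count).cola then
                        sort_and_emit(batch);   -- hypothetical: sort this batch by colb, pass rows on
                        batch.delete;           -- start the next batch
                end if;
                batch.extend;
                batch(batch.count) := r;
        end loop;
        sort_and_emit(batch);                   -- don't forget the final batch
end;
/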

Better still, if you have an “order by table1.colA, table2.colB” and the optimizer used an indexed access path on table1.colA and a nested loop into table2, then the optimizer will still recognise that the generated data is already partially sorted, and can sort each batch of rows that shares a value of table1.colA by table2.colB. (Oracle has a mechanism for dealing with sorted hash clusters that is roughly similar.)

Obviously the benefit is that you avoid doing a very large sort that might spill to disc; slightly less obvious is the fact that you might avoid sorting the whole data set at all if you have a “fetch first N” query. Here’s an Oracle model setting up a demonstration of the principle:

rem
rem     Script:         fetch_first_postgres_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem

create table t1 as
with generator(id) as (
        select 1 from dual
        union all
        select id + 1 from generator where id < 1e5
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id,10)             small_vc,
        rpad('x',50,'x')        padding
from
        generator
/

alter table t1 modify n1 not null;

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

create index t1_i1 on t1(n1);

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

drop index t1_i1;
create index t1_i1 on t1(n1, n2);

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

I’ve created a table with 100,000 rows where the value of n1 is repeated 10 times and well-clustered, while for each value of n1, the n2 column has 10 distinct values (not necessarily in order thanks to the mod(,13)). Using this data set I’ve executed the same query three times – selecting the rows for 6 consecutive values of n1, ordering by n1, n2.

The first test will have to do a tablescan, the second can use the index I’ve created on (n1) but will have to do a sort (after visiting the table), the third can walk the index I’ve created on (n1,n2) and complete without sorting.

Here, in order, are the three execution plans:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    70 |  4830 |   145   (9)| 00:00:01 |
|   1 |  SORT ORDER BY     |      |    70 |  4830 |   145   (9)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   |    70 |  4830 |   144   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"<=30 AND "N1">=25)


----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    70 |  4830 |     4  (25)| 00:00:01 |
|   1 |  SORT ORDER BY                       |       |    70 |  4830 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    70 |  4830 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    70 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1">=25 AND "N1"<=30)


-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    70 |  4830 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    70 |  4830 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |    70 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1">=25 AND "N1"<=30)

Having created the Oracle model I asked Phil Florent – who had prompted this note by emailing me with a question about why Postgres 13 was executing a “Fetch First” query so much faster than Oracle 19 (the basic answer is at this URL) – if he would run it under Postgres and send me the execution plans.

The code required two changes – the first to handle the change in dialect, the second to supply a change in scale because my model produced such a small output that Postgres didn’t bother to use the new feature. Here’s the modified SQL to generate the original test data:

create table t1 as
with recursive generator(id) as (
        select 1
        union all
        select id + 1 from generator where id < 1e5
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id::text,10)       small_vc,
        rpad('x',50,'x')        padding
from
        generator;

And the three plans (so that you can compare the content and style of output with Oracle plans) that went with this small data set – first the no-index plan, then the plan for the (n1) index, then the plan with (n1,n2) indexed:

                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
Sort  (cost=2835.85..2836.00 rows=62 width=74) (actual time=19.534..19.600 rows=60 loops=1)
   Sort Key: n1, n2
   Sort Method: quicksort  Memory: 33kB
   ->  Seq Scan on t1  (cost=0.00..2834.00 rows=62 width=74) (actual time=0.067..19.417 rows=60 loops=1)
         Filter: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))
         Rows Removed by Filter: 99940

Planning Time: 0.351 ms
Execution Time: 19.703 ms


                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Sort  (cost=11.50..11.66 rows=62 width=74) (actual time=0.224..0.289 rows=60 loops=1)
   Sort Key: n1, n2
   Sort Method: quicksort  Memory: 33kB
   ->  Index Scan using t1_i1 on t1  (cost=0.42..9.66 rows=62 width=74) (actual time=0.024..0.113 rows=60 loops=1)
         Index Cond: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))

Planning Time: 0.665 ms
Execution Time: 0.391 ms



                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Scan using t1_i1 on t1  (cost=0.42..114.66 rows=62 width=74) (actual time=0.022..0.155 rows=60 loops=1)
   Index Cond: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))

Planning Time: 0.690 ms
Execution Time: 0.259 ms
 

As you can see, there’s very little difference between Oracle’s plans and Postgres’ plans in this example. (Though it’s rather nice to see what extra details appear in the Postgres plans, which were generated with the equivalent of Oracle’s “statistics_level = all” setting.)

The middle plan shows us that Postgres didn’t use the “incremental sort” – but it’s useful to see it anyway so that we can compare the structure of the plan when we increase the volume of data, which is what the following script is for:

create table t2 as
with recursive generator(id) as (
        select 1
        union all
        select id + 1 from generator where id < 10000000
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id::text,10)             small_vc,
        rpad('x',50,'x')        padding
from
        generator;
 
create index t2_i1 on t2(n1);

explain analyze
select  *
from    t2
where   n1 between 25000 and 30000
order by
        n1, n2
;

analyze t2;

Now we have 10M rows, still with 10 rows per value of n1, and our query requests 5,001 values of n1, so 50,010 rows in total. With that much data the optimizer decided to use the incremental sort rather than sorting the whole result set in one go:

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Incremental Sort  (cost=0.49..4532.57 rows=52677 width=74) (actual time=0.160..164.125 rows=50010 loops=1)
   Sort Key: n1, n2
   Presorted Key: n1
   Full-sort Groups: 1251  Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB
   ->  Index Scan using t2_i1 on t2  (cost=0.43..2339.97 rows=52677 width=74) (actual time=0.055..61.663 rows=50010 loops=1)
         Index Cond: ((n1 >= '25000'::double precision) AND (n1 <= '30000'::double precision))
Planning Time: 0.197 ms
Execution Time: 204.490 ms

Note, particularly, the “Presorted Key” line just below the “Sort Key” line. We can also check the plan without the incremental sort by using a set command to disable the feature:

set enable_incremental_sort = off;

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Sort  (cost=8812.64..8944.33 rows=52677 width=74) (actual time=181.487..227.735 rows=50010 loops=1)
   Sort Key: n1, n2
   Sort Method: external merge  Disk: 4128kB
   ->  Index Scan using t2_i1 on t2  (cost=0.43..2339.97 rows=52677 width=74) (actual time=0.070..74.975 rows=50010 loops=1)
         Index Cond: ((n1 >= '25000'::double precision) AND (n1 <= '30000'::double precision))
Planning Time: 0.239 ms
Execution Time: 266.325 ms

Comparing the times for these two queries there isn’t really much difference – but you will notice that the old approach has had to dump 4MB to disc while the incremental sort gets the job done in 30KB of memory, which may be an important difference in other circumstances.
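
As an aside – and this is an assumption about the configuration rather than something reported in the plan – the disk spill in the traditional sort is simply a consequence of the session’s work_mem limit (the Postgres default is typically 4MB), so the old-style workaround would have been along the lines of:

set work_mem = '16MB';  -- sketch: give the one-shot sort enough memory to stay off disc

The attraction of the incremental sort is that it gets the same benefit without anyone having to guess a “big enough” value.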

More important than a tiny time difference in this trivial example is the fact that Phil Florent’s original question was:

“Why is this two-table join with ‘fetch first 5’ taking 28 seconds to complete in Oracle when it takes less than one millisecond to complete in Postgres?”

The answer was:

  • Oracle is doing a hash join that produces a 25M row result set, sorting it to do an analytic row_number() call to get the first 5 rows.
  • Postgres is doing a nested loop join: it fetches the first row(s) from the first table in the right order by index range scan, fetches the matching rows from the second table, and does an incremental sort on those rows; then it fetches the next row(s) and repeats the process – stopping after it has got through a total of 5 rows instead of generating and sorting 25M rows. (A sketch of the query shape follows.)
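
Based on that description and the obfuscated plan below, the query shape was presumably something like this sketch (the table and column names are the obfuscated ones from the plan, not the real ones):

select  table1.col1, table2.col2
from    table1
join    table2 on table2.id = table1.id
order by
        table1.col1 desc, table2.col2 desc
fetch first 5 rows only;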

Here’s the plan (with obfuscated table names):

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit  (cost=284.36..287.21 rows=5 width=10)
   ->  Incremental Sort  (cost=284.36..14261803.18 rows=25000000 width=10)
         Sort Key: table1.col1 DESC, table2.col2 DESC
         Presorted Key: table1.col1
         ->  Nested Loop  (cost=1.00..13294209.39 rows=25000000 width=10)
               ->  Index Scan using table1_col1_idx on table1  (cost=0.56..1300593.47 rows=25000000 width=8)
               ->  Index Scan using table2.pk_col on table2 (cost=0.43..0.48 rows=1 width=6)
                     Index Cond: (id = table1.id)

You’ll notice in this example that the incremental sort can take advantage of the optimizer’s knowledge of the index definitions whether the sort required is ascending or descending.

You’ll also notice that Postgres has the same problem as Oracle when it comes to coping with Fetch First (or, in Oracle’s case, rownum <= N and optimizer_mode = first_rows_N). Even when it “knows” that a query is going to stop fetching data very early the plan still reports 25M rows as the expected volume of data.
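
For reference, the pre-12c Oracle formulation of that “first 5” requirement would be a sketch like the following (hypothetical names again, with the /*+ first_rows(5) */ hint as the statement-level counterpart of the first_rows_N optimizer_mode settings):

select  *
from    (
        select  /*+ first_rows(5) */
                table1.col1, table2.col2
        from    table1
        join    table2 on table2.id = table1.id
        order by
                table1.col1 desc, table2.col2 desc
        )
where   rownum <= 5;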

Summary

Postgres 13 has a wonderful mechanism for optimising sorts that can make a huge difference to “first rows” queries and even basic “order by” clauses involving join query result sets.

Footnote

Browsing the Internet for documentation and comment on the incremental sort I found the following. Given my level of ignorance about Postgres I can’t comment on the completeness or correctness of the information, but it looked good to me, and I found its comments about this feature very informative.

October 10, 2020

Interval Oddity

Filed under: CBO,Execution plans,Oracle,Partitioning — Jonathan Lewis @ 2:51 pm BST Oct 10,2020

Interval partitioning is a popular strategy for partitioning date-based data. It’s an enhanced variant of range partitioning that allows you to define a starting partition and an interval that should be used to derive the high values for all subsequent partitions – and Oracle doesn’t even have to create intervening partitions if you insert data that goes far beyond the current partition: it automatically creates exactly the right partition (with the correct high_value and correctly inferred lower boundary) for the incoming data and behaves as if the intervening partitions will become available when they’re needed at some later point in time. So there’s no need for DBAs to work out actual partition high_values, no need to ensure that all the partitions you need out into the future have been pre-created, and no more batch processes crashing with Oracle error “ORA-14400: inserted partition key does not map to any partition”.

But there’s a surprising difference between traditional range partitioning and the newer interval partitioning that will increase CPU usage in some cases and may (though I haven’t yet investigated this in sufficient detail to create an example) lead to variations in execution plans.

To demonstrate the difference I’m going to create two tables with the same structure and content then run the same query against them and show you the resulting execution plans. Here’s the code to create and populate the tables:

rem
rem     Script:         interval_or_range.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem


create table t_interval(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date) 
interval (numtoyminterval(1,'MONTH'))
subpartition by hash (order_id) subpartitions 4
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy'))
        )
;

create table t_range(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date) 
subpartition by hash (order_id) subpartitions 4
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy')),
                partition start_p2 values less than (to_date('01-Feb-2020','dd-mon-yyyy')),
                partition start_p3 values less than (to_date('01-Mar-2020','dd-mon-yyyy')),
                partition start_p4 values less than (to_date('01-Apr-2020','dd-mon-yyyy')),
                partition start_p5 values less than (to_date('01-May-2020','dd-mon-yyyy')),
                partition start_p6 values less than (to_date('01-Jun-2020','dd-mon-yyyy')),
                partition start_p7 values less than (to_date('01-Jul-2020','dd-mon-yyyy')),
                partition start_p8 values less than (to_date('01-Aug-2020','dd-mon-yyyy')),
                partition start_p9 values less than (to_date('01-Sep-2020','dd-mon-yyyy'))
        )
;

insert into t_range select
        to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
        rownum,
        lpad(rownum,10,'0'),
        lpad('x',100,'x')
from
        all_objects
where
        rownum <= 240 -- > comment to avoid wordpress format issue  
;

insert into t_interval select
        to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
        rownum,
        lpad(rownum,10,'0'),
        lpad('x',100,'x')
from
        all_objects
where
        rownum <= 240 -- > comment to avoid wordpress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T_RANGE',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T_INTERVAL',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve used composite partitioned tables in this example but the same anomaly appears with simple partitioning – which you can test by commenting out the “subpartition by ..” lines. The two tables have partitions defined to hold a month’s data. I’ve inserted a few rows into most of the partitions, and ensured that I haven’t accidentally attempted to insert data that falls outside the legal range of the table with the predefined partitions.
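
For reference, here’s a sketch of what the simple (non-composite) version of the interval table would look like with the subpartitioning removed:

create table t_interval(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date)
interval (numtoyminterval(1,'MONTH'))
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy'))
        )
;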

To show that the tables are nominally identical here’s the list of partitions with their high values and number of rows:

column table_name       format a15
column partition_name   format a15
column high_value       format a80

break on table_name skip 1

select
        table_name, partition_name, num_rows, high_value
from
        user_tab_partitions
where
        table_name in ('T_INTERVAL','T_RANGE')
order by
        table_name, partition_name
;


TABLE_NAME      PARTITION_NAME    NUM_ROWS HIGH_VALUE
--------------- --------------- ---------- --------------------------------------------------------------------------------
T_INTERVAL      START_P1                 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10722              30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10723              29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10724              31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10725              30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10726              31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10727              30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10728              31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10729              28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

T_RANGE         START_P1                 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P2                30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P3                29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P4                31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P5                30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P6                31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P7                30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P8                31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P9                28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA


18 rows selected.

With this setup we can now run a simple query against the two tables using a where clause that is an exact match for the partition definition for the July data:

set serveroutput off

select 
        count(*) 
from 
        t_range 
where   order_date >= date '2020-07-01'  -- to_date('01-Jul-2020','dd-mon-yyyy')
and     order_date <  date '2020-08-01'  -- to_date('01-Aug-2020','dd-mon-yyyy')
/

select * from table(dbms_xplan.display_cursor);

select 
        count(*) 
from 
        t_interval 
where   order_date >= date '2020-07-01'  -- to_date('01-Jul-2020','dd-mon-yyyy')
and     order_date <  date '2020-08-01'  -- to_date('01-Aug-2020','dd-mon-yyyy')
/

select * from table(dbms_xplan.display_cursor);

I’ve left in two versions of the date predicates – the “ANSI” date literal style, and a strictly formatted to_date() style that is still the approach used most commonly in Oracle systems. The effect on the execution plans is the same for both variations of the date declaration, and here are the two execution plans – reported from an instance of 19.3.0.0:

---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |       |       |   130 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |         |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|         |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
|   3 |    TABLE ACCESS FULL    | T_RANGE |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
---------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |       |       |   130 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |            |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|            |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
|*  3 |    TABLE ACCESS FULL    | T_INTERVAL |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("ORDER_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

It’s quite possible that you won’t notice the difference between these plans, even though I’ve put them directly one below the other; and it’s very likely that most people wouldn’t think about the difference if they didn’t have the two plans available at the same time to compare.

If you can’t spot the difference it’s because I’ve deliberately arranged them in a way that might fool you into not looking carefully enough.

If you’ve spotted the difference it’s probably because you’ve examined the Predicate Information section. There isn’t one for the t_range example but there is one for the t_interval example – and it was a little sneaky of me to make it easy for you to assume that the one set of predicates I showed was common to the two plans.

Since the plans have been pulled from memory (v$sql_plan) it seems likely that they are truthful and the query based on the interval partitioning is actually checking every row in the partition against two date values. In my example that’s not going to make much difference to performance, but in a data warehouse with millions of rows per partition there’s scope for a noticeable increase in CPU between the two queries – especially since the block reads are likely to be direct path, bypassing much of the usual (CPU-intensive) buffer cache activity.

Interestingly when I checked the 10053 trace files for the two queries the CPU cost for the pair was identical. Normally you expect to see at least a little CPU cost (even if it’s only tens of units out of millions) for checking a predicate. This raises the question – is the optimizer allowing a cost for the t_range table for a predicate check that isn’t going to happen, or is it failing to allow a cost for the t_interval table for a predicate check that is going to happen? [Edit: easy enough to check by doing the test on a simple heap table that clones the data from that one partition, and checking the calculated cost with and without the predicate]
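
That check might look something like the following sketch (untested here; t_clone is a hypothetical table name):

create table t_clone as
select  *
from    t_interval partition for (date '2020-07-15');

execute dbms_stats.gather_table_stats(user, 't_clone', method_opt => 'for all columns size 1')

explain plan for
select  count(*)
from    t_clone
;

select * from table(dbms_xplan.display);

explain plan for
select  count(*)
from    t_clone
where   order_date >= date '2020-07-01'
and     order_date <  date '2020-08-01'
;

select * from table(dbms_xplan.display);

Comparing the two costs would show how much CPU the optimizer normally charges for checking the predicate, which in turn would tell you which of the two partitioned queries is reporting the anomalous figure.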

There’s a further ramification to this anomaly, relating to the question on the Oracle-L list server that prompted the investigation. What happens on Exadata where the predicates can be offloaded to storage?

The original question wasn’t actually about the appearance (or not) of the predicates, it was about an unexpected cardinality estimate for a query involving two equi-partitioned tables, so this note has gone completely off-topic from the question; but another strange detail about the predicates showed up when I suggested the creation of a column group on the join columns. A (redundant) predicate that had been offloaded to storage stopped being offloaded; here, from the original posting with the original tables, are the two different Predicate Information sections that appeared – the first without the column group, the second after the column group had been created and its stats collected:

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
   6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - storage("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
   6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
               "ORDER_REQST"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Notice how the storage() predicate that appears at operation 8 in the first set has disappeared from the second even though operation 6 manages to retain its storage() predicate throughout.

I’ve modelled a further example of odd behaviour using two pairs of tables – one pair using range/hash partitioning the other interval/hash partitioning. I won’t reproduce it here but the problem of redundant predicates appearing and then disappearing gets even stranger.

I haven’t yet produced an example where the unexpected predicate behaviour has affected the costs or cardinalities – but I’ve only spent a couple of hours playing around with well-formed examples: it’s possible that in badly formed examples (e.g. with statistical inconsistencies) the side effect could mean that two notionally identical queries produce different numbers and different plans because they end up with different predicates in the final transformed query.

Footnote

Following an email from David Kurtz, it occurred to me that I should have made it clear that the disappearance of predicates on the partition key is expected behaviour when the predicates are clearly synchronised with the partition boundaries. The behaviour for the interval partitioning is the oddity; the behaviour for the “normal” range partitioning is the standard.

October 9, 2020

Inline Hint

Filed under: 18c,CBO,Execution plans,Hints,Oracle,subqueries,Subquery Factoring,Tuning — Jonathan Lewis @ 12:46 pm BST Oct 9,2020

If you’ve ever used subquery factoring (“with” subqueries, or common table expressions (CTEs) as they are often called) then you’re probably aware of the (undocumented) hints /*+ materialize */, which forces Oracle to create a local temporary table to hold the result of the subquery for subsequent use, and /*+ inline */, which forces the optimizer to copy the text of the subquery into the body of the query before starting the optimisation phase.

There’s a small, but important, enhancement to these hints that appeared in Oracle 18. Like so many other hints in Oracle they can now have a query block name as a “parameter”, so you can use them at the top level of your query. Here’s some code to demonstrate:

rem
rem     Script:         inline_hint.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             18.3.0.0
rem             12.2.0.1  -- hints don't have any effect
rem

create table t1
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
/

create index t1_i1 on t1(object_id);

create table t2
as
select  *
from    t1
/

create index t2_i1 on t2(object_id);

spool inline_hint.lst


explain plan for
with v1 as (
        select 
                /*+ 
                        qb_name(cte) 
                */ 
                object_id, object_type, object_name 
                from t2 gtt1
                where object_id is not null
)
select
        /*+
                qb_name(main)
                inline(@cte)
        */
        t1.object_id,
        t1.object_name
from
        t1
where
        exists (
                select
                        null
                from
                        v1      v1a
                where
                        v1a.object_id = t1.object_id
                and     v1a.object_type = 'TABLE'
        )
and     exists (
                select
                        null
                from
                        v1      v1b
                where
                        v1b.object_id = t1.object_id
                and     v1b.object_name like 'WRI%'
        )
and
        t1.object_id between 100 and 200
/

select * from table(dbms_xplan.display(format=>'alias'));

explain plan for
with v1 as (
        select 
                /*+ 
                        qb_name(cte) 
                */ 
                object_id, object_type, object_name 
                from t2 gtt1
                where object_id is not null
)
select
        /*+
                qb_name(main)
                materialize(@cte)
        */
        t1.object_id,
        t1.object_name
from
        t1
where
        exists (
                select
                        null
                from
                        v1      v1a
                where
                        v1a.object_id = t1.object_id
                and     v1a.object_type = 'TABLE'
        )
and
        t1.object_id between 100 and 200
/

select * from table(dbms_xplan.display(format=>'alias'));

The first of these two queries uses the factored subquery twice so, by default, it will create a “cursor duration memory” temporary table to hold the results of the subquery and then use that temporary table twice in the execution plan.

Conversely the second query uses the factored subquery just once, so the optimizer’s default action will be to copy the text into the body of the main query and optimize the whole thing as a single query block.

To reverse the default behaviour in versions of Oracle up to 12.2.0.1 (though later patch sets may include the 18c enhancements) you could add the /*+ inline */ or /*+ materialize */ hints respectively to the factored subqueries; but in my demonstration you can see that I’ve given the factored subquery a query block name and added the relevant hint to the main query block, passing in the query block name of the factored subquery – hence /*+ inline(@cte) */ and /*+ materialize(@cte) */.
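
For comparison, the pre-18c approach would have looked like this sketch, with the hint embedded in the factored subquery itself:

with v1 as (
        select
                /*+ inline */
                object_id, object_type, object_name
        from    t2
        where   object_id is not null
)
select  t1.object_id, t1.object_name
from    t1
where   exists (
                select  null
                from    v1
                where   v1.object_id = t1.object_id
                and     v1.object_type = 'TABLE'
        )
and     t1.object_id between 100 and 200
/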

Here – from 19.3 – are the resulting execution plans (with some cosmetic editing) – first the plan with the inline() hint.

------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |     1 |    63 |     9  (12)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                     |       |     1 |    63 |     9  (12)| 00:00:01 |
|   2 |   NESTED LOOPS                         |       |     1 |    50 |     7  (15)| 00:00:01 |
|   3 |    SORT UNIQUE                         |       |     1 |    25 |     4   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     1 |    25 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T2_I1 |    48 |       |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID BATCHED | T1    |     1 |    25 |     2   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN                   | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  8 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |     1 |    13 |     2   (0)| 00:00:01 |
|*  9 |    INDEX RANGE SCAN                    | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$06B48120
   4 - SEL$06B48120 / GTT1@CTE
   5 - SEL$06B48120 / GTT1@CTE
   6 - SEL$06B48120 / T1@MAIN
   7 - SEL$06B48120 / T1@MAIN
   8 - SEL$06B48120 / GTT1@CTE
   9 - SEL$06B48120 / GTT1@CTE

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("OBJECT_NAME" LIKE 'WRI%')
   5 - access("OBJECT_ID">=100 AND "OBJECT_ID"<=200)
   7 - access("OBJECT_ID"="T1"."OBJECT_ID")
       filter("T1"."OBJECT_ID"<=200 AND "T1"."OBJECT_ID">=100)
   8 - filter("OBJECT_TYPE"='TABLE')
   9 - access("OBJECT_ID"="T1"."OBJECT_ID")
       filter("OBJECT_ID"<=200 AND "OBJECT_ID">=100)

As you can see Oracle has copied the subquery text into the main body of the query and then optimized it to produce a three-table join. One of the subqueries has been unnested into an aggregate view (operations 3, 4 and 5), while the other has been transformed into a semi-join.

In passing you’ll also notice that the optimizer has used transitive closure to add the range predicate on t1 to both occurrences of the t2 table.

And here’s the plan for the query with the single use of the subquery and materialize() hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |    48 |  2448 |    39   (8)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6611_F53A566 |       |       |            |          |
|   3 |    TABLE ACCESS FULL                     | T2                         | 10000 |   322K|    27   (8)| 00:00:01 |
|*  4 |   HASH JOIN SEMI                         |                            |    48 |  2448 |    13  (16)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED   | T1                         |    48 |  1200 |     4   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                     | T1_I1                      |    48 |       |     2   (0)| 00:00:01 |
|*  7 |    VIEW                                  |                            | 10000 |   253K|     8  (13)| 00:00:01 |
|   8 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6611_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$A3F38ADC
   2 - CTE
   3 - CTE          / GTT1@CTE
   5 - SEL$A3F38ADC / T1@MAIN
   6 - SEL$A3F38ADC / T1@MAIN
   7 - SEL$AA28F105 / V1A@SEL$1
   8 - SEL$AA28F105 / T1@SEL$AA28F105

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V1A"."OBJECT_ID"="T1"."OBJECT_ID")
   6 - access("T1"."OBJECT_ID">=100 AND "T1"."OBJECT_ID"<=200)
   7 - filter("V1A"."OBJECT_TYPE"='TABLE' AND "V1A"."OBJECT_ID">=100 AND "V1A"."OBJECT_ID"<=200)

In this plan the optimizer has created an in-memory temporary table and then used it in the existence subquery – which it has then transformed into a semi-join, so we have a query block with the name SEL$A3F38ADC; but we also see that the query block CTE still exists, labelling the operations that Oracle used to populate the temporary table.

It is an interesting (and irritating) detail that when we look at object aliases we see (operation 8) that Oracle has given the temporary table the alias of t1 – which is just a little confusing since I actually have a table called t1!

Next Steps

Being able to nominate a query block for the inline() and materialize() hints may be of great help in some cases (there’s a recent example on the Oracle Developer Forum (may need a MOS login) where it might make a huge difference to the performance of a particular query without requiring a rewrite of the SQL).

But there are a couple of details to investigate. First, I had a query block name built into my factored subquery – what happens if the author of the SQL didn’t include a query block name?

Before I’d added the inline() hint and query block names in the first example above this is what the plan looked like:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |    48 |  6240 |    48  (11)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6612_F53A566 |       |       |            |          |
|   3 |    TABLE ACCESS FULL                     | T2                         | 10000 |   322K|    27   (8)| 00:00:01 |
|*  4 |   HASH JOIN SEMI                         |                            |    48 |  6240 |    21  (15)| 00:00:01 |
|*  5 |    HASH JOIN SEMI                        |                            |    48 |  4992 |    13  (16)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED  | T1                         |    48 |  1200 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                    | T1_I1                      |    48 |       |     2   (0)| 00:00:01 |
|*  8 |     VIEW                                 |                            | 10000 |   771K|     8  (13)| 00:00:01 |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6612_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
|* 10 |    VIEW                                  |                            | 10000 |   253K|     8  (13)| 00:00:01 |
|  11 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6612_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$A317D234
   2 - SEL$1
   3 - SEL$1        / GTT1@SEL$1
   6 - SEL$A317D234 / T1@SEL$2
   7 - SEL$A317D234 / T1@SEL$2
   8 - SEL$D67CB2D2 / V1B@SEL$4
   9 - SEL$D67CB2D2 / T1@SEL$D67CB2D2
  10 - SEL$D67CB2D3 / V1A@SEL$3
  11 - SEL$D67CB2D3 / T1@SEL$D67CB2D3

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V1A"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - access("V1B"."OBJECT_ID"="T1"."OBJECT_ID")
   7 - access("T1"."OBJECT_ID">=100 AND "T1"."OBJECT_ID"<=200)
   8 - filter("V1B"."OBJECT_NAME" LIKE 'WRI%' AND "V1B"."OBJECT_ID">=100 AND "V1B"."OBJECT_ID"<=200)
  10 - filter("V1A"."OBJECT_TYPE"='TABLE' AND "V1A"."OBJECT_ID">=100 AND "V1A"."OBJECT_ID"<=200)

As you can see, the factored subquery (operations 2 and 3) has the query block name of sel$1 and the main query (operations 6 and 7, where the real t1 is used) has the query block name sel$2. So without giving the subquery a name I could have used the hint /*+ inline(@sel$1) */ in the main query block.
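
In other words – a sketch of the unnamed variant – the hint could reference the system-generated name directly:

with v1 as (
        select  object_id, object_type, object_name
        from    t2
        where   object_id is not null
)
select
        /*+ inline(@sel$1) */
        t1.object_id,
        t1.object_name
from    t1
where   exists (
                select  null
                from    v1 v1a
                where   v1a.object_id = t1.object_id
                and     v1a.object_type = 'TABLE'
        )
and     t1.object_id between 100 and 200
/

The obvious fragility of that approach is that any edit which changes the number or order of query blocks in the statement could change the system-generated names.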

This takes us on to the second point that needs investigation. If you’ve looked at the example on the Oracle Developer Forum you will have seen that there’s an SQL statement that references a stored view and the factored subquery of interest is defined in the view. This means we might be able to edit the query that calls the view to include a hint referencing the query block inside the view – but then what do we do if we can’t edit the main query itself?

To be investigated (1) – would the inline() hint with nominated query block work if the factored subquery was inside a stored view that we were using in our query?

To be investigated (2) – if (1) works, could we achieve the same result by using an SQL Patch to attach the hint to the main query text without editing the main query?
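
If (2) works, the mechanism would presumably be a sketch along these lines – untested here, with &m_sql_id standing in for the sql_id of the problem statement (dbms_sqldiag.create_sql_patch became a documented function in recent versions):

declare
        m_patch_name    varchar2(128);
begin
        m_patch_name := dbms_sqldiag.create_sql_patch(
                sql_id          => '&m_sql_id',
                hint_text       => 'inline(@cte)',
                name            => 'cte_inline_patch'
        );
        dbms_output.put_line('Patch created: ' || m_patch_name);
end;
/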

Update (Oct 2020)

It turns out that I discovered this enhancement a few months ago while doing some experimentation with recursive subquery factoring.

Update Nov 2020

A blog note from Nenad Noveljic warns of a surprising ORA-07445 if you get too trigger-happy with the inline() and materialize() hints.

October 6, 2020

Index FFS Cost 2

Filed under: Bugs,CBO,Indexing,Oracle,Statistics — Jonathan Lewis @ 12:58 pm BST Oct 6,2020

Here’s a little puzzle, highlighting a “bug that’s not a bug” that was “fixed but not fixed” some time in the 10.2 timeline. (If you want specifics about exactly when the fix became available and what patches might be available, they’re in MOS – Bug 5099019: DBMS_STATS DOESN’T COUNT LEAF_BLOCKS CORRECTLY.)

Running 19.3.0.0, with the system statistics as shown:


begin
        dbms_stats.set_system_stats('MBRC',16);
        dbms_stats.set_system_stats('MREADTIM',10);
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',500);
end;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                cascade          => true,
                estimate_percent => 100,
                method_opt       => 'for all columns size 1'
        );
end;
/

select
        index_name, leaf_blocks
from
        user_indexes
where
        table_name = 'T1'
;

alter system flush buffer_cache;

set autotrace traceonly 

select
        count(small_vc)
from    t1
where   small_vc is not null
;

set autotrace off

For my test run there are no uncommitted transactions; the gathering of table stats with cascade to indexes means all the blocks are clean, so there’s no undo activity needed to produce an answer for the final query, and all that the query is going to do is run an index fast full scan to count the number of rows in the table because there’s an index on just (small_vc).

The system stats tell us that Oracle is going to cost the index fast full scan by taking the leaf block count, dividing by 16 (the MBRC), multiplying by 2 (mreadtim / sreadtim) and adding a bit for CPU usage. So here are the results from running the query and generating the plan with autotrace:


PL/SQL procedure successfully completed.

INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
T1_V1                        153

1 row selected.

System altered.

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 274579903

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |    22   (5)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_V1 |  9999 |   107K|    22   (5)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("SMALL_VC" IS NOT NULL)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1530  consistent gets
       1522  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Points to notice:

  • The cost is (as predicted): ceiling(153 leaf_blocks / 16 MBRC) * 2 + a bit = 20 + a bit.
  • The number of physical blocks read is 1522, not the 153 leaf blocks reported in index stats
  • No recursive SQL, and I did say that there were no undo / read consistency issues to worry about

There is clearly an inconsistency between the size of the index and the (100% gathered) leaf_block count that the optimizer is using, and this is a point I made many years ago in “Cost Based Oracle – Fundamentals”. To gather the leaf block count Oracle has looked only at the leaf blocks that are currently in use (i.e. the blocks still holding index entries after cleanout) – and here’s how I prepared this demo:


rem
rem     Script:         index_ffs_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2012
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid wordpress format issue
;

create index t1_v1 on t1(small_vc) pctfree 80;

delete
        from t1
where
        id between 5000 and 95000
;

commit;

When I gathered stats on the index I had just deleted 90% of the data from the table – so 90% of the leaf blocks in the index were empty but still in the index structure, and only 10% of the leaf blocks held any current index entries. This is why Oracle reported leaf_blocks = 153 while the index fast full scan had to read through nearly 1530 blocks.

This is one of those contradictory problems – for an index fast full scan you need to know the size of the segment that will be scanned because that’s the number of blocks you will have to examine; but in most cases the number of populated index leaf blocks is the number you need to know about when you’re trying to cost an index range scan. Of course in most cases the nature of Oracle’s implementation of B-tree indexes will mean that the two counts will be within one or two percent of each other. But there are a few extreme cases where you could get an index into a state where the segment size is large and the data set is small and you don’t want the optimizer to think that an index fast full scan will be low-cost.

Oracle produced a mechanism for getting the segment block count captured as the leaf_blocks statistic late in 10.2, but it’s not implemented by default, and it’s not something you can tweak into a query with the opt_param() hint. Fix control 5099019 has the description: “set leaf blocks to the number of blocks in the index extent map”, but it’s not a run-time fix, it’s a fix that has to be in place when you gather the index stats – thus:

alter session set "_fix_control"='5099019:1';

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                cascade          => true,
                estimate_percent => 100,
                method_opt       => 'for all columns size 1'
        );
end;
/

select
        index_name, leaf_blocks
from
        user_indexes
where
        table_name = 'T1'
;

alter system flush buffer_cache;

set autotrace traceonly 

select
        count(small_vc)
from    t1
where   small_vc is not null
;

set autotrace off

===================================================

Session altered.

PL/SQL procedure successfully completed.

INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
T1_V1                       1555

1 row selected.

System altered.

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 274579903

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   201   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_V1 |  9999 |   107K|   201   (3)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("SMALL_VC" IS NOT NULL)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1530  consistent gets
       1522  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        439  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Session altered.

As you can see Oracle has now set leaf_blocks = 1555. This is actually the number of blocks below the highwater mark in the segment. A further check with the dbms_space and dbms_space_usage packages showed that the number of blocks in the index structure was actually 1,523 with a further 32 blocks that we could infer were space management blocks. Of the 1,523 blocks in the index structure 157 were reported as “FULL” while 1,364 were reported as FS2, which possibly ought to mean “available for re-use” (though still in the structure), although this didn’t quite seem to be the case a few years ago.
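
For reference, the space check I’ve described would be based on a call like the following sketch of dbms_space.space_usage (which applies to ASSM segments):

declare
        m_unf   number; m_unf_b  number;
        m_fs1   number; m_fs1_b  number;
        m_fs2   number; m_fs2_b  number;
        m_fs3   number; m_fs3_b  number;
        m_fs4   number; m_fs4_b  number;
        m_full  number; m_full_b number;
begin
        dbms_space.space_usage(
                segment_owner      => user,
                segment_name       => 'T1_V1',
                segment_type       => 'INDEX',
                unformatted_blocks => m_unf,   unformatted_bytes => m_unf_b,
                fs1_blocks         => m_fs1,   fs1_bytes         => m_fs1_b,
                fs2_blocks         => m_fs2,   fs2_bytes         => m_fs2_b,
                fs3_blocks         => m_fs3,   fs3_bytes         => m_fs3_b,
                fs4_blocks         => m_fs4,   fs4_bytes         => m_fs4_b,
                full_blocks        => m_full,  full_bytes        => m_full_b
        );
        dbms_output.put_line('FULL blocks: ' || m_full || '  FS2 blocks: ' || m_fs2);
end;
/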

Although Oracle has supplied a fix to a problem I highlighted in CBO-F, I can understand why it’s not enabled by default, and I don’t think I’d want to take advantage of it in a production system given that it’s a session setting at stats gathering time. On the few occasions it’s likely to matter I’d probably add hints to the SQL to stop the optimizer from using the index incorrectly, or do something at a stats-gathering moment with a call to dbms_stats.set_index_stats().
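
The stats-fixing alternative might be as simple as this sketch – using the 1,555 block figure from the fix-controlled gather (numlblks is the set_index_stats parameter for the leaf block count):

begin
        dbms_stats.set_index_stats(
                ownname         => user,
                indname         => 'T1_V1',
                numlblks        => 1555,
                no_invalidate   => false
        );
end;
/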

October 1, 2020

Index FFS Cost

Filed under: Bugs,CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 11:46 am BST Oct 1,2020

There are a number of unexpected issues with the optimizer’s treatment of the index fast full scan, the access path where Oracle ignores the structure of the B-tree and uses multiblock reads to do a brute-force segment scan as if the index were a “skinny table” with a few blocks of irrelevant garbage (i.e. the branch blocks) that could be ignored.

A recent comment made a few days ago on a blog note about the optimizer’s “index-join” access path reminded me that I had a few notes to finish and publish that might help some people address performance issues relating to a couple of little-known fixes for optimizer oddities in this area. Today’s “fix” is one that appeared in 10g (or maybe very late in 9i) but never became the default optimizer behaviour (possibly because it’s one of those “not a bug” types of bug with no “always correct” strategy).

The problem revolves around the optimizer_index_cost_adj parameter (which is another reason why the fix might have been kept out of the standard code base). For index-only execution plans the parameter will apply to the index range scan, full scan, or skip scan, but not to the index fast full scan. Here’s a model to demonstrate this:


rem
rem     Script:         oica_iffs.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2012
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        lpad(trunc(dbms_random.value(0,10000)),10)      v1,
        rpad('x',20)                                    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(v1);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                cascade          => true,
                method_opt       => 'for all columns size 1'
        );
end;
/

select blocks from user_tables where table_name = 'T1';
select leaf_blocks from user_indexes where index_name = 'T1_I1';

I’ve created a fairly narrow table with 1M rows, and created an index on that table on the v1 column. In my test there were 5,219 blocks in the table and 3,068 blocks in the index.

I’m now going to run a query to count the rows in the table where v1 is not null and report its execution plan. The first run will be unhinted, the second run will include a hint that forces an index full scan, and the third run will include a hint to force an index fast full scan.


prompt  ==============
prompt  Unhinted query
prompt  ==============

select 
        count(*) 
from
        t1
where
        v1 is not null
;

prompt  ==========
prompt  Index hint
prompt  ==========

select 
        /*+ index(t1) */
        count(*) 
from
        t1
where
        v1 is not null
;

prompt  ==============
prompt  index_ffs hint
prompt  ==============

select 
        /*+ index_ffs(t1) */
        count(*) 
from
        t1
where
        v1 is not null
;

Here are the three execution plans. As you might have predicted the default plan (for my Oracle 19.3) is the index fast full scan:


==============
Unhinted query
==============
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

==========
Index hint
==========
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |  3159   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|  3159   (3)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

==============
index_ffs hint
==============
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

Since the size of the index is less than the size of the table the cost of the index fast full scan will be lower than the cost of a tablescan; and since an index fast full scan does multiblock reads while the index full scan is assumed to use single block reads the cost of the index fast full scan will be lower than the cost of the index full scan.

Now we set the optimizer_index_cost_adj to 1 (as happens in a number of “legacy” commercial products) and repeat the experiment. Here are the three plans (reduced to just the body of the plan):
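
For reference, the change is just a session-level parameter setting – and the value 1 is the extreme low end of the parameter’s legal range of 1 to 10000:

alter session set optimizer_index_cost_adj = 1;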

==================
OICA = 1, Unhinted
==================
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |    32   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------

====================
OICA = 1, index hint
====================
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |    32   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------

========================
OICA = 1, index_ffs hint
========================
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

The default plan is now the index full scan, not the index fast full scan; and when we look at the two hinted plans we can see why. The full scan cost has been reduced to 1% of its original value (3159 * 1/100, rounded, = 32) as dictated by the setting for optimizer_index_cost_adj; the fast full scan cost has not been adjusted and is still 450.

So is this a good thing or a bad thing? Different people may answer that question differently. There are clearly going to be cases where changing the setting for optimizer_index_cost_adj (which you shouldn’t have been doing anyway for the last 20 years or so) will result in some plans switching from fast full scans to full scans, with unfortunate consequences;

On the other hand when you do a fast full scan it may do direct path reads, while the full scan will do cached reads, and the cache may be holding a lot of the index leaf blocks already, so some people might prefer the current behaviour;

On the other, other hand an Exadata system can use storage indexes when doing direct path segment scans (even when the segment is an index) so an Exadata fast full scan might be far more desirable than a partially cached full scan.

So no generic right answer, only specific answers for particular sets of circumstances.

Solutions

This issue was addressed some time in the 10g timeline – my notes say the earliest successful tests I did of the fix were on 10.2.0.4, and that it wasn’t available in 9.2.0.8. There are two possibilities – a fix control, and an event.

  • Fix control: 4483286 – described as “Discount FFS cost using optimizer_index_cost_adj”
  • Event: 38085 at level one. I don’t think this event has ever appeared in the oraus.msg file, but it is referenced by bug 4483286.

Of the two, setting the fix control is probably the safer option since the event is (apparently) non-existent and there’s always the possibility that the number will be hi-jacked in a future release to do something completely different. It’s also worth noting that the fix control can be applied to an individual statement through the /*+ opt_param() */ hint (there’s an example of the syntax towards the end of another article of mine).
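
By way of illustration – a sketch of the statement-level option, not the example from that article – the query could be hinted like this:

select
        /*+ opt_param('_fix_control' '4483286:1') */
        count(*)
from
        t1
where
        v1 is not null
;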

A little demonstration:


-- alter session set events '38085 trace name context forever, level 1';
alter session set "_fix_control"='4483286:1';
alter session set optimizer_index_cost_adj = 1;

set autotrace traceonly explain

prompt  ==============
prompt  Unhinted query
prompt  ==============

select 
        count(*) 
from
        t1
where
        v1 is not null
;

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------

With the event or fix control set the cost of the index fast full scan drops to 1% of its former value and the fast full scan reappears as the default (unhinted) path.


September 9, 2020

Bloom Upgrade

Filed under: 18c,CBO,Joins,Oracle,Tuning — Jonathan Lewis @ 5:18 pm BST Sep 9,2020

It’s a common pattern of Oracle features that they start with various restrictions or limitations that disappear over time. This note is about an enhancement to Bloom filter processing that appeared in the 18.1 optimizer and, for some people, may be a good enough reason for upgrading to a newer version of Oracle. This enhancement came to my attention by way of the Oracle Developer forum in a thread with the title Bloom filters and view using UNION ALL, asking how to get a Bloom filter pushed inside a UNION ALL view. The original requirement wasn’t a trivial one so I’ll demonstrate the problem with a very simple example – first the data set:

rem
rem     Script:         bloom_pushdown.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2020
rem
rem     Last tested:
rem             19.3.0.0
rem

create table t1 as select * from all_objects where rownum <= 50000;
create table t2 as select t1.* from t1, (select rownum n1 from dual connect by level <= 4);
create table t3 as select t1.* from t1, (select rownum n1 from dual connect by level <= 4); -- > comment to avoid wordpress format issue

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns object_type size 254'
        );
end;
/

I’ve been a bit lazy here, copying data from view all_objects. I’ve gathered stats on t1 so that I can generate a histogram on the object_type column because I’m going to query for a rare object_type and I want the optimizer to get a reasonable estimate of rows. I’m going to hint a parallel query to join t1 to t2 (aliased, trivially, as v1 for reasons that will become apparent soon):

select
        /*+ 
                parallel(2) 
        */
        t1.object_name, v1.object_name
from
        t1,
        t2 v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

In my case the optimizer chooses to do a hash join between these two tables, and creates a Bloom filter to try and minimise the data passing through the data flow operation. The result set in my 12.2.0.1 database is only 16 rows, so it would be nice if the parallel scan could eliminate most of the 200,000 rows in t2 early – here’s the execution plan pulled from memory after running the query with rowsource execution stats enabled:


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |      1 |        |   371 (100)|     16 |00:00:00.06 |      20 |      0 |       |       |          |
|   1 |  PX COORDINATOR        |          |      1 |        |            |     16 |00:00:00.06 |      20 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)  | :TQ10000 |      0 |     16 |   371   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN           |          |      2 |     16 |   371   (5)|     16 |00:00:00.05 |    6278 |   3988 |  1250K|  1250K|     2/0/0|
|   4 |     JOIN FILTER CREATE | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|*  5 |      TABLE ACCESS FULL | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|   6 |     JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.04 |    4244 |   3988 |       |       |          |
|   7 |      PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.04 |    4244 |   3988 |       |       |          |
|*  8 |       TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|     16 |00:00:00.03 |    4244 |   3988 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
   8 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"V1"."OBJECT_ID"))

We see that Oracle has generated a Bloom filter at operation 4 from the data returned from t1 at operation 5, and then used that Bloom filter at operation 6 to eliminate most of the data from t2 before passing the remaining few rows up to the hash join.

Let’s make the query more interesting – what if you want to use a UNION ALL of t2 and t3 in the query (for example one might be “current data” while the other is “historic data”). Here’s the query and plan from 12.2.0.1:

select
        /*+ 
                parallel(2) 
        */
        t1.object_name, v1.object_name
from
        t1,
        (select * from t2 union all select * from t3) v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |   667 (100)|     32 |00:00:00.37 |      40 |      0 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |     32 |00:00:00.37 |      40 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |      0 |     32 |   667   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN            |          |      1 |     32 |   667   (5)|     32 |00:00:00.34 |    5125 |   3860 |  1250K|  1250K|     2/0/0|
|*  4 |     TABLE ACCESS FULL   | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|   5 |     VIEW                |          |      2 |    400K|   584   (4)|    400K|00:00:00.52 |    8488 |   7976 |       |       |          |
|   6 |      UNION-ALL          |          |      2 |        |            |    400K|00:00:00.24 |    8488 |   7976 |       |       |          |
|   7 |       PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|    200K|00:00:00.11 |    4244 |   3988 |       |       |          |
|*  8 |        TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|    200K|00:00:00.07 |    4244 |   3988 |       |       |          |
|   9 |       PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|    200K|00:00:00.11 |    4244 |   3988 |       |       |          |
|* 10 |        TABLE ACCESS FULL| T3       |     32 |    200K|   292   (4)|    200K|00:00:00.03 |    4244 |   3988 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   4 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
   8 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
  10 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue

No Bloom filter – so all 400,000 rows feed up the plan and through the hash join. This won’t matter too much for my sub-second tiny data set but on a pair of 50GB tables, with the potential to offload the Bloom filter to storage in Exadata and, perhaps, eliminate 99% of the data at the cell servers, this could make a huge difference to performance.

Since Bloom filters are all about hashing data (in Oracle the standard Bloom filter is the bitmap summarising the build table in a hash join) let’s try pushing the optimizer into a hash distribution for the parallel join to see if that has any effect:


select
        /*+ 
                parallel(2) 
                gather_plan_statistics
                leading(@sel$1 t1@sel$1 v1@sel$1)
                use_hash(@sel$1 v1@sel$1)
                pq_distribute(@sel$1 v1@sel$1 hash hash)
        */
        t1.object_name, v1.object_name
from
        t1,
        (select * from t2 union all select * from t3) v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |      1 |        |   667 (100)|     32 |00:00:00.43 |      60 |      0 |       |       |          |
|   1 |  PX COORDINATOR             |          |      1 |        |            |     32 |00:00:00.43 |      60 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |      0 |     32 |   667   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN BUFFERED       |          |      1 |     32 |   667   (5)|     32 |00:00:00.38 |    4000 |   3752 |  2290K|  2082K|     2/0/0|
|   4 |     JOIN FILTER CREATE      | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX RECEIVE             |          |      2 |      4 |    75   (4)|      8 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |      0 |      4 |    75   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   7 |        STATISTICS COLLECTOR |          |      2 |        |            |      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|   8 |         PX BLOCK ITERATOR   |          |      2 |      4 |    75   (4)|      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|*  9 |          TABLE ACCESS FULL  | T1       |     26 |      4 |    75   (4)|      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|  10 |     PX RECEIVE              |          |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  12 |       JOIN FILTER USE       | :BF0000  |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  13 |        VIEW                 |          |      2 |    400K|   584   (4)|    400K|00:00:00.68 |    8488 |   7976 |       |       |          |
|  14 |         UNION-ALL           |          |      2 |        |            |    400K|00:00:00.59 |    8488 |   7976 |       |       |          |
|  15 |          PX BLOCK ITERATOR  |          |      2 |    200K|   292   (4)|    200K|00:00:00.18 |    4244 |   3988 |       |       |          |
|* 16 |           TABLE ACCESS FULL | T2       |     32 |    200K|   292   (4)|    200K|00:00:00.06 |    4244 |   3988 |       |       |          |
|  17 |          PX BLOCK ITERATOR  |          |      2 |    200K|   292   (4)|    200K|00:00:00.12 |    4244 |   3988 |       |       |          |
|* 18 |           TABLE ACCESS FULL | T3       |     32 |    200K|   292   (4)|    200K|00:00:00.08 |    4244 |   3988 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue 
       filter("T1"."OBJECT_TYPE"='SCHEDULE') 
  16 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue
  18 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue

We’ve managed to introduce a Bloom filter (which is visible as :BF0000 in the plan, even though there’s no reference to sys_op_bloom_filter() in the predicate information) but there’s a problem: we’re still passing 400,000 rows up the plan and the Bloom filter is only being applied at (or just after) the VIEW operator, discarding all but 66 rows before doing the hash join. It’s an improvement but not ideal; we’d like to see the Bloom filter applied to each of the two tables separately to eliminate rows as early as possible.

This can’t be done in 12.2, and you’d have to rewrite the query, changing a “join with union” into a “union of joins”, and that’s not really a desirable strategy.

Next Steps

Searching MOS, though, you will be able to find the following note:

Doc ID 18849313.8 – ENH : bloom filters/pruning are pushed through union-all view

There’s an enhancement request to do what we want, and the enhancement got into the software in 18.1. Here’s the (unhinted) plan from 19.3 (the plan stays the same when optimizer_features_enable is set back to 18.1.0, but drops back to the earlier 12.2.0.1 plan when OFE is set to 12.2.0.1):

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |   666 (100)|     32 |00:00:00.11 |      10 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     32 |00:00:00.11 |      10 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)    | :TQ10000 |      0 |     32 |   666   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN             |          |      2 |     32 |   666   (5)|     32 |00:00:00.05 |   10020 |   7958 |  1250K|  1250K|     2/0/0|
|   4 |     JOIN FILTER CREATE   | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    1998 |      0 |       |       |          |
|*  5 |      TABLE ACCESS FULL   | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    1998 |      0 |       |       |          |
|   6 |     VIEW                 |          |      2 |    400K|   583   (4)|     32 |00:00:00.04 |    8022 |   7958 |       |       |          |
|   7 |      UNION-ALL           |          |      1 |        |            |     12 |00:00:00.02 |    4011 |   3979 |       |       |          |
|   8 |       JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|   9 |        PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|* 10 |         TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|  11 |       JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|  12 |        PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|* 13 |         TABLE ACCESS FULL| T3       |     32 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
  10 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."OBJECT_ID")) 
  13 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T3"."OBJECT_ID"))

As you can see, we create a Bloom filter at operation 4, and use it twice at operations 8 and 11 – with the sys_op_bloom_filter() functions clearly visible in the predicate information showing us that the Bloom filter is applied to the object_id column in both cases.

If you want to disable this enhancement for some reason there are two hidden parameters available, both defaulting to true, which you might set to false for a single query using the opt_param() hint:

  • _bloom_filter_setops_enabled = true
  • _bloom_pruning_setops_enabled = true

The first is for Bloom filters in the situation shown; I assume the second deals with Bloom filters for partition pruning.
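
As a sketch (not something taken from the original thread), a hinted version of the test query that disables the Bloom filter enhancement might look like this:

select
        /*+
                parallel(2)
                opt_param('_bloom_filter_setops_enabled' 'false')
        */
        t1.object_name, v1.object_name
from
        t1,
        (select * from t2 union all select * from t3) v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/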

Summary

In versions prior to 18.1 the optimizer is unable to push Bloom filters down to the individual tables in a UNION ALL view, but this limitation was removed in the 18.1 code set.


July 10, 2020

Recursive WITH upgrade

Filed under: ANSI Standard,CBO,Execution plans,Oracle,Subquery Factoring,Upgrades — Jonathan Lewis @ 4:19 pm BST Jul 10,2020

There’s a notable change in the way the optimizer does cost and cardinality calculations for recursive subquery factoring that may make some of your execution plans change – with a massive impact on performance – as you upgrade to any version of Oracle from 12.2.0.1 onwards. The problem appeared in a question on the Oracle Developer Community forum a little while ago, with a demonstration script to model the issue.

I’ve copied the script – with a little editing – and reproduced the change in execution plan described by the OP. Here’s my copy of the script, with the insert statements that generate the data (all 1,580 of them) removed.

rem
rem     Script:         recursive_with_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem
rem     Notes:
rem     https://community.oracle.com/thread/4338248
rem
rem     The upgrade to 12.2.0.1 made this query much slower (on 15,000 rows)
rem     Setting OFE to 12.1.0.1 is a first possible fix for the issue.
rem     The scale is too small to see much difference in this case
rem

drop table test_folder purge;

create table test_folder(
        fldr_key                number(16,0)            not null        enable,                 
        fldr_id                 varchar2(255 byte)      not null        enable,                 
        fldr_desc_tx            varchar2(255 byte),                     
        par_fldr_key            number(16,0),                   
        seus_key                number(16,0)            not null        enable,                 
        fldr_private_flg        varchar2(1 byte)        not null        enable,                 
        last_updt_dt            date                    not null        enable,                 
        last_upby_seus_key      number(16,0)            not null        enable,                 
        lock_seq_nbr            number(9,0) default 0   not null        enable,                 
        content_guid            raw(16),                
        constraint test_folder_pk primary key (fldr_key)                
)       
;              

-- list of insert statements

alter table test_folder add constraint test_folder_fk  
        foreign key (par_fldr_key) references test_folder(fldr_key)
;  
  
create or replace force editionable view test_folder_vw (fldr_key) as   
with rec_path(fldr_key)  as (
        select  tf.fldr_key  
        from    test_folder tf where tf.par_fldr_key is null  
        union all  
        select  tf.fldr_key  
        from    test_folder tf, rec_path  
        where   rec_path.fldr_key = tf.par_fldr_key
)  
select fldr_key  
from rec_path   
; 

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'TEST_FOLDER',
                method_opt  => 'for all columns size 1'
        );
end;
/


select * from test_folder_vw where fldr_key = -41;  

I’ve run the test 3 times. First in 12.2.0.1 with no tweaking; then in 12.2.0.1 with the hint /*+ optimizer_features_enable('12.1.0.2') */; and finally in a genuine 12.1.0.2 environment. In all three cases I enabled rowsource execution stats ('alter session set statistics_level = all') and pulled the plans from memory – with the following results:

First, the base result from 12.1.0.2

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.03 |     604 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |    801 |      1 |00:00:00.03 |     604 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.03 |     604 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      68 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |    640 |   1419 |00:00:00.02 |     536 |  1696K|  1696K| 1488K (0)|
|   5 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
|*  6 |     TABLE ACCESS FULL                     | TEST_FOLDER |      8 |   1419 |  11352 |00:00:00.01 |     536 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41))
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   6 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

Two points to note, in particular. First that the hash join has the recursive with pump as its first (build table) child and the table access full of test_folder as its second child (probe table); secondly that there is no value given for E-Rows for the recursive with pump.

Now the 12.2.0.1 plan:

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.01 |      47 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |   2524K|      1 |00:00:00.01 |      47 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.01 |      47 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      24 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |   2524K|   1419 |00:00:00.01 |      23 |  1743K|  1743K| 1632K (0)|
|   5 |     BUFFER SORT (REUSE)                   |             |      8 |        |  11352 |00:00:00.01 |      23 | 73728 | 73728 |          |
|*  6 |      TABLE ACCESS FULL                    | TEST_FOLDER |      1 |   1419 |   1419 |00:00:00.01 |      23 |       |       |          |
|   7 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41))
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   6 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

There are three changes to notice in this plan – which (for the OP) was much slower than the 12.1.0.2 plan. First, the order of the hash join has changed: the recursive with pump is now the second child (probe table) in the join (and again shows no value for E-Rows); secondly Oracle has introduced an extra operation – the buffer sort (reuse), populated by the table access full – as the build table; thirdly (presumably the point of the buffer sort (reuse) operation) the number of buffer visits has dropped from a few hundred to a couple of dozen.

Finally let’s check what happens if we set the OFE (optimizer_features_enable) to 12.1.0.2 while running 12.2.0.1

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.01 |      47 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |    801 |      1 |00:00:00.01 |      47 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.01 |      47 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      24 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |    640 |   1419 |00:00:00.01 |      23 |  1797K|  1797K| 1573K (0)|
|   5 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
|   6 |     BUFFER SORT (REUSE)                   |             |      8 |        |  11352 |00:00:00.01 |      23 | 73728 | 73728 |          |
|*  7 |      TABLE ACCESS FULL                    | TEST_FOLDER |      1 |   1419 |   1419 |00:00:00.01 |      23 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41))
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   7 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

In these conditions the recursive with pump has gone back to being the build table (first child); but it’s worth noting that the 12.2 buffer sort (reuse) is still in place – saving us a few hundred buffer gets (and, for a bigger table, possibly a number of disc reads). Downgrading the optimizer_features_enable has given us the plan we needed, but in the process we’ve got an example showing that hacking the parameter isn’t a guarantee that we will get exactly the path we used to get in the older version.

The story so far.

It seems that we can address the performance problem that the OP had by setting optimizer_features_enable to the older version – possibly through a hint embedded in the SQL, perhaps through an SQL Baseline or SQL Patch. Maybe we’ll have to have a logon trigger that sets the parameter for particular users or, worst case scenario, maybe we’ll have to set the parameter at the system level. Given how undesirable the last option could be it would be nice to know exactly what is causing the change in plan.

As a basic clue – if the order of tables in a hash join reverses itself this usually means that the byte (not row) estimates have changed. The optimizer will use the table with the lower byte count as the build table in a hash join. So the recursive with pump – whose row and byte estimates don’t appear – must have produced larger numbers in 12.2.0.1.

A change in the 12.2 plan that I haven’t yet mentioned is the E-rows for the hash join; it’s gone up from 640 (12.1.0.2) to 2.5 million! So let’s repeat the tests with the CBO (10053) trace file enabled and see if we can find a number like 2524K appearing as a join estimate in the trace file. Having created the two trace files (in 12.2.0.1, one with the OFE set backwards) I executed the following grep command against the trace files:
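
If you want to reproduce the trace files, this is roughly the mechanism – the identifier here matches the grep commands below, and the event syntax is the standard one for the 10053 (CBO) trace:

alter session set tracefile_identifier = 'base';
alter session set events '10053 trace name context forever, level 1';

select * from test_folder_vw where fldr_key = -41;

alter session set events '10053 trace name context off';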

grep -n "^Join Card - Rounded" orcl12c_ora_5524_ofe.trc
grep -n "^Join Card - Rounded" orcl12c_ora_5524_base.trc

I’d set the tracefile_identifier to ‘ofe’ and ‘base’ respectively for the 12.1.0.2 and 12.2.0.1 tests, and here are the results:

grep -n "^Join Card - Rounded" orcl12c_ora_5524_ofe.trc
1166:Join Card - Rounded: 640 Computed: 639.941176

grep -n "^Join Card - Rounded" orcl12c_ora_5524_base.trc
1195:Join Card - Rounded: 640 Computed: 639.941176
1391:Join Card - Rounded: 2544 Computed: 2543.865546
1576:Join Card - Rounded: 10112 Computed: 10111.865546
1737:Join Card - Rounded: 40193 Computed: 40193.075630
1898:Join Card - Rounded: 159759 Computed: 159758.731092
2059:Join Card - Rounded: 635008 Computed: 635008.462185
2220:Join Card - Rounded: 2524023 Computed: 2524023.394958
2269:Join Card - Rounded: 2524023 Computed: 2524023.394958

That’s an interesting clue. Something seems to be calculating a larger and larger value in the 12.2.0.1 trace, starting with the hash join cardinality that appeared in 12.1.0.2, growing by a factor of nearly 4 each time, and ending with the hash join cardinality we saw in the 12.2.0.1 plan.

Taking a closer look at the content of the 12.2.0.1 trace file it turned out that every stage in that escalation was Oracle recalculating the cost and cardinality of joining test_folder (the table) and rec_path (the “with” subquery) using the figures from the previous join calculation as the new base figures for rec_path. In effect the optimizer was calculating the cost of following the recursive subquery down to its 7th level of recursion.

Side note: in agreement with my comment about the smaller (in bytes) rowsource being used as the build table, the initial join order started as (test_folder, rec_path) in the first two iterations, but switched to (rec_path, test_folder) from the 3rd iteration onwards.

So we’ve identified the mechanics that cause the change in plan; the question now is: why 7 iterations to the final cost? (Briefly, I did a quick check to see how many circles of hell there were in Dante’s Inferno – but it’s 9, or 10 depending how you count.) A quick check of v$parameter (and the x$ tables for the hidden parameters) revealed the following:

Name                                     Value
------------------------------------ ---------
_recursive_with_branch_iterations            7
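
(In case you want to do the same check yourself, this is the flavour of query – which has to be run as SYS – for reporting hidden parameters; a sketch from memory rather than the exact statement I used:)

select
        x.ksppinm       name,
        y.ksppstvl      value
from
        x$ksppi         x,
        x$ksppcv        y
where
        y.indx = x.indx
and     x.ksppinm like '%recursive_with%'
;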

Setting this parameter to 1 in the session, or adding the hint /*+ opt_param('_recursive_with_branch_iterations' 1) */ to the query, resulted in the 12.1.0.2 plan appearing in 12.2.0.1 – and this is a much less intrusive way of getting the plan we’re interested in than setting the entire OFE back to 12.1.0.2. One might even set the parameter in the spfile (after receiving approval from Oracle Corp., of course) given how precisely targeted it is (and knowing that it doesn’t switch off that nice little buffering trick).
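
For clarity, here are the two mechanisms side by side – remember that hidden parameters have to be double-quoted in an alter session statement:

alter session set "_recursive_with_branch_iterations" = 1;

select
        /*+ opt_param('_recursive_with_branch_iterations' 1) */
        *
from    test_folder_vw
where   fldr_key = -41;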

Summary

From 12.2 onwards the optimizer does recursive recosting of recursive “with” subqueries. This means the cost and cardinality estimates of a plan may change and the impact may cause a significant change in performance – it certainly did for the OP.

The change seems to be driven by the hidden parameter _recursive_with_branch_iterations, which was introduced in 12.2.0.0 with a default value of 7. Setting this parameter to 1 reverts the optimizer to the pre-12.2 behaviour. If you run into a problem of recursive “with” subqueries changing plans and causing performance problems on an upgrade from pre-12.2 to a newer version of Oracle then it would be worth investigating this parameter as the least intrusive way of reverting back to the original plans.

Update (July 2020)

I’ve just been prompted to check MOS for any references to the hidden parameter – and discovered a note that was published in September 2018 and updated in Feb 2019. It’s amazing how easy it can be to find an answer on MOS when you already know what the answer is ;) Oracle 12.2.0.1 CBO calculating high cost/CPU for queries with recursive sub-query (Doc ID 2443466.1)

This gives two workarounds to the problem of a change in cost in 12.2 – set the optimizer_features_enable to 12.1.0.2, or set the hidden parameter to 1. It references two bugs (one a duplicate of the other, both apparently unpublished):

  • Bug 23515289 : PERFORMANCE REGRESSION OBSERVED WITH RECURSIVE WITH SERIAL PLAN
  • Bug 24566985 : UPG: QUERY PERFORMANCE ON ALL_TSTZ_TABLES 160 TIMES SLOWER THAN 11.2.0.4

and the Permanent Fix for the problem is to install the patch for Bug 24566985 on 12.2.0.1.


June 29, 2020

Most Recent – 2

Filed under: CBO,Execution plans,Oracle,Tuning — Jonathan Lewis @ 1:02 pm BST Jun 29,2020

A question arrived in my email a few days ago with the following observations on a statement that was supposed to query the data dictionary for some information about a specified composite partitioned table. The query was wrapped in a little PL/SQL, similar to the following:

declare
        v_src_part      varchar2(30) := null;
        v_tab           varchar2(30)  := 'PT_COMPOSITE_1';
begin

        select
                /*+ qb_name(main) */
                uts1.subpartition_name
        into    v_src_part
        from
                user_tab_subpartitions uts1
        where
                uts1.table_name = v_tab
        and     uts1.last_analyzed is not null
        and     uts1.num_rows = (
                        select
                                /*+ qb_name(max_subq) */
                                max (uts2.num_rows)
                        from
                                user_tab_subpartitions uts2
                        where
                                uts2.table_name = /* v_tab */ uts1.table_name
                )
        and     rownum = 1
        ;
end;
/

The requirement is simple: identify the subpartitions of a specific table that have the largest number of rows of any subpartition of the table – but report only the first match.

You’ll notice that the where clause of the subquery has a commented “v_tab” in it. This is the PL/SQL variable used in the outer query block to identify the target table, and it shouldn’t really make any difference if I use the PL/SQL variable in the subquery rather than using a correlating column. However, the question that came with this block of code was as follows:

All the partitions and subpartitions had their stats when running the test. On a first run using the correlated subquery the block reported oracle error ORA-01403: no data found. Changing the code to use the PL/SQL variable the block reported a specific subpartition as expected. A few hours later (after changing the code back to use the correlated subquery) the block reported the same subpartition. Have you ever seen anything like this? The Oracle version is 12.1.0.2.

Rule 1, of course, is to be a little sceptical when someone says “Honest, Guv, the stats are all okay”. But I’m going to assume that the statistics on this table really were complete and that there was no “data-related” reason for this query to behave in such a surprising way.

The email is an invitation to consider two points.

  1. This looks like a bug: the two versions of the query are logically equivalent and should return the same results if the underlying data had not changed. (In fact, I think the only “legal” way that the query could return ORA-01403 is if there were no stats on any subpartitions of the table in question – any ordinary usage of the dbms_stats package other than delete_table_stats() would have ensured that the query had to find something.) So, the first run of the correlated subquery produced no data while the modified query did get a result. That suggests a problem with some transformation in the 12.1.0.2 code to handle correlated aggregate subqueries.
  2. How could the second execution of the version with the correlated subquery produce a result a few hours later? Here are a couple of possibilities:
    • Someone had gathered dictionary stats (i.e. on the tables used by the query, not on the subpartitioned table) in the “few hours” gap so the optimizer picked a different execution plan which bypassed the bug.
    • (minor variation on previous) Someone had gathered dictionary stats when the first execution plan was already in memory but the “auto_invalidate” option for cursor invalidation meant that the query didn’t get re-optimised for a few hours.
    • Nothing changed, but the query had been flushed from the library cache and did need re-optimisation a few hours later. Since the version is 12.1.0.2 this means statistics feedback or automatic SQL directives could have had an impact – which means there may be dynamic sampling during optimisation – and a different set of random samples could have resulted in a different execution plan.
    • Other …

The interesting bit

There is a generic feature about this question that is more interesting than the “what went wrong, how could I get different results” aspect, and it’s in the choice you can make between using a correlation column and repeating a pl/sql variable (or literal value).

The switch to using a pl/sql variable turns the subquery into a single-row, “standalone”, subquery – one that could be run without any reference to the outer query – and this imposes a dramatic change on what the optimizer can do. Sometimes that change will make a huge difference to the optimisation time and the run time.

As a correlated subquery the notional “first strategy” for the optimizer is:

“for each row in the outer query execute the inner query as a filter subquery, passing in the correlation value”

If you take the “standalone” approach the optimizer will be looking for a plan that says (in effect):

“run the subquery once to generate a constant that you will need to execute the rest of the query”

Running the subquery once rather than once per row is likely to be a good idea – on the other hand Oracle can do “scalar subquery caching” so if the value of the correlation column is always the same the correlated subquery will actually run only once anyway.

More importantly, when the optimizer sees a correlated subquery it will consider unnesting it and then transforming it in various other ways; and it might take the optimizer a long time to work out what it can and can’t do, and the plan it finally does produce may be much slower than what it could have done if it had not unnested the subquery.

Some test results

So I ran 3 variations of the PL/SQL block on Oracle 19.3.0.0 with the CBO trace (10053) enabled and picked out a few highlights. The three tests in order were:

  1. Use the pl/sql variable so the subquery could run as a standalone query
  2. Use the correlating column to make the subquery a correlated subquery
  3. Use the correlating column, but add the hint /*+ no_unnest */ to the subquery (see the sketch just after this list).
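
For reference, case 3 simply moves the hint into the subquery of the original block – a sketch:

        and     uts1.num_rows = (
                        select
                                /*+ qb_name(max_subq) no_unnest */
                                max (uts2.num_rows)
                        from
                                user_tab_subpartitions uts2
                        where
                                uts2.table_name = uts1.table_name
                )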

The results were as follows – first the timing, then a critical measure that explains the timing:

  • Case 1 – standalone subquery – total time 0.82 seconds
  • Case 2 – correlated subquery – total time 5.76 seconds
  • Case 3 – correlated subquery with no_unnest hint – total time 0.84 seconds

Where did all that extra time go? A lot of it went in optimisation. How many “Join Orders” were examined for each query?

  • Case 1 – standalone subquery – 90 join orders
  • Case 2 – correlated subquery – 863 join orders
  • Case 3 – correlated subquery with no_unnest hint – 90 join orders

If you’re wondering what the 773 extra join orders were about, here’s a clue. I extracted all the lines from the case 2 trace file that started with “SU:” – those are the lines tagged for “Subquery Unnest” – using a call to grep -n "^SU:" {tracefile name}, and this is the result:


  2945:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest.
  2947:SU: Considering subquery unnest on query block SEL$071BB01A (#1).
  2948:SU:   Checking validity of unnesting subquery SEL$4F5F2F29 (#2)
  2949:SU:   Passed validity checks, but requires costing.
  2950:SU: Using search type: exhaustive
  2951:SU: Starting iteration 1, state space = (2) : (1)
  2952:SU:   Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2)
  3089:SU: Costing transformed query.
 66112:SU: Considering interleaved complex view merging
 66113:SU:   Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2)
 66366:SU: Costing transformed query.
129372:SU: Finished interleaved complex view merging
129373:SU: Considering interleaved distinct placement
129374:SU: Finished interleaved distinct placement
129375:SU: Considering interleaved join pred push down
129376:SU:   Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2)
251638:SU: Rejected interleaved query.
251640:SU: Finished interleaved join pred push down
251641:SU: Considering interleaved OR Expansion
251642:SU:   Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2)
251651:SU: Finished interleaved OR Expansion
251653:SU: Updated best state, Cost = 19.085153
251654:SU: Starting iteration 2, state space = (2) : (0)
251665:SU: Costing transformed query.
310395:SU: Not update best state, Cost = 20.083998
310396:SU: Will unnest subquery SEL$4F5F2F29 (#2)

The optimizer checks the validity of unnesting (generated) query block SEL$4F5F2F29 at line 2948 of the trace and decides, 308,000 lines later after an exhaustive examination of the possibilities, that it will unnest the subquery. Since this is a recent version of Oracle we take one simple extra step by checking for “TIMER” information, again using a “grep -n” call –

251639:TIMER:  SU: Interleaved JPPD SEL$B73B51DC cpu: 1.263 sec elapsed: 1.263 sec
251652:TIMER: SU: iteration (#1) SEL$B73B51DC cpu: 2.607 sec elapsed: 2.607 sec
310577:TIMER: CBQT SU and CVM SEL$071BB01A cpu: 3.323 sec elapsed: 3.323 sec
433371:TIMER: Cost-Based Join Predicate Push-Down SEL$12B6FE6C cpu: 1.307 sec elapsed: 1.306 sec
433477:TIMER: Cost-Based Transformations (Overall) SEL$12B6FE6C cpu: 4.731 sec elapsed: 4.731 sec
496189:TIMER: SQL Optimization (Overall) SEL$12B6FE6C cpu: 5.306 sec elapsed: 5.306 sec

Of course most of the time spent in this particular example was a result of optimising (and writing the optimizer trace), but for my tiny example (table definition below) the final figures I’ll show are the buffer gets and CPU time reported by a basic 10046 trace file after optimisation, when all the relevant data was cached:

  • Case 1 – standalone subquery – 89 buffer gets / 0.00 seconds
  • Case 2 – correlated subquery – 130 buffer gets / 0.53 seconds
  • Case 3 – correlated subquery with no_unnest hint – 121 buffer gets / 0.08 CPU seconds

The sub-centisecond time is a little suspect, of course, but the others seem fairly trustworthy.

Conclusion

The title of this piece is “Most Recent” because the commonest requirement for a query of this shape is “find the most recent row matching the following predicates”, even though in this case the interpretation is “find me the row matching the largest value”.

The “standard” pattern for writing a “most recent” query is to use a correlated subquery – but it’s worth remembering that you may reduce optimisation time and run time by “copying down the constant” rather than using the correlation mechanism.
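
In the example above that’s simply the difference between the two variants of the subquery’s where clause – the swap that the commented /* v_tab */ in the original block was hinting at:

        where
                uts2.table_name = uts1.table_name       -- correlated: 863 join orders examined

        where
                uts2.table_name = v_tab                 -- "copied down" constant: 90 join orders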

(There are alternative strategies to the subquery approach, of course, and the analytic max() – introduced in Oracle 8i – is gaining traction as one of the popular alternatives.)

Footnote 1

If you want to re-run my test on different platforms and versions of Oracle, here’s the code to generate the table. (Don’t be surprised if you don’t get completely consistent results – much of the optimization will depend on the size of all the relevant tables (tab$, tabcompart$, etc.) in the data dictionary, rather than on the actual definition of this partitioned table.)


rem
rem     Script:         most_recent_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2020
rem
rem     Last tested 
rem             19.3.0.0
rem

create table pt_composite_1 (
        id,
        grp,
        small_vc,
        padding
)
nologging
partition by range(id) 
subpartition by hash (grp)
subpartitions 4
(
        partition p2 values less than (400),
        partition p3 values less than (800),
        partition p4 values less than (1600),
        partition p5 values less than (3200)
)
as
select
        rownum                          id,
        trunc(rownum/50)                grp,
        to_char(trunc(rownum/20))       small_vc,
        rpad('x',100)                   padding
from
        all_objects
where 
        rownum <= 3000 -- > comment to avoid wordpress format issue
;

execute dbms_stats.gather_table_stats(user,'pt_composite_1',granularity=>'ALL')


Footnote 2

For reference, here are the outputs I got from executing egrep -n -e”^SU:” -e”TIMER” against the other two CBO trace files.

First for the “standalone” form – note how line 3130 tells us that “there is no correlation”.


806:SU: Considering subquery unnesting in query block MISC$1 (#0)
2947:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest.
2952:SU: Considering subquery unnest on query block SEL$071BB01A (#1).
2953:SU:   Checking validity of unnesting subquery SEL$4F5F2F29 (#2)
2954:SU:     SU bypassed: No correlation to immediate outer subquery.
2955:SU:     SU bypassed: Failed basic validity checks.
2956:SU:   Validity checks failed.
3130:SU:     SU bypassed: No correlation to immediate outer subquery.

Then for the correlated subquery with /*+ no_unnest */ hint; and line 3122 tells us that SU was bypassed because of a hint/parameter:


809:SU: Considering subquery unnesting in query block MISC$1 (#0)
2945:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest.
2947:SU: Considering subquery unnest on query block SEL$071BB01A (#1).
2948:SU:   Checking validity of unnesting subquery SEL$4F5F2F29 (#2)
2949:SU:     SU bypassed: Not enabled by hint/parameter.
2950:SU:     SU bypassed: Failed basic validity checks.
2951:SU:   Validity checks failed.
3122:SU:     SU bypassed: Not enabled by hint/parameter.

Neither file showed any “TIMER” information since that appears, by default, only for steps that take longer than one second. (If you want to adjust the granularity, see Franck Pachot’s note on parse time that describes bug/fix_control 16923858.)
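If I’ve remembered that note correctly the level you set is (roughly) the power of ten of the reporting threshold in microseconds – so the default of 6 corresponds to the one-second limit – but treat the following as an unverified sketch:

-- assumption: level 4 would report TIMER lines for steps above roughly 0.01 seconds
alter session set "_fix_control" = '16923858:4';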

June 17, 2020

Lower Cost Ignored

Filed under: CBO,Oracle — Jonathan Lewis @ 3:49 pm BST Jun 17,2020

This is an update on a post I wrote nearly 10 years ago describing how the optimizer could choose to ignore a lower cost indexed access path and use a higher cost index if the lower cost were based on guesswork. The original article used (select {constant} from dual) as a way of supplying an “unpeekable bind” in a predicate, but that mechanism stopped working in 11gR2, hence the update.

The update also goes into a little more detail about event 38036, which can be used to modify this behaviour by defining a “cut-off” percentage at which Oracle will switch back to using the lower cost path.

We start with the code to generate the data – including, in this case, a table that I can query to supply “hidden constants” to the optimizer:

rem
rem     Script:         index_range_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                rownum <= 10000 -- > comment to avoid wordpress format issue
)
select
        rownum                  id,
        9 + mod(rownum-1,3)     scattered,
        trunc(rownum / 500)     clustered,
        lpad(mod(rownum,2),10)  ind_pad,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 40000 -- > comment to avoid wordpress format issue
;

create index t1_equi  on t1(scattered, ind_pad, clustered);
create index t1_range on t1(clustered, scattered);

create table driver(name varchar2(20), value number(4,0));
create index drv_i1 on driver(name);

insert into driver values('Forty',40);
insert into driver values('FortyOne',41);
commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'DRIVER',
                method_opt       => 'for all columns size 1'
        );
end;
/


The driver table models a pattern that I see fairly frequently of a table holding a set of “constants” that are subject to infrequent changes. (An alternative strategy is to have a PL/SQL package of constants and a function to return a value when supplied with a name.)
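Here’s a minimal sketch of that alternative strategy, built over the driver table (the package and function names are my invention):

create or replace package constants_pkg as
        function value_of(i_name in varchar2) return number;
end constants_pkg;
/

create or replace package body constants_pkg as

        function value_of(i_name in varchar2) return number is
                m_value         driver.value%type;
        begin
                select  value
                into    m_value
                from    driver
                where   name = i_name;

                return m_value;
        end value_of;

end constants_pkg;
/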

The key columns in the table are

  • scattered: 3 distinct values evenly but randomly scattered across the table – not a column you would index separately
  • clustered: 500 consecutive rows each for 80 distinct values, so very well clustered data

There are two indexes on the main data table

  • t1_range (clustered, scattered) — roughly (80 * 3 =) 240 distinct keys
  • t1_equi (scattered, ind_pad, clustered) — roughly (3 * 2 * 80 =) 480 distinct keys

If I execute the query: “select * from t1 where clustered between 40 and 41 and scattered = 10” Oracle uses the t1_range index to return an estimated 491 rows at a cost of 36. The estimates aren’t too far out since in my case the query returned 334 rows in 26 buffer visits (when my SQL*Plus arraysize was 500).
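Reconstructing that statement from the description (a sketch – my test script isn’t reproduced in full here):

set autotrace traceonly explain

select  *
from    t1
where   clustered between 40 and 41
and     scattered = 10
;

set autotrace off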

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |   496 |  8432 |    36   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1       |   496 |  8432 |    36   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_RANGE |   496 |       |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CLUSTERED">=40 AND "SCATTERED"=10 AND "CLUSTERED".le.41)
       filter("SCATTERED"=10)

But what happens if we decide to keep the start and end values for clustered in the driver table:


set autotrace traceonly explain

select 
        /*  index(t1 t1_range) */
        small_vc
from
        t1
where
        scattered = 10
and     clustered between (select value from driver where name = 'Forty')
                  and     (select value from driver where name = 'FortyOne')
;

set autotrace off



You’ll notice the “hint that isn’t a hint”. I’ve tested two versions of the query – one without a hint and one where I inserted the necessary “+” to turn the comment into a hint. Here are the execution plans (with “<=” edited to “.le.” in the Predicate Information):


=======
Default
=======
-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |    33 |   561 |    67   (3)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED  | T1      |    33 |   561 |    63   (4)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                    | T1_EQUI |    60 |       |    56   (4)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| DRIVER  |     1 |    11 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | DRV_I1  |     1 |       |     1   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| DRIVER  |     1 |    11 |     2   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | DRV_I1  |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SCATTERED"=10 AND "CLUSTERED">= (SELECT "VALUE" FROM "DRIVER" "DRIVER"
              WHERE "NAME"='Forty') AND "CLUSTERED".le. (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE
              "NAME"='FortyOne'))
       filter("CLUSTERED">= (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE "NAME"='Forty')
              AND "CLUSTERED".le. (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE "NAME"='FortyOne'))
   4 - access("NAME"='Forty')
   6 - access("NAME"='FortyOne')

=============
Hint t1_range
=============
--------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |    33 |   561 |    10   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED  | T1       |    33 |   561 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                    | T1_RANGE |    60 |       |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| DRIVER   |     1 |    11 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | DRV_I1   |     1 |       |     1   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| DRIVER   |     1 |    11 |     2   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | DRV_I1   |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CLUSTERED">= (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE "NAME"='Forty')
              AND "SCATTERED"=10 AND "CLUSTERED".le. (SELECT "VALUE" FROM "DRIVER" "DRIVER" WHERE
              "NAME"='FortyOne'))
       filter("SCATTERED"=10)
   4 - access("NAME"='Forty')
   6 - access("NAME"='FortyOne')

The big difference occurs at operation 2. You should notice two things – first, the estimate of rows is wrong by a factor of 10, but that’s not the point I want to chase. Secondly, by default the optimizer has selected the plan using the (far) more expensive index. This is a deliberately coded choice and if you have access to MOS then you should check Document ID 4112254.8: “CBO may not choose best index for single table access with unpeeked bind/function”.

Rerun the test (unhinted) after issuing:

alter session set events '38036 trace name context forever, level 10'

You will find that the optimizer automatically takes the path using the lower cost index. For event levels of 10 or higher Oracle will pick the lower cost index; for levels of 9 or less the optimizer will stick with the higher cost index.

The question, of course, is what’s special about the value 10. Given the clue in the MOS document, and the costs from the plans above, and allowing for some fiddling with the arithmetic we can note the following:

  • Cost of using the expensive index for the table access is 63 (from the 10053 trace it’s actually 62.525996)
  • Cost of using the cheaper index for the table access is 6 (again from the 10053 trace, 6.037011)
  • Express 6.037011 / 62.525996 as a percentage and you get 9.655, which rounds to 10.

Probably not a coincidence! (Though it might take a few more tests to decide whether it’s round() or ceiling().)

So if the optimizer is picking the wrong index, and the arithmetic for the one it should be using is dictated by guessing, calculate the ratio of the two costs for the table access, express it as a percentage, round up (adding a little margin if you want to), and set the event at that level.
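Putting that recipe into a concrete sketch using the figures above:

--      table access cost via the cheaper index:    6.037011
--      table access cost via the expensive index: 62.525996
--      100 * 6.037011 / 62.525996 = 9.655... ; round up to 10

alter session set events '38036 trace name context forever, level 10';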

I’m not really sure how much use you could, or should, make of this event. Possibly it’s one of those things you use temporarily to allow you to generate an SQL Baseline (the event setting doesn’t get captured as an “opt_param()” hint, unlike other fiddles of this type that you might use). Maybe there are a few special cases in batch jobs where the optimizer is a little inconsistent about which index to pick, the event is relevant, and it could be set through an “alter session” command. It’s nice to know it’s there – but it may be hard to use in practice.
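If you did go down the SQL Baseline route, a minimal sketch of capturing the plan you’ve engineered might look like the following – the sql_id and plan_hash_value are placeholders that you’d have to look up in v$sql first:

declare
        m_loaded        pls_integer;
begin
        m_loaded := dbms_spm.load_plans_from_cursor_cache(
                sql_id          => '&m_sql_id',         -- placeholder
                plan_hash_value => &m_plan_hash_value   -- placeholder
        );
        dbms_output.put_line('Plans loaded: ' || m_loaded);
end;
/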

Footnote

If you were wondering about the cardinality estimates of 60 rowids from the index and 33 rows from the table – these are the effect of the standard “guesses for unpeeked binds” that the optimizer uses. For a bounded (between) range on an index the selectivity is 0.0045, for a table it’s 0.0025; in this case we then have to include a factor of 1/3 in the arithmetic to allow for the “scattered=10” predicate:

40,000 * 0.0045 * 1/3 = 60

40,000 * 0.0025 * 1/3 = 33.33….

June 5, 2020

Analytic cost error

Filed under: CBO,Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:06 pm BST Jun 5,2020

Here’s a surprising costing error that was raised on the Oracle Developer Forum a few days ago. There’s a glitch in the cost attributed to sorting when an analytic over() clause – with its corresponding “window sort” operation – makes a “sort order by” operation redundant. Here’s a script to generate the data set I’ll use for a demonstration, with a template for a few queries I’ll be running against the data.


rem
rem     Script:         window_sort_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                    id,
        cast(lpad(rownum,30,'0') as varchar2(30)) vc30,
        cast(lpad(rownum,65,'0') as varchar2(65)) vc65,
        lpad('x',100,'x')                         padding
from
        generator
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

select
--      row_number() over (order by vc65) rn1,
--      row_number() over (order by vc30) rn2,
--      vc30,
--      vc65,
        id
from
        t1
-- order by
--      vc65
--      vc30
/


I’m (optionally) using the row_number() analytic function over the entire data set – for each row_number() I include in the select list Oracle will have to sort the data; I’ve also got an (optional) order by on the two columns that appear in the row_number() functions, and that may introduce some sorting as well. Here, for example, are a few of the queries I might run:


prompt  ===========================================
prompt  Select vc30, order by vc30
prompt  ===========================================

explain plan for
select 
        vc30,
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ==========================================
prompt  Select row_number over vc30 - no ordering
prompt  ==========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ===========================================
prompt  Select row_number over vc30 - order by vc65
prompt  ===========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

In the first query we select and sort vc30, so (approximately) we sort 10,000 rows x 30(-ish) bytes for 300K of sort space. In the second query we generate the row_number() based on sorting vc30 – the size of the output is much smaller (it’s only 10,000 numbers between 1 and 10,000) but to generate those numbers we do have to select and sort vc30, so the workload (predicted and actual) will probably be similar to that of the first query. In the final query we have to select and sort vc30 to generate the row_number(), but we also have to select (without reporting) and sort vc65 in order to report the results in the right order – so we should expect the workload to be roughly 3 times the size (approximately 10,000 * (30 + 65) bytes). Here, from 12.2.0.1, are the execution plans (with a little cosmetic tidying):

===========================================
Select vc30, order by vc30
===========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  SORT ORDER BY     |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

==========================================
Select row_number over vc30 - no ordering
==========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

===========================================
Select row_number over vc30 - order by vc65
===========================================
-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  SORT ORDER BY      |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

As expected, the execution plans are roughly consistent with the estimates I gave for volume of data – the query with “order by vc30” agrees nicely with the query using over(order by vc30), and the extra load of ordering by vc65 while selecting row_number() over (order by vc30) shows up clearly.

So let’s see what the plan looks like when we select row_number(over vc30) and then order by vc30. If the optimizer is smart it will recognise that it’s possible to adopt a strategy that allows it to take advantage of the sorting from the over() clause to avoid a separate sort order by:


explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|    42   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|    42   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|    42   (5)|
----------------------------------------------------------------

The plan shows us that Oracle has used the 10gR2 “order by elimination” feature to bypass the need for a “sort order by” operation because it knows the data will be arriving in the right order from the “Window Sort” operation. Unfortunately it also shows us that Oracle has lost the cost of doing the Window Sort!

Let’s try another experiment – let’s generate two different row_number() columns, with and without ordering:


prompt  =====================================
prompt  Select Both row_numbers - no ordering
prompt  =====================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ======================================
prompt  Select Both row_numbers order by vc30
prompt  ======================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ============================================
prompt  Select Both row_numbers order by vc65
prompt  ============================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

You’ll notice that I’ve included a request for the projection information in the plans for these examples so that you can see what columns are passed up from each operation to its parent. Again, though, we’ll start by focusing on just the costs:


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC65"[VARCHAR2,65], "VC30"[VARCHAR2,30],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC30")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC65")[22]
   2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22],
       "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The first query – without an “order by” clause – reports a cost of 463; add an “order by” clause and the cost drops to 253 (and the “order by” clause doesn’t appear as a sort order by operation in the plan). The cost differential between the ordered and “unordered” plans, by the way, is 210 (and from there down to the base tablescan is another 211) – and here’s another way to see that number (+/- 1) appearing:


explain plan for
select
        vc65,
        id
from
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   986K|       |   253   (3)|
|   1 |  SORT ORDER BY     |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
------------------------------------------------------------------------

The cost of the query with no “order by” clause is basically the cost of a tablescan plus two sorts of (vc65, vc30, plus a few bits). When you add in an “order by” clause the optimizer discards the “order by” clause and then subtracts one of the sort costs as well.
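Pulling those numbers together as a sketch of the arithmetic (the “Best so far” figures come from the trace extracts shown below):

tablescan cost:                           42
cost of one WiF sort:                   ~210.5    (252.512458 - 42)
no "order by":    42 + 210.5 + 210.9 =  463.4     (Best so far: 463.384707)
with "order by":  463.4 - 210.9      =  252.5     (Best so far: 252.512458)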

CBO trace file

Every time I say something about 10053 (CBO) trace files I feel compelled to remind everyone that I rarely look at them, and then it’s usually because I think I know there’s a bug and where I’ll find it in the trace. That’s exactly the case here.

I’m expecting to see two differences in the trace files between the “no order” query and a query where I’ve added in an “order by” clause. One difference is that one trace file will have an “OBYE” (eliminate order by) comment which won’t be in the other trace; the other is that one trace file will have a “cost for SORT” calculation which won’t be in the other.

So here are the relevant bits – first from the query without the order by clause:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE bypassed: no order by to eliminate.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 463.384707  card: 10000.000000  bytes: 1010000.000000
***********************

And from one of the plans with an order by:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE performed.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 252.512458  card: 10000.000000  bytes: 1010000.000000
***********************

As you can see, the first (no order) trace file has two sort calculations under “WiF sort” (Window Function?), while the second (order by) trace file reports “OBYE performed” and then loses one of its WiF sorts.

Note: If I had ordered by ID I would have seen two calculations of cost under the heading of WiF sort and a third calculation below that with the heading ORDER BY sort. Unfortunately when I ran the test to check this I also found that the OBYE report said: “OBYE performed” when it wasn’t relevant and there wasn’t an “order by” available for elimination.

Interestingly I tested to see if I could change the optimizer’s behaviour by adding the no_eliminate_oby(@sel$1) hint to the “order by” queries, but the GENERAL PLANS section didn’t change even though the trace file reported “OBYE: OBYE bypassed: hinted”, and the “Hint Report” from the 19c execution plan acknowledged the hint as legal and used.

Summary

If you have an “order by” clause in a query block that includes analytic functions, and the optimizer decides that it can eliminate the “order by” and rely on the side effect of an analytic over() clause, you may find that the cost of the query block is reduced by the cost of one of the Window Sort operations. (Technically this might lead to cases where the optimizer then makes some poor choices in the overall shape of the execution plan – though such cases might be very rare given that this costing error doesn’t affect the cardinality estimates.)

Lagniappe

In the last set of tests I added in the projection information as a simple example of a case where it can help you understand a little more of what the plan is supposed to achieve. If you examine the last two query plans carefully (select both row_number() values, order by vc30 / vc65 respectively) operation 2 of the first plan reports:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

while operation 2 of the second plan reports:

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

It’s not until you look at these two lines that the plans show any differences – operations 1 and 2 simply say “Window Sort” without giving any clue about which window sort is for which over() clause. The projection information, though, tells you which way around the over() clauses operated – in the first query the over(order by vc65) is applied to the result of the tablescan first, while in the second query it’s the over(order by vc30) that is applied first.

Lagniappe 2

There’s another little oddity you might spot when you look at the projection information and think about the WiF sort costs from the unordered query. The columns passed from operation 3 to operation 2 are:

 3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The columns passed from operation 2 to operation 1 are one of:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

Operation 2 sorts the data from operation 3, and operation 1 sorts the data from operation 2 – but the columns arriving from operation 2 have an extra row_number()[22] added to them. So when you look in the trace file at the two WiF sort calculations, why do they both show:

Blocks to Sort: 150 Row size: 122 Total Rows: 10000

Shouldn’t one of them show a Row size that’s (at least) 22 longer than the other?

June 1, 2020

Order By

Filed under: Bugs,CBO,Indexing,Oracle — Jonathan Lewis @ 1:05 pm BST Jun 1,2020

This is a brief note with an odd history – and the history is more significant than the note.

While searching my library for an example of an odd costing effect of the “order by” clause I discovered a script that I had apparently written for 11.1.0.6 in 2008 to demonstrate a redundant sort operation appearing in an execution plan; then I discovered a second script, written for 11.2.0.4 in 2014, demonstrating a variant of the same thing (presumably because I’d not found the original script in 2014) – and the second script referenced a MOS bug number:

Bug 18701129 : SORT ORDER BY ISN’T AVOIDED WHEN ROWID IS ADDED TO ORDER BY CLAUSE

Whenever I “discover” an old bug test I tend to re-run it to check whether or not the bug has been fixed.  So that’s what I did, and found that the anomaly was still present in 19.3.0.0. The really odd thing, though, was that the bug note no longer existed – and even after I’d done a few searches involving the text in the description I couldn’t manage to find it!

For the record, here’s the original 2008 script (with a couple of minor edits)


rem
rem     Script:         order_by_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2008
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0        Still sorting
rem             12.2.0.1
rem             11.1.0.6
rem

set linesize 180
set pagesize 60

create table test 
as 
select  * 
from    all_objects 
where   rownum <= 10000 -- >  comment to avoid wordpress format issue
;

alter table test modify object_name not null;
create index i_test_1 on test(object_name);

analyze table test compute statistics;

set serveroutput off
alter session set statistics_level = all;

select  * 
from    (select * from test order by object_name) 
where 
        rownum < 11 -- > comment to avoid wordpress format issue
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));



select  * 
from    (select /*+ index(test) */ * from test order by object_name,rowid) 
where
        rownum < 11 -- > comment to avoid wordpress format issue
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
set serveroutput on

Yes, that is an analyze command – it’s a pretty old script and I must have been a bit lazy about writing it. (Or, possibly, it’s a script from an Oracle-l or Oracle forum posting and I hadn’t re-engineered it.)

I’ve run two queries – the first uses an inline view to impose an order on some data and then selects the first 10 rows. The second query does nearly the same thing but adds an extra column to the “order by” clause – except it’s not a real column, it’s the rowid pseudo-column. Conveniently there’s an index on the table that is a perfect match for the “order by” clause and it’s on a non-null column, so the optimizer can walk the index in order and stop after 10 rows.

Adding the rowid to the “order by” clause shouldn’t make any difference to the plan as the index Oracle is using is a single column non-unique index, which means that the internal representation makes it a two-column index where the rowid is (quite literally) stored as the second column. But here are the two execution plans:


----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |      1 |        |     10 |00:00:00.01 |       7 |
|*  1 |  COUNT STOPKEY                |          |      1 |        |     10 |00:00:00.01 |       7 |
|   2 |   VIEW                        |          |      1 |     10 |     10 |00:00:00.01 |       7 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST     |      1 |  10000 |     10 |00:00:00.01 |       7 |
|   4 |     INDEX FULL SCAN           | I_TEST_1 |      1 |     10 |     10 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<11)



----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |          |      1 |        |     10 |00:00:00.01 |    4717 |       |       |          |
|*  1 |  COUNT STOPKEY                         |          |      1 |        |     10 |00:00:00.01 |    4717 |       |       |          |
|   2 |   VIEW                                 |          |      1 |  10000 |     10 |00:00:00.01 |    4717 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY               |          |      1 |  10000 |     10 |00:00:00.01 |    4717 |  4096 |  4096 | 4096  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| TEST     |      1 |  10000 |  10000 |00:00:00.01 |    4717 |       |       |          |
|   5 |      INDEX FULL SCAN                   | I_TEST_1 |      1 |  10000 |  10000 |00:00:00.01 |      44 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<11)
   3 - filter(ROWNUM<11)


When I add the rowid to the “order by” clause the optimizer no longer sees walking the index as an option for avoiding work; it wants to collect all the rows from the table, sort them, and then report the first 10. In fact walking the index became such an expensive option that I had to hint the index usage (hence the non-null declaration) to make the optimizer choose it, the default plan for 19.3 was a full tablescan and sort.

It’s just a little example of an edge case, of course. It’s a pity that the code doesn’t recognise the rowid as (effectively) a no-op addition to the ordering when the rest of the “order by” clause matches the index declaration, but in those circumstances the rowid needn’t be there at all and you wouldn’t expect anyone to include it.

As I said at the start – the interesting thing about this behaviour is that it was once described in a bug note that has since disappeared from public view.

February 17, 2020

Join Elimination bug

Filed under: Bugs,CBO,Execution plans,Oracle — Jonathan Lewis @ 3:37 pm GMT Feb 17,2020

It is possible to take subquery factoring (common table expressions / CTEs) too far. The most important purpose of factoring is to make a complex query easier to understand – especially if you can identify a messy piece of text that is used in more than one part of the query – but I have seen a couple of patterns appearing that make the SQL harder to read.

  • In one pattern each table is given its own factored subquery holding the non-join predicates (and, possibly, filter subqueries) and then the main query is just a simple join of the factored subqueries with nothing but join (ON) predicates – see the sketch after this list.
  • In another pattern each factored subquery consists of the previous subquery with one more table added to it, so every subquery is no more than a “two-table” query block and the final subquery is a simple “select from last_factored_subquery”.
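For illustration, here’s a stripped-down sketch of that first pattern, using the two tables defined later in this note (the predicates are hypothetical):

with met as (
        select  *
        from    meters
        where   meter_type = 'A'
),
mrd as (
        select  *
        from    meter_readings
        where   date_read >= trunc(sysdate)
)
select
        met.meter_id, mrd.date_read, mrd.reading
from
        met
join
        mrd
on      mrd.meter_id = met.meter_id
;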

Neither of these patterns is helpful – but today’s blog note is not going to be about going to extremes with subquery factoring; instead it’s an example of a fairly reasonable use of subquery factoring that ran into a “wrong results” bug.

Consider a system that collects data from some type of meters. Here’s a two-table definition for meters and meter readings:


create table meters (
        meter_id        number,
        meter_type      varchar2(10),  -- references meter_types
        date_installed  date,
        padding         varchar2(100),
        constraint met_pk primary key(meter_id)
)
;

create table meter_readings (
        meter_id        number,
        date_read       date,
        reading         number(10,3),
        padding         varchar2(100),
        constraint      mrd_pk primary key(meter_id, date_read),
        constraint      mrd_fk_met foreign key (meter_id) references meters
)
;

insert into meters
select
        1e6 + rownum,
        case mod(rownum,3)
                when 0 then 'A'
                when 1 then 'B'
                       else 'C'
        end,
        trunc(sysdate) - mod(rownum,5),
        rpad('x',100,'x')
from
        dual 
connect by 
        level <= 10
;

execute dbms_stats.gather_table_stats(null,'meters')

insert into meter_readings 
select
        met.meter_id,
        met.date_installed - v.id + 2,
        dbms_random.value,
        rpad('x',100,'x')
from
        meters met,
        (select rownum id from dual connect by level <= 4) v
;

commit;

execute dbms_stats.gather_table_stats(null,'meter_readings')

I’ve picked the obvious primary keys for the two tables and defined the appropriate referential integrity constraint – which means the optimizer should be able to choose the best possible strategies for any query that joins the two tables.

I’ve created a very small data set – a few meters installed in the last few days, and a few readings per meter over the last few days. So let’s report the readings for the last 48 hours, and include in the output any meters that haven’t logged a reading in that interval.

Here’s the query I wrote, with its output, running on a 19.3 instance on 17th Feb 2020:


with mrd_cte as (
        select 
                meter_id, date_read, reading
        from 
                meter_readings
        where 
                date_read in (trunc(sysdate), trunc(sysdate)+1)
)
select
        met.meter_id, met.date_installed, mrd_cte.date_read, reading
from
        meters met
left join 
        mrd_cte
on      mrd_cte.meter_id = met.meter_id
;

  METER_ID DATE_INST DATE_READ    READING
---------- --------- --------- ----------
   1000001 16-FEB-20 17-FEB-20       .063
   1000002 15-FEB-20
   1000003 14-FEB-20
   1000004 13-FEB-20
   1000005 17-FEB-20 18-FEB-20        .37
   1000005 17-FEB-20 17-FEB-20       .824
   1000006 16-FEB-20 17-FEB-20       .069
   1000007 15-FEB-20
   1000008 14-FEB-20
   1000009 13-FEB-20
   1000010 17-FEB-20 17-FEB-20       .161
   1000010 17-FEB-20 18-FEB-20       .818

12 rows selected.

The query returns 12 rows – which SQL*Plus can report because it counts them as it fetches them so it can give you the total at the end of the query.

Of course, sometimes people write preliminary queries to find out how big the result set would be before they run the query to acquire the result set itself. In cases like that (where they’re just going to select a “count(*)”) the optimizer may choose a different execution path from the base query – perhaps finding a way to do an index-only execution, and maybe eliminating a few table joins from the query. So let’s execute a count of the above query:


rem
rem     Script:         join_elimination_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

with mrd_cte as (
        select 
                meter_id, date_read, reading
        from 
                meter_readings
        where 
                date_read in (trunc(sysdate), trunc(sysdate)+1)
)
select count(*) from (
        select
                met.meter_id, met.date_installed, mrd_cte.date_read, mrd_cte.reading
        from
                meters met
        left join 
                mrd_cte
        on      mrd_cte.meter_id = met.meter_id
)
;

  COUNT(*)
----------
        10

1 row selected.

You’ll have to take my word for it, of course, but no-one else was using this database while I was running this test, and no-one else has access to the schema I was using anyway. Unfortunately when I count the 12 rows instead of reporting them Oracle thinks there are only 10 rows. Oops!

Step 1 in investigating the problem – check the execution plans to see if there are any differences in the structure of the plan, the use of predicates, or the outline information. I won’t bother with the plan for the base query because it was very obvious from the count query where the problem lay.


-------------------------------------------------------------------
| Id  | Operation        | Name   | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |       |     1 (100)|          |
|   1 |  SORT AGGREGATE  |        |     1 |            |          |
|   2 |   INDEX FULL SCAN| MET_PK |    10 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$69B21C86")
      ELIMINATE_JOIN(@"SEL$00F67CF8" "METER_READINGS"@"SEL$1")
      OUTLINE(@"SEL$00F67CF8")
      MERGE(@"SEL$C43CA2CA" >"SEL$2")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$C43CA2CA")
      MERGE(@"SEL$D28F6BD4" >"SEL$E6E74641")
      OUTLINE(@"SEL$E6E74641")
      ANSI_REARCH(@"SEL$4")
      OUTLINE(@"SEL$D28F6BD4")
      MERGE(@"SEL$1" >"SEL$006708EA")
      OUTLINE(@"SEL$4")
      OUTLINE(@"SEL$006708EA")
      ANSI_REARCH(@"SEL$3")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$3")
      INDEX(@"SEL$69B21C86" "MET"@"SEL$3" ("METERS"."METER_ID"))
      END_OUTLINE_DATA
  */

This is the plan as pulled from memory by a call to dbms_xplan.display_cursor(). We note particularly the following: meter_readings doesn’t appear in the plan, there is no predicate section (and no asterisks against any of the operations that would tell us there ought to be some predicate information), and there’s a very revealing ELIMINATE_JOIN(@”SEL$00F67CF8″ “METER_READINGS”@”SEL$1”) in the outline information.

For some reason the optimizer has decided that it’s okay to remove meter_readings from the query (even though there may be many meter readings for each meter), so it was inevitable that it produced the wrong result.

Despite my opening note, this is not an issue with subquery factoring – it just looked that way when I started poking at the problem. In fact, if you rewrite the query using an inline view you get the same error, if you turn the inline view into a stored view you get the error, and if you turn the whole query into a simple (left) join with the date predicate as part of the ON clause you still get the error.

The problem lies somewhere in the join elimination transformation. If you go back to the outline information from the bad plan you’ll see the line: ELIMINATE_JOIN(@”SEL$00F67CF8″ “METER_READINGS”@”SEL$1”) – by changing this to NO_ELIMINATE_JOIN(…) and adding it to the main query block I got a suitable plan joining the two tables and producing the right result.
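For example, here’s a sketch of the count query with the transformation blocked (the query block names are copied from the outline above):

with mrd_cte as (
        select
                meter_id, date_read, reading
        from
                meter_readings
        where
                date_read in (trunc(sysdate), trunc(sysdate)+1)
)
select
        /*+ no_eliminate_join(@"SEL$00F67CF8" "METER_READINGS"@"SEL$1") */
        count(*)
from    (
        select
                met.meter_id, met.date_installed, mrd_cte.date_read, mrd_cte.reading
        from
                meters met
        left join
                mrd_cte
        on      mrd_cte.meter_id = met.meter_id
        )
;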

The problem appears in 12.2.0.1 and 19.5.0.0 (tested on livesql) – but does not appear in 12.1.0.2 or 11.2.0.4

There is a known bug associated with this problem:

Bug: 29182901
Abstract: WRONG COUNT WAS RETURNED WHEN _OPTIMIZER_ENHANCED_JOIN_ELIMINATION=TRUE
Query with Outer Join Returned a Wrong Result due to Join Elimination (Doc ID 29182901.8)
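Given the parameter named in the abstract, a possible session-level workaround would be to disable the feature – a sketch I haven’t demonstrated here, and hidden parameters should only be set after checking with Oracle support:

-- assumption: disabling the feature named in the bug abstract blocks the bad transformation
alter session set "_optimizer_enhanced_join_elimination" = false;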

The bug is fixed in 20.1, with a fix that has been backported into the Jan 2020 patches for 19, 18, and 12.1

December 30, 2019

Scalar Subq Bug

Filed under: CBO,dbms_xplan,Execution plans,Oracle,subqueries — Jonathan Lewis @ 9:30 am GMT Dec 30,2019

This is an observation that came up on the Oracle Developer Forum a couple of days ago, starting life as the fairly common problem:

I have a “select” that runs quickly but when I use it in a “create as select” it runs very slowly.

In many cases this simply means that the query was a distributed query and the plan changed because the driving site changed from the remote to the local server. There are a couple of other reasons, but distributed DML is the one most commonly seen.

In this example, though, the query was not a distributed query, it was a fully local query. There were three features to the query that were possibly suspect, though:

  • “ANSI” syntax
  • scalar subqueries in the select list
  • redundant “order by” clauses in inline views

The OP had supplied the (horrible) SQL in a text format along with images from the Enterprise Manager SQL Monitor screen showing the two execution plans, and two things were obvious from the plans – first that the simple select had eliminated the scalar subqueries (which were redundant) while the CTAS had kept them in the plan, and secondly that most of the elapsed time for the CTAS was spent in lots of executions of the scalar subqueries.

My first thought was that the problem was probably a quirk of how the optimizer translates “ANSI” SQL to Oracle-standard SQL, so I created a model that captured the key features of the problem – starting with 3 tables:

rem
rem     Script:         ctas_scalar_subq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

create table t1 as
select * from all_objects
where rownum <= 10000 -- > comment to avoid wordpress format issue
;

alter table t1 add constraint t1_pk primary key(object_id);

create table t2 as
select * from t1
;

alter table t2 add constraint t2_pk primary key(object_id);

create table t3 as
select * from all_objects
where rownum <= 500 -- > comment to avoid wordpress format issue
;

alter table t3 add constraint t3_pk primary key(object_id);

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’m going to use the small t3 table as the target for a simple scalar subquery in the select list of a query that selects some columns from t2; then I’m going to use that query as an inline view in a join to t1 and select some columns from the result. Here’s the starting query that’s going to become an inline view:


select 
        t2.*,
        (
        select  t3.object_type 
        from    t3 
        where   t3.object_id = t2.object_id
        )       t3_type
from
        t2
order by
        t2.object_id
;

And here’s how I join the result to t1:


explain plan for
        select
                v2.*
        from    (
                select
                        t1.object_id,
                        t1.object_name  t1_name,
                        v1.object_name  t2_name,
                        t1.object_type  t1_type,
                        v1.object_type  t2_type
                from
                        t1
                join (
                        select 
                                t2.*,
                                (
                                select  t3.object_type 
                                from    t3 
                                where   t3.object_id = t2.object_id
                                )       t3_type
                        from
                                t2
                        order by
                                t2.object_id
                )       v1
                on
                        v1.object_id = t1.object_id
                and     v1.object_type = 'TABLE'
                )       v2
;

select * from table(dbms_xplan.display(null,null,'outline alias'));

The initial t2 query becomes an inline view called v1, and that becomes the second table in a join with t1. I’ve got the table and view in this order because initially the OP had an outer (left) join preserving t1 and I thought that that might be significant, but it turned out that it wasn’t.

Having joined t1 and v1 I’ve selected a small number of columns from the t1 and t2 tables and ignored the column that was generated by the inline scalar subquery. (This may seem a little stupid – but the same problem appears when the inline view is replaced with a stored view, which is a more realistic possibility.) Here’s the resulting execution plan (taken from 11.2.0.4 in this case):


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   476 | 31416 |    45  (12)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   476 | 31416 |    45  (12)| 00:00:01 |
|   2 |   VIEW               |      |   476 | 15708 |    23  (14)| 00:00:01 |
|   3 |    SORT ORDER BY     |      |   476 | 41888 |    23  (14)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2   |   476 | 41888 |    22  (10)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T1   | 10000 |   322K|    21   (5)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   4 - filter("T2"."OBJECT_TYPE"='TABLE')

I was a little surprised by this plan as I had expected the optimizer to eliminate the in-line “order by” in view v1 – but even when I changed the code to traditional Oracle join syntax the redundant and wasteful sort at operation 3 still took place. (You might note that the data will be reported in an order dictated by the order of the data arriving from the t1 tablescan thanks to the mechanism of the hash join, so the sort is a total waste of effort.)

The plus point, of course, is that the optimizer had been smart enough to eliminate the scalar subquery referencing t3. The value returned from t3 is not needed anywhere in the course of the execution, so it simply disappears.

Now we change from a simple select to a Create as Select which I’ve run, with rowsource execution stats enabled, using Oracle 19.3 for this output:

set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

create table t4 as
        select  
                v2.*
        from    (
                select
                        t1.object_id,
                        t1.object_name  t1_name,
                        v1.object_name  t2_name,
                        t1.object_type  t1_type,
                        v1.object_type  t2_type
                from
                        t1
                join (
                        select 
                                t2.*,
                                (
                                select  t3.object_type 
                                from    t3 
                                where   t3.object_id = t2.object_id
                                )       t3_type
                        from
                                t2
                        order by 
                                t2.object_id
                )       v1
                on
                        v1.object_id = t1.object_id
                and     v1.object_type = 'TABLE'
                )       v2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;

And here’s the run-time execution plan – showing the critical error and statistics to prove that it really happened:

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Writes |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |       |      1 |        |      0 |00:00:00.01 |     471 |      3 |       |       |          |
|   1 |  LOAD AS SELECT                  | T4    |      1 |        |      0 |00:00:00.01 |     471 |      3 |  1042K|  1042K| 1042K (0)|
|   2 |   OPTIMIZER STATISTICS GATHERING |       |      1 |    435 |    294 |00:00:00.01 |     414 |      0 |   256K|   256K|  640K (0)|
|*  3 |    HASH JOIN                     |       |      1 |    435 |    294 |00:00:00.01 |     414 |      0 |  1265K|  1265K| 1375K (0)|
|   4 |     VIEW                         |       |      1 |    435 |    294 |00:00:00.01 |     234 |      0 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID | T3    |    294 |      1 |     50 |00:00:00.01 |      54 |      0 |       |       |          |
|*  6 |       INDEX UNIQUE SCAN          | T3_PK |    294 |      1 |     50 |00:00:00.01 |       4 |      0 |       |       |          |
|   7 |      SORT ORDER BY               |       |      1 |    435 |    294 |00:00:00.01 |     234 |      0 | 80896 | 80896 |71680  (0)|
|*  8 |       TABLE ACCESS FULL          | T2    |      1 |    435 |    294 |00:00:00.01 |     180 |      0 |       |       |          |
|   9 |     TABLE ACCESS FULL            | T1    |      1 |  10000 |  10000 |00:00:00.01 |     180 |      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   6 - access("T3"."OBJECT_ID"=:B1)
   8 - filter("T2"."OBJECT_TYPE"='TABLE')

You’ll notice that the VIEW at operation 4 reports the inline scalar subquery as operations 5 and 6, and the Starts column shows that the scalar subquery executed 294 times – which is the number of rows returned by the scan of table t2. Although my first thought was that this was an artefact of the transformation from ANSI to Oracle syntax, it turned out that when I modified the two statements to use traditional Oracle syntax the same difference appeared. Finally I re-ran the CTAS after removing the order by clause in the in-line view, and the redundant subquery disappeared from the execution plan.
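
For reference, here’s a sketch of that final variant – the same CTAS with the order by commented out (I’ve used the hypothetical name t4a so that it doesn’t collide with the t4 above):

create table t4a as
        select
                v2.*
        from    (
                select
                        t1.object_id,
                        t1.object_name  t1_name,
                        v1.object_name  t2_name,
                        t1.object_type  t1_type,
                        v1.object_type  t2_type
                from
                        t1
                join (
                        select
                                t2.*,
                                (
                                select  t3.object_type
                                from    t3
                                where   t3.object_id = t2.object_id
                                )       t3_type
                        from
                                t2
--                      order by                -- removed: the view is now mergeable and
--                              t2.object_id    -- the redundant subquery disappears
                )       v1
                on
                        v1.object_id = t1.object_id
                and     v1.object_type = 'TABLE'
                )       v2
;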

Tiny Geek bit

It’s not immediately obvious why there should be such a difference between the select and the CTAS in this case, but the 10053 trace files do give a couple of tiny clues. The CTAS trace file includes the lines:

ORE: bypassed - Top query block of a DML.
TE: Bypassed: Top query block of a DML.
SQT:    SQT bypassed: in a transaction.

The first two suggest that we should expect some cases where DML statements optimise differently from simple queries. The last one is a further indication that differences may appear. (SQT – might this be “subquery transformation”? It doesn’t appear in the list of abbreviations in the trace file.)

Unfortunately the SELECT trace file also included the line:


SQT:     SQT bypassed: Disabled by parameter.

So “SQT” – whatever that is – being in or out of a transaction may not have anything to do with the difference.
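
(If you want to look for the same lines in your own trace files, here’s a sketch of the way I capture a 10053 trace – the tracefile_identifier is just a label of my own choosing:)

alter session set tracefile_identifier = 'ctas_sqt';
alter session set events '10053 trace name context forever, level 1';

-- run, or "explain plan for", the select or the CTAS here

alter session set events '10053 trace name context off';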

Summary

There are cases where optimising a select statement is not a sufficient strategy for optimising the equivalent CTAS statement. In this case it looks as if an inline view which was non-mergeable (thanks to a redundant order by clause) produced the unexpected side-effect that a completely redundant scalar subquery in the select list of the inline view was executed during the CTAS even though it was transformed out of existence for the simple select.

There are some unexpected performance threats in “cut-and-paste” coding and in re-using stored views if you haven’t checked carefully what they do and how they’re supposed to be used.


October 30, 2019

Strange Estimates.

Filed under: CBO,extended stats,Oracle,Statistics — Jonathan Lewis @ 1:10 pm GMT Oct 30,2019

A question came up on the Oracle-L list server a couple of days ago expressing some surprise at the following execution plan:


--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |       |       |   845 (100)|          |
|   1 |  SORT AGGREGATE                      |                 |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| ANY_TABLE       | 84827 |   579K|   845   (1)| 00:00:01 |
|   3 |    SORT CLUSTER BY ROWID             |                 | 68418 |       |    76   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                 | ANY_INDEX       | 68418 |       |    76   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("X"."ANY_COLUMN1"='J')
   4 - access("X"."ANY_COLUMN2"=89155)

You’ll notice that this is a very simple query accessing a table by index, yet the estimated number of table rows found exceeds the estimated number of index entries used to probe the table. How can this happen? The answer (most frequently) is that there’s a mismatch between the table (or, more commonly, column) statistics and the index statistics. This seems to happen very frequently when you start mixing partitioned tables with global (or globally partitioned) indexes, but it can happen in very simple cases – especially since a call to gather_table_stats() with cascade set to true and using the auto_sample_size will take a small sample from the index while using a 100% “sample” from the table.
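
As a sketch of the kind of call I mean – a perfectly standard gather like the following will sample the index (courtesy of cascade) while effectively examining the whole table:

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                estimate_percent => dbms_stats.auto_sample_size,
                cascade          => true        -- index stats gathered from a (much smaller) sample
        );
end;
/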

Here’s an example I engineered very quickly to demonstrate the point. There’s no particular reason for the choice of DML I’ve used on the data beyond a rough idea of setting up a percentage of nulls and deleting a non-uniform pattern of rows.


rem
rem     Script:         table_index_mismatch.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,1000)                n1,
        mod(rownum,1000)                n2,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1, for columns (n1,n2) size 1'
        );
end;
/

create index t1_i1 on t1(n1);

delete from t1 where mod(trunc(sqrt(n1)),7) = 0;
update t1 set n1 = null where mod(n1,10) = 0;
delete from t1 where mod(n1,10) = trunc(dbms_random.value(0,10));

execute dbms_stats.gather_table_stats(user,'t1',estimate_percent=>1)
execute dbms_stats.gather_index_stats(null,'t1_i1',estimate_percent=> 0.01)

Of course you’re not supposed to collect stats with arbitrary samples in any recent version of Oracle, so going for a 1% and a 0.01% sample seems a little daft, but I’m just doing that to demonstrate the problem with a very small data set.

After generating the data and gathering the stats I ran a few queries to pick out some critical numbers.


select
        table_name, sample_size, num_rows
from
        user_tables
where
        table_name = 'T1'
/

select 
        index_name, sample_size, num_rows, distinct_keys
from
        user_indexes
where
        table_name = 'T1'
and     index_name = 'T1_I1'
/

select
        column_name, sample_size, num_nulls, num_distinct
from
        user_tab_cols
where
        table_name = 'T1'
and     (
            column_name = 'N1'
         or virtual_column = 'YES'
        )
order by
        column_name
/

You’ll notice that I’ve only picked one of my original columns and any virtual columns. My gather_table_stats() call had a method_opt that included the creation of extended stats for the column group (n1, n2) and I want to report the stats on the resulting virtual column.
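
(As an aside – if you wanted to create the same column group without gathering stats at the same time, a call like this sketch should do it:)

select
        dbms_stats.create_extended_stats(
                ownname   => user,
                tab_name  => 'T1',
                extension => '(N1,N2)'
        )
from    dual
;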


TABLE_NAME           SAMPLE_SIZE   NUM_ROWS
-------------------- ----------- ----------
T1                          7865     786500


INDEX_NAME           SAMPLE_SIZE   NUM_ROWS DISTINCT_KEYS
-------------------- ----------- ---------- -------------
T1_I1                     385779     713292           714


COLUMN_NAME                      SAMPLE_SIZE  NUM_NULLS NUM_DISTINCT
-------------------------------- ----------- ---------- ------------
N1                                      7012      85300          771
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS          7865          0          855

A few observations on the stats:

  • The table sample size is, as expected, 1% of the reported num_rows (the actual count is 778,154).
  • The index sample size is much bigger than expected – but that’s probably related to the normal “select 1,100 leaf blocks” strategy. Because of the skew in the pattern of deleted values it’s possible for the sample size in this model to vary between something in the region of 380,000 and 694,154.
  • The n1 sample size is about 10% smaller than the table sample size – but that’s because I set 10% of the column to null.
  • The column group sample size matches the table sample size because column group hash values are never null, even if an underlying column is null (you can check the column group’s definition with the dictionary query below).

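If you want to see how the column group has been recorded, a query like this sketch against user_stat_extensions reports the extension and the system-generated virtual column that supports it:

select
        extension_name, extension
from
        user_stat_extensions
where
        table_name = 'T1'
;
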
So let’s check the execution plan for a very simple query:


set autotrace on explain
select id from t1 where n1 = 140 and n2 = 140;
set autotrace off


---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |   920 | 11960 |   918   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   920 | 11960 |   918   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |   909 |       |     5   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=140)
   2 - access("N1"=140)

The estimate for relevant index rowids is smaller than the estimate for the number of table rows! The difference is not as extreme as the case reported on Oracle-l, but I’m only trying to demonstrate a principle, not reproduce the exact results.

There are several ways in which contradictory results like this can appear – but in this case we can see the following:

  • For the table access: table.num_rows / column.num_distinct = 786,500 / 855 = 919.88 (using the column group num_distinct).
  • For the index range scan: (table.num_rows – column.num_nulls) / column.num_distinct = (786,500 – 85,300) / 771 = 909.47 (using the n1 statistics).

So the change in strategy as it becomes possible for the optimizer to take advantage of the column group means the index and table have been using incompatible sets of stats (in particular there’s that loss of information about NULLs) as their cardinalities are calculated. The question, then, is “how much is that likely to matter?”, and the follow-up, if it can matter, is “in what circumstances could the effect be large enough to cause problems?”. But that’s a topic for another day.
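
As a sanity check, here’s a sketch that reproduces both figures straight from the stored statistics (the hashed virtual column name is, of course, the one from my run):

select
        t.num_rows / cg.num_distinct                    table_estimate,
        (t.num_rows - c1.num_nulls) / c1.num_distinct   index_estimate
from
        user_tables     t,
        user_tab_cols   c1,
        user_tab_cols   cg
where
        t.table_name   = 'T1'
and     c1.table_name  = 'T1'
and     c1.column_name = 'N1'
and     cg.table_name  = 'T1'
and     cg.column_name = 'SYS_STUBZH0IHA7K$KEBJVXO5LOHAS'
;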

Update / Footnote

In the case of the Oracle-l example there was no column group, and in some cases the optimizer would produce a plan where the table estimate was much smaller than the index estimate, while in other cases (like the opening plan above) the table estimate was significantly greater than the index estimate. This was a side effect of adaptive statistics: the low table estimate was due to the basic “multiply separate selectivities” arithmetic; but with adaptive statistics enabled Oracle started sampling the table to check the correlation between the two columns, then produced an SQL Plan Directive to do so, and got to the higher (and correct) result.
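
If you want to check whether adaptive statistics have done something similar on your own system, a query along these lines (a sketch – I’ve plugged in the table name from the opening plan) will list any SQL Plan Directives recorded against the table:

select
        d.directive_id, d.type, d.state, d.reason,
        o.owner, o.object_name, o.subobject_name
from
        dba_sql_plan_directives         d,
        dba_sql_plan_dir_objects        o
where
        o.directive_id = d.directive_id
and     o.object_name  = 'ANY_TABLE'
order by
        d.directive_id
;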


October 23, 2019

Clustering_Factor

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 9:56 pm BST Oct 23,2019

A few days ago I published a little note of a script I wrote some time ago to estimate the clustering_factor of an index before it had been built. At the time I pointed out that one of its limitations was that it would not handle cases where you were planning to set the table_cached_blocks preference, but a couple of days later I decided that I’d write another version of the code that would cater for the new feature – and that’s how I made an embarrassing discovery.

Having reviewed a number of notes I’ve published about the table_cached_blocks preference and its impact on the clustering_factor, I’ve realised that what I’ve written has always been open to two interpretations – the one that I had in mind as I was writing, and the correct one. I made this discovery because I had written a simple SQL statement – using the match_recognize() mechanism – to do what I considered to be the appropriate calculation. After testing the query with a few sets of sample data that produced the correct results I emailed Stew Ashton (my “go-to” person for match_recognize() questions) asking if he would do a sanity check on the code, because it was rather slow and I wondered if there was a better way of writing it.

His reply was roughly:

“I’ve read the notes you and Richard Foote have written about the clustering_factor and table_cached_blocks, and this isn’t doing what your description says it should.”

Then he explained what he had inferred from what I had written … and it made more sense than what I had been thinking when I wrote it. He also supplied some code to implement his interpretation – so I designed a couple of data models that would produce the wrong prediction for whichever piece of code implemented the wrong interpretation. His code gave the right answers, mine didn’t.

So here’s the difference in interpretation – the wrong one first – using 16 as a discussion value for the table_cached_blocks:

  • WRONG interpretation: As you walk through index entries in order, remember the last 16 rowids (that’s the rowids of the rows in the table that the index entries are pointing to) you’ve seen. If the current rowid has a block id component that doesn’t match the block id from one of the remembered 16 rowids then increment the counter for the clustering_factor.
    • The simplicity of this algorithm means you can fix a “circular” array of 16 entries and keep walking around the circle, overwriting the oldest entry each time you read a new one. It’s a pity that it’s the wrong idea, because there’s a simple (though massively CPU-intensive) match_recognize() strategy for implementing it – and if you were using an internal library mechanism during a proper gather_index_stats() it could be incredibly efficient.
  • RIGHT interpretation: Set up an array for 16 block ids, each with an associated “row-number”. Walk through the index in order, giving each entry a row-number as you go. Extract the block id from the current entry and search through the array for a matching block id. If you find a match then update its entry with the current row-number (so you can remember how recently you saw the block id); if you don’t find a match then replace the entry that has the smallest (i.e. furthest into the past) row-number with the current block id and row-number, and increment the counter for the clustering_factor.

The first piece of code that Stew Ashton sent me was an anonymous PL/SQL block that included some hard-coded fragments and embedded SQL to use a test table and index that I had defined, but he then sent a second piece of code that creates a generic function that uses dynamic SQL to construct a query against a table and an index definition that you want to test. The latter is the code I’ve published (with permission) below:


create or replace function predict_clustering_factor(
/*
Function to predict the clustering factor of an index,
taking into account the intended value of
the TABLE_CACHED_BLOCKS parameter of DBMS_STATS.SET_TABLE_PREFS.

Input is the table name, the list of column names
and the intended value of TABLE_CACHED_BLOCKS.

The function collects the last N block ids (not the last N entries).
When there is no more room, it increments the clustering factor
and replaces the least recently used block id with the current one.

Note: here a "block id" is a rowid with the row_number portion set to 0.
It is effectively a "truncated" rowid.
*/
  p_table_name in varchar2,
  p_column_list in varchar2,
  p_table_cached_blocks in number
) return number authid current_user is

  rc sys_refcursor;
  type tt_rids is table of rowid;
  lt_rids tt_rids;
  
  type t_block_list is record(
    rid rowid,
    last_hit number
  );

  type tt_block_list is table of t_block_list;
  lt_block_list tt_block_list := new tt_block_list();

  l_rid rowid;
  l_clustering_factor number := 0;
  b_block_found boolean;
  l_rn number := 0;
  l_oldest_hit number;
  i_oldest_hit binary_integer := 0;
  
  function truncated_rid(p_rid in rowid) return rowid is
    rowid_type number;
    object_number NUMBER;
    relative_fno NUMBER;
    block_number NUMBER;
    row_number NUMBER;
    rid rowid;

  begin

    DBMS_ROWID.ROWID_INFO (
      p_rid,
      rowid_type,
      object_number,
      relative_fno,
      block_number,
      row_number
    );

    rid := DBMS_ROWID.ROWID_CREATE (
      rowid_type,
      object_number,
      relative_fno,
      block_number,
      0
    );

    return rid;

  end truncated_rid;
  
begin
  if p_table_cached_blocks != trunc(p_table_cached_blocks)
  or p_table_cached_blocks not between 1 and 255 then
    raise_application_error(
      -20001, 
      'input parameter p_table_cached_blocks must be an integer between 1 and 255'
    );
  end if;

  open rc for 'select rowid from '||p_table_name||' order by '||p_column_list||', rowid';
  loop
    fetch rc bulk collect into lt_rids limit 1000;

    for irid in 1..lt_rids.count loop
      l_rn := l_rn + 1;
      l_rid := truncated_rid(lt_rids(irid));
      b_block_found := false;
      l_oldest_hit := l_rn;

      if l_rn = 1 then
        l_clustering_factor := l_clustering_factor + 1;
        lt_block_list.extend;
        lt_block_list(1).rid := l_rid;
        lt_block_list(1).last_hit := l_rn;

      else

        for i in 1..lt_block_list.count loop
          if l_oldest_hit > lt_block_list(i).last_hit then
            l_oldest_hit := lt_block_list(i).last_hit;
            i_oldest_hit := i;
          end if;
          if lt_block_list(i).rid = l_rid then
            b_block_found := true;
            lt_block_list(i).last_hit := l_rn;
            exit;
          end if;
        end loop;

        if not b_block_found then
          l_clustering_factor := l_clustering_factor + 1;
          if lt_block_list.count < p_table_cached_blocks then
            lt_block_list.extend;
            lt_block_list(lt_block_list.count).rid := l_rid;
            lt_block_list(lt_block_list.count).last_hit := l_rn; 
          else         
            lt_block_list(i_oldest_hit).rid := l_rid;
            lt_block_list(i_oldest_hit).last_hit := l_rn;
          end if;
        end if;

      end if;

    end loop;
    exit when rc%notfound;
  end loop;

  close rc;
  return l_clustering_factor;

exception when others then
  if rc%isopen then
    close rc;
  end if;
  raise;

end predict_clustering_factor;
/

After executing the above to create the function, here’s an example of usage:

rem
rem     Script:         clustering_factor_est_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2019
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        cast(rownum as varchar2(10))            v1,
        trunc(dbms_random.value(0,10000))       rand,
        rpad('x',100,'x')                       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
/

-- -------------------------------------------------------------------

SQL> execute dbms_output.put_line('Predicted cf for t1(rand, id): ' || predict_clustering_factor('t1','rand, id',16))
Predicted cf for t1(rand, id): 997218
Elapsed: 00:00:07.54

SQL> execute dbms_output.put_line('Predicted cf for t1(rand, id): ' || predict_clustering_factor('t1','rand, id',255))
Predicted cf for t1(rand, id): 985607
Elapsed: 00:00:50.61

You’ll notice that the larger the setting for the “table_cached_blocks” parameter the more time it takes to predict the clustering_factor – and it was all CPU time in my example. This isn’t surprising given the need to search through an array holding the previous history. In this example the table t1 holds 1,000,000 rows, and the number and scatter of distinct values is so arranged that the code will hardly ever find a cached block id – essentially it’s the sort of index that isn’t going to cause much confusion to the optimizer and isn’t likely to need special attention to make the optimizer use it when it should and ignore it when it’s inappropriate.

Finally a cut-n-paste to show the accuracy of the two predictions:

SQL> create index t1_i on t1(rand, id);
Elapsed: 00:00:02.96

SQL> execute dbms_stats.set_table_prefs(null,'t1','table_cached_blocks',16)
Elapsed: 00:00:00.01

SQL> execute dbms_stats.gather_index_stats(null,'t1_i')
Elapsed: 00:00:09.55

SQL> select clustering_factor from user_indexes where index_name = 'T1_I';

CLUSTERING_FACTOR
-----------------
           997218

Elapsed: 00:00:00.11

SQL> execute dbms_stats.set_table_prefs(null,'t1','table_cached_blocks',255)
Elapsed: 00:00:00.01

SQL> execute dbms_stats.gather_index_stats(null,'t1_i')
Elapsed: 00:00:07.80

SQL> select clustering_factor from user_indexes where index_name = 'T1_I';

CLUSTERING_FACTOR
-----------------
           985607

Elapsed: 00:00:00.00

Both match perfectly – but you might notice that creating the index and gathering the stats was much faster than predicting the clustering factor for the case where we set table_cached_blocks = 255.

(If you’re wondering, my “simple but irrelevant” match_recognize() query took 370 CPU seconds to complete for table_cached_blocks = 200 – and a limit on match_recognize() meant that 200 was the maximum value I was allowed to use – so now you know why I emailed Stew Ashton. And, just for lagniappe, he also told me about a simple workaround for the 200 limit.)
