Oracle Scratchpad

November 12, 2020

rowsets

Filed under: Execution plans,Infrastructure,Oracle,Performance — Jonathan Lewis @ 12:35 pm GMT Nov 12,2020

Here’s a little demonstration of the benefit of rowsets. It started with a very simple question that arrived in my in-box from Kaley Crum.

  • “Why does the query with the UNION ALL take so much longer than the query without the UNION ALL?”

Here are the two queries – surely they should take virtually the same amount of time.

select count(*)
from (select /*+ no_merge */
             object_id
      from drop_me);


select count(*)
from (select /*+ no_merge */
             object_id
      from drop_me
      --
      union all
      --
      select 1 from dual);

Look closely – the difference between the two queries is just the addition through a UNION ALL of “select 1 from dual”. The first query took about 3.5 seconds to complete (there were 70M rows in the table), and the second took nearly 69 seconds.

Here are the execution plans showing the rowsource execution statistics (and Query Block // Object Alias information) – first the baseline query:

QL_ID  0ph1vfuuxkbqb, child number 0
-------------------------------------
select count(*) from (select /*+ no_merge */              object_id    
  from drop_me)
 
Plan hash value: 3609429292
 
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |        | 26465 (100)|          |      1 |00:00:03.34 |   95896 |  95891 |
|   1 |  SORT AGGREGATE     |         |      1 |      1 |            |          |      1 |00:00:03.34 |   95896 |  95891 |
|   2 |   VIEW              |         |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:03.32 |   95896 |  95891 |
|   3 |    TABLE ACCESS FULL| DROP_ME |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:03.27 |   95896 |  95891 |
--------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$2 / from$_subquery$_001@SEL$1
   3 - SEL$2 / DROP_ME@SEL$2

And then the UNION ALL query:

SQL_ID  0chdajr28y0ub, child number 0
-------------------------------------
select count(*) from (select /*+ no_merge */              object_id    
  from drop_me       --       union all       --       select 1 from
dual)
 
Plan hash value: 3408528233
 
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |      1 |        | 26467 (100)|          |      1 |00:01:18.58 |   95896 |  95891 |
|   1 |  SORT AGGREGATE      |         |      1 |      1 |            |          |      1 |00:01:18.58 |   95896 |  95891 |
|   2 |   VIEW               |         |      1 |     70M| 26467   (2)| 00:00:02 |     70M|00:01:10.84 |   95896 |  95891 |
|   3 |    UNION-ALL         |         |      1 |        |            |          |     70M|00:00:53.13 |   95896 |  95891 |
|   4 |     TABLE ACCESS FULL| DROP_ME |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:19.28 |   95896 |  95891 |
|   5 |     FAST DUAL        |         |      1 |      1 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |
---------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SET$1 / from$_subquery$_001@SEL$1
   3 - SET$1
   4 - SEL$2 / DROP_ME@SEL$2
   5 - SEL$3 / DUAL@SEL$3

This is not a trick question – I’ve got the SQL to create the data set and run the test; and I’ve run the test through 19.3.0.0 and 12.2.0.1 with results very similar to the above. Here’s a slightly modified version of the script – you’ll notice the column name has changed because I’ve used my usual “large data set” generator rather than scaling up a clone of all_objects:

rem
rem     Script:         rowset_kaley.sql
rem     Author:         Kaley Crum / JP Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem

create table drop_me
pctfree 0
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum  n1
from
        generator
cross join
        generator
where rownum <= 7e7
;

alter session set statistics_level = all;

set timing on
set serveroutput off

prompt  ===========================================
prompt  Baseline: Simple query without a union all.
prompt  Runs in 3-4 seconds
prompt  ===========================================

select
         count(*)
from (select /*+ no_merge */
             n1
      from drop_me);

select *
from table(dbms_xplan.display_cursor(format => 'allstats last advanced'));

prompt  ==========================================
prompt  Add a UNION ALL for one record from dual.
Prompt  Runs in over a minute
prompt  ==========================================

pause Press return

select
         count(*)
from (select /*+ no_merge */
             n1
      from drop_me
      --
      union all
      --
      select 1 from dual);

select *
from table(dbms_xplan.display_cursor(format => 'allstats last advanced'));

There are two factors involved in the massive variation in performance. The first factor is one that the test case will report – though I didn’t show it in the output above, the second is that the test case has enabled rowsource execution statistics.

Here’s the big clue – from the Column Projection Information, which is one of the things that appears with the “advanced” format option (or when you add the “projection” format option) in the call to dbms_xplan.display_xxx()

First for the fast query:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]
   2 - (rowset=1019)
   3 - (rowset=1019)

The rowsource passes 1,019 rows at a time from the tablescan operation to the view operation and from the view operation to the sort operation, for a total of roughly 70,000 calls for each of those two steps of the plan.

Compare this to the projection information for the slow UNION ALL query:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]

I haven’t missed a bit in copying – this really is it. The rows move up the stack one at a time, not in a rowset array. That’s 70,000,000 subroutime calls for each of the two steps of the plan.

I’m sure most of us have heard the mantra “row by row is slow by slow” (or some variant on the theme). This is true all the way down to the internal levels of an execution plan.

Of course 70M calls vs. 70K calls shouldn’t really explain a difference of more than 60 seconds – but when the calls to the O/S for timing information for rowsource execution stats are repeated for each call (or even if it’s only a 1% sample of the calls) down the stack that’s where a lot of time can disappear.

On my laptop, running Oracle 19.3 in a VM, this is how my timing went:

  • Fast query: 1.94 seconds, dropping to 1.79 seconds when I disabled rowsource execution stats.
  • Slow query: 65.91 seconds, dropping to a far less astounding 3.26 seconds when I disabled rowsource execution stats.

So there really is a noticeable difference between row-by-row and array-processing but most of the difference in the original test came from using rowsource execution statistics to measure how much of a difference there would be.

It’s also worth mentioning that this is probably the most extreme case you could produce to show the difference – using the largest possible rowset size with the smallest possible rows when you want to mazimise the gap between (internal) array processing and single row processing – and then processing a very large number of rows. [But isn’t ridiculously large numbers of rows what you do with Exadata?].

If you want further evidence that the difference is due to the rowset size you can always alter session set “_rowsets_enabled”=false; and watch the fast query slow down. It will take about half the time of the slow query as it only has to pass 70M rows up one step of the plan rather than the two steps that are in the UNION ALL plan.

In theory it looks as if you could also restrict the size of the rowset by setting _rowsets_max_rows or _rowsets_target_maxsize, but since the default value for the former is 256 the results above suggest that the settings might be ignored, and when I tried adjusting them at the session level nothing changed.

Another test you could run is to adjust (alter session) the parameter “_rowsource_statistics_sampfreq”, When I set this to 100 the time for the slow query dropped to about 21 seconds (and the fast query – with rowsets disabled – dropped to about 11 seconds).

Footnote

It’s always easy to hit an accidental special case without realising it, so when you’re testing something it’s important to think about what’s special in the example. I came up with two possibilities in this case – dual is always going to be a very special case in any circumstances, and I’ve got a table in my query that is guaranteed to return exactly one row. So I repeated the test with:

  • a real table with one row and primary key in case the reference to dual was disabling rowsets
  • a real table with 100 rows in case this enabled a rowset size of 100

In neither case did Oracle start using array processing.

Footnote 2

One of the patterns I see occasionally follows the form of the slow query above. Some front-end tool executes a select statement that starts with a select {list of desired column headings} from dual so that the front-end code doesn’t have to be programmed to create headings as it receives the data.

Although the impact this would have on rowsets looks like a large percentage of the work done in the database in this case, in more realistic cases it would probably be irrelevant, until someone starts testing with the statistics_level set to all, or _rowsource_execution_statistics explicitly enabled.

There is a related effect, though, from the SQL Monitor feature (which activates automatically for a statement if it’s executing in parallel, or if it’s predicted to take more than 5 seconds to complete). When I added the /*+ monitor */ hint to the two queries (and didn’t fiddle with the rowsets parameter) the fast query averaged 2.26 seconds instead of 1.79 seconds, and the slow query averaged 3.51 seconds instead of 3.26 seconds. I have to say, though, that the variation between executions of the same statement was quite significant compared the variation produced by adding or eliminating the hint.

November 9, 2020

I wish

Filed under: CBO,Execution plans,Oracle,sorting,Wishlist — Jonathan Lewis @ 12:01 pm GMT Nov 9,2020

Here’s a lovely little mechanism new to Postgres 13 that can minimise sorting costs: the “incremental sort”. It would be nice to see it in Oracle as well as it could make an enormous difference to “fetch first N” queries.

The concept is simple – if a rowsource moving up a plan is known to be in “partially sorted” order when it reaches a sort operation the optimizer can choose whether or not to sort the entire rowsource in one go or to sort it in batches as it arrives.

For example if you have a query which has “order by cola, colb” as its final clause and the plan has used an index range scan on an index on just (cola) then the “sort order by” operation can accept the rows for the first value of cola, sort them by colb and pass them on, then accept the rows for the second value of cola, sort them by colb and pass them on, and so on.

Better still, if you have an “order by table1.colA, table2,colB” and the optimizer used an indexed access path on table1.colA and a nested loop into table2, then the optimizer will still recognize that the generated data is already partially sorted, and sort batches for table1.colA to order them by table2.colB. (Oracle has a mechanism for dealing with sorted hash clusters that is roughly similar.)

Obviously the benefit is that you avoid doing a very large sort that might spill to disc; slightly less obviously is that you might avoid sorting the whole data set if you have a “fetch first N” query. Here’s an Oracle model setting up a demonstration of the principle:

rem
rem     Script:         fetch_first_postgres_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem

create table t1 as
with generator(id) as (
        select 1 from dual
        union all
        select id + 1 from generator where id < 1e5
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id,10)             small_vc,
        rpad('x',50,'x')        padding
from
        generator
/

alter table t1 modify n1 not null;

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

create index t1_i1 on t1(n1);

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

drop index t1_i1;
create index t1_i1 on t1(n1, n2);

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

I’ve created a table with 100,000 rows where the value of n1 is repeated 10 times and well-clustered, while for each value of n1, the n2 column has 10 distinct values (not necessarily in order thanks to the mod(,13)). Using this data set I’ve executed the same query three times – selecting the rows for 6 consecutive values of n1, ordering by n1, n2.

The first test will have to do a tablescan, the second can use the index I’ve created on (n1) but will have to do a sort (after visiting the table), the third can walk the index I’ve created on (n1,n2) and complete without sorting.

Here, in order, are the three execution plans:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    70 |  4830 |   145   (9)| 00:00:01 |
|   1 |  SORT ORDER BY     |      |    70 |  4830 |   145   (9)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   |    70 |  4830 |   144   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"<=30 AND "N1">=25)


----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    70 |  4830 |     4  (25)| 00:00:01 |
|   1 |  SORT ORDER BY                       |       |    70 |  4830 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    70 |  4830 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    70 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1">=25 AND "N1"<=30)


-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    70 |  4830 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    70 |  4830 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |    70 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1">=25 AND "N1"<=30)

Having created the Oracle model I asked Phil Florent (who had prompted this note by emailing me with a question about why Postgres 13 was executing a “Fetch First” so much faster than Oracle 19 (the basic answer is at this URL) if he would run it under Postgres and send me the execution plans.

The code required two changes – the first to handle the change in dialect, the second to supply a change in scale because my model produced such a small output that Postgres didn’t bother to use the new feature. Here’s the modified SQL to generate the original test data:

create table t1 as
with recursive generator(id) as (
        select 1
        union all
        select id + 1 from generator where id < 1e5
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id::text,10)       small_vc,
        rpad('x',50,'x')        padding
from
        generator;

And the three plans (so that you can compare the content and style of output with Oracle plans) that went with this small data set – first the no-index plan, then the plan for the (n1) index, then the plan with (n1,n2) indexed:

                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
Sort  (cost=2835.85..2836.00 rows=62 width=74) (actual time=19.534..19.600 rows=60 loops=1)
   Sort Key: n1, n2
   Sort Method: quicksort  Memory: 33kB
   ->  Seq Scan on t1  (cost=0.00..2834.00 rows=62 width=74) (actual time=0.067..19.417 rows=60 loops=1)
         Filter: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))
         Rows Removed by Filter: 99940

Planning Time: 0.351 ms
Execution Time: 19.703 ms


                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Sort  (cost=11.50..11.66 rows=62 width=74) (actual time=0.224..0.289 rows=60 loops=1)
   Sort Key: n1, n2
   Sort Method: quicksort  Memory: 33kB
   ->  Index Scan using t1_i1 on t1  (cost=0.42..9.66 rows=62 width=74) (actual time=0.024..0.113 rows=60 loops=1)
         Index Cond: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))

Planning Time: 0.665 ms
Execution Time: 0.391 ms



                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Scan using t1_i1 on t1  (cost=0.42..114.66 rows=62 width=74) (actual time=0.022..0.155 rows=60 loops=1)
   Index Cond: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))

Planning Time: 0.690 ms
Execution Time: 0.259 ms
 

As you can see, there’s very little difference between Oracle’s plans and Postgres’ plans in this example. (Though it’s rather nice to see what extra details appear in the Postgres plans, which were generated with the equivalent of Oracle’s “statistics_level = all” setting.

The middle plan shows us that Postgres didn’t use the “incremental sort” – but it’s useful to see it anyway so that we can compare the structure of the plan when we increase the volume of data – which the following script is for:

create table t2 as
with recursive generator(id) as (
        select 1
        union all
        select id + 1 from generator where id < 10000000
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id::text,10)             small_vc,
        rpad('x',50,'x')        padding
from
        generator;
 
create index t2_i1 on t2(n1);

explain analyze
select  *
from    t2
where   n1 between 25000 and 30000
order by
        n1, n2
;

analyze t2;

Now we have 10M rows, still with 10 rows per value of n1, and our query requests 5,001 values of n1, so 50,010 rows in total. With that much data the optimizer decided to use the incremental sort rather than sorting the whole result set in one go:

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Incremental Sort  (cost=0.49..4532.57 rows=52677 width=74) (actual time=0.160..164.125 rows=50010 loops=1)
   Sort Key: n1, n2
   Presorted Key: n1
   Full-sort Groups: 1251  Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB
   ->  Index Scan using t2_i1 on t2  (cost=0.43..2339.97 rows=52677 width=74) (actual time=0.055..61.663 rows=50010 loops=1)
         Index Cond: ((n1 >= '25000'::double precision) AND (n1 <= '30000'::double precision))
Planning Time: 0.197 ms
Execution Time: 204.490 ms

Note, particularly, the “Presorted Key” line leading to the “Sort Key” line. We can also check the plan without the incremental sort with a set command to disable the feature:

set enable_incremental_sort = off;

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Sort  (cost=8812.64..8944.33 rows=52677 width=74) (actual time=181.487..227.735 rows=50010 loops=1)
   Sort Key: n1, n2
   Sort Method: external merge  Disk: 4128kB
   ->  Index Scan using t2_i1 on t2  (cost=0.43..2339.97 rows=52677 width=74) (actual time=0.070..74.975 rows=50010 loops=1)
         Index Cond: ((n1 >= '25000'::double precision) AND (n1 <= '30000'::double precision))
Planning Time: 0.239 ms
Execution Time: 266.325 ms

Comparing the times for these two queries there isn’t really much difference – but you will notice that the old approach has had to dump 4MB to disc while the incremental sorts gets the job done in 30Kb of memory, which may be an important difference in other circumstances.

More importantly than a tiny time difference in this trivial example is the fact that Phil Florent’s original question was:

“Why is this two-table join with ‘fetch first 5’ taking 28 seconds to complete in Oracle when it takes less than one millisecond to complete in Postgres?”

The answer was:

  • Oracle is doing a hash join that produces a 25M row result set, sorting it to do an analytic row_number() call to get the first 5 rows.
  • Postgres is doing a nested loop join, fetching the first row(s) from the first table in the right order by index range scan then fetching the matching rows from the second table – then doing an incremental sort on those rows; then fetching the next row(s) and repeating the process – stopping after it’s got through a total of 5 rows instead of generating and sorting 25M rows.

Here’s the plan (with obfuscated table names):

                                                    QUERY PLAN                                                    

-------------------------------------------------------------------------------------------------------------------

Limit  (cost=284.36..287.21 rows=5 width=10)
   ->  Incremental Sort  (cost=284.36..14261803.18 rows=25000000 width=10)
         Sort Key: table1.col1 DESC, table2.col2 DESC
         Presorted Key: table1.col1
         ->  Nested Loop  (cost=1.00..13294209.39 rows=25000000 width=10)
               ->  Index Scan using table1_col1_idx on table1  (cost=0.56..1300593.47 rows=25000000 width=8)
               ->  Index Scan using table2.pk_col on table2 (cost=0.43..0.48 rows=1 width=6)
                     Index Cond: (id = table1.id)

You’ll notice in this exanple that the incremental sort can take advantage of the optimizer’s knowledge of the index definitions whether the sort is ascending or descending.

You’ll also notice that Postgres has the same problem as Oracle when it comes to coping with Fetch First (or, in Oracle’s case, rownum <= N and optimizer_mode = first_rows_N). Even when it “knows” that a query is going to stop fetching data very early the plan still reports 25M rows as the expected volume of data.

Summary

Postgres 13 has a wonderful mechanism for optimising sorts that can make a huge difference to “first rows” queries and even basic “order by” clauses involving join query result sets.

Footnote

Browsing the Internet for documentation and comment on the incremental sort I found the following. Given my level of ignorance about Postgres I can’t comment on the completeness or correctness of the information, but it looked good to me, and I found its comments about this feature very informative.

October 10, 2020

Interval Oddity

Filed under: CBO,Execution plans,Oracle,Partitioning — Jonathan Lewis @ 2:51 pm BST Oct 10,2020

Interval partitioning is a popular strategy for partitioning date-based data. It’s an enhanced variant of range partitioning that allows you to define a starting partition and an interval that should be used to derive the high values for all subsequent partitions – and Oracle doesn’t even have to create intervening partitions if you insert data that goes far beyond the current partition, it automatically creates exactly the right partition (with the correct high_value and correctly inferred lower boundary) for the incoming data and behaves as if the intervening partitions will become available when they’re needed at some later point in time. So no need for DBAs to work out actual partition high_values, no need to ensure that all the partitions you need out into the future have been pre-created, no more batch processes crashing with Oracle error ORA-14400: inserted partition key does not map to any partition.

But there’s a surprising difference between traditional range partitioning and the newer interval partitioning that will increase CPU usage in some cases and may (though I haven’t yet investigated this in sufficient detail to create an example) lead to variations in execution plans.

To demonstrate the difference I’m going to create two tables with the same structure and content then run the same query against them and show you the resulting execution plans. Here’s the code to create and populate the tables:

rem
rem     Script:         interval_or_range.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem


create table t_interval(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date) 
interval (numtoyminterval(1,'MONTH'))
subpartition by hash (order_id) subpartitions 4
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy'))
        )
;

create table t_range(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date) 
subpartition by hash (order_id) subpartitions 4
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy')),
                partition start_p2 values less than (to_date('01-Feb-2020','dd-mon-yyyy')),
                partition start_p3 values less than (to_date('01-Mar-2020','dd-mon-yyyy')),
                partition start_p4 values less than (to_date('01-Apr-2020','dd-mon-yyyy')),
                partition start_p5 values less than (to_date('01-May-2020','dd-mon-yyyy')),
                partition start_p6 values less than (to_date('01-Jun-2020','dd-mon-yyyy')),
                partition start_p7 values less than (to_date('01-Jul-2020','dd-mon-yyyy')),
                partition start_p8 values less than (to_date('01-Aug-2020','dd-mon-yyyy')),
                partition start_p9 values less than (to_date('01-Sep-2020','dd-mon-yyyy'))
        )
;

insert into t_range select
        to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
        rownum,
        lpad(rownum,10,'0'),
        lpad('x',100,'x')
from
        all_objects
where
        rownum <= 240 -- > comment to avoid wordpress format issue  
;

insert into t_interval select
        to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
        rownum,
        lpad(rownum,10,'0'),
        lpad('x',100,'x')
from
        all_objects
where
        rownum <= 240 -- > comment to avoid wordpress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T_RANGE',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T_INTERVAL',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve used composite partitioned tables in this example but the same anomaly appears with simple partitioning – which you can test by commenting out the “subpartion by ..” lines. The two tables have partitions defined to hold a month’s data. I’ve inserted a few rows into most of the partitions, and ensured that I haven’t accidentally attempted to insert data that falls outside the legal range of the table with the predefined partitions .

To show that the tables are nominally identical here’s the list of partitions with their high values and number of rows:

column table_name       format a15
column partition_name   format a15
column high_value       format a80

break on table_name skip 1

select
        table_name, partition_name, num_rows, high_value
from
        user_tab_partitions
where
        table_name in ('T_INTERVAL','T_RANGE')
order by
        table_name, partition_name
;


ABLE_NAME      PARTITION_NAME    NUM_ROWS HIGH_VALUE
--------------- --------------- ---------- --------------------------------------------------------------------------------
T_INTERVAL      START_P1                 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10722              30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10723              29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10724              31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10725              30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10726              31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10727              30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10728              31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10729              28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

T_RANGE         START_P1                 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P2                30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P3                29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P4                31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P5                30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P6                31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P7                30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P8                31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P9                28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA


18 rows selected.

With this setup we can now run a simple query against the two tables using a where clause that is an exact match for the partition definition for the July data:

set serveroutput off

select 
        count(*) 
from 
        t_range 
where   order_date >= date '2020-07-01'  -- to_date('01-Jul-2020','dd-mon-yyyy')
and     order_date <  date '2020-08-01'  -- to_date('01-Aug-2020','dd-mon-yyyy')
/

select * from table(dbms_xplan.display_cursor);

select 
        count(*) 
from 
        t_interval 
where   order_date >= date '2020-07-01'  -- to_date('01-Jul-2020','dd-mon-yyyy')
and     order_date <  date '2020-08-01'  -- to_date('01-Aug-2020','dd-mon-yyyy')
/

select * from table(dbms_xplan.display_cursor);

I’ve left in two versions of the date predicates – the “ANSI” style, and a strictly formatted “to_char()” style that is still the approached used most commonly in Oracle systems. The effect on the execution plans is the same for both variations of the date declaration, and here are the two execution plans – reported from an instance of 19.3.0.0:

---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |       |       |   130 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |         |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|         |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
|   3 |    TABLE ACCESS FULL    | T_RANGE |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
---------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |       |       |   130 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |            |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|            |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
|*  3 |    TABLE ACCESS FULL    | T_INTERVAL |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("ORDER_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

It’s quite possible that you won’t notice the difference these plans, even though I’ve put them directly one below the other; and it’s very likely that most people wouldn’t think about the difference if they didn’t have the two plans available at the same time to compare.

If you can’t spot the difference it’s because I’ve deliberately arranged them in a way that might fool you into not looking carefully enough.

If you’ve spotted the difference it’s probably because you’ve examined the Predicate Information section. There isn’t one for the the t_range example but there is one for the t_interval example – and it was a little sneaky of me to make it easy for you to assume that the one set of predicates I showed was common to the two plans.

Since the plans have been pulled from memory (v$sql_plan) it seems likely that they are truthful and the query based on the interval partitioning is actually checking every row in the partition against two date values. In my example that’s not going to make much difference to performance, but in a datawarehouse with millions of rows per partition there’s scope for a noticeable increase in CPU between the two queries – especially since the block reads are likely to be direct path, bypassing much of the usual (CPU -intensive) buffer cache activity.

Interestingly when I checked the 10053 trace files for the two queries the CPU cost for the pair was identical. Normally you expect to see at least a little CPU cost (even if it’s only tens of units out of millions) for checking a predicate. This raises the question – does the optimizer allow a cost for the t_range table for an event that isn’t going to happen, or is it reporting an event that isn’t going to happen for the t_interval table. [Edit: easy enough to check by doing the test on a simple heap table that clones the data from that one partition, and checking the calculated cost with and without the predicate]

There’s a further ramification to this anomaly, relating to the question on the Oracle-L list server that prompted the investigation. What happens on Exadata where the predicates can be offloaded to storage?

The original question wasn’t actually about the appearance (or not) of the predicates, it was about an unexpected cardinality estimate for a query involving two equi-partitioned tables, so this note has gone completely off-topic from the question; but another strange detail about the predicates showed up when I suggested the creation of a column group on the join columns. A (redundant) predicate that had been offloaded to storage stopped being offloaded; here, from the original posting with the original tables, are the two different Predicate Information sections that appeared – the first without the column group, the second after the column group had been created and its stats collected:

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
   6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - storage("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Predicate Information (identified by operation id):
---------------------------------------------------
   4 -
access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
   6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))

Notice how the storage() predicate that appears at operation 8 in the first set has disappeared from the second even though operation 6 manages to retain its storage() predicate throughout.

I’ve modelled a further example of odd behaviour using two pairs of tables – one pair using range/hash partitioning the other interval/hash partitioning. I won’t reproduce it here but the problem of redundant predicates appearing and then disappearing gets even stranger.

I haven’t yet produced an example where the unexpected predicate behaviour has affected the costs or cardinalities – but I’ve only spent a couple of hours playing around with well-formed examples: it’s possible that in badly formed examples (e.g. with statistical inconsistencies) the side effect could mean that two notionally identical queries produce different numbers and different plans because they end up with different predicates in the final transformed query.

Footnote

Following an email from David Kurtz, it occurred to me that I should have made it clear that the disappearance of predicates on the partition key is expected behaviour when the predicates are clearly synchronised with the partition boundaries. The behaviour for the interval partitioning is the oddity, the behaviour for the “normal” range partitioning is the standard.

October 9, 2020

Inline Hint

Filed under: 18c,CBO,Execution plans,Hints,Oracle,subqueries,Subquery Factoring,Tuning — Jonathan Lewis @ 12:46 pm BST Oct 9,2020

If you’ve ever used subquery factoring (“with” subqueries or common table expressions (CTEs) as they are often called) then you’re probably aware of the (undocumented) hints /*+ materialize */ , which forces Oracle to create a local temporary table to hold the result of the subquery for subsequent use, and /*+ inline */, which forces the optimizer to copy the text of the subquery into the body of the query before starting the optimisation phase.

There’s a small, but important, enhancement to these hints that appeared in Oracle 18. Like so many other hints in Oracle they can now have a query block name as a “parameter”, so you can use them at the top level of your query. Here’s some code to demonstrate:

rem
rem     Script:         inline_hint.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             18.3.0.0
rem             12.2.0.1  -- hints don't have any effect
rem

create table t1
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
/

create index t1_i1 on t1(object_id);

create table t2
as
select  *
from    t1
/

create index t2_i1 on t2(object_id);

spool inline_hint.lst


explain plan for
with v1 as (
        select 
                /*+ 
                        qb_name(cte) 
                */ 
                object_id, object_type, object_name 
                from t2 gtt1
                where object_id is not null
)
select
        /*+
                qb_name(main)
                inline(@cte)
        */
        t1.object_id,
        t1.object_name
from
        t1
where
        exists (
                select
                        null
                from
                        v1      v1a
                where
                        v1a.object_id = t1.object_id
                and     v1a.object_type = 'TABLE'
        )
and     exists (
                select
                        null
                from
                        v1      v1b
                where
                        v1b.object_id = t1.object_id
                and     v1b.object_name like 'WRI%'
        )
and
        t1.object_id between 100 and 200
/

select * from table(dbms_xplan.display(format=>'alias'));

explain plan for
with v1 as (
        select 
                /*+ 
                        qb_name(cte) 
                */ 
                object_id, object_type, object_name 
                from t2 gtt1
                where object_id is not null
)
select
        /*+
                qb_name(main)
                materialize(@cte)
        */
        t1.object_id,
        t1.object_name
from
        t1
where
        exists (
                select
                        null
                from
                        v1      v1a
                where
                        v1a.object_id = t1.object_id
                and     v1a.object_type = 'TABLE'
        )
and
        t1.object_id between 100 and 200
/

select * from table(dbms_xplan.display(format=>'alias'));

The first of these two queries uses the factored subquery twice so, by default, it will create a “cursor duration memory” temporary table to hold the results of the subquery and then use that temporary table twice in the execution plan.

Conversely the second query uses the factored subquery just once, so the optimizer’s default action will be to copy the text into the body of the main query and optimize the whole thing as a single query block.

To reverse the default behaviour in versions of Oracle up to 12.2.0.1 (though later patch sets may include the 18c enhancements) you could add the /*+ inline */ or /*+ materialize */ hints respectively to the factored subqueries; but my demonstration you can see that I’ve given the factored subquery a query block name and added the relevant hint to the main query block passing in the query block name of the factored subquery – hence /*+ inline(@cte) */ and /*+ materialize(@cte) */.

Here – from 19.3 – are the resulting execution plans (with some cosmetic editing) – first the plan with the inline() hint.

------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |     1 |    63 |     9  (12)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                     |       |     1 |    63 |     9  (12)| 00:00:01 |
|   2 |   NESTED LOOPS                         |       |     1 |    50 |     7  (15)| 00:00:01 |
|   3 |    SORT UNIQUE                         |       |     1 |    25 |     4   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     1 |    25 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T2_I1 |    48 |       |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID BATCHED | T1    |     1 |    25 |     2   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN                   | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  8 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |     1 |    13 |     2   (0)| 00:00:01 |
|*  9 |    INDEX RANGE SCAN                    | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$06B48120
   4 - SEL$06B48120 / GTT1@CTE
   5 - SEL$06B48120 / GTT1@CTE
   6 - SEL$06B48120 / T1@MAIN
   7 - SEL$06B48120 / T1@MAIN
   8 - SEL$06B48120 / GTT1@CTE
   9 - SEL$06B48120 / GTT1@CTE

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("OBJECT_NAME" LIKE 'WRI%')
   5 - access("OBJECT_ID">=100 AND "OBJECT_ID"<=200)
   7 - access("OBJECT_ID"="T1"."OBJECT_ID")
       filter("T1"."OBJECT_ID"<=200 AND "T1"."OBJECT_ID">=100)
   8 - filter("OBJECT_TYPE"='TABLE')
   9 - access("OBJECT_ID"="T1"."OBJECT_ID")
       filter("OBJECT_ID"<=200 AND "OBJECT_ID">=100)

As you can see Oracle has copied the subquery text into the main body of the text and then optimized to produce a three-table join. One of the subqueries has been unnested into an aggregate view (operations 3,4,5), the other has been transformed into a semi-join.

In passing you’ll also notice that the optimizer has used transitive closure to add the range predicate on t1 to both occurrences of the t2 table.

And here’s the plan for the query with the single use of the subquery and materialize() hint:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |    48 |  2448 |    39   (8)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6611_F53A566 |       |       |            |          |
|   3 |    TABLE ACCESS FULL                     | T2                         | 10000 |   322K|    27   (8)| 00:00:01 |
|*  4 |   HASH JOIN SEMI                         |                            |    48 |  2448 |    13  (16)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED   | T1                         |    48 |  1200 |     4   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                     | T1_I1                      |    48 |       |     2   (0)| 00:00:01 |
|*  7 |    VIEW                                  |                            | 10000 |   253K|     8  (13)| 00:00:01 |
|   8 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6611_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$A3F38ADC
   2 - CTE
   3 - CTE          / GTT1@CTE
   5 - SEL$A3F38ADC / T1@MAIN
   6 - SEL$A3F38ADC / T1@MAIN
   7 - SEL$AA28F105 / V1A@SEL$1
   8 - SEL$AA28F105 / T1@SEL$AA28F105

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V1A"."OBJECT_ID"="T1"."OBJECT_ID")
   6 - access("T1"."OBJECT_ID">=100 AND "T1"."OBJECT_ID"<=200)
   7 - filter("V1A"."OBJECT_TYPE"='TABLE' AND "V1A"."OBJECT_ID">=100 AND "V1A"."OBJECT_ID"<=200)

In this plan the optimizer has created an in-memory temporary table and then used it in the existence subquery – which it has then transformed into a semi-join, so we have a query block with the name SEL$A3F38ADC; but we also see that the query block CTE still exists, labelling the operations that Oracle used to populate the temporary table.

It is an interesting (and irritating) detail that when we look at object aliases we see (operation 8) that Oracle has given the temporary table the alias of t1 – which is just a little confusing since I actually have a table called t1!

Next Steps

Being able to nominate a query block for the inline() and materialize() hints may be of great help in some cases (there’s a recent example on the Oracle Developer Forum (may need a MOS login) where it might make a huge difference to the performance of a particular query without requiring a rewrite of the SQL).

But there are a couple of details to investigate. First, I had a query block name built into my factored subquery – what happens if the author of the SQL didn’t include a query block name?

Before I’d added the inline() hint and query block names in the first example above this is what the plan looked like:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |    48 |  6240 |    48  (11)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6612_F53A566 |       |       |            |          |
|   3 |    TABLE ACCESS FULL                     | T2                         | 10000 |   322K|    27   (8)| 00:00:01 |
|*  4 |   HASH JOIN SEMI                         |                            |    48 |  6240 |    21  (15)| 00:00:01 |
|*  5 |    HASH JOIN SEMI                        |                            |    48 |  4992 |    13  (16)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED  | T1                         |    48 |  1200 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                    | T1_I1                      |    48 |       |     2   (0)| 00:00:01 |
|*  8 |     VIEW                                 |                            | 10000 |   771K|     8  (13)| 00:00:01 |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6612_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
|* 10 |    VIEW                                  |                            | 10000 |   253K|     8  (13)| 00:00:01 |
|  11 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6612_F53A566 | 10000 |   322K|     8  (13)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$A317D234
   2 - SEL$1
   3 - SEL$1        / GTT1@SEL$1
   6 - SEL$A317D234 / T1@SEL$2
   7 - SEL$A317D234 / T1@SEL$2
   8 - SEL$D67CB2D2 / V1B@SEL$4
   9 - SEL$D67CB2D2 / T1@SEL$D67CB2D2
  10 - SEL$D67CB2D3 / V1A@SEL$3
  11 - SEL$D67CB2D3 / T1@SEL$D67CB2D3

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V1A"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - access("V1B"."OBJECT_ID"="T1"."OBJECT_ID")
   7 - access("T1"."OBJECT_ID">=100 AND "T1"."OBJECT_ID"<=200)
   8 - filter("V1B"."OBJECT_NAME" LIKE 'WRI%' AND "V1B"."OBJECT_ID">=100 AND "V1B"."OBJECT_ID"<=200)
  10 - filter("V1A"."OBJECT_TYPE"='TABLE' AND "V1A"."OBJECT_ID">=100 AND "V1A"."OBJECT_ID"<=200)

As you can see, the factored subquery (operations 2 and 3) has the query block name of sel$1 and the main query (operations 6 an 7 where the real t1 is used) has the query block name sel$2. So without giving the subquery a name I could have used the hint /*+ inline(@sel$1) */ in the main query block.

This takes us on to the second point that needs investigation. If you’ve looked at the example on the Oracle Developer Forum you will have seen that there’s an SQL statement that references a stored view and the factored subquery of interest is defined in the view. This means we might be able to edit the query that calls the view to include a hint referencing the query block inside the view – but then what do we do if we can’t edit the main query itself?

To be investigated (1) – would the inline() hint with nominated query block work if the factored subquery was inside a stored view that we were using in our query?

To be investigated(2) – if (1) works, could we achieve the same result by using an SQL Patch to attach the hint to the main query text without editing the main query?

Update (Oct 2020)

It turns out that I discovered this enhancement a few months ago while doing some experimentation with recursive subquery factoring.

Update Nov 2020

A blog note from Nenad Noveljic warns of a surprising ORA-07445 if you get too trigger-happy with the inline() and materialize() hints.

October 1, 2020

Index FFS Cost

Filed under: Bugs,CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 11:46 am BST Oct 1,2020

There are a number of unexpected issues with the optimizer’s treatment of the index fast full scan, the access path where Oracle ignores the structure of the B-tree and uses multiblock reads to do a brute-force segment scan as if the index were a “skinny table” with a few blocks of irrelevant garbage (i.e. the branch blocks) that could be ignored.

A recent comment made a few days on a blog about the optimizer’s “index-join” access path reminded me that I had a few notes to finish and publish that might help some people address performance issues relating to a couple of little-known fixes for optimizer oddities in this area. Today’s “fix” is one that appeared in 10g (or maybe very late in 9i) but never became the default optimizer behaviour (possibly because it’s one of those “not a bug” types of bug with no “always correct” strategy).

The problem revolves around the optimizer_index_cost_adj parameter (which is another reason why the fix might have been kept out of the standard code base). For index-only execution plans the parameter will apply to the index range scan, full scan, or skip scan, but not to the index fast full scan. Here’s a model to demonstrate this:


rem
rem     Script:         oica_iffs.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2012
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        lpad(trunc(dbms_random.value(0,10000)),10)      v1,
        rpad('x',20)                                    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(v1);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                cascade          => true,
                method_opt       => 'for all columns size 1'
        );
end;
.

select blocks from user_tables where table_name = 'T1';
select leaf_blocks from user_indexes where index_name = 'T1_I1';

I’ve created a fairly narrow table with 1M rows, and created an index on that table on the v1 column. In my test there were 5,219 blocks in the table and 3,068 blocks in the index.

I’m now going to run a query to count the rows in the table where v1 is not null and report its execution plan. The first run will be unhinted, the second run will include a hint that forces an index full scan, and the third run will include a hint to force an index fast full scan.


prompt  ==============
prompt  Unhinted query
prompt  ==============

select 
        count(*) 
from
        t1
where
        v1 is not null
;

prompt  ==========
prompt  Index hint
prompt  ==========

select 
        /*+ index(t1) */
        count(*) 
from
        t1
where
        v1 is not null
;

prompt  ==============
prompt  index_ffs hint
prompt  ==============

select 
        /*+ index_ffs(t1) */
        count(*) 
from
        t1
where
        v1 is not null
;

Here are the three execution plans. As you might have predicted the default plan (for my Oracle 19.3) is the index fast full scan:


==============
Unhinted query
==============
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

==========
Index hint
==========
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |  3159   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|  3159   (3)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

==============
index_ffs hint
==============
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

Since the size of the index is less than the size of the table the cost of the index fast full scan will be lower than the cost of a tablescan; and since an index fast full scan does multiblock reads while the index full scan is assumed to use single block reads the cost of the index fast full scan will be lower than the cost of the index full scan.

Now we set the optimizer_index_cost_adj to 1 (as happens in a number of “legacy” commercial products) and repeat the experiment. Here are the three plans (reduced to just the body of the plan):

==================
OICA = 1, Unhinted
==================
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |    32   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

====================
OICA = 1, index hint
====================
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |    32   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------

========================
OICA = 1, index_ffs hint
========================
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

The default plan is now the index full scan, not the index fast full scan; and when we look at the two hinted plans we can see why. The full scan cost has been reduced to 1% of the original cost (as dictated by the setting for optimizer_index_cost_adj); the fast full scan cost has not been adjusted.

So is this a good thing or a bad thing? Different people may answer that question differently. There are clearly likely to be cases where changing the setting for the optimizer_index_cost_adj (which you shouldn’t have been doing anyway for the last 20 years or so) will result in some plans switching from fast full scans to full scans with unfortunate consequences;

On the other hand when you do a fast full scan it may do direct path reads, while the full scan will do cached reads, and the cache may be holding a lot of the index leaf blocks already, so some people might prefer the current behaviour;

On the other, other hand an Exadata system can use storage indexes when doing direct path segment scans (even when the segment is an index) so an Exadata fast full scan might be far more desirable than a partially cached full scan.

So no generic right answer, only specific answers for particular sets of circumstances.

Solutions

This issue was addressed some time in the 10g timeline- my notes say the earliest successful tests I did of the fix were on 10.2.0.4, and that it wasn’t available in 9.2.0.8. There are two possibilities – a fix control, and an event.

  • Fix control: 4483286 – descibed as “Discount FFS cost using optimizer_index_cost_adj”
  • Event: 38085 at level one. I don’t think this event has ever appeared in the oraus.msg file, but it is referenced by bug 4483286

Of the two, setting the fix control is probably the safer option since the event is (apparently) non-existent and there’s always the possibility that the number will be hi-jacked in a future release to do something completely different. It’s also worth noting that the fix control can be applied to an individual statement through the /*+ opt_param() */ hint (there’s an example of the syntax towards the end of another article of mine).

A little demonstration:


-- alter session set events '38085 trace name context forever, level 1';
alter session set "_fix_control"='4483286:1';
alter session set optimizer_index_cost_adj = 1;

set autotrace traceonly explain

prompt  ==============
prompt  Unhinted query
prompt  ==============

select 
        count(*) 
from
        t1
where
        v1 is not null
;

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------

With the event or _fix_control set the cost of the index fast full scan drops to 1%, and reappears as the default (unhinted) path.

 

September 30, 2020

Serial Bloom

Filed under: Execution plans,Oracle,Parallel Execution,Performance — Jonathan Lewis @ 1:35 pm BST Sep 30,2020

Following the recent note I wrote about an enhancement to the optimizer’s use of Bloom filters, I received a question by email asking about the use of Bloom filters in serial execution plans:

I’m having difficulty understanding the point of a Bloom filter when used in conjunction with a hash join where everything happens within the same process.

I believe you mentioned in your book (Cost Based Oracle) that hash joins have a mechanism similar to a Bloom filter where a row from the probe table is checked against a bitmap, where each hash table bucket is indicated by a single bit. (You have a picture on page 327 of the hash join and bitmap, etc).

The way that bitmap looks and operates appears to be similar to a Bloom filter to me…. So it looks (to me) like hash joins have a sort of “Bloom filter” already built into them.

My question is… What is the advantage of adding a Bloom filter to a hash join if you already have a type of Bloom filter mechanism thingy built in to hash joins?

I can understand where it would make sense with parallel queries having to pass data from one process to another, but if everything happens within the same process I’m just curious where the benefit is.

 

The picture on page 327 of CBO-F is a variation on the following, which is the penultimate snapshot of the sequence of events in a multi-pass hash join. The key feature is the in-memory bitmap at the top of the image describing which buckets in the (partitioned and spilled) hash table hold rows from the build table. I believe that it is exactly this bitmap that is used as the Bloom filter.

The question of why it might be worth creating and using a Bloom filter in a simple serial hash join is really a question of scale. What is the marginal benefit of the Bloom filter when the basic hash join mechanism is doing all the hash arithmetic and comparing with a bitmap anyway?

If the hash join is running on an Exadata machine then the bitmap can be passed as a predicate to the cell servers and the hash function can be used at the cell server to minimise the volume of data that has to be passed back to the database server – with various optimisations dependent on the version of the Exadata software. Clearly minimising traffic through the interconnect is going to have some benefit.

Similarly, as the email suggests, for a parallel query where (typically) one set of parallel processes will read the probe table and distribute the data to the second set of parallel processes which then do the hash join it’s clearly sensible to allow the first set of procsses to apply the hash function and discard as many rows as possible before distributing the survivors – minimising inter-process communication.

In both these cases, of course, there’s a break point to consider of how effective the Bloom filter needs to be before it’s worth taking advantage of the technology. If the Bloom filter allows 99 rows out of every hundred to be passed to the database server / second set of parallel processes then Oracle has executed the hash function and checked the bitmap 100 times to avoid sending one row (and it will (may) have to do the same hash function and bitmap check again to perform the hash join); on the other hand if the Bloom filter discards 99 rows and leaves only one row surviving then that’s a lot of traffic eliminated – and that’s likely to be a good thing. This is why there are a few hidden parameters defining the boundaries of when Bloom filters should be used – in particular there’s a parameter “_bloom_filter_ratio” which defaults to 35 and is, I suspect, a figure which says something like “use Bloom filtering only if it’s expected to reduce the probe data to 35% of the unfiltered size”.

So the question then becomes: “how could you benefit from a serial Bloom filter when it’s the same process doing everything and there’s no “long distance” traffic going on between processes?” The answer is simply that we’re operating at a much smaller scale. I’ve written blog notes in the past where the performance of a query depends largely on the number of rows that are passed up a query plan before being eliminated (for example here, where the volume of data moving results in a significant fraction of the total time).

If you consider a very simple hash join its plan is going to be shaped something like this:


-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    45 |   720 |    31 |
|*  1 |  HASH JOIN         |      |    45 |   720 |    31 |
|*  2 |   TABLE ACCESS FULL| T2   |    15 |   120 |    15 |
|   3 |   TABLE ACCESS FULL| T1   |  3000 | 24000 |    15 |
-----------------------------------------------------------

If you read Tanel Poder’s article on execution plans as a tree of Oracle function calls you’ll appreciate that you could translate this into informal English along the lines of:

  • Operation 1 calls a function (at operation 2) to do a tablescan of t1 and return all the relevant rows, building an in-memory hash table by applying a hashing function to the join column(s) of each row returned by the call to the tablescan. As the hash table is populated the operation also constructs a bitmap to flag buckets in the hash table that have been populated.
  • Operation 1 then calls a function (at operation 3) to start a tablescan and then makes repeated calls for it to return one row (or, in newer versions, a small rowset) at a time from table t2. For each row returned operation 1 applies the same hash function to the join column(s) and checks the bitmap to see if there’s a potential matching row in the relevant bucket of the hash table, and if there’s a potential match Oracle examines the actual contents of the bucket (which will be stored as a linked list) to see if there’s an actual match.

Taking the figures above, let’s imagine that Oracle is using a rowset size of 30 rows. Operation 1 will have to make 100 calls to Operation 3 to get all the data, and call the hashing function 3,000 times.  A key CPU component of the work done is that the function represented by operation 3 is called 100 times and (somehow) allocates and fills an array of 30 entries each time it is called.

Now assume operation 1 passes the bitmap to operation 3 as an input and it happens to be a perfect bitmap. Operation 3 starts its tablescan and will call the hash function 3,000 times, but at most 45 rows will get past the bitmap. So operation 1 will only have to call operation 3 twice.  Admittedly operation 1 will (possibly) call the hash function again for each row – but maybe operation 3 will supply the hash value in the return array. Clearly there’s scope here for a trade-off between the reduction in work due to the smaller number of calls and the extra work needed to take advantage of the bitmap technology.

Here’s an example that shows the potential for savings – if you want to recreate this test you’ll need about 800MB of free space in the database, the first table takes about 300MB and the second about 450MB.


rem
rem     Script:         bloom_filter_serial_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,30,'0')             v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7 -- > comment to avoid WordPress format issue
;

create table t2
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        round(rownum + 0.5,2)           id,
        mod(rownum,1e5)                 n1,
        lpad(rownum,10)                 v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7 -- > comment to avoid WordPress format issue
;


prompt  =================
prompt  With Bloom filter
prompt  =================

select 
        /*+ 
                px_join_filter(t1) 
                monitor
        */
        t1.v1, t2.v1
from 
        t2, t1
where 
        t2.n1 = 0
and 
        t1.id = t2.id
/

prompt  ===============
prompt  No Bloom filter
prompt  ===============

select 
        /*+
                monitor
        */
        t1.v1, t2.v1
from 
        t2, t1
where 
        t2.n1 = 0
and 
        t1.id = t2.id
/

I’ve created tables t1 and t2 with an id column that never quite matches, but the range of values is set so that the optimizer thinks the two tables might have a near-perfect 1 to 1 match. I’ve given t2 an extra column with 105 distinct values in its 107 rows, so it’s going to have 100 rows per distinct value. Then I’ve presented the optimizer with a query that looks as if it’s going to find 100 rows in t2 and needs to find a probable 100 rows of matches in t1. For my convenience, and to highlight a couple of details of Bloom filters, it’s not going to find any matches.

In both runs I’ve enabled the SQL Monitor feature with the /*+ monitor */ hint, and in the first run I’ve also hinted the use of a Bloom filter. Here are the resulting SQL Monitor outputs. Bear in mind we’re looking at a reasonably large scale query – volume of input data – with a small result set.

First without the Bloom filter:


Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    3.00 |    2.24 |     0.77 |     1 |  96484 |  773 | 754MB |
================================================================

SQL Plan Monitoring Details (Plan Hash Value=2959412835)
==================================================================================================================================================
| Id |      Operation       | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                      |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
==================================================================================================================================================
|  0 | SELECT STATEMENT     |      |         |       |         2 |     +2 |     1 |        0 |      |       |     . |          |                 |
|  1 |   HASH JOIN          |      |     100 | 14373 |         2 |     +2 |     1 |        0 |      |       |   2MB |          |                 |
|  2 |    TABLE ACCESS FULL | T2   |      99 |  5832 |         2 |     +1 |     1 |      100 |  310 | 301MB |     . |          |                 |
|  3 |    TABLE ACCESS FULL | T1   |     10M |  8140 |         2 |     +2 |     1 |      10M |  463 | 453MB |     . |          |                 |
==================================================================================================================================================

According to the Global Stats the query has taken 3 seconds to complete, of which 2.24 seconds is CPU. (The 750MB read in 0.77 second would be due to the fact that I’m running off SSD, and I’ve got a 1MB read size that helps). A very large fraction of the CPU appears because of the number of calls from operation 1 to operation 3 (the projection information pulled from memory reports a rowset size of 256 rows, so that’s roughly 40,000 calls to the function.

When we force the use of a Bloom filter the plan doesn’t change much (though the creation and use of the Bloom filter has to be reported) – but the numbers do change quite significantly.

Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    1.97 |    0.99 |     0.98 |     1 |  96484 |  773 | 754MB |
================================================================

SQL Plan Monitoring Details (Plan Hash Value=4148581417)
======================================================================================================================================================
| Id |       Operation       |  Name   |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                       |         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
======================================================================================================================================================
|  0 | SELECT STATEMENT      |         |         |       |         1 |     +1 |     1 |        0 |      |       |     . |          |                 |
|  1 |   HASH JOIN           |         |     100 | 14373 |         1 |     +1 |     1 |        0 |      |       |   1MB |          |                 |
|  2 |    JOIN FILTER CREATE | :BF0000 |      99 |  5832 |         1 |     +1 |     1 |      100 |      |       |     . |          |                 |
|  3 |     TABLE ACCESS FULL | T2      |      99 |  5832 |         1 |     +1 |     1 |      100 |  310 | 301MB |     . |          |                 |
|  4 |    JOIN FILTER USE    | :BF0000 |     10M |  8140 |         1 |     +1 |     1 |    15102 |      |       |     . |          |                 |
|  5 |     TABLE ACCESS FULL | T1      |     10M |  8140 |         1 |     +1 |     1 |    15102 |  463 | 453MB |     . |          |                 |
======================================================================================================================================================


In this case, the elapsed time dropped to 1.97 seconds (depending on your viewpoint that’s either a drop of “only 1.03 seconds” or drop of “an amazing 34.3%”; with the CPU time dropping from 2.24 seconds to 0.99 seconds (55.8% drop!)

In this case you’ll notice that the tablescan of t1 produced only 15,102 rows to pass up to the hash join at operation 1 thanks to the application of the predicate (not reported here): filter(SYS_OP_BLOOM_FILTER(:BF0000,”T1″.”ID”)). Instead of 40,000 calls for the next rowset the hash function has been applied during the tablescan and operation 5 has exhausted the tablescan after only about 60 calls. This is what has given us the (relatively) significant saving in CPU.

This example of the use of a Bloom filter highlights up the two points I referred to earlier.

  • First, although we see operations 4 and 5 as Join (Bloom) filter use and Table access full respectively I don’t think the data from the tablescan is being “passed up” from operation 5 to 4; I believe operation 4 can be views as a “placeholder” in the plan to allow us to see the Bloom filter in action, the hashing and filtering actually happening during the tablescan.
  • Secondly, we know that there are ultimately no rows in the result set, yet the application of the Bloom filter has not eliminated all the data. Remember that the bitmap that Oracle constructs of the hash table identifies used buckets, not actual values. Those 15,102 rows are rows that “might” find a match in the hash table because they belong in buckets that are flagged. A Bloom filter won’t discard any data that is needed, but it might fail to eliminate data that subsequently turns out to be unwanted.

How parallel is parallel anyway?

I’ll leave you with one other thought. Here’s an execution plan from 12c (12.2.0.1) which joins three dimension tables to a fact table. There are 343,000 rows in the fact table and the three joins individually identify about 4 percent of the data in the table. In a proper data warehouse we might have been looking at a bitmap star transformation solution for this query, but in a mixed system we might want to run warehouse queries against normalised data – this plan shows what Bloom filters can do to minimise the workload. The plan was acquired from memory after enabling rowsource execution statistics:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |        |        |      |            |      1 |00:00:00.05 |      22 |      3 |       |       |          |
|   1 |  SORT AGGREGATE              |          |      1 |      1 |        |      |            |      1 |00:00:00.05 |      22 |      3 |       |       |          |
|   2 |   PX COORDINATOR             |          |      1 |        |        |      |            |      2 |00:00:00.05 |      22 |      3 | 73728 | 73728 |          |
|   3 |    PX SEND QC (RANDOM)       | :TQ10000 |      0 |      1 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   4 |     SORT AGGREGATE           |          |      2 |      1 |  Q1,00 | PCWP |            |      2 |00:00:00.09 |    6681 |   6036 |       |       |          |
|*  5 |      HASH JOIN               |          |      2 |     26 |  Q1,00 | PCWP |            |     27 |00:00:00.09 |    6681 |   6036 |  2171K|  2171K|     2/0/0|
|   6 |       JOIN FILTER CREATE     | :BF0000  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|*  7 |        TABLE ACCESS FULL     | T3       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|*  8 |       HASH JOIN              |          |      2 |    612 |  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6634 |   6026 |  2171K|  2171K|     2/0/0|
|   9 |        JOIN FILTER CREATE    | :BF0001  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 10 |         TABLE ACCESS FULL    | T2       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 11 |        HASH JOIN             |          |      2 |  14491 |  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6614 |   6022 |  2171K|  2171K|     2/0/0|
|  12 |         JOIN FILTER CREATE   | :BF0002  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 13 |          TABLE ACCESS FULL   | T1       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|  14 |         JOIN FILTER USE      | :BF0000  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  15 |          JOIN FILTER USE     | :BF0001  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  16 |           JOIN FILTER USE    | :BF0002  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  17 |            PX BLOCK ITERATOR |          |      2 |    343K|  Q1,00 | PCWC |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|* 18 |             TABLE ACCESS FULL| T4       |     48 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.05 |    6594 |   6018 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

It’s a parallel plan, but it’s used the 12c “PQ_REPLICATE” strategy. The optimizer has decided that all the dimension tables are so small that it’s going to allow every PX process to read every (dimension) table through the buffer cache and build its own hash tables from them. (In earlier versions you might have seen the query coordinator scanning and broadcasting the three small tables, or one set of PX processes scanning and broadcasting to the other set).

So every PX process has an in-memory hash table of all three dimension tables and then (operation 17) they start a tablescan of the fact table, picking non-overlapping rowid ranges to scan. But since they’ve each created three in-memory hash tables they’ve also been able to create three Bloom filters each, which can all be applied simultaneously as the tablescan takes place; so instead of 343,000 rows being passed up the plan and through the first hash join (where we see from operation 11 that the number of surviving rows would have been about 14,500 ) we see all but 27 rows discarded very early on in the processing. Like bitmap indexes part of the power of Bloom filters lies in the fact that with the right plan the optimizer can combine them and identify a very small data set very precisely, very early.

The other thing I want you to realise about this plan, though, is that it’s not really an “extreme” parallel plan. It’s effectively running as a set of concurrent, non-interfering, serial plans. Since I was running (parallel 2) Oracle started just 2 PX processes: they both built three hash tables from the three dimension tables then split the fact table in half and took half each to do all the joins, and passed the nearly complete result to the query co-ordinator at the last moment. That’s as close as you can get to two serial, non-interfering, queries and still call it a parallel query. So, if you wonder why there might be any benefit in serial Bloom filters – Oracle’s actually being benefiting from it under the covers for several years.

Summary

Bloom filters trade a decrease in messaging against an increase in preparation and hashing operations. For Exadata systems with predicate offloading it’s very easy to see the potential benefit; for general parallel execution; it’s also fairly easy to see the potential benefit for parallel query execution what inter-process message between two sets of PX processes can be resource intensive; but even for serial queries there can be some benefit though, in absolute terms, they are likely to be only a small saving in CPU.

 

September 9, 2020

Bloom Upgrade

Filed under: 18c,CBO,Joins,Oracle,Tuning — Jonathan Lewis @ 5:18 pm BST Sep 9,2020

It’s a common pattern of Oracle features that they start with various restrictions or limitations that disappear over time. This note is about an enhancement to Bloom filter processing that appeared in the 18.1 optimizer and, for some people, may be a good enough reason for upgrading to a newer version of Oracle. This enhancement came to my attention by way of the Oracle Developer forum in a thread with the title Bloom filters and view using UNION ALL asking how to get a Bloom filter pushed inside a UNION ALL view. The original requirement wasn’t a trivial one so I’ll demonstrate the problem with a very simple example – first the data set:

rem
rem     Script:         bloom_pushdown.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2020
rem
rem     Last tested:
rem             19.3.0.0
rem

create table t1 as select * from all_objects where rownum <= 50000;
create table t2 as select t1.* from t1, (select rownum n1 from dual connect by level <= 4);
create table t3 as select t1.* from t1, (select rownum n1 from dual connect by level <= 4); -- > comment to avoid wordpress format issue

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns object_type size 254'
        );
end;
/

I’ve been a bit lazy here, copying data from view all_objects. I’ve gathered stats on t1 so that I can generate a histogram on the object_type column because I’m going to query for a rare object_type and I want the optimizer to get a reasonable estimate of rows. I’m going to hint a parallel query to join t1 to t2 (aliased, trivially, as v1 for reasons that will become apparent soon):

select
        /*+ 
                parallel(2) 
        */
        t1.object_name, v1.object_name
from
        t1,
        t2 v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

In my case the optimizer chooses to do a hash join between these two table, and creates a Bloom filter to try and minimise the data passing through the data flow operation. The result set in my 12.2.0.1 database is only 16 rows, so it would be nice if the parallel scan could eliminate most of the 200,000 rows in t2 early – here’s the execution plan pulled from memory after running the query with rowsource execution stats enabled:


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |      1 |        |   371 (100)|     16 |00:00:00.06 |      20 |      0 |       |       |          |
|   1 |  PX COORDINATOR        |          |      1 |        |            |     16 |00:00:00.06 |      20 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)  | :TQ10000 |      0 |     16 |   371   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN           |          |      2 |     16 |   371   (5)|     16 |00:00:00.05 |    6278 |   3988 |  1250K|  1250K|     2/0/0|
|   4 |     JOIN FILTER CREATE | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|*  5 |      TABLE ACCESS FULL | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|   6 |     JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.04 |    4244 |   3988 |       |       |          |
|   7 |      PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.04 |    4244 |   3988 |       |       |          |
|*  8 |       TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|     16 |00:00:00.03 |    4244 |   3988 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
   8 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"V1"."OBJECT_ID"))

We see that Oracle has generated a Bloom filter at operation 4 from the data returned from t1 at operation 5, and then used that Bloom filter at operation 6 to eliminate most of the data from t2 before passing the remaining few rows up to the hash join.

Let’s make the query more interesting – what if you want to use a UNION ALL of t2 and t3 in the query (for example one might be “current data” while the other is “historic data”. Here’s the query and plan from 12.2.0.1:

select
        /*+ 
                parallel(2) 
        */
        t1.object_name, v1.object_name
from
        t1,
        (select * from t2 union all select * from t3) v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |   667 (100)|     32 |00:00:00.37 |      40 |      0 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |     32 |00:00:00.37 |      40 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |      0 |     32 |   667   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN            |          |      1 |     32 |   667   (5)|     32 |00:00:00.34 |    5125 |   3860 |  1250K|  1250K|     2/0/0|
|*  4 |     TABLE ACCESS FULL   | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|   5 |     VIEW                |          |      2 |    400K|   584   (4)|    400K|00:00:00.52 |    8488 |   7976 |       |       |          |
|   6 |      UNION-ALL          |          |      2 |        |            |    400K|00:00:00.24 |    8488 |   7976 |       |       |          |
|   7 |       PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|    200K|00:00:00.11 |    4244 |   3988 |       |       |          |
|*  8 |        TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|    200K|00:00:00.07 |    4244 |   3988 |       |       |          |
|   9 |       PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|    200K|00:00:00.11 |    4244 |   3988 |       |       |          |
|* 10 |        TABLE ACCESS FULL| T3       |     32 |    200K|   292   (4)|    200K|00:00:00.03 |    4244 |   3988 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   4 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
   8 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
  10 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue

No Bloom filter – so all 400,000 rows feed up the plan and through the hash join. This won’t matter too much for my sub-second tiny data set but on a pair of 50GB tables, with the potential to offload the Bloom filter to storage in Exadata and, perhaps, eliminate 99% of the data at the cell servers, this could make a huge difference to performance.

Since Bloom filters are all about hashing data (in Oracle the standard Bloom filter is the bitmap summarising the build table in a hash join) let’s trying pushing the optimizer into a hash distribution for the parallel join to see if that had any effect:


select
        /*+ 
                parallel(2) 
                gather_plan_statistics
                leading(@sel$1 t1@sel$1 v1@sel$1)
                use_hash(@sel$1 v1@sel$1)
                pq_distribute(@sel$1 v1@sel$1 hash hash)
        */
        t1.object_name, v1.object_name
from
        t1,
        (select * from t2 union all select * from t3) v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |      1 |        |   667 (100)|     32 |00:00:00.43 |      60 |      0 |       |       |          |
|   1 |  PX COORDINATOR             |          |      1 |        |            |     32 |00:00:00.43 |      60 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |      0 |     32 |   667   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN BUFFERED       |          |      1 |     32 |   667   (5)|     32 |00:00:00.38 |    4000 |   3752 |  2290K|  2082K|     2/0/0|
|   4 |     JOIN FILTER CREATE      | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX RECEIVE             |          |      2 |      4 |    75   (4)|      8 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |      0 |      4 |    75   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   7 |        STATISTICS COLLECTOR |          |      2 |        |            |      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|   8 |         PX BLOCK ITERATOR   |          |      2 |      4 |    75   (4)|      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|*  9 |          TABLE ACCESS FULL  | T1       |     26 |      4 |    75   (4)|      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|  10 |     PX RECEIVE              |          |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  12 |       JOIN FILTER USE       | :BF0000  |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  13 |        VIEW                 |          |      2 |    400K|   584   (4)|    400K|00:00:00.68 |    8488 |   7976 |       |       |          |
|  14 |         UNION-ALL           |          |      2 |        |            |    400K|00:00:00.59 |    8488 |   7976 |       |       |          |
|  15 |          PX BLOCK ITERATOR  |          |      2 |    200K|   292   (4)|    200K|00:00:00.18 |    4244 |   3988 |       |       |          |
|* 16 |           TABLE ACCESS FULL | T2       |     32 |    200K|   292   (4)|    200K|00:00:00.06 |    4244 |   3988 |       |       |          |
|  17 |          PX BLOCK ITERATOR  |          |      2 |    200K|   292   (4)|    200K|00:00:00.12 |    4244 |   3988 |       |       |          |
|* 18 |           TABLE ACCESS FULL | T3       |     32 |    200K|   292   (4)|    200K|00:00:00.08 |    4244 |   3988 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue 
       filter("T1"."OBJECT_TYPE"='SCHEDULE') 
  16 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue
  18 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue

We’ve managed to introduce a Bloom filter (which is visible as :BF0000 in the plan, even through there’s no reference to sys_op_bloom_filter() in the predicate information) but there’s a problem, we’re still passing 400,000 rows up the plan and the Bloom filter is only being applied at (or just after) the VIEW operator, discarding all but 66 rows before doing the hash join. It’s an improvement but not ideal; we’d like to see the Bloom filter applied to each of the two tables separately to eliminate rows as early as possible.

This can’t be done in 12.2, and you’d have to rewrite the query, changing a “join with union” into a “union of joins”, and that’s not really a desirable strategy.

Next Steps

Searching MOS, though you will be able to find the following note:

Doc ID 18849313.8 – ENH : bloom filters/pruning are pushed through union-all view

There’s an enhancement request to do what we want in 18.1, and the enhancement has got into the software. Here’s the (unhinted) plan from 19.3 (the plan stays the same when optimizer_features_enable is set back to 18.1.0, but drops back to the 12.1. plan when OFE is set to 12.2.0.1):

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |   666 (100)|     32 |00:00:00.11 |      10 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     32 |00:00:00.11 |      10 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)    | :TQ10000 |      0 |     32 |   666   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN             |          |      2 |     32 |   666   (5)|     32 |00:00:00.05 |   10020 |   7958 |  1250K|  1250K|     2/0/0|
|   4 |     JOIN FILTER CREATE   | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    1998 |      0 |       |       |          |
|*  5 |      TABLE ACCESS FULL   | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    1998 |      0 |       |       |          |
|   6 |     VIEW                 |          |      2 |    400K|   583   (4)|     32 |00:00:00.04 |    8022 |   7958 |       |       |          |
|   7 |      UNION-ALL           |          |      1 |        |            |     12 |00:00:00.02 |    4011 |   3979 |       |       |          |
|   8 |       JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|   9 |        PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|* 10 |         TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|  11 |       JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|  12 |        PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|* 13 |         TABLE ACCESS FULL| T3       |     32 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
  10 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."OBJECT_ID")) 
  13 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T3"."OBJECT_ID"))

As you can see, we create a Bloom filter at operation 4, and use it twice at operations 8 and 11 – with the sys_op_bloom_filter() functions clearly visible in the predicate information showing us that the Bloom filter is applied to the object_id column in both cases.

If you want to disable this enhancement for some reasons there are two hidden parameters available (which you might set for a single query using the opt_param() hint):

  • _bloom_filter_setops_enabled = true
  • _bloom_pruning_setops_enabled = true

The first is for Bloom filters in the situation shown, I assume the second deals with Bloom filters for partition pruning.

Summary

In versions prior to 18.1 the optimizer is unable to push Bloom filters down to the individual tables in a UNION ALL view, but this limitation was removed in the 18.1 code set.

 

August 19, 2020

Subquery with OR

Filed under: 12c,Execution plans,Hints,Oracle,Performance — Jonathan Lewis @ 2:03 pm BST Aug 19,2020

I’ve written a couple of notes in the past about the problems of optimising queries with predicates of the form “or exists {subquery}”. A recent question on the Oracle Developer Community forum brought to my attention an improvement in this area in (very precisely) 12.2, as well as giving me a cute example of how the first cut of a new feature doesn’t always cover every detail, and creating a nice example of how the new technology enhances the old technology.

We start with some data and a simple query running under 12.2.0.1:

rem
rem     Script:         exists_with_or_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2  -- feature not implemented
rem

create table cat_contact(
        contact_method_id       varchar2(1) not null,
        contact_id              number(8,0) not null,
        small_vc                varchar2(10),
        padding                 varchar2(100)
);

alter table cat_contact add constraint cc_pk primary key(contact_id);
create index cc_i1 on cat_contact(contact_method_id);

insert into cat_contact
select
        chr(64 + case when rownum <= 10 then rownum else 26 end),
        rownum,
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_contact where contact_method_id in ('A','B','C');

create table cat_item(
        contact_id      number(8,0) not null,
        item_category   varchar2(1) not null,
        small_vc        varchar2(10),
        padding         varchar2(100),
        constraint ci_ref_cc foreign key(contact_id) references cat_contact
)
;

alter table cat_item add constraint ci_pk primary key(contact_id, item_category);
create index ci_i1 on cat_item(item_category);

insert into cat_item 
select
        rownum,
        chr(88 + case when rownum <= 10 then mod(rownum,2) else 2 end),
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_item where item_category in ('X','Y');

execute dbms_stats.gather_table_stats(user,'cat_contact')
execute dbms_stats.gather_table_stats(user,'cat_item')

I’ve created and populated two tables (the table and column names come from the ODC thread). There’s a foreign key relationship defined between cat_item and cat_contact, both tables have primary keys declared, with a couple of extra columns declared not null.

I’ve populated the two tables with a small amount of data and each table has one column rigged with very skewed data:

  • cat_contact.contact_method_id is mostly ‘Z’ with one row each of ‘A’ to ‘J’ ,
  • cat_item.item_category (the second column in the primary key) is mostly ‘Z’ with 5 rows each of ‘X’ and ‘Y’

After populating each table I’ve queried it in a way which means the subsequent stats gathering will create frequency histograms on these two columns and the optimizer will be able to take advantage of the skew in its arithmetic, which means it may choose to use the indexes I’ve created on those skewed columns if the right values appear in the queries.

So here’s the query we’re interested in:

SELECT  /*+ 
                qb_name(main) 
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR      c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

Here’s the default execution plan (in 12.2.0.1 with my settings for system stats and various other optimizer-related figures that MIGHT make a difference) pulled from memory after executing the query to return 10 rows.


-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |       |       |    34 (100)|          |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL | CAT_CONTACT | 10000 |  1152K|    34   (6)| 00:00:01 |
|   3 |   INLIST ITERATOR   |             |       |       |            |          |
|*  4 |    INDEX UNIQUE SCAN| CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((INTERNAL_FUNCTION("C"."CONTACT_METHOD_ID") OR  IS NOT NULL))
   4 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR
              "I"."ITEM_CATEGORY"='Y')))

For every row in the cat_contact table Oracle has checked whether or not the contact_method is an ‘A’, ‘B’, or ‘C’ and passed any such rows up to its parent, for all other rows it’s then executed the subquery to see if the row with the matching contact_id in contact_item has an ‘X’ or ‘Y’ as the item_category. It’s had to run the subquery 9,997 times (there were only three rows matching ‘A’,’B’,’C’) and the INLIST ITERATOR at operation 3 means that it’s probed the index nearly 20,000 timtes. This does not look efficient.

I’ve said in previous articles that when you need to optimize queries of this shape you need to rewrite them as UNION ALL queries to separate the two parts of the complex OR predicate and then make sure that you don’t report any items twice – which you do by making use of the lnnvl() function. So let’s do this – but let’s do it the lazy “new technology” way by upgrading to 19c and executing the query there; here’s the plan I got in 19.3.0.0:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

The optimizer has used the new “cost-based OR-expansion” transformation to rewrite the query as a UNION ALL query. We can see an efficient access into cat_contact to identify the ‘A’,’B’,’C’ rows, and then we can see that the second branch of the union all handles the existence subquery but the optimizer has unnested the subquery to select the 10 rows from cat_item where the item_category is ‘X’ or ‘Y’ and used those rows in a nested loop to drive into the cat_contact table using the primary key. We can also see the use of the lnnvl() function in operation 13 that ensures we don’t accidentally report the ‘A’,’B’,’C’ rows again.

So let’s go back to 12.2.0.1 and see what happens if we just add the /*+ or_expand(@main) */ hint to the query. Here’s the resulting execution plan:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

We get exactly the plan we want – with the same cost as the 19c cost, which happens to be less than half the cost of the default plan that we got from 12.2.0.1. So it looks like there may be case where you will need to hint OR-expansion because is might not appear by default.

Other Observations 1 – ordering

You may have noticed that my query has, unusually for me, put the existence subquery first and the simple filter predicate second in the where clause. I don’t like this pattern as (over time, and with different developers modifying queries) it’s too easy in more complex cases to “lose” the simple predicate; a one-liner can easily drift, change indents, get bracketed with another predicate that it shouldn’t be connected with and so on. I’ve actually seen production systems producing wrong results because little editing accidents like this (counting brackets is the classic error) have occured – so I’m going to rerun the test on 12.2.0.1 with the predicates in the order I would normally write them.

Here’s the “corrected” query with its execution plan:


SELECT  /*+ 
                qb_name(main) 
                or_expand(@main)
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                c.contact_method_id IN ('A', 'B', 'C')  
        OR
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        )
;  


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    16 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    16   (7)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   4 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   5 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   6 |       INLIST ITERATOR                     |                 |       |       |            |          |
|   7 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|*  8 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|*  9 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|  10 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
|* 11 |    FILTER                                 |                 |       |       |            |          |
|  12 |     INLIST ITERATOR                       |                 |       |       |            |          |
|  13 |      TABLE ACCESS BY INDEX ROWID BATCHED  | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|* 14 |       INDEX RANGE SCAN                    | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|  15 |     INLIST ITERATOR                       |                 |       |       |            |          |
|* 16 |      INDEX UNIQUE SCAN                    | CI_PK           |     1 |     6 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
   9 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  11 - filter(LNNVL( IS NOT NULL))
  14 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  16 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y')))

The execution plan has jumped from 14 lines to 17 lines, the cost has gone up from 14 to 16, and both branches of the plan now report access to cat_contact and cat_item (though only through its primary key index in the second branch). What’s happened?

Oracle 12.2.0.1 has rewritten the query as a UNION ALL working from the bottom up – so in this case the first branch of the rewrite handles the original filter subquery, unnesting it to drive efficient from cat_item to cat_contact. This means the second branch of the rewrite has to find the ‘A’,’B’,’C’ rows in cat_contact and then check that the filter subquery hadn’t previously reported them – so the optimizer has applied the lnnvl() function to the filter subquery – which you can nearly see in the Predicate Information for operation 11.

To make it clearer, here’s what you get as the predicate information for that operation after calling explain plan and dbms_xplan.display()

  11 - filter(LNNVL( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1)))

In 12.2 the order of predicates in your query seems to be important – unless told otherwise the optimizer is working from the bottom-up (then rewriting top-down). But there is hope (though not documented hope). I added the /*+ or_expand(@main) */ hint to the query to force OR-expansion. Checking the Outline Information of the plan I could see that this had been expanded to /*+ or_expand(@main (1) (2)) */. Taking a wild guess as the significance of the numbers and changing the hint to /*+ or_expand(@main (2) (1) */ I re-ran the test and back to the more efficient plan – with the filter subquery branch appearing second in the UNION ALL view and the lnnvl() applied to the simpler predicate.

So the OR-expansion code is not fully cost-based in 12.2.0.1, but you can modify the behaviour through hinting. First to force it to appear (which may not happen even if it seems to be the lower cost option), and secondly to control the ordering of the components of the UNION ALL. As with all things relating to hints, though, act with extreme caution: we do not have sufficient documentation explaining exactly how they work, and with some of them we don’t even know whether the code path is even complete yet.

Other Observations 2 – 12cR1

The or_expand() hint and cost-based OR-expansion appeared specifically in 12.2.0.1; prior to that we had a similar option in the use_concat() hint and concatenation – which also attempts to rewrite your query to produce a union all of disjoint data sets. But there are restrictions on what concatentation can do. I rarely remember what all the restrictions are, but there are two critical restrictions:

  • first, it will only appear by default if there is an indexed access path available to drive every branch of the rewrite
  • secondly, it will not apply further transformations to the separate branches that it produces

If we try adding the or_expand() hint to our query in 12.1.0.2 it will have no effect, so let’s add a suitable use_concat() hint and see what happens:

explain plan for
SELECT  /*+ 
                qb_name(main) 
                use_concat(@main 8 or_predicates(1))
--              use_concat(@main   or_predicates(1))
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR
                c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             | 10000 |  1152K|    40   (3)| 00:00:01 |
|   1 |  CONCATENATION                        |             |       |       |            |          |
|   2 |   INLIST ITERATOR                     |             |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| CAT_CONTACT |     3 |   354 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | CC_I1       |     3 |       |     3   (0)| 00:00:01 |
|*  5 |   FILTER                              |             |       |       |            |          |
|*  6 |    TABLE ACCESS FULL                  | CAT_CONTACT |  9997 |  1151K|    35   (6)| 00:00:01 |
|   7 |    INLIST ITERATOR                    |             |       |       |            |          |
|*  8 |     INDEX UNIQUE SCAN                 | CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C')
   5 - filter( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1))
   6 - filter(LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C'))
   8 - access("I"."CONTACT_ID"=:B1 AND ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))

26 rows selected.

As you can see by forcing concatentation I’ve got my “union all” view with lnnvl() applied in the second branch. But the second branch was the “select where exists()” branch and the optimizer has not been able (allowed?) to do the unnesting that would let it drive efficiently from the cat_item table to the cat_contact table. The effect of this is that the plan still ends up with a full tablescan of cat_contact running a filter subquery on virtually every row- so concatenation doesn’t save us anything.

The significance of the “8” in the hint, by the way is (I believe) that it tells the optimizer to use inlist iterators when possible. If I had omitted the “8” the plan would have had 4 branches – one each for ‘A’, ‘B’, and ‘C’ and the fourth for the filter subquery. I could also have added a hint /*+ use_concat(@subq or_predicates(1)) */ to replace operations 7 and 8 with a single index range scan with a filter predicate for the ‘X’/’Y’ check (and that might, in any case, be slightly more efficient than the iteration approach).

Footnote(s)

The “legacy” OR-expansion (“concatenation” a.k.a. LORE in the optimizer trace file) can be controlled through the hints use_concat(), and no_expand().

The new cost-based OR-expansion (now ORE in the optimizer trace file) can be controlled through the hints or_expand() and no_or_expand().

The new cost-based OR-expansion has some restrictions, for example it is explicitly blocked in a MERGE statement, even in 19c, as reported in this blog note by Nenad Noveljic. As the blog note shows, concatenation is still possible but you (may) have to disable cost based OR-expansion.

I scanned the executable for the phrase “ORE: bypassed” to see if there were any messages that would suggest other reasons why cost-based OR-expansion would not be used; unfortunately the only relevant string was “ORE: bypassed – %s” [update (see comment 5 below): after ignoring case there was a second option: “ORE: Bypassed for disjunct chain: %s.”] – in other words all the possible bypass messages would be filled in on demand. I found a list of messages that might be relevant; I’d be a little cautious about trusting it but if you don’t see the feature appearing when you’re expecting it then it might be worth checking whether one of these could apply.

  • Old OR expansion hints present
  • Semi join hint present
  • QB has FALSE predicate
  • QB marked for NO Execution
  • Full Outer join QB
  • Rownum found in disjunction
  • Anti/semi/outer join in disjunction
  • Opposite Range in disjunction
  • No Index or Partition driver found
  • Predicate chain has all constant predicates
  • Negated predicate found
  • Long bitmap inlist in OR predicate
  • PRIOR expression in OR predicate
  • All 1-row tables found
  • No valid predicate for OR expansion
  • Disjunctive subquery unnesting possible
  • Subquery unnesting possible
  • Subquery coalesced query block
  • Merge view query block

Finally – here’s another reference blog note comparing LORE with ORE from Mohamed Houri.

 

July 15, 2020

Fetch First vs. Rownum

Filed under: Execution plans,Oracle,Partitioning,Performance,Problem Solving,Tuning — Jonathan Lewis @ 10:11 am BST Jul 15,2020

I’ve pointed out fairly frequently that if you’re running Standard Edition but would like to take advantage of a few features of the Partitioning option then you might be able to do something appropriate with Partition Views (but I’ve just discovered while searching my blog for a suitable item to link to that I haven’t published any of my PV notes on the blog).

I’ve also pointed out that while 12c allows you to use “fetch first N rows” instead of “where rownum <= N” there’s a hidden threat to using the feature because “fetch first N” turns into a hidden row_number() over() analytic function.

Today’s note is a combination of these two topics, prompted by a request to solve a particular coding issue that has appeared a few times on the Oracle Developer Forum and is likely to be recognisable to a number of people.

I’ll start with a very simple model demonstrateing the simplest use of “fetch first N”:


rem
rem     Script:         fetch_first_union.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t_odd
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_odd modify(flag not null, class not null, id not null);
alter table t_odd add constraint to_chk_odd check (flag = 'O');

create index to_i1 on t_odd(class, id);

With this data set I want to write a query that selects rows for class A where id > 9500, ordered by id – but I only want the first two rows. Here’s a very simple query that gets the result I want, followed by the execution plan from 12.2.0.1 (the A-Rows and E-Rows from 19.3 are slightly different):

set serveroutput off
set linesize 180

alter session set statistics_level = all;

select  /*+ index(t_odd (class, id)) */
        *
from
        t_odd
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    95 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  VIEW                         |       |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       6 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T_ODD |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       6 |
|*  4 |     INDEX RANGE SCAN          | TO_I1 |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_ODD"."CLASS","T_ODD"."ID")<=2)
   4 - access("CLASS"='A' AND "ID">9500)

The plan – with its stats – shows us that we’ve done an index range scan of an index which will return the data in exactly the order we want, and the “fetch first 2 rows” has been translated into the row_number() over() that we expected; but to our great joy the “window sort stopkey” makes the processing stop very early because Oracle recognises that the base data is arriving in the right order so it isn’t necessary to fetch all of it and sort it. The A-Rows column confirms this interpretation of what has happened.

You might notice, by the way, that the optimizer has costed the query as if it were fetching all the rows even though it “knows” that it’s going to fetch only the first two rows. That’s why I had to include the index hint to make the optimizer use the obvious index – a popular alternative is to use the /*+ first_rows(N) */ hint where N matches, or is similar to, the number of rows you want to fetch. If I had omitted the hint the optimizer would have done a full tablescan and then applied a “window sort pushed rank” operation to sort and limit the result to 2 rows.

So now we come to the real problem: the user has a “current” table and an identical “history” table, and would like to replace the table reference with a reference to a union all view for their clients on Standard Edition, or to a partitioned table for clients running Enterprise Edition – and they don’t really want to do any other code changes. So let’s see what happens when we model the union all. I started with a table called t_odd that held only odd values for id, so I’m going to add a table called t_even that holds only even values for id.

create table t_even
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_even modify(flag not null, class not null, id not null);
alter table t_even add constraint te_chk_even check (flag = 'E');

create index te_i1 on t_even(class, id);

create or replace view v_bare 
as
select * from t_odd
union all
select * from t_even
/

select
        /*+ 
                index(vw.t_odd  (class, id)) 
                index(vw.t_even (class, id)) 
        */
        *
from
        v_bare vw
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

As you can see t_even is an identically structured table with similar data, and I’ve created a union all view on top of the two tables, changing the query to reference the view rather than referencing a table. Thanks to the costing anomaly (combined with the small size of the tables) I’ve had to supply a couple of “global” hints to tell the optimizer to use the indexes to access the two tables. So how well does the optimizer do its job when we have a union all view?


----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |        |      1 |        |   192 (100)|      2 |00:00:00.01 |     190 |       |       |          |
|*  1 |  VIEW                                   |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |     190 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK               |        |      1 |    404 |   192   (2)|      2 |00:00:00.01 |     190 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                                 | V_BARE |      1 |    404 |   191   (1)|    404 |00:00:00.01 |     190 |       |       |          |
|   4 |     UNION-ALL                           |        |      1 |        |            |    404 |00:00:00.01 |     190 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_ODD  |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | TO_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_EVEN |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  8 |       INDEX RANGE SCAN                  | TE_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "VW"."CLASS","VW"."ID")<=2)
   6 - access("CLASS"='A' AND "ID">9500)
   8 - access("CLASS"='A' AND "ID">9500)

Answer: Bad luck, the optimizer isn’t smart enought to find a cheap way through this query. It’s fetched all the relevant data from the two tables before applying the window sort (which it does with some efficiency – the pushed rank) to produce the right answer. As you can see from the A-Rows column, though, it’s had to acquire a couple of hundred rows from each table before getting down to the 2 rows we wanted.

Partitioned Tables

So let’s try to solve the problem by buying into the partitioning option and creating a list-partitioned table with two partitions, one flagged for current data and one flagged for history data – or ‘O’dd and ‘E’ven data as I’ve created them in my model.



create table t_pt (
        flag,
        class,
        id,
        v1,
        padding
)
partition by list (flag) (
        partition pO values('O'),
        partition pE values('E')
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

insert into t_pt
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;


create index tp_i1 on t_pt(class, id) local;
alter table t_pt modify (flag not null, class not null, id not null);

execute dbms_stats.gather_table_stats(user,'t_pt',method_opt=>'for all columns size 1', cascade=>true, granularity=>'ALL')

Note particularly that I have created a local index on this partitioned table – so there’s a very close correspondance between the two tables in the previous example and the two partitions in this example. Here’s the plan when I query the partitioned table for the first two rows:


select  /*+ index(t_pt (class, id)) */
        *
from
        t_pt
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;


--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |       |      1 |        |   190 (100)|      2 |00:00:00.01 |     189 |       |       |          |
|*  1 |  VIEW                                        |       |      1 |      2 |   190   (2)|      2 |00:00:00.01 |     189 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK                    |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |     189 |  2048 |  2048 | 2048  (0)|
|   3 |    PARTITION LIST ALL                        |       |      1 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|*  5 |      INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|    404 |00:00:00.01 |       4 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_PT"."CLASS","T_PT"."ID")<=2)
   5 - access("CLASS"='A' AND "ID">9500)

The optimizer has let us down again. The plan shows us that we have to acquire all the relevant data from the two partitions before applying the row_number() analytic function and discarding all but the two rows we want. (Again we can check the A-Rows column to see that we have started by fetching a total of 404 rows from the table.)

But what happens if we fall back to the good old-fashioned (non-standard) rownum method:


select
        *
from    (
        select  /*+ index(t_pt (class, id)) */
                *
        from
                t_pt
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        )
where
        rownum <= 2
;

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |       |      1 |        |   190 (100)|      2 |00:00:00.01 |       6 |       |       |          |
|*  1 |  COUNT STOPKEY                                 |       |      1 |        |            |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   VIEW                                         |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                       |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |  2048 |  2048 | 2048  (0)|
|   4 |     PARTITION LIST ALL                         |       |      1 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  5 |      COUNT STOPKEY                             |       |      2 |        |            |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  7 |        INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|      4 |00:00:00.01 |       4 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=2)
   3 - filter(ROWNUM<=2)
   5 - filter(ROWNUM<=2)
   7 - access("CLASS"='A' AND "ID">9500)

Oracle really knows how to use rownum well – notice how there is a count stopkey operation as a child to the partition list all operation, and that’s where our rownum <= 2 predicate is first applied. For each partition Oracle finds “the first two rows” and after it has collected two rows from every partition it sorts them (again with a stopkey) to find the top two in that subset. Check the A-Rows column – we selected a total of 4 rows from the table (2 per partition) and reduced that to 2 rows at operation 3.

Conclusion

There are some patterns of processing where partitioned tables can be a lot friendlier to coders than partition views; if you do have to stick with Standard Edition you can usually get what you want but the coding investment may be significantly higher. Even with partitioned tables, though, there are some “old-fashioned” Oracle methods that do a much nicer job than some of the new-fangled “ANSI” mechanisms.

Footnote

Part of the problem presented here revolves around the desire to keep a pattern of SQL generation that already exists, doing nothing more than replacing a table name with a view (or partitioned table) name.

As we’ve seen, if you start with a simple heap table and try to replace it with a partitioned table you have to use the rownum mechanism rather than the fetch first N rows mechanism.

If you’re running with Standard Edition you can’t do anything simple to replace a table name with the name of a union all view; you’d have to change your code generator to apply all the predicates twice (once for each table) and then apply the rownum predicate or fetch first directive again outside the union all. In other words you have to emulate exactly what Oracle EE manages to do with partitioned tables and rownum.

select
        flag, class, id, v1
from
        (
        select
                /*+ index(t_odd (class, id)) */
                flag, class, id, v1
        from
                t_odd
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
        union all
        (
        select
                /*+ index(t_even (class, id)) */
                flag, class, id, v1
        from
                t_even
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
order by
        class, id
fetch 
        first 2 rows only
;

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |        |      1 |        |   192 (100)|      2 |00:00:00.01 |       8 |       |       |          |
|*  1 |  VIEW                              |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |       8 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK          |        |      1 |      4 |   192   (2)|      2 |00:00:00.01 |       8 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                            |        |      1 |      4 |   191   (1)|      4 |00:00:00.01 |       8 |       |       |          |
|   4 |     UNION-ALL                      |        |      1 |        |            |      4 |00:00:00.01 |       8 |       |       |          |
|   5 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  6 |       VIEW                         |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY       |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|   8 |         TABLE ACCESS BY INDEX ROWID| T_ODD  |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|*  9 |          INDEX RANGE SCAN          | TO_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
|* 10 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|* 11 |       WINDOW NOSORT STOPKEY        |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|  12 |        TABLE ACCESS BY INDEX ROWID | T_EVEN |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|* 13 |         INDEX RANGE SCAN           | TE_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_007"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "from$_subquery$_006"."CLASS","from$_subquery$_006"."ID")<=2)
   6 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=2)
   7 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
   9 - access("CLASS"='A' AND "ID">9500)
  10 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
  11 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
  13 - access("CLASS"='A' AND "ID">9500)


As you can see, the E-Rows still predicts a lot of work, but the A-Rows tells us the work was kept to the minimum we want.

 

July 13, 2020

Min/Max costing

Filed under: Oracle,Performance,subqueries,Tuning — Jonathan Lewis @ 1:07 pm BST Jul 13,2020

A question about the min/max index scan appeared on the Oracle Developer Community forum recently. The query supplied in the thread was a little odd – you might ask why anyone would run it as it stands – and I’ve modified it to make it even stranger to demonstrate a range of details.

I’ll start with a simple data set, not bothering to collect stats because that will be done automatically on create for my versions:

rem
rem     Script:         min_max_cost_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 
as 
select  * 
from    all_objects 
where   rownum <= 50000 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(object_name);

Now a few simple queries – for which I’ll capture and display the in-memory execution plans a little further on:


set linesize 156
set pagesize 60
set trimspool on
set serveroutput off
alter session set statistics_level = all;

prompt  =====================
prompt  Baseline select max()
prompt  =====================

select max(object_name) from t1;
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

prompt  ============================
prompt  select max() with dummy join
prompt  ============================

select max(object_name) from t1, dual where dummy is not null;
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

prompt  =============================================
prompt  select max() with dummy join and index() hint
prompt  =============================================

select /*+ index(t1) */  max(object_name) from t1, dual where dummy is not null;
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

prompt  ============================================
prompt  select max() with dummy join and inline view
prompt  ============================================

select  obj
from    (
        select  max(object_name)  obj
        from    t1
        ),
        dual 
where   dummy is not null
/

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));

prompt  ====================================
prompt  select max() with existence subquery
prompt  ====================================

select max(object_name) from t1 where exists (select null from dual where dummy is not null);
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));

prompt  ============================================
prompt  select max() with failing existence subquery
prompt  ============================================

select max(object_name) from t1 where exists (select null from dual where dummy is null);
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));

With 50,000 rows and the appropriate index to allow Oracle to find the maximum value very quickly we expect the optimizer to invoke the “index full scan (min/max)” operation, visiting only the extreme leaf block of the index – and, indeed, we are not disappointed, that’s exactly what the baseline query shows us:

=====================
Baseline select max()
=====================
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |      1 |        |     3 (100)|      1 |00:00:00.01 |       3 |
|   1 |  SORT AGGREGATE            |       |      1 |      1 |            |      1 |00:00:00.01 |       3 |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T1_I1 |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------

However, when we introduce the (as yet unjustified) join to dual something very different happens – the optimizer forgets all about the min/max optimisation and does an index fast full scan of the t1_i1 index, passing all 50,000 rows up to the parent operation.


============================
select max() with dummy join
============================
-------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |      1 |        |    50 (100)|      1 |00:00:00.02 |     360 |
|   1 |  SORT AGGREGATE        |       |      1 |      1 |            |      1 |00:00:00.02 |     360 |
|   2 |   NESTED LOOPS         |       |      1 |  50000 |    50   (6)|  50000 |00:00:00.01 |     360 |
|*  3 |    TABLE ACCESS FULL   | DUAL  |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|   4 |    INDEX FAST FULL SCAN| T1_I1 |      1 |  50000 |    48   (7)|  50000 |00:00:00.01 |     357 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("DUMMY" IS NOT NULL)

We could, of course, try hinting an index range (full) scan to see what happens – and the result is even more surprising: Oracle takes the hint, uses the min/max optimisation, and shows us that it didn’t take that path by default because it had “forgotten” how to cost it correctly.

Note the cost of 354 at operation 5 when the original min/max cost was 3, note also that the optimizer thinks we have to visit all 50,000 index entries even though, at run-time, Oracle correctly uses a path that visits only one index entry:


=============================================
select max() with dummy join and index() hint
=============================================
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |   356 (100)|      1 |00:00:00.01 |       6 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |            |      1 |00:00:00.01 |       6 |
|   2 |   NESTED LOOPS               |       |      1 |  50000 |   356   (2)|      1 |00:00:00.01 |       6 |
|*  3 |    TABLE ACCESS FULL         | DUAL  |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|   4 |    FIRST ROW                 |       |      1 |  50000 |   354   (2)|      1 |00:00:00.01 |       3 |
|   5 |     INDEX FULL SCAN (MIN/MAX)| T1_I1 |      1 |  50000 |   354   (2)|      1 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("DUMMY" IS NOT NULL)

Of course we could recognise that the t1 access and the access to dual could be de-coupled – and hope that the optimizer doesn’t try to use complex view merging (maybe we should have included a /*+ no_merge */ hint) to fall back to a simple join. Fortunately the optimizer doesn’t try merging the two query blocks, so it optimises the max(object_name) query block correctly, giving us the benefit of the min/max optimisation. I’ve included the ‘alias’ format option in this call to dbms_xplan() so that we can see the two query blocks that are optimised separately.


============================================
select max() with dummy join and inline view
============================================

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     5 (100)|      1 |00:00:00.01 |       6 |
|   1 |  NESTED LOOPS                |       |      1 |      1 |     5   (0)|      1 |00:00:00.01 |       6 |
|*  2 |   TABLE ACCESS FULL          | DUAL  |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 |
|   3 |   VIEW                       |       |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
|   4 |    SORT AGGREGATE            |       |      1 |      1 |            |      1 |00:00:00.01 |       3 |
|   5 |     INDEX FULL SCAN (MIN/MAX)| T1_I1 |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / DUAL@SEL$1
   3 - SEL$2 / from$_subquery$_001@SEL$1
   4 - SEL$2
   5 - SEL$2 / T1@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DUMMY" IS NOT NULL)

There is a maxim (or guideline, or rule of thumb) that if the from clause of a query includes tables that don’t get referenced in the select list then those tables should (probably) appear in subqueries. Of course this guideline sometimes turns out to be a very bad idea, and sometimes it just means the optimizer unnests the subqueries and recreates the joins we started with, but let’s try the approach with this query. I’ve included the ‘alias’ option again so that you can see that this plan is optimised as two query blocks, allowing the max(object_name) query block to find the min/max strategy.


====================================
select max() with existence subquery
====================================
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     5 (100)|      1 |00:00:00.01 |       6 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |            |      1 |00:00:00.01 |       6 |
|*  2 |   FILTER                     |       |      1 |        |            |      1 |00:00:00.01 |       6 |
|   3 |    FIRST ROW                 |       |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
|   4 |     INDEX FULL SCAN (MIN/MAX)| T1_I1 |      1 |      1 |     3   (0)|      1 |00:00:00.01 |       3 |
|*  5 |    TABLE ACCESS FULL         | DUAL  |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       3 | 
-------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   4 - SEL$1 / T1@SEL$1
   5 - SEL$2 / DUAL@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NOT NULL)
   5 - filter("DUMMY" IS NOT NULL)

There’s a very important detail in the execution plan above. At first sight it looks like the optimizer has a plan using a simple filter subquery operation – which means you might be fooled into reading it as “for each row returned by operation 3 call operation 5”. This is not the case.

Because the subquery is not a correlated subquery – it’s an example that I sometimes call a “fixed” or (slightly ambiguously) “constant” subquery – Oracle can execute it once and use the resulting rowsource to decide whether or not to call the main query. It’s a case where (if you didn’t realise the plan consisted of two separate query blocks) you would say that Oracle was calling the second child first.

To prove this point I’ve set up one last variation of the query – the “failed subquery” version – where my select from dual returns no rows. Check the numbers of Starts shown for each line of the plan:


============================================
select max() with failing existence subquery
============================================
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     5 (100)|      1 |00:00:00.01 |       3 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |            |      1 |00:00:00.01 |       3 |
|*  2 |   FILTER                     |       |      1 |        |            |      0 |00:00:00.01 |       3 |
|   3 |    FIRST ROW                 |       |      0 |      1 |     3   (0)|      0 |00:00:00.01 |       0 |
|   4 |     INDEX FULL SCAN (MIN/MAX)| T1_I1 |      0 |      1 |     3   (0)|      0 |00:00:00.01 |       0 |
|*  5 |    TABLE ACCESS FULL         | DUAL  |      1 |      1 |     2   (0)|      0 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   4 - SEL$1 / T1@SEL$1
   5 - SEL$2 / DUAL@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NOT NULL)
   5 - filter("DUMMY" IS NULL)

The filter at operation 3 calls operation 5 – the query against dual – which runs once returning no rows. The min/max scan of t1_i1 at operation 4 doesn’t run. Operation 5 was called before operation 4 was considered.

Finally

This brings us back to the question – why would anyone run a strange query like this.

Perhaps the answer is that it’s just a demonstration of one part of a more complex query and what we’re trying to do is say: “if a certain record exists in a control table then include some information from table X”.

This note tells us that if there’s a possibility of a min/max optimisation to find the data then we should avoid using a join, instead we should use a “fixed subquery” to check the control table, and maybe we’ll also have to write the part of our query that collects (or isn’t required to collect) the interesting bit of data as an inline view.

 

July 10, 2020

Recursive WITH upgrade

Filed under: ANSI Standard,CBO,Execution plans,Oracle,Subquery Factoring,Upgrades — Jonathan Lewis @ 4:19 pm BST Jul 10,2020

There’s a notable change in the way the optimizer does cost and cardinality calculations for recursive subquery factoring that may make some of your execution plans change – with a massive impact on performance – as you upgrade to any version of Oracle from 12.2.0.1 onwards. The problem appeared in a question on the Oracle Developer Community forum a little while ago, with a demonstration script to model the issue.

I’ve copied the script – with a little editing – and reproduced the change in execution plan described by the OP. Here’s my copy of the script, with the insert statements that generate the data (all 1,580 of them) removed.

rem
rem     Script:         recursive_with_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem
rem     Notes:
rem     https://community.oracle.com/thread/4338248
rem
rem     The upgrade to 12.2.0.1 made this query much slower (on 15,000 rows)
rem     Setting OFE to 12.1.0.1 is a first possible fix for the issue.
rem     The scale is too small to see much difference in this case
rem

drop table test_folder purge;

create table test_folder(
        fldr_key                number(16,0)            not null        enable,                 
        fldr_id                 varchar2(255 byte)      not null        enable,                 
        fldr_desc_tx            varchar2(255 byte),                     
        par_fldr_key            number(16,0),                   
        seus_key                number(16,0)            not null        enable,                 
        fldr_private_flg        varchar2(1 byte)        not null        enable,                 
        last_updt_dt            date                    not null        enable,                 
        last_upby_seus_key      number(16,0)            not null        enable,                 
        lock_seq_nbr            number(9,0) default 0   not null        enable,                 
        content_guid            raw(16),                
        constraint test_folder_pk primary key (fldr_key)                
)       
;              

-- list of insert statements

alter table test_folder add constraint test_folder_fk  
        foreign key (par_fldr_key) references test_folder(fldr_key)
;  
  
create or replace force editionable view test_folder_vw (fldr_key) as   
with rec_path(fldr_key)  as (
        select  tf.fldr_key  
        from    test_folder tf where tf.par_fldr_key is null  
        union all  
        select  tf.fldr_key  
        from    test_folder tf, rec_path  
        where   rec_path.fldr_key = tf.par_fldr_key
)  
select fldr_key  
from rec_path   
; 

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'TEST_FOLDER',
                method_opt  => 'for all columns size 1'
        );
end;
/


select * from test_folder_vw where fldr_key = -41;  

I’ve run the test 3 times. First in 12.2.0.1 with no tweaking; then in 12.2.0.1 with the hint /*+ optimizer_features_enable(‘12.1.0.2’) */ and finally in a genuine 12.1.0.2 environment. In all three cases I enabled rowsource execution stats (‘alter session set statistics_level = all’) and pulled the plans from memory – with the following results

First, the base result from 12.1.0.2

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.03 |     604 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |    801 |      1 |00:00:00.03 |     604 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.03 |     604 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      68 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |    640 |   1419 |00:00:00.02 |     536 |  1696K|  1696K| 1488K (0)|
|   5 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
|*  6 |     TABLE ACCESS FULL                     | TEST_FOLDER |      8 |   1419 |  11352 |00:00:00.01 |     536 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41))
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   6 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

Two points to note, in particular. First that the hash join has the recursive with pump as its first (build table) child and the table access full of test_folder as its second child (probe table); secondly that there is no value given for E-Rows for the recursive with pump.

Now the 12.2.0.1 plan:

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.01 |      47 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |   2524K|      1 |00:00:00.01 |      47 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.01 |      47 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      24 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |   2524K|   1419 |00:00:00.01 |      23 |  1743K|  1743K| 1632K (0)|
|   5 |     BUFFER SORT (REUSE)                   |             |      8 |        |  11352 |00:00:00.01 |      23 | 73728 | 73728 |          |
|*  6 |      TABLE ACCESS FULL                    | TEST_FOLDER |      1 |   1419 |   1419 |00:00:00.01 |      23 |       |       |          |
|   7 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41)
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   6 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

There are three changes to notice in this plan – which (for the OP) was much slower than the 12.1.0.2 plan. First, the order of the hash join has changed, the recursive with pump is now the second child (probe table) in the join (and again shows no value for E-Rows); secondly that Oracle has introduced an extra operation – the buffer sort (reuse) – populated by the table access full – as the build table; thirdly (presumably the point of buffer sort (reuse) operation) the number of buffer visits has dropped from a few hundred to a couple of dozen.

Finally let’s check what happens if we set the OFE (optimizer_features_enable) to 12.1.0.2 while running 12.2.0.1

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |             |      1 |        |      1 |00:00:00.01 |      47 |       |       |          |
|*  1 |  VIEW                                     |             |      1 |    801 |      1 |00:00:00.01 |      47 |       |       |          |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |      1 |        |   1580 |00:00:00.01 |      47 | 36864 | 36864 |  102K (0)|
|*  3 |    TABLE ACCESS FULL                      | TEST_FOLDER |      1 |    161 |    161 |00:00:00.01 |      24 |       |       |          |
|*  4 |    HASH JOIN                              |             |      8 |    640 |   1419 |00:00:00.01 |      23 |  1797K|  1797K| 1573K (0)|
|   5 |     RECURSIVE WITH PUMP                   |             |      8 |        |   1580 |00:00:00.01 |       0 |       |       |          |
|   6 |     BUFFER SORT (REUSE)                   |             |      8 |        |  11352 |00:00:00.01 |      23 | 73728 | 73728 |          |
|*  7 |      TABLE ACCESS FULL                    | TEST_FOLDER |      1 |   1419 |   1419 |00:00:00.01 |      23 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLDR_KEY"=(-41))
   3 - filter("TF"."PAR_FLDR_KEY" IS NULL)
   4 - access("REC_PATH"."FLDR_KEY"="TF"."PAR_FLDR_KEY")
   7 - filter("TF"."PAR_FLDR_KEY" IS NOT NULL)

In these conditions the recursive with pump has gone back to being the build table (first child); but it’s worth noting that the 12.2 buffer sort (reuse) is still in place – saving us a few hundred buffer gets (and, for a bigger table, a number of disc reads possibly). Downgrading the optimizer_features_enable has given us the plan we needed, but this we’ve got an example that shows that hacking the parameter isn’t a guarantee that we will get exactly the path we used to get in the older version.

The story so far.

It seems that we can address the performance problem that the OP had by setting the optimzer_feature_enable to the older version – possibly through a hint embedded in the SQL, perhaps through an SQL Baseline or SQL Patch. Maybe we’ll have to have a logon trigger that sets the parameter for particular users or, worst case scenario, maybe we’ll have to set the parameter at the system level. Given how undesirable the last option could be it would be nice to know exactly what is causing the change in plan.

As a basic clue – if the order of tables in a hash join reverses itself this usually means that the byte (not row) estimates have changed. The optimizer will use the table with the lower byte count as the build table in a hash join. So the recursive with pump – whose row and byte estimates don’t appear – must have produced larger numbers in 12.2.0.1.

A change in the 12.2 plan that I haven’t yet mentioned is the E-rows for the hash join; it’s gone up from 640 (12.1.0.2) to 2.5 million! So let’s repeat the tests with the CBO (10053) trace file enabled and see if we can find a number like 2524K appearing as a join estimate in the trace file. Having created the two trace files (in 12.2.0.1, one with the OFE set backwards) I executed the following grep command against the trace files:

grep -n "^Join Card - Rounded" orcl12c_ora_5524_ofe.trc
grep -n "^Join Card - Rounded" orcl12c_ora_5524_base.trc

I’d set the tracefile_identifier to ‘ofe’ and ‘base’ respectively for the 12.1.0.2 and 12.2.0.1 tests, and here are the results:

grep -n "^Join Card - Rounded" orcl12c_ora_5524_ofe.trc
1166:Join Card - Rounded: 640 Computed: 639.941176

grep -n "^Join Card - Rounded" orcl12c_ora_5524_base.trc
1195:Join Card - Rounded: 640 Computed: 639.941176
1391:Join Card - Rounded: 2544 Computed: 2543.865546
1576:Join Card - Rounded: 10112 Computed: 10111.865546
1737:Join Card - Rounded: 40193 Computed: 40193.075630
1898:Join Card - Rounded: 159759 Computed: 159758.731092
2059:Join Card - Rounded: 635008 Computed: 635008.462185
2220:Join Card - Rounded: 2524023 Computed: 2524023.394958
2269:Join Card - Rounded: 2524023 Computed: 2524023.394958

That’s an interesting clue. Something seems to be calculating a larger and larger value in the 12.2.0.1 trace, starting with the hash join cardinality that appeared in 12.1.0.2 had, growing by a factor of nearly 4 each time, and ending with the hash join cardinality we saw in the 12.2.0.1 plan.

Taking a closer look at the content of the 12.2.0.1 trace file it turned out that every stage in that escalation was Oracle recalculating the cost and cardinality of joining test_folder (the table) and rec_path (the “with” subquery) using the figures from the previous join calculation as the new base figures for rec_path. In effect the optimizer was calculating the cost of following the recursive subquery down to its 7th level of recursion.

Side note: in agreement with my comment about the smaller (in bytes) rowsource being used as the build table, the initial join order started as (test_folder, rec_path) in the first two iterations, but switched to (rec_path, test_folder) from the 3rd iteration onwards.

So we’ve identified the mechanics that cause the change in plan; the question now is: why 7 iterations to the final cost? (Briefly I did a quick check to see how many circles of hell there were in Dante’s Inferno – but it’s 9 (or 10 depending how you count). A quick check of v$parameter (and the x$ tables for the hidden parameters) revealed the following:

Name                                     Value
------------------------------------ ---------
_recursive_with_branch_iterations            7

Setting this parameter to 1 in the session, or adding the hint /*+ opt_param(‘_recursive_with_branch_iterations’ 1) */ to the query resulted in the 12.1.0.2 plan appearing in 12.2.0.1 – and this is a much less intrusive way of getting the plan we’re interested in than setting the entire OFE back to 12.1.0.2. One might even set the parameter in the spfile (after receiving approval from Oracle Corp., of course) given how precisely targetted it is (and know that it doesn’t switch off that nice little buffering trick.)

Summary

From 12.2 onwards the optimizer does recursive recosting of recursive “with” subqueries. This means the cost and cardinality estimates of a plan may change and the impact may cause a significant change in performance – it certainly did for the OP.

The change seems to be driven by the hidden parameter _recursive_with_branch_iterations, which was introduced in 12.2.0.0 with a default value of 7. Setting this parameter to 1 reverts the optimizer to the pre-12.2 behaviour. If you run into a problem of recursive “with” subqueries changing plans and causing performance problems on an upgrade from pre-12.2 to a newer version of Oracle then it would be worth investigating this parameter as the least intrusive way of reverting back to the original plans.

Update (July 2020)

I’ve just been prompted to check MOS for any references to the hidden parameter – and discovered a note that was published in September 2018, updated ub Feb 2019.  It’s amazing how easy it can be to find an answer on MOS when you already know what the answer is ;) Document id 2443466.1 Oracle 12.2.0.1 CBO calculating high cost/CPU for queries with recursive sub-query (Doc ID 2443466.1)

This gives two workarounds to the problem of a change in cost in 12.2 – set the optimizer_features_enable to 12.1.0.2, or set the hidden parameter to 1. It references two bugs (one a duplicate of the other, both apparently unpublished):

  • Bug 23515289 : PERFORMANCE REGRESSION OBSERVED WITH RECURSIVE WITH SERIAL PLAN
  • Bug 24566985 : UPG: QUERY PERFORMANCE ON ALL_TSTZ_TABLES 160 TIMES SLOWER THAN 11.2.0.4

and the Permanent Fix for the problem is to install the patch for Bug 24566985 on 12.2.0.1

 

July 9, 2020

Execution Plans

Filed under: Execution plans,extended stats,Histograms,Oracle,Performance,Problem Solving,Statistics,Troubleshooting — Jonathan Lewis @ 4:54 pm BST Jul 9,2020

This is an example from the Oracle Developer Community of using the output of SQL Monitor to detect a problem with object statistics that resulted in an extremely poor choice of execution plan.

A short time after posting the original statement of the problem the OP identified where he thought the problem was and the general principle of why he thought he had a problem – so I didn’t have to read the entire execution plan to work out a strategy that would be (at least) a step in the right direction of solving the performance problem.

This note, then, is just a summary of the five minute that I spent confirming the OP’s hypothesis and explaining how to work around the problem he had identified. It does, however, give a little lead-in to the comments I made to the OP in order to give a more rounded picture of what his execution plan wass telling us.

So here’s the top half of the plan (which is the first subquery of a “concatenation”) with the first few predicates:

===============================================================================================================================================================================================================================  
| Id |                 Operation                  |            Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity |             Activity Detail             |  
|    |                                            |                            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |               (# samples)               |  
===============================================================================================================================================================================================================================  
|  0 | SELECT STATEMENT                           |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  1 |   CONCATENATION                            |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |          |                                         |  
|  2 |    FILTER                                  |                            |         |      |     12191 |     +4 |     1 |     864K |      |       |         |       |     0.03 | Cpu (4)                                 |  
|  3 |     FILTER                                 |                            |         |      |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  4 |      NESTED LOOPS                          |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.02 | Cpu (3)                                 |  
|  5 |       NESTED LOOPS                         |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.07 | Cpu (8)                                 |  
|  6 |        NESTED LOOPS                        |                            |     241 | 251K |     12232 |     +4 |     1 |      26M |      |       |         |       |     0.05 | Cpu (6)                                 |  
|  7 |         NESTED LOOPS                       |                            |    5407 | 233K |     12242 |     +4 |     1 |      86M |      |       |         |       |          |                                         |  
|  8 |          MERGE JOIN CARTESIAN              |                            |       1 |   35 |     12242 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
|  9 |           TABLE ACCESS BY INDEX ROWID      | REF1                       |       1 |    3 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 10 |            INDEX RANGE SCAN                | REF1_PK                    |       1 |    2 |     12242 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 11 |           BUFFER SORT                      |                            |      84 |   32 |     12242 |     +4 |     1 |     1000 |      |       |         |  104K |          |                                         |  
| 12 |            TABLE ACCESS BY INDEX ROWID     | STAGE                      |      84 |   32 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 13 |             INDEX RANGE SCAN               | STAGE_IDX1                 |      84 |    4 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 14 |          PARTITION RANGE ITERATOR          |                            |    8292 | 232K |     12232 |     +4 |  1000 |      86M |      |       |         |       |          |                                         |  
| 15 |           TABLE ACCESS STORAGE FULL        | TAB1                       |    8292 | 232K |     12245 |     +1 |  1000 |      86M | 103M | 521GB |   1.96% |    7M |    51.81 | gc buffer busy acquire (1)              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: cache buffers chains (1)         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (1196)                              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (2) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | reliable message (5)                    |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (2827)  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell smart table scan (1977)            |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | read by other session (304)             |  
| 16 |         PARTITION RANGE ITERATOR           |                            |       1 |   12 |     12191 |     +4 |   86M |      26M |      |       |         |       |     0.42 | Cpu (51)                                |  
| 17 |          TABLE ACCESS BY LOCAL INDEX ROWID | TAB2                       |       1 |   12 |     12191 |     +4 |   86M |      26M |   4M |  28GB |         |       |    32.14 | gc cr grant 2-way (20)                  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc cr request (2)                       |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc remaster (6)                         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (319)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (4) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: gc element (2)                   |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (3563)  |  
| 18 |           INDEX RANGE SCAN                 | TAB2_IX1                   |     166 |    3 |     12210 |     +2 |   86M |      26M |   1M |  11GB |         |       |    15.17 | Cpu (292)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1557)  |  
| 19 |        INDEX UNIQUE SCAN                   | MTD_PK                     |       1 |    1 |     12242 |     +4 |   26M |      26M |  292 |   2MB |         |       |     0.17 | Cpu (20)                                |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1)     |  
| 20 |       TABLE ACCESS BY INDEX ROWID          | REF2                       |       1 |    2 |     12191 |     +4 |   26M |      26M |    7 | 57344 |         |       |     0.11 | Cpu (13)                                |  
| 21 |      TABLE ACCESS BY INDEX ROWID           | CONTROLTAB                 |       1 |    1 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 22 |       INDEX UNIQUE SCAN                    | CONTROLTAB_PK              |       1 |      |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 23 |     MINUS                                  |                            |         |      |       102 |     +4 |    25 |        3 |      |       |         |       |          |                                         |  
| 24 |      TABLE ACCESS BY INDEX ROWID           | CUST_ORG_PK                |       1 |    3 |       942 |     +4 |    25 |       10 |      |       |         |       |          |                                         |  
| 25 |       INDEX UNIQUE SCAN                    | MC_PK                      |       1 |    2 |       942 |     +4 |    25 |       25 |      |       |         |       |          |                                         |  
| 26 |      SORT UNIQUE NOSORT                    |                            |       1 |    4 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 27 |       TABLE ACCESS BY INDEX ROWID          | REF1                       |       1 |    3 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 28 |        INDEX RANGE SCAN                    | REF1_PK                    |       1 |    2 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter( EXISTS ( (SELECT /*+ INDEX_RS_ASC ("CUST_ORG_PK" "MC_PK") */ "CUST_ID" FROM "USER1"."CUST_ORG_PK"  "CUST_ORG_PK" 
               WHERE "CUST_ID"=:B1 AND "CUST_ORG_PK"."INDICATR"='Y') MINUS (SELECT /*+ INDEX_RS_ASC ("REF1" "REF1_PK") */ 
               TO_NUMBER("VAL") FROM "USER1"."REF1" "REF1" WHERE "PUSER"='ZZZ' AND "EDATE" .ge. TRUNC(SYSDATE@!) AND TO_NUMBER("VAL")=:B2  
               AND "SDATE" .le. TRUNC(SYSDATE@!))))  
   3 - filter( EXISTS (SELECT /*+ INDEX_RS_ASC ("CONTROLTAB" "CONTROLTAB_PK") */ 0 FROM  "USER2"."CONTROLTAB" "CONTROLTAB" WHERE
              "CONTROLTAB"."CNTRLID"=9999 AND  NVL("CONTROLTAB"."STATUS",'F')='S'))  
   9 - filter("REF1"."EDATE" .ge. TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE" .le. TRUNC(SYSDATE@!))  
  13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')  

Note: various inequality symbols changed to .le. / .ge. to avoid WordPress format issue.

The first thing to note is that the “Time (active)” shown at the top line is about 12,000 seconds – so it’s a long running query. Your next observation – before you look at the shape of the plan – might be to note that operations 15, 17 and 18 between them record thousands of seconds of time, mostly I/O time but including 1,200 seconds of CPU time. This might draw your eye to the part of the plan that tells you what you are doing at these operations and why you are doing it.

Looking at the detail – operation 15 is a full tablescan that is the child of a partition range iterator (operation 14), and that iterator is the second child of a nested loop join (operation 7). Why is the optimizer so out of balance that it thinks a table scan of (possibly) multiple partitions of a partitioned table is a good candidate for the second child of a nested loop?! The answer comes from the first child – the  Merge Join Cartesian at operation 8 has been given a cardinality estimate of just one row. When the cardinality estimate is one for the first table in a join then it doesn’t matter whether Oracle uses a nested loop join or a hash join, whatever happens next is only supposed to happen once.

Unfortunately when we compare “Rows (Estim)” and “Rows (Actual)” for the operation we see that the Merge Join Cartesian produced 1,000 rows – so the partition tablescan was going to happen 1,000 times (which we can confirm from the “Execs” column of operation 14). As a first step, then, we need to ensure that the optimizer gets a better estimate of the number of rows produced by the Cartesian join. So lets look at its child operations.

  • Operation 9 (Table Access of REF1) is predicted to return one row – with “Rows (actual)” = 1.
  • Operation 11 (Buffer Sort of data from STAGE1) is predicted to return 84 rows – with “Rows (actual)” = 1,000

Since the number of rows produced by a Cartesian join should be the product of the number of rows of the two inputs this tells us that the optimizer’s estimate of the number of rows from REF1 has been rounded up to 1 from a very small fraction (less than 1/84). If we can correct this then we may get Oracle to change the awful nested loop to an acceptable hash join. Wven if we can’t fix this mis-estimate we may be able to do something that improves the estimate for STAGE1 to something sufficienlty large that it will trigger the switch to a hash join. So let’s look at the predicates for these two tables.

REF1 predicates

   9 - filter("REF1"."EDATE">=TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE"<=TRUNC(SYSDATE@!))  

The index range scan is based on an access predicate (with no filter predicate), so it looks like there’s a nearly perfect three-column index on REF1, but the optimizer can’t use the number of distinct keys in the index to get a good estimate of cardinality because one of the predicates is range-based. So the arithmetic will look at the three predicates separately and multiply up their selectivities. (It’s possible, of course, that this might be the first three columns of a 4, or more, column index.)

It’s a reasonable guess that the number of distinct combinations of (puser, name) will be much smaller than num_distinct(puser) * num_distinct(name) – so one strategy that might help increase the table’s cardinality estimate is to create extended statistics on the column group (puser, name).

Another reasonable guess is that the number of distinct values for the two columns is (relatively) small, with some skew to the distribution (name = ‘CODE’ looks particularly susceptible to being a commonly occurring value) – so perhaps we need a histogram on one or both of the columns (which would then require a histogram to be created on the column group as well if we wanted the optimizer to use the column group). We’d also have to make sure that the queried values didn’t fall outside the known low/high values for the columns if we wanted the column group to be used.

STAGE1 Predicates

13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')

This is the access(-only) predicate for the index stage_idx1, and there are no filter predicates when we reach the table. So stage_idx1 might be a two-column index on the table that we are using completely, or it may be an index with more columns that we are using only incompletely. We can see that the cardinality estimate is out by a factor of 12 (84 predicted, 1,000 actual) so if this is the complete index (which would allow Oracle to use the distinct_keys value to estimate cardinality) there must be an uneven data distribution in the values; but if this is just the first two columns of a longer index then we might benefit from extended stats (viz: another column group) on this pair of columns.

Again, even if we create a column group, or take automatic advantage of the distinct_keys figure, the predicate STAT=’I’ (is that state, status?) looks a little dangerous – status columns tend to have a small number of distinct values with a signficant skew to the distribution of values – so we may want to create a histogram on the STAT column, which would then require us to create a histogram on the column group if we also wanted the column group stats to have an effect.

What Happened Next?

I made the suggestions about column groups and histogram to the OP – without the level of detail that I’ve given in the explanations above – and got the following response:

You are spot on.

There does exists frequency histogram on column NAME(having 14 distinct value) and STAT(having 7 distinct values) of table STAGE. Also there already exists a frequency histogram on column PUSER and height balanced histogram on column NAME of table REF1. But still they were not helping the existing query.

By creating a column group on both on ref1 (puser, name) and stage(name, stat) with histogram for both i.e. ‘FOR COLUMNS SIZE 254’. The merge Cartesian removed automatically from the path by optimizer and its doing a hash join now with TAB1 finishing the query in quick time.

Summary

When the cardinality (rows) estimate for an operation drops to one (which might mean much less than one and rounded up) then all hell can break loose and the choice of parent operation – and its cardinality estimate – might be insanely stupid, leading to a catastrophically poor execution plan.

Very low cardinality estimates are often the result of multiplying individual column selectivities to produce an estimated selectivity that is unrealistic (much too small) when compared with the actual number of distinct combinations that exist in the table. In such cases creating a column group, (possibly with a histogram) may be all you need to do to get a better cardinality estimate and a vastly improved execution plan.

 

June 29, 2020

Most Recent – 2

Filed under: CBO,Execution plans,Oracle,Tuning — Jonathan Lewis @ 1:02 pm BST Jun 29,2020

A question arrived in my email a few days ago with the following observations on a statement that was supposed to query the data dictionary for some information about a specified composite partitioned table. The query was wrapped in a little PL/SQL, similar to the following:

declare
        v_src_part      varchar2(30) := null;
        v_tab           varchar2(30)  := 'PT_COMPOSITE_1';
begin

        select
                /*+ qb_name(main) */
                uts1.subpartition_name
        into    v_src_part
        from
                user_tab_subpartitions uts1
        where
                uts1.table_name = v_tab
        and     uts1.last_analyzed is not null
        and     uts1.num_rows = (
                        select
                                /*+ qb_name(max_subq) */
                                max (uts2.num_rows)
                        from
                                user_tab_subpartitions uts2
                        where
                                uts2.table_name = /* v_tab */ uts1.table_name
                )
        and     rownum = 1
        ;

The requirement is simple: identify the subpartitions of a specific table that have the largest number of rows of any subpartition of the table – but report only the first match.

You’ll notice that the where clause of the subquery has a commented “v_tab” in it. This is the PL/SQL variable used in the outer query block to identify the target table, and it shouldn’t really make any difference if I use the PL/SQL variable in the subquery rather than using a correlating column. However, the question that came with this block of code was was follows:

All the partitions and subpartitions had their stats when running the test. On a first run using the correlated subquery the block reported oracle error ORA-01403: no data found. Changing the code to use the PL/SQL variable the block reported a specific subpartition as expected. A few hours later (after changing the code back to use the correlated subquery) the block reported the same subpartition. Have you ever seen anything like this? The Oracle version is 12.1.0.2.

Rule 1, of course, is to be a little sceptical when someone says “Honest, Guv, the stats are all okay”. But I’m going to assume that the statistcs on this table really were complete and that there was no “data-related” reason for this query to behave in such a surprising way.

The email is an invitation to consider two points.

  1. This looks like a bug: the two versions of the query are logically equivalent, they should return the same results if the underlying data had not changed. (In fact, I think the only “legal” way that the query could return ORA-01403 is if there were no stats on any subpartitions of the table in question – any ordinary usage of the dbms_stats package other than delete_table_stats() would have ensured that the query had to find something.) So, the first run of the correlated subquery produced no data while the modified query did get a result. That suggests a problem with some transformation in the 12.1.0.2 code to handle correlated aggregate subqueries.
  2. How could the second execution of the version with the correlated subquery produce a result a few hours later. Here are a couple of possibilities:
    • Someone had gathered dictionary stats (i.e. on the tables used by the query, not on the subpartitioned table) in the “few hours” gap so the optimizer picked a different execution plan which bypassed the bug.
    • (minor variation on previous) Someone had gather dictionary stats when the first execution plan was already in memory but the “auto_invalidate” option for cursor invalidation meant that the query didn’t get re-optimised for a few hours.
    • Nothing changed, but the query had been flushed from the library cache and did need re-optimisation a few hours later. Since the version is 12.1.0.2 this means statistics feedback or automatic SQL directives could have had an impact – which means there may be dynamic sampling during optimisation – and a different set of random samples could have resulted in a different execution plan.
    • Other …

The interesting bit

There is a generic feature about this question that is more interesting than the “what went wrong, how could I get different results”, and it’s in the choice you can make between using a correlation column and repeating a pl/sql variable (or literal value ).

The switch to using a pl/sql variable turns the subquery into a single-row, “standalone”, subquery – one that could be run without any reference to the outer query – and this imposes a dramatic change on what the optimizer can doSometimes that change will make a huge difference to the optimisation time and the run time.

As a correlated subquery the notional “first strategy” for the optimizer is:

“for each row in the outer query execute the inner query as a filter subquery passing in the correlation value

If you take the “standalone” approach the optimizer will be looking for a plan that says (in effect):

“run the subquery once to generate a constant that you will need to execute the rest of the query”

Running the subquery once rather than once per row is likely to be a good idea – on the other hand Oracle can do “scalar subquery caching” so if the value of the correlation column is always the same the correlated subquery will actually run only once anyway.

More importantly, when the optimizer sees a correlated subquery it will consider unnesting it and then transforming it in various other ways; and it might take the optimizer a long time to work out what it can and can’t do, and the plan it finally does produce may be much slower than what it could have done if it had not unnested the subquery.

Some test results

So I ran 3 variations of the PL/SQL block on Oracle 19.3.0.0 with the CBO trace (10053) enabled and picked out a few highlights. The three tests in order were:

  1. Use the pl/sql variable so the subquery could run as a standalone query
  2. Use the correlating column to make the subquery a correlated subquery
  3. Use the correlating column, but add the hint /*+ no_unnest */ to the subquery.

The results were as follows – first the timing, then a critical measure that explains the timing:

  • Case 1 – standalone subquery – total time 0.82 seconds
  • Case 2 – correlated subquery – total time 5.76 seconds
  • Case 3 – correlated subquery with no_unnest hint – total time 0.84 seconds

Where did all that extra time go – a lot of it went in optimisation. How many “Join Orders” were examined for each query

  • Case 1 – standalone subquery – 90 join orders
  • Case 2 – correlated subquery – 863 join orders
  • Case 3 – correlated subquery with no_unnest hint – 90 join orders

If you’re wondering what the 773 extra join orders were about here’s a clue. I extracted all the lines from the case 2 trace file that started with “SU:” – those are the lines tagged for “Subquery Unnest” – using a call to grep -n “^SU:” {tracefile name} and this is the result:


  2945:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest.
  2947:SU: Considering subquery unnest on query block SEL$071BB01A (#1).
  2948:SU:   Checking validity of unnesting subquery SEL$4F5F2F29 (#2)
  2949:SU:   Passed validity checks, but requires costing.
  2950:SU: Using search type: exhaustive
  2951:SU: Starting iteration 1, state space = (2) : (1)
  2952:SU:   Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2)
  3089:SU: Costing transformed query.
 66112:SU: Considering interleaved complex view merging
 66113:SU:   Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2)
 66366:SU: Costing transformed query.
129372:SU: Finished interleaved complex view merging
129373:SU: Considering interleaved distinct placement
129374:SU: Finished interleaved distinct placement
129375:SU: Considering interleaved join pred push down
129376:SU:   Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2)
251638:SU: Rejected interleaved query.
251640:SU: Finished interleaved join pred push down
251641:SU: Considering interleaved OR Expansion
251642:SU:   Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2)
251651:SU: Finished interleaved OR Expansion
251653:SU: Updated best state, Cost = 19.085153
251654:SU: Starting iteration 2, state space = (2) : (0)
251665:SU: Costing transformed query.
310395:SU: Not update best state, Cost = 20.083998
310396:SU: Will unnest subquery SEL$4F5F2F29 (#2)

The optimizer checks the validity of unnesting (generated) query block SEL$4F5F2F29 at line 2948 of the trace and decides, 308,000 lines later after an exhaustive examination of the possibilities, that it will unnest the subquery. Since this is a recent version of Oracle we take one simple extra step by checking for “TIMER” information, again using a “grep -n” call –

251639:TIMER:  SU: Interleaved JPPD SEL$B73B51DC cpu: 1.263 sec elapsed: 1.263 sec
251652:TIMER: SU: iteration (#1) SEL$B73B51DC cpu: 2.607 sec elapsed: 2.607 sec
310577:TIMER: CBQT SU and CVM SEL$071BB01A cpu: 3.323 sec elapsed: 3.323 sec
433371:TIMER: Cost-Based Join Predicate Push-Down SEL$12B6FE6C cpu: 1.307 sec elapsed: 1.306 sec
433477:TIMER: Cost-Based Transformations (Overall) SEL$12B6FE6C cpu: 4.731 sec elapsed: 4.731 sec
496189:TIMER: SQL Optimization (Overall) SEL$12B6FE6C cpu: 5.306 sec elapsed: 5.306 sec

Of course most of the time spent in this particular example was a result of optimising (and writing the optimizer trace), but for my tiny example (table definition below) the final figures I’ll show are the buffer gets and CPU time reported by a basic 10046 trace file after optimisation with all the relevant data was cached:

  • Case 1 – standalone subquery – 89 buffer gets / 0.00 seconds
  • Case 2 – correlated subquery – 130 buffer gets / 0.53 seconds
  • Case 3 – correlated subquery with no_unnest hint – 121 buffer gets / 0.08 CPU seconds

The sub-centisecond time is a little suspect, of course, but the others seem fairly trustworthy.

Conclusion

The title of this piece is “Most Recent” because the commonest requirement for a query of this shape is find the most recent row matching the following predicates”, even though in this case the interpretation is “find me the row matching the largest value”.

The “standard” pattern for writing a “most recent” query is to use a correlated subquery – but it’s worth remembering that you may reduce optimisation time and run time by “copying down the constant” rather than using the correlation mechanism.

(There are alternative strategies to the subquery approach, of course, and the analytic max() – introduced in Oracle 8i – is gaining traction as one of the popular alternatives.)

Footnote 1

If you want to re-run my test on different platforms and versions of Oracle, here’s the code to generate the table.  (Don’t be surprised if you don’t get completely consistent results – much of the optimization will depend on the size of all the relevant tables (tab$, tabcompart$, etc.) in the data dictionary, rather than on the actual definition of this partitioned table.


em
rem     Script:         most_recent_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2020
rem
rem     Last tested 
rem             19.3.0.0
rem

create table pt_composite_1 (
        id,
        grp,
        small_vc,
        padding
)
nologging
partition by range(id) 
subpartition by hash (grp)
subpartitions 4
(
        partition p2 values less than (400),
        partition p3 values less than (800),
        partition p4 values less than (1600),
        partition p5 values less than (3200)
)
as
select
        rownum                          id,
        trunc(rownum/50)                grp,
        to_char(trunc(rownum/20))       small_vc,
        rpad('x',100)                   padding
from
        all_objects
where 
        rownum <= 3000 -- > comment to avoid wordpress format issue
;

execute dbms_stats.gather_table_stats(user,'pt_composite_1',granularity=>'ALL')


Footnote 2

For reference, here are the outputs I got from executing egrep -n -e”^SU:” -e”TIMER” against the other two CBO trace files.

First for the “standalone” form – note how line 3130 tells us that “there is no correlation”.


806:SU: Considering subquery unnesting in query block MISC$1 (#0)
2947:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest.
2952:SU: Considering subquery unnest on query block SEL$071BB01A (#1).
2953:SU:   Checking validity of unnesting subquery SEL$4F5F2F29 (#2)
2954:SU:     SU bypassed: No correlation to immediate outer subquery.
2955:SU:     SU bypassed: Failed basic validity checks.
2956:SU:   Validity checks failed.
3130:SU:     SU bypassed: No correlation to immediate outer subquery.

Then for the correlated subquery with /*+ no_unnest */ hint; and line 3122 tells us that SU was bypassed because of a hint/parameter:


809:SU: Considering subquery unnesting in query block MISC$1 (#0)
2945:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest.
2947:SU: Considering subquery unnest on query block SEL$071BB01A (#1).
2948:SU:   Checking validity of unnesting subquery SEL$4F5F2F29 (#2)
2949:SU:     SU bypassed: Not enabled by hint/parameter.
2950:SU:     SU bypassed: Failed basic validity checks.
2951:SU:   Validity checks failed.
3122:SU:     SU bypassed: Not enabled by hint/parameter.

Neither file showed any “TIMER” information since that appears, by default, only for steps that take longer than one second. (If you want to adjust the granularity, see Franck Pachot’s note on parse time that describes bug/fix_control 16923858.

June 5, 2020

Analytic cost error

Filed under: CBO,Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:06 pm BST Jun 5,2020

Here’s a surprising costing error that was raised on the Oracle Developer Forum a few days ago. There’s a glitch in the cost atributed to sorting when an analytic over() clause – with corresponding “window sort” operation – makes a “sort order by” operation redundant. Here’s a script to generate the data set I’ll use for a demonstration with a template for a few queries I’ll be running against the data.


rem
rem     Script:         window_sort_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                    id,
        cast(lpad(rownum,30,'0') as varchar2(30)) vc30,
        cast(lpad(rownum,65,'0') as varchar2(65)) vc65,
        lpad('x',100,'x')                         padding
from
        generator
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

select
--      row_number() over (order by vc65) rn1,
--      row_number() over (order by vc30) rn2,
--      vc30,
--      vc65,
        id
from
        t1
-- order by
--      vc65
--      vc30
/


I’m (optionally) using the row_number() analytic function over the entire data set and for each row_number() I include in the select list Oracle will have to sort the data; I’ve also got an (optional) order by on the two columns that appear in the row_number() functions and that may introduce some sorting as well. Here, for example, are a few examples of the queries I might run:


prompt  ===========================================
prompt  Select vc30, order by vc30
prompt  ===========================================

explain plan for
select 
        vc30,
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ==========================================
prompt  Select row_number over vc30 - no ordering
prompt  ==========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ===========================================
prompt  Select row_number over vc30 - order by vc65
prompt  ===========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

In the first query we select and sort vc30 so (approximately) we sort 10,000 rows x 30(-ish) bytes for 300K of sort space. In the second query we generate the row_number() based on sorting vc30 – the size of the output is much smaller (it’s only 10,000 numbers between 1 and 10,000) but to generate those numbers we do have to select and sort vc30, so the workload (predicted and actual) will probably be similar to that of the firsrt query. In the final query we have to select and sort vc30 to generate the row_number() but we also have to select (without reporting) and sort vc65 in order to report the results in the right order – so we should expect the workload to be roughly 3 times the size (approximately 10,000 * (30 + 65) bytes). Here, from 12.2.0.1, are the execution plans (with a little cosmetic tidying):

===========================================
Select vc30, order by vc30
===========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  SORT ORDER BY     |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

==========================================
Select row_number over vc30 - no ordering
==========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

===========================================
Select row_number over vc30 - order by vc65
===========================================
-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  SORT ORDER BY      |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

As expected, the execution plans are roughly consistent with the estimates I gave for volume of data – the agreement between the query with order by vc30 and the query with over(order by vc30), and the increased load of ordering by vc65 when selecting the row_number(over vc30) is good.

So let’s see what the plan looks like when we select row_number(over vc30) and then order by vc30. If the optimizer is smart it will recognise that it’s possible to adopt a strategy that allows it to take advantage of the sorting from the over() clause to avoid a separate sort order by:


explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|    42   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|    42   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|    42   (5)|
----------------------------------------------------------------

The plan shows us that Oracle has used the 10gR2 “order by elimination” feature to bypass the need for a “sort order by” operation because it knows the data will be arriving in the right order from the “Window Sort” operation. Ynfortunately it also shows us that Oracle has lost the cost of doing the Window Sort!

Let’s try another experiment – let’s generate two different row_number() columns, with and without ordering:


prompt  =====================================
prompt  Select Both row_numbers - no ordering
prompt  =====================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ======================================
prompt  Select Both row_numbers order by vc30
prompt  ======================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ============================================
prompt  Select Both row_numbers order by vc65
prompt  ============================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

You’ll notice that I’ve included a request for the projection information in the plans for these examples so that you can see what columns are passed up from each operation to its parent. Again, though, we’ll start by focusing on just the costs:


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC65"[VARCHAR2,65], "VC30"[VARCHAR2,30],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC30")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC65")[22]
   2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22],
       "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The first query – without an order by” clause – reports a cost of 463; add an order by clause and the cost drops to 253 (and the “order by” clause doesn’t appear as a sort order by operation in the plan). The cost differential between the ordered and “unordered” plans , by the way, is 210 (and from there down to the base tablescan is another 211) – and here’s another way to see that number (+/- 1) appearing:


explain plan for
select
        vc65,
        id
from
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   986K|       |   253   (3)|
|   1 |  SORT ORDER BY     |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
------------------------------------------------------------------------

The cost of the query with no “order by” clause is basically the cost of a table scan plus two sorts of (vc65, vc30, plus a few bits). When you add in an “order by” clause the optimizer discards the “order by” clause and then subtracts one of the sort costs as well.

CBO trace file

Every time I say something about 10053 (CBO) trace files I feel compelled to remind everyone that I rarely look at them, and then it’s usually because I think I know there’s a bug and where I’ll find it in the trace. That’s exactly the case here.

I’m expecting to see two differences in the trace files between the “no order” query, and a query where I’ve added in an “order by” clause. One difference is that one trace file will have an “OBYE” (eliminate order by) comment which won’t be in the other trace, one trace file will have a “cost for SORT” calculation which won’t be in the other.

So here are the relevant bits – first from the query without the order by clause:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE bypassed: no order by to eliminate.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 463.384707  card: 10000.000000  bytes: 1010000.000000
***********************

And from one of the plans with an order by:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE performed.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 252.512458  card: 10000.000000  bytes: 1010000.000000
***********************

As you can see, the first (no order) trace file has two sort calculations under WiF sort, (Window Function?) while the second (order by) trace file reports “OBYE performed” and then loses one of its WiF sorts.

Note: If I had ordered by ID I would have seen two calculations of cost under the heading of WiF sort and a third calculation below that with the heading ORDER BY sort. Unfortunately when I ran the test to check this I also found that the OBYE report said: “OBYE performed” when it wasn’t relevant and there wasn’t an “order by” available for elimination.

Interestingly I tested to see if I could change the optimizer’s behaviour by adding the no_eliminate_oby(@sel$1) hint to the “order by” queries but the GENERAL PLANS section didn’t change, even though the trace file report: “OBYE: OBYE bypassed: hinted”, and the “Hint Report” from the Oracle 19i execution plan acknowledge the hint as legal and used.

Summary

If you have an “order by” clause in a query block that includes analytic functions and the optimizer decides that it can eliminate the “order by” and rely on the side effect of an analytic over() clause you may find that the cost of the query block is reduced by the cost of one of the Window Sort operations. (Technically this might lead to cases where the optimizer then made some poor choices in overall shape of the execution plan – though such cases might be very rare given that this costing error doesn’t affect the cardinality estimates.)

Lagniappe

In the last set of tests I added in the projection information as a simple example of a case where it can help you understand a little more of what the plan is supposed to achieve.  If you examine the last two query plans carefully (select both row_number() values and order by vc30 / vc65 respectively) Operation 2 of the first plan reports:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

while operation 2 of the second plan reports:

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

It’s not until you look at these two lines that the plans show any differences – operations 1 and 2 simply say “Window Sort” without giving any clue about which window sort is for which over() clause. The projection information, though, tells you which way around the over() clauses operated – in the first query the over(order by vc65) is applied to the result of the tablescan first, while in the second query it’s the over(order by vc30) that is applied first.

Lagniappe 2

There’s another little oddity you might spot when you look at the projection information and think about the WiF sort costs from the unordered query. The columns passed from operation 3 to operation 2 are:

 3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The columns passed from operation 2 to operation 1 are one of:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

Operation 2 sorts the data from operation 3, and operation 1 sorts the data from operation 2 – but the columns arriving from operation 2 have an extra row_number()[22] added to them. So when you look in the trace file at the two Wif Sort calculations why do they both show:

Blocks to Sort: 150 Row size: 122 Total Rows: 10000

Shouldn’t one of them show a Row size that’s (at least) 22 longer than the other ?

 

 

 

 

 

 

May 12, 2020

from$_subquery$_NNN

Filed under: Execution plans,Oracle — Jonathan Lewis @ 4:26 pm BST May 12,2020

This is a reference note for a question that came up as a comment on a lengthy note I wrote about reading execution plans.

How do you interpret something like: from$_subquery$_001@SEL$1 in the Query Block Name / Object Alias section of an execution plan.

The simple answer is that if you’ve got an inline view in the FROM clause of a query and you haven’t given the inline view an alias the optimizer will have to invent one – and this is what they look like.

As a quick demo here’s a script to create a couple of tables and then run a query that joins two inline views (using “ANSI”-style SQL), with variations on which of the inline views are named:


rem
rem     Script:         from_subquery.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
select  * 
from    all_objects
where   rownum <= 100
;

create table t2
as
select  *
from    all_objects
where   rownum <= 100
;

set serveroutput off

prompt  =========================
prompt  Neither inline view named
prompt  =========================

select 
        count(*)
from    (select /*+ no_merge */ * from t1)
join 
        (select /*+ no_merge */ * from t2)
using
        (object_id)
;

select * from table(dbms_xplan.display_cursor(null,null,'alias'));

prompt  ============================
prompt  Only first inline view named
prompt  ============================

select 
        count(*)
from    (select /*+ no_merge */ * from t1) v1
join 
        (select /*+ no_merge */ * from t2)
using
        (object_id)
;

select * from table(dbms_xplan.display_cursor(null,null,'alias'));

prompt  =============================
prompt  Only second inline view named
prompt  =============================

select 
        count(*)
from    (select /*+ no_merge */ * from t1)
join 
        (select /*+ no_merge */ * from t2) v2
using
        (object_id)
;

select * from table(dbms_xplan.display_cursor(null,null,'alias'));

prompt  =======================
prompt  Both inline views named
prompt  =======================

select 
        count(*)
from    (select /*+ no_merge */ * from t1) v1
join 
        (select /*+ no_merge */ * from t2) v2
using
        (object_id)
;

select * from table(dbms_xplan.display_cursor(null,null,'alias'));

In all four examples I’ve added the /*+ no_merge */ hint to the inline views; if I hadn’t done that the optimizer would simply have reduced each query to a single query block joining two tables. As it is here are the resulting execution plans (with a little cosmetic editing) reporting the plan hash value, plan, outline and predicates:


=========================
Neither inline view named
=========================

Plan hash value: 1978226902

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|*  2 |   HASH JOIN          |      |   100 |  2600 |     4   (0)| 00:00:01 |
|   3 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |   100 |   500 |     2   (0)| 00:00:01 |
|   5 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |   100 |   500 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   3 - SEL$2        / from$_subquery$_001@SEL$1
   4 - SEL$2        / T1@SEL$2
   5 - SEL$3        / from$_subquery$_003@SEL$1
   6 - SEL$3        / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("from$_subquery$_001"."OBJECT_ID"="from$_subquery$_003"."O
              BJECT_ID")


============================
Only first inline view named
============================

Plan hash value: 1978226902

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|*  2 |   HASH JOIN          |      |   100 |  2600 |     4   (0)| 00:00:01 |
|   3 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |   100 |   500 |     2   (0)| 00:00:01 |
|   5 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |   100 |   500 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   3 - SEL$2        / V1@SEL$1
   4 - SEL$2        / T1@SEL$2
   5 - SEL$3        / from$_subquery$_003@SEL$1
   6 - SEL$3        / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."OBJECT_ID"="from$_subquery$_003"."OBJECT_ID")


=============================
Only second inline view named
=============================

Plan hash value: 1978226902

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|*  2 |   HASH JOIN          |      |   100 |  2600 |     4   (0)| 00:00:01 |
|   3 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |   100 |   500 |     2   (0)| 00:00:01 |
|   5 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |   100 |   500 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   3 - SEL$2        / from$_subquery$_001@SEL$1
   4 - SEL$2        / T1@SEL$2
   5 - SEL$3        / V2@SEL$1
   6 - SEL$3        / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("from$_subquery$_001"."OBJECT_ID"="V2"."OBJECT_ID")


=======================
Both inline views named
=======================

Plan hash value: 1978226902

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|*  2 |   HASH JOIN          |      |   100 |  2600 |     4   (0)| 00:00:01 |
|   3 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |   100 |   500 |     2   (0)| 00:00:01 |
|   5 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |   100 |   500 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   3 - SEL$2        / V1@SEL$1
   4 - SEL$2        / T1@SEL$2
   5 - SEL$3        / V2@SEL$1
   6 - SEL$3        / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."OBJECT_ID"="V2"."OBJECT_ID")

As you scan down the plans you can see that they are all the same – with only a change in naming where V1 is synonymous with from$_subquery$_001 and v2 is synonymous with from$_subquery$_003.

Don’t ask me to explain how the optimizer chooses the names – I had thought I might see a from$_subquery$_002 somewhere in the optimizer trace file, but apart from the two aliases I’ve shown here the only other alias I got was one occurrence of from$_subquery$_005@sel$4.

 

Next Page »

Powered by WordPress.com.