Oracle Scratchpad

October 4, 2019

opt_estimate catalogue

Filed under: CBO,Hints,Oracle — Jonathan Lewis @ 10:10 am BST Oct 4,2019

This is just a list of the notes I’ve written about the opt_estimate() hint.

  • opt_estimate – using the hint to affect index calculations: index_scan and index_filter
  • opt_estimate 2 – applying the hint to nested loop joins, options: nlj_index_scan and nlj_index_filter
  • opt_estimate 3 – a couple of little-known options for the hint, “group_by” and “having”.
  • opt_estimate 4 – applying the hint at the query block level: particularly useful for CTEs (“with subquery”) and non-mergeable views.
  • opt_estimate 5 – a story of failure: trying to use opt_estimate to push predicates into a union all view.

I have a couple more drafts on the topic awaiting completion, but if you know of any other articles that would be a good addition to the list feel free to reference them in the comments.

 

August 26, 2019

Troubleshooting

Filed under: CBO,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 12:19 pm BST Aug 26,2019

A recent thread on the Oracle Developer Community starts with the statement that a query is taking a very long time (with the question “how do I make it go faster?” implied rather than asked). It’s 12.1.0.2 (not that that’s particularly relevant to this blog note), and we have been given a number that quantifies “very long time” (again not particularly relevant to this blog note – but worth mentioning because your “slow” might be my “wow! that was fast” and far too many people use qualitative adjectives when the important detail is quantitative). The query had already been running for 15 hours – and here it is:


SELECT 
        OWNER, TABLE_NAME 
FROM
        DBA_LOGSTDBY_NOT_UNIQUE 
WHERE
        (OWNER, TABLE_NAME) NOT IN (
                SELECT 
                        DISTINCT OWNER, TABLE_NAME 
                        FROM     DBA_LOGSTDBY_UNSUPPORTED
        ) 
AND     BAD_COLUMN = 'Y'

There are many obvious suggestions anyone could make for things to do to investigate the problem – start with the execution plan, check whether the object statistics are reasonably representative, run a trace with wait state tracing enabled to see where the time goes; but sometimes there are a couple of very simple observations you can make that point you towards simple solutions.

Looking at this query we can recognise that it’s (almost certainly) about a couple of Oracle data dictionary views (which means it’s probably very messy under the covers with a horrendous execution plan) and, as I’ve commented from time to time in the past, Oracle Corp. developers create views for their own purposes so you should take great care when you re-purpose them. This query also has the very convenient feature that it looks like two simpler queries stitched together – so a very simple step in trouble-shooting, before going into any fine detail, is to unstitch the query and run the two parts separately to see how much data they return and how long they take to complete:


SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_NOT_UNIQUE WHERE BAD_COLUMN = 'Y'

SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED

It’s quite possible that, in the worst case, the total run time of the original query could be reduced to the sum of the run times of these two queries. One strategy to achieve this would be a rewrite of the form:

select  * 
from    (
        SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_NOT_UNIQUE WHERE BAD_COLUMN = 'Y'
        minus
        SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED
)

Unfortunately the immediately obvious alternative may be illegal thanks to things like duplicates (which disappear in MINUS operations) or NULLs (which can make ALL the data “disappear” in some cases). In this case the original query might be capable of returning duplicates of (owner, table_name) from dba_logstdby_not_unique which would collapse to a single occurrence each in my rewrite – so my version of the query is not logically equivalent (unless the definition of the view enforces uniqueness); on the other hand, tracking back through the original thread to the MoS article where this query comes from, we can see that even if the query could return duplicates we don’t actually need to see them.

And this is the point of the blog note – it’s a general principle (that happens to be a very obvious strategy in this case): if a query takes too long, how does it compare with a simplified version of the query that might be a couple of steps short of the final target? If it’s easy to spot the options for simplification, and if the simplified version operates efficiently, then isolate it (using a no_merge hint if necessary), and work forwards from there. Just be careful that your rewrite remains logically equivalent to the original (if it really needs to).

In the case of this query, the two parts took 5 seconds and 9 seconds to complete, returning 209 rows and 815 rows respectively. Combining the two queries with a minus really should get the required result in no more than 14 seconds.
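
Putting those observations together, a sketch of the isolating rewrite with an explicit no_merge() hint might look like the following (the hint and the alias name are purely illustrative – the minus itself is just the rewrite shown above):

select  /*+ no_merge(v) */
        owner, table_name
from    (
        SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_NOT_UNIQUE WHERE BAD_COLUMN = 'Y'
        minus
        SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED
        ) v
/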

Footnote

The “distinct” in the second query is technically redundant as the minus operation applies a sort unique operation to both intermediate result sets before comparing them. Similarly the “distinct” was also redundant when the second query was used for the “in subquery” construction – again there would be an implied uniqueness operation if the optimizer decided to do a simple unnest of the subquery.

 

August 23, 2019

Optimizer Tricks 1

Filed under: CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 12:39 pm BST Aug 23,2019

I’ve got a number of examples of clever little tricks the optimizer can do to transform your SQL before starting in on the arithmetic of optimisation. I was prompted to publish this one by a recent thread on ODC. It’s worth taking note of these tricks when you spot one, as background knowledge of what’s possible makes it much easier to interpret and trouble-shoot execution plans. I’ve labelled this one “#1” since I may publish a few more examples in the future, and then I’ll have to catalogue them – but I’m not making any promises about that.

Here’s a table definition, and a query that’s hinted to use an index on that table.


rem
rem     Script:         optimizer_tricks_01.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             11.2.0.4
rem

create table t1 (
        v1      varchar2(10),
        v2      varchar2(10),
        v3      varchar2(10),
        padding varchar2(100)
);

create index t1_i1 on t1(v1, v2, v3);


explain plan for
select
        /*+ index(t1 (v1, v2, v3)) */
        padding 
from 
        t1
where
        v1 = 'ABC'
and     nvl(v3,'ORA$BASE') = 'SET2'
;

select * from table(dbms_xplan.display);

The query uses the first and third columns of the index, but wraps the 3rd column in an nvl() function. Because of the hint the optimizer will generate a plan with an index range scan, but the question is – what will the Predicate Information tell us about Oracle’s use of my two predicates:


Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    66 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    66 |     0   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"='ABC')
       filter(NVL("V3",'ORA$BASE')='SET2')

The nvl() test is used during the index range scan (from memory I think much older versions of Oracle would have postponed the predicate test until they had accessed the table itself). This means Oracle will do a range scan over the whole section of the index where v1 = ‘ABC’, testing every index entry it finds against the nvl() predicate.

But what happens if we modify column v3 to be NOT NULL? (“alter table t1 modify v3 not null;”) Here’s the new plan:


Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    66 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    66 |     0   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"='ABC' AND "V3"='SET2')
       filter("V3"='SET2')


The optimizer will decide that with the NOT NULL status of the column the nvl() function can be eliminated and the predicate can be replaced with a simple column comparison. At this point the v3 predicate can now be used to reduce the number of index entries that need to be examined by using a type of skip-scan/iterator approach, but Oracle still has to test the predicate against the index entries it walks through – so the predicate still appears as a filter predicate as well.

You might notice, by the way, that the Plan hash value does not change as the predicate use changes – even though the change in use of predicates could make a huge difference to the performance. (As indicated in the comments at the top of the script, I’ve run this model against 11.2.0.4 – which is the version used in the ODC thread – and 19.3.0.0: the behaviour is the same in both versions, and the Plan hash value doesn’t change from version to version.)

Footnote

The reason why I decided to publish this note is that the original thread on the ODC forums reported the following contradictory details – an index definition and the optimizer’s use of that index as shown in the predicate section of the plan:


Index column name      Column position
---------------------- ----------------
FLEX_VALUE_SET_ID      1
PARENT_FLEX_VALUE      2
RANGE_ATTRIBUTE        3
CHILD_FLEX_VALUE_LOW   4
CHILD_FLEX_VALUE_HIGH  5
ZD_EDITION_NAME        6

---------------------------------------------------------------------------
|* 17 |      INDEX RANGE SCAN             | FND_FLEX_VALUE_NORM_HIER_U1   |
---------------------------------------------------------------------------
  17 - access("FLEX_VALUE_SET_ID"=:B1 AND NVL("ZD_EDITION_NAME",'ORA$BASE')='SET2')  
       filter((NVL("ZD_EDITION_NAME",'ORA$BASE')='SET2'  ..... lots more bits of filter predicate.

Since the expression nvl(zd_edition_name, ‘ORA$BASE’) = ‘SET2’ appears as an access predicate and a filter predicate it must surely be a column in the index. So either this isn’t the definition of the index being used or, somehow, there’s a trick that allows zd_edition_name to appear as a column name in the index when it really means nvl(zd_edition_name,’ORA$BASE’) at run-time. (And if there is I want to know what it is – edition-based redefinition and tricks with virtual columns spring to mind, but I avoid thinking about complicated explanations when a simpler one might be available.)
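
If you want to experiment with the virtual column idea, here’s a hedged sketch – re-using the t1 model from above (before the “not null” modification), with hypothetical names v3_vc and t1_i2 – of how an nvl() expression can end up being usable as if it were an index column:

alter table t1 add (
        v3_vc   generated always as (nvl(v3,'ORA$BASE')) virtual
);

create index t1_i2 on t1(v1, v2, v3_vc);

explain plan for
select
        /*+ index(t1 t1_i2) */
        padding
from
        t1
where
        v1 = 'ABC'
and     nvl(v3,'ORA$BASE') = 'SET2'
;

select * from table(dbms_xplan.display);

If the optimizer matches the predicate expression to the virtual column you would expect to see something like access("V1"='ABC' AND "V3_VC"='SET2') in the Predicate Information – but whether a virtual column (or the hidden column behind a function-based index) is what actually happened in the OP’s system is pure speculation.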

 

August 14, 2019

gather_system_stats

Filed under: CBO,Exadata,Oracle,Statistics,System Stats — Jonathan Lewis @ 2:20 pm BST Aug 14,2019

What happens when you execute dbms_stats.gather_system_stats() with the ‘Exadata’ option?
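
For reference, the call in question is just the ‘EXADATA’ gathering mode – a minimal sketch, to be run as a suitably privileged user:

begin
        dbms_stats.gather_system_stats('EXADATA');
end;
/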

Here’s what my system stats look like (12.2.0.1 test results) after doing so. (The code to generate the two different versions is at the end of the note).


System Stats
============
Status: COMPLETED
Timed: 13-Aug-2019 15:00:00 - 13-Aug-2019 15:00:00
--------------------------------------------------
CPUSPEED        :
CPUSPEEDNW      :          918
IOSEEKTIM       :           10
IOTFRSPEED      :      204,800
MAXTHR          :
MBRC            :          128
MREADTIM        :
SLAVETHR        :
SREADTIM        :

PL/SQL procedure successfully completed.

MBRC       :          128
MREADTIM   :
SREADTIM   :
CPUSPEED   :
CPUSPEEDNW :          918
IOSEEKTIM  :           10
IOTFRSPEED :      204,800
MAXTHR     :
SLAVETHR   :

PL/SQL procedure successfully completed.

All the code does is set the MBRC, IOSEEKTIM, and IOTFRSPEED to fixed values and the only real gather is the CPUSPEEDNW. The parameters showing blanks are deliberately set null by the procedure – before I called the gather_system_stats() every parameter had a value. You could also check the SQL trace file (with bind capture enabled) to see the statements that deliberately set those parameters to null if you want more proof.
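
(If you want to do that check, a common way of capturing the trace with bind values is the 10046 event at level 4 – shown here as a generic reminder rather than as part of the original test:)

alter session set events '10046 trace name context forever, level 4';

begin
        dbms_stats.gather_system_stats('EXADATA');
end;
/

alter session set events '10046 trace name context off';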

What are the consequences of this call (assuming you haven’t also done something with the calibrate_io() procedure)? Essentially Oracle now has information that says a single (8KB) block read request will take marginally over 10 milliseconds, and a multiblock read request of 1MB will take just over 15 milliseconds: in other words “tablescans are great, don’t use indexes unless they’re really precisely targeted”. To give you a quantitative feel for the numbers: given the choice between doing a tablescan of 1GB to pick 1,500 randomly scattered rows and using a perfect index the optimizer would choose the index.

To explain the time calculations: Oracle has set an I/O seek time of 10 ms, and a transfer rate of 204,800 bytes per ms (200 MB/s), with the guideline that a “typical” multiblock read is going to achieve 128 blocks. So the optimizer believes a single block read will take 10 + 8192/204800 ms = 10.04ms, while a multiblock read request for 1MB will take 10 + 1048576/204800 ms = 15.12 ms.
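
If you want to sanity-check that arithmetic (and the 1GB tablescan vs. 1,500 single block read comparison above), a quick query will do it – assuming an 8KB block size and ignoring index block visits and CPU costing:

select
        10 + 8192/204800                        sreadtim_ms,
        10 + (128 * 8192)/204800                mreadtim_ms,
        1024 * (10 + (128 * 8192)/204800)       tablescan_1gb_ms,
        1500 * (10 + 8192/204800)               rows_1500_ms
from
        dual
/

The two I/O totals come out at roughly 15,483 ms for the tablescan and 15,060 ms for the 1,500 single block reads – which is consistent with the optimizer (just) preferring the index in that example.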

It’s also important to note that Oracle will use the 128 MBRC value in its calculation of the cost of the tablescan – even if you’ve set the db_file_multiblock_read_count parameter for the session or system to something smaller; and if you have set the db_file_multiblock_read_count that’s the maximum size of multiblock read that the run-time engine will use.

Code

Here are the two procedures I used to report the values above. You will only need the privilege to execute the dbms_stats package for the second one, but you’ll need the privilege to access the SYS table aux_stats$ to use the first. The benefit of the first one is that it can’t go out of date as versions change.


rem
rem     Script:         get_system_stats.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2002
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

set linesize 180
set trimspool on
set pagesize 60
set serveroutput on

spool get_system_stats

declare
        m_value         number;
        m_status        varchar2(64);
        m_start         date;
        m_stop          date;
begin
        for r1 in (
                select  rownum rn, pname
                from    sys.aux_stats$
                where   sname = 'SYSSTATS_MAIN'
        ) loop
                dbms_stats.get_system_stats(m_status, m_start, m_stop, r1.pname, m_value);
                if r1.rn = 1 then
                        dbms_output.put_line('System Stats');
                        dbms_output.put_line('============');
                        dbms_output.put_line('Status: ' || m_status);
                        dbms_output.put_line(
                                'Timed: ' ||
                                to_char(m_start,'dd-Mon-yyyy hh24:mi:ss') ||
                                ' - ' ||
                                to_char(m_stop ,'dd-Mon-yyyy hh24:mi:ss')
                        );
                        dbms_output.put_line('--------------------------------------------------');
                end if;
                dbms_output.put_line(rpad(r1.pname,15) ||  ' : ' || to_char(m_value,'999,999,999'));
        end loop;
end;
/

declare
        m_value         number;
        m_status        varchar2(64);
        m_start         date;
        m_stop          date;
begin
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MBRC', m_value);
        dbms_output.put_line('MBRC       : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MREADTIM', m_value);
        dbms_output.put_line('MREADTIM   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'SREADTIM', m_value);
        dbms_output.put_line('SREADTIM   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'CPUSPEED', m_value);
        dbms_output.put_line('CPUSPEED   : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'CPUSPEEDNW', m_value);
        dbms_output.put_line('CPUSPEEDNW : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'IOSEEKTIM', m_value);
        dbms_output.put_line('IOSEEKTIM  : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'IOTFRSPEED', m_value);
        dbms_output.put_line('IOTFRSPEED : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'MAXTHR', m_value);
        dbms_output.put_line('MAXTHR     : ' || to_char(m_value,'999,999,999'));
        dbms_stats.get_system_stats(m_status, m_start, m_stop, 'SLAVETHR', m_value);
        dbms_output.put_line('SLAVETHR   : ' || to_char(m_value,'999,999,999'));
end;
/

spool off

July 12, 2019

opt_estimate 5

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 10:28 am BST Jul 12,2019

If you’ve been wondering why I resurrected my drafts on the opt_estimate() hint, a few weeks ago I received an email containing an example of a query where a couple of opt_estimate() hints were simply not working. The critical feature of the example was that the basic structure of the query was of a type that I had not previously examined. That’s actually a common type of problem when trying to investigate any Oracle feature from cold – you can spend days thinking about all the possible scenarios you should model, then the first time you need to apply your knowledge to a production system the requirement falls outside every model you’ve examined.

Before you go any further reading this note, though, I should warn you that it ends in frustration because I didn’t find a solution to the problem I wanted to fix – possibly because there just isn’t a solution, possibly because I didn’t look hard enough.

So here’s a simplified version of the problem – it involves pushing a predicate into a union all view. First some data and a baseline query:

rem
rem     Script:         opt_estimate_3a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem

create table t1
as
select
        rownum                          id,
        100 * trunc(rownum/100)-1       id2,
        mod(rownum,1e3)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        dual
connect by
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

create table t2a pctfree 75 as select * from t1;
create table t2b pctfree 75 as select * from t1;

create index t2ai on t2a(id);
create index t2bi on t2b(id);

explain plan for
select
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select 'a' flag, t2a.* from t2a
         union all
         select 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
/

select * from table(dbms_xplan.display(null,null,'outline alias'))
/


There is one row with t1.id = 99, and I would like the optimizer to use an indexed access path to select the one matching row from each of the two tables in the union all view. The smart execution plan would be a nested loop using a “pushed join predicate” – and that’s exactly what we get by default with this data set:


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     1 |    15 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     1 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     1 |    15 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1        / T1@SEL$1
   3 - SET$5715CE2E / T2@SEL$1
   4 - SET$5715CE2E
   5 - SEL$639F1A6F / T2A@SEL$2
   6 - SEL$639F1A6F / T2A@SEL$2
   7 - SEL$B01C6807 / T2B@SEL$3
   8 - SEL$B01C6807 / T2B@SEL$3

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
      INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
      INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
      USE_NL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SET$1")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
      OUTLINE_LEAF(@"SET$5715CE2E")
      OUTLINE_LEAF(@"SEL$B01C6807")
      OUTLINE_LEAF(@"SEL$639F1A6F")
      ALL_ROWS
      OPT_PARAM('_nlj_batching_enabled' 0)
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"=99)
   6 - access("T2A"."ID"="T1"."N1")
   8 - access("T2B"."ID"="T1"."N1")

So that worked well – operation 2 predicts one row for the tablescan of t1, with a nested loop join and union all pushed predicate where an index range scan of t2ai and t2bi gives us one row from each table. The “Predicate Information” tells us that the t1.n1 join predicate has been pushed inside the view to both subqueries so we see “t2a.id = t1.n1”, and “t2b.id = t1.n1”.

So what if I want to tell Oracle that it will actually find 5 rows in the t2a range scan and table access and 7 rows in the t2b range scan and table access (perhaps in a more complex view that would persuade Oracle to use two different indexes to get into the view and change the join order and access method for the next few tables it accessed). Since I’ve recently written about the nlj_index_scan option for opt_estimate() you might think that this is the one we need to use – perhaps something like:


opt_estimate(@sel$639f1a6f nlj_index_scan, t2a@sel$2 (t1), t2a_i1, scale_rows=5)
opt_estimate(@sel$b01c6807 nlj_index_scan, t2b@sel$3 (t1), t2b_i1, scale_rows=7)

You’ll notice I’ve been very careful to find the fully qualified aliases for t2a and t2b by looking at the “Query Block Name / Object Alias” section of the plan (if the view appeared as a result of Oracle using Concatenation or OR-Expansion you would find that you got two query block names that looked similar but had suffixes of “_1” and “_2”). But it wasn’t worth the effort, it didn’t work. Fiddling around with all the possible variations I could think of didn’t help (maybe I should have used set$5715ce2e as the query block target for both the hints – no; what if I …)

Of course if we look at the “Outline Data” we’d notice that the use_nl() hint in the outline says: “USE_NL(@SEL$1 T2@SEL$1)”, so we don’t have a nested loop into t2a and t2b, we have a nested loop into the view t2. So I decided to forget the nested loop idea and just go for the indexes (and the tables, when I got to them) with the following hints (you’ll notice that during the course of my experiments I added my own query block names to the initial query blocks – so the generated query block names have changed):



explain plan for
select
        /*+
                qb_name(main)
                opt_estimate(@sel$f2bf1101, index_scan, t2a@subq_a, t2ai, scale_rows=5)
                opt_estimate(@sel$f2bf1101, table,      t2a@subq_a,       scale_rows=5)
                opt_estimate(@sel$f4e7a233, index_scan, t2b@subq_b, t2bi, scale_rows=7)
                opt_estimate(@sel$f4e7a233, table,      t2b@subq_b,       scale_rows=7)
        */
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select /*+ qb_name(subq_a) */ 'a' flag, t2a.* from t2a
         union all
         select /*+ qb_name(subq_b) */ 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
;

select * from table(dbms_xplan.display(null,null,'outline alias'));


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     5 |    75 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     5 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     7 |   105 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     7 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------


Excellent – we get the cardinalities we want to see for the tables – except the view operator doesn’t hold the sum of the table cardinalities, and the join doesn’t multiply up the estimates either. I couldn’t find a way of getting the view to show 12 rows (not even with a guessed – but presumably unimplemented – opt_estimate(view …) hint!). However, during the course of my experiments I tried the hint: “opt_estimate(@main, table, t2@main, scale_rows=15)”. This didn’t have any visible effect in the plan, but while searching through the 10053 trace file I found the following lines:

Table Stats::
  Table: from$_subquery$_002  Alias: T2  (NOT ANALYZED)
  #Rows: 20000  SSZ: 0  LGR: 0  #Blks:  37  AvgRowLen:  15.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 9

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 20000.000000 to 300000.000000 due to opt_estimate hint

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 12.000000 to 180.000000 due to opt_estimate hint

So at some point in the code path the optimizer is aware that 5 + 7 = 12, and that 12 * 15 = 180. But this doesn’t show up in the final execution plan. You might notice, by the way, that the scale_rows=15 has been applied NOT ONLY to the place where I was aiming – it’s also been applied to scale up the 20,000 rows that are estimated to be in the union all to 300,000 as the estimate for a tablescan of the two tables.
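
(For anyone who wants to reproduce the trace-file search: the traditional way of enabling the optimizer trace for a test like this is the 10053 event – an aside, not part of the original experiments:)

alter session set events '10053 trace name context forever, level 1';

-- re-run the "explain plan for ..." of the hinted query at this point

alter session set events '10053 trace name context off';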

Possibly if I spent more time working through the 10053 trace file (which, as I’ve said before, I try to avoid doing) I might have found exactly which code path Oracle followed to get to the plan it produced and managed to tweak some hints to get the numbers I wanted to see. Possibly the optimizer was already following the code path that actually produced the numbers I wanted, then “forgot” to use them. One day, perhaps, I’ll take another look at the problem – but since I wasn’t trying to solve a problem for a client (and given that there was an alternative workaround) I closed the 10053 trace file and put the model aside for a rainy day.

Footnote

One thought did cross my mind as a way of finding out if there was a real solution – and I offer this for anyone who wants to play: create a second data set that genuinely produces the 5 and 7 I want to see (and check that the view reports the sum of the two components); then run the original query against the original data so that you’ve got the execution plan in memory, and overwrite the original data with the new data set (without changing the statistics on the original). Then use the SQL Tuning Advisor to see if it produces a SQL profile for the captured SQL_ID that reproduces the correct plan for the second data set, and check what opt_estimate() hints it uses. (Warning – this may turn into a frustrating waste of time.)

Update Oct 2019

I’ve been saying for years that I don’t like the trick of pulling the Outline Information from an execution plan in memory and storing it in the database as an SQL Profile, because that’s effectively storing an SQL Plan Baseline as an SQL Profile and there might be subtle and (potentially) misleading side effects of abusing the two mechanisms. Behind the argument I’ve also made the observation that while both mechanisms store hints, the hints for an SQL Profile are about statistics while the hints for an SQL Plan Baseline are about transformations, joins, and other mechanics.

However …

I’ve now done the test I described in the footnote above – created a table with data in it that made Oracle choose full tablescans for the t2a and t2b tables, then changed the data (without changing the object statistics) and ran the SQL Tuning Advisor to see if the optimizer would suggest the plan I wanted and offer a profile to produce it.
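
For anyone repeating the test, here’s a minimal SQL*Plus sketch of the tuning task calls involved – assuming you know the sql_id of the captured statement (the substitution variable is just a placeholder):

variable m_task varchar2(64)

execute :m_task := dbms_sqltune.create_tuning_task(sql_id => '&m_sql_id')
execute dbms_sqltune.execute_tuning_task(:m_task)

set long 1000000
select dbms_sqltune.report_tuning_task(:m_task) from dual;

-- and if you decide to accept the recommended profile:
-- execute dbms_sqltune.accept_sql_profile(task_name => :m_task)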

I was successful – Oracle offered the profile, and when I looked at it (before accepting it) it looked like this:


         1 OPT_ESTIMATE(@"SEL$1", TABLE, "T2"@"SEL$1", SCALE_ROWS=200)
         1 OPT_ESTIMATE(@"SEL$1", JOIN, ("T2"@"SEL$1", "T1"@"SEL$1"), SCALE_ROWS=15)
         1 OPTIMIZER_FEATURES_ENABLE(default)
         1 IGNORE_OPTIM_EMBEDDED_HINTS

But when I accepted it and looked at it again it looked like this:


        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
        IGNORE_OPTIM_EMBEDDED_HINTS
        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
        INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
        USE_NL(@"SEL$1" "T2"@"SEL$1")
        LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
        NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
        FULL(@"SEL$1" "T1"@"SEL$1")
        OUTLINE(@"SET$1")
        OUTLINE(@"SEL$3")
        OUTLINE(@"SEL$2")
        OUTLINE_LEAF(@"SEL$1")
        PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
        OUTLINE_LEAF(@"SET$5715CE2E")
        OUTLINE_LEAF(@"SEL$B01C6807")
        OUTLINE_LEAF(@"SEL$639F1A6F")
        ALL_ROWS
        DB_VERSION('19.1.0')
        OPTIMIZER_FEATURES_ENABLE('19.1.0')
        INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))

In other words, Oracle has recorded something that looks like an SQL Plan Baseline and called it an SQL Profile.

July 1, 2019

opt_estimate 4

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:18 pm BST Jul 1,2019

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specific query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given a query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case, where the hint is inside the query block it applies to, the “@mat_cte” isn’t strictly necessary.)

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint and pointed out that there were cases where you might also want to include the query_block option as well. Here’s a final example query showing the effect, using the scale_rows option, after creating a table t2 as a copy of t1 (but with pctfree 75 to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour, but to be safe in a production environment that’s a hint I should have included.
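
For completeness, here’s a sketch of what that safer version might have looked like – the same query with the same three hints, plus a no_merge() hint on the inline view:

select
        /*+ no_merge(v1) */
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;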

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat – hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and, thanks to a Freudian slip while typing, “not sage”). You probably don’t know how to do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I did on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values is 14 (roughly half the actual number in my case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan, which makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complex view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard – you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.

 

June 28, 2019

opt_estimate 3

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:12 pm BST Jun 28,2019

This is just a quick note to throw out a couple of the lesser-known options for the opt_estimate() hint – and they may be the variants that turn out to be most useful since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

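Where does that 200 come from? A quick check (assuming the online statistics gathering during the CTAS has given us column stats) shows the connection with the column’s num_distinct:

select
        column_name, num_distinct
from
        user_tab_cols
where
        table_name = 'T1'
and     column_name = 'N1'
;
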
It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by options I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it. With autotrace enabled, here are three queries: first the unhinted baseline (which uses the standard 5% on my having clause), then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4 (which means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3 with the nett effect of producing a cardinality of 6. Here are the plans.

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of the lines for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

June 25, 2019

opt_estimate 2

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 8:22 pm BST Jun 25,2019

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t change the arithmetic the optimizer uses, but it does mean the presentation of the nested loop goes back to the original pre-9i form which makes it a little easier to see costs and cardinalities adding and multiplying their way through the plan. I do not do this in production systems.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2. The optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15 * 15 = 225 and the cost comes to: 2 (t1 cost) + (15 (t1 rows) * 16 (t2 unit cost)) = 242.
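A quick from-dual check of that arithmetic (the figures are simply copied from the plan above):

select
        15 * 15                 join_cardinality,       -- 225
        2 + (15 * 16)           nested_loop_cost        -- 242
from    dual
;
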

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name   nlj_index_scan,   target_table_alias   (list of possible driving tables),   target_index,   numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two columns before it can be used.

The things to note in this plan are: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn’t dropped, so it’s no longer consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is: 2 (t1 cost) + (15 (t1 rows) * 2 (t2 unit cost)).
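For completeness, here’s a sketch of what the rows= form of the hint (mentioned above) might look like in this example – I haven’t shown its plan here, so treat it as an illustration of the syntax rather than a demonstration:

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, rows=1)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;
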

Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale the estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide not to follow the path we expected, but that factor of 0.2 will still be applied whenever t2 is accessed. Notice in this plan that the join cardinality at operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table, the numbers involved in that selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side note: If you have access to MoS you’ll find that Doc ID: 2402821.1 “How To Use Optimizer Hints To Specify Cardinality For Join Operation”, seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins (where “multi” can be greater than 2).
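By way of illustration (untested in this note, and the cardinality() hint is undocumented, so treat the syntax as an assumption) a hint setting the join cardinality directly might look something like this:

select
        /*+
                qb_name(main)
                cardinality(t1 t2, 23)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;
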

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints – and remember we’ve disabled “nlj batching“).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

With a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in any of the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

June 6, 2019

Scalar Subquery Costing

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 7:54 pm BST Jun 6,2019

A question came up on Oracle-l list-server a few days ago about how Oracle calculates costs for a scalar subquery in the select list. The question included an example to explain the point of the question. I’ve reproduced the test below, with the output from an 18.3 test system. The numbers don’t match the numbers produced in the original posting but they are consistent with the general appearance.

rem
rem     Script:         ssq_costing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

create table t_1k ( n1 integer ) ;
create table t_100k ( n1 integer ) ;

insert into t_1k
  select
         level
    from dual
    connect by level <= 1e3;

insert into t_100k
  select level
    from dual
    connect by level <= 1e5;

commit ;

begin
  dbms_stats.gather_table_stats ( null, 'T_1K') ;
  dbms_stats.gather_table_stats ( null, 'T_100K') ;
end ;
/

explain plan for
select 
        /*+ qb_name(QB_MAIN) */
        (
        select /*+ qb_name(QB_SUBQ) */ count(*)
        from t_1k
        where t_1k.n1 = t_100k.n1
        )
from t_100k
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |   100K|   488K|  1533K  (2)| 00:01:00 |
|   1 |  SORT AGGREGATE    |        |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T_1K   |     1 |     4 |    17   (0)| 00:00:01 |
|   3 |  TABLE ACCESS FULL | T_100K |   100K|   488K|    36   (9)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T_1K"."N1"=:B1)

The key point to note is this – the scalar subquery has to execute 100,000 times because that’s the number of rows in the driving table. The cost for executing the scalar subquery once is 17 – so the total cost of the query should be 1,700,036 – not 1,533K (and for execution plans the K means x1000, not x1024). There’s always room for rounding errors, of course, but a check of the 10053 (CBO trace) file shows the numbers to be 17.216612 for the t_1k tablescan, 36.356072 for the t_100K tablescan, and 1533646.216412 for the whole query. So how is Oracle managing to get a cost that looks lower than it ought to be?
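Checking that expectation with a quick from-dual calculation (not part of the original script – the 36 and 17 are the tablescan and subquery costs quoted above):

select
        36 + (100000 * 17)      expected_total_cost     -- 1,700,036
from    dual
;
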

There’s plenty of scope for experimenting to see how the numbers change – and my first thought was simply to see what happens as you change the number of distinct values in the t_100K.n1 column. It would be rather tedious to go through the process of modifying the data a few hundred times to see what happens, so I took advantage of the get_column_stats() and set_column_stats() procedures in the dbms_stats package to create a PL/SQL loop that faked a number of different scenarios that lied about the actual table data.


delete from plan_table;
commit;

declare

        srec                    dbms_stats.statrec;
        n_array                 dbms_stats.numarray;

        m_distcnt               number;
        m_density               number;
        m_nullcnt               number;
        m_avgclen               number;


begin

        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 't_100k',
                colname         => 'n1', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => m_avgclen
        ); 

        for i in 1 .. 20 loop

                m_distcnt := 1000 * i;
                m_density := 1/m_distcnt;

                dbms_stats.set_column_stats(
                        ownname         => user,
                        tabname         => 't_100k',
                        colname         => 'n1', 
                        distcnt         => m_distcnt,
                        density         => m_density,
                        nullcnt         => m_nullcnt,
                        srec            => srec,
                        avgclen         => m_avgclen
                ); 


        execute immediate
        '
                explain plan set statement_id = ''' || m_distcnt || 
        '''
                for
                select
                        /*+ qb_name(QB_MAIN) */
                        (
                        select /*+ qb_name(QB_SUBQ) */ count(*)
                        from t_1k
                        where t_1k.n1 = t_100k.n1
                        )
                from t_100k
        ';
        
        end loop;       

end;
/

The code is straightforward. I’ve declared a few variables to hold the column stats from the t_100k.n1 column, called get_column_stats(), then looped 20 times through a process that changes the number of distinct values (and corresponding density) recorded in the column stats, then used execute immediate to call “explain plan” for the original query.

You’ll notice I’ve given each plan a separate statement_id that corresponds to the num_distinct that generated the plan. In the code above I’ve changed the num_distinct from 1,000 to 20,000 in steps of 1,000.

Once the PL/SQL block ends I’ll have a plan table with 20 execution plans stored in it and, rather than reporting those plans with calls to dbms_xplan.display(), I’m going to be selective about which rows and columns I report.

select
        statement_id, 
        io_cost,
        io_cost - lag(io_cost,1) over (order by to_number(statement_id)) io_diff,
        cpu_cost,
        cpu_cost - lag(cpu_cost,1) over (order by to_number(statement_id)) cpu_diff,
        cost
from 
        plan_table
where 
        id = 0
order by 
        to_number(statement_id)
;

I’ve picked id = 0 (the top line of the plan) for each statement_id and I’ve reported the cost column, which is made up of the io_cost column plus a scaled down value of the cpu_cost column. I’ve also used the analytic lag() function to calculate how much the io_cost and cpu_cost changed from the previous statement_id. Here are my results from 18c:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
1000                                17033            1099838920                 17253
2000                                34033      17000 2182897480 1083058560      34470
3000                                51033      17000 3265956040 1083058560      51686
4000                                68033      17000 4349014600 1083058560      68903
5000                                85033      17000 5432073160 1083058560      86119
6000                               102033      17000 6515131720 1083058560     103336
7000                               119033      17000 7598190280 1083058560     120553
8000                               136033      17000 8681248840 1083058560     137769
9000                               153033      17000 9764307400 1083058560     154986
10000                              170033      17000 1.0847E+10 1083058560     172202
11000                              197670      27637 1.2608E+10 1760725019     200191
12000                              338341     140671 2.1570E+10 8962036084     342655
13000                              457370     119029 2.9153E+10 7583261303     463200
14000                              559395     102025 3.5653E+10 6499938259     566525
15000                              647816      88421 4.1287E+10 5633279824     656073
16000                              725185      77369 4.6216E+10 4929119846     734428
17000                              793452      68267 5.0565E+10 4349223394     803565
18000                              854133      60681 5.4431E+10 3865976350     865019
19000                              908427      54294 5.7890E+10 3459031472     920005
20000                              957292      48865 6.1003E+10 3113128324     969492

The first pattern that hits the eye is the constant change of 17,000 in the io_cost in the first few lines of the output. For “small” numbers of distinct values the (IO) cost of the query is (33 + 17 * num_distinct) – in other words, the arithmetic seems to assume that it will execute the subquery once for each distinct value and then cache the results so that repeated executions for any given value will not be needed. This looks as if the optimizer is trying to match its arithmetic to the “scalar subquery caching” mechanism.
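Checking the formula against a couple of lines from the output above (just arithmetic from dual):

select
        33 + 17 * 1000          io_cost_1000,           --  17,033
        33 + 17 * 10000         io_cost_10000           -- 170,033
from    dual
;
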

But things change somewhere between 10,000 and 11,000 distinct values. The point comes where adding one more distinct value causes a much bigger jump in cost than 17, and that’s because Oracle assumes it’s reached a point where there’s a value that it won’t have room for in the cache and will have to re-run the subquery multiple times for that value as it scans the rest of the table. Let’s find the exact break point where that happens.

Changing my PL/SQL loop so that we calculate m_distcnt as “10910 + i”, this is the output from the final query:


-- m_distcnt := 10910 + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
10911                              185520            1.1834E+10                187887
10912                              185537         17 1.1835E+10    1083059     187904
10913                              185554         17 1.1836E+10    1083058     187921
10914                              185571         17 1.1837E+10    1083059     187938
10915                              185588         17 1.1838E+10    1083058     187956
10916                              185605         17 1.1839E+10    1083059     187973
10917                              185622         17 1.1841E+10    1083059     187990
10918                              185639         17 1.1842E+10    1083058     188007
10919                              185656         17 1.1843E+10    1083059     188025
10920                              185673         17 1.1844E+10    1083058     188042
10921                              185690         17 1.1845E+10    1083059     188059
10922                              185707         17 1.1846E+10    1083058     188076
10923                              185770         63 1.1850E+10    4027171     188140
10924                              185926        156 1.1860E+10    9914184     188298
10925                              186081        155 1.1870E+10    9912370     188455
10926                              186237        156 1.1880E+10    9910555     188613
10927                              186393        156 1.1890E+10    9908741     188770
10928                              186548        155 1.1900E+10    9906928     188928
10929                              186703        155 1.1909E+10    9905114     189085
10930                              186859        156 1.1919E+10    9903302     189243

If we have 10,922 distinct values in the column the optimizer calculates as if it will be able to cache them all; but if we have 10,923 distinct values the optimizer thinks that there’s going to be one value where it can’t cache the result and will have to run the subquery more than once.

Before looking at this in more detail let’s go to the other interesting point – when does the cost stop changing? We can see the cost increasing as the number of distinct values grows, but we saw at the start that the cost didn’t get as large as we expected, so there must be a point where it stops increasing before it “ought” to.

I’ll jump straight to the answer: here’s the output from the test when I start num_distinct off at slightly less than half the number of rows in the table:


 -- m_distcnt := (50000 - 10) + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
49991                             1514281            9.6488E+10               1533579
49992                             1514288          7 9.6489E+10     473357    1533586
49993                             1514296          8 9.6489E+10     473337    1533594
49994                             1514303          7 9.6490E+10     473319    1533601
49995                             1514311          8 9.6490E+10     473299    1533609
49996                             1514318          7 9.6491E+10     473281    1533616
49997                             1514325          7 9.6491E+10     473262    1533624
49998                             1514333          8 9.6492E+10     473243    1533631
49999                             1514340          7 9.6492E+10     473224    1533639
50000                             1514348          8 9.6493E+10     473205    1533646
50001                             1514348          0 9.6493E+10          0    1533646
50002                             1514348          0 9.6493E+10          0    1533646
50003                             1514348          0 9.6493E+10          0    1533646
50004                             1514348          0 9.6493E+10          0    1533646
50005                             1514348          0 9.6493E+10          0    1533646
50006                             1514348          0 9.6493E+10          0    1533646
50007                             1514348          0 9.6493E+10          0    1533646
50008                             1514348          0 9.6493E+10          0    1533646
50009                             1514348          0 9.6493E+10          0    1533646
50010                             1514348          0 9.6493E+10          0    1533646

The cost just stops changing when num_distinct = half the rows in the table.

Formulae

During the course of these experiments I had been exchanging email messages with Nenad Noveljic via the Oracle-L list-server (full monthly archive here) and he came up with the suggestion of a three-part formula that assumed a cache size and gave a cost of

  • “tablescan cost + num_distinct * subquery unit cost” for values of num_distinct up to the cache size;
  • then, for values of num_distinct greater than the cache size and up to half the number of rows in the table, added a marginal cost representing the probability that some values would not be cached;
  • then, for values of num_distinct greater than half the number of rows in the table, reported the cost associated with num_distinct = half the number of rows in the table.

Hence:

  • for 1 <= num_distinct <= 10922, cost = (33 + num_distinct * 17)
  • for 10,923 <= num_distinct <= 50,000, cost = (33 + 10,922 * 17) + (1 – 10,922/num_distinct) * 100,000 * 17
  • for 50,000 <= num_distinct <= 100,000, cost = cost(50,000).

The middle line needs a little explanation: (1 – 10,922/num_distinct) is the probability that a value will not be in the cache; this has to be multiplied by 100,000 to give the expected number of rows that will not be cached, and then multiplied by 17 as the cost of running the subquery for those rows.

The middle line can be re-arranged as 33 + 17 * (10,922 + (1 – 10,922/num_distinct) * 100,000)

Tweaking

At this point I could modify my code loop to report the calculated value for the cost and compare it with the actual cost to show you that the two values didn’t quite match. Instead I’ll jump forward a little bit to a correction that needs to be made to the formula above. It revolves around how Oracle determines the cache size. There’s a hidden parameter (which I mentioned in CBO Fundamentals) that controls scalar subquery caching. In the book I think I only referenced it in the context of subqueries in the “where” clause. The parameter is “_query_execution_cache_max_size” and has a default value of 131072 (power(2,17)) – so when I found that the initial formula didn’t quite work I made the following observation:

  • 131072 / 10922 = 12.00073
  • 131072 / 12 = 10922.666…

So I put 10,922.66667 into the formula to see if that would improve things.
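(As an aside, if you want to confirm the default value of the hidden parameter on your own system you could use something like the following sketch – it queries x$ structures so it has to be run as SYS:)

select
        i.ksppinm       parameter_name,
        v.ksppstvl      parameter_value
from
        x$ksppi         i,
        x$ksppcv        v
where
        i.indx = v.indx
and     i.ksppinm = '_query_execution_cache_max_size'
;
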

For the code change I added a variable m_cost to the PL/SQL block, and set it inside the loop as follows:

m_cost := round(33 + 17 * (10922.66667 + 100000 * (1 - (10922.66667 / m_distcnt))));

Then in the “execute immediate” I changed the “explain plan” line to read:

explain plan set statement_id = ''' || lpad(m_distcnt,7) || ' - ' || lpad(m_cost,8) ||

This allowed me to show the formula’s prediction of (IO)cost in final output, and here’s what I got for values of num_distinct in the region of 10,922:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  10911 -   183901                 185520            1.1834E+10                187887
  10912 -   184057                 185537         17 1.1835E+10    1083059     187904
  10913 -   184212                 185554         17 1.1836E+10    1083058     187921
  10914 -   184368                 185571         17 1.1837E+10    1083059     187938
  10915 -   184524                 185588         17 1.1838E+10    1083058     187956
  10916 -   184680                 185605         17 1.1839E+10    1083059     187973
  10917 -   184836                 185622         17 1.1841E+10    1083059     187990
  10918 -   184992                 185639         17 1.1842E+10    1083058     188007
  10919 -   185147                 185656         17 1.1843E+10    1083059     188025
  10920 -   185303                 185673         17 1.1844E+10    1083058     188042
  10921 -   185459                 185690         17 1.1845E+10    1083059     188059
  10922 -   185615                 185707         17 1.1846E+10    1083058     188076
  10923 -   185770                 185770         63 1.1850E+10    4027171     188140
  10924 -   185926                 185926        156 1.1860E+10    9914184     188298
  10925 -   186081                 186081        155 1.1870E+10    9912370     188455
  10926 -   186237                 186237        156 1.1880E+10    9910555     188613
  10927 -   186393                 186393        156 1.1890E+10    9908741     188770
  10928 -   186548                 186548        155 1.1900E+10    9906928     188928
  10929 -   186703                 186703        155 1.1909E+10    9905114     189085
  10930 -   186859                 186859        156 1.1919E+10    9903302     189243

The formula is only supposed to work in the range 10923 – 50,000, so the first few results don’t match; but in the range 10,923 to 10,930 the match is exact. Then, in the region of 50,000 we get:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  49991 -  1514281                1514281            9.6488E+10               1533579
  49992 -  1514288                1514288          7 9.6489E+10     473357    1533586
  49993 -  1514296                1514296          8 9.6489E+10     473337    1533594
  49994 -  1514303                1514303          7 9.6490E+10     473319    1533601
  49995 -  1514311                1514311          8 9.6490E+10     473299    1533609
  49996 -  1514318                1514318          7 9.6491E+10     473281    1533616
  49997 -  1514325                1514325          7 9.6491E+10     473262    1533624
  49998 -  1514333                1514333          8 9.6492E+10     473243    1533631
  49999 -  1514340                1514340          7 9.6492E+10     473224    1533639
  50000 -  1514348                1514348          8 9.6493E+10     473205    1533646
  50001 -  1514355                1514348          0 9.6493E+10          0    1533646
  50002 -  1514363                1514348          0 9.6493E+10          0    1533646
  50003 -  1514370                1514348          0 9.6493E+10          0    1533646
  50004 -  1514377                1514348          0 9.6493E+10          0    1533646
  50005 -  1514385                1514348          0 9.6493E+10          0    1533646
  50006 -  1514392                1514348          0 9.6493E+10          0    1533646
  50007 -  1514400                1514348          0 9.6493E+10          0    1533646
  50008 -  1514407                1514348          0 9.6493E+10          0    1533646
  50009 -  1514415                1514348          0 9.6493E+10          0    1533646
  50010 -  1514422                1514348          0 9.6493E+10          0    1533646

Again, the formula applies only in the range up to 50,000 (half the rows in the table) – and the match is perfect in that range.
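Pulling the three pieces together with the corrected constant – this is only a sketch using the figures from this particular test (tablescan cost 33, subquery unit cost 17, 100,000 rows, cache limit 10,922.66667 entries), not a general-purpose formula – the predicted IO cost can be written as a single case expression:

define m_distcnt = 12000

select
        case
                when &m_distcnt <= 10922.66667 then
                        round(33 + 17 * &m_distcnt)
                when &m_distcnt <= 50000 then
                        round(33 + 17 * (10922.66667 + 100000 * (1 - 10922.66667 / &m_distcnt)))
                else
                        round(33 + 17 * (10922.66667 + 100000 * (1 - 10922.66667 / 50000)))
        end     predicted_io_cost
from    dual
;

With m_distcnt set to 12,000 this returns 338,341 – matching the io_cost reported earlier for statement_id 12000 – and for 50,000 and above it returns the fixed 1,514,348.
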

Next steps

The work so far gives us some idea of the algorithm that the optimizer is using to derive a cost, but this is just one scenario and there are plenty of extra questions we might ask. What, as the most pressing one, is the significance of the number 12 in the calculation 131,072/12? From previous experience I guessed that it was related to the length of the input and output values of the scalar subquery – as in “value X for n1 returns value Y for count(*)”.

To pursue this idea I recreated the data sets using varchar2(10) as the definition of n1 and lpad(rownum,10) as the value – the “breakpoint” dropped from 10,922 down to 5,461. Checking the arithmetic: 131,072 / 5,461 = 24.001456, then 131,072 / 24 = 5,461.333… And that’s the number that made the formula work perfectly for the modified data set.

Then I used set_column_stats() to hack the avg_col_len of t_100K.n1 to 15 and the break point dropped to 4,096.  Again we do the two arithmetic steps: 131,072 / 4,096 = 32 (but then we don’t need to do the reverse step since the first result is integral).

Checking the original data set, when n1 was numeric the avg_col_len was 5, so we have three reference points:

  • Avg_col_len = 5. “Cache unit size” = 12
  • Avg_col_len = 11. Cache unit size = 24 (don’t forget the avg_col_len includes the length byte, so our padded varchar2(10) has a length of 11).
  • Avg_col_len = 15, Cache unit size = 32

There’s an obvious pattern here: “Cache unit size” = (2 x avg_col_len + 2).  Since I hadn’t been changing the t_1k.n1 column at the same time, that really does look like a deliberate factor of 2 (I’d thought initially that maybe the 12 was affected by the lengths of both columns in the predicate – but that doesn’t seem to be the case.)

The scientific method says I should now make a prediction based on my hypothesis – so I set the avg_col_len for t_100K.n1 to 23 and guessed that the break point would be at 2,730 – and it was.  (131,072 / (2 * 23 + 2) = 2,730.6666…).
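A quick from-dual check of the hypothesis against the four reference points (just arithmetic, not part of the original test – each column is the number of values that can be fully cached for the given avg_col_len):

select
        trunc(131072 / (2 *  5 + 2))    entries_len_5,          -- 10,922 (numeric n1)
        trunc(131072 / (2 * 11 + 2))    entries_len_11,         --  5,461 (padded varchar2(10))
        trunc(131072 / (2 * 15 + 2))    entries_len_15,         --  4,096 (avg_col_len hacked to 15)
        trunc(131072 / (2 * 23 + 2))    entries_len_23          --  2,730 (avg_col_len hacked to 23)
from    dual
;
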

The next question, of course, is “where does the “spare 2″ come from?” Trying to minimize the change in the code I modified my subquery to select sum(to_number(n1)) rather than count(*), then to avg(to_number(n1)) – remember I had changed n1 to a varchar2(10) that looked like a number left-padded with spaces. In every variant of the tests I’d done so far all I had to do to get an exact match between the basic formula and the optimizer’s cost calculation was to use “2 * avg_col_len + 22” as the cache unit size – and 22 is the nominal maximum length of an internally stored numeric column.

Bottom line: the cache unit size seems to be related to the input and output values, but I don’t know why there’s a factor of 2 applied to the input column length, and I don’t know why the length of count(*) is deemed to be 2 when other derived numeric outputs have the more intuitive 22 for their length.

tl;dr

The total cost calculation for a scalar subquery in the select list is largely affected by:

  • a fixed cache size (131,072 bytes) possibly set by hidden parameter _query_execution_cache_max_size
  • the avg_col_len of the input (correlating) column(s) from the driving table
  • the nominal length of the output (select list) of the subquery

There is an unexplained factor of 2 used with the avg_col_len of the input, and a slightly surprising value of 2 if the output is simply count(*).

If the number N of distinct values for the driving column(s) is less than the number of possible cache entries the effect of the scalar subquery is to add N * estimated cost of executing the subquery once.  As the number of distinct values for the driving column(s) goes above the limit then the incremental effect of the subquery is based on the expected number of times an input value will not be cached. When the number of distinct values in the driving column(s) exceeds half the number of rows in the driving table the cost stops increasing – there is no obvious reason why the algorithm does this.

There are many more cases that I could investigate at this point – but I think this model is enough as an indication of general method. If you come across a variation where you actually need to work out how the optimizer derived a cost then this framework will probably be enough to get you started in the right direction.

 

April 12, 2019

In-table predicates

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:49 pm BST Apr 12,2019

This note was prompted by a recent email asking about the optimizer’s method for estimating the selectivity of a predicate which compared two columns in the same table – for example:  “where orders.amount_invoiced = orders.amount_paid”. It’s been about 14 years since I wrote “Cost Based Oracle – Fundamentals” so my memory of what I wrote (and whether I even mentioned this case) was rather hazy, so I sent off a quick reply and decided to do a little checking.

It turned out that I’d already written a blog note with a throwaway comment about the estimates and a general workaround for optimizer problems caused by examples of this kind. The comment I made about the estimate was that the selectivity seems to be the smaller of the selectivities of (using the example above) “amount_paid = :unpeekable_bind” and “amount_invoiced = :unpeekable_bind”. I’m fairly sure I’ve made similar comments several times in the past, but after replying to the email I started to wonder whether this would still be true if there were histograms on the columns. So I ran up a little test and here, to start things off, is the code to generate the data I used for testing:


rem
rem     Script:         column_equality_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem     Purpose:

create table t1(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t2(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t3(
        n1      number(6,0),
        n2      number(6,0),
        v1      varchar2(50)
)
;

execute dbms_random.seed(0)

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t3 (n1, n2, v1)
select
        t1.n1,
        t2.n1,
        rpad(rownum,50)
from
        t1, t2
where
        t1.id = t2.id
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 254 for columns v1 size 1'
        );
end;
/

select
        table_name, column_name, num_distinct, density, histogram,
        low_value, high_value
from
        user_tab_cols
where
        table_name in ('T1','T2','T3')
and     column_name in ('N1','N2')
order by
        table_name, column_name
;


TABLE_NAME      COLUMN_NAME     NUM_DISTINCT    DENSITY HISTOGRAM       LOW_VALUE  HIGH_VALUE
--------------- --------------- ------------ ---------- --------------- ---------- ----------
T1              N1                        38     .00005 FREQUENCY       80         C128

T2              N1                        38     .00005 FREQUENCY       80         C126

T3              N1                        38     .00005 FREQUENCY       80         C128
                N2                        38     .00005 FREQUENCY       80         C126


I’ve created two sets of 10,000 rows each of normally distributed data – but taken the absolute values so I’ve only got half the bell curve, and I’ve scaled up by a factor of 10 and truncated. This has given me two similar but slightly different sets of values which happen to cover 38 distinct values each.

I’ve then generated my test set by joining these two tables on the unique (though not declared as such) id column to give a table with the same number of rows and two skewed sets of data. The calls to dbms_stats create histograms on the skewed data sets, and I’ve reported a few significant numbers about the 4 relevant columns.

Looking at the column statistics we have num_distinct = 38 across the board – so my observation from paragraph 2 above would tend to suggest that the optimizer would report 10,000/38 = 263 as the cardinality estimate for the predicate “t3.n1 = t3.n2” (I’m fairly confident that in this case 1/num_distinct will be preferred over using the density from user_tab_cols). But here’s what we get from a call to explain plan:


explain plan for
select
        v1
from
        t3
where
        n1 = n2
;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   564 | 32148 |    18   (6)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |   564 | 32148 |    18   (6)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"="N2")

The estimate is 564 – which is a pretty good estimate in this case (the actual result was 552) as the two columns were randomly generated and there’s no correlation between them. Unfortunately this is quite a long way off my assumption of 263, so where did the optimizer get that number from?

Here’s a query (with result set) that you may recognise from an earlier post.


break on report skip 1
compute count of value on report
compute sum of t1_frequency on report
compute sum of t2_frequency on report
compute sum of product on report

column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N1'
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N2'
)
select
        f1.value,
        f1.frequency    t1_frequency,
        f2.frequency    t2_frequency,
        f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
order by
        f1.value
;



     VALUE T1_FREQUENCY T2_FREQUENCY      PRODUCT
---------- ------------ ------------ ------------
         0          777          768      596,736
         1          806          753      606,918
         2          794          779      618,526
         3          808          763      616,504
         4          752          749      563,248
         5          627          729      457,083
         6          623          628      391,244
         7          584          616      359,744
         8          544          597      324,768
         9          512          546      279,552
        10          441          439      193,599
        11          409          342      139,878
        12          345          370      127,650
        13          318          300       95,400
        14          257          282       72,474
        15          244          242       59,048
        16          214          206       44,084
        17          172          193       33,196
        18          161          140       22,540
        19          113          114       12,882
        20          108           93       10,044
        21           95           81        7,695
        22           72           55        3,960
        23           54           56        3,024
        24           43           36        1,548
        25           38           31        1,178
        26           23           18          414
        27           18           23          414
        28            7           14           98
        29            9           13          117
        30           14           11          154
        31            4            2            8
        32            5            3           15
        33            1            3            3
        35            4            1            4
        37            2            2            4
---------- ------------ ------------ ------------
        36
                   9998         9998    5,643,754


I’m querying the histogram information for the two columns, and where t3.n1 and t3.n2 have a value in common I’ve reported the two frequencies for that value and the product of the frequencies. For convenience I’ve included a count and a couple of sums to show that there isn’t a perfect match in the set of values for the two columns. The most important number at the bottom of the page, though, is the sum of the products of frequencies of common values. Take that value and divide by 10,000 and you get 564.3754 – compare that with the cardinality estimate of the predicate “t3.n1 = t3.n2”, it’s a perfect match (allowing for rounding).

The query against user_tab_histograms is the query I used to calculate the cardinality of a join where there were frequency histograms on the columns at both ends of the join. The optimizer’s estimate for “intra-table” predicates is consistent with its estimate for joins (in the special cases of “no histograms” and “two frequency histograms”, at least). Viewing it from a slightly different angle: the selectivity of the predicate “n1 = n2” can be derived as “the cardinality estimate for joining t3 to itself” divided by “the cardinality of the cartesian join” (the latter being num_rows * num_rows, of course).

Just as a closing demo – let’s generate a plan for the appropriate self-join of t3 and check the cardinality estimate:


explain plan for
select
        t3a.v1, t3b.v1
from
        t3 t3a, t3 t3b
where
        t3a.n2 = t3b.n1
;

select * from table(dbms_xplan.display);


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  5643K|   581M|   138  (83)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  5643K|   581M|   138  (83)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3A"."N2"="T3B"."N1")


As expected the (rounded) join cardinality is reported as 5,643K.

So the selectivity of the single table predicate “n1 = n2” will be 5,643,000 / (10,000 * 10,000) = 0.05643 and the cardinality estimate of the single table query will be 10,000 * 0.05643 = 564.3 QED.

I haven’t tested any other variations of types of histogram, degree of overlap of value ranges, etc. but I suspect that the general principle is probably going to give the selectivity as (or with the appearance of): “estimated cardinality of self-join” / “square of num_rows (allowing for nulls)”.

 

December 20, 2018

Transitive Closure

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 1:19 pm BST Dec 20,2018

This is a follow-up to a note I wrote nearly 12 years ago, looking at the problems of transitive closure (or absence thereof) from the opposite direction. Transitive closure gives the optimizer one way of generating new predicates from the predicates you supply in your where clause (or, in some cases, your constraints); but it’s a mechanism with some limitations. Consider the following pairs of predicates:


    t1.col1 = t2.col2
and t2.col2 = t3.col3

    t1.col1 = t2.col2
and t2.col2 = 'X'

A person can see that the first pair of predicates allows us to infer that “t1.col1 = t3.col3” and the second pair of predicates allows us to infer that “t1.col1 = ‘X'”. The optimizer is coded only to recognize the second inference. This has an important side effect that can have a dramatic impact on performance in a way that’s far more likely to appear if your SQL is generated by code. Consider this sample data set (reproduced from the 2006 article):

rem
rem     Script:         transitive_loop.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2006
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem

create table t1 
as
select
        mod(rownum,100) col1,
        rpad('x',200)   v1
from
        all_objects
where   
        rownum <= 2000
;

create table t2
as
select
        mod(rownum,100) col2,
        rpad('x',200)   v2
from
        all_objects
where   
        rownum <= 2000
;

create table t3
as
select
        mod(rownum,100) col3,
        rpad('x',200)   v3
from
        all_objects
where   
        rownum <= 2000
;

-- gather stats if necessary

set autotrace traceonly explain

prompt  =========================
prompt  Baseline - two hash joins
prompt  =========================

select 
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
;

prompt  ================================================
prompt  Force mismatch between predicates and join order
prompt  ================================================

select 
        /*+
                leading(t1 t3 t2)
        */
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
;

The first query simply joins the tables in the from clause order on a column we know will have 20 rows for each distinct value, so the result sets will grow from 2,000 rows to 40,000 rows to 800,000 rows. Looking at the second query we would like to think that when we force Oracle to use the join order t1 -> t3 -> t2 it would be able to use the existing predicates to generate the predicate “t3.col3 = t1.col1” and therefore be able to do the same amount of work as the first query (and, perhaps, manage to produce the same final cardinality estimate).

Here are the two plans, taken from an instance of 12.2.0.1:


=========================
Baseline - two hash joins
=========================

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   800K|   466M|    48  (38)| 00:00:01 |
|*  1 |  HASH JOIN          |      |   800K|   466M|    48  (38)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T3   |  2000 |   398K|    10   (0)| 00:00:01 |
|*  3 |   HASH JOIN         |      | 40000 |    15M|    21   (5)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   |  2000 |   398K|    10   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T2   |  2000 |   398K|    10   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."COL3"="T2"."COL2")
   3 - access("T2"."COL2"="T1"."COL1")

================================================
Force mismatch between predicates and join order
================================================

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |   800K|   466M| 16926   (3)| 00:00:01 |
|*  1 |  HASH JOIN            |      |   800K|   466M| 16926   (3)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | T2   |  2000 |   398K|    10   (0)| 00:00:01 |
|   3 |   MERGE JOIN CARTESIAN|      |  4000K|  1556M| 16835   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T1   |  2000 |   398K|    10   (0)| 00:00:01 |
|   5 |    BUFFER SORT        |      |  2000 |   398K| 16825   (2)| 00:00:01 |
|   6 |     TABLE ACCESS FULL | T3   |  2000 |   398K|     8   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."COL2"="T1"."COL1" AND "T3"."COL3"="T2"."COL2")

As you can see, there’s a dramatic difference between the two plans, and a huge difference in cost (though the predicted time for both is still no more than 1 second).

The first plan, where we leave Oracle to choose the join order, builds an in-memory hash table from t3, then joins t1 to t2 with a hash table and uses the result to join to t3 by probing the in-memory hash table.

The second plan, where we force Oracle to use a join order that (I am pretending) we believe to be better, results in Oracle doing a Cartesian merge join between t1 and t3 that explodes the intermediate result set up to 4 million rows (and the optimizer’s estimate is correct) before eliminating a huge amount of redundant data.

As far as performance is concerned, the first query took 0.81 seconds to generate its result set, the second query took 8.81 seconds. In both cases CPU time was close to 100% of the total time.

As a follow-up demo I added the extra predicate “t3.col3 = t1.col1” to the second query, allowing the optimizer to use a hash join with the join order t1 -> t3 -> t2, and this brought the run time back down (with a slight increase due to the extra predicate check on the second join).
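Here’s a sketch of that follow-up query (my reconstruction rather than a copy of the original script) – the only change is the extra, logically redundant, predicate on the final line of the where clause:

select 
        /*+
                leading(t1 t3 t2)
        */
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
and     t3.col3 = t1.col1
;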

Summary

The choice of columns in join predicates may stop Oracle from choosing the best join order because it is not able to use transitive closure to generate all the extra predicates that the human eye can see. If you are using programs to generate SQL rather than writing SQL by hand you are more likely to see this limitation resulting in some execution plans being less efficient than they could be.

 

December 18, 2018

NULL predicate

Filed under: CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 1:13 pm BST Dec 18,2018

People ask me from time to time if I’m going to write another book on the Cost Based Optimizer – and I think the answer has to be no because the product keeps growing so fast it’s not possible to keep up and because there are always more and more little details that might have been around for years and finally show up when someone asks me a question about some little oddity I’ve never noticed before.

The difficulty with the “little oddities” is the amount of time you could spend trying to work out whether or not they matter and if it’s worth writing about them. Here’s a little example to show what I mean – first the data set:


rem
rem     Script:         null_filter.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.1.0.2
rem

create table t1
nologging
as
select  *
from    all_objects
where   rownum <= 50000 -- > comment to avoid wordpress format issue
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
commit;

create index t1_i1 on t1(object_type, data_object_id, object_id, created);

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );
end;
/

It’s a simple data set with a single index. The only significant thing about the index is that the second column (data_object_id) is frequently null. This leads to a little quirk in the execution plans for a very similar pair of statements:


set serveroutput off
alter session set statistics_level = all;

select
        object_name, owner
from
        t1
where
        object_type = 'TABLE'
and     data_object_id = 20002
and     object_id = 20002
and     created > trunc(sysdate - 90)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select
        object_name, owner
from
        t1
where
        object_type = 'TABLE'
and     data_object_id is null
and     object_id = 20002
and     created > trunc(sysdate - 90)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

How much difference would you expect in the execution plans for these two queries? There is, of course, the side effect of the “is null” predicate disabling the “implicit column group” that is the index distinct_keys value, but in this case I’ve got a range-based predicate on one of the columns so Oracle won’t be using the distinct_keys anyway.

Of course there’s the point that you can’t use the equality operator with null, you have to use “is null” – and that might make a difference, but how? Here are the two execution plans:


----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.01 |       3 |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |      0 |00:00:00.01 |       3 |      1 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |      1 |      1 |      0 |00:00:00.01 |       3 |      1 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='TABLE' AND "DATA_OBJECT_ID"=20002 AND "OBJECT_ID"=20002 AND
              "CREATED">TRUNC(SYSDATE@!-90))

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.01 |       3 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |      0 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |      1 |      1 |      0 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='TABLE' AND "DATA_OBJECT_ID" IS NULL AND "OBJECT_ID"=20002 AND
              "CREATED">TRUNC(SYSDATE@!-90))
       filter(("OBJECT_ID"=20002 AND "CREATED">TRUNC(SYSDATE@!-90)))

The query with the predicate “data_object_id is null” repeats the object_id and sysdate predicates as both access predicates and filter predicates. This seems a little surprising and looks like a potential performance threat. In the first query the run-time engine will hit the correct index leaf block in exactly the right place very efficiently and then walk along it supplying every rowid to the parent operator until it hits the end of the range.

With the “is null” plan the run-time engine will be checking the actual value of object_id and created for every index entry on the way – how much extra CPU will this use and, more importantly, might Oracle start with the first index entry where object_type = ‘TABLE’ and data_object_id is null and walk through every index entry that has that null, checking for the correct object_id as it goes?

That last question is the reason for running the query with rowsource execution stats enabled. The first query did a single physical read while the second didn’t have to, but the more important detail is that both queries did the same number of buffer gets – and there is, by the way, a set of eight rows where the object_id and data_object_id are 20,002, but they were created several years ago so the index range scan returns no rows in both cases.
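If you want to confirm that those eight rows really exist and are simply too old to pass the created predicate, a quick check (a sketch, not part of the original test) would be:

select  count(*), min(created), max(created)
from    t1
where   object_id      = 20002
and     data_object_id = 20002
;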

Based on that comparison, how do we show that Oracle has not walked all the way from the first index entry where object_type = ‘TABLE’ and data_object_id is null checking every entry on the way or, to put it another way, has Oracle really managed to prune down the index range scan to the minimum “wedge” indicated by the presence of the predicates “OBJECT_ID”=20002 AND “CREATED”>TRUNC(SYSDATE@!-90) as access predicates?

Let’s just count the number of leaf blocks that might be relevant, using the sys_op_lbid() function (last seen here) that Oracle uses internally to count the number of leaf blocks in an index. First we get the index object_id, then we scan it to see how many leaf blocks hold entries that match our object_type and data_object_id predicates but appear in the index before our target value of 20,002:


column object_id new_value m_index_id

select
        object_id
from
        user_objects
where
        object_type = 'INDEX'
and     object_name = 'T1_I1'
;

select  distinct sys_op_lbid(&m_index_id, 'L', rowid)
from    t1
where   object_type    = 'TABLE'
and     data_object_id is null
and     object_id      < 20002
;


SYS_OP_LBID(159271
------------------
AAAm4nAAFAAACGDAAA
AAAm4nAAFAAACF9AAA
AAAm4nAAFAAACGCAAA
AAAm4nAAFAAACF/AAA
AAAm4nAAFAAACF+AAA
AAAm4nAAFAAACGFAAA
AAAm4nAAFAAACGEAAA
AAAm4nAAFAAACGGAAA

8 rows selected.


This tells us that there are 8 leaf blocks in the index that we would have to range through before we found object_id 20,002 and we would have seen 8 buffer gets, not 3 in the rowsource execution stats, if Oracle had not actually been clever with its access predicates and narrowed down the wedge of the index it was probing.
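The same technique applies if you want to check the far side of the target as well (a sketch, reusing the &m_index_id substitution variable captured above) – adding the two leaf block counts together shows how much of the “data_object_id is null” section of the index the range scan would have had to cover if Oracle had not pruned it down to the wedge:

select  distinct sys_op_lbid(&m_index_id, 'L', rowid)
from    t1
where   object_type    = 'TABLE'
and     data_object_id is null
and     object_id      > 20002
;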

Bottom line: for a multi-column index there seems to be a difference in execution plans between “column is null” and “column = constant” when the column is one of the earlier columns in the index – but even though the “is null” option results in some access predicates re-appearing as filter predicates in the index range scan the extra workload is probably not significant – Oracle still uses the minimum number of index leaf blocks in the index range scan.

Update (May 2019)

Randolf Geist has written a blog note that extends this pattern to something that introduces a much bigger threat of a performance issue even though the SQL is only marginally more complex than the example above.

 

December 14, 2018

Extreme Nulls

Filed under: CBO,extended stats,Oracle,Statistics — Jonathan Lewis @ 7:01 pm BST Dec 14,2018

This note is a variant of a note that I wrote a few months ago about the impact of nulls on column groups. The effect showed up recently on a client site with a little camouflage that confused the issue for a little while, so I thought it would be worth a repeat.  We’ll start with a script to generate some test data:

rem
rem     Script:         pt_hash_cbo_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.1.0.1
rem             12.1.0.2
rem

create table t1 (
        hash_col,
        rare_col,
        n1,
        padding
)
nologging
partition by hash (hash_col)
partitions 32
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        mod(rownum,128),
        case when mod(rownum,1021) = 0 
                then rownum + trunc(dbms_random.value(-256, 256))
        end,
        rownum,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1048576 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(hash_col, rare_col) nologging
local compress 1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                granularity => 'ALL',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve got a hash-partitioned table with 32 partitions; the partitioning key is called hash_col, and there is another column called rare_col that is almost always null – roughly 1 row in every 1,000 holds a value. I’ve added a local index on (hash_col, rare_col) compressing the leading column since hash_col is very repetitive, and gathered stats on the partitions and table. Here’s a view of the data for a single value of hash_col, and a summary report of the whole data set:

select  
        hash_col, rare_col, count(*)
from
        t1
where
        hash_col = 63
group by
        hash_col, rare_col
order by
        hash_col, rare_col
;

  HASH_COL   RARE_COL   COUNT(*)
---------- ---------- ----------
        63     109217          1
        63     240051          1
        63     370542          1
        63     501488          1
        63     631861          1
        63     762876          1
        63     893249          1
        63    1023869          1
        63                  8184

9 rows selected.

select
        count(*), ct
from    (
        select
                hash_col, rare_col, count(*) ct
        from
                t1
        group by
                hash_col, rare_col
        order by
                hash_col, rare_col
        )
group by ct
order by count(*)
;

  COUNT(*)         CT
---------- ----------
         3       8183
       125       8184
      1027          1

Given the way I’ve generated the data, any one value for hash_col will have 8,184 (or 8,183) rows where rare_col is null; but there are 1,027 rows which have a value for both hash_col and rare_col, with just one row for each combination.

Now we get to the problem. Whenever rare_col is non null the combination of hash_col and rare_col is unique (though this wasn’t quite the case at the client site) so when we query for a given hash_col and rare_col we would hope that the optimizer would be able to estimate a cardinality of one row; but this is what we see:


variable n1 number
variable n2 number

explain plan for
select /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

========================================

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   908 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

The optimizer has predicted a massive 908 rows. A quick check of the object stats shows us that this is “number of rows in table” / “number of distinct keys in index” (1,048,576 / 1,155, rounded up).
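A sketch of the dictionary queries behind that check (my own – the columns are the standard user_indexes / user_tab_cols columns) – note how the nulls in rare_col are visible at the column level but effectively invisible at the index level:

select  num_rows, distinct_keys
from    user_indexes
where   index_name = 'T1_I1'
;

select  column_name, num_nulls, num_distinct
from    user_tab_cols
where   table_name  = 'T1'
and     column_name in ('HASH_COL', 'RARE_COL')
;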

Any row with rare_col set to null cannot match the predicate “rare_col = :n2”, but because the optimizer is looking at the statistics of complete index entries (and there are 1048576 of them, with 1155 distinct combinations, and none that are completely null) it has lost sight of the frequency of nulls for rare_col on its own. (The same problem appears with column groups – which is what I commented on in my previous post on this topic).

I’ve often said in the past that you shouldn’t create histograms on data unless your code is going to use them. In this case I need to stop the optimizer from looking at the index.distinct_keys and one way to do that is to create a histogram on one of the columns that defines the index; and I’ve chosen to do this with a fairly arbitrary size of 10 buckets:


execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns rare_col size 10')

explain plan for
select /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

========================================

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |     1 |    12 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

Bonus observation

This problem came to my attention (and I’ve used a partitioned table in my demonstration) because I had noticed an obvious optimizer error in the client’s execution plan for exactly this simple a query. I can demonstrate the effect the client saw by running the test again without creating the histogram but declaring hash_col to be not null. Immediately after creating the index I’m going to add the line:


alter table t1 modify hash_col not null;

(The client’s system didn’t declare the column not null, but their equivalent of hash_col was part of the primary key of the table which meant it was implicitly declared not null). Here’s what my execution plan looked like with this constraint in place:


--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |    28 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

Spot the difference.

The estimate of index rowids is far smaller than the estimate of the rows that will be fetched using those rowids. This is clearly an error.

If you’re wondering how Oracle got this number divide 908 by 32 (the number of partitions in the table) – the answer is 28.375.

Fortunately it’s (probably) an error that doesn’t matter despite looking worryingly wrong. Critically the division hasn’t changed the estimate of the number of table rows (we’ll ignore the fact that the estimate is wrong anyway thanks to a different error), and the costs of the index range scan and table access have not changed. The error is purely cosmetic in effect.

Interestingly if you modify the query to be index-only (i.e. you restrict the select list to columns in the index) this extra division disappears.
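Here’s a sketch of such an index-only variant (again my own reconstruction – the select list is restricted to the two columns that appear in the index):

explain plan for
select /*+ index(t1) */
        hash_col, rare_col
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);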

Summary

1) If you have a B-tree index where one (or more) of the columns is null for a large fraction of the entries then the optimizer may over-estimate the cardinality of a predicate of the form: “(list of all index columns) = (list of values)” as it will be using the index.distinct_keys in its calculations and ignore the effects of nulls in the individual columns. If you need to work around this issue then creating a histogram on one of the index columns will be sufficient to switch Oracle back to the strategy of multiplying the individual column selectivities.

2) There are cases of plans for accessing partitioned tables where Oracle starts by using table-level statistics to get a suitable set of estimates but then displays a plan with the estimate of rows for an index range scan scaled down by the number of partitions in the table. This results in a visible inconsistency between the index estimate and the table estimate, but it doesn’t affect the cardinality estimate for the table access or either of the associated costs – so it probably doesn’t have a destabilising effect on the plan.

November 15, 2018

num_index_keys

Filed under: 12c,Bugs,CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:13 pm BST Nov 15,2018

The title is the name of an Oracle hint that came into existence in Oracle 10.2.0.3 and made an appearance recently in a question on the rarely used “My Oracle Support” Community forum (you’ll need a MOS account to be able to read the original). I wouldn’t have found it but the author also emailed me the link asking if I could take a look at it.  (If you want to ask me for help – without paying me, that is – then posting a public question in the Oracle (ODC) General Database or SQL forums and emailing me a private link is the strategy most likely to get an answer, by the way.)

The question was about a very simple query using a straightforward index – with a quirky change of plan after upgrading from 10.2.0.3 to 12.2.0.1. Setting the optimizer_features_enable to ‘10.2.0.3’ in the 12.2.0.1 system re-introduced the 10g execution plan. Here’s the query:


SELECT t1.*
   FROM   DW1.t1
  WHERE   t1.C1 = '0001' 
    AND   t1.C2 IN ('P', 'F', 'C')
    AND   t1.C3 IN (
                    '18110034450001',
                    '18110034450101',
                    '18110034450201',
                    '18110034450301',
                    '18110034450401',
                    '18110034450501'
          );
 

Information supplied: t1 holds about 500 million rows at roughly 20 rows per block, the primary key index is (c1, c2, c3, c4), there are just a few values for each of c1, c2 and c4, while c3 is “nearly unique” (which, for clarity, was expanded to “the number of distinct values of c3 is virtually the same as the number of rows in the table”).

At the moment we don’t have any information about histograms and we don’t know whether or not “nearly unique” might still allow a few values of c3 to have a large number of duplicates, so that’s something we might want to follow up on later.

Here are the execution plans – the fast one (from 10g) first, then the slow (12c) plan – and you should look carefully at the predicate section of the two plans:


10g (pulled from memory with rowsource execution statistics enabled)
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   1 |  INLIST ITERATOR             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |     18 |      5 |      6 |00:00:00.01 |      58 |      5 |
|*  3 |    INDEX RANGE SCAN          | PK_T1            |     18 |      5 |      6 |00:00:00.01 |      52 |      4 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P')) AND (("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR "C3"='18110034450401' OR
              "C3"='18110034450501')))

 

12c (from explain plan)
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     1 |   359 |     7   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                  |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1               |     1 |   359 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | PK_T1            |     1 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND ("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P'))
       filter("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR
              "C3"='18110034450401' OR "C3"='18110034450501')
  

When comparing plans it’s better, of course, to present the same sources from the two systems; it’s not entirely helpful to have the generated plan from explain plan in one version and a run-time plan with stats in the other – given the choice I’d like to see the run-time plan from both. Despite this, I felt fairly confident that the prediction would match the run-time for 12c and that I could at least guess the “starts” figure for 12c.

The important thing to notice is the way that the access predicate in 10g has split into an access predicate followed by a filter predicate in 12c. So 12c is going to iterate three times (once for each of the values ‘C’, ‘F’, ‘P’) and then walk a potentially huge linked list of index leaf blocks looking for 6 values of c3, while 10g is going to probe the index 18 times (3 values of c2 x 6 values of c3) to find “nearly unique” rows, which means probably one leaf block per probe.

The 12c plan was taking minutes to run, the 10g plan was taking less than a second. The difference in execution time was probably the effect of the 12c plan ranging through (literally) thousands of index leaf blocks.

There are many bugs and anomalies relating to in-list iteration and index range scans and cardinality calculations – here’s a quick sample of v$system_fix_control in 12.2.0.1:


select optimizer_feature_enable ofe, sql_feature, bugno, description
from v$system_fix_control
where
	optimizer_feature_enable between '10.2.0.4' and '12.2.0.1'
and	(   sql_feature like '%CBO%'
	 or sql_feature like '%CARDINALITY%'
	)
and	(    lower(description) like '%list%'
	 or  lower(description) like '%iterat%'
	 or  lower(description) like '%multi%col%'
	)
order by optimizer_feature_enable, sql_feature, bugno
;

OFE        SQL_FEATURE                      BUGNO DESCRIPTION
---------- --------------------------- ---------- ----------------------------------------------------------------
10.2.0.4   QKSFM_CBO_5259048              5259048 undo unused inlist
           QKSFM_CBO_5634346              5634346 Relax equality operator restrictions for multicolumn inlists

10.2.0.5   QKSFM_CBO_7148689              7148689 Allow fix of bug 2218788 for in-list predicates

11.1.0.6   QKSFM_CBO_5139520              5139520 kkoDMcos: For PWJ on list dimension, use part/subpart bits

11.2.0.1   QKSFM_CBO_6818410              6818410 eliminate redundant inlist predicates

11.2.0.2   QKSFM_CBO_9069046              9069046 amend histogram column tracking for multicolumn stats

11.2.0.3   QKSFM_CARDINALITY_11876260    11876260 use index filter inlists with extended statistics
           QKSFM_CBO_10134677            10134677 No selectivity for transitive inlist predicate from equijoin
           QKSFM_CBO_11834739            11834739 adjust NDV for list partition key column after pruning
           QKSFM_CBO_11853331            11853331 amend index cost compare with inlists as filters
           QKSFM_CBO_12591120            12591120 check inlist out-of-range values with extended statistics

11.2.0.4   QKSFM_CARDINALITY_12828479    12828479 use dynamic sampling cardinality for multi-column join key check
           QKSFM_CARDINALITY_12864791    12864791 adjust for NULLs once for multiple inequalities on nullable colu
           QKSFM_CARDINALITY_13362020    13362020 fix selectivity for skip scan filter with multi column stats
           QKSFM_CARDINALITY_14723910    14723910 limit multi column group selectivity due to NDV of inlist column
           QKSFM_CARDINALITY_6873091      6873091 trim histograms based on in-list predicates
           QKSFM_CBO_13850256            13850256 correct estimates for transitive inlist predicate with equijoin

12.2.0.1   QKSFM_CARDINALITY_19847091    19847091 selectivity caching for inlists
           QKSFM_CARDINALITY_22533539    22533539 multi-column join sanity checks for table functions
           QKSFM_CARDINALITY_23019286    23019286 Fix cdn estimation with multi column stats on fixed data types
           QKSFM_CARDINALITY_23102649    23102649 correction to inlist element counting with constant expressions
           QKSFM_CBO_17973658            17973658 allow partition pruning due to multi-inlist iterator
           QKSFM_CBO_21057343            21057343 order predicate list
           QKSFM_CBO_22272439            22272439 correction to inlist element counting with bind variables

There are also a number of system parameters relating to inlists that are new (or have changed values) in 12.2.0.1 when compared with 10.2.0.3 – but I’m not going to go into those right now.

I was sufficiently curious about this anomaly that I emailed the OP to say I would be happy to take a look at the 10053 trace files for the query – the files probably weren’t going to be very large given that it was only a single table query – but in the end it turned out that I solved the problem before he’d had time to email them. (Warning – don’t email me a 10053 file on spec; if I want one I’ll ask for it.)

Based on the description I created an initial model of the problem – it took about 10 minutes to code:


rem     Tested on 12.2.0.1, 18.3.0.1

drop table t1 purge;

create table t1 (
	c1 varchar2(4) not null,
	c2 varchar2(1) not null,
	c3 varchar2(15) not null,
	c4 varchar2(4)  not null,
	v1 varchar2(250)
)
;

insert into t1
with g as (
	select rownum id 
	from dual
	connect by level <= 1e4 -- > hint to avoid wordpress format issue
)
select
	'0001',
	chr(65 + mod(rownum,11)),
	'18110034'||lpad(1+100*rownum,7,'0'),
	lpad(mod(rownum,9),4,'0'),
	rpad('x',250,'x')
from
	g,g
where
        rownum <= 1e5 -- > hint to avoid wordpress format issue
;


create unique index t1_i1 on t1(c1, c2, c3, c4);

begin
        dbms_stats.gather_table_stats(
                null,
                't1',
                method_opt => 'for all columns size 1'
        );
end;
/

alter session set statistics_level = all;
set serveroutput off

prompt	==========================
prompt	Default optimizer features
prompt	==========================

select
        /*+ optimizer_features_enable('12.2.0.1') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

select 
        /*+ optimizer_features_enable('10.2.0.3') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

alter session set statistics_level = all;
set serveroutput off

The two queries produced the same plan – regardless of the setting for optimizer_features_enable – it was the plan originally used by the OP’s 10g setting:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    20 (100)|      0 |00:00:00.01 |      35 |
|   1 |  INLIST ITERATOR             |       |      1 |        |            |      0 |00:00:00.01 |      35 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     18 |      2 |    20   (0)|      0 |00:00:00.01 |      35 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     18 |      2 |    19   (0)|      0 |00:00:00.01 |      35 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')) AND
              (("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501')))

There was one important difference between the 10g and the 12c plans – in 10g the cost of the table access (hence the cost of the total query) was 20; in 12c it jumped to 28 – maybe there’s a change in the arithmetic for costing the iterator, and maybe that’s sufficient to cause a problem.

Before going further it’s worth checking what the costs would look like (and, indeed, if the plan is possible in both versions) if we force Oracle into the “bad” plan. That’s where we finally get to the hint in the title of this piece. If I add the hint /*+ num_index_keys(t1 t1_i1 2) */ what’s going to happen? (Technically I’ve included a hint to use the index, and specified the query block name to make sure Oracle doesn’t decide to switch to a tablescan):


select
        /*+
            optimizer_features_enable('12.2.0.1')
            index_rs_asc(@sel$1 t1@sel$1 (t1.c1 t1.c2 t1.c3 t1.c4))
            num_index_keys(@sel$1 t1@sel$1 t1_i1 2)
        */
        t1.*
FROM        t1
WHERE
        t1.c1 = '0001'
AND        t1.c2 in ('H', 'F', 'C')
AND        t1.c3 in (
                '18110034450001',
                '18110034450101',
                '18110034450201',
                '18110034450301',
                '18110034450401',
                '18110034450501'
        )
;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   150 (100)|      0 |00:00:00.01 |     154 |      1 |
|   1 |  INLIST ITERATOR                     |       |      1 |        |            |      0 |00:00:00.01 |     154 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      3 |     18 |   150   (2)|      0 |00:00:00.01 |     154 |      1 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |      3 |     18 |   142   (3)|      0 |00:00:00.01 |     154 |      1 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')))
       filter(("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501'))

This was the plan from 12.2.0.1 – and again the plan for 10.2.0.3 was identical except for costs which became 140 for the index range scan and 141 for the table access. At first sight it looks like 10g may be using the total selectivity of the entire query as the scaling factor for the index clustering_factor to find the table cost while 12c uses the cost of accessing the table for one iteration (rounding up) before multiplying by the number of iterations.

Having observed this detail I thought I’d do a quick test of what happened by default if I requested 145 distinct values of c3. Both versions defaulted to the access/filter path rather than the pure access path – but again there was a difference in costs. The 10g index cost was 140 with a table access cost of 158, while 12c had an index cost of 179 and a table cost of 372. So both versions switch plans at some point – do they switch at the same point? Reader, I could not resist temptation, so I ran a test loop. With my data set the 12c version switched paths at 61 values in the in-list and 10g switched at 53 values.
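If you want to reproduce the experiment, here’s a sketch of the kind of loop I mean (a quick-and-dirty reconstruction, not the script I actually ran) – it grows the c3 in-list one value at a time, runs explain plan, and reports whether the index range scan acquires a filter predicate on c3. The c3 literals follow the pattern used to generate the model data above:

set serveroutput on

declare
        m_list  varchar2(32767);
        m_path  varchar2(32);
begin
        for i in 1 .. 150 loop
                -- append the next quoted c3 value to the in-list
                m_list := m_list ||
                          case when i > 1 then ',' end ||
                          '''18110034' || lpad(1 + 100 * i, 7, '0') || '''';

                execute immediate
                        'explain plan for select t1.* from t1 '        ||
                        'where t1.c1 = ''0001'' '                      ||
                        'and t1.c2 in (''H'', ''F'', ''C'') '          ||
                        'and t1.c3 in (' || m_list || ')';

                -- pure access path => no filter predicate on c3 at the range scan
                select  max(
                                case
                                        when filter_predicates like '%C3%'
                                        then 'access/filter'
                                        else 'pure access'
                                end
                        )
                into    m_path
                from    plan_table
                where   operation = 'INDEX'
                and     options   = 'RANGE SCAN';

                dbms_output.put_line(
                        lpad(i,4) || ' values in list: ' ||
                        nvl(m_path, 'no index range scan')
                );

                delete from plan_table;
        end loop;
end;
/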

Conclusion: there’s been a change in the selectivity calculations for the use of in-list iterators, which leads to a change in costs, which can lead to a change in plans; the OP was just unlucky with his data set and stats. Possibly there’s something about his data or stats that makes the switch appear with a much smaller in-list than mine.

Footnote:

When I responded to the thread on MOSC with the suggestion that the problem was in part due to statistics and might be affected by out of date stats (or a histogram on the (low-frequency) c2 column) the OP noted that stats hadn’t been gathered since some time in August – and found that the 12c path changed to the efficient (10g) one after re-gathering stats on the table.

 

November 8, 2018

Where / Having

Filed under: CBO,Conditional SQL,Execution plans,Oracle — Jonathan Lewis @ 12:11 pm BST Nov 8,2018

There’s a very old mantra about the use of the “having” clause that tells us that if it’s valid (i.e. will always give the same results) then any predicate that could be moved from the having clause to the where clause should be moved. In recent versions of Oracle the optimizer will do this for itself in some cases but (for reasons that I’m not going to mention) I came across a silly example recently where a little manual editing produced a massive performance improvement.

Here’s a quick demo:


rem
rem     Script:         where_having.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

create table t1
as
select * 
from all_objects 
where rownum <= 50000   -- > comment to avoid WordPress format issue
;

spool where_having.lst

set serveroutput off

select /*+ gather_plan_statistics */ 
        object_type, count(*) 
from    t1 
group by 
        object_type 
having  count(*) > 0 
and     1 = 2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
;

The big question is: will Oracle do a full tablescan of t1, or will it apply a “null is not null” filter early to bypass that part of the plan? Here’s the plan pulled from memory, with run-time statistics (all versions from 11g to 18c):


--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      0 |00:00:00.02 |     957 |    955 |       |       |          |
|*  1 |  FILTER             |      |      1 |        |      0 |00:00:00.02 |     957 |    955 |       |       |          |
|   2 |   HASH GROUP BY     |      |      1 |      1 |     27 |00:00:00.02 |     957 |    955 |  1186K|  1186K| 1397K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |  50000 |  50000 |00:00:00.01 |     957 |    955 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((COUNT(*)>0 AND 1=2))


As you can see, the filter at operation 1 includes the contradiction “1=2”, but Oracle tests this only after doing the full tablescan and aggregation. If you move the “1=2” into the where clause the tablescan doesn’t happen.
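For completeness, here’s a sketch of the manually edited version (the contradiction moved into the where clause) – with this form the plan should show a “NULL IS NOT NULL” filter above the tablescan and the tablescan should report zero starts:

select /*+ gather_plan_statistics */
        object_type, count(*)
from    t1
where   1 = 2
group by
        object_type
having  count(*) > 0
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
;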

Interestingly, if you write the query with an in-line view and trailing where clause:


select /*+ gather_plan_statistics */
        *
from    (
        select
                object_type, count(*)
        from    t1
        group by
                object_type
        having  count(*) > 0
        )
where
        1 = 2
;

The optimizer is clever enough to push the final predicate inside the view (where you might expect it to become part of the having clause) and push it all the way down into a where clause on the base table.


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      0 |00:00:00.01 |
|*  1 |  FILTER              |      |      1 |        |      0 |00:00:00.01 |
|   2 |   HASH GROUP BY      |      |      1 |      1 |      0 |00:00:00.01 |
|*  3 |    FILTER            |      |      1 |        |      0 |00:00:00.01 |
|   4 |     TABLE ACCESS FULL| T1   |      0 |  50000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(COUNT(*)>0)
   3 - filter(NULL IS NOT NULL)



A quirky case of the optimizer handling the (apparently) more complex query better than it handles the simpler one.
