Oracle Scratchpad

October 4, 2019

opt_estimate catalogue

Filed under: CBO,Hints,Oracle — Jonathan Lewis @ 10:10 am BST Oct 4,2019

This is just a list of the notes I’ve written about the opt_estimate() hint.

  • opt_estimate – using the hint to affect index calculations: index_scan and index_filter
  • opt_estimate 2 – applying the hint to nested loop joins, options: nlj_index_scan and nlj_index_filter
  • opt_estimate 3 – a couple of little-known options for the hint, “group_by” and “having”.
  • opt_estimate 4 – applying the hint at the query block level: particularly useful for CTEs (“with subquery”) and non-mergeable views.
  • opt_estimate 5 – a story of failure: trying to use opt_estimate to push predicates into a union all view.

I have a couple more drafts on the topic awaiting completion, but if you know of any other articles that would be a good addition to the list feel free to reference them in the comments.

 

July 12, 2019

opt_estimate 5

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 10:28 am BST Jul 12,2019

If you’ve been wondering why I resurrected my drafts on the opt_estimate() hint, a few weeks ago I received an email containing an example of a query where a couple of opt_estimate() hints were simply not working. The critical features of the example was that the basic structure of the query was of a type that I had not previously examined. That’s actually a common type of problem when trying to investigate any Oracle feature from cold – you can spend days thinking about all the possible scenarios you should model then the first time you need to do apply your knowledge to a production system the requirement falls outside every model you’ve examined.

Before you go any further reading this note, though, I should warn you that it ends in frustration because I didn’t find a solution to the problem I wanted to fix – possibly because there just isn’t a solution, possibly because I didn’t look hard enough.

So here’s a simplified version of the problem – it involves pushing a predicate into a union all view. First some data and a baseline query:

rem
rem     Script:         opt_estimate_3a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem

create table t1
as
select
        rownum                          id,
        100 * trunc(rownum/100)-1       id2,
        mod(rownum,1e3)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        dual
connect by
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

create table t2a pctfree 75 as select * from t1;
create table t2b pctfree 75 as select * from t1;

create index t2ai on t2a(id);
create index t2bi on t2b(id);

explain plan for
select
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select 'a' flag, t2a.* from t2a
         union all
         select 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
/

select * from table(dbms_xplan.display(null,null,'outline alias'))
/


There is one row with t1.id = 99, and I would like the optimizer to use an indexed access path to select the one matching row from each of the two tables in the union all view. The smart execution plan would be a nested loop using a “pushed join predicate” – and that’s exactly what we get by default with this data set:


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     1 |    15 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     1 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     1 |    15 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1        / T1@SEL$1
   3 - SET$5715CE2E / T2@SEL$1
   4 - SET$5715CE2E
   5 - SEL$639F1A6F / T2A@SEL$2
   6 - SEL$639F1A6F / T2A@SEL$2
   7 - SEL$B01C6807 / T2B@SEL$3
   8 - SEL$B01C6807 / T2B@SEL$3

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
      INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
      INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
      USE_NL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SET$1")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
      OUTLINE_LEAF(@"SET$5715CE2E")
      OUTLINE_LEAF(@"SEL$B01C6807")
      OUTLINE_LEAF(@"SEL$639F1A6F")
      ALL_ROWS
      OPT_PARAM('_nlj_batching_enabled' 0)
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"=99)
   6 - access("T2A"."ID"="T1"."N1")
   8 - access("T2B"."ID"="T1"."N1")

So that worked well – operation 2 predicts one row for the tablescan of t1, with a nested loop join and union all pushed predicate where an index range scan of t2a_i1 and t2b_i1 gives us one row from each table. The “Predicate Information” tells us that the t1.n1 join predicate has been pushed inside the view to both subqueries so we see “t2a.id = t1.n1”, and “t2b.id = t1.n1”.

So what if I want to tell Oracle that it will actually find 5 rows in the t2a range scan and table access and 7 rows in the t2b range scan and table access (perhaps in a more complex view that would persuade Oracle to use two different indexes to get into the view and change the join order and access method for the next few tables it accessed). Since I’ve recently just written about the nlj_index_scan option for opt_estimate() you might think that this is the one we need to use – perhaps something like:


opt_estimate(@sel$639f1a6f nlj_index_scan, t2a@sel$2 (t1), t2a_i1, scale_rows=5)
opt_estimate(@sel$b01c6807 nlj_index_scan, t2b@sel$3 (t1), t2b_i1, scale_rows=7)

You’ll notice I’ve been very careful to find the fully qualified aliases for t2a and t2b by looking at the “Query Block Name / Object Alias” section of the plan (if the view appeared as a result of Oracle using Concatenation or OR-Expansion you would find that you got two query block names that looked similar but had suffixes of “_1” and “_2”). But it wasn’t worth the effort, it didn’t work. Fiddling around with all the possible variations I could think of didn’t help (maybe I should have used set$5715ce2e as the query block target for both the hints – no; what if I …)

Of course if we look at the “Outline Data” we’d notice that the use_nl() hint in the outline says: “USE_NL(@SEL$1 T2@SEL$1)”, so we don’t have a nested loop into t2a and t2b, we have a nested loop into the  view t2. So I decided to forget the nested loop idea and just go for the indexes (and the tables, when I got to them) with the following hints (you’ll notice that during the course of my experiments I added my own query block names to the initial query blocks – so the generated query block names have changed):



explain plan for
select
        /*+
                qb_name(main)
                opt_estimate(@sel$f2bf1101, index_scan, t2a@subq_a, t2ai, scale_rows=5)
                opt_estimate(@sel$f2bf1101, table,      t2a@subq_a,       scale_rows=5)
                opt_estimate(@sel$f4e7a233, index_scan, t2b@subq_b, t2bi, scale_rows=7)
                opt_estimate(@sel$f4e7a233, table,      t2b@subq_b,       scale_rows=7)
        */
        t1.v1,
        t2.flag,
        t2.v1
from
        t1,
        (select /*+ qb_name(subq_a) */ 'a' flag, t2a.* from t2a
         union all
         select /*+ qb_name(subq_b) */ 'b', t2b.* from t2b
        )       t2
where
        t2.id = t1.n1
and     t1.id = 99
;

select * from table(dbms_xplan.display(null,null,'outline alias'));


-----------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     2 |    96 |    30   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                          |      |     2 |    96 |    30   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T1   |     1 |    19 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                 |      |     1 |    29 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE          |      |       |       |            |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2A  |     5 |    75 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | T2AI |     5 |       |     1   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2B  |     7 |   105 |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T2BI |     7 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------


Excellent – we get the cardinalities we want to see for the tables – except the view operator doesn’t hold the sum of the table cardinalities, and the join doesn’t multiply up the estimates either. I couldn’t find a way of getting the view to show 12 rows (not even with a guessed – but presumably unimplemented – opt_estimate(view …) hint!), however during the course of my experiments I tried the hint: “opt_estimate(@main, table, t2@main, scale_rows=15)”. This didn’t have any visible effect in the plan but while searching through the 10053 trace file I found the following lines:

Table Stats::
  Table: from$_subquery$_002  Alias: T2  (NOT ANALYZED)
  #Rows: 20000  SSZ: 0  LGR: 0  #Blks:  37  AvgRowLen:  15.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 9

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 20000.000000 to 300000.000000 due to opt_estimate hint

Access path analysis for from$_subquery$_002
    >> Single Tab Card adjusted from 12.000000 to 180.000000 due to opt_estimate hint

So at some point in the code path the optimizer is aware that 5 + 7 = 12, and that 12 * 15 = 180. But this doesn’t show up in the final execution plan. You might notice, by the way, that the scale_rows=15 has been applied NOT ONLY to the place where I was aiming – it’s also been applied to scale up the 20,000 rows that are estimated to be in the union all to 300,000 as the estimate for a tablescan of the two tables.

Possibly if I spent more time working through the 10053 trace file (which, as I’ve said before, I try to avoid doing) I might have found exactly which code path Oracle followed to get to the plan it produced and managed to tweak some hints to get the numbers I wanted to see. Possibly the optimizer was already following the code path that actually produced the numbers I wanted, then “forgot” to use them. One day, perhaps, I’ll tale another look at the problem – but since I wasn’t trying to solve a problem for a client (and given that there was an alternative workaround) I closed the 10053 trace file and put the model aside for a rainy day.

Footnote

One thought did cross my mind as a way of finding out if there was a real solution – and I offer this for anyone who wants to play: create a second data set that genuinely produces the 5 and 7 I want to see (and, check that the view reports the sum of the two components); then run the original query against the original data so that you’ve got the execution plan in memory, overwrite the original data with the new data set (without changing the statistics on the orginal). Then use the SQL Tuning Advisor to see if it produces a SQL profile for the captured SQL_ID that reproduces the correct plan for the second data set and check what opt_estimate() hints it uses.  (Warning – this may turn into a frustrating waste of time.)

Update Oct 2019

I’ve been saying for years that I don’t like the trick of pulling the Outline Information from an execution plan in memory and storing it in the database as an SQL Profile because that’s effectively storing an SQL Plan Baseline as an SQL Profile and there might be subtle and (potentially) misleading side effects of abusing the two mechanisms. Behind the argument I’ve also made the observation that while both mechamisms store hints, the hints for an SQL Profile are about statistics and the hints for an SQL Plan Baseline are about transformations, joins, and other mechanis.

However .;.

I’ve now down the test I described in the foot note above – created a table with data in it that made Oracle choose full tablescans for the t2a and t2b tables, then changed the data (without changing the object statistic) and run the SQL Tuning tool to see if the optimizer would suggest the plan I wanted and offer a profile to produce it.

I was successful – Oracle offered the profile, and when I looked at it (before accepting it) it looked like this:


         1 OPT_ESTIMATE(@"SEL$1", TABLE, "T2"@"SEL$1", SCALE_ROWS=200)
         1 OPT_ESTIMATE(@"SEL$1", JOIN, ("T2"@"SEL$1", "T1"@"SEL$1"), SCALE_ROWS=15)
         1 OPTIMIZER_FEATURES_ENABLE(default)
         1 IGNORE_OPTIM_EMBEDDED_HINTS

But when I accepted it and looked at it again it looked like this:


        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$639F1A6F" "T2A"@"SEL$2")
        IGNORE_OPTIM_EMBEDDED_HINTS
        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B01C6807" "T2B"@"SEL$3")
        INDEX_RS_ASC(@"SEL$B01C6807" "T2B"@"SEL$3" ("T2B"."ID"))
        USE_NL(@"SEL$1" "T2"@"SEL$1")
        LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
        NO_ACCESS(@"SEL$1" "T2"@"SEL$1")
        FULL(@"SEL$1" "T1"@"SEL$1")
        OUTLINE(@"SET$1")
        OUTLINE(@"SEL$3")
        OUTLINE(@"SEL$2")
        OUTLINE_LEAF(@"SEL$1")
        PUSH_PRED(@"SEL$1" "T2"@"SEL$1" 1)
        OUTLINE_LEAF(@"SET$5715CE2E")
        OUTLINE_LEAF(@"SEL$B01C6807")
        OUTLINE_LEAF(@"SEL$639F1A6F")
        ALL_ROWS
        DB_VERSION('19.1.0')
        OPTIMIZER_FEATURES_ENABLE('19.1.0')
        INDEX_RS_ASC(@"SEL$639F1A6F" "T2A"@"SEL$2" ("T2A"."ID"))

In other words, Oracle has recorded something that looks like an SQL Plan Baseline and called it an SQL Profile.

July 1, 2019

opt_estimate 4

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:18 pm BST Jul 1,2019

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specify query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given an query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case where the hint is inside the query block it applies to the “@mat_cte” isn’t necessary).

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint and pointed out that there were case where you might also want to include the query_block option as well. Here’s a final example query showing the effect, with the scale_rows feature after creating a table t2 as a copy of t1 but setting pctfree 75 (to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour but to be safe in a production environment that’s a hint that I should have included.

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat – hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and thanks to a Freudian slip while typing: “not sage”. You probably don’t know how do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I did on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values  is 14 (roughly half the actual in may case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan and that makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complext view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard, you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.

 

June 28, 2019

opt_estimate 3

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 1:12 pm BST Jun 28,2019

This is just a quick note to throw out a couple of of the lesser-known options for the opt_estimate() hint – and they may be variants that are likely to be most useful since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
)
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t2 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by options I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it – with autotrace enabled here are three queries, first the unhinted baseline (which uses the standard 5% on my having clause) then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4  (with means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3 with the nett effect of producing a cardinality of 6. Here are the plans.

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of line for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

June 25, 2019

opt_estimate 2

Filed under: CBO,Execution plans,Hints,Oracle,Statistics — Jonathan Lewis @ 8:22 pm BST Jun 25,2019

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t change the arithmetic the optimizer uses, but it does mean the presentation of the nested loop goes back to the original pre-9i form which makes it a little easier to see costs and cardinalities adding and multiplying their way through the plan. I do not do this in production systems.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2. The optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15 * 15 = 255 and the cost comes to: 2 (t1 cost) + (15 (t1 rows) * 16 (t2 unit cost)) = 242.

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name   nlj_index_scan,   target_table_alias   (list of possible driving tables),   target_index,   numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two-columns before it can be used.

The things to note in this plan are: the index range scan at operation 5 has now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access is dependent on the index selectivity the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn’t dropped so now it’s not consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is: 2 (t1 cost) + (15 (t1 rows) * 2 (t2 unit cost)).

Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale the estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide to follow the path we expected and that factor of 0.2 will be applied whenever t2 is accessed. Notice in this plan that the join cardinality at operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side not: If you have access to MoS you’ll find that Doc ID: 2402821.1 “How To Use Optimizer Hints To Specify Cardinality For Join Operation”, seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins (where “multi” can be greater than 2).

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints and remember we’vs disabled “nlj batching“).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

WIth a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in all the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_ind_scan fix-up as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

June 3, 2019

Ignoring Hints

Filed under: 18c,Hints,Ignoring Hints,Oracle — Jonathan Lewis @ 8:36 am BST Jun 3,2019

One of the small changes (and, potentially big but temporary, threats) in 18.3 is the status of the “ignore hints” parameter. It ceases to be a hidden (underscore) parameter so you can now officially set parameter optimizer_ignore_hints to true in the parameter file, or at the system level, or at the session level. The threat, of course, it that some of your code may use the hidden version of the parameter (perhaps in an SQL_Patch as an opt_param() option rather than in its hint form) which no longer works after the upgrade.

But there’s more. The parameter (whether the old hidden version or the new revealed version) doesn’t make the optimizer ignore parallel() hints. But 18.3 now has a related parameter optimizer_ignore_parallel_hints to address this limitation. Here’s a quick demo – we start by creating a table and then running a query where the full tablescan is clearly the default strategy that the optimizer would take if we didn’t hint an indexed access path:

rem
rem     Script:         ignore_parallel_hints.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem 

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,10)                  n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(id);

set serveroutput off

prompt  =============
prompt  Baseline test
prompt  =============

select
        /*+ index(t1) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);


SQL_ID  gudnnk7j7q5bz, child number 0
-------------------------------------
select  /*+ index(t1) */  n1, sum(id) from  t1 where  id > 0 group by
n1 order by  n1

Plan hash value: 356059923

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |   198 (100)|          |
|   1 |  SORT GROUP BY                       |       |    10 |    70 |   198   (2)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    | 10000 | 70000 |   196   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 | 10000 |       |    22   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">0)


Now we repeat the exercise with the version-specific “alter session” command below – and you should try each option with each version of Oracle if you want to do the complete test cycle – to see that the session will ignore hints and the plan will change (side note – using the underscore version  with 18.3 doesn’t raise an error, the statement is silently ignored):


alter session set "_optimizer_ignore_hints" = true;
alter session set "optimizer_ignore_hints" = true;

SQL_ID  gudnnk7j7q5bz, child number 1
-------------------------------------
select  /*+ index(t1) */  n1, sum(id) from  t1 where  id > 0 group by
n1 order by  n1

Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    27 (100)|          |
|   1 |  SORT GROUP BY     |      |    10 |    70 |    27  (12)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   | 10000 | 70000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID">0)

Then, of course, we have to test a query with a parallel() hint – or shared() hint, which is the internal equivalent you will notice occasionally in outlines or the “remote” statement for distributed execution plans – to show that we don’t yet ignore parallel queries – the plans following the code are from 18.3:


select
        /*+ parallel(t1 3) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);

alter session set "optimizer_ignore_parallel_hints" = true;

select
        /*+ parallel(t1 3) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);




SQL_ID  7jynurdtc48kv, child number 0
-------------------------------------
select  /*+ parallel(t1 3) */  n1, sum(id) from  t1 where  id > 0 group
by  n1 order by  n1

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |       |    10 (100)|          |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT GROUP BY         |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE           |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE       | :TQ10000 |    10 |    70 |    10  (10)| 00:00:01 |  Q1,00 | P->P | RANGE      |
|   6 |       HASH GROUP BY      |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR |          | 10000 | 70000 |     9   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  8 |         TABLE ACCESS FULL| T1       | 10000 | 70000 |     9   (0)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access(:Z>=:Z AND :Z<=:Z) -- > comment added to avoid wordpress format issue
       filter("ID">0)

Note
-----
   - Degree of Parallelism is 3 because of table property


Session altered.


SQL_ID  7jynurdtc48kv, child number 1
-------------------------------------
select  /*+ parallel(t1 3) */  n1, sum(id) from  t1 where  id > 0 group
by  n1 order by  n1

Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    27 (100)|          |
|   1 |  SORT GROUP BY     |      |    10 |    70 |    27  (12)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   | 10000 | 70000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID">0)


tl;dr

In 18.3 Oracle exposes the parameter optimizer_ignore_hints – any code using depending on the hidden version of this parameter will no longer behave as expected. 18.3 also introduces optimiser_ignore_parallel_hints to allow you to ignore parallel hints as well.

 

May 22, 2019

Danger – Hints

Filed under: 12c,Bugs,Hints,Oracle — Jonathan Lewis @ 2:56 pm BST May 22,2019

It shouldn’t be possible to get the wrong results by using a hint – but hints are dangerous and the threat may be there if you don’t know exactly what a hint is supposed to do (and don’t check very carefully what has happened when you’ve used one that you’re not familiar with).

This post was inspired by a blog note from Connor McDonald titled “Being Generous to the Optimizer”. In his note Connor gives an example where the use of “flexible” SQL results in an execution plan that is always expensive to run when a more complex version of the query could produce a “conditional” plan which could be efficient some of the time and would be expensive only when there was no alternative. In his example he rewrote the first query below to produce the second query:


select data
from   address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select  data
from    address
where   ( :choice = 1 and street = :val )
union all
select  data
from    address
where   ( :choice = 2 and suburb = :val );

(We are assuming that bind variable :choice is constrained to be 1 or 2 and no other value.)

In its initial form the optimizer had to choose a tablescan for the query, in its final form the query can select which half of a UNION ALL plan to execute because the optimizer inserts a pair of FILTER operations that check the actual value of :choice at run-time.

When I started reading the example my first thought was to wonder why the optimizer hadn’t simply used “OR-expansion” (or concatenation if you’re running an older version), then I remembered that by the time the optimizer really gets going it has forgotten that “:choice” is the same bind variable in both cases, so doesn’t realise that it would use only one of two possible predicates. However, that doesn’t mean you can’t tell the optimizer to use concatenation. Here’s a model – modified slightly from Connor’s original:


drop table address purge;
create table address ( street int, suburb int, post_code int,  data char(100));

insert into address
select mod(rownum,1e4), mod(rownum,10), mod(rownum,1e2), rownum
from dual connect by level  <= 1e5 -- > comment to avoid WordPress format issue
;

commit;

exec dbms_stats.gather_table_stats('','ADDRESS')

create index ix1 on address ( street );
create index ix2 on address ( suburb );
create index ix3 on address ( post_code );

variable val number = 6
variable choice number = 1

alter session set statistics_level = all;
set serveroutput off
set linesize 180
set pagesize 60

select
        /*+ or_expand(@sel$1) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

I’ve added one more column to the table and indexed it – I’ll explain why later. I’ve also modified the query to show the output but restricted the result set to a count of the data column rather than a (long) list of rows.

Here’s the execution plan output when hinted:


SQL_ID  6zsh2w6d9mddy, child number 0
-------------------------------------
select  /*+ or_expand(@sel$1) */  count(data) from    address where  (
:choice = 1 and street = :val ) or     ( :choice = 2 and suburb = :val )

Plan hash value: 3986461375

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |      12 |     27 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |      12 |     27 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |     10 |00:00:00.01 |      12 |     27 |
|   3 |    UNION-ALL                            |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|*  4 |     FILTER                              |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      1 |     10 |     10 |00:00:00.01 |      12 |     27 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      1 |     10 |     10 |00:00:00.01 |       2 |     27 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |      0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

As you can see we have a UNION ALL plan with two FILTER operations, and the filter operations allow one or other of the two branches of the UNION ALL to execute depending on the value for :choice. Since I’ve reported the rowsource execution statistics you can also see that the table access through index range scan (operations 5 and 6) has executed once (Starts = 1) but the tablescan (operation 8) has not been executed at all.

If you check the Predicate Information you will see that operation 8 has introduced two lnnvl() predicates. Since the optimizer has lost sight of the fact that :choice is the same variable in both cases it has to assume that sometimes both branches will be relevant for a single execution, so it has to add predicates to the second branch to eliminate data that might have been found in the first branch. This is the (small) penalty we pay for avoiding a “fully-informed” manual rewrite.

Take a look at the Outline Data – we can see our or_expand() hint repeated there, and we can discover that it’s been enhanced. The hint should have been or_expand(@sel$1 (1) (2)). This might prompt you to modify the original SQL to use the fully qualified hint rather than the bare-bones form we’ve got so far. So let’s assume we do that before shipping the code to production.

Now imagine that a couple of months later an enhancement request appears to allow queries on post_code and the front-end has been set up so that we can specify a post_code query by selecting choice number 3. The developer who happens to pick up the change request duly modifies the SQL as follows:


select
        /*+ or_expand(@sel$1 (1) (2)) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
or     ( :choice = 3 and post_code = :val)
;

Note that we’ve got the “complete” hint in place, but there’s now a 3rd predicate. Do you think the hint is still complete ? What do you think will happen when we run the query ? Here’s the execution plan when I set :choice to 3.


select  /*+ or_expand(@sel$1 (1) (2)) */  count(data) from    address
where  ( :choice = 1 and street = :val ) or     ( :choice = 2 and
suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 3986461375

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |      0 |00:00:00.01 |
|   3 |    UNION-ALL                            |                 |      1 |        |      0 |00:00:00.01 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

We get a UNION ALL with two branches, one for :choice = 1, one for :choice = 2 and both of them show zero starts – and we don’t have any part of the plan to handle :choice = 3. The query returns no rows – and if you check the table creation code you’ll see it should have returned 1000 rows. An incorrect (historically adequate) hint has given us wrong results.

If we want the full hint for this new queryy we need to specify the 3rd predicate, by adding (3) to the existing hint to get the following plan (and correct results):


select  /*+ or_expand(@sel$1 (1) (2) (3)) */  count(data) from
address where  ( :choice = 1 and street = :val ) or     ( :choice = 2
and suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 2153173029

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |    1639 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |    1639 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  11009 |   1000 |00:00:00.01 |    1639 |
|   3 |    UNION-ALL                            |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |
|*  9 |     FILTER                              |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|* 10 |      TABLE ACCESS FULL                  | ADDRESS         |      1 |    999 |   1000 |00:00:00.01 |    1639 |
---------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$49E1C21B_3")
      OUTLINE_LEAF(@"SET$49E1C21B_2")
      OUTLINE_LEAF(@"SET$49E1C21B_1")
      OUTLINE_LEAF(@"SET$49E1C21B")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1")
      FULL(@"SET$49E1C21B_2" "ADDRESS"@"SET$49E1C21B_2")
      FULL(@"SET$49E1C21B_3" "ADDRESS"@"SET$49E1C21B_3")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))
   9 - filter(:CHOICE=3)
  10 - filter(("POST_CODE"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL)) AND (LNNVL(:CHOICE=2) OR
              LNNVL("SUBURB"=:VAL))))


We now have three branches to the UNION ALL, and the final branch (:choice =3) ran to show A-rows = 1000 selected in the tablescan.

Conclusion

You shouldn’t mess about with hints unless you’re very confident that you know how they work and then test extremely carefully – especially if you’re modifying old code that already contains some hints.

 

March 8, 2019

Append hint

Filed under: Hints,Ignoring Hints,Infrastructure,Oracle — Jonathan Lewis @ 1:13 pm BST Mar 8,2019

One of the questions that came up on the CBO Panel Session at the UKOUG Tech2018 conference was about the /*+ append */ hint – specifically how to make sure it was ignored when it came from a 3rd party tool that was used to load data into the database. The presence of the hint resulted in increasing amounts of space in the table being “lost” as older data was deleted by the application which then didn’t reuse the space the inserts always went above the table’s highwater mark; and it wasn’t possible to change the application code.

The first suggestion aired was to create an SQL Patch to associate the hint /*+ ignore_optim_embedded_hints */ with the SQL in the hope that this would make Oracle ignore the append hint. This won’t work, of course, because the append hint is not an optimizer hint, it’s a “behaviour” hint.

There are, however, various things that will make the append hint invalid – sometimes to the great surprise of the people using it. The three things I can think of at present are:

  • row-level triggers
  • enabled foreign key constraints
  • non-unique indexes enforcing unique constraints

It’s been some time since I last ran a detailed check so I’m not going to guarantee that the following claims are still true – if it matters to you then it’s easy enough to create a little test of (say) 10,000 rows inserted into a non-empty, indexed table.

  • Row level triggers turn array processing into single row processing, so there’s a significant increase in redo generation.
  • Non-unique indexes enforcing unique constraints have (small but) potentially undesirable effects on the optimizer and on run-time and may turn array processing into single row processing.
  • Foreign key constraint require checking which may have some impact, but doesn’t turn array processing into single row processing.

Of the three options the foreign key constraint seemed to me to be the best strategy to disable the hint with minimal side effects, so my answer was:

“Create a new table with no data and a primary key constraint; add an invisible column to the target table, index it (online) and add a foreign key constraint from the column to the new table”.

My thinking on this was that the foreign key will disable the append hint. The column will always be null which means it will always satisfy the foreign key constraint without being checked and it won’t introduce any index maintenance overheads. Net effect: /*+ append */ is disabled with no extra overheads.

Footnote

The append hint is also ignored if the table is an Index Organized Table (IOT), but changing a heap table to an IOT is probably not a sensible choice if all you want to do is disable the hint – the potential for unexpected client side anomalies is too high, and if the table is heavily indexed the processing overhead for the secondary indexes could be fairly significant.

If I recall correctly the person asking the question said that the “do nothing” trigger option sounded like something they would be most comfortable with and they’d live with the overheads. I understand the feeling – an invisible column with an invisible index and extra foreign key constraint sounds too close to the boundary where mixing and matching simple features ends up hitting some unexpected behaviour (i.e. a bug).

Update 14th March 2019

A note on tweeter has reminded me that distributed transactions introduce another limitation.  Inserting across a database link will work only if the insert is into a local table; the first insert below will append, the second will not.


insert /*+ append */ into local_table select * from table@remote_location;

insert /*+ append */ into table@remote_location select * from local_table;

There are a couple more limitations given in the comments – disable table locking, and adding a security policy (VPD / RLS / FGAC / OLS).

January 17, 2019

Hint Reports

Filed under: dbms_xplan,Execution plans,Hints,Oracle — Jonathan Lewis @ 9:59 am BST Jan 17,2019

Nigel Bayliss has posted a note about a frequently requested feature that has now appeared in Oracle 19c – a mechanism to help people understand what has happened to their hints.  It’s very easy to use, it’s just another format option to the “display_xxx()” calls in dbms_xplan; so I thought I’d run up a little demonstration (using an example I first generated 18 years and 11 versions ago) to make three points: first, to show the sort of report you get, second to show you that the report may tell you what has happened, but that doesn’t necessarily tell you why it has happened, and third to remind you that you should have stopped using the /*+ ordered */ hint 18 years ago.

I’ve run the following code on livesql:


rem
rem     Script:         c_ignorehint.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2001
rem


drop table ignore_1;
drop table ignore_2;

create table ignore_1
nologging
as
select
        rownum          id,
        rownum          val,
        rpad('x',500)   padding
from    all_objects
where   rownum <= 3000
;

create table ignore_2
nologging
as
select
        rownum          id,
        rownum          val,
        rpad('x',500)   padding
from    all_objects
where   rownum <= 500
;

alter table ignore_2
add constraint ig2_pk primary key (id);


explain plan for
update
        (
                select
                        /*+
                                ordered
                                use_nl(i2)
                                index(i2,ig2_pk)
                        */
                        i1.val  val1,
                        i2.val  val2
                from
                        ignore_1        i1,
                        ignore_2        i2
                where
                        i2.id = i1.id
                and     i1.val <= 10
        )
set     val1 = val2
;

select * from table(dbms_xplan.display(null,null,'hint_report'));

explain plan for
update
        (
                select
                        /*+
                                use_nl(i2)
                                index(i2,ig2_pk)
                        */
                        i1.val  val1,
                        i2.val  val2
                from
                        ignore_1        i1,
                        ignore_2        i2
                where
                        i2.id = i1.id
                and     i1.val <= 10
        )
set     val1 = val2
;

select * from table(dbms_xplan.display(null,null,'hint_report'));

As you can see I’ve simply added the format option “hint_report” to the call to dbms_xplan.display(). Before showing you the output I’ll just say a few words about the plans we might expect from the two versions of the update statement.

Given the /*+ ordered */ hint in the first statement we might expect Oracle to do a full tablescan of ignore_1 then do a nested loop into ignore_2 (obeying the use_nl() hint) using the (hinted) ig2_pk index. In the second version of the statement, and in the absence of the ordered hint, it’s possible that the optimizer will still use the same path but, in principle, it might find some other path.

So what do we get ? In order here are the two execution plans:


Plan hash value: 3679612214
 
--------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                      |          |    10 |   160 |   111   (0)| 00:00:01 |
|   1 |  UPDATE                               | IGNORE_1 |       |       |            |          |
|*  2 |   HASH JOIN                           |          |    10 |   160 |   111   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| IGNORE_2 |   500 |  4000 |    37   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN                   | IG2_PK   |   500 |       |     1   (0)| 00:00:01 |
|*  5 |    TABLE ACCESS STORAGE FULL          | IGNORE_1 |    10 |    80 |    74   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("I2"."ID"="I1"."ID")
   5 - storage("I1"."VAL"<=10)
       filter("I1"."VAL"<=10)
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$DA9F4B51
           -  ordered
 
   3 -  SEL$DA9F4B51 / I2@SEL$1
         U -  use_nl(i2)
           -  index(i2,ig2_pk)




Plan hash value: 1232653668
 
------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |          |    10 |   160 |    76   (0)| 00:00:01 |
|   1 |  UPDATE                       | IGNORE_1 |       |       |            |          |
|   2 |   NESTED LOOPS                |          |    10 |   160 |    76   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |          |    10 |   160 |    76   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL | IGNORE_1 |    10 |    80 |    74   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | IG2_PK   |     1 |       |     0   (0)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID| IGNORE_2 |     1 |     8 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - storage("I1"."VAL"<=10)
       filter("I1"."VAL"<=10)
   5 - access("I2"."ID"="I1"."ID")
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   5 -  SEL$DA9F4B51 / I2@SEL$1
           -  index(i2,ig2_pk)
           -  use_nl(i2)

As you can see, the “Hint Report” shows us how many hints have been seen in the SQL text, then the body of the report shows us which query block, operation and table (where relevant) each hint has been associated with, and whether it has been used or not.

The second query has followed exactly the plan I predicted for the first query and the report has shown us that Oracle noted, and used, the use_nl() and index() hints to access table ignore2, deciding for itself to visit the tables in the order ignore_1 -> ignore_2, and doing a full tablescan on ignore_1.

The first query reports three hints, but flags the use_nl() hint as unused. (There is (at least) one other flag that could appear against a hint – “E” for error (probably syntax error), so we can assume that this hint is not being ignored because there’s something wrong with it.) Strangely the report tells us that the optimizer has used the ordered hint but we can see from the plan that the tables appear to be in the opposite order to the order we specified in the from clause, and the chosen order has forced the optimizer into using an index full scan on ig2_pk because it had to obey our index() hint.  Bottom line – the optimizer has managed to find a more costly plan by “using but apparently ignoring” a hint that described the cheaper plan that we would have got if we hadn’t used the hint.

Explanation

Query transformation can really mess things up and you shouldn’t be using the ordered hint.

I’ve explained many times over the years that the optimizer evaluates the cost of an update statement by calculating the cost of selecting the rowids of the rows to be updated. In this case, which uses an updatable join view, the steps taken to follow this mechanism this are slightly more complex.  Here are two small but critical extracts from the 10053 trace file (taken from an 18c instance):


CVM:   Merging SPJ view SEL$1 (#0) into UPD$1 (#0)
Registered qb: SEL$DA9F4B51 0x9c9966e8 (VIEW MERGE UPD$1; SEL$1; UPD$1)

...

SQE: Trying SQ elimination.
Query after View Removal
******* UNPARSED QUERY IS *******
SELECT
        /*+ ORDERED INDEX ("I2" "IG2_PK") USE_NL ("I2") */
        0
FROM    "TEST_USER"."IGNORE_2" "I2",
        "TEST_USER"."IGNORE_1" "I1"
WHERE   "I2"."ID"="I1"."ID"
AND     "I1"."VAL"<=10


The optimizer has merged the UPDATE query block with the SELECT query block to produce a select statement that will produce the necessary plan (I had thought that i1.rowid would appear in the select list, but the ‘0’ will do for costing purposes). Notice that the hints have been preserved as the update and select were merged but, unfortunately, the merge mechanism has reversed the order of the tables in the from clause. So the optimizer has messed up our select statement, then obeyed the original ordered hint!

Bottom line – the hint report is likely to be very helpful in most cases but you will still have to think about what it is telling you, and you may still have to look at the occasional 10053 to understand why the report is showing you puzzling results. You should also stop using a hint that was replaced by a far superior hint more than 18 years ago – the ordered hint in my example should have been changed to /*+ leading(i1 i2) */ in Oracle 9i.

November 15, 2018

num_index_keys

Filed under: 12c,Bugs,CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:13 pm BST Nov 15,2018

The title is the name of an Oracle hint that came into existence in Oracle 10.2.0.3 and made an appearance recently in a question on the rarely used “My Oracle Support” Community forum (you’ll need a MOS account to be able to read the original). I wouldn’t have found it but the author also emailed me the link asking if I could take a look at it.  (If you want to ask me for help – without paying me, that is – then posting a public question in the Oracle (ODC) General Database or SQL forums and emailing me a private link is the strategy most likely to get an answer, by the way.)

The question was about a very simple query using a straightforward index – with a quirky change of plan after upgrading from 10.2.0.3 to 12.2.0.1. Setting the optimizer_features_enable to ‘10.2.0.3’ in the 12.2.0.1 system re-introduced the 10g execution plan. Here’s the query:


SELECT t1.*
   FROM   DW1.t1
  WHERE   t1.C1 = '0001' 
    AND   t1.C2 IN ('P', 'F', 'C')
    AND   t1.C3 IN (
                    '18110034450001',
                    '18110034450101',
                    '18110034450201',
                    '18110034450301',
                    '18110034450401',
                    '18110034450501'
          );
 

Information supplied: t1 holds about 500 million rows at roughly 20 rows per block, the primary key index is (c1, c2, c3, c4), there are just a few values for each of c1, c2 and c4, while c3 is “nearly unique” (which, for clarity, was expanded to “the number of distinct values of c3 is virtually the same as the number of rows in the table”).

At the moment we don’t have any information about histograms and we don’t known whether or not “nearly unique” might still allow a few values of c3 to have a large number of duplicates, so that’s something we might want to follow up on later.

Here are the execution plans – the fast one (from 10g) first, then the slow (12c) plan – and you should look carefully at the predicate section of the two plans:


10g (pulled from memory with rowsource execution statistics enabled)
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   1 |  INLIST ITERATOR             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |     18 |      5 |      6 |00:00:00.01 |      58 |      5 |
|*  3 |    INDEX RANGE SCAN          | PK_T1            |     18 |      5 |      6 |00:00:00.01 |      52 |      4 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P')) AND (("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR "C3"='18110034450401' OR
              "C3"='18110034450501')))

 

12c (from explain plan)
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     1 |   359 |     7   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                  |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1               |     1 |   359 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | PK_T1            |     1 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND ("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P'))
       filter("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR
              "C3"='18110034450401' OR "C3"='18110034450501')
  

When comparing plans it’s better, of course, to present the same sources from the two systems, it’s not entirely helpful to have the generated plan from explain plan in one version and a run-time plan with stats in the other – given the choice I’d like to see the run-time from both. Despite this, I felt fairly confident that the prediction would match the run-time for 12c and that I could at least guess the “starts” figure for 12c.

The important thing to notice is the way that the access predicate in 10g has split into an access predicate followed by a filter predicate in 12c. So 12c is going to iterate three times (once for each of the values  ‘C’, ‘F’, ‘P’) and then walk a potentially huge linked list of index leaf blocks looking for 6 values of c3, while 10g is going to probe the index 18 times (3 combinations of c2 x six combinations of c3) to find “nearly unique” rows which means probably one leaf block per probe.

The 12c plan was taking minutes to run, the 10g plan was taking less than a second. The difference in execution time was probably the effect of the 12c plan ranging through (literally) thousands of index leaf blocks.

There are many bugs and anomalies relating to in-list iteration and index range scans and cardinality calculations – here’s a quick sample of v$system_fix_control in 12.2.0.1:


select optimizer_feature_enable ofe, sql_feature, bugno, description
from v$system_fix_control
where
	optimizer_feature_enable between '10.2.0.4' and '12.2.0.1'
and	(   sql_feature like '%CBO%'
	 or sql_feature like '%CARDINALITY%'
	)
and	(    lower(description) like '%list%'
	 or  lower(description) like '%iterat%'
	 or  lower(description) like '%multi%col%'
	)
order by optimizer_feature_enable, sql_feature, bugno
;

OFE        SQL_FEATURE                      BUGNO DESCRIPTION
---------- --------------------------- ---------- ----------------------------------------------------------------
10.2.0.4   QKSFM_CBO_5259048              5259048 undo unused inlist
           QKSFM_CBO_5634346              5634346 Relax equality operator restrictions for multicolumn inlists

10.2.0.5   QKSFM_CBO_7148689              7148689 Allow fix of bug 2218788 for in-list predicates

11.1.0.6   QKSFM_CBO_5139520              5139520 kkoDMcos: For PWJ on list dimension, use part/subpart bits

11.2.0.1   QKSFM_CBO_6818410              6818410 eliminate redundant inlist predicates

11.2.0.2   QKSFM_CBO_9069046              9069046 amend histogram column tracking for multicolumn stats

11.2.0.3   QKSFM_CARDINALITY_11876260    11876260 use index filter inlists with extended statistics
           QKSFM_CBO_10134677            10134677 No selectivity for transitive inlist predicate from equijoin
           QKSFM_CBO_11834739            11834739 adjust NDV for list partition key column after pruning
           QKSFM_CBO_11853331            11853331 amend index cost compare with inlists as filters
           QKSFM_CBO_12591120            12591120 check inlist out-of-range values with extended statistics

11.2.0.4   QKSFM_CARDINALITY_12828479    12828479 use dynamic sampling cardinality for multi-column join key check
           QKSFM_CARDINALITY_12864791    12864791 adjust for NULLs once for multiple inequalities on nullable colu
           QKSFM_CARDINALITY_13362020    13362020 fix selectivity for skip scan filter with multi column stats
           QKSFM_CARDINALITY_14723910    14723910 limit multi column group selectivity due to NDV of inlist column
           QKSFM_CARDINALITY_6873091      6873091 trim histograms based on in-list predicates
           QKSFM_CBO_13850256            13850256 correct estimates for transitive inlist predicate with equijoin

12.2.0.1   QKSFM_CARDINALITY_19847091    19847091 selectivity caching for inlists
           QKSFM_CARDINALITY_22533539    22533539 multi-column join sanity checks for table functions
           QKSFM_CARDINALITY_23019286    23019286 Fix cdn estimation with multi column stats on fixed data types
           QKSFM_CARDINALITY_23102649    23102649 correction to inlist element counting with constant expressions
           QKSFM_CBO_17973658            17973658 allow partition pruning due to multi-inlist iterator
           QKSFM_CBO_21057343            21057343 order predicate list
           QKSFM_CBO_22272439            22272439 correction to inlist element counting with bind variables

There are also a number of system parameters relating to inlists that are new (or have changed values) in 12.2.0.1 when compared with 10.2.0.3 – but I’m not going to go into those right now.

I was sufficiently curious about this anomaly that I emailed the OP to say I would be happy to take a look at the 10053 trace files for the query – the files probably weren’t going to be very large given that it was only a single table query – but in the end it turned out that I solved the problem before he’d had time to email them. (Warning – don’t email me a 10053 file on spec; if I want one I’ll ask for it.)

Based on the description I created an initial model of the problem – it took about 10 minutes to code:


rem     Tested on 12.2.0.1, 18.3.0.1

drop table t1 purge;

create table t1 (
	c1 varchar2(4) not null,
	c2 varchar2(1) not null,
	c3 varchar2(15) not null,
	c4 varchar2(4)  not null,
	v1 varchar2(250)
)
;

insert into t1
with g as (
	select rownum id 
	from dual
	connect by level <= 1e4 -- > hint to avoid wordpress format issue
)
select
	'0001',
	chr(65 + mod(rownum,11)),
	'18110034'||lpad(1+100*rownum,7,'0'),
	lpad(mod(rownum,9),4,'0'),
	rpad('x',250,'x')
from
	g,g
where
        rownum <= 1e5 -- > hint to avoid wordpress format issue
;


create unique index t1_i1 on t1(c1, c2, c3, c4);

begin
        dbms_stats.gather_table_stats(
                null,
                't1',
                method_opt => 'for all columns size 1'
        );
end;
/

alter session set statistics_level = all;
set serveroutput off

prompt	==========================
prompt	Default optimizer features
prompt	==========================

select
        /*+ optimizer_features_enable('12.2.0.1') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

select 
        /*+ optimizer_features_enable('10.2.0.3') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

alter session set statistics_level = all;
set serveroutput off

The two queries produced the same plan – regardless of the setting for optimizer_features_enable – it was the plan originally used by the OP’s 10g setting:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    20 (100)|      0 |00:00:00.01 |      35 |
|   1 |  INLIST ITERATOR             |       |      1 |        |            |      0 |00:00:00.01 |      35 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     18 |      2 |    20   (0)|      0 |00:00:00.01 |      35 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     18 |      2 |    19   (0)|      0 |00:00:00.01 |      35 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')) AND
              (("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501')))

There was one important difference between the 10g and the 12c plans – in 10g the cost of the table access (hence the cost of the total query) was 20; in 12c it jumped to 28 – maybe there’s a change in the arithmetic for costing the iterator, and maybe that’s sufficient to cause a problem.

Before going further it’s worth checking what the costs would look like (and, indeed, if the plan is possible in both versions) if we force Oracle into the “bad” plan. That’s where we finally get to the hint in the title of this piece. If I add the hint /*+ num_index_keys(t1 t1_i1 2) */ what’s going to happen ? (Technically I’ve included a hint to use the index, and specified the query block name to make sure Oracle doesn’t decide to switch to a tablescan):


select
        /*+
            optimizer_features_enable('12.2.0.1')
            index_rs_asc(@sel$1 t1@sel$1 (t1.c1 t1.c2 t1.c3 t1.c4))
            num_index_keys(@sel$1 t1@sel$1 t1_i1 2)
        */
        t1.*
FROM        t1
WHERE
        t1.c1 = '0001'
AND        t1.c2 in ('H', 'F', 'C')
AND        t1.c3 in (
                '18110034450001',
                '18110034450101',
                '18110034450201',
                '18110034450301',
                '18110034450401',
                '18110034450501'
        )
;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   150 (100)|      0 |00:00:00.01 |     154 |      1 |
|   1 |  INLIST ITERATOR                     |       |      1 |        |            |      0 |00:00:00.01 |     154 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      3 |     18 |   150   (2)|      0 |00:00:00.01 |     154 |      1 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |      3 |     18 |   142   (3)|      0 |00:00:00.01 |     154 |      1 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')))
       filter(("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501'))

This was the plan from 12.2.0.1 – and again the plan for 10.2.0.3 was identical except for costs which became 140 for the index range scan and 141 for the table access. At first sight it looks like 10g may be using the total selectivity of the entire query as the scaling factor for the index clustering_factor to find the table cost while 12c uses the cost of accessing the table for one iteration (rounding up) before multiplying by the number of iterations.

Having observed this detail I thought I’d do a quick test of what happened by default if I requested 145 distinct values of c3. Both versions defaulted to the access/filter path rather than the pure access path – but again there was a difference in costs. The 10g index cost was 140 with a table access cost of 158, while 12c had an index cost of 179 and a table cost of 372. So both versions switch plans at some point – do they switch at the same point ? Reader, I could not resist temptation, so I ran a test loop. With my data set the 12c version switched paths at 61 values in the in-list and 10g switched at 53 values –

Conclusion: there’s been a change in the selectivity calculations for the use of in-list iterators, which leads to a change in costs, which can lead to a change in plans; the OP was just unlucky with his data set and stats. Possibly there’s something about his data or stats that makes the switch appear with a much smaller in-list than mine.

Footnote:

When I responded to the thread on MOSC with the suggestion that the problem was in part due to statistics and might be affected by out of date stats (or a histogram on the (low-frequency) c2 column) the OP noted that stats hadn’t been gathered since some time in August – and found that the 12c path changed to the efficient (10g) one after re-gathering stats on the table.

 

September 5, 2018

Subquery Order

Filed under: 12c,Execution plans,Hints,Oracle,subqueries — Jonathan Lewis @ 1:09 pm BST Sep 5,2018

From time to time I’ve wanted to optimize a query by forcing Oracle to execute existence (or non-existence) subqueries in the correct order because I know which subquery will eliminate most data most efficiently and it’s always a good idea to look for ways to eliminate early. I’ve only just discovered (while doing some tests on 18c) that Oracle 12.2.0.1 introduced the /*+ order_subq() */ hint that seems to be engineered to do exactly that.

Here’s a very simple (and completely artificial) demonstration of use.


rem
rem     Script:         122_order_subq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2018
rem

create table t1 as select * from all_objects;
create table t2 as select * from all_objects;
create table t3 as select * from all_objects;

create index t2_i1 on t2(object_id);
create index t3_i1 on t3(object_id);

prompt  =============================
prompt  order_subq(@main subq2 subq3)
prompt  =============================

explain plan for
select
        /*+
                qb_name(main)
                no_unnest(@subq2)
                no_unnest(@subq3)
                order_subq(@main subq2 subq3)
        */
        t1.object_name, t1.object_type
from
        t1
where
        exists (
                select
                        /*+ qb_name(subq2) */
                        null
                from    t2
                where   t2.object_id = t1.object_id * 5
        )
and     exists (
                select
                        /*+ qb_name(subq3) */
                        null
                from    t3
                where   t3.object_id = t1.object_id * 13
        )
;

select * from table(dbms_xplan.display(null,null,'outline'));

=============================
order_subq(@main subq2 subq3)
=============================

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2585036931

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    53 | 51090   (1)| 00:00:02 |
|*  1 |  FILTER            |       |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1    | 61765 |  3196K|   163   (4)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T2_I1 |     1 |     5 |     1   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN | T3_I1 |     1 |     5 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
...
      ORDER_SUBQ(@"MAIN" "SUBQ2" "SUBQ3")
...
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST QB_NAME ("SUBQ2") */ 0 FROM
              "T2" "T2" WHERE "T2"."OBJECT_ID"=:B1*5) AND  EXISTS (SELECT /*+
              NO_UNNEST QB_NAME ("SUBQ3") */ 0 FROM "T3" "T3" WHERE
              "T3"."OBJECT_ID"=:B2*13))
   3 - access("T2"."OBJECT_ID"=:B1*5)
   4 - access("T3"."OBJECT_ID"=:B1*13)

I’ve blocked subquery unnesting for the purposes of the demo and given a query block name to the two subqueries (using a name that identifies the associated table). As you can see the execution plan uses the subqueries as filter subqueries, operating them in the order I’ve specified in my hint. You can also see that the hint is echoed down into the Outline Data section of the plan.

It’s possible, of course, that this is the plan that the optimizer would have chosen without the order_subq() hint so I ought to see if I can also use the hint to make the subqueries filter in the oppostie order. I happen to know that executing the subquery referencing t3 is likely to eliminate more rows than executing the subquery referencing t2. (The “* 13” compared to the “* 5” is significant) so I really want the subqueries to be used in the opposite order anyway – so here’s what happens when I reverse the order in the hint:


prompt  =============================
prompt  order_subq(@main subq3 subq2)
prompt  =============================

explain plan for
select
        /*+
                qb_name(main)
                no_unnest(@subq2)
                no_unnest(@subq3)
                order_subq(@main subq3 subq2)
        */
        t1.object_name, t1.object_type
from
        t1
where
        exists (
                select
                        /*+ qb_name(subq2) */
                        null
                from    t2
                where   t2.object_id = t1.object_id * 5
        )
and     exists (
                select
                        /*+ qb_name(subq3) */
                        null
                from    t3
                where   t3.object_id = t1.object_id * 13
        )
;

select * from table(dbms_xplan.display(null,null,'outline'));

=============================
order_subq(@main subq2 subq3)
=============================

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3585049451

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    53 | 51090   (1)| 00:00:02 |
|*  1 |  FILTER            |       |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1    | 61765 |  3196K|   163   (4)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T3_I1 |     1 |     5 |     1   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN | T2_I1 |     1 |     5 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
...
      ORDER_SUBQ(@"MAIN" "SUBQ3" "SUBQ2")
...
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST QB_NAME ("SUBQ3") */ 0 FROM
              "T3" "T3" WHERE "T3"."OBJECT_ID"=:B1*13) AND  EXISTS (SELECT /*+
              NO_UNNEST QB_NAME ("SUBQ2") */ 0 FROM "T2" "T2" WHERE
              "T2"."OBJECT_ID"=:B2*5))
   3 - access("T3"."OBJECT_ID"=:B1*13)
   4 - access("T2"."OBJECT_ID"=:B1*5)

With the modified hint in place the order of the filter subqueries is reversed. Notice how the Predicate Information also echoes the ordering of the subqueries.

Footnote

It should be noted that the order_subq() hint doesn’t get mentioned in the 18c SQL Language Reference “Alphabetical List of Hints”. If it were then one of the little oddities that might get a mention is that the optimizer seems to ignore the hint if you disable CPU costing. (not that anyone should be doing that since 10g).

July 13, 2018

pushing predicates

Filed under: CBO,Execution plans,Hints,Oracle — Jonathan Lewis @ 1:05 pm BST Jul 13,2018

I came across this odd limitation (maybe defect) with pushing predicates (join predicate push down) a few years ago that made a dramatic difference to a client query when fixed but managed to hide itself rather cunningly until you looked closely at what was going on. Searching my library for something completely different I’ve just rediscovered the model I built to demonstrate the issue so I’ve tested it against a couple of newer versions  of Oracle (including 18.1) and found that the anomaly still exists. It’s an interesting little detail about checking execution plans properly so I’ve written up the details. The critical feature of the problem is a union all view:


rem
rem	Script:		push_pred_limitation.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Jan 2015
rem
rem	Last tested 
rem		18.1.0.0	via LiveSQL
rem		12.2.0.1
rem		12.1.0.2
rem		11.2.0.4
rem

create table t1
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t2
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

create table t3
as
select	* 
from	all_objects
where	rownum <= 10000 -- > comment to avoid WordPress format issue
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1 for columns owner size 254'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T3',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t2_id on t2(object_id);
-- create index t2_id_ot on t2(object_id, object_type);

create index t3_name_type on t3(object_name, object_type);

create or replace view v1
as
select 
	/*+ qb_name(part1) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.created	date_2,
	t3.created	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
union all
select
	/*+ qb_name(part2) */
	t2.object_id,
	t2.object_type	object_type_2,
	t3.object_type	object_type_3,
	t2.last_ddl_time	date_2,
	t3.last_ddl_time	date_3
from
	t2, t3
where
	t3.object_name = t2.object_name
;

Two points to note so far: first, the view is basically joining the same two tables in the same way twice but selecting different columns. It’s a close model of what the client was doing but so much simpler that it wouldn’t be hard to find a different way of getting the same result: the client’s version would have been much far harder to rewrite. Secondly, I’ve listed two possible indexes for table t2 but commented one of them out. The indexing will make a difference that I’ll describe later.

So here’s the query with execution plan (from explain plan – but pulling the plan from memory gives the same result):


select
	/*+ qb_name(main) */
	t1.object_name, t1.object_type,
	v1.object_id, v1.date_2, v1.date_3
from
	t1,
	v1
where
	v1.object_id = t1.object_id
and	v1.object_type_2 = t1.object_type
and	v1.object_type_3 = t1.object_type
and	t1.owner = 'OUTLN'
;

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   588 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   588 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|*  3 |   VIEW                                   | V1           |     1 |    44 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   3 - filter("V1"."OBJECT_TYPE_2"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

The execution plan appears to be fine – we can see at operation 4 that the union all view has been access with the pushed predicate option and that the subsequent sub-plan has
used index driven nested loop joins in both branches – until we look a little more closely and examine the Predicate section of the plan. What, exactly, has been pushed ?

Look at the predicate for operation 3: “V1″.”OBJECT_TYPE_2″=”T1″.”OBJECT_TYPE”. It’s a join predicate that hasn’t been pushed into the view. On the other hand the original, and similar, join predicate v1.object_type_3 = t1.object_type has been pushed into the view, appearing at operations 9 and 15. There is a difference, of course, the object_type_3 column appears as the second column of the index on table t3.

Two questions then: (a) will the object_type_2 predicate be pushed if we add it to the relevant index on table t2, (b) is there a way to get the predicate pushed without adding it to the index. The answer to both questions is yes. First the index – re-run the test but create the alternative index on t2 and the plan changes to:

Plan hash value: 497545587

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID_OT     |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='OUTLN')
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID" AND "T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

Notice how the predicate at operation 3 has disappeared, and the access predicate at operation 8 now includes the predicate “T2″.”OBJECT_TYPE”=”T1″.”OBJECT_TYPE”.

Alternatively, don’t mess about with the indexes – just tell Oracle to push the predicate. Normally I would just try /*+ push_pred(v1) */ as the hint to do this, but the Outline section of the original execution plan already included a push_pred() hint that looked like this: PUSH_PRED(@”MAIN” “V1″@”MAIN” 3 1), so I first copied exactly that into the SQL to see if it would make any difference. It did – I got the following plan (and the hint in the outline changed to PUSH_PRED(@”MAIN” “V1″@”MAIN” 3 2 1) so this may be a case where the plan produced by a baseline will perform better than the plan that the produced the baseline!):

Plan hash value: 4123301926

---------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |     7 |   553 |    82   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                            |              |     7 |   553 |    82   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                      | T1           |     7 |   280 |    26   (4)| 00:00:01 |
|   3 |   VIEW                                   | V1           |     1 |    39 |     8   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE            |              |       |       |            |          |
|   5 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|   6 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
|  11 |     NESTED LOOPS                         |              |     1 |    77 |     4   (0)| 00:00:01 |
|  12 |      NESTED LOOPS                        |              |     1 |    77 |     4   (0)| 00:00:01 |
|* 13 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2           |     1 |    41 |     2   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN                  | T2_ID        |     1 |       |     1   (0)| 00:00:01 |
|* 15 |       INDEX RANGE SCAN                   | T3_NAME_TYPE |     1 |       |     1   (0)| 00:00:01 |
|  16 |      TABLE ACCESS BY INDEX ROWID         | T3           |     1 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."OWNER"='TEST_USER')
   7 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
   8 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  13 - filter("T2"."OBJECT_TYPE"="T1"."OBJECT_TYPE")
  14 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
  15 - access("T3"."OBJECT_NAME"="T2"."OBJECT_NAME" AND "T3"."OBJECT_TYPE"="T1"."OBJECT_TYPE")

In this case we see that the critical late-joining predicate has disappeared from operation 3 and re-appeared as a filter predicate at operation 7 In many cases you may find that the change in predicate use makes little difference to the performance – in my example the variation in run time over several executions of each query was larger than the average run time of the query; nevertheless it’s worth noting that the delayed use of the predicate could have increased the number of probes into table t3 for both branches of the union all and resulted in redundant data passing up through several layers of the call stack before being eliminated … and “eliminate early” is one of the major commandments of optimisation.

You might notice that the Plan Hash Value for the hinted execution plan is the same as for the original execution plan: the hashing algorithm doesn’t take the predicates into account (just one of many points that Randolf Geist raised in a blog post several years ago). This is one of the little details that makes it easy to miss the little changes in a plan that can make a big difference in performance.

Summary

If you have SQL that joins simple tables to set based (union all, etc.) views and you see the pushed predicate option appearing take a little time to examine the predicate section of the execution plan to see if the optimizer is pushing all the join predicates that it should and, if it isn’t, test the effects of pushing more predicates.

In many cases adding the hint /*+ push_pred(your_view_name) */ at the top of the query may be sufficient to get the predicate pushing you need, but you may need to look at the outline section of the execution plan and add a series of more complicated push_pred() and no_push_pred() hints because the push_pred hint has evolved over time to deal with increasingly complicated transformations.

 

June 26, 2018

Hacking Profiles

Filed under: Execution plans,Hints,Oracle,Tuning — Jonathan Lewis @ 8:25 am BST Jun 26,2018

Saturday’s posting about setting cursor_sharing to force reminded me about one of the critical limitations of SQL Profiles (which is one of those little reason why you shouldn’t be hacking SQL Profiles as a substitute for SQL Plan Baselines). Here’s a demo (taking advantage of some code that I think Kerry Osborne published several years ago) of creating an SQL Profile from the current execution plan of a simple statement – first we create some data and find the sql_id and child_number for a simple query:

rem
rem     Script:         sql_profile_restriction.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2

create table t1
as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4 -- > comment to avoid WordPress format issue
;

alter system flush shared_pool;

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;

column sql_id new_value m_sql_id
column child_number new_value m_child_number

select  sql_id , child_number
from    v$sql
where   sql_text like 'selec%find this%'
and     sql_text not like '%v$sql%'
;

Now I can create the SQL Profile for this query using the Kerry Osborne code:


declare
        ar_profile_hints        sys.sqlprof_attr;
        cl_sql_text clob;
begin
        select
                extractvalue(value(d), '/hint') as outline_hints
        bulk collect into 
                ar_profile_hints
        from
                xmltable(
                        '/*/outline_data/hint'
                        passing (
                                select
                                        xmltype(other_xml) as xmlval
                                from
                                        v$sql_plan
                                where
                                        sql_id = '&m_sql_id'
                                and     child_number =  &m_child_number 
                                and     other_xml is not null
                )
        ) d;

        select
                sql_fulltext
        into
                cl_sql_text
        from
                v$sql
        where
                sql_id = '&m_sql_id'
        and     child_number =  &m_child_number
        ;

        dbms_sqltune.import_sql_profile(
                sql_text        => cl_sql_text, 
                profile         => ar_profile_hints, 
                category        => 'DEFAULT',
                name            => 'PROFILE_LITERAL',
                force_match     =>  true
        );
end;
/

Note particularly that I have given the profile a simple name, put it in the DEFAULT category, and set force_match to true (which means that the profile ought to be used even if I change the literal values in the query). So now let’s check that the profile will be used as expected. First I’ll create an index that is a really good index for this query, then I’ll run the query to see if Oracle uses the index or obeys the profile; then I’ll change the query (literals) slightly and check again. I’ll also run a query that won’t be recognised as legally matching (thanks to the changed “hint”) to demonistrate that the index could have been used if the profile hadn’t been there:


alter system flush shared_pool;
set serveroutput off

prompt  =============================
prompt  Is the SQL Profile used ? Yes
prompt  =============================

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor);

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor);

select /*+ Non-match */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor);

Here (with a little cosmetic adjustment) are the three outputs from dbms_xplan.display_cursor():

SQL_ID  ayxnhrqzd38g3, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=15 AND "N2"=15))

Note
-----
   - SQL profile PROFILE_LITERAL used for this statement


SQL_ID  gqjb8pp35cnyp, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=16 AND "N2"=16))

Note
-----
   - SQL profile PROFILE_LITERAL used for this statement


SQL_ID  3gvaxypny9ry1, child number 0
-------------------------------------
select /*+ Non-match */ count(*) from t1 where n1 = 16 and n2 = 16
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=16 AND "N2"=16)

As you can see the SQL Profile is reported as used in the first two queries, and (visibly) seems to have been used. Then in the third query where we wouldn’t expect a match the SQL Profile is not used and we get a plan that shows the index would have been used for the other queries had the SQL Profile not been there. So far, so good – the profile behaves as everyone might expect.

Bind Variable Breaking

Now let’s repeat the entire experiment but first do a global find and replace to change every occurrence of “n2 = 16” to “n2 = :b1”. We’ll also change the name of the SQL Profile when we create it to PROFILE_MIXED, and we’ll put in a couple of lines at the top of the script to declare the variable b1 and set its value, then the final test in the script will look like this:


alter system flush shared_pool;
create index t1_i1 on t1(n1, n2);

exec :b1 := 15

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = :b1;
select * from table(dbms_xplan.display_cursor);

exec :b1 := 16

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = :b1;
select * from table(dbms_xplan.display_cursor);

And here are the execution plans from the two queries:


SQL_ID  236f82vmsvjab, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = :b1
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=15 AND "N2"=:B1))

Note
-----
   - SQL profile PROFILE_MIXED used for this statement


SQL_ID  7nakm3tw27z3c, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = :b1
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=16 AND "N2"=:B1)


As you can see the execution plan for the original query is still doing a full tablescan and reporting the SQL Profile as used; but we’re not using (or reporting) the SQL Profile when we change the literal values – even though a query against dba_sql_profiles will tell us that the profile has force_matching = ‘YES’.

tl;dr

(Clarified in response to Mohammed Houri’s comment below)
If you use an SQL Profile with force_match => true to “hide” the literals in a statement that includes bind variables (even if they appear only in the select list, in fact) the mechanism will not be used, and the SQL Profile will apply only to the original statement.

Update

Christian Antognini has an elegant little script that uses the dbms_sqltune.sqltext_to_signature() function to highlight this point (among others).  Bear in mind, before you run the script, that you need to be licensed to use the dbms_sqltune package to do so.

 

March 2, 2018

Conditional SQL – 5

Filed under: Conditional SQL,Execution plans,Hints,Indexing,Oracle — Jonathan Lewis @ 12:49 pm BST Mar 2,2018

Here’s a note that has been sitting around for more than 3 years (the draft date is Jan 2015), waiting for me to finish it off; and in that time we’ve got a new version of Oracle that changes the solution to the problem it presented. (I also managed to write “Conditional SQL –  6” in the intervening period !)

This posting started with a question on the OTN (now ODC) database forum about an execution plan used by 11.2.0.3.  Here’s a model to represent the data and the query:

rem
rem     Script:         null_plan_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2015
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table catentry as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment here to avoid format issue
)
select
        rownum  catentry_id,
        case
                when mod(rownum-1,100) > 0 then mod(rownum-1,100)
        end     member_id,
        case
                when trunc((rownum-1)/100) > 0 then trunc((rownum-1)/100)
        end     partnumber,
        rpad('x',100)   padding
from
        generator,
        generator
where
        rownum <= 100 * 100 -- > comment here to avoid format issue
;

execute dbms_stats.gather_table_stats(user,'catentry');

create unique index cat_i0 on catentry(member_id, partnumber) compress 1;
--  create        index cat_i1 on catentry(member_id, partnumber, 0) compress 1;
--  create        index cat_i2 on catentry(partnumber, member_id, 0) compress 1;

variable b1 number
variable b2 number
variable b3 number
variable b4 number

begin
        :b1 := 22;
        :b2 := 1;
        :b3 := 44;
        :b4 := 1;
end;
/

select
        catentry_id
from
        catentry
where
        (   partnumber= :b1
         or (0 = :b2 and partnumber is null)
        )
and     (    member_id= :b3
         or (0 = :b4 and member_id is null)
        )
;

select * from table(dbms_xplan.display_cursor);

==============================================================================

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |       |       |    23 (100)|          |
|*  1 |  TABLE ACCESS FULL| CATENTRY |     1 |    10 |    23   (5)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((("PARTNUMBER"=:B1 OR ("PARTNUMBER" IS NULL AND 0=:B2))
              AND ("MEMBER_ID"=:B3 OR ("MEMBER_ID" IS NULL AND 0=:B4))))

The question this example raised was, effectively: “Why didn’t Oracle use bind peeking to work out that the best plan for this query – with these input values – was an index range scan?”

The basic answer to this question is this: “No matter how clever Oracle can be with bind peeking and executions plans it has to produce an execution plan that will give the right answer whatever the supplied values might be.”

The OP was hoping that the optimizer would see :b2 and :b4 were arriving with the value 1, infer that “0 = 1” is always false, and reduce the query predicate to “partnumber =22 and member_id = 44” to produce the following plan:


----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | CAT_I0   |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PARTNUMBER"=22 AND "MEMBER_ID"=44)

But this plan could produce the wrong results if the next execution of the query supplied zeros for :b2 or :b4, so Oracle has to do something more generic. (Bear in mind that adaptive cursor sharing keeps reusing the same execution plan until it detects that the performance for some input values is bad; it doesn’t pre-emptively create new plans based on the incoming values – though in principle it might be possible for the Oracle developers to introduce code that can recognise special cases for predicates of the form “constant1 = constant2”).

If you review the SQL statement you can see that it’s clearly trying to allow the user to find data about member_ids and partnumbers where both, neither, or either value is allowed to be null: a couple of “if – then – else” conditions that should have been handled in the client code have been embedded in the code. As we have seen several times before if you can’t change the client code then you have to hope that Oracle will use some clever transformation to handle the query in sections.

We can infer from various details of the posting that the member_id and partnumber columns were both allowed to be null, so if we want to make sure that Oracle always uses an indexed access path to acquire data for this query we need to have an index which starts with those two columns and then has at least one column which is guaranteed to be non-null so, for example, we could simply drop the current index and replace it with one that has a fixed zero on the end:

create index cat_i1 on catentry(member_id, partnumber, 0) compress 1;

With my particular data set, query, and version of Oracle this didn’t make any difference to the plan. But then I thought about the data definition and realised (and checked) that the index had a terrible clustering_factor, so I dropped it and created it with the first two columns in the opposite order:

create index cat_i2 on catentry(partnumber, member_id, 0) compress 1;

Side note:
You’ll notice that I’ve replaced the original unique index with a non-unique index. This was necessary because there were many rows where both partnumber and member_id were null, so if I want to maintain the logic of the previous unique index I’ll need to add a unique constraint on (member_id, partnumber). It’s possible, of course, that in similar circumstances I might want both indexes – one for the uniqueness and to access the data using only one of the columns, the other to access the data using only the other column.

With this index in place, and unhinted, the plan I got from 11.2.0.4 changed to use concatenation with an impressive four-way split:


------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |       |       |    12 (100)|          |
|   1 |  CONCATENATION                |          |       |       |            |          |
|*  2 |   FILTER                      |          |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | CAT_I2   |     1 |       |     2   (0)| 00:00:01 |
|*  5 |   FILTER                      |          |       |       |            |          |
|*  6 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN          | CAT_I2   |     1 |       |     2   (0)| 00:00:01 |
|*  8 |   FILTER                      |          |       |       |            |          |
|*  9 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN          | CAT_I2   |     1 |       |     2   (0)| 00:00:01 |
|* 11 |   TABLE ACCESS BY INDEX ROWID | CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN           | CAT_I2   |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((0=:B2 AND 0=:B4))
   4 - access("PARTNUMBER" IS NULL AND "MEMBER_ID" IS NULL)
       filter("MEMBER_ID" IS NULL)
   5 - filter(0=:B2)
   6 - filter((LNNVL("MEMBER_ID" IS NULL) OR LNNVL(0=:B4)))
   7 - access("PARTNUMBER" IS NULL AND "MEMBER_ID"=:B3)
       filter("MEMBER_ID"=:B3)
   8 - filter(0=:B4)
   9 - filter((LNNVL("PARTNUMBER" IS NULL) OR LNNVL(0=:B2)))
  10 - access("PARTNUMBER"=:B1 AND "MEMBER_ID" IS NULL)
  11 - filter(((LNNVL("MEMBER_ID" IS NULL) OR LNNVL(0=:B4)) AND
              (LNNVL("PARTNUMBER" IS NULL) OR LNNVL(0=:B2))))
  12 - access("PARTNUMBER"=:B1 AND "MEMBER_ID"=:B3)


To execute this plan the run-time engine works as follows:

  • Operation 2: If :b2 and :b4 are both zero we use the index to find the rows where member_id and partnumber are null (the filter “member_id is null” seems to be redundant)
  • Operation 5: if :b2 is zero we use the index to find rows where the partnumber is null and the member_id is the supplied value (and if that’s null the access will immediately return zero rows because of the equality predicate), and discard any rows that have already been returned by operation 2
  • Operation 8: if :b4 is zero we will use the index to find rows where the partnumber is the supplied value (and if the partnumber is null, that access will immediately return zero rows because of the equality predicate) and the member_id is null, and discard any rows that have already been returned by operation 2.
  • Operations 11 and 12 will always run – using the index to find rows that match with equality on both the incoming member_id and partnumber, discarding any rows already returned by the previous operations, and obviously not matching any rows where either column “IS” null.

The critical feature of this plan, of course, is that we got it because we have given Oracle an efficient option to find the rows where both member_id and partnumber are null – and that allows the rest of the concatenation options to take place.

Hints and Upgrades

Interestingly, after the clue that 11g would happily use concatenation with a “good enough” index I went back to the example where I’d just added a zero to the existing index and checked to see what would happen if I added a /*+ use_concat */ hint (without any of the qualifying parameters that the hint can now use) and got the same concatenated plan. The fact that the path appeared wasn’t the interesting bit – see if you can spot what is the interesting bit:

------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |       |       |    12 (100)|          |
|   1 |  CONCATENATION                |          |       |       |            |          |
|*  2 |   FILTER                      |          |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | CAT_I1   |     1 |       |     2   (0)| 00:00:01 |
|*  5 |   FILTER                      |          |       |       |            |          |
|*  6 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN          | CAT_I1   |     1 |       |     2   (0)| 00:00:01 |
|*  8 |   FILTER                      |          |       |       |            |          |
|*  9 |    TABLE ACCESS BY INDEX ROWID| CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN          | CAT_I1   |     1 |       |     2   (0)| 00:00:01 |
|* 11 |   TABLE ACCESS BY INDEX ROWID | CATENTRY |     1 |    10 |     3   (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN           | CAT_I1   |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Check the cost, and compare it with the cost of the full tablescan. The hinted path has a lower cost than the default path. I think this may be another case of an “unknowable” range scan being ignored in favour of a known alternative.

Finally, we get to today – when I tested the code against 12.1.0.2 and 12.2.0.1. Nothing exciting happened in 12.1.0.2 – the plans were just like the 11g plans, but here’s the plan I got in 12.2 with the “bad” indexing (original column order with added zero column – index cat_i1) without any hints in the SQL:


----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |     9 (100)|          |
|   1 |  VIEW                                  | VW_ORE_37A7142B |     4 |    52 |     9   (0)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID         | CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN                  | CAT_I1          |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    FILTER                              |                 |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | CAT_I1          |     1 |       |     1   (0)| 00:00:01 |
|*  8 |    FILTER                              |                 |       |       |            |          |
|   9 |     TABLE ACCESS BY INDEX ROWID BATCHED| CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|* 10 |      INDEX RANGE SCAN                  | CAT_I1          |     1 |       |     1   (0)| 00:00:01 |
|* 11 |    FILTER                              |                 |       |       |            |          |
|  12 |     TABLE ACCESS BY INDEX ROWID BATCHED| CATENTRY        |     1 |    10 |     3   (0)| 00:00:01 |
|* 13 |      INDEX RANGE SCAN                  | CAT_I1          |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      ...
      OR_EXPAND(@"SEL$1" (1) (2) (3) (4))
      ...
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("MEMBER_ID"=:B3 AND "PARTNUMBER"=:B1)
   5 - filter(0=:B4)
   7 - access("MEMBER_ID" IS NULL AND "PARTNUMBER"=:B1)
       filter(("PARTNUMBER"=:B1 AND LNNVL("MEMBER_ID"=:B3)))
   8 - filter(0=:B2)
  10 - access("MEMBER_ID"=:B3 AND "PARTNUMBER" IS NULL)
       filter(LNNVL("PARTNUMBER"=:B1))
  11 - filter((0=:B4 AND 0=:B2))
  13 - access("MEMBER_ID" IS NULL AND "PARTNUMBER" IS NULL)
       filter(("PARTNUMBER" IS NULL AND LNNVL("PARTNUMBER"=:B1) AND LNNVL("MEMBER_ID"=:B3)))

The full tablescan didn’t appear – but it wasn’t eliminated by concatenation but by the “new” 12.2  variant: “OR EXPANSION”. In this case the net effect is remarkably similar – we still have filter operations comparing :b2 and :b4 with zero, and we still have a scattering of lnnvl() function calls being used to discard rows we’ve already accessed, but the pattern is slightly different and we have a union all rather than concatenation.

This change prompted me to go back to testing with just the original index (member_id, partnumber – index cat_i0) … which took me back to the full tablescan until I added the hint /*+ or_expand */ to the query to get the following plan:


----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |    29 (100)|          |
|   1 |  VIEW                                  | VW_ORE_37A7142B |     4 |    52 |    29   (4)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID         | CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN                  | CAT_I0          |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    FILTER                              |                 |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | CAT_I0          |     1 |       |     1   (0)| 00:00:01 |
|*  8 |    FILTER                              |                 |       |       |            |          |
|   9 |     TABLE ACCESS BY INDEX ROWID BATCHED| CATENTRY        |     1 |    10 |     2   (0)| 00:00:01 |
|* 10 |      INDEX RANGE SCAN                  | CAT_I0          |     1 |       |     1   (0)| 00:00:01 |
|* 11 |    FILTER                              |                 |       |       |            |          |
|* 12 |     TABLE ACCESS FULL                  | CATENTRY        |     1 |    10 |    23   (5)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("MEMBER_ID"=:B3 AND "PARTNUMBER"=:B1)
   5 - filter(0=:B4)
   7 - access("MEMBER_ID" IS NULL AND "PARTNUMBER"=:B1)
       filter(("PARTNUMBER"=:B1 AND LNNVL("MEMBER_ID"=:B3)))
   8 - filter(0=:B2)
  10 - access("MEMBER_ID"=:B3 AND "PARTNUMBER" IS NULL)
       filter(LNNVL("PARTNUMBER"=:B1))
  11 - filter((0=:B4 AND 0=:B2))
  12 - filter(("PARTNUMBER" IS NULL AND "MEMBER_ID" IS NULL AND LNNVL("PARTNUMBER"=:B1) AND
	      LNNVL("MEMBER_ID"=:B3)))

The plan shows “or expansion”, and highlights the most significant difference between concatenation and expansion – concatenation requires indexed access paths in all branches, or-expansion doesn’t.

At first sight this plan with its full tablescan at operation 12 might seem like a bad idea; but check operation 11, the guarding filter, the tablescan occurs only if both :b2 and :b4 are null. Perhaps that special condition is never supposed to appear, perhaps it’s going to do a lot of work whether or not you can use an index. The fact that you can now handle the original problem without adding or altering existing indexes – provided you can inject this or_expand hint – may be of significant benefit. (Of course, being able to recreate the original index with the columns in the reverse order would even avoid the necessity of worrying about the hint.)

tl;dr

Applications that push “if-then-else” decisions into the SQL and down to the optimizer are a pain in the backside; the performance problems they produce can sometimes be bypassed by the addition of extra indexes that might give you plans (possibly hinted) that report the use of the concatentation operation. In 12cR2 the optimizer has an improved strategy for damage limitation “Cost-based Or Expansion” that can produce very similar effects without the addition of extra indexes. These plans will report union all operations, referencing views with names like: VW_ORE_xxxxxxxxx.

When you next upgrade you may find a few cases where you can get rid of indexes that were only created to work around defective application coding strategies. You may also want to hunt down any code where you’ve added use_concat hints and see if they can be removed, or if they should be replaced by or_expand. since the former hint will disable the latter feature.

 

June 12, 2017

dbms_sqldiag

Filed under: 12c,Execution plans,Hints,Oracle,Upgrades — Jonathan Lewis @ 12:48 pm BST Jun 12,2017

If you’re familiar with SQL Profiles and SQL Baselines you may also know about SQL Patches – a feature that allows you to construct hints that you can attach to SQL statements at run-time without changing the code. Oracle 12c Release 2 introduces a couple of important changes to this feature:

  • It’s now official – the feature had been copied from package dbms_sqldiag_internal to package dbms_sqldiag.
  • The limitation of 500 characters has been removed from the hint text – it’s now a CLOB column.

H/T to Nigel Bayliss for including this detail in his presentation to the UKOUG last week, and pointing out that it’s also available for Standard Edition.

There are a couple of other little changes as you can see below from the two extract from the 12.2 declarations of dbms_sqldiag and dbms_sqldiag_internal below:


dbms_sqldiag
------------
FUNCTION CREATE_SQL_PATCH RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_TEXT                       CLOB                    IN
 HINT_TEXT                      CLOB                    IN
 NAME                           VARCHAR2                IN     DEFAULT
 DESCRIPTION                    VARCHAR2                IN     DEFAULT
 CATEGORY                       VARCHAR2                IN     DEFAULT
 VALIDATE                       BOOLEAN                 IN     DEFAULT

FUNCTION CREATE_SQL_PATCH RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_ID                         VARCHAR2                IN
 HINT_TEXT                      CLOB                    IN
 NAME                           VARCHAR2                IN     DEFAULT
 DESCRIPTION                    VARCHAR2                IN     DEFAULT
 CATEGORY                       VARCHAR2                IN     DEFAULT
 VALIDATE                       BOOLEAN                 IN     DEFAULT

dbms_sqldiag_internal
---------------------
FUNCTION I_CREATE_PATCH RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_ID                         VARCHAR2                IN
 HINT_TEXT                      CLOB                    IN
 CREATOR                        VARCHAR2                IN
 NAME                           VARCHAR2                IN     DEFAULT
 DESCRIPTION                    VARCHAR2                IN     DEFAULT
 CATEGORY                       VARCHAR2                IN     DEFAULT
 VALIDATE                       BOOLEAN                 IN     DEFAULT

FUNCTION I_CREATE_PATCH RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_TEXT                       CLOB                    IN
 HINT_TEXT                      CLOB                    IN
 CREATOR                        VARCHAR2                IN
 NAME                           VARCHAR2                IN     DEFAULT
 DESCRIPTION                    VARCHAR2                IN     DEFAULT
 CATEGORY                       VARCHAR2                IN     DEFAULT
 VALIDATE                       BOOLEAN                 IN     DEFAULT

  • The function names change from i_create_patch to create_sql_patch when exposed in dbms_sqldiag.
  • There are two versions of the function – one that requires you to supply the exact SQL text, and a new version that allows you to supply an SQL ID.
  • The internal function also adds a creator to the existing parameter list – and it doesn’t have a default so if you’ve already got some code to use the internal version it’s not going to work on an upgrade to 12.2 until you change it.

I was prompted to write this note by a tweet asking me if there’s any SQL available to see the contents of an SQL Profile in 11g and 12c. (I published some simple code several years ago for 10g, (before accepting – in the body of the blog, after accepting – in the linked comment) but Oracle changed the base tables in 11g). The answer is yes, probably on the Internet somewhere, but here’s some code I wrote a couple of years ago to report profiles in the more recent versions of Oracle:

rem
rem     sql_profile_baseline_11g.sql
rem     J.P.Lewis
rem     July 2010
rem

set pagesize 60
set linesize 132
set trimspool on

column hint format a70 wrap word
column signature format 999,999,999,999,999,999,999

break on signature skip 1 on opt_type skip 1 on plan_id skip 1

spool sql_profile_baseline_11g

select
        prf.signature,
        decode(
                obj_type,
                1,'Profile',
                2,'Baseline',
                3,'Patch',
                'Other'
        )       opt_type,
        prf.plan_id,
        extractvalue(value(tab),'.')    hint
from
        (
        select
                /*+ no_eliminate_oby */
                *
        from
                sqlobj$data
        where
                comp_data is not null
        order by
                signature, obj_type, plan_id
        )       prf,
        table(
                xmlsequence(
                        extract(xmltype(prf.comp_data),'/outline_data/hint')
                )
        )       tab
;

This will report the hints associated with SQL Baselines, SQL Profiles, and SQL Patches – all three store the data in the same base table. As a minor variation I also have a query that will reported a named profile/baseline/patch, but this requires a join to the sqlobj$ table. As you can see from the substitution variable near the end of the text, the script will prompt you for an object name.


set pagesize 60
set linesize 180
set trimspool on

column  plan_name format a32
column  signature format 999,999,999,999,999,999,999
column  category  format a10
column  hint format a70 wrap word

break on plan_name skip 1 on signature skip 1 on opt_type skip 1 on category skip 1 on plan_id skip 1

spool sql_profile_baseline_11g

select
        prf.plan_name,
        prf.signature,
        decode(
                obj_type,
                1,'Profile',
                2,'Baseline',
                3,'Patch',
                  'Other'
        )       opt_type,
        prf.category,
        prf.plan_id,
        extractvalue(value(hnt),'.') hint
from
        (
        select
                /*+ no_eliminate_oby */
                so.name         plan_name,
                so.signature,
                so.category,
                so.obj_type,
                so.plan_id,
                sod.comp_data
                from
                        sqlobj$         so,
                        sqlobj$data     sod
                where
                        so.name = '&m_plan_name'
                and     sod.signature = so.signature
                and     sod.category = so.category
                and     sod.obj_type = so.obj_type
                and     sod.plan_id = so.plan_id
                order by
                        signature, obj_type, plan_id
        )       prf,
        table (
                select
                        xmlsequence(
                                extract(xmltype(prf.comp_data),'/outline_data/hint')
                        )
                from
                        dual
        )       hnt
;

Lagniappe:

One of the enhancements that appeared in 12c for SQL Baselines was that the plan the baseline was supposed to produce was stored in the database so that Oracle could check that the baseline would still reproduce the expected plan before applying it the DBA could see what the baseline has been producing before Oracle stopped using it. (Currently Oracle stores the plan’s hash value, and stops using the baseline if it starts to produce a different hash value. Storing the plan as well gives the DBA a chance of working out how to reproduce the correct plan and create a new baseline to get to it.)

These plans (also generated for Profiles and Patches) are stored in the table sqlobj$plan, and the dbms_xplan package has been enhanced with three new functions to report them:


FUNCTION DISPLAY_SQL_PATCH_PLAN RETURNS DBMS_XPLAN_TYPE_TABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 NAME                           VARCHAR2                IN
 FORMAT                         VARCHAR2                IN     DEFAULT

FUNCTION DISPLAY_SQL_PLAN_BASELINE RETURNS DBMS_XPLAN_TYPE_TABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_HANDLE                     VARCHAR2                IN     DEFAULT
 PLAN_NAME                      VARCHAR2                IN     DEFAULT
 FORMAT                         VARCHAR2                IN     DEFAULT

FUNCTION DISPLAY_SQL_PROFILE_PLAN RETURNS DBMS_XPLAN_TYPE_TABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 NAME                           VARCHAR2                IN
 FORMAT                         VARCHAR2                IN     DEFAULT

e.g.
SQL> select * from table(dbms_xplan.display_sql_profile_plan('SYS_SQLPROF_015c9bd3bceb0000'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL text: select        t1.id, t2.id from       t1, t2 where    t1.id between 10000 and
          20000 and     t2.n1 = t1.n1 and       t2.n1 = t2.v2
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL Profile Name: SYS_SQLPROF_015c9bd3bceb0000
Status:           ENABLED
Plan rows:        From dictionary
--------------------------------------------------------------------------------

Plan hash value: 3683239666

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 10501 |   287K|   248   (4)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |     0   (0)|          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 | 10501 |   287K|   248   (4)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED   |          | 10501 |   287K|   248   (4)| 00:00:01 |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE          |          | 10002 |    97K|   123   (3)| 00:00:01 |  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 | 10002 |    97K|   123   (3)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          | 10002 |    97K|   123   (3)| 00:00:01 |  Q1,00 | PCWC |            |
|*  7 |        TABLE ACCESS FULL| T1       | 10002 |    97K|   123   (3)| 00:00:01 |  Q1,00 | PCWP |            |
|   8 |     PX RECEIVE          |          |   104K|  1845K|   124   (4)| 00:00:01 |  Q1,02 | PCWP |            |
|   9 |      PX SEND HASH       | :TQ10001 |   104K|  1845K|   124   (4)| 00:00:01 |  Q1,01 | P->P | HASH       |
|  10 |       PX BLOCK ITERATOR |          |   104K|  1845K|   124   (4)| 00:00:01 |  Q1,01 | PCWC |            |
|* 11 |        TABLE ACCESS FULL| T2       |   104K|  1845K|   124   (4)| 00:00:01 |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T2"."N1"="T1"."N1")
   7 - filter("T1"."ID"=10000)
  11 - filter("T2"."N1"=TO_NUMBER("T2"."V2"))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2

Disclaimer – I’ve checked only the SQL_PROFILE function call on 12.2, after creating a profile to check that my old 11g report still worked in 12c.

Update Aug 2017

I have just rediscovered a note I made (though I don’t have a reference to the source) that Patch 17203284 backports the visibility of create_sql_patch to dbms_sqldiag in 12.1.0.2. The description for the patch is: Enhancements for dbms_sqldiag_internal.i_create_patch but the “Bugs resolved by this patch” link on the patch details screen leads to the “Requested bug could not be displayed” page.

[Update: Oct 2017,(and see comment below) – this patch doesn’t make public a procedure create_sql_patch, it simply adds a version of i_create_patch that takes a CLOB hint text to dbms_sqldiag_internal.]

 

Next Page »

Powered by WordPress.com.